alignment #
Utility methods to visualize the alignment and errors between one or more reference and hypothesis pairs.
collect_error_counts #
collect_error_counts(output)
Retrieve three dictionaries that count how often each word or character was substituted, inserted, or deleted. The keys of the substitution dictionary are 2-tuples (from, to). The keys of the other two dictionaries are the inserted or deleted words or characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output | Union[WordOutput, CharacterOutput] | The processed output of reference and hypothesis pair(s). | required |
Returns:
Type | Description |
---|---|
Tuple[dict, dict, dict] | A three-tuple of dictionaries, in the order substitutions, insertions, deletions. |
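To make the shape of the three returned dictionaries concrete, here is a minimal standard-library sketch that builds them from a hand-written alignment. The `aligned` list and its operation names are hypothetical and chosen for illustration only; this is not how jiwer computes the counts internally.

```python
from collections import Counter

# Hypothetical, hand-written alignment of one reference/hypothesis pair.
# Each entry is (operation, reference_token, hypothesis_token).
aligned = [
    ("insert", None, "shoe"),
    ("substitute", "short", "order"),
    ("equal", "one", "one"),
    ("delete", "here", None),
]

substitutions, insertions, deletions = Counter(), Counter(), Counter()
for op, ref_tok, hyp_tok in aligned:
    if op == "substitute":
        # Substitutions are keyed by a (from, to) 2-tuple.
        substitutions[(ref_tok, hyp_tok)] += 1
    elif op == "insert":
        # Insertions are keyed by the token that appeared in the hypothesis.
        insertions[hyp_tok] += 1
    elif op == "delete":
        # Deletions are keyed by the token missing from the hypothesis.
        deletions[ref_tok] += 1

print(dict(substitutions))
print(dict(insertions))
print(dict(deletions))
```

With real data, the three dictionaries would come from calling `collect_error_counts` on the output of `jiwer.process_words` or `jiwer.process_characters` instead.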
Source code in src/jiwer/alignment.py
visualize_alignment #
visualize_alignment(
    output,
    show_measures=True,
    skip_correct=True,
    line_width=None,
)
Visualize the output of jiwer.process_words and jiwer.process_characters. The visualization shows the alignment between each processed reference and hypothesis pair. If show_measures=True, the output string will also contain all measures in the output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output | Union[WordOutput, CharacterOutput] | The processed output of reference and hypothesis pair(s). | required |
show_measures | bool | If enabled, the visualization will include measures such as the WER or CER. | True |
skip_correct | bool | If enabled, the visualization will exclude correct reference and hypothesis pairs. | True |
line_width | Optional[int] | If set, try, on a best-effort basis, to split sentences into multiple lines if they exceed the width. | None |
Returns:
Type | Description |
---|---|
str | The visualization as a string. |
Example
This code snippet
import jiwer

out = jiwer.process_words(
    ["short one here", "quite a bit of longer sentence"],
    ["shoe order one", "quite bit of an even longest sentence here"],
)
print(jiwer.visualize_alignment(out))
will produce this visualization:
=== SENTENCE 1 ===
REF:    # short one here
HYP: shoe order one    *
        I     S        D
=== SENTENCE 2 ===
REF: quite a bit of  #    #  longer sentence    #
HYP: quite * bit of an even longest sentence here
           D         I    I       S             I
=== SUMMARY ===
number of sentences: 2
substitutions=2 deletions=2 insertions=4 hits=5
mer=61.54%
wil=74.75%
wip=25.25%
wer=88.89%
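The percentages in the summary can be checked by hand from the four counts it reports. The sketch below assumes the usual definitions of WER, MER, WIP, and WIL in terms of hits, substitutions, deletions, and insertions; it recomputes the numbers for this example and does not call jiwer.

```python
# Counts taken from the summary above.
S, D, I, H = 2, 2, 4, 5

n_ref = H + S + D  # number of reference words (9)
n_hyp = H + S + I  # number of hypothesis words (11)

wer = (S + D + I) / n_ref            # word error rate
mer = (S + D + I) / (H + S + D + I)  # match error rate
wip = (H / n_ref) * (H / n_hyp)      # word information preserved
wil = 1 - wip                        # word information lost

for name, value in [("mer", mer), ("wil", wil), ("wip", wip), ("wer", wer)]:
    print(f"{name}={value * 100:.2f}%")
```

Because insertions count as errors but do not add reference words, the WER can exceed the MER, as it does here.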
When show_measures=False, only the alignment will be printed:
=== SENTENCE 1 ===
REF:    # short one here
HYP: shoe order one    *
        I     S        D
=== SENTENCE 2 ===
REF: quite a bit of  #    #  longer sentence    #
HYP: quite * bit of an even longest sentence here
           D         I    I       S             I
When setting line_width=80, the following output will be split into multiple lines:
=== SENTENCE 1 ===
REF: This is a very  long sentence that is *** much longer than the previous one
HYP: This is a very loong sentence that is not much longer than the previous one
                        S                    I
REF: or the one before that
HYP: or *** one before that
          D
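The column layout in these visualizations can be read as follows: each aligned pair of tokens is padded to the width of the longer token, and the operation letter is placed under that column. The function below is a minimal sketch of this layout idea only. `render_alignment` is a hypothetical helper, not part of jiwer, and it ignores `line_width`, `skip_correct`, and the summary.

```python
def render_alignment(ref_tokens, hyp_tokens, ops):
    """Render three aligned lines (REF, HYP, operations).

    The three lists must have equal length; gaps are passed in explicitly
    as placeholder tokens, and `ops` holds "" for a correct pair.
    """
    ref_line, hyp_line, op_line = "REF:", "HYP:", "    "
    for ref_tok, hyp_tok, op in zip(ref_tokens, hyp_tokens, ops):
        # Pad every column to the longer of the two tokens, right-justified.
        width = max(len(ref_tok), len(hyp_tok))
        ref_line += " " + ref_tok.rjust(width)
        hyp_line += " " + hyp_tok.rjust(width)
        op_line += " " + op.rjust(width)
    return "\n".join([ref_line, hyp_line, op_line.rstrip()])


rendered = render_alignment(
    ["#", "short", "one", "here"],
    ["shoe", "order", "one", "*"],
    ["I", "S", "", "D"],
)
print(rendered)
```

Right-justifying each column keeps the operation letter under the last character of the longer token, which is why the letters stay readable when token lengths differ widely.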
Source code in src/jiwer/alignment.py
visualize_error_counts #
visualize_error_counts(
    output,
    show_substitutions=True,
    show_insertions=True,
    show_deletions=True,
    top_k=None,
)
Visualize which words (or characters), and how often, were substituted, inserted, or deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output | Union[WordOutput, CharacterOutput] | The processed output of reference and hypothesis pair(s). | required |
show_substitutions | bool | If true, visualize substitution errors. | True |
show_insertions | bool | If true, visualize insertion errors. | True |
show_deletions | bool | If true, visualize deletion errors. | True |
top_k | Optional[int] | If set, only visualize the k most frequent errors. | None |
Returns:
Type | Description |
---|---|
str | A string which visualizes the words/characters and their frequencies. |
Example
The code snippet
import jiwer

out = jiwer.process_words(
    ["short one here", "quite a bit of longer sentence"],
    ["shoe order one", "quite bit of an even longest sentence here"],
)
print(jiwer.visualize_error_counts(out))
will print the following:
=== SUBSTITUTIONS ===
short --> order = 1x
longer --> longest = 1x
=== INSERTIONS ===
shoe = 1x
an even = 1x
here = 1x
=== DELETIONS ===
here = 1x
a = 1x
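Every error in this small example occurs once, so `top_k` would have little visible effect on it. The sketch below uses hypothetical, made-up deletion counts to illustrate what keeping only the k most frequent errors means. It relies on `collections.Counter` from the standard library and is not jiwer's implementation.

```python
from collections import Counter

# Hypothetical deletion counts, invented for illustration.
deletions = Counter({"here": 1, "a": 3, "the": 2})

top_k = 2
# most_common(k) returns the k entries with the highest counts, in order.
most_frequent = deletions.most_common(top_k)

for word, count in most_frequent:
    print(f"{word} = {count}x")
```

With `top_k = 2`, only "a" and "the" are listed, and "here" is dropped because it is the least frequent deletion.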
Source code in src/jiwer/alignment.py