The recognizer fbdtw uses a fairly standard implementation of the Dynamic Time Warp (DTW) algorithm. The user defines the vocabulary by specifying single words or a complete list of words. For each word, the user is prompted for a given number of reference utterances (default is 3). Each recorded utterance is displayed for inspection, so that the user can reject utterances with obvious faults, for example when the automatic word boundary detection has failed.
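The DTW matching mentioned above can be illustrated with a minimal sketch. It is not the fbdtw implementation: the step pattern (three symmetric predecessors), the lack of path-length normalization, and the rule of scoring a word by its best-matching reference are all assumptions made for illustration. The function names dtw_distance and recognize are hypothetical.

```python
import math


def dtw_distance(ref, test, city_block=False):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumption: simple symmetric step pattern, no slope constraints,
    no normalization by path length.
    """
    def local(a, b):
        # Local frame distance: city block (L1) or Euclidean (L2).
        if city_block:
            return sum(abs(x - y) for x, y in zip(a, b))
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # ref frame advances
                                 cost[i][j - 1],      # test frame advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(references, test):
    """references maps each label to a list of reference feature sequences.

    Assumption: a word's score is the smallest distance over its references.
    Returns the best label and the score of every label.
    """
    scores = {
        label: min(dtw_distance(ref, test) for ref in refs)
        for label, refs in references.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

With several references per word, the returned scores dictionary corresponds to the kind of per-word comparison a recognizer can display next to the winning label.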
Each utterance is stored in its own file. The file name is built from the label and a counter, as shown in the following example for the file physik#5.wav:
physik | # | 5 | .wav |
label | separator | count | extension |
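The naming scheme above can be sketched as a pair of helper functions. The names build_name and parse_name are hypothetical, and the handling of labels that themselves contain the separator (the last separator wins) is an assumption, not documented fbdtw behaviour.

```python
def build_name(label, count, sep="#", ext=".wav"):
    """Compose a file name such as physik#5.wav from label and counter."""
    return f"{label}{sep}{count}{ext}"


def parse_name(filename, sep="#", ext=".wav"):
    """Split a file name into (label, count); raise ValueError if it does not fit."""
    if not filename.endswith(ext):
        raise ValueError(f"unexpected extension: {filename}")
    stem = filename[: -len(ext)]
    # Assumption: the counter follows the last separator in the name.
    label, found, count = stem.rpartition(sep)
    if not found or not label or not count.isdigit():
        raise ValueError(f"not of the form label{sep}count{ext}: {filename}")
    return label, int(count)
```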
Multiple users are supported. For each user a separate directory is created. All reference utterances from one user are then stored in the corresponding directory. All user directories are placed in a common directory (default name userdata).
We use the term users, but the concept can equally be used to handle different sets of words from the same speaker. During recognition any number of users can be selected at the same time. For training a new word, however, exactly one user has to be active. Likewise, a number of operations such as deleting words require that exactly one user is active. In this way ambiguities are avoided, and it is possible to use the same word concurrently in different subsets.
No further information besides the names of the directories and the files is needed in fbdtw. Therefore, in addition to the commands in fbdtw, standard techniques can be applied to maintain the data. As an example, copying a user is achieved by simply copying the user directory (and restarting fbdtw). It is also possible to edit the wav files and then calculate new feature files.
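Because the directory and file names carry all the information, the vocabulary can in principle be rebuilt by scanning the file system. The following is a minimal sketch of that idea, assuming the layout described above (one subdirectory per user below a common directory, files named label#count.wav). The functions list_users and load_vocabulary are hypothetical and do not reflect how fbdtw reads its data internally.

```python
from collections import defaultdict
from pathlib import Path


def list_users(root):
    """Every subdirectory of the common directory is treated as one user."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def load_vocabulary(root, users):
    """Map each label to the reference files found for the selected users."""
    vocab = defaultdict(list)
    for user in users:
        for wav in sorted((Path(root) / user).glob("*#*.wav")):
            label, _, count = wav.stem.rpartition("#")
            # Skip files that do not follow the label#count.wav scheme.
            if label and count.isdigit():
                vocab[label].append(wav)
    return dict(vocab)
```

Selecting several users simply merges their references under the same label, which matches the idea that the same word may appear in different subsets.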
In this example four users are available. Currently only one speaker (steve) is loaded. The vocabulary consists of German city names. The last test utterance is shown in the window below the control window. The recognition yielded the word Darmstadt. For comparison all scores are shown in an additional window.
By default all test utterances are stored in a directory autosave with subdirectories for individual users. If only one user is activated, the utterances are stored in the corresponding directory. A special directory unknown is provided for the case of multiple active users. In both cases the utterances are named in#n.wav, where n denotes the counter.

Path: userdata # location of references
NRef: 3 # number of references per word
TraceFrame: 0 # if > 0, use trace segmentation with specified number of segments
Feat: true # cepstral coefficients
DFeat: false # delta cepstral
SubMean: true # subtract global mean
DeltaOrd: 2 # order of delta cepstral coefficients
UseFilter: true # highpass filter
CityBlock: false # city block distance (default is Euclidean)
WBDlogFile: false # logfile for word boundary detector
AutoSave: true # automatically save all utterances
SampFreq: 16000 # sampling frequency
# the next options define what windows are displayed and their location
# options with a * are written automatically
ShowWave: true
ShowResult: true
*UttPlot: -4 312 506 260
*WDemo: 1 0 500 298
*Result: 501 311 328 210
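For scripts that need to read these options outside of fbdtw, the "Key: value # comment" layout can be parsed with a few lines of code. This is a sketch under assumptions: the function name parse_options is hypothetical, a leading * is simply stripped from the key, and everything after the first # on a line is treated as a comment (so values containing # are not supported). How fbdtw itself parses the file is not documented here.

```python
def _convert(value):
    """Turn an option value into bool, int, list of ints, or leave it a string."""
    if value in ("true", "false"):
        return value == "true"
    parts = value.split()
    try:
        numbers = [int(p) for p in parts]
    except ValueError:
        return value
    if not numbers:
        return value
    return numbers[0] if len(numbers) == 1 else numbers


def parse_options(text):
    """Parse 'Key: value # comment' lines into a dictionary."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment part
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not an option line
        # Options marked with * are written automatically; the mark is dropped.
        options[key.strip().lstrip("*")] = _convert(value.strip())
    return options
```

Window entries such as *UttPlot come back as lists of four integers, while switches such as Feat become booleans.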