Installation and use
The recognizer fbdtw uses a fairly standard implementation of the Dynamic Time Warping (DTW) algorithm. The user defines the vocabulary by specifying single words or a complete list of words. For each word, the user is prompted for a given number of reference utterances (default: 3). Each recorded utterance is displayed for inspection, so that the user can reject utterances with obvious faults, for example when the automatic word boundary detection has failed.
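The text does not say which local distance or path constraints fbdtw uses. The following Java sketch shows the textbook DTW recurrence under stated assumptions: feature vectors are `double[]` frames, the local distance is Euclidean, the three usual steps (match, stretch test, stretch reference) are weighted equally, and the result is normalized by the sum of both lengths. The class name `Dtw` is hypothetical; this is an illustration of standard DTW, not fbdtw's own code.

```java
import java.util.Arrays;

/**
 * Textbook DTW sketch (not fbdtw's own code). Assumptions: Euclidean local
 * distance, equally weighted diagonal/horizontal/vertical steps, and
 * normalization by the sum of both sequence lengths.
 */
final class Dtw {

    /** Euclidean distance between two feature vectors of equal dimension. */
    static double frameDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Normalized cost of the best warping path; both sequences must be non-empty. */
    static double distance(double[][] test, double[][] ref) {
        int n = test.length, m = ref.length;
        double[][] acc = new double[n + 1][m + 1];
        for (double[] row : acc) {
            Arrays.fill(row, Double.POSITIVE_INFINITY); // unreachable by default
        }
        acc[0][0] = 0.0; // both sequences start together
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = frameDistance(test[i - 1], ref[j - 1]);
                double best = Math.min(acc[i - 1][j - 1],         // match
                        Math.min(acc[i - 1][j], acc[i][j - 1]));  // stretch either sequence
                acc[i][j] = cost + best;
            }
        }
        return acc[n][m] / (n + m); // length normalization (assumed)
    }

    public static void main(String[] args) {
        double[][] ref = {{0.0}, {1.0}, {2.0}};
        double[][] slow = {{0.0}, {0.0}, {1.0}, {1.0}, {2.0}, {2.0}};
        System.out.println(distance(slow, ref)); // 0.0: same shape, only spoken slower
    }
}
```

The example shows the point of the warping: a test utterance that is simply spoken more slowly than the reference still reaches a distance of zero.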
Each utterance is stored in its own file. The file name is built from the label and a counter, as in the following example:
physik#5.wav = physik (label) + # (separator) + 5 (count) + .wav (extension)
In this case the German word Physik ("physics") was spoken, and this is the fifth utterance of that word. Both the waveform and the derived feature vectors are stored.
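The naming scheme label#count.wav can be composed and parsed as in the following Java sketch. The class `RefName` and its methods are hypothetical. It assumes that the label is everything before the last `#`, that the count is a plain decimal number, and that a new recording gets the highest existing count plus one; the text does not say whether fbdtw starts counting at 0 or 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Builds and parses reference file names of the form label#count.wav (sketch). */
final class RefName {
    // Greedy label up to the last '#', then decimal digits, then ".wav".
    private static final Pattern NAME = Pattern.compile("(.+)#(\\d+)\\.wav");

    final String label;
    final int count;

    RefName(String label, int count) {
        this.label = label;
        this.count = count;
    }

    /** Composes the file name, e.g. physik#5.wav. */
    String fileName() {
        return label + "#" + count + ".wav";
    }

    /** Parses a file name; returns null if it does not follow the scheme. */
    static RefName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new RefName(m.group(1), Integer.parseInt(m.group(2)));
    }

    /** Next free counter for a label: highest existing count plus one (assumed rule). */
    static int nextCount(Iterable<String> fileNames, String label) {
        int max = 0;
        for (String f : fileNames) {
            RefName r = parse(f);
            if (r != null && r.label.equals(label)) {
                max = Math.max(max, r.count);
            }
        }
        return max + 1;
    }

    public static void main(String[] args) {
        RefName r = parse("physik#5.wav");
        System.out.println(r.label + " / " + r.count);            // physik / 5
        System.out.println(new RefName("physik", 6).fileName());  // physik#6.wav
    }
}
```

Because the label and counter are encoded in the name alone, the vocabulary of a user can be reconstructed just by listing the directory.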
Multiple users are supported. A separate directory is created for each user, and all reference utterances of that user are stored in it. All user directories reside in a common directory (default name: userdata).
We use the term user, but the concept can equally be used to handle different sets of words from the same speaker. During recognition, any number of users can be selected at the same time. Training a new word, however, requires exactly one active user, and so do several other operations such as deleting words. This avoids ambiguities and makes it possible to use the same word concurrently in different subsets.
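The directory layout and the selection rule can be sketched in Java as follows. The class `Users` and its methods are hypothetical; the sketch assumes that every subdirectory of the common data directory is a user and that references are the .wav files inside it. Keeping the references grouped by user is what lets the same word exist in several subsets without conflict.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch of the per-user directory layout below a common data directory. */
final class Users {

    /** User names are the names of the subdirectories of the data directory. */
    static List<String> listUsers(Path dataDir) throws IOException {
        try (Stream<Path> s = Files.list(dataDir)) {
            return s.filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Training or deleting words needs exactly one active user. */
    static String requireSingleUser(Collection<String> active) {
        if (active.size() != 1) {
            throw new IllegalStateException("exactly one user must be active, found " + active.size());
        }
        return active.iterator().next();
    }

    /** Reference .wav files of all active users, grouped by user name. */
    static Map<String, List<Path>> references(Path dataDir, Collection<String> active) throws IOException {
        Map<String, List<Path>> refs = new TreeMap<>();
        for (String user : active) {
            try (Stream<Path> s = Files.list(dataDir.resolve(user))) {
                refs.put(user, s.filter(p -> p.getFileName().toString().endsWith(".wav"))
                        .sorted()
                        .collect(Collectors.toList()));
            }
        }
        return refs;
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "userdata");
        if (!Files.isDirectory(dataDir)) {
            System.out.println("no data directory " + dataDir);
            return;
        }
        List<String> users = listUsers(dataDir);
        System.out.println("users: " + users);
        System.out.println(references(dataDir, users));
    }
}
```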
fbdtw needs no information beyond the names of the directories and files. Therefore, in addition to the commands in fbdtw, standard file-management tools can be used to maintain the data. For example, a user is copied simply by copying the user directory (and restarting fbdtw). It is also possible to edit the wav files and then recalculate the feature files.
In this example, four users are available, but only one speaker (steve) is currently loaded. The vocabulary consists of German city names. The last test utterance is shown in the window below the control window. The recognition yielded the word Darmstadt. For comparison, all scores are shown in an additional window.
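The text does not state how the DTW distances to the individual references are combined into the scores shown in the score window. A common choice, assumed here, is nearest-neighbour classification: each word's score is the smallest distance to any of its references, and the word with the lowest score is the result. The class `Decision` is hypothetical, and the distances in `main` are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the decision step: nearest reference wins (assumed rule). */
final class Decision {

    /** One score per word: the minimum distance over that word's references. */
    static Map<String, Double> wordScores(Map<String, List<Double>> refDistances) {
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : refDistances.entrySet()) {
            double best = Double.POSITIVE_INFINITY;
            for (double d : e.getValue()) {
                best = Math.min(best, d);
            }
            scores.put(e.getKey(), best);
        }
        return scores;
    }

    /** Word with the lowest score, or null if no word has a finite score. */
    static String recognize(Map<String, Double> scores) {
        String bestWord = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() < bestScore) {
                bestScore = e.getValue();
                bestWord = e.getKey();
            }
        }
        return bestWord;
    }

    public static void main(String[] args) {
        // Made-up distances for illustration only.
        Map<String, List<Double>> d = new TreeMap<>();
        d.put("darmstadt", List.of(12.0, 9.5, 11.0));
        d.put("frankfurt", List.of(14.0, 13.5, 15.0));
        Map<String, Double> scores = wordScores(d);
        System.out.println(scores);            // {darmstadt=9.5, frankfurt=13.5}
        System.out.println(recognize(scores)); // darmstadt
    }
}
```

Using the minimum rather than the average makes a single good reference sufficient, which tolerates one poorly recorded reference per word; other combination rules are equally plausible.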
By default, all test utterances are stored in a directory autosave, which has a subdirectory for each user. If only one user is active, the utterances are stored in that user's subdirectory; if several users are active, they are stored in a special subdirectory unknown. In both cases the files are named in#n.wav, where n is a counter.
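The choice of the autosave location follows directly from the rule above. In this Java sketch, the class `AutoSave` is hypothetical, and how fbdtw chooses the counter n is not specified, so it is simply passed in. A selection that does not contain exactly one user is treated like the multi-user case.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;

/** Sketch: where a test utterance is auto-saved. */
final class AutoSave {
    static final String UNKNOWN = "unknown"; // shared subdirectory for several active users

    static Path target(Path autosaveDir, Collection<String> activeUsers, int n) {
        // Exactly one active user: that user's subdirectory; otherwise "unknown".
        String sub = activeUsers.size() == 1 ? activeUsers.iterator().next() : UNKNOWN;
        return autosaveDir.resolve(sub).resolve("in#" + n + ".wav");
    }

    public static void main(String[] args) {
        Path dir = Paths.get("autosave");
        System.out.println(target(dir, List.of("steve"), 7));
        System.out.println(target(dir, List.of("steve", "anna"), 8));
    }
}
```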
Menu options
System
- OnOff+: Same as the recognize button.
- Verbose+: More information.
- Verbose-: Less information.
- SingleWord: Toggles between single-word mode (i.e. wait after each recognition) and continuous mode.
User
- New: Creates a new user.
- Rename: Renames an existing user.
- Info: Shows information on the currently selected users.
- Remove: Removes one user (the user must be empty).
Vocab
- New: Trains one new word.
- NewFromList: Trains all words in a list, i.e. a text file with one word per line.
- Rebuild: Calculates the feature files from the audio files.
- Remove: Deletes the selected words.
- Clear: Deletes all words.
- StoreLast: Stores the last utterance as a reference. The recognition result is used as its label, so this assumes that the recognition was correct.
- PlayLast: Plays the last utterance.
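A word list for NewFromList is a text file with one word per line. The Java sketch below reads such a file and reports how many recordings training would prompt for. The class `WordList` is hypothetical; trimming whitespace, skipping blank lines, dropping duplicates, and reading the file as UTF-8 (German city names may contain umlauts) are assumptions, not documented fbdtw behaviour. The factor 3 is the default number of references per word given in the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: reading a NewFromList word list (one word per line). */
final class WordList {
    static final int DEFAULT_REFERENCES = 3; // default references per word, per the text

    /** Trims lines, drops empty lines and duplicates, keeps the original order. */
    static List<String> words(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            String w = line.trim();
            if (!w.isEmpty()) {
                seen.add(w);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Reads a word list file, assumed to be UTF-8 encoded. */
    static List<String> read(Path file) throws IOException {
        return words(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<String> words = args.length > 0
                ? read(Paths.get(args[0]))
                : words(List.of("Darmstadt", "", "Frankfurt", "Darmstadt"));
        // Without arguments prints: [Darmstadt, Frankfurt] -> 6 recordings to prompt
        System.out.println(words + " -> " + words.size() * DEFAULT_REFERENCES + " recordings to prompt");
    }
}
```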
Download
jar-archive
Properties file (rename to fbdtw.ini)