Installation and use

The recognizer fbdtw uses a fairly standard implementation of the Dynamic Time Warping (DTW) algorithm. The user defines the vocabulary by specifying single words or a complete list of words. For each word, the user is prompted for a given number of reference utterances (default: 3). Each recorded utterance is displayed for inspection, so that the user can reject utterances with obvious faults, for example when the automatic word boundary detection has failed.
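The following is a minimal sketch of textbook DTW matching, not the code used in fbdtw. The local distance (Euclidean), the allowed path steps, the absence of length normalization, and the rule of taking the best reference per word are all assumptions; the function names dtw_distance and recognize are hypothetical.

```python
import math


def dtw_distance(ref, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions: Euclidean local distance, steps (i-1, j), (i, j-1) and
    (i-1, j-1), no slope constraints and no length normalization.
    """
    n, m = len(ref), len(test)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference frame advances
                cost[i][j - 1],      # test frame advances
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(references, test):
    """Return (best_label, scores) for a test utterance.

    references maps each word label to a list of reference feature
    sequences; the score of a word is taken as the lowest DTW cost over
    its references (an assumption for this sketch).
    """
    scores = {
        label: min(dtw_distance(ref, test) for ref in refs)
        for label, refs in references.items()
    }
    return min(scores, key=scores.get), scores
```

With several references per word, as in the default of three, a single poor recording has less influence on the result under this scoring rule.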

Each utterance is stored in its own file. The file name is built from the label and a counter, as shown in the following example:


physik   #           5       .wav
label    separator   count   extension

In this case the German word Physik was spoken, and this is the fifth utterance of that word. Both the waveform and the derived feature vectors are stored.
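The naming scheme can be sketched as a pair of helper functions. This is only an illustration: it assumes that the parts are joined without spaces (physik#5.wav), and the names build_name and parse_name are hypothetical, not part of fbdtw.

```python
import re

SEPARATOR = "#"

# label, separator, counter, extension; the label itself may contain '#'
_NAME = re.compile(r"^(?P<label>.+)#(?P<count>\d+)\.wav$")


def build_name(label, count):
    """Build a file name such as 'physik#5.wav'."""
    return f"{label}{SEPARATOR}{count}.wav"


def parse_name(filename):
    """Split a file name into (label, count); return None if it does not match."""
    match = _NAME.match(filename)
    if match is None:
        return None
    return match.group("label"), int(match.group("count"))
```

For example, parse_name("physik#5.wav") yields the label physik and the counter 5.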

Multiple users are supported. For each user a separate directory is created. All reference utterances from one user are then stored in the corresponding directory. All user directories are placed in a common directory (default name userdata).

We use the term users, but the same concept can be used to handle different sets of words from the same speaker. During recognition, any number of users can be selected at the same time. For training a new word, however, exactly one user has to be active. Likewise, a number of operations, such as deleting words, require that exactly one user is active. In this way ambiguities are avoided, and it is possible to use the same word concurrently in different subsets.

fbdtw needs no further information besides the names of the directories and files. Therefore, in addition to the commands in fbdtw, standard file-system techniques can be applied to maintain the data. For example, a user is copied by simply copying the user directory (and restarting fbdtw). It is also possible to edit the wav files and then calculate new feature files.
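Because a user is just a directory, copying one can be scripted with standard tools. The sketch below uses Python's shutil; the function name copy_user and the user names in the usage note are hypothetical, and userdata is the default directory name mentioned above.

```python
import shutil
from pathlib import Path


def copy_user(userdata, source_user, new_user):
    """Copy one user directory, with all its files, under a new name.

    shutil.copytree raises FileExistsError if the target already exists,
    so an existing user is never overwritten.
    """
    src = Path(userdata) / source_user
    dst = Path(userdata) / new_user
    shutil.copytree(src, dst)
    return dst
```

A call such as copy_user("userdata", "steve", "steve_backup") would duplicate the directory; fbdtw then has to be restarted to pick up the new user.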


[Screenshot: FBDTW]

In this example, four users are available. Currently, only one speaker (steve) is loaded. The vocabulary consists of German city names. The last test utterance is shown in the window below the control window. The recognition yielded the word Darmstadt. For comparison, all scores are shown in an additional window.

By default, all test utterances are stored in a directory autosave with one subdirectory per user. If only one user is active, the utterances are stored in that user's subdirectory. A special directory unknown is used when multiple users are active. In both cases the utterances are named in#n.wav, where n is the counter.
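The rule for choosing the storage location can be written down as a short sketch. The function name autosave_path is hypothetical, and raising an error when no user is active is an assumption, since that case is not described here.

```python
from pathlib import Path


def autosave_path(autosave_root, active_users, counter):
    """Return the path under which a test utterance would be stored."""
    if not active_users:
        # Assumption: the behavior without any active user is not documented.
        raise ValueError("at least one active user is required")
    # One active user: that user's subdirectory; several: 'unknown'.
    subdir = active_users[0] if len(active_users) == 1 else "unknown"
    return Path(autosave_root) / subdir / f"in#{counter}.wav"
```

With steve as the only active user and a counter of 7, the result is autosave/steve/in#7.wav; with two or more active users the file goes to autosave/unknown instead.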

Menu options


Download

jar-archive
properties file (rename to fbdtw.ini)