FBDTW

The recognizer fbdtw uses a fairly standard implementation of the Dynamic Time Warp (DTW) algorithm. The user defines the vocabulary by specifying single words or a complete list of words. For each word, the user is prompted for a given number of reference utterances (default is 3). Each recorded utterance is displayed for inspection, so the user can reject utterances with obvious faults, for example when the automatic word boundary detection has failed.
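The matching step can be illustrated with a minimal textbook DTW sketch. This is not the fbdtw code: the symmetric step pattern, the absence of path constraints and normalization, and the names dtw_distance and recognize are assumptions made for illustration. The two local distances correspond to the Euclidean and city block choices mentioned under Options.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dtw_distance(ref, test, local=euclidean):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumed step pattern: horizontal, vertical and diagonal moves,
    all with weight 1 (a common textbook choice).
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local(ref[i - 1], test[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

def recognize(references, test, local=euclidean):
    """Return (label, score) of the reference closest to the test utterance.

    references maps each label to a list of reference feature sequences.
    """
    score, label = min(
        (dtw_distance(ref, test, local), label)
        for label, refs in references.items()
        for ref in refs
    )
    return label, score
```

With several references per word, this sketch simply takes the single best-scoring reference; other decision rules (for example averaging over the references of a word) are equally possible.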

Each utterance is stored in its own file. The file name is built from the label, a separator and a counter, as shown in the following example:
physik   #           5       .wav
label    separator   count   extension
In this case the German word Physik was spoken. This is the fifth utterance of the word. Both the waveform and the derived feature vectors are stored.
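The naming scheme can be handled with a few lines of code. The following is a sketch only: it assumes that the spaces in the example above are for illustration and that the actual file name is physik#5.wav. The function names are hypothetical and not part of fbdtw.

```python
import re
from typing import NamedTuple

class UtteranceName(NamedTuple):
    label: str
    count: int
    extension: str

# label, then '#', then the counter, then the extension (assumed layout)
_NAME_PATTERN = re.compile(r"^(?P<label>.+)#(?P<count>\d+)(?P<ext>\.[^.]+)$")

def parse_utterance_name(filename):
    """Split a name such as 'physik#5.wav' into label, count and extension."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an utterance file name: {filename!r}")
    return UtteranceName(match["label"], int(match["count"]), match["ext"])

def build_utterance_name(label, count, extension=".wav"):
    """Inverse of parse_utterance_name."""
    return f"{label}#{count}{extension}"
```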

Multiple users are supported. For each user a separate directory is created. All reference utterances from one user are then stored in the corresponding directory. All user directories are placed in a common directory (default name userdata).

We use the term users, but the concept can equally be used to handle different sets of words from the same speaker. During recognition any number of users can be selected at the same time. For training a new word, however, exactly one user has to be active. Likewise, a number of operations, such as deleting words, require that exactly one user is active. In this way ambiguities are avoided, and it is possible to use the same word concurrently in different subsets.
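The directory scheme and the single-user rule can be sketched as follows. The layout root/user/label#n.wav is taken from the description above; the function names are hypothetical, and the sketch only looks at the wav files because the naming of the feature files is not specified here.

```python
from collections import defaultdict
from pathlib import Path

def collect_references(root, active_users):
    """Map each label to the reference wav files of all active users."""
    references = defaultdict(list)
    for user in active_users:
        user_dir = Path(root) / user
        for wav in sorted(user_dir.glob("*#*.wav")):
            # 'physik#5' -> 'physik'
            label = wav.stem.rsplit("#", 1)[0]
            references[label].append(wav)
    return dict(references)

def require_single_user(active_users):
    """Return the active user, or fail if not exactly one user is active."""
    if len(active_users) != 1:
        raise ValueError("this operation needs exactly one active user")
    return active_users[0]
```

Because the same label may occur in several user directories, the references of all active users are simply pooled per label during recognition, while training and deleting go through require_single_user so that the target directory is unambiguous.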

fbdtw needs no information beyond the names of the directories and files. Therefore, in addition to the commands in fbdtw, standard file tools can be used to maintain the data. For example, a user can be copied by simply copying the user directory (and restarting fbdtw). It is also possible to edit the wav files and then calculate new feature files.

In this example four users are available. Currently only one speaker (steve) is loaded. The vocabulary consists of German city names. The last test utterance is shown in the window below the control window. The recognition yielded the word Darmstadt. For comparison all scores are shown in an additional window.

By default all test utterances are stored in a directory autosave with subdirectories for individual users. If only one user is active, the utterances are stored in the corresponding subdirectory. A special directory unknown is provided for the case of multiple active users. In both cases the utterances are named in#n.wav, where n is the counter.
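The rule for choosing the storage location can be sketched as below. The function name is hypothetical, the counter is passed in by the caller because its starting value is not stated here, and the case of no active user is treated as an error purely as an assumption.

```python
from pathlib import Path

def autosave_path(active_users, counter, autosave_root="autosave"):
    """Return the file path for the test utterance with the given counter."""
    if not active_users:
        # Assumption: the description does not cover this case.
        raise ValueError("at least one user must be active")
    # One active user: that user's subdirectory; several: 'unknown'.
    subdir = active_users[0] if len(active_users) == 1 else "unknown"
    return Path(autosave_root) / subdir / f"in#{counter}.wav"
```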

Menu options

Options

In the file fbdtw.ini you can set a number of properties. The settings can also be changed through the system menu.
Path:  userdata      # location of references
NRef:  3             # number of references per word
TraceFrame:  0       # if > 0, use trace segmentation with specified number of segments
Feat:  true          # cepstral coefficients
DFeat:  false        # delta cepstral
SubMean:  true       # subtract global mean
DeltaOrd:  2         # order of delta cepstral coefficients
UseFilter:  true     # highpass filter
CityBlock:  false    # city block distance (default is Euclidean)
WBDlogFile:  false   # logfile for word boundary detector
AutoSave:  true      # automatically save all utterances
SampFreq:  16000     # sampling frequency

# the next options define what windows are displayed and their location
# options with a * are written automatically
ShowWave:  true       
ShowResult:  true
*UttPlot:  -4 312 506 260
*WDemo:  1 0 500 298
*Result:  501 311 328 210
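For maintaining the settings with external scripts, a file in this format can be read with a small parser. The sketch below is based only on the listing above: it assumes that everything after a # is a comment, that a leading * merely marks automatically written options, and that values are either true/false, integers, lists of integers or plain strings. The function names are hypothetical; fbdtw itself may parse the file differently.

```python
def _convert(value):
    """Turn a raw value string into bool, int, list of ints or str."""
    if value in ("true", "false"):
        return value == "true"
    parts = value.split()
    if not parts:
        return ""
    try:
        numbers = [int(part) for part in parts]
    except ValueError:
        return value  # not numeric, keep as string (e.g. a path)
    return numbers[0] if len(numbers) == 1 else numbers

def parse_ini(text):
    """Parse 'Key:  value  # comment' lines into a dictionary."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        key, separator, value = line.partition(":")
        if not separator:
            continue  # no 'Key:' prefix, ignore the line
        settings[key.strip().lstrip("*")] = _convert(value.strip())
    return settings
```

A consequence of the comment rule assumed here is that a path containing a # character would be cut off; a real parser would need a more careful treatment of such values.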

Download

jar-archive
file with properties (Rename to fbdtw.ini)