Completely free toolkits I have written for you!
Well, except SRILM and the SGI STL, which aren't mine and have "terms".
- A mini Hidden Markov Model toolkit. Covers most
basic algorithms, including Forward-Backward and Baum-Welch, along with an exercise sheet on
using HMMs to tag parts of speech. Also included are programs to convert between alternate
formalisms of HMMs and run basic finite automaton algorithms (the subset construction for
determinization, DFA state minimization, etc.).
- An Artificial Neural Net toolkit. Covers
backpropagation training, and includes an exercise sheet with a toolkit that lets you recognize
hand-drawn digits (if you have a pen and tablet).
- A Perl implementation of my maximum likelihood
vocabulary selection technique. The problem is this: given a large number of text corpora,
each with its own characteristic vocabulary, how can we choose a subset of the union of these
vocabularies that best matches a completely separate corpus? This program is also available
as part of the SRI language modeling toolkit. Read the companion paper.
- A C++/STL implementation of my word segmentation
algorithm. The problem is this: how do children learn individual words in a stream of sounds
where word boundaries are not clearly demarcated? This algorithm shows that this problem
probably has a statistical solution! Read the
companion paper in Computational
Linguistics. This program was updated (Feb 2009) to compile under GCC 4.1.2 at the
request of the community.
- The SGI STL reference! It saw me through some tough
times before STL documentation was more accessible, and it still comes in handy.
- The SRI language modeling
toolkit.
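
For a taste of the kind of algorithm the HMM toolkit covers, here is a minimal Python sketch of the forward pass (the first half of Forward-Backward). It is not code from the toolkit: the function name `forward`, the two-tag model, and all of the probabilities are made up for illustration, and the sketch assumes a non-empty observation sequence.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under the HMM, summed over all hidden state paths."""
    # alpha[s] = probability of the observations seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Extend every path by one step: move from r to s, then emit o from s.
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-tag model (N = noun, V = verb); the numbers are invented.
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.5, "swim": 0.5}, "V": {"fish": 0.4, "swim": 0.6}}

likelihood = forward(["fish", "swim"], states, start_p, trans_p, emit_p)
```

Because each step only needs the previous `alpha`, the cost grows linearly with the length of the sentence instead of exponentially with the number of possible tag sequences.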
