SVM: Using a set of training images (or sounds, or words, etc.) for computer recognition and categorization
I originally tried this code out as a music recognizer. The plan was to extract the frequencies (which turned out to be very difficult with anything other than a .wav file) and the beat (using an FFT), and then use those features to categorize the music.
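For anyone curious what "extracting the frequencies" with a Fourier transform involves, here is a minimal Python sketch. It is my own illustration, not the code used in this project. It assumes the audio has already been decoded into a list of float samples at a known sample rate. To stay dependency-free it uses a naive O(n²) discrete Fourier transform rather than a real FFT library; both compute the same spectrum, and the FFT is simply much faster. The function name `dominant_frequency` is hypothetical, and beat detection is not shown.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT.

    An FFT library computes the same coefficients far faster; this O(n^2)
    loop only illustrates what the transform produces.
    """
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    # For real-valued input only bins 1..n/2 are unique; bin 0 is the DC offset.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    # Bin k corresponds to k * sample_rate / n Hz.
    return best_bin * sample_rate / n


if __name__ == "__main__":
    rate = 8000
    # 0.1 s of a 440 Hz sine (an A4 note) plus a quieter 1000 Hz tone.
    tone = [
        math.sin(2 * math.pi * 440 * t / rate)
        + 0.3 * math.sin(2 * math.pi * 1000 * t / rate)
        for t in range(800)
    ]
    print(dominant_frequency(tone, rate))  # prints 440.0
```

With 800 samples at 8000 Hz each bin is 10 Hz wide, so the 440 Hz tone lands exactly on bin 44. Real recordings rarely line up so neatly, which is one reason a window function is usually applied before the transform.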
I tried to tackle way too much, and none of it ended up working right. It also wasn’t a great use of an SVM: building a training library was very slow, and I kept changing how I wanted to categorize everything.
Finally, I decided it was time to try again with a project more within my reach, ideally a simpler “bag of words” style problem. So I borrowed the spam classifier code and used it to recognize valid vs. invalid recipes. In particular, I wanted to look at cake recipes.
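A bag-of-words model ignores word order and represents each piece of text only by which vocabulary words it contains (or how often). The sketch below is a minimal Python illustration of that idea, not the spam classifier code itself: `tokenize`, `build_vocabulary`, and `featurize` are hypothetical names, and the binary present/absent encoding is one common choice among several (word counts are another).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the training documents to a column index."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def featurize(text, vocab):
    """Return a binary vector: 1 if a vocabulary word occurs in text, else 0.

    Words outside the vocabulary are dropped, and word order is lost.
    """
    vec = [0] * len(vocab)
    for token in tokenize(text):
        index = vocab.get(token)
        if index is not None:
            vec[index] = 1
    return vec


if __name__ == "__main__":
    lines = ["4 large eggs", "1 cup vegetable oil", "fish shaped solid waste"]
    vocab = build_vocabulary(lines)
    print(featurize("2 large eggs", vocab))  # prints [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because "2" never appears in the training lines, it simply vanishes from the vector; a line made entirely of unseen words becomes all zeros.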
Where are the text samples from?
You may be familiar with the game Portal; it is probably one of my favorite games of all time. At one point in the game you encounter a very strange cake recipe. It starts out somewhat normal but quickly descends into nonsensical or outright dangerous ingredients.
I decided to take this entire recipe, label each of its elements as valid or invalid for cake making, and then test the resulting model against a cake recipe downloaded from allrecipes.com.
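To make the train-then-test workflow concrete, here is a self-contained Python sketch of a linear SVM trained with Pegasos, a stochastic sub-gradient method for the soft-margin SVM objective. This is a swapped-in technique for illustration, not the solver inside the borrowed spam classifier, and the training lines are made-up examples rather than the labels I actually assigned to the Portal recipe. Labels are +1 for valid and -1 for invalid; all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text, vocab):
    """Binary bag-of-words vector plus a trailing constant 1 for the bias."""
    vec = [0.0] * (len(vocab) + 1)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] = 1.0
    vec[-1] = 1.0
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(lines, labels, lam=0.01, epochs=200):
    """Fit a linear SVM with the Pegasos sub-gradient method.

    Minimizes lam/2 * ||w||^2 + mean hinge loss. The bias is the weight on
    the constant feature, so it is regularized too. Returns (vocab, weights).
    """
    vocab = {}
    for line in lines:
        for token in tokenize(line):
            vocab.setdefault(token, len(vocab))
    xs = [featurize(line, vocab) for line in lines]
    w = [0.0] * (len(vocab) + 1)
    t = 0
    for _ in range(epochs):
        # Visit examples in a fixed order so results are reproducible
        # (Pegasos proper samples them at random).
        for x, y in zip(xs, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)
            # Shrink every weight: the step for the regularization term.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step, only when the example is inside the margin.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return vocab, w


def predict(line, vocab, w):
    """Return +1 (valid) or -1 (invalid); unseen words are ignored."""
    return 1 if dot(w, featurize(line, vocab)) >= 0 else -1


if __name__ == "__main__":
    # Made-up examples, not the author's actual training data.
    train = [
        ("4 large eggs", 1),
        ("2 cups flour", 1),
        ("1 cup sugar", 1),
        ("3/4 cup vegetable oil", 1),
        ("fish shaped solid waste", -1),
        ("rubber hose", -1),
        ("live grenade", -1),
        ("fish shaped candies", -1),
    ]
    lines, labels = zip(*train)
    vocab, w = train_linear_svm(list(lines), list(labels))
    # Score lines the model has not seen.
    for test_line in ["2 large eggs", "1 cup vegetable oil", "fish shaped crackers"]:
        print(test_line, predict(test_line, vocab, w))
```

The fixed visiting order and the values `lam=0.01` and `epochs=200` are arbitrary choices for a tiny dataset; a real project would normally use an established SVM library and tune the regularization on held-out data.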
Unfortunately, it did not quite work. In fact, it misclassified a string that exactly matches one in the training set (“4 large eggs”). I am still trying to work out why. But I did spend a long time learning about machine learning, and I have dozens of new projects I would like to try with it.
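As general background (not a diagnosis of what went wrong here), the standard soft-margin SVM does not promise to classify every training example correctly. For a linear kernel it solves:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```

Each slack variable ξ_i measures how far example i falls short of the margin. Any example with ξ_i > 1 ends up on the wrong side of the decision boundary, and the penalty C sets how expensive that is, so a smaller C tolerates more such training errors.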