Monday, May 6

My Introduction to Machine Learning

OpenClassroom exercises


A couple of months ago, I worked through some of the exercises for Andrew Ng's OpenClassroom Machine Learning class. I think the site is now defunct (or has perhaps morphed into Coursera), but the assignments were still available at the time of this post. The experience left me with a pretty good basis for starting to learn about using artificial neural networks (among other ML algorithms) to solve optimization and classification problems. This is a very exciting field for me to peer into, as it holds promising prospects for the creation of intelligent machines. Deep learning, one ML research area that seems especially prevalent, is essentially the clever use of very large neural network models to learn complex relationships. One of my current goals is to understand some of the strengths, weaknesses, and challenges involved in deep learning networks and their implementation.


Childhood dream


My first exposure to neural networks was Jeff Hawkins' book, On Intelligence, in which he proposed ways that neural networks might be organized to mimic the neocortex of the human brain and achieve a powerful self-teaching mechanism. At the time, this book had me very excited - the idea seemed simple enough to implement (a 13-year-old kid could loosely comprehend it), and the notion of building an actual learning machine is very appealing. Mary Shelley's Frankenstein had people excited when it was written, and all Dr. Frankenstein did was assemble limbs - he had an intact brain and didn't have to put one together from scratch! However, I wasn't scrutinizing enough to question the claims in the book, and I didn't know enough about neural networks to evaluate them. Since then, I've heard some critical things from people in the machine learning and artificial intelligence communities, but I still haven't been exposed to the topic sufficiently to judge the value of that book or the validity of the claims therein. It looks like Mr. Hawkins' company, Numenta, has been pretty closed about their work but recently announced that they would be open-sourcing their Cortical Learning Algorithm. I'm interested in learning more about the progress he has made - in any case, his book made a lasting impression on me and was a real source of inspiration.


My current conception


While neural networks may provide powerful and flexible algorithms, there is still a great deal to be discovered about their behavior, and, historically, a kind of stigma has surrounded their use. Their history seems to include a period of hype followed by disenchantment: machines weren't powerful enough to implement them at the scale required for useful or impressive results, and they were seen as flexible but inefficient toys for solving problems that weren't necessarily novel. Another limitation was the lack of an appropriate back-propagation algorithm that could train many-layer ANNs. Today, neural networks can be seen in a variety of useful applications, and the niche seems to be growing.

One strong application of ANNs is the classification of datasets. Classification problems involve training a network with samples of known class so that a decision boundary is defined between two regions in the feature space. Instances that lie on one side of that boundary are assigned to one class, while instances on the other side are assigned to the other. The process of 'learning' in this case reduces to manipulating the boundary between those two classifications. If the data used to 'train' a neural network is representative of the task as a whole, then one can expect relatively accurate classifications of new data.
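As a concrete illustration in Octave (a hypothetical sketch, not code from the "ann" class discussed below), a single logistic neuron trained by gradient descent learns a linear decision boundary on a toy dataset:

% A single logistic neuron: the learned weights w define the decision
% boundary w(1) + w(2)*x1 + w(3)*x2 = 0 in the 2-D feature space.
X = [0 0; 0 1; 1 0; 1 1];   % training instances (features)
y = [0; 0; 0; 1];           % known class labels (an AND-like task)
m = rows(X);
Xb = [ones(m,1) X];         % prepend a bias column
w = zeros(3,1);             % boundary parameters to learn
alpha = 0.5;                % learning rate
for iter = 1:2000
  h = 1 ./ (1 + exp(-Xb*w));          % sigmoid activations
  w = w - (alpha/m) * Xb' * (h - y);  % each gradient step nudges the boundary
endfor
disp(round(1 ./ (1 + exp(-Xb*w))));   % predicted classes for the training data

Instances on the positive side of the learned boundary are classified as 1, the rest as 0; a full network composes many such boundaries into more flexible regions.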



Moving Forward



The "ann" class


Recently, I decided to use what I had learned about object-oriented programming from helping with Pysimiam to build a less-sloppy neural network implementation in GNU Octave. I chose Octave's clunky object-oriented utilities because there is a strong opportunity for vectorization with ML (and because I was able to reuse some of the code that I wrote when I worked through Ng's exercises originally). I also wanted to see how much work it would be to build a well-behaved class-based implementation in Octave from scratch. What I learned may never be useful to me again - I'm presently working to build a stronger familiarity with Python, as I see more opportunity with it in the future. However, I anticipate that many of the challenges I faced while writing this implementation will translate across a variety of languages.

The neural network class "ann" is instantiated with a vector of any length; each element of the supplied vector corresponds to a layer in the constructed neural network, and the value of each element sets the number of nodes in that layer. For example:

>>A = ann([4,4,1]);

builds an ann object with four input nodes, one hidden layer of four nodes, and a single output node, and assigns it to the variable "A".
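
Under Octave's old-style class system, a constructor consistent with that call might look roughly like the following (a sketch only - the field names and initialization here are my assumptions, not necessarily the actual "ann" internals); the file would live at @ann/ann.m:

function A = ann(layers)
  A.layers = layers(:)';                 % e.g. [4 4 1]: nodes per layer
  A.alpha  = 0.1;                        % default learning rate
  A.W      = cell(1, numel(layers) - 1); % one weight matrix per layer pair
  for l = 1:numel(layers) - 1
    % small random weights; the +1 column accounts for the bias unit
    A.W{l} = 0.1 * randn(layers(l+1), layers(l) + 1);
  endfor
  A = class(A, 'ann');                   % register the struct as class "ann"
endfunction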

The primary challenge I faced with Octave's object-oriented utilities is that class methods can't modify a given class instance in place; the result must be assigned back. For example, if I wanted to change a value in A, such as the learning rate "alpha", I would have to do the following:

A = set(A,'alpha',[foo]);

The problem here is that "A" doesn't point to some class structure that is modified in place. The entire object "A" is passed (copied) into the method "set", which returns the modified object. This is inefficient because all of the object data in "A" has to be copied into the set method and then copied again back into "A". A better implementation would modify the object directly - but this doesn't appear to be possible with Octave's available object-oriented utilities.
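
The set method itself ends up following the copy-and-return pattern below (again a sketch; the property handling here is an assumption); the file would live at @ann/set.m:

function obj = set(obj, prop, val)
  % obj arrives as a copy; mutating it only changes the local copy,
  % so the updated object must be returned and reassigned by the caller
  switch (prop)
    case 'alpha'
      obj.alpha = val;   % learning rate
    otherwise
      error('ann: set: unknown property "%s"', prop);
  endswitch
endfunction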

I tried my hand at the Kaggle Digit Recognition challenge with my "ann" code and got about 90% accuracy on the first attempt. This is by no means exceptional, but considering that "ann" was working with unprocessed, raw pixel data from bitmap images to classify the handwritten shapes in those images, I was pleased with the results.


Machine Learning Coursera Course


I'm just starting the Machine Learning Coursera course that recently reopened. I anticipate that a lot of review will be involved, as this class was also prepared by Andrew Ng. My goal is to take the time to derive some of the equations from scratch (such as the back-propagation algorithm for gradient descent) using vector calculus. I anticipate learning a great deal from this class and will give an update on what I learn.
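
For reference, the equations I have in mind take the standard textbook form (written in the usual notation for a squared-error cost, with bias terms omitted for clarity; this is not course-specific material). With activations $a^{(l)} = g(z^{(l)})$ and pre-activations $z^{(l+1)} = W^{(l)} a^{(l)}$, the back-propagated errors are

$$\delta^{(L)} = \big(a^{(L)} - y\big) \odot g'\big(z^{(L)}\big), \qquad \delta^{(l)} = \big(W^{(l)}\big)^{T} \delta^{(l+1)} \odot g'\big(z^{(l)}\big),$$

and the cost gradient for each weight matrix is $\partial J / \partial W^{(l)} = \delta^{(l+1)} \big(a^{(l)}\big)^{T}$, which is exactly the quantity gradient descent uses to update $W^{(l)}$.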

The code for the Octave ann class is available on this GitHub repository.
