Part of the material has been review for me, as I studied Ng's OpenClassroom materials in the past. I posted a little about that experience in My Introduction to Machine Learning. My primary goal when working through those materials was to understand how to implement backpropagation in artificial neural networks for supervised learning. This iteration of the class on Coursera, by contrast, has greatly improved my intuition for how to improve a learning algorithm and which metrics to use when judging its performance.
One large contrast between this course and the OpenClassroom materials is the in-video questions and review quizzes. Distributed practice is a great method for learning concepts and keeping the viewer engaged; Coursera courses in general seem to do a great job of keeping the learning experience interactive. Ng's Machine Learning, being sort of the "flagship" Coursera course, is no exception.
Good Lectures
Ng's lectures are very good at explaining the motivations and the nuances of employing machine learning algorithms. Every algorithm is presented with some prototypical application upon which analogies and concepts are based. Drawing on his own experience as a machine learning practitioner, he offers a lot of insight into the common pitfalls that people run into when implementing a classification algorithm. For instance, when an algorithm does not perform well, it is common to conclude that the solution is to find more training examples or more features. In real-world applications, 'finding more training data' can be a significant project on its own. Furthermore, in the case of over-fitting, it would actually be detrimental to increase the number of features in the training set.
This type of meta-knowledge for the application of learning algorithms is incredibly useful to me as an aspiring data scientist. Some of the techniques, such as cross-validation and generating learning curves, were entirely unknown to me when I was playing around with that Kaggle assignment. If I had been aware of those techniques and made use of them, I would have achieved much better classification accuracy in that project, and in much less time, by correctly tailoring my algorithm to the data.
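To make the learning-curve idea concrete, here is a minimal sketch in Python with NumPy. It is not the course's code: the function names (`fit_linear`, `half_mse`, `learning_curve`, `make_toy_data`), the toy linear data, and the choice of regularized least squares as the model are all assumptions made for illustration. The idea is simply to train on the first m examples for growing m and record both the training error and the error on a held-out validation set.

```python
import numpy as np


def fit_linear(X, y, lam=0.0):
    """Regularized least squares; the bias column (index 0) is not penalized."""
    n_features = X.shape[1]
    penalty = lam * np.eye(n_features)
    penalty[0, 0] = 0.0
    # pinv keeps the solve well defined even when there are fewer examples
    # than features (for example, m = 1).
    return np.linalg.pinv(X.T @ X + penalty) @ X.T @ y


def half_mse(X, y, theta):
    """Half the mean squared error of the linear predictions X @ theta."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def learning_curve(X_train, y_train, X_val, y_val, lam=0.0):
    """Train on the first m examples for m = 1..M and record both errors."""
    sizes, train_err, val_err = [], [], []
    for m in range(1, len(y_train) + 1):
        theta = fit_linear(X_train[:m], y_train[:m], lam)
        sizes.append(m)
        # Training error is measured only on the examples actually used.
        train_err.append(half_mse(X_train[:m], y_train[:m], theta))
        # Validation error is always measured on the full held-out set.
        val_err.append(half_mse(X_val, y_val, theta))
    return sizes, train_err, val_err


def make_toy_data(n=60, seed=0):
    """Hypothetical toy data: y = 1 + 2x plus a little Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])  # prepend the bias column
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    sizes, train_err, val_err = learning_curve(X[:40], y[:40], X[40:], y[40:])
    for m in (1, 5, 10, 20, 40):
        print(f"m={m:2d}  train={train_err[m - 1]:.4f}  val={val_err[m - 1]:.4f}")
```

Reading the two curves together is what makes the diagnostic useful. If both errors end up high and close to each other, the model is under-fitting and more training data is unlikely to help. If the training error stays low while the validation error stays well above it, the model is over-fitting, and more data or stronger regularization is the more promising direction.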
Programming Exercises
The class involves mandatory weekly programming exercises that center on a particular algorithm from the week's lectures. The tasks range from classifying spam emails to teaching a neural network to recognize hand-written digits. The exercises are well developed and, in the interest of time, come with a lot more content than the student writes. For instance, all of the data is imported and pre-processed in the provided script files and functions. The student is usually tasked only with implementing one or more cogs in the system - some particular cost function or kernel for the task at hand. The submission process is also completely automated by the supplied "submit" script - all that the user has to do is update the indicated code and run it.
This is all very impressive, but it can be rather limiting. I understand that the motivation is to make the coding task as clean and standardized as possible, both to allow for reasonable evaluation of the student's work and to let the student focus on the particular concepts being practiced. However, the whole experience is so convenient and constrained that it feels too easy. I remember the satisfaction that I felt after implementing an ANN from scratch for the tutorial Kaggle Digit Recognition Challenge, and these exercises, although they involve some incredibly exciting algorithms, don't evoke that sensation in me. I could be a complete outlier in that view - I haven't yet had a chance to learn on the forums what other students think.