
Monday, March 9

A*

A* finding the shortest path between two points in a grid.
A* (pronounced "A star") is an algorithm for finding the shortest path between two points in Euclidean space. It builds on Dijkstra's algorithm for finding the minimal traversal between nodes in a weighted graph; in fact, Dijkstra's algorithm can be viewed as the special case of A* with no heuristic. The primary difference between the two is that A* applies a heuristic which favors expansion of certain nodes based on their absolute (straight-line) distance to the destination node. Those Wikipedia links provide good summaries of both algorithms. If this sounds like a foreign language, I can promise you that it is interesting and worthy of exploration. In this post, I will attempt to give some background, describe some motivations for using A*, and demonstrate the details of the algorithm in my own words.


Graphs

Example graph
A graph is a structure that represents the connectivity of a set of objects. There are two primary types of entities in a graph: nodes and edges. Nodes represent the intersections of edges; they usually indicate points of interest, but their meaning can be more abstract. Edges represent a link between two nodes; by definition, each edge spans exactly two nodes. Graphs come in a variety of flavors, but the most basic representation is a list of edges identifying the pairs of nodes that they span. For instance, imagine that an edge is defined by a pair of numbers, with each number being unique to a particular node. The graph depicted to the right would be defined by the following set: [6,4], [4,5], [4,3], [3,2], [2,5], [1,2], [5,1]. The pair [1,2] indicates an edge spanning nodes 1 and 2. Typically, an edge is defined by an unordered pair, meaning that [1,2] and [2,1] describe the same edge. On a directed graph, there is a difference between [1,2] and [2,1]; the order carries a particular meaning, hence the qualifier directed. If not specified, a graph is assumed to be undirected*.

*While undirected graphs are the "default version," they can actually be instantiated by directed graphs. That is, an undirected graph can always be represented by a directed graph, while the converse is not necessarily true.
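As a concrete sketch, the edge list above could be stored in GNU Octave as a two-column matrix with one row per edge (the variable names here are my own, just for illustration):

% Edge list for the example graph: each row is one undirected edge [node_a, node_b].
edges = [6 4;
         4 5;
         4 3;
         3 2;
         2 5;
         1 2;
         5 1];

num_nodes = max(edges(:))    % 6 nodes in the example graph
num_edges = rows(edges)      % 7 edges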

Rather than storing a list of edges, a graph can also be represented by a matrix. There are two standard matrix representations: the adjacency matrix and the incidence matrix. The adjacency matrix is always square, with size equal to the number of nodes in the graph. Each element of the matrix corresponds to the pair of nodes given by its row and column indices; a non-zero element indicates an edge spanning those two nodes. The adjacency matrix of an undirected graph will always be symmetric. The image below depicts an adjacency matrix for both a directed and an undirected graph. Notice that the undirected graph's adjacency matrix is symmetric.

(left) A directed graph and corresponding adjacency matrix.
(right) An undirected graph and corresponding adjacency matrix.

Depending on the number of edges in a graph, the adjacency matrix may turn out to be relatively sparse (mostly zeros). For a large graph with a sparse adjacency matrix, it may be more efficient to use the incidence matrix representation. In this representation, the columns correspond to edges and the rows correspond to nodes in the graph. Because each column represents an edge, it must by definition contain exactly two non-zero elements indicating the nodes that the edge spans. Take a look at the adjacency and incidence matrices below. Both matrices describe the "example graph" at the beginning of this section.

(left) Incidence matrix representation of the "example graph" above.
(right) Adjacency matrix representation of the "example graph" above.
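Both matrix representations can be built from the edge list in a few lines of Octave. This is an illustrative sketch, not the code that produced the figures:

edges = [6 4; 4 5; 4 3; 3 2; 2 5; 1 2; 5 1];   % edge list of the example graph
num_nodes = max(edges(:));
num_edges = rows(edges);

% Adjacency matrix: A(i,j) is non-zero if an edge spans nodes i and j.
A = zeros(num_nodes);
for k = 1:num_edges
  A(edges(k,1), edges(k,2)) = 1;
  A(edges(k,2), edges(k,1)) = 1;   % mirror the entry; the graph is undirected
end

% Incidence matrix: one column per edge, with exactly two non-zero entries
% marking the nodes that the edge spans.
C = zeros(num_nodes, num_edges);
for k = 1:num_edges
  C(edges(k,:), k) = 1;
end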


It is also possible to associate weights with edges. A typical piece of information represented by a weighted edge is the distance between the nodes that it spans. A graph traversal is a sequence of connected nodes linking two particular nodes. In the example graph, the sequence [6-4-5-1] traverses the graph between nodes 6 and 1. Note that [6-4-3-2-1] is also a traversal. Both of these sequences can be scored by summing the weights of the edges along them. In the current example, assume that each edge has an equal weight w; the first traversal [6-4-5-1] would have a cost of 3w, while [6-4-3-2-1] would have a cost of 4w. Both traversals have the same start and end point, but one of them has a higher cost. What if we wanted to find the minimum-cost traversal between two nodes in a graph?
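To make the scoring concrete, here is a small Octave sketch (my own illustration) that sums the edge weights along a traversal, using a weighted adjacency matrix W with every edge weight set to w = 1:

w = 1;
edges = [6 4; 4 5; 4 3; 3 2; 2 5; 1 2; 5 1];
W = zeros(6);
for k = 1:rows(edges)
  W(edges(k,1), edges(k,2)) = w;   % store the weight of each edge
  W(edges(k,2), edges(k,1)) = w;
end

p = [6 4 5 1];                % the traversal to score; try [6 4 3 2 1] as well
cost = 0;
for k = 1:numel(p) - 1
  cost += W(p(k), p(k+1));    % add the weight of each edge along the traversal
end
cost                          % 3*w for [6 4 5 1], 4*w for [6 4 3 2 1]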

If each edge has an equal weight, the minimum-cost traversal will be the first one found by a breadth-first search. A breadth-first search starts at the original "root" node and expands it by adding each of its adjacent nodes to a frontier list. On each iteration, it removes a node from the list and expands it by adding its unexplored adjacent nodes to the list. In this way, the algorithm only has to store the nodes that are on the frontier of the search. The process terminates when the destination node is discovered. Breadth-first search is considered complete because it is guaranteed to find a traversal if one exists (even on an infinitely large graph, provided each node has finitely many neighbors).
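A minimal breadth-first search over an adjacency matrix might look something like the following Octave sketch (my own reconstruction, with illustrative names):

% Breadth-first search from 'root' to 'goal' over adjacency matrix A.
% Assumes the goal is reachable; returns the traversal as a list of nodes.
function path = bfs (A, root, goal)
  n = columns(A);
  visited = false(1, n);  visited(root) = true;
  parent = zeros(1, n);
  frontier = [root];
  while (!isempty(frontier))
    node = frontier(1);  frontier(1) = [];   % take the oldest node off the frontier
    if (node == goal)
      break;
    end
    for next = find(A(node, :))              % every node adjacent to 'node'
      if (!visited(next))
        visited(next) = true;
        parent(next) = node;
        frontier(end+1) = next;              % add it to the frontier
      end
    end
  end
  path = goal;                               % trace back through the parents
  while (path(1) != root)
    path = [parent(path(1)), path];
  end
end

On the adjacency matrix of the example graph, bfs(A, 6, 1) returns the three-edge traversal [6 4 5 1].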

A* on a grid with more bulbous obstacles.
For weighted graphs with unequal edge weights, Dijkstra's algorithm is the standard method of determining the minimal-cost traversal. Similarly, for graphs whose nodes have associated Euclidean coordinates, A* is a search method that is both optimal and complete. By "associated Euclidean coordinates" I mean that the nodes of the graph represent real points in an environment. If a path exists between the origin and destination, not only is A* guaranteed to find a minimal-cost one, it does so while expanding as few nodes as possible given the heuristic information available. There may be other pathfinding algorithms that perform better than A*, but they will not be complete, or they will only perform better under restricted conditions.

There is a whole lot more to learn about graphs and their relationship to the rest of mathematics. Practically, they are a useful tool for modeling systems. They are particularly prevalent in the fields of robotics, computer science, and optimization.


Why graphs?

You might be thinking, "That is all well and good, but I don't live on a graph, so what does this have to do with the real world?" It turns out that most navigational structures we are familiar with can be represented as graphs without losing too much information. In the context of navigating roads, place a node at every intersection, destination, entrance, or exit, and draw edges along the stretches of road between them. You now have a graph representing a system of roadways. Most public transit systems (subways, trains, buses, and even airport networks) have some reliable set of routes and transfer points that can be represented in the form of a graph. For the purpose of navigating the interior of a building, put a node at the center of each room and draw an edge between rooms with doorways connecting them. You now have a graph summarizing the navigationally important features of the floorplan.

3D Voxel grid from the
popular Minecraft PC Game.
2D occupancy grid generated
from an indoor environment.
https://www.cs.cmu.edu/~minerva/maps.html

If you want to represent an environment in greater detail, you can describe a 2-D layout with a grid or even a 3-D environment with a voxel grid. A grid is easily represented with a special type of graph, a lattice graph, whose adjacency matrix has a familiar banded (diagonal) structure and is usually sparse. Any obstacles in the grid, or regions that cannot be traversed, can be marked as such or simply omitted from the graph altogether. One particularly nice feature of a lattice graph is efficiency. Because the elements of a grid (apart from those on the border) all have the same connective structure, the edges don't need to be explicitly defined. This can save a lot of working memory when dealing with lattice graphs in software.
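One way to exploit that regularity is to generate a cell's neighbors on the fly rather than storing any edges at all. A rough Octave sketch for an 8-connected grid, with obstacles marked in a logical matrix (the function name and layout are my own):

% grid: logical matrix, true = traversable, false = obstacle.
% Returns the [row, col] neighbors of cell (r, c) that can be entered,
% without ever storing an explicit edge list or adjacency matrix.
function nbrs = grid_neighbors (grid, r, c)
  nbrs = [];
  for dr = -1:1
    for dc = -1:1
      if (dr == 0 && dc == 0)
        continue;                          % skip the cell itself
      end
      rr = r + dr;  cc = c + dc;
      if (rr >= 1 && rr <= rows(grid) && cc >= 1 && cc <= columns(grid) ...
          && grid(rr, cc))
        nbrs(end+1, :) = [rr, cc];         % in bounds and not an obstacle
      end
    end
  end
end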

An A* implementation with GNU Octave

It works "a pretty-prooty good".
I decided to implement A* in GNU Octave. It has been a few months between implementing this algorithm and writing this post, so I may get a few details wrong in the description. In the above .gif, you can see the algorithm in action. The image depicts a grid with white elements representing traversable space and black elements representing obstacles. On initialization, the user selects an origin and a destination node. The starting element is indicated in blue and the destination in green*. While the algorithm runs, the elements stored on the frontier stack are colored green, and the elements that have already been evaluated are colored with a magenta/yellow gradient depending on their distance from the origin node. When a traversal is found, it is traced in blue.

*There were some problems during recording that prevented these elements from being displayed properly in the gifs.

Observe how the frontier begins to propagate around the
first obstacle.
The distance between elements is 1 for horizontally and vertically adjacent elements and sqrt(2) for diagonally adjacent elements; this is just the distance between their centers. The absolute distance between each node and the destination is also computed as the distance between their centers. The important feature of this Euclidean distance computation is that it does not require the node to be adjacent to the destination; it is part of the heuristic that makes A* optimal for this class of search problems.
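In code, the step cost and the heuristic are both just distances between cell centers; something like the following sketch (my own reconstruction, not the exact functions in the repository):

% Cost of moving between two adjacent cells: 1 horizontally/vertically,
% sqrt(2) diagonally.  Both are the Euclidean distance between cell centers.
step_cost = @(r1, c1, r2, c2) sqrt((r1 - r2)^2 + (c1 - c2)^2);

% Heuristic: straight-line distance from a cell to the destination,
% ignoring adjacency and obstacles entirely.
heuristic = @(r, c, r_goal, c_goal) sqrt((r - r_goal)^2 + (c - c_goal)^2);

step_cost(3, 3, 3, 4)      % 1       (horizontal neighbor)
step_cost(3, 3, 4, 4)      % 1.4142  (diagonal neighbor)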

For some of these sample runs, the coloration is more
apparent. I intended for the color to change from magenta
to yellow with distance.
The algorithm works much like breadth-first search, except that on each iteration the frontier node with the lowest total score is expanded. The nodes on the frontier stack are scored according to their accumulated distance from the origin/root, and a heuristic is applied that augments this score. The heuristic assumes that a "shortcut" exists between each node and the destination; the length of this shortcut is the absolute (straight-line) distance to the destination. Because the nodes in this graph are embedded in Euclidean space, the total score after being augmented by the heuristic can be thought of as the cost of the shortest possible traversal if there were no obstacles between the current node and the destination. Thanks to this heuristic, nodes that make progress towards the destination are favored over those that move away from it.
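The scoring step boils down to a few lines: each frontier node carries an accumulated cost g, the heuristic adds h, and the node with the smallest total f = g + h is expanded next. A toy Octave sketch of that selection, assuming the frontier is kept as a list of cell indices with parallel arrays for the scores (this mirrors the idea, not the exact data structures in my code):

frontier = [12 13 22];       % cell indices currently on the frontier
g = [4.0  4.4  5.0];         % accumulated distance from the origin
h = [3.0  2.2  1.0];         % straight-line distance to the destination
f = g + h;                   % total score under the "shortcut" heuristic
[~, best] = min(f);          % cell 22 wins: f = 6.0 versus 7.0 and 6.6
node = frontier(best);       % this node is expanded next
frontier(best) = [];         % ...and removed from the frontier
g(best) = [];  h(best) = []; % keep the parallel score arrays in sync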

This was almost a straight shot with the exception
of the small circle in front of the destination.
To emphasize the importance of this heuristic, consider Dijkstra's algorithm. If you were to depict Dijkstra's algorithm on a similar lattice graph, the frontier would resemble a circle expanding away from the origin/root, because a circle describes the set of points that are all equally distant from the root in Euclidean space. Because of A*'s absolute Euclidean heuristic, the frontier instead moves rapidly towards the destination and appears to flow around obstacles, seeking the shortest path.

It is worth mentioning that, due to the grid's connective structure, the shortest path between two points is restricted to combinations of horizontal, vertical, and diagonal steps. In reality, a straight line is the shortest distance between two points, but on a grid with this connectivity the shortest achievable path length is something closer to the Manhattan distance (or, with diagonal moves allowed, the so-called octile distance).
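A quick numeric check, for a displacement of 5 columns and 2 rows between two cells:

dx = 5;  dy = 2;
straight_line = sqrt(dx^2 + dy^2)                   % ~5.39, not achievable on the grid
grid_shortest = sqrt(2)*min(dx, dy) + abs(dx - dy)  % ~5.83: 2 diagonal + 3 straight steps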

This animation really depicts the power of the heuristic
for influencing the shape of the frontier boundary.
Although A* is optimal, this code certainly is not. If I were to re-write this implementation, I would separate the graphics from the logic and allow for more flexibility in the graph representation. The current implementation requires a lattice graph generated from a regular grid, which limits its utility beyond that of a demonstration tool. I would also prefer a language like C/C++ or Python that is better suited to "looping" than Octave/Matlab. Loops are relatively slow in Octave/Matlab; unless the algorithm can be vectorized, those languages aren't particularly well suited to running it.

Notice how the solution speeds up after the unexplored
space outside of the final corridor has been filled.
Other ways of improving performance could include segmenting the grid according to its structure; that is, redefining the graph in a way that summarizes important information while doing away with redundant information. Possible approaches include creating a Voronoi diagram of the obstacle boundaries and using it as the graph representation, or compressing the grid with a quadtree structure before generating an augmented lattice graph. Both of these approaches have potentially adverse effects on the resulting path, as they reduce the resolution of the space that the graph structure describes.

If you are interested in checking out the algorithm, you can download it from my GitHub repo.

In order to run the algorithm, you need to download GNU Octave by following the instructions here. The implementation utilizes some Octave-specific syntax, so it will not run in Matlab.

If you have any questions or criticisms, please comment below. Thank you.

Saturday, June 15

Coursera: Machine Learning

I started the latest track of Andrew Ng's Machine Learning Coursera course a month ago. There are about two more weeks left in the class, and my experience so far has been positive. We've covered linear classifiers, logistic regression by gradient descent, multi-layer neural networks, and SVMs. This week's topic is unsupervised learning via K-means clustering.

Part of the material has been review for me, as I studied Ng's OpenClassroom materials in the past. I posted on that experience a little bit in My Introduction to Machine Learning. Although my primary goal when working through those materials was to understand how to implement backpropagation for artificial neural networks in supervised learning, this iteration of the class on Coursera has greatly increased my intuition for how to improve a learning algorithm and for which metrics to judge its performance by.

One large contrast between this course and the OpenClassroom materials is the in-video questions and review quizzes. Distributed practice is a great method for learning concepts and keeping the viewer engaged; Coursera courses in general seem to do a great job of keeping the learning experience interactive. Ng's Machine Learning, being sort of the "flagship" Coursera course, is no exception.

Good Lectures

Ng's lectures are very good at explaining the motivations and the nuances of employing machine learning algorithms. Every algorithm is presented with some prototypical application upon which analogies and concepts are based. He provides a lot of insight, based on his own professional experience with machine learning, into common pitfalls that people tend to run into when implementing a classification algorithm. For instance, it is common for someone, when their algorithm does not perform well, to conclude that the solution is to find more training examples or features. In real-world applications, 'finding more training data' can be a significant project on its own. Furthermore, in the case of over-fitting, increasing the number of features in the training set would actually be detrimental.

This type of meta-knowledge about applying learning algorithms is incredibly useful to me as an aspiring data scientist. Some of the techniques, such as cross-validation or generating learning curves, were entirely unknown to me when I was playing around with that Kaggle assignment. If I had been aware of and made use of those techniques, I would have achieved much better classification accuracy in that project, and done so in much less time, by correctly tailoring my algorithm to the data.

Programming Exercises

The class involves mandatory weekly programming exercises that center around a particular algorithm from the week's lectures. The tasks range from classifying spam emails to teaching a neural network to recognize hand-written digits. The exercises are well developed and, in the interest of time, come with far more pre-written content than the student is asked to produce. For instance, all of the data is imported and pre-processed in the provided script files and functions. The student is usually only tasked with implementing one or more cogs in the system - some particular cost function or kernel for the task at hand. The submission process is also completely automated by the supplied "submit" script - all the user has to do is update the indicated code and run it.

This is all very impressive but it can be rather limiting. I understand that the motivation is to make the coding task as clean and standardized as possible to allow for reasonable evaluation of the student's work and for the student to focus on the particular concepts that they are trying to practice. However, the whole experience is so convenient and constrained that it feels too easy. I remember the satisfaction that I felt after implementing an ANN from scratch for the tutorial Kaggle Digit Recognition Challenge and these exercises, although they involve some incredibly exciting algorithms, don't invoke that sensation in me. I could be a complete outlier in that view - I haven't had a chance yet to learn what other students' opinions are on the forums.

Moving Forward

Nevertheless, I'm very excited by the knowledge and experience that I have gained throughout this course and greatly appreciate the efforts of Andrew Ng and his TAs in providing this learning experience. There is a great deal to explore about just the application of machine learning algorithms - not to mention their potential adaptations. My next goal is to read up on the subject of neural networks and build a better context for understanding these machine learning algorithms. I recognize that the ANN implementation that Ng presents is just a special case called a 'feed-forward' neural network. It would be interesting to learn how neural networks behave in real time, so I've found an old book called Introduction to the Theory of Neural Computation that looks promising, and I've recently begun reading Sebastian Seung's Connectome.

Monday, May 6

My Introduction to Machine Learning

Open classroom exercises


A couple of months ago, I worked through some of the exercises for Andrew Ng's OpenClassroom Machine Learning class. I think the site is now defunct (or has perhaps morphed into Coursera), but the assignments were still available at the time of this post. The experience left me with a pretty good basis for starting to learn about using artificial neural networks (in addition to other ML algorithms) to solve optimization and classification problems. This is a very exciting field for me to peer into, as it involves some promising prospects for the creation of intelligent machines. Deep learning, one prevalent area of ML research, is essentially the clever utilization of very large neural network models to learn complex relationships. One of my current goals is to comprehend some of the strengths, weaknesses, and challenges involved with deep learning networks and their implementation.


Childhood dream


My first exposure to neural networks was Jeff Hawkins' book, On Intelligence, where he proposed ways that neural networks might be organized to mimic the neocortex of the human brain and achieve a powerful self-teaching mechanism. At the time, this book had me very excited - the idea seemed simple enough to implement (a 13-year-old kid could loosely comprehend it) and the notion of building an actual learning machine is very appealing. Mary Shelley's Frankenstein had people excited when it was written, and all Dr. Frankenstein did was assemble limbs - he had an intact brain and didn't have to put one together from scratch! However, I wasn't scrutinizing the book closely enough to question its claims, and I didn't know enough about neural networks to evaluate them. Since then, I've heard some critical things from people in the machine learning and artificial intelligence communities, but I still haven't been exposed to the topic sufficiently to judge the value of the book or the validity of the claims therein. It looks like Mr. Hawkins' company, Numenta, has been pretty closed about their work, but they recently announced that they would be open sourcing their Cortical Learning Algorithm. I'm interested in learning more about the progress he has made - in any case, his book made a lasting impression on me, inspiration and all.


My current conception


While neural networks may provide powerful and flexible algorithms, there is still a great deal to be discovered about their behavior and, historically, a kind of stigma surrounding their use. It seems that there was a little bit of hype followed by disenchantment in their history. Machines weren't powerful enough to implement them at the scale required for useful or impressive results and they were seen as flexible but inefficient toys for solving problems that weren't necessarily novel. Another limitation was the lack of an appropriate back-propagation algorithm that could allow for many-layer ANNs. Today, neural networks can be seen in a variety of useful applications, and the niche seems to be growing.

One strong application of ANNs is the classification of datasets. Classification problems involve training a network with samples of known class so that a decision boundary is defined between two regions in the feature space. Instances that lie on one side of that boundary are assigned one class, while instances that lie on the other side are assigned the other. The process of 'learning' in this case can be reduced to manipulating the boundary between those two classifications. If the data used to 'train' a neural network is representative of the task as a whole, then one can expect relatively accurate classifications of new data.
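As a toy illustration of that idea (my own sketch in Octave, not code from the course materials), here is a tiny two-class problem where logistic regression trained by gradient descent learns a linear decision boundary:

% Two clusters of 2-D points with known class labels.
X = [1 1; 2 1; 1 2; 4 4; 5 4; 4 5];
y = [0; 0; 0; 1; 1; 1];
Xb = [ones(rows(X), 1), X];               % add an intercept column

theta = zeros(3, 1);
alpha = 0.1;                              % learning rate
sigmoid = @(z) 1 ./ (1 + exp(-z));
for iter = 1:5000
  grad  = Xb' * (sigmoid(Xb * theta) - y) / rows(X);
  theta = theta - alpha * grad;           % gradient descent step
end

% New instances are classified by which side of the learned boundary they fall on.
predict = @(x) sigmoid([1, x] * theta) >= 0.5;
predict([1.5, 1.5])                       % 0: lands on the first cluster's side
predict([4.5, 4.5])                       % 1: lands on the second cluster's side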


Tuesday, January 29

Project Euler

Portrait of Leonhard Euler (1707-1783)

Fun and challenging problems


I've been working my way through these Project Euler exercises since September 2012. From what I gather, the problems originally spawned off of a thread on mathschallenge.net in 2001 and moved to a separate domain in 2006. The gist is that each problem asks you to derive some mathematical truth (such as the number of twin primes below some value or the last 10 digits of 1000!). All of the problems are meant to be solvable by an algorithm running on a modern computer in under one minute.

For instance, the latest problem that I have solved is number fifty. It asks for the prime below 1,000,000 that can be written as the sum of the longest chain of consecutive primes.




-----------------------------------------------------------------

Problem 50



The prime 41, can be written as the sum of six consecutive primes:
41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
----------------------------------------------------------------
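For what it's worth (and without printing the numeric answer here), my approach in Octave looked roughly like this: sieve the primes, then slide over windows of consecutive primes, keeping the longest window whose sum is itself a prime below the limit. This is a reconstruction of the idea rather than the exact code I submitted:

limit = 1000000;
p = primes(limit);                        % all primes below one million
is_prime = false(1, limit);
is_prime(p) = true;                       % is_prime(n) answers "is n prime?"
sums = [0, cumsum(p)];                    % sums(j+1) = sum of the first j primes

best_len = 0;  best_prime = 0;
for i = 1:numel(p)
  for j = i + best_len:numel(p)           % only consider windows longer than the best so far
    s = sums(j + 1) - sums(i);            % sum of the consecutive primes p(i)..p(j)
    if (s >= limit)
      break;                              % the sums only grow; stop extending this window
    end
    if (is_prime(s) && j - i + 1 > best_len)
      best_len = j - i + 1;
      best_prime = s;
    end
  end
end
printf("longest chain: %d consecutive primes summing to %d\n", best_len, best_prime);

Because the inner loop only starts at windows longer than the best chain found so far, it finishes comfortably inside Project Euler's one-minute budget, even in Octave.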