Machine Learning Approaches
Machine learning is a challenging area of computer science and engineering, and there are many different approaches to building machine learning systems. As yet there is no formal classification of these approaches, but in his book The Master Algorithm [1], Pedro Domingos of the University of Washington provides a coherent and accessible overview of the methods currently being pursued. The book describes five “tribes”, each of which emphasizes a different method of machine learning and is particularly well suited to solving some core challenge. Domingos begins by identifying five basic methods through which a computer can build a model and then associates these with the approaches currently taken to machine learning. The first is filling in gaps in existing knowledge through inverse deduction, which is associated with the symbolist approach. The second is mimicking the human brain, which is associated with the connectionist, neural network approach. The third is evolutionary selection, which is associated with techniques that enable the computer to simulate evolution. The fourth is reducing uncertainty through statistics and Bayesian inference. The fifth is comparing old and new sets of information through analogy.
The symbolist approach is said to operate on the basis of formal logic, and more specifically on the premise of inverse deduction. The idea is to think of inductive learning as the inverse of deduction. Deduction goes from general rules to specific facts; its opposite, induction, goes from specific facts to general rules. For example, if we know that 2 + 2 is 4, then we can also fill in the gap in the question where we have 2 and must find what has to be added to it to get 4. The system has to ask itself “what is the knowledge that is missing?” and acquire that knowledge through analysis of existing datasets [2].
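As a rough illustration of inverse deduction, the sketch below first fills in the arithmetic gap from the example above and then induces simple "if premise then conclusion" rules from specific facts. The function names, the facts about Socrates, Plato and Fido, and the rule format are all assumptions chosen for illustration; real symbolist learners are far more sophisticated.

```python
from itertools import permutations


def missing_addend(known, total):
    """Inverse of addition: what must be added to `known` to reach `total`?"""
    return total - known


def induce_rules(facts):
    """Propose rules (premise, conclusion) that hold for every entity
    that has the premise. `facts` maps an entity to its set of properties."""
    properties = set().union(*facts.values())
    rules = []
    for premise, conclusion in permutations(sorted(properties), 2):
        holders = [e for e, props in facts.items() if premise in props]
        # Keep the rule only if no known entity contradicts it.
        if holders and all(conclusion in facts[e] for e in holders):
            rules.append((premise, conclusion))
    return rules


# Hypothetical specific facts (not taken from the source text).
FACTS = {
    "Socrates": {"human", "mortal"},
    "Plato": {"human", "mortal"},
    "Fido": {"dog", "mortal"},
}

if __name__ == "__main__":
    print(missing_addend(2, 4))
    print(induce_rules(FACTS))
```

From the facts above, the sketch proposes the general rule "human implies mortal" but rejects "mortal implies human", because Fido is a counterexample.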
A second approach is based upon the networked structure of the brain and how the brain learns by encoding patterns within neural networks. This is the neural network approach. An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Each node represents an artificial neuron, and each connection carries the output of one artificial neuron to the input of another. The network is trained on data so that a specific set of connections between the nodes forms to represent the pattern. The weights of the connections between nodes are altered with each iteration so that the output of the system better matches the desired output. For example, Google used this approach to train its computers to identify cats in YouTube videos. Many of the breakthroughs in machine learning in recent years have come from this approach, and because it is well suited to dealing with big data, we will be looking more closely at how neural nets and deep learning work in the coming modules.
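The weight-adjustment idea can be sketched with the smallest possible case: a single artificial neuron rather than a full network. The example below assumes a sigmoid activation, a logistic-loss gradient update, and a toy logical-OR dataset; the learning rate, epoch count and all names are illustrative assumptions, not details from the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, x1, x2):
    """Output of one neuron: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(weights[0] * x1 + weights[1] * x2 + bias)


def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Nudge the connection weights after every example so the output
    moves closer to the desired output."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = predict(weights, bias, x1, x2) - target
            # Gradient step for logistic loss with a sigmoid output.
            weights[0] -= learning_rate * error * x1
            weights[1] -= learning_rate * error * x2
            bias -= learning_rate * error
    return weights, bias


# Toy dataset (an assumption for illustration): logical OR.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(OR_DATA)
    for (x1, x2), target in OR_DATA:
        print((x1, x2), target, round(predict(w, b, x1, x2), 3))
```

Deep networks stack many such neurons in layers and propagate the error backwards through them, but the core loop of compare, adjust and repeat is the same.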
Another very different approach is that of trying to simulate the process of evolution. Genetic algorithms work the way evolution does: through the production of variety, the exposure of these variants to an operating environment, and then selection, cross-mixing, duplication and iteration of the whole process. You have a population of individuals, each of which is described by specific characteristics. Each of these individuals goes out into the world and is evaluated based on its success at the given task. Those that perform well gain a higher fitness value and therefore have a higher chance of being the parents of the next generation. Individuals that have performed well are cross-mixed, random mutation is added to create a new population, and the process is repeated. After some number of generations, the resulting individuals can perform non-trivial functions; indeed, algorithms can learn surprisingly powerful things this way.
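The loop described above can be sketched on a deliberately simple task. The example assumes individuals are lists of bits and that fitness is the number of 1s, which is a toy objective chosen for illustration. It also keeps the best individual unchanged in each generation (elitism), which is an added assumption rather than something the text prescribes. All parameter values are arbitrary.

```python
import random


def fitness(individual):
    """Toy fitness: the number of 1 bits in the individual."""
    return sum(individual)


def evolve(pop_size=30, genome_len=20, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Fitter individuals get a higher chance of becoming parents;
        # the +1 avoids an all-zero weight list.
        weights = [fitness(ind) + 1 for ind in population]
        # Elitism (an assumption): carry the current best over unchanged.
        next_gen = [max(population, key=fitness)[:]]
        while len(next_gen) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, genome_len)  # single-point cross-mixing
            child = mother[:cut] + father[cut:]
            # Random mutation: occasionally flip a bit.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return initial_best, max(population, key=fitness)


if __name__ == "__main__":
    start, best = evolve()
    print("best fitness at start:", start)
    print("best fitness at end:  ", fitness(best))
```

Because the best individual is always carried over, the best fitness in the population can never decrease from one generation to the next.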
The Bayesian approach deals with uncertainty through probabilistic inference. Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability of a hypothesis as more evidence or information becomes available. You start with a hypothesis that some outcomes are more likely than others, then update that hypothesis as more data comes in. After some iterations, some hypotheses become more likely than others. This Bayesian approach is used, for example, in spam filtering. The system typically uses a bag of words to identify spam messages. It goes through the message, and every time it finds more evidence confirming or disconfirming the hypothesis that the message is spam, it adjusts the probability that the message should be rejected or accepted. Bayesian ideas have had a big impact on machine learning in the past 20 years or so because of the flexibility they provide in building structured models of real-world phenomena. Algorithmic advances and increasing computational resources have made it possible to fit rich, highly structured models that were previously considered intractable.
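The word-by-word update in the spam example can be sketched as repeated application of Bayes’ theorem, where the posterior after one word becomes the prior for the next. The word likelihoods below are invented numbers for illustration, not measurements, and the function names are hypothetical. The sketch also assumes words are independent of each other, which is the usual "naive" simplification.

```python
def bayes_update(prior_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) from P(spam) and the two word likelihoods."""
    numerator = p_word_given_spam * prior_spam
    evidence = numerator + p_word_given_ham * (1.0 - prior_spam)
    return numerator / evidence


# Hypothetical likelihoods: word -> (P(word | spam), P(word | not spam)).
WORD_LIKELIHOODS = {
    "free": (0.30, 0.03),
    "winner": (0.20, 0.01),
    "meeting": (0.02, 0.25),
}


def spam_probability(message, prior=0.5):
    """Walk through the message and update the spam probability
    each time a known word supplies new evidence."""
    probability = prior
    for word in message.lower().split():
        if word in WORD_LIKELIHOODS:
            probability = bayes_update(probability, *WORD_LIKELIHOODS[word])
    return probability


if __name__ == "__main__":
    for text in ("free winner", "meeting", "hello"):
        print(text, round(spam_probability(text), 3))
```

Words that are more common in spam push the probability up, words that are more common in legitimate mail push it down, and unknown words leave it unchanged.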
The fifth approach is that of analogy. Analogy is a powerful and fundamental tool that our brains use to categorize new information by comparing it to what we already know, to see how closely it resembles other things and thus whether we can place it into or near a category that we already know. The general method is the “nearest neighbor” principle: essentially asking what a new thing is closest to and then positioning it in relation to other things based on its similarity to them. A popular method here is the support vector machine (SVM). Given a set of training examples, each marked as belonging to one or the other of two categories, a support vector machine training algorithm builds a model that assigns new examples to one category or the other. This approach is at the heart of many methods that are extremely effective for some kinds of machine learning. Support vector machines were probably the most powerful type of learning in common use until recently. Handwritten characters can be recognized using SVMs, and support vector clustering is often used in industrial applications. Amazon’s and Netflix’s recommendation systems are based on this method of analogy. If someone else has given five stars to something you also rated five stars, and one star to something you also rated one star, then by analogy the system recommends to you something else that this person, whose taste is similar to yours, has liked.
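The star-rating example can be sketched as a one-nearest-neighbor lookup: find the person whose ratings are closest to yours, then suggest what they liked that you have not rated. This is a toy illustration, not how Amazon or Netflix actually implement their systems; the users, items, distance measure and `min_stars` threshold are all assumptions.

```python
def distance(ratings_a, ratings_b):
    """Mean absolute difference in stars over the items both people rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return float("inf")  # nothing in common, so no basis for analogy
    return sum(abs(ratings_a[i] - ratings_b[i]) for i in shared) / len(shared)


def recommend(target, others, min_stars=4):
    """Return the nearest neighbor and the items they liked
    that the target person has not rated yet."""
    neighbor, ratings = min(others.items(),
                            key=lambda kv: distance(target, kv[1]))
    liked = sorted(item for item, stars in ratings.items()
                   if stars >= min_stars and item not in target)
    return neighbor, liked


# Hypothetical ratings (1 to 5 stars).
YOU = {"film_a": 5, "film_b": 1}
OTHERS = {
    "ann": {"film_a": 5, "film_b": 1, "film_c": 5},
    "bob": {"film_a": 1, "film_b": 5, "film_d": 5},
}

if __name__ == "__main__":
    print(recommend(YOU, OTHERS))
```

Here ann agrees with you on both shared films and bob disagrees on both, so the sketch picks ann as the nearest neighbor and recommends film_c.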
The idea is that each of the five tribes has a problem it can solve better than all the others, and a particular master algorithm that solves that problem. So, for example, the problem that the symbolists solve, and that none of the others know how to solve, is learning knowledge that can be composed in many different ways; they learn that knowledge with inverse deduction. The connectionists solve the credit assignment problem through the development of complex networks in which individual nodes and connections are adjusted based upon how much they contribute to matching the desired output. The evolutionary approach solves the problem of learning structure. The Bayesian approach deals with uncertainty: given that all the knowledge you learn is uncertain, it knows how to update the probabilities of hypotheses as new evidence arrives. The analogy approach uses comparison between things to categorize them based upon similarity or difference.
1. Amazon.com. (2018). [online] Available at: https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine/dp/1501299387 [Accessed 12 Feb. 2018].
2. YouTube. (2018). Pedro Domingos: “The Master Algorithm” | Talks at Google. [online] Available at: https://www.youtube.com/watch?v=B8J4uefCQMc [Accessed 12 Feb. 2018].