Machine Learning Overview
At a recent Google Cloud conference, Rob Craft, product lead for Cloud Machine Learning, got up on stage and told the crowd [1]: “Upwards of nine years ago we got out of the rules business. Everyone in this room probably writes rules for a living. If you write code, if this, then that, those are rules: if the following things are met, the following things should execute; the stored procedure sees this, the stored procedure writes that. Those are all rule-based systems. What if you were able to declare through a statistical model, here is what good looks like and the confidence that good is this thing, and why doesn’t the system then determine on its own how it should determine to get to that good thing? That is what a predictive type of system tries to do.”
What he is describing here is the shift that has taken place over the past decade, in which machine learning has become an ever more popular method for building software systems. It has seen explosive growth and has found application in many different areas. For example, the machine learning algorithms on Yelp’s website help the company’s staff to compile, categorize, and label images more efficiently. Machine learning applications are being used at Facebook to filter out spam and poor-quality content, and the company is also researching computer vision algorithms that can “read” images to visually impaired people. Baidu’s R&D lab uses machine learning to build what the company calls Deep Voice, a deep neural network that can generate entirely synthetic human voices that are very difficult to distinguish from genuine human speech.
Machine learning refers to the process through which a computer constructs an algorithm based upon the analysis of data. Rather than following strictly static program instructions, such algorithms make data-driven predictions or decisions by building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible. In such cases we tell the computer what we want the output to be, and it builds a model from the data that can produce those results when presented with new data to process.
Machine learning can largely be characterized as an optimization process over some set of data. To solve a machine learning problem we look for a metric that tells us how far we are from the solution and try to minimize that value; the function that measures this error is called the loss function. A widely cited formal definition, due to Tom Mitchell, states [2]: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” This essentially means that the computer is given a task and some metric for success, and with each iteration over the task it gets better at performing it, as measured by the performance metric.
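To make the idea of a loss function concrete, here is a minimal sketch in Python. The task, the one-parameter model y = w * x, the sample data, and the candidate values of w are all made up for illustration; mean squared error is just one common choice of performance measure.

```python
# Hypothetical task T: predict y from x with the one-parameter model y = w * x.
# Performance measure P: mean squared error (lower is better).

def mean_squared_error(w, samples):
    """Average squared gap between the prediction w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

# Made-up sample data that roughly follows y = 2x.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Three candidate models; the loss tells us how far each one is from the data.
candidates = [0.5, 1.5, 2.0]
losses = {w: mean_squared_error(w, samples) for w in candidates}
best_w = min(losses, key=losses.get)

for w in candidates:
    print(f"w = {w}: loss = {losses[w]:.3f}")
print("best candidate:", best_w)
```

The candidate with the smallest loss is the one whose predictions sit closest to the data, which is exactly the sense in which the loss measures how far we are from the solution.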
For example, Google used machine learning to drastically reduce the electricity consumption of its data centers. Using a system of neural networks trained on different operating scenarios and parameters within the data centers, the company created an efficient and adaptive framework to understand data center dynamics and optimize efficiency. It accomplished this by taking the historical data that had already been collected by thousands of sensors within the data centers and using it to train a set of deep neural networks. The machine learning system analyzed the internal arrangement of the data center and tried different configurations to assess their energy efficiency; it kept iterating, adjusting the configuration and trying to reduce energy consumption, learning at each iteration. Ultimately the algorithm managed to reduce the amount of energy used for cooling by up to 40 percent [3]. This pattern is central to most machine learning problems: you define an error for the problem and minimize it, often by using gradient descent, repeatedly adjusting the model in the direction that reduces the error the most and then iterating.
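The iterate-and-reduce-the-error loop can be sketched with gradient descent on a toy problem. This is not Google's system; the model y = w * x, the data, the learning rate, and the number of steps are assumptions chosen for illustration. Note that gradient descent does not try options at random: it uses the slope of the loss to decide which way to move.

```python
def loss(w, samples):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

def gradient(w, samples):
    """Derivative of the loss with respect to w: mean of 2 * x * (w * x - y)."""
    return sum(2 * x * (w * x - y) for x, y in samples) / len(samples)

# Made-up data that roughly follows y = 2x.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

w = 0.0                # arbitrary starting guess
learning_rate = 0.01   # step size, an assumed value that works for this data
history = [loss(w, samples)]

for step in range(200):
    # Move w a small step against the gradient, i.e. downhill on the loss.
    w -= learning_rate * gradient(w, samples)
    history.append(loss(w, samples))

print(f"learned w = {w:.4f}")
print(f"loss went from {history[0]:.3f} to {history[-1]:.5f}")
```

Each pass through the loop is one "iteration" in the sense used above: the error is measured, the parameter is nudged in the direction that lowers it, and the process repeats until the loss stops improving.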
Supervised or Unsupervised
Machine learning systems are typically categorized as being either supervised or unsupervised. The biggest difference is that supervised learning deals with labeled data while unsupervised learning deals with unlabeled data. Labeled data is a group of samples that have been tagged with one or more labels. The process of labeling typically takes a set of unlabeled data and attempts to apply meaningful tags that are informative of its contents. For example, these labels might indicate whether a photo contains a mountain or a lake, what type of action is being performed in a video, what the topic of a news article is, or what the overall sentiment of a tweet is. Labeling can be a time-consuming exercise that is often done by humans.
After obtaining a labeled dataset, a machine learning model can be trained on the data so that, when new unlabeled data is presented to it, the model can automatically predict a likely label for that data. Techniques that work with unlabeled data are called unsupervised learning. With unsupervised learning, we are trying to get the machine to find and create different categories within the data. In an unsupervised approach we do not have the outcome as a reference for training the algorithm; instead we let the model work on its own to discover information that may not be visible to the human eye. Clustering is one such example, where a set of inputs is to be divided into groups; this involves analyzing patterns in sets of unlabeled data to find groups that are similar. Unsupervised learning is important because, most of the time, the data that we get in the real world doesn’t have little tags attached to tell us what it is, and we need to perform some kind of analysis before going any further.
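Clustering can be illustrated with a minimal sketch of k-means, one common clustering technique (the text above does not name a specific algorithm, so this choice is an assumption). The points are made up, and using the first k points as starting centroids is a simplification; practical implementations choose starting points more carefully.

```python
import math

def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters by alternating two steps."""
    # Simplifying assumption: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Made-up unlabeled data: no tags say which group a point belongs to.
points = [(1.0, 1.2), (8.0, 8.2), (0.8, 0.9), (8.3, 7.9), (1.3, 1.0), (7.9, 8.1)]

labels, centroids = k_means(points, k=2)
print("cluster assignments:", labels)
print("cluster centers:", centroids)
```

No labels are supplied anywhere in this example; the two groups emerge purely from how close the points are to one another.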
With supervised learning, the computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs. We do this by training the model, that is, we load the model with knowledge so that it can predict future instances. For example, we teach a model by training it on data from a labeled data set, and then provide it with new data for which it tries to predict the correct labels. The main types of supervised learning are classification, which predicts a discrete category, and regression, which predicts a continuous value. Spam filtering is an example of classification, where the inputs are email messages and the classes are “spam” and “not spam”. Likewise, one might feed the system a data set of flowers to have it classify the different types.
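The flower example can be sketched with a nearest-neighbor classifier, one of the simplest supervised methods. The measurements below are made up for illustration, and the choice of a 1-nearest-neighbor rule is an assumption; any classification algorithm could stand in its place.

```python
import math

# Labeled training examples supplied by the "teacher":
# (petal length in cm, petal width in cm) -> species. Values are made up.
training_data = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((4.1, 1.3), "versicolor"),
]

def predict(features, training_data):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, label = min(
        training_data,
        key=lambda example: math.dist(features, example[0]),
    )
    return label

# New, unlabeled flowers for which the model predicts a label.
print(predict((1.6, 0.25), training_data))
print(predict((4.4, 1.4), training_data))
```

The labeled examples are the only "knowledge" the model has; a new flower is assigned the class of the training example it most resembles.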
A core objective of a machine learning system is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples or tasks after having experienced a training data set. The training examples come from some generally unknown probability distribution, and the system has to build a general model of this space that enables it to produce sufficiently accurate predictions in new cases. The key aspect of machine learning that makes it an important method with respect to big data is that we don’t have to hard-code prespecified rules. The iterative aspect of machine learning is important because, as models are exposed to new data, they are able to independently adapt and evolve. They learn from previous computations to produce reliable, repeatable decisions and results. There are many different approaches to machine learning; in the next module we will give an overview of some of the primary approaches taken.
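Generalization, as described above, is usually checked by holding back some data during training and measuring the error on it afterwards. The sketch below assumes a one-parameter model y = w * x fitted by least squares; the data and the split into training and held-out examples are made up for illustration.

```python
def fit(samples):
    """Least-squares estimate of w for the model y = w * x."""
    return sum(x * y for x, y in samples) / sum(x * x for x, y in samples)

def mean_squared_error(w, samples):
    """Average squared gap between the prediction w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

# Made-up data: the model only ever sees the training examples.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
# Held-out examples stand in for new, unseen cases.
test = [(5.0, 10.1), (6.0, 11.8)]

w = fit(train)
train_error = mean_squared_error(w, train)
test_error = mean_squared_error(w, test)

print(f"learned w = {w:.3f}")
print(f"error on training data: {train_error:.5f}")
print(f"error on unseen data:   {test_error:.5f}")
```

A model generalizes well when its error on the unseen examples stays close to its error on the training examples; a large gap between the two suggests it has memorized the training data instead of learning the underlying pattern.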
1. YouTube. (2018). Introduction to Google Cloud Machine Learning (Google Cloud Next ’17). [online] Available at: https://www.youtube.com/watch?v=COSXg5HKaO4 [Accessed 10 Feb. 2018].
2. Cs.swarthmore.edu. (2018). [online] Available at: https://goo.gl/scVw6Q [Accessed 10 Feb. 2018].
3. Google. (2016). DeepMind AI reduces energy used for cooling Google data centers by 40%. [online] Available at: https://blog.google/topics/environment/deepmind-ai-reduces-energy-used-for/ [Accessed 10 Feb. 2018].