Big Data & Advanced Analytics
The term big data analytics refers to the use of advanced computerized analytical methods to process large, complex data sets. In so doing, organizations can uncover hidden patterns, correlations and other insights that enable them to perform new functions and deliver new services. Big data and advanced analytics represent a new paradigm in computing and a new stage in the digital transformation that organizations, economies, and societies are currently going through. The programmable computers that formed the basis of the information revolution in the 20th century are giving way to a new computing paradigm. Today the information revolution has moved on from the personal computer, spreadsheets, emails and web pages to a new computing model built around mobile devices, IoT, big data, advanced analytics and cloud computing. This new wave of the information revolution moves up from the micro-level of individual computerized devices to whole systems of people, technology, and information networks that generate huge amounts of unstructured data, which requires new algorithmic approaches to interpret.
The first wave of the information revolution was built around implementing the digital format: putting data into well-structured databases and leveraging the capabilities of individual processors and the personal computer. Information was still relatively confined to specific places, well-structured and limited in volume and variety, with individual computers executing a well-defined set of rules written by a programmer. Computers of the past operated in predictable environments, using structured and uniform data to perform prescribed operations.
The rise of the internet, blogging, social networking and the Internet of Things, combined with the proliferation of mobile devices, has created a new type of data, called big data, that comes in massive volumes and is unstructured and heterogeneous. An estimated 90% of the data in the world today was created in the past two years. A tweet, a photo taken at a birthday party, an online purchase, a barcode, a web search, a video watched, an image tag: all of these contain little traces of data that, when combined en masse, can yield valuable insight. This unstructured big data now accounts for an estimated 80% of all data, and it largely goes unutilized because we lack the capabilities to process it. It has thus been given the name “dark data”, as we have not previously had the capacity to capture or use it.
Whereas the information revolution was built on the exponential growth in microprocessor performance commonly associated with Moore’s Law, this next generation is built on the explosion in unstructured data. The information revolution has already created a huge amount of data, but with the rise of the Internet of Things, as we embed chips in all kinds of devices and objects, we are starting to sense our world like never before, and the amount of data this produces is set to grow exponentially in the coming years.
Just as the focus has moved from data to big data, computing has moved from personal computing to cloud computing. Cloud computing is the networking of many computers and the provision of their capabilities over the internet as an on-demand service. This next generation of computer systems has to read and interpret vast amounts of unstructured data and use it to identify patterns, often in real time. This could mean reading and learning from a law book or thousands of medical journals, or finding a certain pattern, such as a face, in millions of images. It could also mean analyzing incoming communication, such as e-mail, voice or a social stream, in order to give a better customer experience, or giving a quick, accurate and personal reply to a customer who interacts with the company by understanding them and finding the right information for them.
These systems take information from many different sources, cross-correlate it and reason about the “meaning” in the data. This next generation goes beyond the programmable computing paradigm in that these new systems learn directly from the data instead of being programmed explicitly. Machine learning methods are largely based on data instead of logic: they use data to build logic. The basic problem-solving patterns that these smart systems use apply to virtually all areas. This type of computing opens up a new world of insight because humans are quite bad at dealing with large amounts of information; we are able to take in and cross-reference just a few data points in a rational fashion. A cloud-based application can potentially cross-analyze millions of different facts across many different domains, creating a new kind of insight that is currently beyond our capabilities. In many ways, it is an important capability for managing today’s large, complex industrial systems effectively.
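The contrast between programming a rule and learning it from data can be sketched in a few lines of Python. This is a minimal illustration, not a real machine learning system: the function names, the threshold of 50.0 and the labeled training values are all hypothetical, and the "learning" is simply a search for the cut-off that misclassifies the fewest examples.

```python
# Programmed approach: a person decides the rule and hard-codes it.
def programmed_rule(value, threshold=50.0):
    """Flag a value using a threshold chosen by the programmer (hypothetical)."""
    return value >= threshold


# Data-driven approach: the rule is derived from labeled examples.
def learn_threshold(samples):
    """Return the threshold that misclassifies the fewest samples.

    samples: list of (value, label) pairs, where label is True or False.
    Each observed value is tried as a candidate cut-off.
    """
    candidates = sorted({value for value, _ in samples})
    best_threshold, best_errors = None, len(samples) + 1
    for t in candidates:
        # Count samples whose predicted label (value >= t) is wrong.
        errors = sum((value >= t) != label for value, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical labeled data, invented purely for illustration.
training_data = [
    (12.0, False), (18.0, False), (25.0, False),
    (31.0, True), (40.0, True), (55.0, True),
]

learned = learn_threshold(training_data)


def learned_rule(value):
    """Apply the threshold that was built from the data."""
    return value >= learned
```

With this toy data the two rules disagree on a value such as 35.0: the hand-written rule rejects it, while the rule derived from the examples accepts it. Changing the training data changes the learned rule without anyone rewriting the logic, which is the point the paragraph above makes.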
As a concrete example, consider new developments in the insurance industry. Instead of depending on a few data points to determine the cost of coverage, car insurance companies are now trying to develop dynamic insurance premiums based on big data. They do this by using mobile phone sensors and an app that runs in the background to establish how good the driver actually is, looking not only at how well they drive but also at which roads they use, how often they drive, how dangerous those roads are, the weather conditions and so on. The person then pays for insurance based on when and how well they have been driving. Likewise, these companies now use machine learning algorithms to analyze voice data from the claims that come into their call centers.
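A usage-based premium of this kind might be sketched as follows. This is an assumption-laden toy, not how any insurer actually prices coverage: the `Trip` fields, the risk weights, the weather multiplier and the per-kilometre base rate are all hypothetical values chosen to show how several sensed factors could be combined into one price.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    """One recorded journey (all fields are hypothetical examples)."""
    distance_km: float
    harsh_brakes: int     # count of harsh braking events sensed by the phone
    road_risk: float      # 0.0 (safe road) to 1.0 (dangerous road), assumed scale
    bad_weather: bool


def trip_risk(trip):
    """Combine driving style, road and weather into one risk score."""
    # Normalize braking events to a rate so long trips are not penalized.
    brakes_per_100km = 100.0 * trip.harsh_brakes / max(trip.distance_km, 1.0)
    score = 0.05 * brakes_per_100km + trip.road_risk  # illustrative weights
    if trip.bad_weather:
        score *= 1.25  # illustrative weather multiplier
    return score


def monthly_premium(base_rate_per_km, trips):
    """Charge per kilometre driven, scaled up by each trip's risk score."""
    return sum(
        base_rate_per_km * t.distance_km * (1.0 + trip_risk(t))
        for t in trips
    )


# Two hypothetical drivers covering the same distance.
careful_trips = [Trip(100.0, 0, 0.0, False)]
risky_trips = [Trip(100.0, 10, 0.5, True)]

careful_cost = monthly_premium(0.02, careful_trips)
risky_cost = monthly_premium(0.02, risky_trips)
```

Under these assumed weights, the driver with frequent harsh braking on a riskier road in bad weather pays more than twice as much as the careful driver for the same distance. In practice the weights would presumably be learned from claims data, not set by hand.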