The Power of Information


The process by which intelligent systems develop assigns an inherent value to data.

Aug 9, 2015

Consider, for a moment, the plotline of the recently released blockbuster hit Ex Machina. In this movie, Nathan Bateman is the CEO of Bluebook, the world’s most popular search engine (a thinly veiled stand-in for Google). Nathan reveals that he has harvested the personal information of billions of Bluebook users, using their search queries as indicators of human thought in order to develop the first machine ever to display strong artificial intelligence. Now, this movie is science fiction: the human race is far from building anything that resembles the consciousness that his machine, “Ava,” appears to possess. But the prospect of intelligent systems developing from exposure to data is no fable. In fact, this phenomenon is becoming more and more pervasive in our world each day.

We see the term everywhere: Big Data. What the heck does it mean, and why should we care? In grade school, we were taught the basics of the scientific method: a researcher proposes a hypothesis, collects data through some experiment, and uses that data as empirical evidence to either support or refute the hypothesis. Maybe, with an arbitrarily large quantity of empirical evidence, we can call this data “big.” If this is what comes to mind at the sound of Big Data, then the concept urgently needs to be clarified.

Machine learning algorithms attempt to endow machines with the ability to acquire operational knowledge from examples, relying on the variety of those examples to build that knowledge. One ambitious aim in this field is to discover a learning procedure that is efficient at finding complex structure in large, high-dimensional datasets, and to show that this is how the brain learns to perform its various functions. Google image search, for example, now makes use of a machine learning algorithm inspired by the design of the primate visual cortex to convert the images in its search space into text captions. In the past, this engine would simply search the web for text related to our queries, presenting to us the images that happened to be on the pages where text matches were found. What Google does now, however, is convert candidate images into text captions by an intelligent inference process, and then match our queries against these captions. Essentially, this is like having a human go through every image on the web and label it with a text caption for our searching purposes. Google's systems can do this automatically, however, thanks to the incredibly vast library of image and search query data at the company’s disposal. Using the knowledge gathered from Google’s data assets, these programs learn to translate images into high-level internal representations that summarize their content, and then to translate these representations into human language.
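To make the caption-matching idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the captions are hypothetical, hand-written stand-ins for what a learned captioning model might produce, the `search` and `tokenize` helpers are invented for this example, and the simple word-overlap ranking is not a description of how Google's systems actually score results.

```python
# Toy sketch: match a text query against image captions instead of page text.
# The captions are hypothetical stand-ins for the output of a learned
# captioning model; no real model or search API is used here.

def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())

def search(query, captions):
    """Rank image names by how many query words appear in their caption."""
    query_words = tokenize(query)
    scored = []
    for image, caption in captions.items():
        overlap = len(query_words & tokenize(caption))
        if overlap > 0:
            scored.append((overlap, image))
    # Highest overlap first; ties are broken alphabetically by image name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image for _, image in scored]

# Hypothetical captions, as if generated automatically for three images.
captions = {
    "img_001.jpg": "a brown dog catching a frisbee in a park",
    "img_002.jpg": "two children playing soccer on a beach",
    "img_003.jpg": "a dog sleeping on a couch",
}

print(search("dog in park", captions))
```

The point of the sketch is that once every image carries a caption, image search reduces to text search; the hard part, which the sketch skips entirely, is learning to produce good captions from data.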

To some extent, these machine learning systems follow a process very similar to the scientific method: in each case, an analysis is performed upon a set of data, and from that analysis an inference model is developed. In the case of the science experiment, that inference model is relatively easy to understand. In machine learning, however, the models are not quite so simple. Any good A.I. scientist in today’s world would admit that they cannot fully interpret what their systems have learned; they never needed to, thanks to data and mathematics. Contrary to popular belief, very little of what these models know is engineered by hand. These models crave data, however, and their arcane functionalities have been shown to improve steadily as the quantity of information they are fed grows.
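A toy experiment can illustrate how a model learned from data gets better as it sees more examples. The sketch below is an assumption-laden illustration, not anyone's production method: it fits a single slope by least squares to noisy synthetic data, and the names `fit_slope` and `mean_error`, the true slope of 2.0, and the noise level are all invented for this example.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_error(n_examples, trials=200, true_slope=2.0, noise=1.0, seed=0):
    """Average absolute error of the learned slope over many random datasets."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        # Synthetic examples: y = true_slope * x plus Gaussian noise.
        xs = [rng.uniform(1.0, 2.0) for _ in range(n_examples)]
        ys = [true_slope * x + rng.gauss(0.0, noise) for x in xs]
        total += abs(fit_slope(xs, ys) - true_slope)
    return total / trials

# Compare how far the learned slope is from the truth at each dataset size.
for n in (10, 100, 1000):
    print(n, round(mean_error(n), 4))
```

Nobody tells the program that the slope is 2.0; it recovers the value from examples alone, and the estimate tightens as the examples accumulate. In this simple setting the error falls roughly with the square root of the number of examples, so each tenfold increase in data buys a smaller absolute gain than the last.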

If there is just one thing that you try to understand about Artificial Intelligence, let it be the following: this field is not about engineering smart systems. Artificial Intelligence is about understanding the science of learning, in its purest and most refined form. As humans, our individual differences are due as much to nurture as they are to nature; the sensory and cognitive experiences to which we are exposed come to define our intellectual characteristics, and even our potential. The process by which intelligent systems develop assigns an inherent value to data, and this is the power of information.