Industrial Revolution #4

Paul Krugman at New York Times consumes Robert Gordon’s analysis of economic growth and the role of technology and comes up more hopeful than Gordon. The kernel in Krugman’s hope is that Big Data analytics can provide a shortcut to intelligent machines by bypassing the requirement for specification and programming that was once assumed to be a requirement for artificial intelligence. Instead, we don’t specify but use “data-intensive ways” to achieve a better result. And we might get to IR#4, following Gordon’s taxonomy where IR stands for “industrial revolution.” IR#1 was steam and locomotives  IR#2 was everything up to computers. IR#3 is computers and cell phones and whatnot.

Krugman implies that IR#4 might spur the typical economic consequences of grand technological change, including the massive displacement of workers, but like in previous revolutions it is also assumed that economic growth built from new industries will ultimately eclipse the negatives. This is not new, of course. Robert Anton Wilson argued decades ago for the R.I.C.H. economy (Rising Income through Cybernetic Homeostasis). Wilson may have been on acid, but Krugman wasn’t yet tuned in, man. (A brief aside: the Krugman/Wilson notions probably break down over extraction and agribusiness/land rights issues. If labor is completely replaced by intelligent machines, the land and the ingredients it contains nevertheless remain a bottleneck for economic growth. Look at the global demand for copper and rare earth materials, for instance.)

But why the particular focus on Big Data technologies? Krugman’s hope teeters on the assumption that data-intensive algorithms possess a fundamentally different scale and capacity than human-engineered approaches. Having risen through the computational linguistics and AI community working on data-driven methods for approaching intelligence, I can certainly sympathize with the motivation, but there are really only modest results to report at this time.… Read the rest

Pressing Snobs into Hell

Paul Vitanyi has been a deep advocate for Kolmogorov complexity for many years. His book with Ming Li, An Introduction to Kolmogorov Complexity and Its Applications, remains on my book shelf (and was a bit of an investment in grad school).

I came across a rather interesting paper by Vitanyi with Rudi Cilibrasi called “Clustering by Compression” that illustrates perhaps more easily and clearly than almost any other recent work the tight connections between meaning, repetition, and informational structure. Rather than describing the paper, however, I wanted to conduct an experiment that demonstrates their results. To do this, I asked the question: are the writings of Dante more similar to other writings of Dante than to Thackeray? And is the same true of Thackeray relative to Dante?

Now, we could pursue these questions at many different levels. We might ask scholars, well-versed in the works of each, to compare and contrast the two authors. They might invoke cultural factors, the memes of their respective eras, and their writing styles. Ultimately, though, the scholars would have to get down to some textual analysis, looking at the words on the page. And in so doing, they would draw distinctions by lifting features of the text, comparing and contrasting grammatical choices, word choices, and other basic elements of the prose and poetry on the page. We might very well be able to take parts of the knowledge of those experts and distill it into some kind of a logical procedure or algorithm that would parse the texts and draw distinctions based on the distributions of words and other structural cues. If asked, we might say that a similar method might work for the so-called language of life, DNA, but that it would require a different kind of background knowledge to build the analysis, much less create an algorithm to perform the same task.… Read the rest

Intelligence versus Motivation

Nick Bostrom adds to the dialog on desire, intelligence, and intentionality with his recent paper, The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. The argument is largely a deconstruction of the general assumption that there is somehow an inexorable linkage between intelligence and moral goodness. Indeed, he even proposes that intelligence and motivation are essentially orthogonal (“The Orthogonality Thesis”) but that there may be a particular subset of possible trajectories towards any goal that are common (self-preservation, etc.) The latter is scoped by his “instrumental convergence thesis” where there might be convergences towards central tenets that look an awful lot like the vagaries of human moral sentiments. But they remain vagaries and should not be taken to mean that advanced artificial agents will act in a predictable manner.… Read the rest

Universal Artificial Social Intelligence

Continuing to develop the idea that social reasoning adds to Hutter’s Universal Artificial Intelligence model, below is his basic layout for agents and environments:

A few definitions: The Agent (p) is a Turing machine that consists of a working tape and an algorithm that can move the tape left or right, read a symbol from the tape, write a symbol to the tape, and transition through a finite number of internal states as held in a table. That is all that is needed to be a Turing machine and Turing machines can compute like our every day notion of a computer. Formally, there are bounds to what they can compute (for instance, whether any given program consisting of the symbols on the tape will stop at some point or will run forever without stopping (this is the so-called “halting problem“). But it suffices to think of the Turing machine as a general-purpose logical machine in that all of its outputs are determined by a sequence of state changes that follow from the sequence of inputs and transformations expressed in the state table. There is no magic here.

Hutter then couples the agent to a representation of the environment, also expressed by a Turing machine (after all, the environment is likely deterministic), and has the output symbols of the agent consumed by the environment (y) which, in turn, outputs the results of the agent’s interaction with it as a series of rewards (r) and environment signals (x), that are consumed by agent once again.

Where this gets interesting is that the agent is trying to maximize the reward signal which implies that the combined predictive model must convert all the history accumulated at one point in time into an optimal predictor.… Read the rest

Multitudes and the Mathematics of the Individual

The notion that there is a path from reciprocal altruism to big brains and advanced cognitive capabilities leads us to ask whether we can create “effective” procedures that shed additional light on the suppositions that are involved, and their consequences. Any skepticism about some virulent kind of scientism then gets whisked away by the imposition of a procedure combined with an earnest interest in careful evaluation of the outcomes. That may not be enough, but it is at least a start.

I turn back to Marcus Hutter, Solomonoff, and Chaitin-Kolmogorov at this point.  I’ll be primarily referencing Hutter’s Universal Algorithmic Intelligence (A Top-Down Approach) in what follows. And what follows is an attempt to break down how three separate factors related to intelligence can be explained through mathematical modeling. The first and the second are covered in Hutter’s paper, but the third may represent a new contribution, though perhaps an obvious one without the detail work that is needed to provide good support.

First, then, we start with a core requirement of any goal-seeking mechanism: the ability to predict patterns in the environment external to the mechanism. This is well-covered since Solomonoff in the 60s who formalized the implicit arguments in Kolmogorov algorithmic information theory (AIT), and that were subsequently expanded on by Greg Chaitin. In essence, given a range of possible models represented by bit sequences of computational states, the shortest sequence that predicts the observed data is also the optimal predictor for any future data also produced by the underlying generator function. The shortest sequence is not computable, but we can keep searching for shorter programs and come up with unique optimizations for specific data landscapes. And that should sound familiar because it recapitulates Occam’s Razor and, in a subset of cases, Epicurus’ Principle of Multiple Explanations.… Read the rest

Bostrom on the Hardness of Evolving Intelligence

At 38,000 feet somewhere above Missouri, returning from a one day trip to Washington D.C., it is easy to take Nick Bostrom’s point that bird flight is not the end-all of what is possible for airborne objects and mechanical contrivances like airplanes in his paper, How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects. His efforts to try to bound and distinguish the evolution of intelligence as either Hard or Not-Hard runs up against significant barriers, however. As a practitioner of the art, finding similarities between a purely physical phenomena like flying and something as complex as human intelligence falls flat for me.

But Bostrom is not taking flying as more than a starting point for arguing that there is an engineer-able possibility for intelligence. And that possibility might be bounded by a number of current and foreseeable limitations, not least of which is that computer simulations of evolution require a certain amount of computing power and representational detail in order to be a sufficient simulation. His conclusion is that we may need as much as another 100 years of improvements in computing technology just to get to a point where we might succeed at a massive-scale evolutionary simulation (I’ll leave to the reader to investigate his additional arguments concerning convergent evolution and observer selection effects).

Bostrom dismisses as pessimistic the assumption that a sufficient simulation would, in fact, require a highly detailed emulation of some significant portion of the real environment and the history of organism-environment interactions:

A skeptic might insist that an abstract environment would be inadequate for the evolution of general intelligence, believing instead that the virtual environment would need to closely resemble the actual biological environment in which our ancestors evolved … However, such extreme pessimism seems unlikely to be well founded; it seems unlikely that the best environment for evolving intelligence is one that mimics nature as closely as possible.

Read the rest

Randomness and Meaning

The impossibility of the Chinese Room has implications across the board for understanding what meaning means. Mark Walker’s paper “On the Intertranslatability of all Natural Languages” describes how the translation of words and phrases may be achieved:

  1. Through a simple correspondence scheme (word for word)
  2. Through “syntactic” expansion of the languages to accommodate concepts that have no obvious equivalence (“optometrist” => “doctor for eye problems”, etc.)
  3. Through incorporation of foreign words and phrases as “loan words”
  4. Through “semantic” expansion where the foreign word is defined through its coherence within a larger knowledge network.

An example for (4) is the word “lepton” where many languages do not have a corresponding concept and, in fact, the concept is dependent on a bulwark of advanced concepts from particle physics. There may be no way to create a superposition of the meanings of other words using (2) to adequately handle “lepton.”

These problems present again for trying to understand how children acquire meaning in learning a language. As Walker points out, language learning for a second language must involve the same kinds of steps as learning translations, so any simple correspondence theory has to be supplemented.

So how do we make adequate judgments about meanings and so rapidly learn words, often initially with a course granularity but later with increasingly sharp levels of focus? What procedure is required for expanding correspondence theories to operate in larger networks? Methods like Latent Semantic Analysis and Random Indexing show how this can be achieved in ways that are illuminating about human cognition. In each case, the methods provide insights into how relatively simple transformations of terms and their occurrence contexts can be viewed as providing a form of “triangulation” about the meaning of words.… Read the rest

On the Soul-Eyes of Polar Bears

I sometimes reference a computational linguistics factoid that appears to be now lost in the mists of early DoD Tipster program research: Chinese linguists only agree on the segmentation of texts into words about 80% of the time. We can find some qualitative agreement on the problematic nature of the task, but the 80% is widely smeared out among the references that I can now find. It should be no real surprise, though, because even English with white-space tokenization resists easy characterization of words versus phrases: “New York” and “New York City” are almost words in themselves, though just given white-space tokenization are also phrases. Phrases lift out with common and distinct usage, however, and become more than the sum of their parts; it would be ridiculously noisy to match a search for “York” against “New York” because no one in the modern world attaches semantic significance to the “York” part of the phrase. It exists as a whole and the nature of the parts has dissolved against this wholism.

John Searle’s Chinese Room argument came up again today. My son was waxing, as he does, in a discussion about mathematics and order, and suggested a poverty of our considerations of the world as being purely and completely natural. He meant in the sense of “materialism” and “naturalism” meaning that there are no mystical or magical elements to the world in a metaphysical sense. I argued that there may nonetheless be something that is different and indescribable by simple naturalistic calculi: there may be qualia. It led, in turn, to a qualification of what is unique about the human experience and hence on to Searle’s Chinese Room.

And what happens in the Chinese Room?… Read the rest

Teleology, Chapter 5

Harry spent most of that summer involved in the Santa Fe Sangre de Cristo Church, first with the church summer camp, then with the youth group. He seemed happy and spent the evenings text messaging with his new friends. I was jealous in a way, but refused to let it show too much. Thursdays he was picked up by the church van and went to watch movies in a recreation center somewhere. I looked out one afternoon as the van arrived and could see Sarah’s bright hair shining through the high back window of the van.

Mom explained that they seemed to be evangelical, meaning that they liked to bring as many new worshippers into the religion as possible through outreach and activities. Harry didn’t talk much about his experiences. He was too much in the thick of things to be concerned with my opinions, I think, and snide comments were brushed aside with a beaming smile and a wave. “You just don’t understand,” Harry would dismissively tell me.

I was reading so much that Mom would often demand that I get out of the house on weekend evenings after she had encountered me splayed on the couch straight through lunch and into the shifting evening sunlight passing through the high windows of our thick-walled adobe. I would walk then, often for hours, snaking up the arroyos towards the mountains, then wend my way back down, traipsing through the thick sand until it was past dinner time.

It was during this time period that I read cyberpunk authors and became intrigued with the idea that someday, one day, perhaps computing machines would “wake up” and start to think on their own.… Read the rest

On the Non-Simulation of Human Intelligence

There is a curious dilemma that pervades much machine learning research. The solutions that we are trying to devise are supposed to minimize behavioral error by formulating the best possible model (or collection of competing models). This is also the assumption of evolutionary optimization, whether natural or artificial: optimality is the key to efficiently outcompeting alternative structures, alternative alleles, and alternative conceptual models. The dilemma is whether such optimality is applicable to the notoriously error prone, conceptual flexible, and inefficient reasoning of people. In other words, is machine learning at all like human learning?

I came across a paper called “Multi-Armed Bandit Bayesian Decision Making” while trying to understand what Ted Dunning is planning to talk about at the Big Data Science Meetup at SGI in Fremont, CA a week from Saturday (I’ll be talking, as well) that has a remarkable admission concerning this point:

Human behaviour is after all heavily influenced by emotions, values, culture and genetics; as agents operating in a decentralised system humans are notoriously bad at coordination. It is this fact that motivates us to develop systems that do coordinate well and that operate outside the realms of emotional biasing. We use Bayesian Probability Theory to build these systems specifically because we regard it as common sense expressed mathematically, or rather `the right thing to do’.

The authors continue on to suggest that therefore such systems should instead be seen as corrective assistants for the limitations of human cognitive processes! Machines can put the rational back into reasoned decision-making. But that is really not what machine learning is used for today. Instead, machine learning is used where human decision-making processes are unavailable due to the physical limitations of including humans “in the loop,” or the scale of the data involved, or the tediousness of the tasks at hand.… Read the rest