Singularity and its Discontents

If a machine-based process can outperform a human being, is it significant? That weighty question hung in the background as I reviewed Jürgen Schmidhuber’s work on traffic sign classification. Similar results have emerged from IBM’s Watson competition and even on the TOEFL test. In each case, machines beat people.

But is that fact significant? There are a couple of ways to look at these kinds of comparisons. First, we can draw analogies to other human capabilities that were once beyond mechanical aid and note that machines outperforming us in those arenas proved not to be terribly profound. The wheel quickly outperformed human legs for moving heavy objects. The cup outperformed cupped hands for drinking water. This invites the realization that extending these physical comparisons leads to extraordinary juxtapositions: the airliner vastly outperformed human legs for transport, and so on. And this, in turn, justifies the claim that since we are now outperforming human mental processes, we can only expect exponential improvements moving forward.

But this may be a category mistake in more than the obvious differentiator of the mental and the physical. Instead, the category mismatch is between levels of complexity. A Boeing 747 has around 6 million parts versus one moving human as the baseline (we could enumerate the cells and organelles, etc., but then we would need to enumerate the crystal lattices of the aircraft’s steel, so that level of granularity is a wash). A big server computer has 64 × 10^9 memory addresses or more, with disk storage in the terabytes (10^12 bytes). Meanwhile, the human brain has 100 × 10^9 neurons and 10^14 connections. So, with just 2 orders of magnitude between computers and brains versus 6 between humans and planes, we find ourselves approaching Kurzweil’s argument that we have to wait until 2040.… Read the rest
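To put numbers on the gap, here is a back-of-the-envelope check in Python (the figures are just the rough estimates quoted above, nothing more):

    import math

    parts_747 = 6e6      # parts in a Boeing 747, vs. one human as the baseline
    disk_bytes = 1e12    # terabyte-scale server storage
    synapses = 1e14      # connections in the human brain

    # Humans vs. planes: log10(6e6) is roughly 6.8 orders of magnitude.
    print(round(math.log10(parts_747 / 1), 1))
    # Computers vs. brains: log10(1e14 / 1e12) is exactly 2 orders.
    print(round(math.log10(synapses / disk_bytes), 1))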

Curiouser and Curiouser

Jürgen Schmidhuber’s work on algorithmic information theory and curiosity is worth a few takes, if not more, for the researcher has done something that is both flawed and rather brilliant at the same time. The flaws emerge when we look deeply into the motivations for ideas like beauty (are symmetry and low-complexity encoding enough to explain sexual attraction? Well-understood evolutionary psychology is probably a better bet), but the core of his argument is worth considering.

If induction is an essential component of learning (and we might suppose it is, for argument’s sake), then why continue to examine different parameterizations of possible models for induction? Why be creative about how to explain things, as we expect of scientists and even idolize in them?

So let us assume that induction is explained by the compression of patterns into better and better models using an information-theoretic approach. Given this, Schmidhuber makes the startling leap that better compression and better models are best achieved by information-harvesting behavior that involves seeking novelty in the environment. Thus curiosity. Thus the implementation of action in support of ideas.
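A minimal sketch of that leap in Python, with a simple adaptive frequency model standing in for the learned compressor (the class and the order-0 model are my own illustrative choices, not Schmidhuber’s actual machinery):

    import math
    from collections import Counter

    class AdaptivePredictor:
        """Order-0 frequency model: a toy stand-in for the agent's compressor."""
        def __init__(self):
            self.counts = Counter()
            self.total = 0

        def code_length(self, data):
            """Bits this model needs to encode data (ideal arithmetic coding)."""
            bits = 0.0
            for ch in data:
                p = (self.counts[ch] + 1) / (self.total + 256)  # Laplace smoothing
                bits += -math.log2(p)
            return bits

        def update(self, data):
            for ch in data:
                self.counts[ch] += 1
                self.total += 1

    def curiosity_reward(model, observation):
        """Compression progress: bits saved on the observation after learning
        from it. For a trained model, pure noise and already-mastered patterns
        both score near zero; learnable novelty scores highest. That gradient
        is the 'curiosity' signal driving information-harvesting behavior."""
        before = model.code_length(observation)
        model.update(observation)
        after = model.code_length(observation)
        return before - after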

I proposed a similar model to explain aesthetic preferences for mid-ordered complex systems of notes, brush-strokes, etc., around 1994, but Schmidhuber’s approach has the benefit of not just characterizing the limitations and properties of aesthetic systems, but also justifying them. We find things interesting because we are programmed to find novelty, and we are programmed to find novelty because we want to optimize our predictive apparatus. The best optimization is actively seeking along the contours of the perceivable (and quantifiable) universe, isolating unknown patterns to improve our current model.… Read the rest

Industrial Revolution #4

Paul Krugman at The New York Times consumes Robert Gordon’s analysis of economic growth and the role of technology and comes up more hopeful than Gordon. The kernel of Krugman’s hope is that Big Data analytics can provide a shortcut to intelligent machines by bypassing the requirement for specification and programming that was once assumed to be a prerequisite for artificial intelligence. Instead, we don’t specify but use “data-intensive ways” to achieve a better result. And we might get to IR#4, following Gordon’s taxonomy where IR stands for “industrial revolution”: IR#1 was steam and locomotives; IR#2 was everything up to computers; IR#3 is computers and cell phones and whatnot.

Krugman implies that IR#4 might spur the typical economic consequences of grand technological change, including the massive displacement of workers, but, as in previous revolutions, it is also assumed that economic growth built from new industries will ultimately eclipse the negatives. This is not new, of course. Robert Anton Wilson argued decades ago for the R.I.C.H. economy (Rising Income through Cybernetic Homeostasis). Wilson may have been on acid, but Krugman wasn’t yet tuned in, man. (A brief aside: the Krugman/Wilson notions probably break down over extraction and agribusiness/land-rights issues. If labor is completely replaced by intelligent machines, the land and the ingredients it contains nevertheless remain a bottleneck for economic growth. Look at the global demand for copper and rare earth materials, for instance.)

But why the particular focus on Big Data technologies? Krugman’s hope teeters on the assumption that data-intensive algorithms possess a fundamentally different scale and capacity than human-engineered approaches. Having risen through the computational linguistics and AI community working on data-driven methods for approaching intelligence, I can certainly sympathize with the motivation, but there are really only modest results to report at this time.… Read the rest

Sparse Grokking

Jeff Hawkins of Palm fame shows up in The New York Times hawking his Grok for Big Data predictions. Interestingly, if we drill down into the details of Grok, we see once again that randomized sparse representations are at the core of the system. That is, if we assign symbols random representational vectors that are sparse, we can sum the vectors for co-occurring symbols and, following J.R. Firth’s pithy “you shall know a word by the company it keeps,” start to develop a theory of meaning that would not offend Wittgenstein.
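A sketch of the mechanism in Python (this is generic Random Indexing rather than Numenta’s actual implementation; the dimensionality and sparsity constants are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, NONZERO = 2048, 8   # vector width and count of nonzero entries

    def index_vector():
        """A random sparse ternary vector: a few +/-1 entries, zeros elsewhere."""
        v = np.zeros(DIM)
        slots = rng.choice(DIM, size=NONZERO, replace=False)
        v[slots] = rng.choice([-1.0, 1.0], size=NONZERO)
        return v

    def meaning_vectors(sentences, window=2):
        """Each word's meaning vector is the sum of the random index vectors
        of the words it co-occurs with -- Firth's company-it-keeps, vectorized."""
        index, meaning = {}, {}
        for sent in sentences:
            for w in sent:
                index.setdefault(w, index_vector())
                meaning.setdefault(w, np.zeros(DIM))
            for i, w in enumerate(sent):
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        meaning[w] += index[sent[j]]
        return meaning

    def cosine(a, b):
        """Similarity of two meaning vectors."""
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

Words that keep similar company accumulate similar sums, so their cosine similarity rises without any gradient training at all.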

Is there anything new to Hawkins’ effort? For certain types of time-series prediction, the approach parallels artificial neural network designs. Multi-epoch training regimens, in effect, build the high-dimensional distances between co-occurring events by gradually moving time-correlated data together and uncorrelated data apart; the sparse-representation approach is an end-run around all of that computational complexity. But then there is Random Indexing, which I’ve previously discussed here. If one restricts Random Indexing to operating on temporal patterns, or on spatial patterns, then the results start to look like Numenta’s offering.

While there is a bit of opportunism in Hawkins’ latching onto Big Data to promote an application of methods he has been working on for years, there are very real opportunities for mining leading indicators to help with everything from e-commerce to research and development. Many flowers will bloom, grok, die, and be reborn.… Read the rest

Bats and Belfries

Thomas Nagel proposes a radical form of skepticism in his new book, Mind and Cosmos, continuing a trajectory through subjective experience and moral realism first begun with bats zigging and zagging among the homunculi of dualism, reimagined in the form of qualia. The skepticism involves disputing materialistic explanations and proposing, instead, that teleological ones of an unspecified form will likely apply, for how else could his subtitle, which paints the “Neo-Darwinian Conception of Nature” as likely false, hold true?

Nagel is searching for a non-religious explanation, of course, because just animating nature through fiat is hardly an explanation at all; any sort of powerful, non-human entelechy could be gaming us and the universe in a non-coherent fashion. But what parameters might support his argument? Since he apparently requires a “significant likelihood” argument to hold sway in support of the origins of life, for instance, we might imagine what kind of process could begin with inanimate matter and lead to goal-directed behavior while supporting a significant likelihood of that outcome. The parameters might involve the conscious coordination of the events leading towards the emergence of goal-directed life, thus presupposing a consciousness that is not our own. We are back, then, to our non-human entelechy looming like an alien or like a strange creator deity (which is not desirable to Nagel). We might also consider the possibility that there are properties of the universe itself that result in self-organization and that we either don’t yet know or are only beginning to understand. Elliott Sober’s critique suggests that the 2nd Law of Thermodynamics results in what I might call “patterned” behavior while not becoming “goal-directed” per se.… Read the rest

Universal Artificial Social Intelligence

Continuing to develop the idea that social reasoning adds to Hutter’s Universal Artificial Intelligence model, below is his basic layout for agents and environments:

A few definitions: The Agent (p) is a Turing machine that consists of a working tape and an algorithm that can move the tape left or right, read a symbol from the tape, write a symbol to the tape, and transition through a finite number of internal states as held in a table. That is all that is needed to be a Turing machine, and Turing machines can compute like our everyday notion of a computer. Formally, there are bounds to what they can compute (for instance, whether any given program consisting of the symbols on the tape will stop at some point or will run forever without stopping; this is the so-called “halting problem”). But it suffices to think of the Turing machine as a general-purpose logical machine in that all of its outputs are determined by a sequence of state changes that follow from the sequence of inputs and transformations expressed in the state table. There is no magic here.
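To make that concrete, here is a toy Turing machine simulator in Python (my own sketch; the sparse-dictionary tape and the halt-on-missing-transition convention are implementation conveniences, not part of the formal definition):

    def run_turing_machine(table, tape, state="start", blank="_", max_steps=10000):
        """table maps (state, symbol) -> (next_state, write_symbol, move),
        with move in {-1, +1}. The machine halts when no transition exists."""
        cells = dict(enumerate(tape))   # sparse tape, blanks by default
        head = 0
        for _ in range(max_steps):
            symbol = cells.get(head, blank)
            if (state, symbol) not in table:
                break                   # no rule for this state/symbol: halt
            state, cells[head], move = table[(state, symbol)]
            head += move
        return state, "".join(cells[i] for i in sorted(cells))

    # Example: a two-rule machine that flips every bit, then halts on a blank.
    flipper = {
        ("start", "0"): ("start", "1", +1),
        ("start", "1"): ("start", "0", +1),
    }
    print(run_turing_machine(flipper, "1011"))   # -> ('start', '0100')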

Hutter then couples the agent to a representation of the environment, also expressed by a Turing machine (after all, the environment is likely deterministic), and has the output symbols of the agent (y) consumed by the environment which, in turn, outputs the results of the agent’s interaction with it as a series of rewards (r) and environment signals (x) that are consumed by the agent once again.
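In code, the perception-action loop is only a few lines (a loose sketch following the post’s y/x/r notation; the callable signatures and the toy parity environment are my inventions for illustration):

    def interact(agent, environment, horizon=10):
        """Hutter's loop, loosely: the agent emits action y, the environment
        answers with signal x and reward r, and the history feeds both sides."""
        history, total = [], 0.0
        for _ in range(horizon):
            y = agent(history)                 # agent's output symbol
            x, r = environment(history, y)     # environment consumes y...
            history.append((y, x, r))          # ...and its outputs loop back
            total += r
        return total

    # Trivial instantiations: an environment that pays off when the action
    # matches the parity of the step, and an agent that tracks that parity.
    env = lambda h, y: (len(h) % 2, 1.0 if y == len(h) % 2 else 0.0)
    agent = lambda h: len(h) % 2
    print(interact(agent, env))                # -> 10.0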

Where this gets interesting is that the agent is trying to maximize the reward signal, which implies that the combined predictive model must convert all the history accumulated at one point in time into an optimal predictor.… Read the rest
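For reference, Hutter’s AIXI model makes that maximization precise. Schematically, in the post’s notation (y for actions, x for environment signals, r for rewards) and rendered in LaTeX, the action chosen at time k with horizon m is:

    y_k := \arg\max_{y_k} \sum_{x_k r_k} \cdots \max_{y_m} \sum_{x_m r_m}
           \bigl(r_k + \cdots + r_m\bigr)
           \sum_{q \,:\, U(q, y_{1:m}) = x_1 r_1 \ldots x_m r_m} 2^{-\ell(q)}

where the innermost sum weights every environment program q consistent with the accumulated history by its algorithmic probability 2^{-ℓ(q)}: exactly the “optimal predictor built from all the history” described above.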

Multitudes and the Mathematics of the Individual

The notion that there is a path from reciprocal altruism to big brains and advanced cognitive capabilities leads us to ask whether we can create “effective” procedures that shed additional light on the suppositions that are involved, and their consequences. Any skepticism about some virulent kind of scientism then gets whisked away by the imposition of a procedure combined with an earnest interest in careful evaluation of the outcomes. That may not be enough, but it is at least a start.

I turn back to Marcus Hutter, Solomonoff, and Chaitin-Kolmogorov at this point. I’ll be primarily referencing Hutter’s Universal Algorithmic Intelligence (A Top-Down Approach) in what follows. And what follows is an attempt to break down how three separate factors related to intelligence can be explained through mathematical modeling. The first and the second are covered in Hutter’s paper, but the third may represent a new contribution, though perhaps an obvious one absent the detailed work needed to support it properly.

First, then, we start with a core requirement of any goal-seeking mechanism: the ability to predict patterns in the environment external to the mechanism. This has been well covered since the 1960s, when Solomonoff formalized the arguments implicit in Kolmogorov’s algorithmic information theory (AIT) that were subsequently expanded on by Greg Chaitin. In essence, given a range of possible models represented by bit sequences of computational states, the shortest sequence that predicts the observed data is also the optimal predictor for any future data produced by the same underlying generator function. The shortest sequence is not computable, but we can keep searching for shorter programs and come up with unique optimizations for specific data landscapes. And that should sound familiar because it recapitulates Occam’s Razor and, in a subset of cases, Epicurus’ Principle of Multiple Explanations.… Read the rest
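Solomonoff’s universal prior makes the shortest-sequence intuition exact. In LaTeX, with U a universal Turing machine and ℓ(p) the length in bits of program p:

    M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}, \qquad
    M(x_{t+1} \mid x_{1:t}) = \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}

Short programs dominate the sum, which is Occam’s Razor; every program consistent with the data still contributes, which is Epicurus’ principle of multiple explanations.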

Reciprocity and Abstraction

Fukuyama’s suggestion is intriguing but needs further development and empirical support before it can be considered more than a hypothesis. To be mildly repetitive, ideology derived from scientific theories should be subject to even more scrutiny than religious-political ideologies if for no other reason than it can be. But in order to drill down into the questions surrounding how reciprocal altruism might enable the evolution of linguistic and mental abstractions, we need to simplify the problems down to basics, then work outward.

So let’s start with reciprocal altruism as a mere mathematical game. The iterated prisoner’s dilemma is a case study: you and a compatriot are accused of a heinous crime and put in separate rooms. If you deny involvement and so does your friend, you will each get 3 years in prison. If you admit to the crime and so does your friend, you will both get 1 year (cooperative behavior). But if one of you denies involvement while fingering the other, the fingerer walks free while the other gets 6 years (the defection strategy). Joint fingering is equivalent to two denials at 3 years each, since the evidence is equivocal. What does one do as a “rational actor” in order to minimize penalization? The only solution is to betray your friend while denying involvement (deny, deny, deny): you get 3 years (he denies and fingers you too), you walk free (he merely denies), or the equivocal third case again leaves you with 3 years. The average years served are 1/3*3 + 1/3*0 + 1/3*3 = 2 years versus 1/2*1 + 1/2*6 = 3.5 years for admitting to the crime.
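A few lines of Python confirm the arithmetic (the payoffs are this post’s idiosyncratic ones, not the canonical prisoner’s-dilemma matrix, and the equiprobable weighting follows the text):

    def expected_years(outcomes):
        """Average sentence over equally likely opponent responses."""
        return sum(outcomes) / len(outcomes)

    # Deny-and-finger: dual fingering (3 years), opponent merely denies
    # (you walk, 0), or the equivocal third case also scored at 3 years.
    print(expected_years([3, 0, 3]))   # -> 2.0

    # Admit: mutual admission (1 year) or getting fingered (6 years).
    print(expected_years([1, 6]))      # -> 3.5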

In other words, it doesn’t pay to cooperate.… Read the rest

Science, Pre-science, and Religion

Francis Fukuyama in The Origins of Political Order: From Prehuman Times to the French Revolution draws a bright line from reciprocal altruism to abstract reasoning, and then through to religious belief:

Game theory…suggests that individuals who interact with one another repeatedly tend to gravitate toward cooperation with those who have shown themselves to be honest and reliable, and shun those who have behaved opportunistically. But to do this effectively, they have to be able to remember each other’s past behavior and to anticipate likely future behavior based on an interpretation of other people’s motives.

Then, language allows transmission of historical patterns (largely gossip in tight-knit social groups) and abstractions about ethical behaviors until, ultimately:

The ability to create mental models and to attribute causality to invisible abstractions is in turn the basis for the emergence of religion.

But this can’t be the end of the line. Insofar as abstract beliefs can attribute repetitive weather patterns to Olympian gods, or consolidate moral reasoning to a monotheistic being, the same mechanisms of abstraction must be the basis for scientific reasoning as well. Either that or the cognitive capacities for linguistic abstraction and game theory are not cross-applicable to scientific thinking, which seems unlikely.

So the irony of assertions that science is just another religion is that they certainly share a similar initial cognitive evolution, while nevertheless diverging in their dependence on faith and supernatural expectations, on the one hand, and channeling the predictive models along empirical contours on the other.… Read the rest

On the Structure of Brian Eno

I recently came across an ancient document, older than my son, dating to 1994, when I had a brief FAX-based exchange of communiques with Brian Eno, the English eclectic electronic musician and producer of everything from Bowie’s Low through to U2’s The Joshua Tree and Jane Siberry. The editor of the Whole Earth Catalog, who had seen my colleague Eric (of the FAXes, below) present at an Artificial Life conference, had pointed Eno at Eric’s efforts at using models of RNA replication to create music. I was doing other, somewhat related work, and Eric allowed me to correspond with Mr. Eno. I did, resulting in a brief round of FAXes (email was fairly new to the non-specialist in 1994).

I later dropped off a copy of a research paper I had written at his London office and he was summoned down from an office/loft and shook his head in the negative about me. I was shown the door by the receptionist.

Below is my last part of the FAX interchange. Due to copyright and privacy concerns, I’ll only show my part of the exchange (and, yes, I misspelled “Britain”). Notably, Brian still talks about the structure of music and art in recent interviews.

Read the rest