Active Deep Learning

BrainDeep Learning methods that use auto-associative neural networks to pre-train (with bottlenecking methods to ensure generalization) have recently been shown to perform as well and even better than human beings at certain tasks like image categorization. But what is missing from the proposed methods? There seem to be a range of challenges that revolve around temporal novelty and sequential activation/classification problems like those that occur in natural language understanding. The most recent achievements are more oriented around relatively static data presentations.

Jürgen Schmidhuber revisits the history of connectionist research (dating to the 1800s!) in his October 2014 technical report, Deep Learning in Neural Networks: An Overview. This is one comprehensive effort at documenting the history of this reinvigorated area of AI research. What is old is new again, enhanced by achievements in computing that allow for larger and larger scale simulation.

The conclusions section has an interesting suggestion: what is missing so far is the sensorimotor activity loop that allows for the active interrogation of the data source. Human vision roams over images while DL systems ingest the entire scene. And the real neural systems have energy constraints that lead to suppression of neural function away from the active neural clusters.

Read the rest

The Deep Computing Lessons of Apollo

Apollo 11With the arrival of the Apollo 11 mission’s 45th anniversary, and occasional planning and dreaming about a manned mission to Mars, the role of information technology comes again into focus. The next great mission will include a phalanx of computing resources, sensors, radars, hyper spectral cameras, laser rangefinders, and information fusion visualization and analysis tools to knit together everything needed for the astronauts to succeed. Some of these capabilities will be autonomous, predictive, and knowledgable.

But it all began with the Apollo Guidance Computer or AGC, the rather sophisticated for-its-time computer that ran the trigonometric and vector calculations for the original moonshot. The AGC was startlingly simple in many ways, made up exclusively of NOR gates to implement Arithmetic Logic Unit-like functionality, shifts, and register opcodes combined with core memory (tiny ferromagnetic loops) in both RAM and ROM forms (the latter hand-woven by graduate students).

Using NOR gates to create the entire logic of the central processing unit is guided by a few simple principles. A NOR gate combines both NOT and OR functionality together and has the following logical functionality:

[table id=1 /]

The NOT-OR logic can be read as “if INPUT1 or INPUT2 is set to 1, then the OUTPUT should be 1, but then take the logical inversion (NOT) of that”. And, amazingly, circuits built from NORs can create any Boolean logic. NOT A is just NOR(A,A), which you can see from the following table:

[table id=2 /]

AND and OR can similarly be constructed by layering NORs together. For Apollo, the use of just a single type of integrated circuit that packaged NORs into chips improved reliability.

This level of simplicity has another important theoretical result that bears on the transition from simple guidance systems to potentially intelligent technologies for future Mars missions: a single layer of Boolean functions can only compute simple things.… Read the rest

Inching Towards Shannon’s Oblivion

SkynetFollowing Bill Joy’s concerns over the future world of nanotechnology, biological engineering, and robotics in 2000’s Why the Future Doesn’t Need Us, it has become fashionable to worry over “existential threats” to humanity. Nuclear power and weapons used to be dreadful enough, and clearly remain in the top five, but these rapidly developing technologies, asteroids, and global climate change have joined Oppenheimer’s misquoted “destroyer of all things” in portending our doom. Here’s Max Tegmark, Stephen Hawking, and others in Huffington Post warning again about artificial intelligence:

One can imagine such technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand. Whereas the short-term impact of AI depends on who controls it, the long-term impact depends on whether it can be controlled at all.

I almost always begin my public talks on Big Data and intelligent systems with a presentation on industrial revolutions that progresses through Robert Gordon’s phases and then highlights Paul Krugman’s argument that Big Data and the intelligent systems improvements we are seeing potentially represent a next industrial revolution. I am usually less enthusiastic about the timeline than nonspecialists, but after giving a talk at PASS Business Analytics Friday in San Jose, I stuck around to listen in on a highly technical talk concerning statistical regularization and deep learning and I found myself enthused about the topic once again. Deep learning is using artificial neural networks to classify information, but is distinct from traditional ANNs in that the systems are pre-trained using auto-encoders to have a general knowledge about the data domain. To be clear, though, most of the problems that have been tackled are “subsymbolic” for image recognition and speech problems.… Read the rest

Computing the Madness of People

Bubble playing cardThe best paper I’ve read so far this year has to be Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-sample Performance by David Bailey, Jonathan Borwein, Marcos López de Prado, and Qiji Jim Zhu. The title should ring alarm bells with anyone who has ever puzzled over the disclaimers made by mutual funds or investment strategists that “past performance is not a guarantee of future performance.” No, but we have nothing but that past performance to judge the fund or firm on; we could just pick based on vague investment “philosophies” like the heroizing profiles in Kiplingers seem to promote or trust that all the arbitraging has squeezed the markets into perfect equilibria and therefore just use index funds.

The paper’s core tenets extend well beyond financial charlatanism, however. They point out that the same problem arises in drug discovery where main effects of novel compounds may be due to pure randomness in the sample population in a way that is masked by the sample selection procedure. The history of mental illness research has similar failures, with the head of NIMH remarking that clinical trials and the DSM for treating psychiatric symptoms is too often “shooting in the dark.”

The core suggestion of the paper is remarkably simple, however: use held-out data to validate models. Remarkably simple but apparently rarely done in quantitative financial analysis. The researchers show how simple random walks can look like a seasonal price pattern, and how by sending binary signals about market performance to clients (market will rise/market will fall) investment advisors can create a subpopulation that thinks they are geniuses as other clients walk away due to losses. These rise to the level of charlatanism but the problem of overfitting is just one of pseudo-mathematics where insufficient care is used in managing the data.… Read the rest

Saving Big Data from the Zeros

ZerosBecause of the hype cycle, Big Data inevitably attracts dissenters who want to deflate a bit the lofty expectations that are built around new technologies that appear mystifying to those on the outside of the Silicon Valley machine. The first response is generally “so what?” and that there is nothing new here, just rehashing efforts like grid computing and Beowulf and whatnot. This skepticism is generally a healthy inoculation against aggrandizement and any kind of hangover from unmet expectations. Hence, the NY Times op-ed from April 6th, Eight (No, Nine!) Problems with Big Data should be embraced for enumerating eight or nine different ways that Big Data technologies, algorithms and thinking might be stretching the balloon of hope towards a loud, but ineffectual, pop.

The eighth of the list bears some scrutiny, though. The authors, who I am not familiar with, focus on the overuse of trigrams in building statistical language models. And they note that language is very productive and that even a short sentence from Rob Lowe, “dumbed-down escapist fare,” doesn’t appear in the indexed corpus of Google. Shades of “colorless green ideas…” from Chomsky, but an important lesson in how to manage the composition of meaning. Dumbed-down escapist fare doesn’t translate well back-and-forth through German via the Google translate capability. For the authors, that shows the failure of the statistical translation methodology linked to Big Data, and ties in to their other concerns about predicting rare occurrences or even, in the case of Lowe’s quote, zero occurrences.

In reality, though, these methods of statistical translation through parallel text learning date to the late 1980s and reflect a distinct journey through ways of thinking about natural language and computing.… Read the rest

Parsimonious Portmanteaus

portmanteauMeaning is a problem. We think we might know what something means but we keep being surprised by the facts, research, and logical difficulties that surround the notion of meaning. Putnam’s Representation and Reality runs through a few different ways of thinking about meaning, though without reaching any definitive conclusions beyond what meaning can’t be.

Children are a useful touchstone concerning meaning because we know that they acquire linguistic skills and consequently at least an operational understanding of meaning. And how they do so is rather interesting: first, presume that whole objects are the first topics for naming; next, assume that syntactic differences lead to semantic differences (“the dog” refers to the class of dogs while “Fido” refers to the instance); finally, prefer that linguistic differences point to semantic differences. Paul Bloom slices and dices the research in his Précis of How Children Learn the Meanings of Words, calling into question many core assumptions about the learning of words and meaning.

These preferences become useful if we want to try to formulate an algorithm that assigns meaning to objects or groups of objects. Probabilistic Latent Semantic Analysis, for example, assumes that words are signals from underlying probabilistic topic models and then derives those models by estimating all of the probabilities from the available signals. The outcome lacks labels, however: the “meaning” is expressed purely in terms of co-occurrences of terms. Reconciling an approach like PLSA with the observations about children’s meaning acquisition presents some difficulties. The process seems too slow, for example, which was always a complaint about connectionist architectures of artificial neural networks as well. As Bloom points out, kids don’t make many errors concerning meaning and when they do, they rapidly compensate.… Read the rest

Algorithmic Aesthetics

Jared Tarbell’s work in algorithmic composition via processing.org continues to amaze me. See more, here. The relatively compact descriptions of complex landscapes lend themselves to treatment as aesthetic phenomena where the scale of the grammars versus the complexity of the results asks the question what is art and how does it relate to human neurosystems?

 

 … Read the rest

Novelty in the Age of Criticism

Gary Cutting from Notre Dame and the New York Times knows how to incite an intellectual riot, as demonstrated by his most recent The Stone piece, Mozart vs. the Beatles. “High art” is superior to “low art” because of its “stunning intellectual and emotional complexity.” He sums up:

My argument is that this distinctively aesthetic value is of great importance in our lives and that works of high art achieve it much more fully than do works of popular art.

But what makes up these notions of complexity and distinctive aesthetic value? One might try to enumerate those values or create a list. Or, alternatively, one might instead claim that time serves as a sieve for the values that Cutting is claiming make one work of art superior to another, thus leaving open the possibility for the enumerated list approach to be incomplete but still a useful retrospective system of valuation.

I previously argued in a 1994 paper (published in 1997), Complexity Formalisms, Order and Disorder in the Structure of Art, that simplicity and random chaos exist in a careful balance in art that reflects our underlying grammatical systems that are used to predict the environment. And Jürgen Schmidhuber took the approach further by applying algorithmic information theory to novelty seeking behavior that leads, in turn, to aesthetically pleasing models. The reflection of this behavioral optimization in our sideline preoccupations emerges as art, with the ultimate causation machine of evolution driving the proximate consequences for men and women.

But let’s get back to the flaw I see in Cutting’s argument that, in turn, fits better with Schmidhuber’s approach: much of what is important in art is cultural novelty. Picasso is not aesthetically superior to the detailed hyper-reality of Dutch Masters, for instance, but is notable for his cultural deconstruction of the role of art as photography and reproduction took hold.… Read the rest