Apprendre à traduire

Google Translate has always been a useful tool for getting awkward gists of short texts. The method it used was based on building a phrase-based statistical translation model. To do this, you gather up “parallel” texts: existing human translations. You then “align” them by trying to find the most likely corresponding phrases in each sentence or set of sentences. Often, between languages, fewer or more sentences will be used to express the same ideas. Once you have that collection of phrasal translation candidates, you can guess the most likely translation of a new sentence by looking up the sequence of likely phrase groups that correspond to that sentence. IBM was the progenitor of this approach in the late 1980s.
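The lookup step described above can be sketched in a few lines. This is only a toy illustration under loud assumptions: the phrase table and its probabilities are invented, the search is monotone (no reordering), and there is no language model, all of which a real phrase-based system would need. The names `PHRASE_TABLE` and `translate` are hypothetical.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
# Entries and probabilities are invented for illustration only.
PHRASE_TABLE = {
    ("the",): [("le", 0.6), ("la", 0.4)],
    ("cat",): [("chat", 0.9)],
    ("the", "cat"): [("le chat", 0.8)],
    ("sleeps",): [("dort", 0.9)],
}


def translate(words, max_phrase_len=3):
    """Return (log_prob, translation) for the best monotone segmentation.

    Returns None when no sequence of known phrases covers the sentence.
    """
    n = len(words)
    # best[i] holds the best (log_prob, target_phrases) covering words[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            if best[start] is None:
                continue
            source = tuple(words[start:end])
            for target, prob in PHRASE_TABLE.get(source, []):
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [target])
    if best[n] is None:
        return None
    return best[n][0], " ".join(best[n][1])


result = translate(["the", "cat", "sleeps"])
print(result[1])  # le chat dort
```

With this table, the two-word phrase "the cat" (probability 0.8) beats translating "the" and "cat" separately (0.6 × 0.9 = 0.54), which is the basic reason longer aligned phrases help.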

It’s simple and elegant, but it was always criticized for telling us very little about language. Other methods that use techniques like interlingual transfer and parsers showed a more linguist-friendly face. In these methods, the source language is parsed into a parse tree and then that parse tree is converted into a generic representation of the meaning of the sentence. Next, a generator uses that representation to create a surface-form rendering in the target language. The interlingua must be like the deep meaning of linguistic theories, though the computer science versions of it tended to look a lot like ontological representations with fixed meanings. Flexibility was never the strong suit of these approaches, but their flaws were much deeper than just that.

For one, nobody was able to build a robust parser for any particular language. Next, the ontology was never vast enough to accommodate the rich productivity of real human language. Generators, being the inverse of parsers, remained only toy projects in the computational linguistics community.…

Boredom and Being a Decider

Seth Lloyd and I have rarely converged (read: absolutely never) on a realization, but his remarkable 2013 paper on free will and halting problems does, in fact, converge on a paper I wrote around 1986 for an undergraduate Philosophy of Language course. I was, at the time, very taken by Gödel, Escher, Bach: An Eternal Golden Braid, Douglas Hofstadter’s poetic excursion around the topic of recursion, vertical structure in ricercars, and various other topics that stormed about in his book. For me, when combined with other musings on halting problems, it led to the conclusion that the halting problem could be probabilistically solved by an observer who decides when the recursion is too repetitive or too deep. It prescribes an overlay algorithm that guesses the odds that another algorithm will halt when subjected to a time or resource constraint. Thus we have a boredom algorithm.
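A minimal sketch of such an overlay observer, under stated assumptions, might look like the following. It is not taken from the 1986 paper or from Lloyd's paper: the function name `bored_observer`, the step budget, and the convention that a step function returns `None` on halting are all illustrative choices. "Too repetitive" is modeled as an exactly repeated state and "too deep" as an exhausted step budget.

```python
def bored_observer(step, state, max_steps=1000):
    """Run a deterministic step function under a step budget.

    `step(state)` returns the next state, or None once the computation halts.
    States must be hashable so that repeats can be detected.

    Returns:
      "halts" - the computation finished within the budget
      "loops" - an exact state repeated, so it can never halt
      "bored" - the budget ran out; the observer merely guesses non-halting
    """
    seen = set()
    for _ in range(max_steps):
        if state in seen:
            return "loops"
        seen.add(state)
        state = step(state)
        if state is None:
            return "halts"
    return "bored"


def collatz(n):
    """One Collatz step; halts (returns None) once n reaches 1."""
    if n == 1:
        return None
    return n // 2 if n % 2 == 0 else 3 * n + 1


print(bored_observer(collatz, 6))                   # halts
print(bored_observer(lambda s: (s + 1) % 5, 0))     # loops
print(bored_observer(lambda s: s + 1, 0, 100))      # bored
```

Only the "bored" verdict is a guess. It can be wrong for a computation that would have halted just past the budget, which is why the approach is probabilistic rather than a solution to the halting problem.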

I thought this was rather brilliant at the time and I ended up having a one-on-one with my prof who scoffed at GEB as a “serious” philosophical work. I had thought it was all psychedelically transcendent and had no deep understanding of more serious philosophical work beyond the papers by Kripke, Quine, and Davidson that we had been tasked to read. So I plead undergraduateness. Nevertheless, he had invited me to a one-on-one and we clashed over the concept of teleology and directedness in evolutionary theory. How we got to that from the original decision trees of halting or non-halting algorithms I don’t recall.

But now we have an argument that essentially recapitulates that original form, though with the help of the Hartmanis-Stearns theorem to support it. Whatever the algorithm that runs in our heads, it needs to simulate possible outcomes and try to determine what the best course of action might be (or the worst course, or just some preference).…

Evolving Visions of Chaotic Futures

Most artificial intelligence researchers think it unlikely that a robot apocalypse or some kind of technological singularity is coming anytime soon. I’ve said as much, too. Guessing about the likelihood of distant futures is fraught with uncertainty; current trends are almost impossible to extrapolate.

But if we must, what are the best ways for guessing about the future? In the late 1950s the Delphi method was developed. Get a group of experts on a given topic and have them answer questions anonymously. Then iteratively publish back the group results and ask for feedback and revisions. Similar methods have been developed for face-to-face group decision making, like Kevin O’Connor’s approach to generating ideas in The Map of Innovation: generate ideas and give participants votes equaling a third of the number of unique ideas. Keep iterating until there is a consensus. More broadly, such methods are called “nominal group techniques.”

Most recently, the notion of prediction markets has been applied to internal and external decision making. In prediction markets, a similar voting strategy is used but based on either fake or real money, forcing participants towards a risk-averse allocation of assets.

Interestingly, we know that optimal inference based on past experience can be codified using algorithmic information theory, but the fundamental problem with any kind of probabilistic argument is that much of the change we observe in society is non-linear with respect to its underlying drivers, and that the signals needed are imperfect. As the mildly misanthropic Nassim Taleb pointed out in The Black Swan, the only place where prediction takes on smooth statistical regularity is in Las Vegas, which is why one shouldn’t bother to gamble.…

The Goldilocks Complexity Zone

Since my time in the early 90s at the Santa Fe Institute, I’ve been fascinated by the informational physics of complex systems. What are the requirements of an abstract system that is capable of complex behavior? How do our intuitions about complex behavior or form match up with mathematical approaches to describing complexity? For instance, we might consider a snowflake complex, but it is also regular in its structure, driven by an interaction between crystal growth and the surrounding air. The classic examples of coastlines and fractal self-similarity also seem complex but are not capable of complex behavior.

So what is a good way of thinking about complexity? There is actually a good range of ideas about how to characterize complexity. Seth Lloyd rounds up many of them here. The intuition that drives many of them is that complexity seems to be associated with distributions of relationships and objects that are somehow positioned between a single state and a uniformly random set of states. Complex things, be they living organisms or computers running algorithms, should exist in a Goldilocks zone when each part is examined and those parts are somehow summed up to a single measure.

We can easily construct a complexity measure that captures some of these intuitions. Let’s look at three strings of characters:

x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

y = menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz

z = the fox met the hare and the fox saw the hare

Now we would likely all agree that y and z are more complex than x, and I suspect most would agree that y looks like gibberish compared with z. Of course, y could be a sequence of weirdly coded measurements or something, or encrypted such that the message appears random.…
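The excerpt cuts off before the measure itself is given, so the following is not the author's measure. It is a rough sketch, assuming two standard stand-ins, per-character Shannon entropy and zlib-compressed size, to show how the three strings above compare under randomness-style measures. The function names are hypothetical.

```python
import math
import zlib
from collections import Counter

x = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
y = "menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz"
z = "the fox met the hare and the fox saw the hare"


def char_entropy(s):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_size(s):
    """Size in bytes of the zlib-compressed string (includes zlib overhead)."""
    return len(zlib.compress(s.encode("utf-8"), 9))


for name, s in (("x", x), ("y", y), ("z", z)):
    print(name, round(char_entropy(s), 3), compressed_size(s))
```

Both stand-ins rank the gibberish string y above the sentence z, since they reward randomness. That is exactly the gap the Goldilocks intuition points at: a useful complexity measure would have to peak somewhere between x and y rather than at y.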