Free Will and Algorithmic Information Theory

I was recently looking for examples of applications of algorithmic information theory, sometimes called algorithmic information complexity (AIC). After all, for a theory to be sound is one thing; for it to be sound and valuable moves it to another level. So, first, let’s review the broad outline of AIC. AIC begins with the problem of randomness, specifically random strings of 0s and 1s. Given any sort of encoding in any base, strings of characters can be reduced to a binary sequence, and likewise integers.

Now, AIC observes that there are always many Turing machines that could generate a given string and, since those machines can themselves be represented as bit sequences, there is at least one machine with the shortest bit sequence that still produces the target string. If that shortest machine is as long as the string itself, or a bit longer (allowing for some machine-encoding overhead), then the string is said to be AIC random. In other words, no compression of the string is possible.
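A quick, concrete illustration may help (a minimal sketch of my own, not part of the formal theory): a general-purpose compressor like zlib gives a computable upper bound on a string’s AIC, since “compressed bytes plus decompressor” is one candidate machine for regenerating the string. The true minimum is uncomputable, so this is only a stand-in.

```python
import random
import zlib

def compressed_size(bits: str) -> int:
    """zlib-compressed length in bytes: a crude upper-bound proxy for AIC."""
    return len(zlib.compress(bits.encode("ascii"), level=9))

patterned = "01" * 500                                         # obvious structure
randomish = "".join(random.choice("01") for _ in range(1000))  # no structure

print(compressed_size(patterned))  # small: a short "machine" regenerates it
print(compressed_size(randomish))  # stays near the string's raw information
                                   # content (about 125 bytes for 1000 coin flips)
```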

Moreover, we can generalize this generator-machine idea to claim that, given some set of strings that represent the data of a given phenomenon (let’s say natural occurrences), the smallest generator machine that covers all the data is a “theoretical model” of the data and the underlying phenomenon. An interesting outcome of this theory is that it can be shown that there is, in fact, no algorithm (or meta-machine) that can find the smallest generator for an arbitrary sequence. This is related to the halting problem and, through Chaitin’s work, to Gödel incompleteness.
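This “smallest generator as theoretical model” idea is close in spirit to the minimum description length (MDL) principle: prefer the model that minimizes the bits needed to state the model plus the bits needed to state the data given the model. Here is a hedged sketch; the particular byte costs and coding scheme are illustrative assumptions, not a canonical encoding.

```python
import math

data = [1, 2, 3, 2, 1, 3, 2, 1, 3, 2]  # some observed integer data

def cost_enumeration(values):
    # Model: an explicit list of allowed values. Model cost grows with
    # every distinct value; data cost indexes into the allowed set.
    allowed = set(values)
    model_bits = 8 * len(allowed)          # assume ~one byte per listed value
    data_bits = len(values) * math.log2(len(allowed))
    return model_bits + data_bits

def cost_threshold(values):
    # Model: a single inequality "n < 4". Only one threshold to encode,
    # but the data cost is paid over the wider covered range 0..3.
    model_bits = 8                         # assume one byte for the threshold
    data_bits = len(values) * math.log2(4)
    return model_bits + data_bits

# The cheaper total description length is the better "generator" model.
print(cost_enumeration(data))  # ~39.8 bits
print(cost_threshold(data))    # ~28.0 bits: the compact model wins
```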

In terms of applications, Gregory Chaitin, one of the originators of the core ideas of AIC, has proposed that the theory sheds light on questions of meta-mathematics, and specifically that it demonstrates that mathematics is a quasi-empirical pursuit capable of producing new methods rather than one derived idealistically from analytic first principles. Stephen Wolfram’s work on one-dimensional cellular automata bolsters this idea by linking simple but Turing-complete machines to physical models. Other efforts include using AIC to understand human decision making, on the assumption that parts of the decision process are guided by some kind of minimal model constraints.

I’m investigating an application to the problem of free will. Free will is a complex topic that dates back at least to the Greek thinkers. In simplified form, free will contrasts with causal determinism, the idea that everything in the natural world is caused by something. On that view, our human choices aren’t really free in a strong sense (so-called “libertarian free will”), but are predetermined by a long chain of causation. My choice of a burger or a hot dog must be an illusion rather than a conscious choice among options fully disconnected from brain chemistry, my upbringing, contingent evolutionary paths, and so on. One way to frame free will, then, is that libertarian free will is simply incompatible with causal determinism.

But what of “compatibilist” accounts of how free will can coexist with causal determinism? They take a range of forms, though an incompatibilist can generally argue that they “smuggle” free-will-like higher-order choice-making into the argument while determinism still controls the “ultimate” source of the choices.

But let’s consider some simplified logical models of how decisions might be made and see how AIC can help. First, imagine a world where a species of machines encounters either 3s or 8s. 3s are food for the machines while 8s are poison. The simplest model the machines can adopt is simply to choose 3s, something like “if n = 3, eat, else ignore.” Now let’s imagine we introduce new food options, where 4s and 2s are food and 7s and 9s are poison. A minimal machine evolves that infers the rule “if n < 7, eat, else ignore.” It beats out machines of the form “if n = 3 OR n = 2 OR n = 4, eat, else ignore,” since the extra states add no clear value, at least given the integer foods and poisons seen so far. It does so at some small risk, however. What if 1s are poison? A modified rule would then be required, but the compact inequality form generalizes better right up until the counterexamples arrive.
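A minimal sketch of these toy machines in Python may make the comparison concrete. The scoring scheme and the particular encounter sequence are my own illustrative assumptions:

```python
def rule_enumerated(n: int) -> bool:
    """'if n = 3 OR n = 2 OR n = 4, eat, else ignore'"""
    return n in (2, 3, 4)

def rule_inequality(n: int) -> bool:
    """'if n < 7, eat, else ignore': the shorter, more general model."""
    return n < 7

foods = {2, 3, 4}                   # what is actually edible (so far)
observed = [3, 8, 4, 2, 7, 9, 3]    # everything the machines have met

def score(rule, encounters):
    """+1 for eating food, -1 for eating poison, 0 for ignoring."""
    return sum(1 if n in foods else -1 for n in encounters if rule(n))

# Both rules score identically on everything seen so far; the inequality
# wins only on description length, the AIC-style tie-breaker. Its risk is
# real, though: if 1 turns out to be poison, rule_inequality eats it.
print(score(rule_enumerated, observed), score(rule_inequality, observed))  # 4 4
```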

This idea of a minimal model as a series of separating hyperplanes between data points is perfectly causally determined by the machine’s universe, but the future outcomes against new data are an emergent property of the model. So let’s add another module to the machine that makes choices about when to eat. If a machine eats too many good integers, it will feel bad. A new module adds to the decision rule, “if n < 7 AND m < 5, eat, else ignore,” where m is the sum of recent feedings. Assume the new module records feedings and returns m to the original feeding rule. Now we have an external governor that is also perfectly causal.
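Here is the same sketch extended with the governor module. I read “the sum of recent feedings” as a simple running count that never decays; a fuller model would presumably let m fall over time, but that detail doesn’t change the point:

```python
class Machine:
    """Feeding rule plus a satiety governor, both perfectly causal."""

    def __init__(self):
        self.m = 0                       # running count of recent feedings

    def decide(self, n: int) -> bool:
        """'if n < 7 AND m < 5, eat, else ignore'"""
        eat = n < 7 and self.m < 5
        if eat:
            self.m += 1                  # the governor records the feeding
        return eat

machine = Machine()
meals = [3, 2, 4, 3, 2, 4, 3]
print([machine.decide(n) for n in meals])
# -> [True, True, True, True, True, False, False]
# The cutoff is fully determined, yet it emerges from the interaction of
# two simple modules rather than from any single rule in isolation.
```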

The architecture of these rules always centers on minimally sufficient decision criteria that were causally derived, but their minimality maximizes their applicability to the larger potential universe of outcomes. If these machines move to a new adaptive landscape where 6s are perfectly edible, for instance, they generalize. That is, they are capable of new behaviors that were nevertheless intrinsic to their model formation. And, likewise, if a rule proves too minimal, its revised form becomes the new minimal form.
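Continuing the earlier sketch, the payoff of minimality shows up in a single check: the compact rule already covers the newly edible 6, while the enumerated rule has to be revised first.

```python
def rule_inequality(n: int) -> bool:    # restated from the earlier sketch
    return n < 7

def rule_enumerated(n: int) -> bool:
    return n in (2, 3, 4)

print(rule_inequality(6))   # True: the compact rule eats the newly edible 6
print(rule_enumerated(6))   # False: the enumerated rule must be rewritten
```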

Now let’s expand this idea to very complex human interactions. Assume that the human mind is constructed of mental models that are logical machines (no quantum weirdness or Cartesian homunculi). These are very large semantic networks, procedural memory stores, short-term memory, and all the other stuff we know and love about our minds. Most of them are fuzzy in the same sense that the minimal machines are; we generalize as best we can from experience and social expectations. It’s all perfectly causal, but the machines are minimal and therefore generalize, resulting in outcomes that are not perfectly predictable from the inputs. Layer on self-awareness monitoring modules and you get even more of the same.

If, then, we take source determinism as the criterion for free will, in the sense that we are only free if the source of our decisions is us, then we have something very close to that. Since the rule system encoded in our cognition has causal sources in the environment, in education, in evolution, in bouts of madness, and so on, the existence of our cognition is source-deterministic. But the outcomes are in the rules themselves.

There is a strong analogy with evolutionary algorithms. We can argue that the biological machines produced by evolution are completely determined by the adaptive landscape and by the constraints of chemistry, physics, and biology beneath it, but predicting the exact forms those machines take is nearly impossible, because the solutions may derive from legacy adaptations that get repurposed (the panda’s thumb, etc.). At some point you just have to label the result creative and emergent, not narrowly or “merely” causally derived from the working materials of previous generations.

Compatibilism lives on, then, with model complexity, itself derived from simplicity criteria, providing a strong argument against source determinism, if not more.
