There is a curious dilemma that pervades much machine learning research. The solutions that we are trying to devise are supposed to minimize behavioral error by formulating the best possible model (or collection of competing models). This is also the assumption of evolutionary optimization, whether natural or artificial: optimality is the key to efficiently outcompeting alternative structures, alternative alleles, and alternative conceptual models. The dilemma is whether such optimality is applicable to the notoriously error-prone, conceptually flexible, and inefficient reasoning of people. In other words, is machine learning at all like human learning?
While trying to understand what Ted Dunning is planning to talk about at the Big Data Science Meetup at SGI in Fremont, CA a week from Saturday (I’ll be talking as well), I came across a paper called “Multi-Armed Bandit Bayesian Decision Making” that contains a remarkable admission concerning this point:
Human behaviour is after all heavily influenced by emotions, values, culture and genetics; as agents operating in a decentralised system humans are notoriously bad at coordination. It is this fact that motivates us to develop systems that do coordinate well and that operate outside the realms of emotional biasing. We use Bayesian Probability Theory to build these systems specifically because we regard it as common sense expressed mathematically, or rather ‘the right thing to do’.
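For readers who haven’t seen the paper, here is a minimal sketch of one standard Bayesian treatment of the multi-armed bandit: Thompson sampling over Bernoulli arms with Beta posteriors. The payoff rates below are invented for illustration, and the paper’s own formulation may well differ.

```python
# Minimal sketch: Thompson sampling on a Bernoulli multi-armed bandit.
# The "true" payoff probabilities are made up for this example.
import random

true_payoffs = [0.04, 0.05, 0.07]        # hidden conversion rates (invented)
alpha = [1.0] * len(true_payoffs)        # Beta prior: 1 + successes per arm
beta = [1.0] * len(true_payoffs)         # Beta prior: 1 + failures per arm

for _ in range(10000):
    # Draw a plausible payoff for each arm from its posterior,
    # then play the arm whose draw is largest (explore/exploit in one step).
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_payoffs[arm] else 0
    # Bayesian update: the Beta posterior stays a Beta.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:",
      [round(a / (a + b), 3) for a, b in zip(alpha, beta)])
```

Over many rounds the sampling concentrates on the best arm, which is the “common sense expressed mathematically” that the quote is gesturing at: the arithmetic of the posterior, not anyone’s gut feeling, decides where to spend the next trial.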
The authors go on to suggest that such systems should therefore be seen as corrective assistants for the limitations of human cognitive processes! Machines can put the rational back into reasoned decision-making. But that is really not what machine learning is used for today. Instead, machine learning is used where human decision-making is unavailable: because of the physical limitations of keeping humans “in the loop,” the scale of the data involved, or the sheer tediousness of the tasks at hand.
For example, automatic part-of-speech tagging could be done by row after row of professional linguists who mark up the text with the correct parts of speech. Where great ambiguity occasionally arises, they would have meetings to reach agreement on the correct assignment of the tag. This kind of thing is still done. I worked with a company that creates conceptual models of the biological results expressed in research papers. The models are created by PhD biologists who are trained in the conceptual ontology that has been developed over the years through a process of argument and consensus building. Yahoo! originally used teams of ontologists to classify web pages. Automatic machine translation is still unacceptable for most professional translation tasks, though it can be useful for gisting.
So the argument that the goal of these systems is to overcome the cognitive limitations of people is mostly incorrect, I think. Instead, the real reason we explore topics like Bayesian probability theory for machine learning is that the mathematics gives us traction on the problems. For instance, we could try to study the way experts make decisions about parts of speech and create a rule system that contains every little rule. This would be an “expert system,” but even building such a system requires careful assessment of massive amounts of detail. The scalability barrier rises again, and emotional biases are not much at play, except where sheer tedium produces boredom and ennui.
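To make the contrast concrete, here is a toy sketch of the two approaches on an invented three-sentence corpus. The hand-written rules and the tiny tag set are mine, purely for illustration; a real expert system would need vastly more of both, which is exactly the scalability barrier in question.

```python
# Toy contrast: hand-written tagging rules vs. counts over annotated data.
# The corpus, rules, and tag set are invented for illustration only.
from collections import Counter, defaultdict

annotated = [  # (word, tag) pairs a team of linguists might produce
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]

def rule_tag(word):
    # Expert-system flavour: every regularity must be written down by hand.
    if word in ("the", "a"):
        return "DET"
    if word.endswith("s"):
        return "VERB"
    return "NOUN"

# Statistical flavour: let counts over the annotated data carry the detail.
counts = defaultdict(Counter)
for sentence in annotated:
    for word, tag in sentence:
        counts[word][tag] += 1

def learned_tag(word):
    return counts[word].most_common(1)[0][0] if word in counts else "NOUN"

for w in ("the", "dog", "runs", "cat"):
    print(w, rule_tag(w), learned_tag(w))
```

The rules stop scaling the moment the vocabulary and the exceptions grow; the counts keep working for as long as someone keeps supplying annotated data, which is where the mathematics, rather than a roomful of deliberating experts, earns its keep.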