The meaning of words and phrases can be a bit hard to pin down. Indeed, the meaning of meaning itself is problematic. I can point to a dictionary and say, well, there is where we keep the meanings of things, but that is just a record of the way in which we use the language. I’m personally fond of a philosophical perspective on this matter of meaning that relies on a form of holism. That is, words and their meanings are defined by our usage of them, our historical interactions with them in different contexts, and the subtle distinctive cues that illuminate how words differ and compare. Often, but not always, words are tied to things in the world as well, and therefore have a fastness that resists distortion.
This is, of course, a critical area of inquiry when trying to create intelligent machines that deal with language. How do we imbue the system with meaning, represent it within the machine, and apply it to novel problems in ways that show intelligent behavior? If we succeed at all, we will have achieved some semblance of intelligence in a fairly rigorous way, since we will have simulated it with explicit logical steps.
The history of philosophical and linguistic interest in these topics is fascinating, ranging from Wittgenstein’s notion of a language game that builds up rules of use to Firth’s formalization of collocation as critical to meaning. In artificial intelligence, this concept of collocation has been expanded further to include the interchangeability of contexts. Thus, “boat” and “ship” occur in more similar contexts than “boat” and “bank” do.
A general approach to acquiring these contexts is based on dimensionality reduction in various forms. If we build up a matrix of word co-occurrences with other words, we have only a convenient but high-dimensional lookup table; sussing out some kind of underlying pattern requires that we restrict or reduce the relationships to only those that recur in a statistically significant way.
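To make the starting point concrete, here is a minimal Python sketch of that lookup-table stage: a word-by-word co-occurrence matrix built from a toy corpus. The corpus and the window size are invented for illustration, not drawn from any real collection.

```python
from collections import defaultdict

# Toy corpus and window size, invented for illustration.
corpus = [
    "the boat sailed to the dock",
    "the ship sailed to the harbor",
    "the bank held the money in a vault",
]
window = 2

counts = defaultdict(int)
vocab = set()
for sentence in corpus:
    tokens = sentence.split()
    vocab.update(tokens)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1

vocab = sorted(vocab)
index = {w: k for k, w in enumerate(vocab)}
# Rows and columns are vocabulary terms; each cell counts co-occurrences.
matrix = [[counts[(row, col)] for col in vocab] for row in vocab]
```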
And here is where we run into a zoo of possibilities. We can, for example, use statistical methods like factor analysis after making certain assumptions about the distributions of terms. We can apply parallel concepts from matrix mathematics, performing a singular value decomposition and eliminating a number of low-scoring singular values. We can train an artificial neural network to predict context terms but set the number of nodes in the hidden layer far lower than the number of terms, thus bottlenecking the system into merging away statistically insignificant patterns. We can even just add together random vectors that represent each term within its contexts and, amazingly, achieve similar results.
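Here is one path through that zoo, sketched on the assumption that the `matrix`, `vocab`, and `index` from the previous snippet are in scope: truncate the SVD to a handful of singular values and compare terms by cosine similarity in the reduced space.

```python
import numpy as np

X = np.array(matrix, dtype=float)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # keep only the top-k singular values; k is an arbitrary choice here
embeddings = U[:, :k] * S[:k]  # each row is a term's reduced vector

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Terms that share contexts should now sit closer together than terms
# that do not; with the toy corpus, "boat" and "ship" have nearly
# identical context rows, while "bank" drifts away from both.
print(cosine(embeddings[index["boat"]], embeddings[index["ship"]]))
print(cosine(embeddings[index["boat"]], embeddings[index["bank"]]))
```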
In each case, the terms that are related by context move together in this reduced-dimensional space and away from terms that are not related. This seems something like meaning: it builds on Firth’s collocates, and it can easily be shown to group related words together.
But Wittgenstein’s language games were likened to chess, where the meaning of a piece, like a rook, derives purely from the rules that govern its range of available actions, and he seemed to be getting at the discourse and social events that draw distinctions, not merely written collocates and texts. Those come after the language game has been learned.
It’s here that the computational methods start to break down. We can see it even in relatively simple discourse where negation modifies content words: the content words still occur, but their meaning is no longer captured by collocation. So “not at the bank” is vaguely related to “not money,” but if you query semantic similarity with something like “the vault is full,” the word “not” has little functional say against the much rarer valences of “bank” and “vault.” The negation is critical to meaning in this case, yet it hardly registers with most collocation-based approaches to distributional semantics.
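A toy illustration of the failure, using invented vectors rather than any trained embeddings: when a phrase is represented by averaging its word vectors, negation is just one more vector in the average, so the negated and un-negated phrases score almost identically against a query.

```python
import numpy as np

# Hypothetical embeddings; only the geometry matters, not the values.
# Function words point toward the third axis, content words toward the first two.
vec = {
    "not":   np.array([0.1, 0.0, 0.9]),
    "at":    np.array([0.0, 0.1, 0.8]),
    "the":   np.array([0.0, 0.0, 0.7]),
    "is":    np.array([0.0, 0.1, 0.75]),
    "full":  np.array([0.3, 0.2, 0.5]),
    "bank":  np.array([0.9, 0.8, 0.1]),
    "vault": np.array([0.85, 0.8, 0.15]),
}

def phrase(words):
    # Bag-of-vectors phrase representation: a plain average.
    return np.mean([vec[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

with_not = phrase(["not", "at", "the", "bank"])
without_not = phrase(["at", "the", "bank"])
query = phrase(["the", "vault", "is", "full"])

# Both similarities land near 1.0: "not" barely moves the average.
print(cosine(with_not, query), cosine(without_not, query))
```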
At least at the first level. Things might be improved by overlaying fine-tuned classifiers on top of the core distributional relationships. Now, though “bank” and “vault” still trigger similar initial semantic similarities, the overlay model nixes those relationships because it has been tuned to carve out the effects of the negation.
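A sketch of that overlay idea, with a trivial cue-word rule standing in for the fine-tuned classifier; the cue list and the attenuation factor are arbitrary placeholders for whatever a tuned model would learn, not any published method.

```python
import numpy as np

NEGATORS = {"not", "no", "never", "n't"}  # crude, assumed cue list

def base_similarity(a, b):
    # The untouched distributional score.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_negated(tokens):
    # Stand-in for a fine-tuned negation classifier: a cue lookup.
    return any(t in NEGATORS for t in tokens)

def overlay_similarity(tokens_a, vec_a, tokens_b, vec_b):
    sim = base_similarity(vec_a, vec_b)
    # Attenuate when exactly one phrase is negated; 0.2 is a placeholder
    # for the correction the overlay model would actually apply.
    if is_negated(tokens_a) != is_negated(tokens_b):
        sim *= 0.2
    return sim
```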
Ines Skelac and Andrej Jandrić, in “Meaning as Use: From Wittgenstein to Google’s Word2vec,” point to other issues that arise, like synonymy that holds only relative to a subset of contexts. This happens all too regularly in language: “He’s trying to torpedo the effort!” Here, the metaphorical use of torpedo has little to do with the word’s other meanings, yet most distributional approaches can’t distinguish them, because each word gets only one context representation that embodies all of the contexts it occurs in as a reduced representational signature. Real meanings often disconnect from one another.
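A toy picture of why, again with invented two-dimensional vectors: a single vector per word is forced to blend the senses, landing near neither.

```python
import numpy as np

naval_sense = np.array([1.0, 0.0])     # contexts like "submarine", "warhead"
sabotage_sense = np.array([0.0, 1.0])  # contexts like "undermine", "derail"

# One reduced representation per word forces a blend of the two senses.
torpedo = (naval_sense + sabotage_sense) / 2.0

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The blended vector is equally (and only moderately) similar to both
# senses, even though any single usage belongs to just one of them.
print(cosine(torpedo, naval_sense), cosine(torpedo, sabotage_sense))
```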
But that’s just the beginning of our language games. We learn discourse contexts, physical contexts, imaginary and hypothetical contexts, social contexts, political contexts, and so forth. These shifting rules are only partially uncovered over the course of our lives. Simulations that focus on collocate structure ignore the implications of physicality and social relationships that make some rules untenable.