Meaning is a problem. We think we might know what something means but we keep being surprised by the facts, research, and logical difficulties that surround the notion of meaning. Putnam’s Representation and Reality runs through a few different ways of thinking about meaning, though without reaching any definitive conclusions beyond what meaning can’t be.
Children are a useful touchstone for thinking about meaning because we know that they acquire linguistic skills and, consequently, at least an operational understanding of meaning. How they do so is rather interesting: first, they presume that whole objects are the first topics for naming; next, they assume that syntactic differences lead to semantic differences (“the dog” refers to the class of dogs while “Fido” refers to the instance); finally, they prefer that linguistic differences point to semantic differences. Paul Bloom slices and dices the research in his Précis of How Children Learn the Meanings of Words, calling into question many core assumptions about the learning of words and meaning.
These preferences become useful if we want to try to formulate an algorithm that assigns meaning to objects or groups of objects. Probabilistic Latent Semantic Analysis (PLSA), for example, assumes that words are signals from underlying probabilistic topic models and then derives those models by estimating all of the probabilities from the available signals. The outcome lacks labels, however: the “meaning” is expressed purely in terms of co-occurrences of terms. Reconciling an approach like PLSA with the observations about children’s meaning acquisition presents some difficulties. The estimation process seems too slow compared with how quickly children learn, for example, which was always a complaint about connectionist artificial neural network architectures as well. As Bloom points out, kids don’t make many errors concerning meaning, and when they do, they rapidly compensate.
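To make the PLSA idea concrete, here is a minimal sketch of its expectation-maximization estimation in plain Python. The function name `plsa`, the toy documents, the fixed iteration count, and the tiny pseudo-count `eps` are all assumptions chosen for illustration; this is not a reference implementation. Note that the result is only a pair of probability tables, which is exactly the "no labels" problem described above.

```python
import random

def plsa(docs, n_topics=2, n_iter=50, seed=0, eps=1e-12):
    """Fit P(w|z) and P(z|d) by EM. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    # n(d, w): how often each word occurs in each document
    counts = [{w: doc.count(w) for w in set(doc)} for doc in docs]

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Random starting points for P(w|z) and P(z|d)
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]

    for _ in range(n_iter):
        # Accumulators start at eps (an assumed pseudo-count) to avoid 0/0
        acc_w_z = [dict.fromkeys(vocab, eps) for _ in range(n_topics)]
        acc_z_d = [[eps] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n in doc_counts.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    r = n * post[z] / total
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize the expected counts into probabilities
        for z in range(n_topics):
            total = sum(acc_w_z[z].values())
            p_w_z[z] = {w: v / total for w, v in acc_w_z[z].items()}
        p_z_d = [normalize(row) for row in acc_z_d]
    return vocab, p_w_z, p_z_d

# Toy usage: two senses of "bank" mixed across four tiny documents
docs = [
    "bank teller loan atm bank".split(),
    "river bank water fish".split(),
    "loan bank atm teller".split(),
    "water river shore bank".split(),
]
vocab, p_w_z, p_z_d = plsa(docs)
for z, dist in enumerate(p_w_z):
    print(z, sorted(dist, key=dist.get, reverse=True)[:3])
```

Each topic comes back as a distribution over terms and nothing more, and the fit needs repeated passes over every observation, which is the slowness complaint in miniature.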
I’ve previously proposed a model for lexical acquisition that uses a coding hierarchy based on co-occurrence or other features. As new terms are observed, the hierarchy builds, in an unsupervised manner, by making local swaps and consolidations based on minimum description length principles. Thus, it bears a close relationship to Nevill-Manning’s SEQUITUR approach to sequence learning. The approach has a limitation: in a tree-like grammar, examining all possible rearrangements of the grammar whenever new symbols arrive is complex enough that it would put a massive burden on any cognitive correlates that we might claim exist. That is why the system restricts itself to local swaps and consolidations.
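The flavor of description-length-driven consolidation can be sketched with greedy digram replacement. This is a deliberately simplified stand-in: it is a batch procedure, not SEQUITUR's incremental algorithm and not the swap-and-consolidate hierarchy from my paper. The cost model (one symbol per sequence element, three symbols per rule) and the `R0`, `R1`, ... rule names are assumptions made for the example.

```python
def description_length(seq, rules):
    # Assumed cost: sequence symbols, plus head and two body symbols per rule
    return len(seq) + 3 * len(rules)

def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def consolidate(tokens):
    """Greedily add digram rules while the description length keeps shrinking.

    Assumes the tokens are strings that do not already look like R0, R1, ...
    """
    seq, rules = list(tokens), {}
    while True:
        symbol = f"R{len(rules)}"
        best_seq, best_pair = None, None
        # Only adjacent pairs are considered: a purely local consolidation
        for pair in sorted(set(zip(seq, seq[1:]))):
            candidate = replace_pair(seq, pair, symbol)
            if best_seq is None or len(candidate) < len(best_seq):
                best_seq, best_pair = candidate, pair
        if best_seq is None:
            return seq, rules
        new_rules = {**rules, symbol: best_pair}
        if description_length(best_seq, new_rules) >= description_length(seq, rules):
            return seq, rules  # no local move shortens the code; stop
        seq, rules = best_seq, new_rules

def expand(seq, rules):
    """Undo the consolidation to recover the original token sequence."""
    out = []
    for s in seq:
        out.extend(expand(rules[s], rules) if s in rules else [s])
    return out

tokens = ("bank teller loan " * 5).split()
seq, rules = consolidate(tokens)
print(seq, rules)
print(description_length(tokens, {}), "->", description_length(seq, rules))
```

Because each step looks only at adjacent pairs in the current sequence, the search never has to consider global rearrangements of the grammar, which is the trade-off described above.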
It’s worth considering how such an approach might solve the cluster labeling problem. If we cluster things together based on the parsimonious coding approach, the objects and their grammatical coordinations move higher up the tree. What is missing is a preference for adding new, distinctive terms that differentiate one grouping from another. For instance, in the toy sample given in my paper, neither “Financial Institution” nor “Retail Bank” is applied to the appropriate bank cluster, nor is “River Bank” applied to the other bank cluster. Instead we are just left with the shared context terms. I think this might be correctable in a larger grouping, however, by allowing a distinguishing series of portmanteaus to be constructed by composition from nearby (in the semantic region) concepts. So, as the co-occurrences of bank and teller and ATM and loan pile up and get coded into groupings, the nearby grouping of finance, bank, retail bank, and investment bank is used to create a common portmanteau. That portmanteau is built from the most distinctive terms in the set, chosen so that they best distinguish the grouping from the river semantic set.
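One way to sketch the "most distinctive terms" step is to score each term in a cluster by a smoothed log-ratio of its relative frequency in that cluster against the contrasting cluster, then compose a label from the top scorers. The scoring rule, the function names, the smoothing constant `alpha`, and the counts below are all hypothetical choices for illustration, not results from my paper.

```python
import math
from collections import Counter

def distinctive_terms(target, contrast, k=2, alpha=1.0):
    """Return the k terms that best separate target from contrast.

    target and contrast map terms to co-occurrence counts; alpha is an
    assumed add-alpha smoothing constant so unseen terms do not divide by zero.
    """
    vocab = set(target) | set(contrast)
    t_total = sum(target.values()) + alpha * len(vocab)
    c_total = sum(contrast.values()) + alpha * len(vocab)

    def score(term):
        p_t = (target.get(term, 0) + alpha) / t_total
        p_c = (contrast.get(term, 0) + alpha) / c_total
        return math.log(p_t / p_c)

    # Only terms actually seen in the target cluster can label it;
    # ties are broken alphabetically so the result is deterministic
    candidates = [t for t, n in target.items() if n > 0]
    return sorted(candidates, key=lambda t: (-score(t), t))[:k]

def compose_label(terms):
    # Stand-in for portmanteau construction: simply join the chosen terms
    return "-".join(terms)

# Hypothetical co-occurrence counts for the two senses of "bank"
finance = Counter({"bank": 10, "teller": 6, "loan": 5, "atm": 4, "money": 3})
river = Counter({"bank": 9, "water": 7, "shore": 4, "fish": 3, "money": 1})

print(compose_label(distinctive_terms(finance, river)))
print(compose_label(distinctive_terms(river, finance)))
```

The shared term "bank" scores near zero in both directions, so it drops out of the label, while terms that appear almost exclusively in one cluster rise to the top. A raw log-ratio like this favors rare terms, so a real system would probably want to weight by frequency as well.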