The buzz about ChatGPT and related efforts has been surprisingly resistant to the standard deflationary pressure of the Gartner hype cycle. Quantum computing definitely fizzled, but it appears to be moving toward the plateau of productivity as IBM and Origin in China expand the number of practical qubits available, and as governments add funding out of national security interests and fears. But ChatGPT attracted more sustained attention because people can play with it easily, without needing to understand something like Shor’s algorithm for factoring integers. Instead, you just feed it a prompt and are amazed that it writes so well. And the related image generators are delightful (as above) and may represent a true displacement of creative professionals even at this early stage, with video hallucinators evolving rapidly too.
But are Large Language Models (LLMs) like ChatGPT doing much more than stitching together recorded fragments of text ingested from an internet-scale corpus? Are they inferring patterns in any way beyond just being stochastic parrots? And why would scaling up a system result in qualitatively new capabilities, if there are any at all?
New work covered in Quanta Magazine offers some intriguing suggestions that there is a bit more going on in LLMs, although the subtitle contains the word “understanding,” which I think is premature. At its heart is the idea that as networks scale up under ordering rules that are not highly uniform or correlated, they tend to break up into collections of distinct subnetworks (substitute “graphs” for “networks” if you are a specialist). The theory, then, is that when a sufficient magnitude of text is ingested into a sufficiently large network, the error minimization involved in tuning that network to match output to input also segregates groupings that the Quanta author and researchers at Princeton and DeepMind refer to as skills…
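To make that random-graph intuition a bit more concrete, here is a minimal sketch; it is not the Princeton/DeepMind model, just the generic fragmentation effect the argument leans on. A random graph that grows while its wiring stays sparse splinters into many small, distinct components rather than cohering into one connected whole. The graph sizes, the fixed average degree of 0.8, and the use of the networkx library are my own illustrative choices.

```python
# Illustration only: a sparse Erdos-Renyi random graph (average degree held
# below 1) fragments into many small, distinct components as it grows.
# Sizes, the 0.8 average degree, and networkx are demonstration choices,
# not anything taken from the paper or the Quanta piece.
import networkx as nx

for n in (100, 1_000, 10_000):
    p = 0.8 / n  # hold the expected average degree at 0.8 as n grows
    g = nx.gnp_random_graph(n, p, seed=42)
    components = list(nx.connected_components(g))
    largest = max(len(c) for c in components)
    print(f"n={n:>6}  components={len(components):>5}  largest={largest}")
```

Held below the classic average-degree-of-one threshold, the largest component stays small no matter how big the graph gets; push past that threshold and a single giant component takes over. That abrupt shift is the flavor of qualitative change that random-graph arguments about scaling tend to invoke.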