Vijay Pande of VC Andreessen Horowitz (who passed on my startups twice but, hey, it’s just business!) has a relevant article in The New York Times concerning fears of the “black box” of deep learning and related methods: is the lack of explainability and the limited capacity for interrogating the underlying decision making a deal-breaker for applications to critical areas like medical diagnosis or parole decisions? His point is simple, and it echoes the previous post’s suggestion that our capacity to truly understand many aspects of human cognition may itself be limited. Even the doctor may only be able to point to a nebulous collection of clinical experiences when it comes to certain observational aspects of the job, such as reading images for indicators of cancer. At least the algorithm has been trained on a far larger collection of data than the doctor could ever encounter in a professional lifetime.
So the human is almost as much a black box (maybe a gray box?) as the algorithm. One difference that needs to be considered, however, is that the deep learning algorithm might make unexpected errors when confronted with unexpected inputs. The classic example from the early history of artificial neural networks involved a DARPA test of detecting military tanks in photographs. The apocryphal, now-legendary version of the story is that there was a difference in cloud cover between the tank images and the non-tank images. The end result was that the system performed spectacularly on the training and test data sets but then failed miserably on new data that lacked the cloud cover factor. I recently misremembered the detail, substituting film grain for the cloudiness. In any case, the story became a standard discussion point about the limits of data-driven learning, showing how radically incorrect solutions can arise without a careful understanding of how the systems work.
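For concreteness, here is a toy sketch (in Python with NumPy and scikit-learn, and not anything resembling the original DARPA setup) of how a learner can latch onto a confound. A made-up “brightness” feature stands in for the cloud cover: it tracks the label almost perfectly in the training data while carrying no real information, so the model looks great on similarly biased test data and collapses once the correlation is removed.

```python
# Toy illustration of a spurious correlation: "brightness" plays the role of
# the cloud cover / film grain confound. It nearly duplicates the label in the
# training data, so the model keys on it instead of the weak real signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

def make_data(confounded: bool):
    y = rng.integers(0, 2, n)                    # 1 = "tank present"
    signal = y + rng.normal(0, 2.0, n)           # weak genuine signal
    if confounded:
        brightness = y + rng.normal(0, 0.1, n)   # near-perfect proxy for y
    else:
        brightness = rng.normal(0.5, 1.0, n)     # proxy no longer informative
    X = np.column_stack([signal, brightness])
    return X, y

X_train, y_train = make_data(confounded=True)
model = LogisticRegression().fit(X_train, y_train)

X_biased, y_biased = make_data(confounded=True)   # test set with the same bias
X_clean, y_clean = make_data(confounded=False)    # bias removed

print("accuracy on biased test data:", model.score(X_biased, y_biased))     # high
print("accuracy once the confound is gone:", model.score(X_clean, y_clean)) # near chance
```

The numbers are made up, but the pattern mirrors the tank story: impressive accuracy as long as the shortcut holds, near-chance performance once it disappears.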
How can the fears of radical failure be reduced? In medicine we expect automated decision making to be backed up by a doctor who serves as a kind of meta-supervisor. When the diagnosis or prognosis looks dubious, the doctor will order more tests or countermand the machine. When the parole board sees the parole recommendation, they always have the option of ignoring it based on special circumstances. In each case, it is the presence of some anomaly in the recommendation or the input data that would lead to reconsideration. Similarly, it is certainly possible to automate that scrutiny at a meta-level. In machine learning, regularization constrains how strongly a model can bend to fit individual data points, which limits the influence of outliers and noisy examples and guards against overtraining or overfitting. In much the same way, that kind of statistical machinery can provide warnings and clues about data viability, flagging inputs that look nothing like the data the system was trained on. And, in turn, outputs that are statistically unlikely given the history of the machine’s decisions can trigger warnings about anomalous results.
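As a rough illustration of what that meta-level scrutiny might look like, here is a minimal Python sketch. It assumes nothing more than access to the training data and a running history of the model’s predicted probabilities; the class name and the `flag_for_review` / `escalate_to_human` hooks are illustrative, not part of any real library.

```python
# A minimal sketch of meta-level scrutiny: flag inputs that sit far outside the
# training distribution, and flag outputs whose confidence is statistically
# unusual given the model's own decision history.
import numpy as np

class DecisionMonitor:
    def __init__(self, X_train, z_threshold=4.0):
        self.mean = X_train.mean(axis=0)
        self.std = X_train.std(axis=0) + 1e-9   # avoid division by zero
        self.z_threshold = z_threshold
        self.history = []                       # past predicted probabilities

    def flag_for_review(self, x, prob):
        """Return a list of warnings; an empty list means no anomaly detected."""
        warnings = []
        # Input check: any feature far outside the range seen in training.
        z = np.abs((x - self.mean) / self.std)
        if np.any(z > self.z_threshold):
            warnings.append("input looks unlike the training data")
        # Output check: confidence atypical relative to the decision history.
        if len(self.history) >= 30:
            hist = np.array(self.history)
            if prob < np.percentile(hist, 1) or prob > np.percentile(hist, 99):
                warnings.append("prediction confidence is atypical for this model")
        self.history.append(prob)
        return warnings

# Usage with the toy model above (assuming it exposes predict_proba):
# monitor = DecisionMonitor(X_train)
# p = model.predict_proba(x_new.reshape(1, -1))[0, 1]
# if monitor.flag_for_review(x_new, p):
#     escalate_to_human(x_new, p)   # hypothetical hand-off to the meta-supervisor
```

Checks this simple will not explain the model’s reasoning, but they can at least route the strange cases back to the human meta-supervisor for a closer look.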