Can machines learn how to behave?


Blaise Aguera y Arcas

September 6, 2022


Beyond the current news cycle about whether AIs are sentient is a more practical and immediately consequential conversation about AI value alignment: whether and how AIs can be imbued with human values. Today, this turns on the even more fundamental question of whether the newest generation of language models can or can’t understand concepts — and on what it means to understand.¹

If, as some researchers contend, language models are mere “babblers” that randomly regurgitate their training data — “garbage in, garbage out” — then real AI value alignment is, at least for now, out of reach. Seemingly, the best we can do is to carefully curate training inputs to filter out “garbage”, often referred to as “toxic content”, even as we seek to broaden data sources to better represent human diversity. There are some profound challenges implied here, including governance (who gets to define what is “toxic”?), labor (is it humane to employ people to do “toxic content” filtering?²), and scale (how can we realistically build large models under such constraints?). This skeptical view also suggests a dubious payoff for the whole language model research program, since the practical value of a mere “babbler” is unclear: what meaningful tasks could a model with no understanding of concepts be entrusted to do? If the answer is none, then why bother with such models at all?

On the other hand, if, as I’ll argue here, language models are able to understand concepts, then they’ll have far greater utility — though with that utility comes a wider landscape of potential harms and risks. Urgent social and policy questions arise too. With so many of us (myself included) making our living doing information work, what will it mean for the labor market, our economic model, and even our sense of purpose when many of today’s desk jobs can be automated?

This is no longer a remote, hypothetical prospect, but attention to it has waned as AI denialism has gained traction. Many AI ethicists have narrowed their focus to the subset of language model problems consistent with the assumption that these models understand nothing: their failure to work for digitally underrepresented populations, promulgation of bias, generation of deepfakes, and output of words that might offend.