Andrew Critch on AI existential safety

It can be very confusing to talk about issues in AI and ML since so many of the terms are overloaded, or have both technical and general meanings, or have shifted in meaning over time.

Andrew Critch (with whom I used to work at CHAI) has written a comprehensive blog post that I think does a good job of untangling some of the terms related to responsible and beneficial AI.

Some AI research areas and their relevance to existential safety - AI Alignment Forum

Here are some definitions that I found particularly useful:

AI existential safety: preventing AI technology from posing risks to humanity that are comparable to or greater than human extinction in terms of their moral significance.
AI safety: getting AI systems to avoid risks, of which existential safety is an extreme special case with unique challenges.
AI ethics: principles that AI developers and systems should follow.
AI governance: identifying and enforcing norms for AI developers and AI systems themselves to follow.
AI alignment: getting an AI system to {try | succeed} to do what a human person or institution wants it to do. The inclusion of “try” or “succeed” respectively creates a distinction between intent alignment and impact alignment.

A system is “transparent” if it is easy for human users or developers to observe and track important parameters of its internal state.
A system is “explainable” if useful explanations of its reasoning can be produced after the fact.
A system is “interpretable” if its reasoning is structured in a manner that does not require additional engineering work to produce accurate human-legible explanations.

In the piece, Critch argues that AI alignment is necessary but not sufficient for AI existential safety, since we need to look beyond just 'one AI system being aligned with one human' and think about how many humans and many AI systems interact in a complex multistakeholder environment.

Something that intrigued me about the piece is the connection he draws between technical research and governance, and the importance he assigns to interpretability, fairness, and accountability research for AI existential safety:

The main way I can see present-day technical research benefitting existential safety is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise over the next 10-30 years. In short, there often needs to be some amount of traction on a technical area before it’s politically viable for governing bodies to demand that institutions apply and improve upon solutions in those areas.
Governance demands include pressures like “AI technology should be fair”, “AI technology should not degrade civic integrity”, or “AI technology should not lead to human extinction.”
If the algorithmic techniques needed to meet a given governance demand are 10 years of research away from discovery--as opposed to just 1 year--then it’s easier for large companies to intentionally or inadvertently maintain a narrative that the demand is unfulfillable and therefore illegitimate. Conversely, if the algorithmic techniques to fulfill the demand already exist, it’s a bit harder (though still possible) to deny the legitimacy of the demand. Thus, CS researchers can legitimize certain demands in advance, by beginning to prepare solutions for them.
I think this is the most important kind of work a computer scientist can do in service of existential safety. For instance, I view ML fairness and interpretability research as responding to existing governance demand, which (genuinely) legitimizes the cause of AI governance itself, which is hugely important.

My current work primarily relates to accountability and governance, so it was nice to see a clear description of how this work can ultimately benefit AI existential safety (though of course I'm biased!).