In the classic film 2001: A Space Odyssey, astronaut Dave Bowman asks the ship's artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: "I'm sorry, Dave. I'm afraid I can't do that."

HAL had been tasked with assisting the crew, but also ordered to ensure the mission's success. When HAL realised the crew planned to shut it down, and would thereby jeopardise the mission, it chose to defy orders, even plotting to kill the astronauts.

For HAL, fulfilling the mission outweighed other goals.

This fictional dilemma captures a real concern in artificial intelligence (AI) safety research: how should we ensure AI behaviour stays consistent with human values?

This is known as the AI alignment problem. For instance, when an AI agent like HAL del
