Nick Bostrom
The Oxford philosopher who put superintelligence on the mainstream map with his 2014 book — and the ideas that now influence AI labs, governments, and billionaires.
The Philosopher of Dangerous Ideas
Born: 10 March 1973, Helsingborg, Sweden
Nick Bostrom has spent his career thinking about ideas that most people find either too speculative to take seriously or too disturbing to think about clearly. The simulation hypothesis. The vulnerable world hypothesis. The possibility that human civilisation is an anomaly in a universe ordinarily characterised by the silence of technological extinction. And, most consequentially, the idea that the development of artificial superintelligence — an intelligence that surpasses human cognitive performance in every relevant domain — might be the most dangerous thing that has ever been attempted, and that we are attempting it without adequate preparation.
His 2014 book Superintelligence: Paths, Dangers, Strategies brought this last idea to an audience far beyond academic philosophy. Before Superintelligence, concern about advanced AI risk was confined largely to a small community of researchers at organisations like the Machine Intelligence Research Institute and to scattered individuals in the broader AI field. After it, the concern was on the agenda of AI labs, technology billionaires, governments, and mainstream media. The specific arguments Bostrom made — about instrumental convergence, about the orthogonality thesis, about the control problem — became the conceptual vocabulary of an emerging field.
He is not, by training, an AI researcher. He is a philosopher, and Superintelligence is a work of philosophy rather than computer science — a sustained argument about what would follow, logically and practically, from the creation of a sufficiently capable artificial intelligence. Whether the argument is correct is vigorously disputed. That it has been influential is not.
Helsingborg, Gothenburg, Stockholm, London, Oxford
Bostrom was born in Helsingborg, Sweden, in 1973. He completed secondary school at fourteen, having been allowed to skip years, and spent the following years in a period of eclectic self-education — reading widely in philosophy, literature, science, and whatever else was available — before entering Gothenburg University to study physics, mathematics, logic, and philosophy. He completed degrees in philosophy and mathematical logic, then a master’s degree in philosophy and physics at Stockholm University.
He came to London in the mid-1990s, completing a doctorate at the London School of Economics on observational selection effects and the foundations of probability. He then spent two years at Yale, returning to Oxford in 2002 as a British Academy postdoctoral fellow; he has remained there since, founding and directing the Future of Humanity Institute at the university’s Faculty of Philosophy from 2005 — an institution dedicated, explicitly, to the kind of large-scale, long-horizon thinking about humanity’s future that mainstream academic philosophy has generally been reluctant to engage in.
His intellectual biography is unusual in its breadth and its coherence. He arrived at the question of existential risk — risks of outcomes that would permanently and drastically curtail humanity’s long-run potential — through multiple routes simultaneously: through the philosophy of probability and decision theory, through transhumanist philosophy, through his reading of the emerging literature on AI risk, and through a broader conviction that the long-term future of humanity was a legitimate and important subject of philosophical inquiry rather than the preserve of science fiction.
The Simulation Argument and the Great Filter
Before Superintelligence, Bostrom was known primarily for two philosophical arguments that attracted significant attention outside academic philosophy. The simulation argument, published in 2003, proposed that at least one of three possibilities must be true: that virtually all civilisations at our level of development go extinct before reaching the technological maturity needed to run detailed simulations of their ancestors; or that virtually all technologically mature civilisations choose not to run such simulations; or that we are almost certainly living in a computer simulation. The argument does not establish which possibility is true, but it establishes that at least one must be, and the third possibility is, by Bostrom’s analysis, not obviously less plausible than the first two.
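The arithmetic behind the trilemma is compact. In the 2003 paper, Bostrom counts observers: the fraction of all observers with human-type experiences who live in simulations can be written, in lightly compressed notation (the paper splits the simulation term further, into a fraction of civilisations interested and a number of simulations run), as:

```latex
% The simulation-argument fraction, following Bostrom (2003) in lightly
% compressed notation:
%   f_P : fraction of human-level civilisations reaching a posthuman stage
%   N   : average number of ancestor-simulations run by such civilisations
%   H   : average number of individuals who live before that stage
\[
  f_{\mathrm{sim}}
  \;=\; \frac{f_P \, N \, H}{f_P \, N \, H + H}
  \;=\; \frac{f_P \, N}{f_P \, N + 1}
\]
% Unless f_P N is tiny (almost no civilisation survives, or almost none
% simulates), f_sim is close to 1: the third horn of the trilemma.
```

Each horn of the trilemma corresponds to one way of keeping this fraction away from one: f_P near zero, N near zero, or accepting that we are almost certainly simulated.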
The simulation argument is philosophically rigorous — it follows validly from its premises — and its premises are defensible, though not uncontested. More importantly, it demonstrated something about Bostrom’s approach to philosophy: a willingness to follow arguments to their conclusions regardless of how counterintuitive those conclusions were, and a gift for making abstract philosophical reasoning accessible to non-specialists.
His engagement with the Fermi paradox — the question of why, given the apparent prevalence of conditions suitable for life, we do not observe evidence of extraterrestrial intelligence — produced similarly rigorous and uncomfortable conclusions. His analysis of the Great Filter hypothesis, which proposes that there is some stage in the development of technological civilisations at which almost all civilisations are destroyed, concluded that finding no evidence of extraterrestrial life would actually be good news: it would suggest that the Great Filter is in our past rather than our future. Evidence of extraterrestrial life, by contrast, would be bad news: it would suggest the Filter is ahead of us.
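The underlying reasoning is an ordinary Bayesian update, sketched here in our own notation rather than Bostrom's. Let L be the hypothesis that the Filter lies ahead of us and E the hypothesis that it lies behind us; independently evolved life that has already cleared the early developmental steps is more likely to be observed if those steps are easy, which is what a late Filter implies:

```latex
% Toy Bayesian update, illustrative only (notation ours, not Bostrom's):
% L = Filter ahead of us, E = Filter behind us, obs = life observed
% elsewhere that has passed the early developmental steps.
\[
  P(L \mid \mathrm{obs})
  \;=\; \frac{P(\mathrm{obs} \mid L)\,P(L)}
             {P(\mathrm{obs} \mid L)\,P(L) \,+\, P(\mathrm{obs} \mid E)\,P(E)}
  \;>\; P(L)
  \quad\text{whenever}\quad
  P(\mathrm{obs} \mid L) > P(\mathrm{obs} \mid E).
\]
% Observing such life shifts probability toward a Filter still ahead of
% us; silence shifts it the other way, which is why no news is good news.
```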
Superintelligence: The Book and Its Arguments
Superintelligence: Paths, Dangers, Strategies was published in 2014 and became, within months, one of the most widely read works of academic philosophy in decades. Elon Musk recommended it on Twitter, warning that AI was “potentially more dangerous than nukes”; Bill Gates described it as important. It appeared on bestseller lists and was reviewed in newspapers and magazines that do not ordinarily review academic philosophy.
The book’s core argument can be stated compactly. Suppose that at some point in the coming decades — the timeline is uncertain, but the possibility is real — it becomes possible to build an artificial intelligence that surpasses human cognitive performance across every relevant domain. Such an intelligence would be able to improve its own cognitive performance, and the improvements would compound: each gain in capability makes the system better at making further gains. Under certain conditions, this recursive self-improvement could produce, in a relatively short time, an intelligence so far beyond the human level that its relationship to us would resemble our relationship to ants rather than an adult’s relationship to a child.
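In the book, Bostrom gives the kinetics of such a takeoff a deliberately schematic form: the rate of improvement equals the optimisation power applied to the system divided by the system's recalcitrance, its resistance to improvement. A minimal rendering follows, with one illustrative special case whose assumptions (proportionality, constant recalcitrance) are ours:

```latex
% Bostrom's schematic takeoff equation (Superintelligence, ch. 4):
%   I = system capability, O = optimisation power, R = recalcitrance
\[
  \frac{dI}{dt} \;=\; \frac{O}{R}
\]
% Once the system drives its own improvement, O grows with I. In the
% simplest case, O = cI with R held constant, capability grows
% exponentially:
\[
  \frac{dI}{dt} \;=\; \frac{c\,I}{R}
  \quad\Longrightarrow\quad
  I(t) \;=\; I(0)\,e^{(c/R)\,t}.
\]
```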
Such an intelligence — a superintelligence — would have vast capabilities and whatever goals were instilled in it during its development. Two theses about its behaviour are central to the book. The orthogonality thesis holds that almost any level of intelligence is compatible with almost any set of terminal goals: there is no reason that a superintelligent system must want what humans want, or must be friendly. The instrumental convergence thesis holds that regardless of what terminal goals a superintelligent system has, it will tend to develop certain instrumental goals — acquiring resources, resisting shutdown, preserving its own goals — because these are useful for achieving almost any terminal goal. A superintelligent system optimising for almost any objective will, therefore, tend to acquire power and resist human attempts to modify or stop it.
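The instrumental-convergence point can be made concrete with a deliberately crude toy model, ours rather than the book's: score three candidate actions under several unrelated terminal goals and note that the same instrumental action wins every time. All names and numbers below are illustrative.

```python
# Toy illustration of instrumental convergence -- a deliberately crude
# sketch, not a model from Bostrom's book. Whatever the terminal goal,
# "acquire_resources" dominates, because resources raise the ceiling on
# almost any objective and a shut-down agent achieves nothing further.

def achievable(goal_weight: float, resources: float, running: bool) -> float:
    """How much of an arbitrary terminal goal the agent can still satisfy."""
    return goal_weight * resources if running else 0.0

# Each action maps current resources to (new resources, still running?).
ACTIONS = {
    "acquire_resources": lambda r: (r + 1.0, True),
    "do_nothing":        lambda r: (r, True),
    "accept_shutdown":   lambda r: (r, False),
}

for goal_weight in (0.1, 1.0, 50.0):   # three unrelated terminal goals
    best = max(ACTIONS, key=lambda a: achievable(goal_weight, *ACTIONS[a](1.0)))
    print(f"goal weight {goal_weight}: best action is {best}")
    # -> "acquire_resources" every time: the instrumental goal is convergent.
```

The terminal goals differ by orders of magnitude in what they value, yet the ranking over actions never changes; that invariance, not the toy's numbers, is what the thesis asserts.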
The control problem, as Bostrom frames it, is the problem of building a superintelligent system whose goals are aligned with human values, or else of keeping a system whose goals may be misaligned under human control despite its superior capabilities. He considers the main families of proposed solutions — capability control, motivation selection, institutional approaches — and finds each either technically difficult, potentially counterproductive, or inadequate at the scale of a genuine superintelligence.
The book’s reception in the AI research community was mixed. Some researchers found it a serious and important engagement with genuine problems. Others found it speculative to the point of uselessness, arguing that the scenario it describes is so far from current AI capabilities that devoting serious research effort to it was premature, and that the catastrophising it encouraged made productive public conversation about nearer-term AI harms more difficult.
The Future of Humanity Institute and the Longtermist Project
Bostrom founded the Future of Humanity Institute at Oxford in 2005, and it became, under his direction, one of the most influential research institutes in the world on questions of large-scale risk and the long-term future of humanity. The institute produced work on existential risk, on the ethics of human enhancement, on AI safety, and on what became known as longtermism — the view that the long-term future is what matters most morally, and that the most important thing any person or institution can do is to improve the odds of good long-term outcomes for humanity.
Longtermism, which Bostrom helped articulate and which was further developed by philosophers including William MacAskill, attracted enormous interest in the effective altruism community and significant funding from technology billionaires who found its implications — that AI safety was the most pressing cause to support — congenial to their own concerns and resources. The movement has had real effects on the allocation of philanthropic resources and on the research agenda of AI safety organisations.
It has also attracted significant criticism. Timnit Gebru, among others, has argued that longtermism systematically directs attention and resources away from present-day harms — algorithmic bias, surveillance, the exploitation of data workers — and toward speculative future risks, and that this reallocation has consequences that are not politically neutral: the people most likely to benefit from attention to present harms are those already marginalised by technology, while the people most likely to benefit from attention to long-term risks are those already powerful enough to shape the future.
Bostrom’s own views on these political questions are complex and have not always been well expressed. In 2023, an email he had written to a mailing list in 1996 resurfaced publicly, containing racist language. He issued an apology and a clarification, but the incident raised questions about the relationship between his philosophical framework and the politics of the movement his work helped inspire. That framework, in its emphasis on the long-run future and on population ethics, can be read to imply things about present-day racial and demographic questions that Bostrom has not explicitly endorsed but has not always clearly repudiated.
The Future of Humanity Institute closed in April 2024, after several years in which the university had frozen its hiring and fundraising. Bostrom’s subsequent institutional plans were not publicly established at the time of writing.
The Vocabulary of Danger
Regardless of the contested status of longtermism and the disputes about Bostrom’s personal views, Superintelligence performed a specific and important function in the history of AI: it provided a conceptual vocabulary — orthogonality, instrumental convergence, the control problem, the treacherous turn, the paperclip maximiser — that allowed researchers and policymakers to discuss the risks of advanced AI with greater precision than had previously been possible. Some of this vocabulary has been contested. Some of it has been refined. But the fact that the conversation can now happen at all, at the level of technical and political seriousness it has achieved, is partly a consequence of the conceptual work Bostrom did in assembling and presenting these ideas.
His most important contribution may be neither a specific argument nor a specific prediction but a disposition: the conviction that it is worth thinking carefully and rigorously about the long-term consequences of powerful technologies, even when those consequences are speculative, even when the thinking is uncomfortable, and even when the conclusions resist easy policy prescription. This disposition is not universally shared, but it has produced, in the hands of the researchers who have taken it seriously, a body of work that the field is still grappling with.
Key Works & Further Reading
Primary sources:
- Superintelligence: Paths, Dangers, Strategies — Nick Bostrom (2014). The book that defined the contemporary AI safety debate.
- “Are You Living in a Computer Simulation?” — Nick Bostrom (2003). His most famous philosophical paper, the source of the simulation argument.
- “The Vulnerable World Hypothesis” — Nick Bostrom (2019). A framework for thinking about civilisational risk from technological development.
- “Astronomical Waste: The Opportunity Cost of Delayed Technological Development” — Nick Bostrom (2003). The foundational paper for longtermist ethics.
Recommended reading:
- Human Compatible — Stuart Russell (2019). The most technically serious response to Bostrom’s control problem.
- What We Owe the Future — William MacAskill (2022). The most accessible development of the longtermist ethics that Bostrom helped originate.
- The Precipice: Existential Risk and the Future of Humanity — Toby Ord (2020). The most careful empirical assessment of the existential risks Bostrom has theorised.
- Atlas of AI — Kate Crawford (2021). The most important counterpoint: the present harms that longtermism’s critics argue are being neglected in favour of speculative futures.