From Theory to Machines: Pioneers of Twentieth-Century Computing
How formal theory, wartime urgency, and engineering ambition converged in a single decade to build the first real computers.
AI HISTORY SERIES --- EPISODE 5
Introduction: The Century That Built the Machine
The four episodes that precede this one have taken us on a long journey through time. We began in the mythological deep past, with the dreams of ancient civilizations: the bronze guardian Talos, the animate clay of the Golem, the imagined mechanical servants of Chinese legend. We moved forward through the extraordinary automata of Hero of Alexandria, Al-Jazari, and Leonardo da Vinci --- machines that could be programmed to behave in predetermined ways, physical embodiments of the idea that complex, purposeful behavior could arise from mechanism. We arrived in the seventeenth century with the philosophers --- Descartes, Hobbes, Leibniz, Pascal --- who turned the engineer’s intuitions into rigorous theory, asking with new precision what thought was, whether it could be mechanized, and what a thinking machine would actually have to do. And in Episode 4, we met the Victorian visionaries who translated theory into blueprint: Babbage, who designed the architecture of the modern computer in brass and steel more than a century before the technology existed to build it; Lovelace, who wrote its first program; and Boole, who provided the mathematics of binary logic on which all digital electronics would eventually rest.
In Episode 5, we cross the threshold. All of that preparation --- the centuries of myth, philosophy, mechanical ingenuity, and mathematical theory --- converges in a single extraordinary decade, roughly 1935 to 1945, when the first real computers were built. The men and women who built them were working in disparate contexts --- a German engineer in his parents’ apartment, a British mathematician at a secret wartime establishment, an American mathematician sketching architecture on a train --- but they were all drawing on the same intellectual inheritance. And they were all responding to the same historical pressure: the most destructive and technologically demanding war in human history, which created an urgent demand for computational power that no human calculator could satisfy.
“The twentieth century did not merely build better calculating machines. It invented a new kind of object --- the programmable universal computer --- and in doing so, made the age of artificial intelligence inevitable.”
But the story of this episode is not only a story of wartime engineering. It is also the story of theoretical breakthroughs of the deepest importance: Alan Turing’s 1936 paper on computable numbers, which gave computing its mathematical foundations before any electronic computer existed; Claude Shannon’s 1948 paper on information theory, which gave it the mathematical language for reasoning about data, noise, and communication; and the von Neumann architecture that, by storing programs and data in the same memory, created the flexible, general-purpose computer that could run any algorithm its programmer could devise. Together, these theoretical achievements and the machines they inspired transformed computing from a philosopher’s dream and an engineer’s ambition into an operational technology --- and placed artificial intelligence firmly within the realm of practical possibility.
Section 1: Early Formalism and Information Theory
Before the first electronic computers were switched on, before the first vacuum tube flickered to life in a wartime laboratory, a series of theoretical breakthroughs was laying the mathematical foundations without which those machines could not have been understood, designed, or used. The most important of these --- and the one whose influence on AI has been perhaps the deepest and most lasting --- came from an unlikely corner: the mathematical theory of communication.
Claude Shannon and the Mathematics of Information
Claude Elwood Shannon was born in Petoskey, Michigan, in 1916, and showed an early aptitude for both mathematics and practical electronics --- as a child he built a telegraph system connecting his house to a friend’s using barbed wire fencing. He studied mathematics and electrical engineering at the University of Michigan, then pursued graduate work at MIT, where in 1937 he completed the master’s thesis we encountered in Episode 4: the work showing that Boolean algebra could be used to design and analyze electrical switching circuits. This thesis established the mathematical language of digital circuit design. But it was Shannon’s later work, published in 1948, that would have the more transformative impact on the theory of AI.
In the summer of 1948, Shannon published a paper in the Bell System Technical Journal titled “A Mathematical Theory of Communication.” This paper, which is widely regarded as one of the most important scientific papers of the twentieth century, introduced a rigorous mathematical framework for thinking about information: what it is, how it can be measured, how it can be transmitted over unreliable channels, and how much of it can be reliably conveyed given the constraints of a particular communication system.
Entropy, Bits, and the Quantification of Information
The central concept of Shannon’s theory is entropy, borrowed from thermodynamics but given a new and precisely defined meaning in the context of information. Shannon defined the information content of a message as a function of the probability of that message: a message that was certain to be sent carries no information (you already knew it was coming), while a message that was completely unexpected carries the maximum information. The mathematical formula he derived for this relationship is formally identical to the thermodynamic entropy formula, a connection whose deep significance physicists are still exploring today.
Shannon also introduced the concept of the bit --- short for binary digit --- as the fundamental unit of information. A single bit is the amount of information required to distinguish between two equally likely possibilities: yes or no, zero or one, heads or tails. All information, Shannon showed, can in principle be encoded in sequences of bits, and the minimum number of bits required to encode a message is precisely its information content as measured by his entropy formula. This was not a merely theoretical result; it was a practical engineering tool that told designers exactly how much channel capacity they needed to transmit a given amount of information reliably.
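Shannon's definition is compact enough to state in a few lines of code. The sketch below (an illustration in Python, with names of our own choosing rather than anything from the 1948 paper) computes entropy in bits for a discrete probability distribution:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits.

    This is the minimum average number of bits needed to encode
    messages drawn from the given distribution. Zero-probability
    outcomes contribute nothing and are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair coin: exactly 1 bit
print(entropy_bits([1.0]))         # certain message: 0 bits
print(entropy_bits([0.25] * 4))    # four equal outcomes: 2 bits
```

The fair coin needing exactly one bit is Shannon's definition of the unit itself: one bit is what it takes to resolve a choice between two equally likely possibilities.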
Shannon’s theory also addressed the problem of noise --- the unavoidable degradation that affects any real communication channel. His channel capacity theorem established that, for any channel with a given level of noise, there exists a maximum rate at which information can be transmitted with arbitrarily low error probability, and that this rate can in principle be achieved through appropriate encoding. This result was both unexpected and practically enormously important: it meant that reliable communication was always possible, in principle, if you were willing to pay the cost in encoding complexity, and it gave engineers a precise target to aim for.
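The flavor of the capacity result can be seen in the standard textbook case, the binary symmetric channel: a channel that flips each transmitted bit with probability p has capacity C = 1 - H(p) bits per use, where H is the binary entropy function. A minimal sketch (illustrative code, not from Shannon's paper):

```python
import math

def bsc_capacity(p):
    """Capacity, in bits per channel use, of a binary symmetric
    channel that flips each bit with probability p.

    Standard result: C = 1 - H(p), where H is binary entropy.
    A noiseless channel (p = 0) carries a full bit per use; a
    channel that flips half its bits (p = 0.5) carries nothing."""
    if p in (0.0, 1.0):
        return 1.0  # deterministic flipping is as good as no noise
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h

print(bsc_capacity(0.0))   # 1.0
print(bsc_capacity(0.5))   # 0.0
```

The surprising content of Shannon's theorem is that every rate below this capacity is actually achievable with arbitrarily low error, given clever enough encoding.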
Why Information Theory Matters for AI
The connection between Shannon’s information theory and artificial intelligence is deep and multifaceted, and it has grown stronger rather than weaker as the field has developed. At the most basic level, the bit is the fundamental unit of digital computation as well as digital communication: every variable, every instruction, every learned parameter in a modern AI system is ultimately a string of bits, and Shannon’s theory provides the framework for understanding the information content of those strings and the efficiency of the representations they encode.
More profoundly, the concept of entropy --- Shannon’s measure of uncertainty or unpredictability --- has become one of the central tools of statistical machine learning. Cross-entropy is the loss function used to train the large language models that power modern AI assistants, including the one that has contributed to this series. Maximum entropy methods are used to build probabilistic models that make the fewest unwarranted assumptions about the data they are trained on. Information gain, the expected reduction in entropy produced by a split, is the classic criterion used to build decision trees. The entire field of machine learning is, in a deep sense, the application of Shannon’s framework to the problem of learning from data.
“Shannon gave computation its language for talking about data and uncertainty. Every machine learning algorithm ever written speaks, at its foundation, in Shannon’s vocabulary.”
Shannon was also one of the first people to think seriously about AI itself. In 1950 --- the same year as Turing’s famous paper on computing machinery and intelligence --- Shannon published a paper titled “Programming a Computer for Playing Chess,” which laid out the basic principles of game-playing AI that would guide the field for decades. He was not merely a theorist of communication; he was a pioneer of the idea that machines could be programmed to engage in intelligent-seeming behavior.
Reflection: Shannon’s information theory is not a historical precursor to AI --- it is a living, active foundation of modern AI. From the bit to entropy to channel capacity, the mathematical concepts he introduced in 1948 are present in every machine learning system built today, often without their users being aware of how directly they are drawing on Shannon’s foundational work.
Section 2: Konrad Zuse and the First Programmable Machines
While the theoretical foundations of computing were being laid in American universities and research laboratories, a lone engineer in Berlin was quietly and independently building what are now regarded as the world’s first programmable digital computers. Konrad Zuse was born in Berlin in 1910 and trained as a civil engineer, a background that gave him practical skills in structural calculation --- and a keen appreciation of how much time and effort those calculations required. His response to that appreciation was to spend the better part of a decade building machines that could perform them automatically.
The Z1: Mechanical Binary Computing
Zuse began work on his first computing machine in 1936, building it in his parents’ living room in Berlin with his own money and the help of a few friends. The Z1, completed in 1938, was a fully mechanical binary calculator --- built not from conventional decimal gears and wheels of the kind Babbage had used, but from thin metal plates cut to precise shapes and assembled into a mechanical implementation of binary logic. The choice of binary was not incidental; Zuse had independently recognized that binary arithmetic was both simpler to implement in hardware and more fundamental than the decimal arithmetic that all previous computing devices had used.
The Z1 used a 22-bit floating-point representation for numbers and could be programmed using a punched tape cut from discarded 35mm cinema film. It had a limited instruction set, but it was --- in principle --- a programmable binary computer. In practice, the mechanical tolerance problems that had defeated Babbage repeated themselves: the thin metal plates of the Z1 required a level of manufacturing precision that Zuse could not consistently achieve, and the machine was unreliable in operation. But the design was sound, and the experience taught Zuse invaluable lessons that he would apply in his subsequent machines.
The Z3: The World’s First Working Programmable Computer
Zuse’s third machine, the Z3, completed in May 1941, is now generally recognized as the world’s first working programmable, fully automatic digital computer. Unlike the Z1, the Z3 used electromechanical relays rather than mechanical plates for its logic elements --- the same technology used in telephone switching exchanges --- which gave it the electrical speed and mechanical reliability that the all-mechanical Z1 had lacked. It operated in binary floating-point arithmetic, reading its program from punched film and executing instructions at a rate of roughly three to four per second.
The Z3 was a remarkable achievement by any standard. It could perform the four basic arithmetic operations, compute square roots, and convert between floating-point and integer representations. It had a memory of 64 twenty-two-bit words --- modest by any later standard, but sufficient for many practical engineering calculations. It was programmable in the sense that its sequence of operations was determined by the punched film tape, not by its physical wiring. And it operated automatically, without human intervention, once the program had been loaded and the machine set running.
What the Z3 could not do, crucially, was conditional branching --- the ability to choose between different sequences of operations depending on a computed result. This limitation, which Babbage had explicitly addressed in the Analytical Engine and which Lovelace had exploited in her algorithm for Bernoulli numbers, meant that the Z3 was not fully general-purpose in the theoretical sense. Zuse was aware of this limitation and had designed a successor, the Z4, that would address it; but the Z4 was not completed until after the war, and by then the American machines had overtaken his work.
Isolation and Rediscovery
Zuse’s work was conducted in almost complete intellectual isolation from the parallel efforts in Britain and the United States. Germany’s wartime conditions --- the bombing of Berlin, the disruption of scientific communication, the difficulty of obtaining materials --- made his achievement all the more extraordinary. The Z3 itself was destroyed in a bombing raid in 1943. Zuse evacuated the nearly finished Z4 from Berlin, hiding it in a Bavarian village as the war ended, and in 1950 installed it at the Swiss Federal Institute of Technology (ETH) in Zurich, where it operated reliably until 1955.
Because Zuse published little in English and his work was largely unknown outside Germany until historians began examining it in the 1960s and 1970s, he received far less recognition than his achievement deserved during his lifetime. The question of whether the Z3 or the American ENIAC deserves the title of “first computer” has been debated by historians for decades, with the answer depending on precisely how “computer” is defined. What is not seriously disputed is that Zuse independently arrived at the key design decisions --- binary arithmetic, floating-point representation, program-controlled operation from a stored sequence --- that characterized the computers that followed, and that he did so without knowledge of the parallel efforts in Britain and America.
Reflection: Zuse’s achievement is a powerful reminder that the history of invention is rarely the story of a single genius working in isolation toward an inevitable goal. The same ideas --- binary computation, programmable control, floating-point arithmetic --- emerged independently in multiple places simultaneously, driven by the same underlying intellectual inheritance and the same practical pressures. This convergence is evidence that the time for the computer had truly come.
Section 3: Atanasoff, Berry, and Early American Efforts
While Zuse was building in Berlin and Turing was theorizing in Cambridge, a third independently conceived computing project was underway in the American Midwest --- one that would become the subject of one of the most contentious priority disputes in the history of technology, a legal battle that was not finally resolved until 1973.
The Atanasoff-Berry Computer
John Vincent Atanasoff was a professor of mathematics and physics at Iowa State College (now Iowa State University) who, by the late 1930s, had grown deeply frustrated by the time required to solve systems of linear equations by hand. The solution of such systems --- a routine requirement in physics, engineering, and statistics --- involved hundreds or thousands of arithmetic operations, each of which had to be performed manually by human computers using desk calculators. Atanasoff wanted a machine that could automate this process.
Over the winter of 1937—38, driving alone one night on the back roads of Iowa in an attempt to clear his mind after a frustrating day’s work, Atanasoff had what he later described as a flash of insight. He stopped at a roadhouse, ordered a bourbon, and spent several hours working out the key design decisions for what would become the Atanasoff-Berry Computer: it would be electronic rather than mechanical, using vacuum tubes or equivalent electronic components for its logic elements; it would use binary rather than decimal arithmetic; and it would use capacitors for memory, with the capacitor charge refreshed periodically to prevent decay --- a technique that is the direct ancestor of modern DRAM memory.
Working with his graduate student Clifford Berry, Atanasoff built the machine between 1939 and 1942. The completed ABC, as it came to be known, was designed specifically to solve systems of simultaneous linear equations by Gaussian elimination. It was electronic, binary, and partially automatic --- but it was not programmable in the general sense, and it required human intervention at each step of the elimination process. The ABC was a special-purpose machine, not a general-purpose computer.
The Priority Dispute and Its Resolution
The ABC might have remained a historical footnote but for a visit Atanasoff received in the summer of 1941 from a young engineer named John Mauchly, who was then teaching at a small college near Philadelphia and thinking about building a high-speed computing machine. Mauchly spent several days at Iowa State, examining the ABC and discussing its design with Atanasoff in detail. Two years later, Mauchly and his colleague J. Presper Eckert began building ENIAC at the University of Pennsylvania --- the machine generally regarded as the first large-scale, general-purpose electronic computer. In 1967, Honeywell and Sperry Rand, the company that had acquired the ENIAC patents, filed suits against each other; Honeywell’s central argument was that the ENIAC patents were invalid because the key ideas had already been anticipated by Atanasoff.
In 1973, after a trial that lasted 135 days and produced 20,000 pages of transcript, Federal Judge Earl Larson ruled that the ENIAC patents were indeed invalid, in part because Mauchly had derived some of his key ideas from his examination of the ABC. The ruling gave Atanasoff a measure of legal recognition but settled the historical question only partially: ENIAC remained the more influential machine, and most historians today regard the question of priority as less interesting than the question of influence. What the Atanasoff-Berry Computer established, independent of the legal dispute, was that electronic binary computation was feasible before the wartime emergency made it urgent --- and that the transition from mechanical to electronic computing was, by the late 1930s, an idea whose time had fully come.
Reflection: The Atanasoff-Berry Computer illustrates a recurring pattern in the history of technology: the same fundamental idea --- in this case, electronic binary computation --- emerging independently in multiple places at nearly the same time, driven by the same convergent pressures of intellectual inheritance and practical need. The disputes over priority that often follow such convergences are less interesting than the convergence itself, which tells us something important about the readiness of an era for a particular technological leap.
Section 4: Wartime Breakthroughs --- Colossus and ENIAC
The decisive impetus for the rapid development of electronic computing came from the most destructive conflict in human history. The Second World War created, on both sides, an insatiable demand for computational power: for codebreaking, for ballistic trajectory calculation, for logistical optimization, for radar signal processing, for cryptographic key generation. The output that human computers could produce, working in organized pools with mechanical desk calculators, was nowhere near fast enough to meet these demands. The result was an unprecedented investment --- of money, talent, and urgency --- in the development of electronic computing devices that has no parallel in peacetime science.
Colossus: The Secret Codebreaker
The most remarkable of the wartime computing projects, and the one that remained most completely hidden from public knowledge for decades after the war, was the British effort at Bletchley Park to break the ciphers used by the German military. The Enigma codebreaking effort, led in part by Alan Turing, is well known; less well known but arguably more technically significant was the effort to break the Lorenz cipher, used by the German High Command for its highest-level communications.
The Lorenz cipher was far more complex than Enigma, and the codebreaking process required a scale of computation that no human team could approach. The solution was Colossus: an electronic computing machine designed by the General Post Office engineer Tommy Flowers and built at the Post Office Research Station at Dollis Hill, London, in the winter of 1943—44. The first Colossus was operational at Bletchley Park by February 1944; an improved Colossus Mark 2, with five times the processing speed, began working on 1 June 1944, days before D-Day, in time to help confirm that the German defenses had been deceived about the invasion’s location.
Colossus was not a general-purpose computer in the modern sense: it was specifically designed to attack the Lorenz cipher, and its operations were optimized for that task. It could not be programmed to perform arbitrary computations. But it was electronic, it was fast --- operating at 5,000 characters per second, far beyond any mechanical device --- and it demonstrated, with operational urgency, that electronic computing at scale was both feasible and enormously powerful. Ten Colossus machines were eventually built; together, they are estimated to have shortened the war by months and saved hundreds of thousands of lives.
Colossus was classified Top Secret and its existence was not publicly acknowledged until the late 1970s. The engineers who built it --- above all Tommy Flowers, whose contribution was perhaps the most significant --- were forbidden from discussing their work for decades, and received little of the public recognition that their achievement merited. The secrecy also meant that Colossus had essentially no direct influence on the postwar development of computing in Britain; its lessons had to be learned again, independently, by the engineers who built the first postwar British computers.
ENIAC: The Giant Brain
On the American side, the equivalent wartime computing project was ENIAC --- the Electronic Numerical Integrator and Computer --- built at the Moore School of Electrical Engineering at the University of Pennsylvania by J. Presper Eckert and John Mauchly, and funded by the United States Army for the calculation of artillery firing tables. ENIAC was not the first electronic computer --- Colossus preceded it by nearly two years --- but it was the first large-scale, general-purpose electronic computer, and its completion in 1945 marked the beginning of the electronic computing era in a way that Colossus, with its narrow purpose and its secrecy, could not.
ENIAC was, by any measure, a machine of staggering physical scale. It contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, and 70,000 resistors, weighed approximately 30 tons, occupied a room 30 feet by 60 feet, and consumed 150 kilowatts of electrical power. When it was switched on, according to legend, the lights in the surrounding district dimmed. It could perform 5,000 additions per second --- faster by a factor of roughly a thousand than any mechanical calculator --- and it was designed to be flexible enough to tackle a range of computational problems beyond firing tables.
“ENIAC was not just faster than what came before. It was different in kind --- the first machine that made it clear to the world that electronic computing was not a curiosity but a transformative technology.”
The principal limitation of ENIAC, from the perspective of programmability, was that it had to be “programmed” by physically reconnecting cables and setting switches --- a process that could take days for a complex calculation. This was, in the language of Babbage, more like building a new machine for each problem than programming an existing one. The stored-program concept --- the idea that the program should be stored in the machine’s memory alongside its data, and loaded into the machine as easily as data --- was the crucial missing piece, and it would be supplied, in theoretical form, by John von Neumann before ENIAC was even publicly unveiled.
ENIAC was formally dedicated in February 1946, by which point it had already been secretly used for calculations related to the hydrogen bomb. Its public demonstrations of high-speed calculation caused a sensation: newspapers dubbed it the “Giant Brain” and declared that the era of mechanical thinking had arrived. The characterization was premature and philosophically muddled --- ENIAC was not, in any serious sense, a thinking machine --- but it reflected a genuine public intuition that something fundamental had changed. The computer was real. The question of what it could and could not do was now an engineering question as well as a philosophical one.
Reflection: The wartime computers --- Colossus and ENIAC, along with the Z3 and several other machines built in the same period --- demonstrated something that no amount of philosophical argument or paper design could have established: that electronic computation at scale was operationally feasible, practically useful, and strategically important. Wartime urgency had compressed into a few years a technological development that might otherwise have taken decades.
Section 5: Alan Turing and the Universal Machine
The engineering achievements of Colossus, ENIAC, and the Z3 were extraordinary. But engineering, however brilliant, builds on theoretical foundations. The theoretical foundation of modern computing --- the mathematical framework that tells us what computation is, what it can and cannot do, and why --- was laid not in a wartime laboratory but in a 1936 paper by a twenty-four-year-old Cambridge mathematician named Alan Mathison Turing. That paper, “On Computable Numbers, with an Application to the Entscheidungsproblem,” is one of the most important scientific papers of the twentieth century, and its implications for AI are as profound as its implications for computing.
The Entscheidungsproblem and the Turing Machine
The paper was written in response to a challenge posed by the German mathematician David Hilbert. In 1928, Hilbert had asked whether there existed a definite mechanical procedure --- an algorithm --- that could, given any mathematical proposition, determine in a finite number of steps whether that proposition was provable within a given formal system. This was the “Entscheidungsproblem” (decision problem), and it was one of the central open questions in the foundations of mathematics.
To answer this question, Turing needed a precise, formal definition of what “a definite mechanical procedure” meant. He provided one by describing an abstract machine --- now universally known as the Turing machine --- that could read and write symbols on an infinite tape, move the tape left or right, and change its internal state according to a finite set of rules. Simple as this description is, Turing showed that any computation that could be performed by any definite mechanical procedure could be performed by an appropriately configured Turing machine. The Turing machine was, in this sense, a definition of computation itself: to say that something was computable was to say that a Turing machine could compute it.
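The tape-and-rules description above translates almost directly into code, which is part of what makes the Turing machine such a satisfying definition. The simulator below is an illustrative sketch (the rule format and names are our own, not Turing's notation): a machine is just a table mapping (state, symbol) pairs to (new symbol, head movement, new state).

```python
def run_turing_machine(rules, tape, state="start", max_steps=1000):
    """Simulate a Turing machine.

    rules maps (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right). The tape is a dict from
    position to symbol; unwritten cells read as the blank "_".
    Returns the tape once the machine reaches the "halt" state."""
    tape = dict(tape)
    pos = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(pos, "_")
        new_symbol, move, state = rules[(state, symbol)]
        tape[pos] = new_symbol
        pos += move
    return tape

# A tiny example machine: invert a binary string (0 -> 1, 1 -> 0),
# moving right until the blank symbol, then halting.
invert = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", "_"): ("_", +1, "halt"),
}
result = run_turing_machine(invert, {0: "1", 1: "0", 2: "1"})
print("".join(result.get(i, "_") for i in range(3)))  # prints "010"
```

Turing's claim is that everything we would call a "definite mechanical procedure" can be expressed as such a rule table, with enough states and enough tape.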
Using this definition, Turing was able to answer Hilbert’s question --- and the answer was no. There is no general algorithm that can determine, for any mathematical proposition, whether it is provable. Turing proved this by showing that the “halting problem” --- the problem of determining whether a given Turing machine will eventually halt when given a particular input, or run forever --- is undecidable: no Turing machine can solve it in general. Since any decision procedure for the Entscheidungsproblem could be used to solve the halting problem, and the halting problem is undecidable, the Entscheidungsproblem has no general solution.
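The diagonal construction at the heart of the proof can be sketched directly in code: given any claimed halting-decider, build a program that does the opposite of whatever the decider predicts about it. The decider functions below are deliberately naive stand-ins of our own invention; the point of the construction is that it defeats any candidate decider, however sophisticated.

```python
def make_contrarian(claims_to_halt):
    """Given any claimed halting-decider (a function answering
    'does this program halt?'), build a program it must misjudge."""
    def contrarian():
        if claims_to_halt(contrarian):
            while True:       # decider said "halts", so loop forever
                pass
        return "halted"       # decider said "loops", so halt at once
    return contrarian

def says_everything_halts(prog):
    return True

def says_nothing_halts(prog):
    return False

c1 = make_contrarian(says_everything_halts)
# says_everything_halts claims c1 halts, but running c1 would loop forever.
c2 = make_contrarian(says_nothing_halts)
print(says_nothing_halts(c2), c2())
# The decider says c2 loops, yet c2() actually halts: the decider is wrong.
```

Since the same contradiction arises for every possible decider, no Turing machine can solve the halting problem in general, which is exactly Turing's conclusion.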
The Universal Turing Machine
The result about undecidability was the paper’s explicit conclusion, and it was important. But the concept introduced to reach that conclusion --- the Turing machine, and in particular the Universal Turing Machine --- turned out to be even more important. The Universal Turing Machine is a single Turing machine that can simulate any other Turing machine given a description of that machine as input on its tape. It is, in other words, a programmable computer: a machine that can be configured, through its input, to perform any computation that any other machine can perform.
This concept --- the universal machine, the machine that can simulate all other machines --- is the theoretical heart of modern computing. It tells us that the general-purpose computer is not merely a convenient engineering choice but a mathematical necessity: if you can build any specific computing machine, you can, with sufficient memory and time, build a machine that can do everything any specific machine can do. Babbage had designed such a machine; Turing had proved, mathematically, that such a machine must be possible and had characterized precisely what it could and could not do.
“Turing’s Universal Machine is not just the theoretical ancestor of the modern computer. It is the definition of what a computer is --- a machine that can simulate any other machine. Everything else is engineering.”
Turing’s Wartime Work and Practical Influence
Turing’s contribution to practical computing was not limited to his theoretical work. During the war, as a senior figure in the codebreaking effort at Bletchley Park, he led the team that broke the German naval Enigma cipher, an achievement that historians estimate shortened the Battle of the Atlantic by approximately two years and saved tens of thousands of lives. The electromechanical “Bombes” that his team developed to attack Enigma were not general-purpose computers, but they were sophisticated computing devices that demonstrated Turing’s exceptional ability to translate abstract mathematical insight into practical engineering solutions.
After the war, Turing worked at the National Physical Laboratory on the design of a stored-program electronic computer called the ACE (Automatic Computing Engine), producing in 1945 one of the most detailed and sophisticated early computer designs, which incorporated many ideas about programming and software that were ahead of their time. He later moved to the University of Manchester, where the Manchester Baby had, in June 1948, become one of the first stored-program computers to actually run a program; there he wrote software for its successor, the Manchester Mark 1. His 1950 paper “Computing Machinery and Intelligence,” which proposed the Imitation Game (Turing Test) as a practical criterion for machine intelligence, launched the field of AI as an intellectual discipline and remains its most cited foundational document.
Turing’s life ended in tragedy. In 1952, he was convicted of “gross indecency” for a consensual homosexual relationship, at a time when homosexuality was a criminal offence in Britain, and subjected to chemical castration as an alternative to imprisonment. He died in June 1954 at the age of forty-one, apparently from cyanide poisoning; the coroner ruled suicide, though the precise circumstances have never been definitively established. In 2013, he received a royal pardon. In 2021, his face was placed on the British fifty-pound note. The belated recognition is welcome but cannot undo the injustice of his treatment, which deprived the world of one of its greatest minds at the height of his powers.
Reflection: Alan Turing is not merely a founding figure of computer science and AI --- he is, in a deep sense, the person who gave both fields their theoretical identity. His definition of computation, his proof of the limits of computability, his concept of the universal machine, and his criterion for machine intelligence are not historical curiosities. They are the foundations on which the entire edifice of modern computing and AI rests.
Section 6: Von Neumann Architecture and the Stored Program
By the end of the war, the electronic computer existed. Colossus had cracked the Lorenz cipher. ENIAC had calculated artillery tables and hydrogen bomb parameters. Zuse’s Z4 had survived the ruins of Berlin. But these machines, for all their power, were awkward to use: programming them required physical reconfiguration, a process too slow and cumbersome to be practical for scientific or commercial use at scale. The key innovation that transformed the computer from a specialized engineering instrument into a general-purpose intellectual tool was the stored-program concept, and the man most closely associated with its formal articulation was one of the twentieth century’s most formidable mathematicians: John von Neumann.
John von Neumann: The Universal Mathematician
John von Neumann was born in Budapest in 1903, into a wealthy Jewish banking family, and displayed mathematical abilities from earliest childhood that are almost without parallel in the history of the subject: he was reading university-level mathematics at eight, produced his first original mathematical paper at seventeen, and had earned his doctorate in mathematics from Budapest while simultaneously completing a chemical engineering degree from Zurich by the age of twenty-two. By the time he joined the Institute for Advanced Study in Princeton in 1933, escaping the rising tide of Nazism in Europe, he was already recognized as one of the most versatile and powerful mathematical minds of his generation.
Von Neumann’s contributions to mathematics and physics --- to quantum mechanics, game theory, ergodic theory, functional analysis, and numerical analysis, among many other fields --- would each be sufficient to secure an enduring reputation. His contribution to computing, though perhaps less profound in purely theoretical terms than Turing’s, was more immediately practical: he articulated, with characteristic clarity and precision, the architectural principles that have governed the design of virtually every digital computer built since 1945.
The EDVAC Report and Stored-Program Computing
In June 1945, while ENIAC was still being completed, von Neumann circulated among engineers and scientists a document titled “First Draft of a Report on the EDVAC” --- a preliminary design for ENIAC’s successor. This document, drafted hastily and never formally published in von Neumann’s lifetime, became one of the most influential technical documents in the history of computing. It described, in systematic and mathematically rigorous terms, the architecture of a stored-program electronic computer: a machine in which programs were stored in the same memory as data, and in which the CPU fetched and executed instructions from that memory in sequence, with conditional branching allowing the execution path to depend on computed results.
The key insight --- that programs and data should be stored in the same memory and represented in the same format --- was not new in 1945. Turing had anticipated it in his 1936 paper on universal computation, and several engineers working on wartime computers had arrived at similar ideas independently. But von Neumann’s report articulated the idea with a clarity and completeness that made it widely accessible, and its informal circulation (bearing only von Neumann’s name, rather than the names of Eckert, Mauchly, and the other engineers who had contributed to the design) inadvertently established him as the principal author of what became known as the “von Neumann architecture.”
The Architecture and Its Components
The von Neumann architecture, as it came to be understood and standardized in the years after the EDVAC report, consists of five functional components. The Central Processing Unit (CPU) executes instructions; it contains an arithmetic-logic unit (ALU) that performs arithmetic and logical operations, and a control unit that fetches instructions from memory, decodes them, and orchestrates the other components to execute them. The Memory stores both programs (instructions) and data in a single address space, accessible by the CPU. The Input devices allow programs and data to be loaded into the machine. The Output devices allow results to be extracted. And the Bus connects these components, allowing them to communicate.
What made this architecture revolutionary was not any single component but their combination, and in particular the decision to store programs in the same memory as data. This meant that a program could modify another program --- or itself --- by writing new values into memory, creating the possibility of self-modifying code and, more importantly, of compilers: programs that translated high-level descriptions of computations into the machine instructions that the CPU could execute. It meant that loading a new program was no different from loading new data: there was no need to rewire the machine. And it meant that the machine could, in principle, generate and execute new programs at runtime, opening the door to the kind of adaptive, self-modifying behavior that AI researchers would soon begin to explore.
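The fetch-decode-execute cycle and the shared program/data memory described above can be sketched as a toy interpreter. This is a minimal illustration, not any historical machine’s instruction set: the five opcodes and the tuple encoding are hypothetical choices made only to show instructions and data living in one memory, with a conditional branch making the execution path depend on computed results.

```python
# Toy stored-program machine: instructions and data share one memory.
# The instruction set (LOAD, ADD, SUB, STORE, JNZ, HALT) is hypothetical,
# chosen only to illustrate the fetch-decode-execute cycle.

def run(memory):
    pc, acc = 0, 0                       # program counter, accumulator
    while True:
        op, arg = memory[pc]             # FETCH the instruction at address pc
        pc += 1
        if op == "LOAD":                 # DECODE it, then EXECUTE it
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "SUB":
            acc -= memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":                # branch: execution path depends on data
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return memory

# Cells 0-7 hold the program; cells 8-10 hold data -- same memory, same format.
# The program sums n + (n-1) + ... + 1 into cell 9.
memory = [
    ("LOAD", 9), ("ADD", 8), ("STORE", 9),   # total += n
    ("LOAD", 8), ("SUB", 10), ("STORE", 8),  # n -= 1
    ("JNZ", 0),                              # loop back while n != 0
    ("HALT", 0),
    5,    # cell 8: n
    0,    # cell 9: total
    1,    # cell 10: the constant 1
]
run(memory)
print(memory[9])   # 5 + 4 + 3 + 2 + 1 = 15
```

Note that “loading a new program” here is just writing different tuples into the same list that holds the data, and nothing stops a running program from overwriting its own instruction cells --- precisely the property the paragraph above identifies as revolutionary.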
“Storing the program in memory alongside the data was not just a convenience. It was the architectural decision that made software possible --- and with software, the possibility of artificial intelligence.”
Von Neumann Architecture and AI
The connection between the von Neumann architecture and artificial intelligence is deep and direct. The iterative algorithms that are the backbone of machine learning --- gradient descent, backpropagation, expectation-maximization --- require a machine that can execute the same sequence of operations thousands or millions of times, updating data values with each iteration. A machine that required physical reconfiguration to change its program could not practically implement such algorithms. The stored-program computer made them trivial: the iterative loop is simply a program that, at each iteration, jumps back to its own beginning.
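A minimal sketch of the kind of iterative loop the paragraph describes --- gradient descent on a one-dimensional quadratic. The target function, step size, and iteration count are arbitrary choices for illustration, not any particular historical or library algorithm.

```python
# Gradient descent: the same short instruction sequence executed many
# times, with the data (the parameter x) updated on every pass -- trivial
# on a stored-program machine, impractical on one that must be physically
# rewired to change its program.

def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):        # the loop "jumps back to its own beginning"
        x = x - lr * grad(x)      # update data; the program itself never changes
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))   # converges to 3.0
```

The loop body is a handful of stored instructions executed a hundred times over changing data --- exactly the pattern that physical reconfiguration made impractical and the stored program made trivial.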
The stored-program architecture also made it possible to write increasingly sophisticated software: assemblers, compilers, operating systems, interpreters. Each of these layers of software abstraction made the machine easier to program and extended the range of problems that could practically be addressed. The history of AI from the 1950s onward is, in large part, the history of increasingly sophisticated programs running on computers whose fundamental architecture had been fixed by von Neumann’s 1945 report. Even the modern innovations that seem most architecturally radical --- GPU-based parallel computing, neuromorphic chips, quantum processors --- are, in most cases, extensions or alternatives to the von Neumann architecture rather than departures from its fundamental principles.
Reflection: The von Neumann architecture is so fundamental, so ubiquitous, and so deeply embedded in our technological infrastructure that it has become effectively invisible. We no longer think of it as an architectural choice; we think of it as the nature of computers. But it was a choice --- an extraordinarily consequential choice, made in 1945, on the basis of mathematical insight, engineering experience, and a clear-eyed understanding of what a general-purpose computing machine needed to be able to do.
Conclusion: The Foundations Are Complete
The story told in this episode spans roughly fifteen years --- from Turing’s 1936 paper on computable numbers to the completion of the first stored-program computers in the late 1940s. In those fifteen years, the gap between Babbage’s blueprint and a working general-purpose computer was finally closed. Shannon gave computation its mathematical language. Zuse demonstrated binary programmable computing in physical hardware. Atanasoff and Berry showed that electronic computation was feasible before the wartime emergency made it urgent. Colossus and ENIAC demonstrated, under operational conditions, that electronic computers were not just theoretically possible but practically indispensable. Turing provided the theoretical foundations that explained what computation was and what it could in principle achieve. And von Neumann articulated the architectural principles that made stored-program general-purpose computers practical to build and use.
These achievements did not emerge from nowhere. They were built on the mathematical tradition of Boole and De Morgan, on the engineering vision of Babbage, on the programming insight of Lovelace, on the philosophical arguments of Leibniz, Hobbes, Descartes, and Pascal, and ultimately on the ancient human dream --- expressed in the myths of Talos and the Golem and the mechanical servants of every civilization --- of creating intelligence from mechanism. The long arc of history that began with those myths reaches, in the mid-twentieth century, a point of decisive practical realization.
“By 1950, for the first time in human history, the question ‘can machines think?’ had stopped being purely philosophical. It had become an engineering challenge --- and the race was on.”
But building the machine was only the beginning. The computer existed; the question of what to do with it --- how to make it reason, learn, perceive, and understand --- was still almost entirely open. The answers to that question would take decades to develop, and the development would be punctuated by moments of extraordinary optimism and painful disappointment, by breakthrough and setback, by confident predictions of imminent success and long winters of funding cuts and diminished expectations. That story --- the story of AI as a field, from its naming at Dartmouth in 1956 to the deep learning revolution of the present day --- begins in the next episode.
───
Next in the Series: Episode 6
The Birth of AI --- Dartmouth, Symbolic Reasoning, and the First Era of Optimism
The computers existed. The theoretical foundations were in place. Now someone had to declare, formally and publicly, that the project of making machines intelligent was a scientific discipline in its own right. In the summer of 1956, a small group of mathematicians and engineers gathered at Dartmouth College in New Hampshire for a workshop that gave a name to the field: artificial intelligence. We will meet the founders --- McCarthy, Minsky, Simon, Newell --- and trace the extraordinary early years of AI research: the Logic Theorist that proved mathematical theorems, the General Problem Solver that attempted to model human cognition, the checkers-playing programs that beat human champions, and the wave of optimism that led researchers to predict, with extraordinary confidence, that human-level machine intelligence was only years away. We will also trace the seeds of the disappointment that followed, and ask why the first AI summer ended in winter.
--- End of Episode 5 ---