Below is a short summary and detailed review of this video written by FutureFactual:
AI for Mathematics: Bottom-Up, Top-Down and Meta Mathematics
The talk surveys how artificial intelligence is changing mathematics along three intertwined paths: bottom-up, top-down, and meta mathematics. It traces a history from Descartes and Ada Lovelace to modern AI, explains how formal libraries such as Lean's Mathlib enable machine-verified proofs, and discusses AI-driven conjecture formation via the Birch test. It also explores the role of large language models in mathematical discovery and the FrontierMath benchmark, which pushes evaluation beyond traditional competition problems. Rich with personal anecdotes and historical milestones, the talk offers a nuanced view of both the promise and the limits of AI in mathematics.
Introduction and Context
The speaker opens with a light-hearted reflection on how AI dominates conversations about science today, even for mathematicians. The talk situates AI for mathematics as a growing, vibrant field with a community of hundreds of researchers and a rising number of results. The aim is to inform attendees where AI sits on the spectrum of hype versus practical impact and to present three main directions in which AI will reshape mathematical practice: bottom-up, top-down, and meta mathematics. The talk also points to a broader historical arc, from early automata and the Turing test to modern large language models, setting the stage for a nuanced discussion of what AI can and cannot do in mathematics.
Foundations: What is AI, What is Mathematics
To anchor the discussion, the speaker offers working definitions. In practice, AI here means methods built on neural networks, compositions of activation functions that approximate complex input-output mappings; universal approximation theorems guarantee that, given sufficient capacity, such networks can approximate a very broad class of functions. The historical timeline spans Descartes and Ada Lovelace, with Lovelace foreseeing computational creativity, Turing proposing a test to distinguish humans from machines, and the Dartmouth conference giving AI its name. The modern landscape features deep learning, language models, and the recognition that AI can now tackle a broad class of mathematical tasks, from pattern recognition to theorem-proving support.
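The universal-approximation idea mentioned above can be seen in a few lines of code. The following is an illustrative sketch, not code from the talk: a one-hidden-layer network with fixed random tanh features, whose output weights are fit by least squares, closely approximates a smooth target function on an interval.

```python
# Minimal sketch of universal approximation: random tanh features plus a
# linear least-squares fit approximate a smooth function. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)      # sample points on the interval
target = np.sin(3 * x)               # smooth function to approximate

# Hidden layer: fixed random weights and biases, tanh activation.
n_hidden = 100
W = rng.normal(scale=3.0, size=n_hidden)
b = rng.normal(scale=3.0, size=n_hidden)
H = np.tanh(np.outer(x, W) + b)      # (200, 100) feature matrix

# Output layer: solve a linear least-squares problem for the weights.
coef, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ coef

max_err = np.max(np.abs(approx - target))
print(f"max error: {max_err:.2e}")
```

Widening the hidden layer or retuning the random feature scale drives the error down further, which is the practical content of the approximation theorems the talk invokes.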
Mathematics itself is summarized in the Hardy quote as a language for patterns, a view that frames AI's potential to recognize and manipulate mathematical structures. The talk also notes a tension: foundational projects like Principia Mathematica aimed to axiomatize mathematics, but Gödel’s incompleteness theorem showed the inherent limits of such bottom-up formalization. This tension remains central as AI increasingly engages with mathematical reasoning.
Bottom-Up Mathematics: From Euclid to Lean
The bottom-up approach seeks to build mathematics from axioms upward, ensuring every step rests on a verified foundation. The Euclidean program of axiomatizing geometry is recalled, followed by the 20th-century attempt to formalize mathematics through Principia Mathematica. The speaker recounts personal reactions to reading Russell and Whitehead, emphasizing how the project demonstrates both the power and the impracticality of purely sentence-by-sentence formalization for human mathematicians. Gödel’s incompleteness theorem then presents a profound obstacle to a fully axiomatized system and to the dream of a complete formal proof engine that can settle all mathematical truths.
Nevertheless, computation did not disappear. In 1956 the Logic Theory Machine demonstrated early machine-assisted proving, mechanically deriving theorems from Principia Mathematica's axioms with the limited technology of the day. The modern counterpoint is the Lean proof assistant and its Mathlib library. The speaker highlights how Mathlib consolidates undergraduate and research-level mathematics into a coherent, machine-checkable corpus used for both education and research. The Xena project extends this by formalizing substantial portions of foundational mathematics, with hundreds of thousands of lines of code and tens of thousands of statements, yielding machine-verified proofs of results from Pythagoras' theorem upward. The talk stresses that this formalization is not merely an academic exercise; it is necessary infrastructure for robust AI-driven mathematics, providing a pristine target for AI-assisted reasoning and verification.
The role of Terence Tao and the Lean community is emphasized, with Tao describing a future where machine-assisted proofs co-exist with human proof strategy, enabling rapid checking of arguments and the exploration of new ideas within a formally verified framework. The speaker points to a growing corpus of undergraduate materials that have been formalized and the potential for AI to learn mathematical language patterns through interaction with these formal libraries. The bottom-up path thus becomes an ecosystem in which formal proof systems, large code bases, and AI techniques collaborate to advance mathematical knowledge reliably.
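To give a flavor of the bottom-up ecosystem described above, here is a minimal illustrative snippet (not taken from the talk) of what machine-checked mathematics looks like in Lean 4 with Mathlib: the statement is written formally, and the kernel verifies every step, with the `ring` tactic discharging the algebra.

```lean
-- A tiny example of a machine-verified statement over the reals.
-- Lean's kernel checks the proof; `ring` closes the algebraic goal.
import Mathlib.Data.Real.Basic

example (a b : ℝ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```

Every theorem in Mathlib is verified this way down to the axioms, which is what makes the library a reliable target for AI-assisted proof search.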
Top-Down Mathematics: Conjecture Formation and Discovery
Top-down mathematics treats mathematics as a field shaped by inquiry, intuition, and data-driven exploration. The speaker contrasts formalist and intuitionist philosophies: Hilbert’s formalism seeks strict derivations from axioms, while Brouwer’s intuitionism emphasizes human mental construction. Experimental mathematics is framed as a form of heuristic exploration that treats mathematics as an experimental science with cheap, repeatable experiments on blackboards and in software. The Arnold program is noted as a source of recognition for the value of collaborative, exploratory work. The speaker then introduces the Birch test as a rigorous criterion for AI-assisted conjecture formation: an automatic, interpretable, and non-trivial AI contribution that genuinely yields conjectures of interest to human mathematicians.
The discussion includes historical illustrations of top-down discovery: Gauss's prime-counting experiments leading to the prime number theorem, which combined hand-collected data with an early form of regression and ultimately required complex analysis for proof. The Millennium Prize Problems are cited as exemplars of conjectures born from deep mathematical data and insight, several in areas shaped by computational exploration. The talk emphasizes that AI can help identify promising conjectures, select important questions, and surface patterns that human researchers might otherwise miss. In practice, the Birch and Swinnerton-Dyer conjecture and related computations using the LMFDB (the L-functions and Modular Forms Database, with its extensive elliptic curve data) illustrate how AI-assisted data analysis can inspire new conjectures, even when the initial AI results do not themselves supply a complete proof. The Birch test thus serves as a benchmark for an AI system that can autonomously propose non-trivial, interpretable mathematical ideas that pique human interest and lead to rigorous follow-up work.
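Gauss's experiment is easy to reproduce today. The sketch below (an illustration of the data-driven exploration the talk describes, not code from it) counts primes up to x with a sieve and compares pi(x) with the estimate x / ln(x); the ratio drifts toward 1, the content of the prime number theorem.

```python
# Reproducing Gauss's prime-counting experiment: compare pi(x) with x/ln(x).
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = b"\x00" * len(sieve[p * p::p])
    return [i for i in range(n + 1) if sieve[i]]

for x in (10**2, 10**4, 10**6):
    pi_x = len(primes_up_to(x))
    estimate = x / math.log(x)
    print(x, pi_x, round(pi_x / estimate, 3))      # ratio approaches 1
```

This is exactly the kind of cheap, repeatable experiment the talk frames as the heuristic side of top-down mathematics.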
Meta Mathematics: Large Language Models and Discovery Benchmarks
Meta mathematics concerns AI that operates outside traditional mathematical practice, leveraging language models for reasoning, explanation, and cross-disciplinary integration. The Turing test milestone is revisited in the context of large language models that pass a practical form of human-machine indistinguishability, as demonstrated by ChatGPT's capabilities. The talk discusses the International Mathematical Olympiad (IMO) and a recent wave of AI systems achieving medal-level performance, including on geometry problems that were historically hard to automate. FrontierMath, a benchmark developed by Epoch AI in collaboration with OpenAI, is introduced as an effort to evaluate language models on problems ranging from IMO-style questions to research-level mathematics. Its tiered evaluation, especially the Tier 4 problems, demands high-precision numerical answers and therefore rigorous, exact reasoning. The speaker presents a provocative Tier 4 result in which language models achieved around 10 percent success on very difficult problems, showing that current models can solve non-trivial mathematical questions with verifiable outcomes. The narrative notes ongoing debate over whether language models truly reason or merely pattern-match across vast corpora, while acknowledging that they can still surface novel conjectures and assist in high-level problem solving. The FrontierMath results point to a future in which AI acts as a genuine co-researcher, proposing questions and contributing to proofs under human oversight.
Personal Journey and Practical Realities
The speaker shares a personal origin story: in 2017, during paternity leave, he turned to machine learning to cope with sleepless nights and used neural networks to predict invariants of geometries known as Calabi–Yau manifolds. That experience led to the 2017 paper Machine Learning the String Landscape and, subsequently, to textbooks and collaborative work on AI for pure mathematics. The talk recounts the field's evolution from a nascent effort to an established research area with hundreds of papers, textbooks, and a growing set of formal tools such as Lean and Mathlib that enable reproducible, verifiable mathematical work. The importance of credible, citable AI-generated content is highlighted, as is the need for robust governance and reproducibility in AI-assisted mathematics infrastructure.
Impact, Funding, and the Road Ahead
The talk addresses the funding landscape, noting hesitancy and conservatism in grant reviews. It argues for dedicated centers for AI in mathematics, citing initiatives such as BIMSA in Beijing and a Center for AI for Mathematics in the United States, while noting the UK's lack of a centralized counterpart. The speaker advocates sustained investment in a globally unified platform where AI tools, formal libraries, and high-quality mathematical content coexist. The concluding message emphasizes human-AI collaboration as the future of mathematical discovery, spanning bottom-up precision, top-down conjectural insight, and meta mathematical reasoning through intelligent language models. The talk ends with an invitation to support and participate in this direction, envisioning a future in which AI enhances the trustworthiness, efficiency, and creativity of mathematical practice.


