Buildings in the Desert and Corridors of Logic: How Algebraic Geometry (Scheme Theory) Is Shaping the Future of AI
Subtitle: Étale Sheaves, Category Theory, and Sheaf Neural Networks — A Complete Guide at Four Levels, from High School to Graduate School
Prologue: Why You Should Read This Article
In 2025, the words "Sheaf," "Categorical," and "Algebraic Geometry" are appearing with increasing frequency in papers accepted at NeurIPS and ICML.
This is no coincidence.
If you are a machine learning engineer or an AI research scientist, this article concerns your career.
If you have ever implemented a GNN in PyTorch, you have probably struggled with oversmoothing. If you have ever tried to improve Transformer attention, you have probably been unable to explain mathematically why a particular weighting scheme works.
If you have ever stared at the scaling laws of LLMs and wondered why increasing parameters improves performance, you are not alone.
Algebraic geometry is beginning to answer these questions.
In 2022, Bodnar et al. presented "Neural Sheaf Diffusion" at NeurIPS, bringing sheaf theory, which dates back to the 1940s, into GNNs and addressing both oversmoothing and heterophily within a single framework.
In 2024, Gavranović et al. presented "Categorical Deep Learning" at ICML, showing that CNNs, RNNs, Transformers, and GNNs can all be described uniformly as monad algebras in category theory.
Sumio Watanabe's Singular Learning Theory (2009) analyzes the "landscape" of neural network loss functions as algebraic varieties, providing a mathematical explanation for the mystery of deep learning generalization.
All of these originate from a single branch of mathematics: algebraic geometry (scheme theory).
Yet algebraic geometry is known as "one of the most difficult fields in all of mathematics."
Hartshorne's textbook is a struggle even for graduate students. Grothendieck's EGA runs to thousands of pages. The temptation to think "this has nothing to do with me" is understandable.
This article was written to break down that wall.
Starting from the "buildings in the desert" metaphor that even a high school student can follow, the article is designed to reach, in four progressive levels, the mathematical rigor that a graduate student needs to read the primary literature. Readers at each level need only follow the paragraphs marked with their icon (🏫🎓📐🔬).
After reading this article, you will be able to understand the following papers not just at the abstract level, but at the level of the main text:
- Bodnar et al., "Neural Sheaf Diffusion" (NeurIPS 2022)
- Barbero et al., "Sheaf Attention Networks" (NeurIPS 2022 Workshop, Oral)
- Gavranović et al., "Categorical Deep Learning" (ICML 2024)
- Watanabe, Algebraic Geometry and Statistical Learning Theory (Cambridge, 2009)
The era in which AI engineers who know algebraic geometry overtake those who do not has begun. This article is the shortest route to closing that gap.
Reader Level Guide
This article uses four icons to indicate the reader level. You do not need to read every paragraph — simply follow those marked with your icon.
| Icon | Target Reader | Prerequisites |
|---|---|---|
| 🏫 | High school students / beginners | High school math (quadratic equations, basic set theory) |
| 🎓 | Undergraduate general education (years 1–2) | Linear algebra, calculus, basic set theory |
| 📐 | Undergraduate upper division (year 3+) | Group theory, ring theory, basic point-set topology |
| 🔬 | Graduate students / researchers | Commutative algebra, homological algebra, basic category theory |
Chapter 1: Seven Reasons AI Engineers Should Learn Algebraic Geometry (Scheme Theory)
The fundamental question "why does deep learning work?" has reached a stage where linear algebra and probability theory alone can no longer provide answers.
Regularization of overfitting, breaking through the limits of GNNs, a unified theory of architectures — many of the breakthroughs at the frontier of 2020s AI research are rooted in tools that pure mathematicians forged over the 19th and 20th centuries to study "prime numbers" and "symmetries of equations": scheme theory, étale cohomology, and category theory.
In mathematics, solving the same equation over different prime worlds (the finite fields $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$) and bundling the symmetries of the solutions as "sheaves" led to the resolution of long-standing problems such as Fermat's Last Theorem and the Weil conjectures.
In AI, Sheaf Neural Networks — which equip each node of a graph with its own vector space and learn a "translation rule" for each edge — have broken through the limitations of conventional GNNs.
The structural parallels between these two worlds are striking.
The seven reasons below show why this shared structure has practical value for AI engineers.
Reason 1: The true nature of "overfitting" is being mathematically revealed as singularities of algebraic varieties. Sumio Watanabe's Singular Learning Theory (SLT, 2009) analyzes the "landscape" of neural network loss functions as algebraic varieties, explaining why deep learning generalizes well even though it "should theoretically overfit."
Reason 2: The limitations of Graph Neural Networks (GNNs) have been broken through by "sheaves." The Sheaf Neural Networks (SNNs) of Bodnar et al. (2022) introduced "cellular sheaf theory" from algebraic topology into GNNs, dramatically improving accuracy on heterophilic graphs (graphs with many edges between dissimilar nodes) — a known weakness of conventional GNNs.
Reason 3: A "unified theory" of AI architectures is being written in category theory. Gavranović et al. (2024, ICML), in "Categorical Deep Learning," showed that all architectures — CNNs, RNNs, Transformers, GNNs — can be uniformly described as monad algebras in category theory.
Reason 4: "Solving equations in the world of prime numbers" — seemingly unrelated to AI — has become a tool for discovering "hidden symmetries" in data. The Galois action structure of étale sheaves provides AI with a mathematical framework for discovering invisible rotational symmetries in graph data.
Reason 5: The way the étale topology overcomes the "coarseness" of the Zariski topology is structurally parallel to solving the "oversmoothing problem" in GNNs. Oversmoothing, where stacking GNN layers makes node features uniform, resembles the Zariski topology problem of "open sets being too large to distinguish local features."
Reason 6: A century of accumulated results in arithmetic geometry can be reused as "geometry of discrete data." Solving equations over the finite fields $\mathbb{F}_3$, $\mathbb{F}_5$, $\mathbb{F}_7$, $\mathbb{F}_{13}$, $\mathbb{F}_{17}$ is a "geometry of discrete worlds," and that is precisely the geometry of graphs (discrete data).
Reason 7: The world's top AI researchers (DeepMind, Meta AI) are intensively studying algebraic geometry and category theory. Petar Veličković (DeepMind), Michael Bronstein (Oxford/Twitter), Bruno Gavranović (Strathclyde), and other leading figures in AI are publishing papers on algebraic geometry and category theory in rapid succession.
Chapter 2: Overview of This Article
What Each Level of Reader Will Gain
| Level | What you will gain from this article |
|---|---|
| 🏫 High school | The name "algebraic geometry" stops being intimidating. You can see the landscape of "buildings in the desert." You feel that the cutting edge of AI is a "mathematical puzzle" |
| 🎓 General education | You intuitively understand Zariski topology, étale topology, and sheaves as "safety guarantees for reciprocals" and "overlays of transparent sheets." Mathematical terms become meaningful words |
| 📐 Junior/Senior | You can relate ring localization, étale morphisms, stalks, and sheaf Laplacians to AI implementation code (PyTorch). Useful for graduate entrance exams |
| 🔬 Graduate | You can understand the mathematical relationship between étale cohomology and Sheaf Neural Networks at the level of Bodnar et al. (2022) and Gavranović et al. (2024) |
Article Structure
Chapter 1: Seven reasons AI engineers should learn scheme theory
Chapter 2: Overview of this article (you are here)
Chapter 3: The landscape of scheme theory — buildings in the desert, corridors, and the sky castle
Chapter 4: How AI is incorporating scheme theory — a complete guide to the literature
Chapter 5: Watanabe's Singular Learning Theory (SLT) in detail
Chapter 6: The connection with category theory
Chapter 7: References
Chapter 8: Study roadmap
Chapter 9: Outlook — the future at the intersection of mathematics and AI
Chapter 3: The Landscape of Scheme Theory — Buildings in the Desert, Corridors, and the Sky Castle
This chapter is the core of the article.
We solve the same algebraic equation over different "number worlds" (the finite fields $\mathbb{F}_3$, $\mathbb{F}_5$, $\mathbb{F}_7$, $\mathbb{F}_{13}$, $\mathbb{F}_{17}$, the integers $\mathbb{Z}$, the rationals $\mathbb{Q}$, the reals $\mathbb{R}$, and the complex numbers $\mathbb{C}$) and depict the landscapes that emerge, at four levels.
3.1 There Is More Than One "World of Numbers"
🏫 For high school students
The "numbers" you know live in the world of the real numbers $\mathbb{R}$. $1, 2, 3, \ldots$, $\frac{1}{2}$, $\pi$, $\sqrt{2}$ — they all line up on the number line.
But mathematicians have many different "number worlds." The most surprising of these are finite fields.
In the world of $\mathbb{F}_3$, there are only three numbers: $0, 1, 2$. And $2 + 1 = 0$ (the rule is: compute the remainder after dividing by 3).
In the world of $\mathbb{F}_5$, there are five numbers: $0, 1, 2, 3, 4$. And $3 + 4 = 2$ (remainder after dividing by 5).
In $\mathbb{F}_7$: $0, 1, 2, 3, 4, 5, 6$.
In $\mathbb{F}_{13}$: $0, 1, \ldots, 12$.
In $\mathbb{F}_{17}$: $0, 1, \ldots, 16$.
You can add, multiply, and even divide (by anything except 0). But there are only finitely many numbers. Strange, isn't it?
🎓 For undergraduates (general education)
The finite field $\mathbb{F}_p$ (where $p$ is prime) is $\mathbb{Z}/p\mathbb{Z}$ — the set of remainders when integers are divided by $p$ — equipped with addition and multiplication. Because $p$ is prime, every nonzero element has a multiplicative inverse, giving it the structure of a field.
Concrete examples:
- $\mathbb{F}_3 = \{0, 1, 2\}$: $2 \times 2 = 4 \equiv 1 \pmod{3}$, so $2^{-1} = 2$
- $\mathbb{F}_5 = \{0, 1, 2, 3, 4\}$: $3 \times 2 = 6 \equiv 1 \pmod{5}$, so $3^{-1} = 2$
- $\mathbb{F}_7 = \{0, 1, 2, 3, 4, 5, 6\}$: $5 \times 3 = 15 \equiv 1 \pmod{7}$, so $5^{-1} = 3$
- $\mathbb{F}_{13} = \{0, 1, \ldots, 12\}$: $7 \times 2 = 14 \equiv 1 \pmod{13}$, so $7^{-1} = 2$
- $\mathbb{F}_{17} = \{0, 1, \ldots, 16\}$: $10 \times 12 = 120 \equiv 1 \pmod{17}$, so $10^{-1} = 12$
These finite fields are entirely different "number worlds" from $\mathbb{R}$ or $\mathbb{C}$. Yet in every one of them, we can "solve equations."
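The inverses in the list above can be checked mechanically. The following is a minimal sketch in plain Python; the helper name `inverse_mod` is an illustrative choice, and it relies on Fermat's little theorem ($a^{p-1} \equiv 1 \pmod{p}$ for prime $p$ and $a \not\equiv 0$), which the text has not introduced.

```python
def inverse_mod(a: int, p: int) -> int:
    """Return the multiplicative inverse of a in F_p (p prime, a not divisible by p)."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no inverse in F_p")
    # Fermat's little theorem: a^(p-1) = 1 (mod p), so a^(p-2) is the inverse of a.
    return pow(a, p - 2, p)


# The five (a, p) pairs from the list above.
examples = [(2, 3), (3, 5), (5, 7), (7, 13), (10, 17)]
inverses = {(a, p): inverse_mod(a, p) for a, p in examples}

for (a, p), inv in inverses.items():
    print(f"In F_{p}: {a}^(-1) = {inv}, check: {a} * {inv} mod {p} = {a * inv % p}")
```

Each printed check ends in `= 1`, which is exactly what it means for `inv` to be the inverse of `a` in $\mathbb{F}_p$.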
📐 For juniors/seniors
$\mathbb{F}_p$ is the prime field of characteristic $p$, and its algebraic closure $\overline{\mathbb{F}_p}$ is constructed as
$$
\bigcup_{n \ge 1} \mathbb{F}_{p^n}.
$$
$\mathrm{Spec}\,\mathbb{Z}$ is a topological space whose "points" are the prime ideals $(p)$ ($p = 2, 3, 5, 7, 11, 13, 17, \ldots$) together with the zero ideal $(0)$.
The "residue field" at each prime ideal $(p)$ is $\mathbb{F}_p$, and the residue field at the zero ideal $(0)$ is $\mathbb{Q}$.
🔬 For graduate students
$\mathrm{Spec}\,\mathbb{Z}$ is a 1-dimensional Noetherian integral scheme with closed points $(p)$ corresponding to each prime $p$ and a generic point $(0)$.
The stalk of the structure sheaf $\mathcal{O}_{\mathrm{Spec}\,\mathbb{Z}}$ at $(p)$ is $\mathbb{Z}_{(p)}$ (the localization at $(p)$), and the stalk at the generic point $(0)$ is $\mathbb{Q}$.
$\mathrm{Spec}\,\mathbb{Z}$ is the most fundamental base space in arithmetic geometry: the spectrum of every number field lies over the generic point $(0)$, and the spectrum of every finite field lies over a closed point $(p)$.
3.2 Solving the Same Equation in Different Worlds
🏫 For high school students: Buildings in the desert
Now the mathematical "landscape" comes into view.
Picture this: a vast, endless desert. Scattered across the desert, at regular intervals, stand buildings.
- At location "3" stands the $\mathbb{F}_3$ building (3 stories: floors $0, 1, 2$)
- At location "5" stands the $\mathbb{F}_5$ building (5 stories: floors $0, 1, 2, 3, 4$)
- At location "7" stands the $\mathbb{F}_7$ building (7 stories)
- At location "13" stands the $\mathbb{F}_{13}$ building (13 stories)
- At location "17" stands the $\mathbb{F}_{17}$ building (17 stories)
- ...(one building for every prime number)
And far beyond the horizon — at infinite distance — lies the world of the reals $\mathbb{R}$. Beyond even that, the world of the complex numbers $\mathbb{C}$.
This entire desert is "$\mathrm{Spec}\,\mathbb{Z}$": the map of number worlds.
Now, let us shout a single equation across the desert:
$$x^2 - 2 = 0$$
"What $x$ satisfies $x^2 = 2$?"
Each building gives a different answer.
- The $\mathbb{F}_3$ building (location 3): Does any $x$ satisfy $x^2 \equiv 2 \pmod{3}$? $0^2 = 0$, $1^2 = 1$, $2^2 = 4 \equiv 1$. None equal 2! No solution. The building is dark.
- The $\mathbb{F}_5$ building (location 5): Full search: $0^2 = 0$, $1^2 = 1$, $2^2 = 4$, $3^2 = 9 \equiv 4$, $4^2 = 16 \equiv 1$. None equal 2! No solution. Also dark.
- The $\mathbb{F}_7$ building (location 7): $3^2 = 9 \equiv 2$! And $4^2 = 16 \equiv 2$! Two solutions: $x = 3$ and $x = 4$. The lights on floors 3 and 4 switch on.
- The $\mathbb{F}_{13}$ building (location 13): Full search: $0^2 = 0$, $1^2 = 1$, $2^2 = 4$, $3^2 = 9$, $4^2 \equiv 3$, $5^2 \equiv 12$, $6^2 \equiv 10$, $7^2 \equiv 10$, $8^2 \equiv 12$, $9^2 \equiv 3$, $10^2 \equiv 9$, $11^2 \equiv 4$, $12^2 \equiv 1$. None equal 2! No solution. Dark.
- The $\mathbb{F}_{17}$ building (location 17): $6^2 = 36 \equiv 2 \pmod{17}$! And $11^2 = 121 \equiv 2$! Two solutions: $x = 6$ and $x = 11$. Floors 6 and 11 light up.
Beyond the horizon (the world of $\mathbb{R}$), the answers are, as you know, $x = \sqrt{2}$ and $x = -\sqrt{2}$.
Looking down on the desert from above, some buildings are lit, some are dark. The pattern of illumination (which floors are lit) differs from building to building. This "pattern of light and shadow" is the code that arithmetic geometers have spent over a century deciphering.
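The "full search" carried out building by building above is easy to automate. Here is a small sketch in plain Python; `roots_mod_p` is a hypothetical helper name, and the polynomial is passed as a coefficient list with the highest degree first.

```python
def roots_mod_p(coeffs, p):
    """Return the sorted roots in F_p of the polynomial with the given
    coefficients (highest degree first), found by trying every element."""
    def evaluate(x):
        value = 0
        for c in coeffs:              # Horner's rule, reducing mod p at each step
            value = (value * x + c) % p
        return value

    return [x for x in range(p) if evaluate(x) == 0]


# x^2 - 2 has coefficients [1, 0, -2].
lit_floors = {p: roots_mod_p([1, 0, -2], p) for p in (3, 5, 7, 13, 17)}

for p, floors in lit_floors.items():
    status = f"lit floors {floors}" if floors else "dark"
    print(f"F_{p} building: {status}")
```

The dictionary `lit_floors` is the "pattern of light and shadow" for these five buildings; changing the coefficient list lets you shout a different equation across the desert.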
🎓 For undergraduates: "Forbidden zones" and "safety guarantees"
Now that the desert buildings are visible, let us discuss "Zariski topology."
What is a "topology" in the first place? A topology is a rule for declaring which sets are "open." The open interval $(0, 1)$ from high school is an "open set" — no matter which point you pick inside it, you have room to move slightly and still remain inside.
What is the Zariski topology? In algebraic geometry, "closed sets" are defined as the zero sets (roots) of polynomials.
Consider the polynomial $f(x) = x^2 - 1$.
- The solutions of $f(x) = 0$ are $x = 1$ and $x = -1$: just two points.
- Closed set (forbidden zone): $\{1, -1\}$ (only two points)
- Open set (safe zone): everything except $\{1, -1\}$
Why make zones "forbidden"? To create a "safe zone (paradise)."
Suppose you want to use the function $g(x) = \frac{1}{x^2 - 1}$. At $x = 1$ or $x = -1$, the denominator is zero and the computation explodes (division-by-zero error). But if you declare $\{1, -1\}$ "off limits," then in the remaining space this function is guaranteed to be safely computable.
In other words, "forbidden" is the preparation for issuing a "license of safety."
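To make the "forbidden zone / safe zone" split concrete, here is a sketch that replaces the number line by the finite world $\mathbb{F}_7$, so that every point can be listed. The choice of $\mathbb{F}_7$ and the names `closed_set`, `open_set`, and `g` are assumptions made for illustration; in $\mathbb{F}_7$ the point $-1$ appears as $6$.

```python
P = 7  # assumption: a small prime world standing in for the number line


def f(x):
    """The polynomial f(x) = x^2 - 1, evaluated in F_7."""
    return (x * x - 1) % P


closed_set = [x for x in range(P) if f(x) == 0]   # forbidden zone: where f vanishes
open_set = [x for x in range(P) if f(x) != 0]     # safe zone: everything else


def g(x):
    """g(x) = 1 / (x^2 - 1) in F_7, defined only on the safe zone."""
    if f(x) == 0:
        raise ZeroDivisionError(f"x = {x} lies in the forbidden zone")
    return pow(f(x), P - 2, P)   # inverse via Fermat's little theorem


g_values = {x: g(x) for x in open_set}
print("forbidden zone:", closed_set)
print("safe zone:", open_set)
print("g on the safe zone:", g_values)
```

On the safe zone `g` never raises, which is the "license of safety" in miniature: removing the zero set of the denominator is exactly what makes the reciprocal well defined.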
Here lies a problem. The open sets of the Zariski topology are too large. Over $\mathbb{C}$ in one variable (the affine line $\mathbb{A}^1_\mathbb{C}$), a nonzero polynomial has only finitely many roots (at most its degree), so every nonempty Zariski open set is "the entire space minus finitely many points."
It is like a "camera with no zoom." Everywhere you look, you see "the whole space minus a few points," and you cannot examine any particular location in detail.
Note (📐 for juniors and above): The description "the whole space minus finitely many points" applies to $\mathbb{A}^1$ (the affine line, dimension 1). In $\mathbb{A}^n$ ($n \geq 2$), Zariski closed sets can be curves or surfaces, not just finitely many points. For example, $V(x^2 + y^2 - 1)$ in $\mathbb{A}^2_\mathbb{C}$ is a circle — a 1-dimensional closed set.
🏫🎓 Étale topology: Overlaying "transparent sheets" on the desert
To solve the "no zoom" problem of the Zariski topology, mathematicians invented the étale topology.
Returning to the desert metaphor:
The Zariski topology is like drawing lines directly on the desert floor with a marker to demarcate "Zone A goes from here to here." You can only cut out a piece of the ground to examine (inclusion: $U \subset X$).
The étale topology is like overlaying transparent tracing paper (sheets) on top of the desert floor. The sheet is a different piece of paper from the ground, but when laid on top, it aligns perfectly.
Why are sheets necessary?
Consider the equation $x^2 = 4$. The answers are $x = 2$ and $x = -2$ — two solutions.
Standing on the ground (the base space) and staring at the point "4," you feel unsettled because there are two answers. So you build a two-story structure above the ground:
- Floor 2 (Sheet 1): at every location $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$, only the "plus-side answer" lives here
- Floor 1 (Sheet 2): only the "minus-side answer" lives here
Within a single floor, the answer is unique. As long as you stay on floor 2, there is no confusion.
"Overlaying sheets" means creating as many parallel worlds (layers) as there are variations of the answer, and stacking them on top of each other.
🏫 Key point for high school students: Instead of asking "is it contained inside?" (inclusion $\subset$), we ask "does it lie on top?" (covering). This is the essence of the étale topology.
📐 For juniors/seniors: Étale morphisms and "liberation from the constraint of inclusion"
This is the most important conceptual shift for those heading to graduate school.
In the Zariski topology, an open set $U$ is a subset of $X$, and only inclusion maps $U \hookrightarrow X$ are allowed.
The étale topology abandons this constraint of "inclusion."
Definition (Étale morphism): A morphism of schemes $f: Y \to X$ is étale if $f$ is simultaneously:
- Flat — algebraically, "not collapsed"
- Unramified — "not branching" (the derivative does not vanish)
- Locally of finite presentation — "describable by finitely many equations"
Intuitively, it is "the algebraic version of a local homeomorphism."
📐 Note: Many introductory texts omit the third condition and define étale as "flat + unramified," but strictly speaking it is required. Over locally Noetherian schemes, it follows automatically from "locally of finite type," which is why it is often omitted.
Why abandon inclusion?
Consider the map $\mathbb{A}^1 \to \mathbb{A}^1$, $x \mapsto x^2$. At $x = 0$, the derivative $2x$ is zero, so the map is ramified and not étale. However, restricted to $\mathbb{A}^1 \setminus \{0\} \to \mathbb{A}^1 \setminus \{0\}$, this map becomes an étale morphism, "two sheets covering one ground" (over a field of characteristic $\neq 2$).
In the Zariski topology, these "two sheets" cannot be described as a "subset of one ground." The sheets are not "inside" the ground — they "cover" the ground from above.
In the étale topology, such "morphisms that are not inclusions but locally look the same" are admitted as "open sets." This allows us to handle situations where the solutions of equations branch, in a geometrically natural way.
Important clarification: It is sometimes said that "the étale topology has no inclusion relations," but this is an oversimplification. More precisely, it means "we allow more general morphisms, not limited to inclusions (subset inclusion maps)." Even in the étale site, there is structure arising from composition of morphisms.
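The squaring map discussed above can be probed numerically over a finite field. The sketch below is only a shadow of the scheme-theoretic statement: it looks at $\mathbb{F}_p$-valued points alone, so a fiber that appears empty here actually has two points over the extension $\mathbb{F}_{p^2}$. The function names are illustrative.

```python
def squaring_fibers(p):
    """Fibers of x -> x^2 on the nonzero elements of F_p:
    for each nonzero y, the list of nonzero x with x^2 = y."""
    fibers = {y: [] for y in range(1, p)}
    for x in range(1, p):
        fibers[x * x % p].append(x)
    return fibers


def derivative_vanishes_somewhere(p):
    """True if the derivative 2x of x^2 is zero at some nonzero x in F_p."""
    return any((2 * x) % p == 0 for x in range(1, p))


fibers_7 = squaring_fibers(7)
fiber_sizes = sorted({len(xs) for xs in fibers_7.values()})

print("fibers over F_7 minus 0:", fibers_7)
print("derivative vanishes off the origin in F_7:", derivative_vanishes_somewhere(7))
print("derivative vanishes off the origin in F_2:", derivative_vanishes_somewhere(2))
```

In $\mathbb{F}_7$ every visible fiber has exactly two points ("two sheets") or none, and $2x$ never vanishes away from $0$. In characteristic 2 the derivative vanishes identically, which is why the text restricts to characteristic $\neq 2$.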
🔬 For graduate students: Grothendieck topologies and sites
The étale topology is, precisely speaking, not a "topology" (a family of open sets) but a "site."
Definition (Étale site): For a scheme $X$, the étale site $X_{\text{ét}}$ over $X$ is a category defined by:
- Objects: all étale morphisms $U \to X$ over $X$
- Coverings: families of étale morphisms $\{U_i \to U\}_{i \in I}$ such that $\coprod_i U_i \to U$ is surjective
In the Zariski site, coverings are families of open immersions; in the étale site, coverings are families of étale morphisms. This allows extraction of cohomological information (e.g., $\ell$-adic cohomology, essential for the proof of the Weil conjectures) that the Zariski topology cannot capture.
3.3 Elevators and the Sky Castle — How to Tell Whether Two Solutions Are "of the Same Kind"
This section is one of the most important parts of the article. "How can we compare solutions from different buildings (different finite fields $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$)?" We explain the concrete mechanism at all four levels, holding nothing back.
🏫 For high school students: Elevators inside the buildings
Let us return to the landscape of desert buildings.
When we shouted the equation $x^2 - 2 = 0$, floors 3 and 4 lit up in the $\mathbb{F}_7$ building (location 7), and floors 6 and 11 lit up in the $\mathbb{F}_{17}$ building (location 17).
Now the mathematician asks: "Are floor 3 of the $\mathbb{F}_7$ building and floor 6 of the $\mathbb{F}_{17}$ building of the same kind?"
This is a puzzling question. The "3" in $\mathbb{F}_7$ and the "6" in $\mathbb{F}_{17}$ are numbers from entirely different worlds. It seems impossible to compare them.
Yet mathematicians discovered a way to compare them. The method is: "Compare how the elevators (staircases) installed in each building move."
Step 1: The elevator in each building
In each building, an elevator is installed between the lit floors. This elevator travels only between the lit floors.
- The $\mathbb{F}_7$ building: Floors 3 and 4 are lit. The elevator goes "floor 3 → floor 4" and "floor 4 → floor 3." There are only two buttons.
- The $\mathbb{F}_{17}$ building: Floors 6 and 11 are lit. The elevator goes "floor 6 → floor 11" and "floor 11 → floor 6."
These elevators actually move by the same mathematical rule. In the $\mathbb{F}_7$ building: $3 + 4 = 7 \equiv 0 \pmod{7}$, i.e., $4 \equiv -3 \pmod{7}$. In the $\mathbb{F}_{17}$ building: $6 + 11 = 17 \equiv 0 \pmod{17}$, i.e., $11 \equiv -6 \pmod{17}$.
In both buildings, the elevator operates by the same rule: "send $x$ to $-x$"!
The set of all rules by which the elevator shuffles lit floors — that is the "Galois group."
For $x^2 - 2 = 0$, the Galois group consists of two operations: "do nothing" and "swap $x$ and $-x$" (in mathematics, written $\mathbb{Z}/2\mathbb{Z}$).
Step 2: The sky castle — the control tower for all building elevators
High above the desert — far higher than any building — a "sky castle" floats.
Inside this castle is a master elevator control panel. The elevator rule in the castle ("send $x$ to $-x$") is the prototype of the "common rule" for all building elevators.
When the castle shines laser beams down to each building:
- To the $\mathbb{F}_3$ building: No lit floors on the ground level ($\mathbb{F}_3$). However, this building actually has a hidden room (the world of $\mathbb{F}_9 = \mathbb{F}_{3^2}$), where two solutions are hiding. The elevator operates inside the hidden room, swapping the two hidden solutions.
- To the $\mathbb{F}_5$ building: Similarly, no light on the ground floor. Hidden room ($\mathbb{F}_{25}$) has an elevator.
- To the $\mathbb{F}_7$ building: Floors 3 and 4 light up. The castle's rule "$x \to -x$" manifests in this building as "$3 \to 4$, $4 \to 3$." Since the solutions are visible on the ground floor, the elevator's movement is directly observable.
- To the $\mathbb{F}_{13}$ building: No light on the ground floor. Hidden room ($\mathbb{F}_{169}$) has an elevator.
- To the $\mathbb{F}_{17}$ building: Floors 6 and 11 light up. The castle's rule "$x \to -x$" manifests as "$6 \to 11$, $11 \to 6$."
Key point: Every building has an elevator (a Galois group action). The difference is whether the elevator's motion is "visible on the ground floor" (splitting primes: $p = 7, 17$) or "visible only in the hidden room" (inert primes: $p = 3, 5, 13$).
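Whether a building shows its solutions of $x^2 = 2$ on the ground floor can be decided without searching every floor. The sketch below uses Euler's criterion, a standard fact not stated in the text: for an odd prime $p$, the number $2$ is a square modulo $p$ exactly when $2^{(p-1)/2} \equiv 1 \pmod{p}$. The function name and the labels "split" and "inert" follow the key point above.

```python
def two_is_a_square(p):
    """Euler's criterion for an odd prime p:
    2 is a square mod p iff 2^((p-1)/2) = 1 (mod p)."""
    return pow(2, (p - 1) // 2, p) == 1


classification = {
    p: "split" if two_is_a_square(p) else "inert"
    for p in (3, 5, 7, 13, 17)
}

for p, kind in classification.items():
    where = "ground floor" if kind == "split" else "hidden room"
    print(f"p = {p}: {kind} (elevator visible in the {where})")
```

The output reproduces the split between $p = 7, 17$ and $p = 3, 5, 13$ given above, using a single modular exponentiation per building.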
Step 3: How corridors are connected — the rule for determining "same kind"
Here is the crux. What is the rule for judging floor 3 of the $\mathbb{F}_7$ building and floor 6 of the $\mathbb{F}_{17}$ building to be "of the same kind" and connecting them with a corridor?
Rule: "Connect solutions that occupy the same position in the sky castle's elevator."
The castle's elevator has "Position A" and "Position B" (corresponding to the two solutions).
- Position A = "the $+\sqrt{2}$ lineage"
- Position B = "the $-\sqrt{2}$ lineage"
In the $\mathbb{F}_7$ building, 3 corresponds to the $+\sqrt{2}$ lineage (Position A), and 4 to the $-\sqrt{2}$ lineage (Position B). In the $\mathbb{F}_{17}$ building, 6 is Position A, and 11 is Position B.
Therefore:
- Sheet 1 (Corridor A): $\mathbb{F}_7$ floor 3 ↔ $\mathbb{F}_{17}$ floor 6 (both Position A = $+\sqrt{2}$ lineage)
- Sheet 2 (Corridor B): $\mathbb{F}_7$ floor 4 ↔ $\mathbb{F}_{17}$ floor 11 (both Position B = $-\sqrt{2}$ lineage)
🏫🎓📐🔬 When there are 3 or more solutions — how does the elevator move?
For $x^2 - 2 = 0$, which has only 2 solutions, the elevator simply "goes back and forth between two floors." What happens for equations with 3, 4, or 5 solutions?
This is where the true excitement of algebraic geometry begins.
🏫 For high school students: A building with 3 lit floors — $x^3 = 1$
Let us shout the equation $x^3 - 1 = 0$ ("what number cubed gives 1?") across all the desert buildings.
The $\mathbb{F}_7$ building (location 7) has 3 lit floors: $x = 1, 2, 4$. Verification:
- $1^3 = 1 \equiv 1 \pmod{7}$ ✓
- $2^3 = 8 \equiv 1 \pmod{7}$ ✓
- $4^3 = 64 \equiv 1 \pmod{7}$ ✓
The $\mathbb{F}_{13}$ building (location 13) has 3 lit floors: $x = 1, 3, 9$:
- $1^3 = 1$ ✓
- $3^3 = 27 \equiv 1 \pmod{13}$ ✓
- $9^3 = 729 \equiv 1 \pmod{13}$ ✓
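The $\mathbb{F}_5$ building (location 5) has only 1 lit floor: $x = 1$. The other two solutions are hiding in the hidden room ($\mathbb{F}_{25}$). The same happens in the $\mathbb{F}_{17}$ building (hidden room $\mathbb{F}_{289}$). The $\mathbb{F}_3$ building is a special case: there $x^3 - 1 = (x - 1)^3$, so $x = 1$ is the only solution, even in the hidden rooms.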
The $\mathbb{F}_7$ building has 3 lit floors (1, 2, 4). How does the elevator move among these three floors?
🏫 How the 3-floor elevator works: the "round-and-round elevator"
Look carefully at the three lit floors $1, 2, 4$ in the $\mathbb{F}_7$ building:
- $1 \times 2 = 2$ (floor 1 → floor 2)
- $2 \times 2 = 4$ (floor 2 → floor 4)
- $4 \times 2 = 8 \equiv 1$ (floor 4 → floor 1!)
The operation "multiply by 2" cycles through 1 → 2 → 4 → 1 → 2 → 4 → ... round and round!
The $\mathbb{F}_{13}$ building has the same structure:
- $1 \times 3 = 3$ (floor 1 → floor 3)
- $3 \times 3 = 9$ (floor 3 → floor 9)
- $9 \times 3 = 27 \equiv 1$ (floor 9 → floor 1!)
The operation "multiply by 3" cycles 1 → 3 → 9 → 1 → ... round and round.
This is the cyclic elevator. Unlike the "just goes up and down" elevator for $x^2 - 2 = 0$, this one rotates in one direction through three floors.
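The "round-and-round" motion described above is just repeated multiplication modulo $p$. A minimal sketch follows; the helper name `orbit` is an illustrative choice.

```python
def orbit(start, multiplier, p):
    """Floors visited when repeatedly multiplying by `multiplier` mod p,
    starting from `start`, until a floor repeats."""
    visited = []
    x = start % p
    while x not in visited:
        visited.append(x)
        x = (x * multiplier) % p
    return visited


orbit_f7 = orbit(1, 2, 7)     # multiply by 2 in the F_7 building
orbit_f13 = orbit(1, 3, 13)   # multiply by 3 in the F_13 building

print("F_7 :", " -> ".join(map(str, orbit_f7)), "-> back to", orbit_f7[0])
print("F_13:", " -> ".join(map(str, orbit_f13)), "-> back to", orbit_f13[0])
```

Both orbits have length 3, matching the three lit floors for $x^3 = 1$ in each building.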
Comparing the 2-floor and 3-floor buildings:
| | 2 floors ($x^2 = 2$) | 3 floors ($x^3 = 1$) |
|---|---|---|
| $\mathbb{F}_7$ | Floor 3 ↔ 4 (back and forth) | Floor 1 → 2 → 4 → 1 (cyclic) |
| $\mathbb{F}_{13}$ | No solution (hidden room) | Floor 1 → 3 → 9 → 1 (cyclic) |
| Elevator motion | 2 buttons (go and return) | 1 button (round and round in one direction) |
| Galois group | $\mathbb{Z}/2\mathbb{Z}$ (2 operations) | $\mathbb{Z}/2\mathbb{Z}$ (identity and swap) |
🏫 "Wait — there are 3 floors, but the Galois group has only 2 operations?" Great intuition! In fact, $x^3 - 1 = (x-1)(x^2+x+1)$, and $x = 1$ is a "special floor that is always lit." The Galois group really only moves the remaining 2 floors ($\mathbb{F}_7$: floors 2 and 4), so the essential elevator motion is just "swap two."
🏫 For high school students: A building with 4 lit floors — $x^4 = 1$
Now shout $x^4 - 1 = 0$ ("what number raised to the 4th power gives 1?").
The $\mathbb{F}_5$ building (location 5): all 4 floors light up!
- $x = 1, 2, 3, 4$ (every nonzero element of $\mathbb{F}_5$!)
- $1^4 = 1$ ✓, $2^4 = 16 \equiv 1$ ✓, $3^4 = 81 \equiv 1$ ✓, $4^4 = 256 \equiv 1$ ✓
The $\mathbb{F}_{13}$ building (location 13): 4 lit floors
- $x = 1, 5, 8, 12$
- Verification: $5^4 = 625 \equiv 1 \pmod{13}$ ✓ (likewise for others)
The $\mathbb{F}_{17}$ building (location 17): 4 lit floors
- $x = 1, 4, 13, 16$
- Verification: $4^4 = 256 \equiv 1 \pmod{17}$ ✓
The $\mathbb{F}_7$ building (location 7): only 2 lit floors
- $x = 1, 6$ (where $6 \equiv -1 \pmod{7}$)
🏫 How the 4-floor elevator works: "two kinds of elevators"
Look at the 4 lit floors $1, 4, 13, 16$ of the $\mathbb{F}_{17}$ building:
- $4$ is a "primitive 4th root" ($4^2 = 16 \equiv -1$ and $4^4 \equiv 1$, while none of $4^1$, $4^2$, $4^3$ is $\equiv 1$)
- Powers of $4$: $4^1 = 4$, $4^2 = 16$, $4^3 = 64 \equiv 13$, $4^4 = 256 \equiv 1$
So the elevator moves in these patterns:
Elevator A (rotate by 1): $1 \to 4 \to 16 \to 13 \to 1$
Elevator B (skip by 2): $1 \to 16 \to 1$, $4 \to 13 \to 4$
Elevator C (reverse rotate by 3): $1 \to 13 \to 16 \to 4 \to 1$
With 2 floors, the elevator "goes back and forth"; with 3 floors, it "rotates in one direction"; with 4 floors, there are multiple rotation patterns!
Moreover, in the $\mathbb{F}_7$ building, only 2 of the 4 floors are lit. This is because $\mathbb{F}_7$ lacks a "primitive 4th root" (a number corresponding to $i$, i.e., a solution of $x^2 = -1$). There is no $x$ in $\mathbb{F}_7$ with $x^2 \equiv -1 \equiv 6 \pmod{7}$ (the solutions are hiding in $\mathbb{F}_{49}$).
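Elevators A, B, and C in the $\mathbb{F}_{17}$ building correspond to multiplying the four lit floors by $4$, $16$, and $13$ respectively. The sketch below splits each of these maps into its cycles; `cycles_of_multiplication` is a hypothetical helper, and it assumes the given set of floors is closed under multiplication by `g`.

```python
def cycles_of_multiplication(roots, g, p):
    """Cycle decomposition of the map x -> g*x (mod p) on the set `roots`.
    Assumes `roots` is closed under multiplication by g."""
    remaining = sorted(roots)
    cycles = []
    while remaining:
        start = remaining[0]
        cycle = [start]
        x = (start * g) % p
        while x != start:          # follow the elevator until it returns
            cycle.append(x)
            x = (x * g) % p
        cycles.append(cycle)
        remaining = [r for r in remaining if r not in cycle]
    return cycles


lit = [1, 4, 13, 16]               # fourth roots of unity in F_17
elevator_a = cycles_of_multiplication(lit, 4, 17)
elevator_b = cycles_of_multiplication(lit, 16, 17)
elevator_c = cycles_of_multiplication(lit, 13, 17)

print("A (multiply by 4): ", elevator_a)
print("B (multiply by 16):", elevator_b)
print("C (multiply by 13):", elevator_c)
```

Elevators A and C each form a single 4-cycle (one is the reverse of the other), while elevator B breaks into two 2-cycles, which is the "skip by 2" pattern.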
🏫 For high school students: A building with 5 lit floors — $x^5 = 1$
Finally, shout $x^5 - 1 = 0$ ("what number raised to the 5th power gives 1?"), and something surprising happens.
In all of $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$, the only lit floor on the ground level is $x = 1$.
Buildings with all 5 floors lit are hard to find in the desert! All five solutions appear on the ground level only for primes satisfying $p \equiv 1 \pmod{5}$ ($p = 11, 31, 41, \ldots$).
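The lit-floor counts for $x^3 = 1$, $x^4 = 1$, and $x^5 = 1$ can be cross-checked by brute force. The sketch below also compares each count with $\gcd(n, p - 1)$, a standard formula that follows from the nonzero elements of $\mathbb{F}_p$ forming a cyclic group of order $p - 1$; the formula is not stated in the text and is brought in here only as a check.

```python
from math import gcd


def count_roots_of_unity(n, p):
    """Number of x in F_p with x^n = 1, counted by trying every nonzero x."""
    return sum(1 for x in range(1, p) if pow(x, n, p) == 1)


primes = (3, 5, 7, 11, 13, 17)
lit_counts = {
    n: {p: count_roots_of_unity(n, p) for p in primes}
    for n in (3, 4, 5)
}

for n, row in lit_counts.items():
    print(f"x^{n} = 1:", row)

# Cross-check against the closed formula gcd(n, p - 1).
formula_agrees = all(
    lit_counts[n][p] == gcd(n, p - 1) for n in lit_counts for p in primes
)
print("matches gcd(n, p - 1):", formula_agrees)
```

For $n = 5$ the only prime in this list with five lit floors is $p = 11$, in line with the condition $p \equiv 1 \pmod{5}$.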
🏫 The 5-floor elevator (in the $\mathbb{F}_{11}$ building):
In $\mathbb{F}_{11}$, the solutions of $x^5 - 1 = 0$ are $x = 1, 3, 4, 5, 9$ — five lit floors.
The 5-floor elevator has 4 different movement patterns:
- Rotate by 1: $1 \to 3 \to 9 \to 5 \to 4 \to 1$
- Rotate by 2 (skip): $1 \to 9 \to 4 \to 3 \to 5 \to 1$
- Rotate by 3 (= reverse of 2)
- Rotate by 4 (= reverse of 1)
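The more floors there are, the more the elevator movement patterns (elements of the Galois group) increase. Strictly speaking, the four rotations listed here are multiplications by a root of unity, whereas the four Galois elements act by $x \mapsto x^k$ ($k = 1, 2, 3, 4$), which keeps $x = 1$ fixed and permutes the other four floors. The two families have the same size here, which is why the count of patterns matches the size of the Galois group.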
🏫 Summary so far: The relationship between the number of floors and the elevator
| Floors | Example equation | Elevator patterns | Galois group |
|---|---|---|---|
| 2 | $x^2 = 2$ | Back and forth (swap) | $\mathbb{Z}/2\mathbb{Z}$ (2 operations) |
| 3 | $x^3 = 1$ | Round-and-round + reverse | $\mathbb{Z}/2\mathbb{Z}$ (fix $x=1$, swap remaining 2) |
| 4 | $x^4 = 1$ | 1-step rotation, 2-step skip, reverse | $(\mathbb{Z}/4\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z}$ (2 operations) |
| 5 | $x^5 = 1$ | 1-step, 2-step, 3-step, 4-step rotations | $(\mathbb{Z}/5\mathbb{Z})^\times \cong \mathbb{Z}/4\mathbb{Z}$ (4 operations) |
As the number of floors (solutions) increases, the variety of elevator patterns grows, and so does the number of choices for "which floors to connect by corridors."
🏫 Connection to AI: In Sheaf Neural Networks (SNNs), the elevator movement patterns (restriction maps) are learned automatically from data. Where mathematicians determine the elevator rules using precise tools like "Galois groups," "Frobenius elements," and "cyclic groups," AI finds the optimal elevator patterns by "minimizing a loss function" — a general-purpose optimization method.
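To give a feel for what a restriction map does on a single edge, here is a toy sketch in plain Python. It is not the architecture of any of the papers cited in this article: the restriction maps are fixed by hand rather than learned, the 2-dimensional features are made up, and `edge_disagreement` is a hypothetical name for the squared distance between two node features after each is carried into a shared edge space.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def edge_disagreement(F_u, x_u, F_v, x_v):
    """Squared distance between the features of nodes u and v after each
    is mapped into the shared edge space by its own restriction map."""
    a, b = apply(F_u, x_u), apply(F_v, x_v)
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


identity = [[1.0, 0.0], [0.0, 1.0]]
flip = [[-1.0, 0.0], [0.0, -1.0]]      # the "send x to -x" elevator

# Two neighboring nodes with opposite features (a heterophilic edge).
x_u, x_v = [1.0, 2.0], [-1.0, -2.0]

plain = edge_disagreement(identity, x_u, identity, x_v)   # no translation rule
sheaf = edge_disagreement(identity, x_u, flip, x_v)       # with the flip rule

print("disagreement with identity maps:", plain)
print("disagreement with a flip on one side:", sheaf)
```

With identity maps the two nodes look maximally different; with the flip as the translation rule they agree perfectly on the edge. In a Sheaf Neural Network the entries of such matrices would be trainable parameters, adjusted by minimizing a loss function instead of being written down by hand.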
🎓 For undergraduates: Cyclotomic fields and the structure of Galois groups
The solutions of $x^n - 1 = 0$ (the $n$th roots of unity) form $\mu_n = \{\zeta \in \overline{\mathbb{Q}} \mid \zeta^n = 1\}$, a cyclic group under multiplication, isomorphic to $\mathbb{Z}/n\mathbb{Z}$. Choosing a primitive $n$th root $\zeta_n = e^{2\pi i/n}$, we have $\mu_n = \{1, \zeta_n, \zeta_n^2, \ldots, \zeta_n^{n-1}\}$.
The Galois group of the cyclotomic field $\mathbb{Q}(\zeta_n)$ is:
$$\mathrm{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q}) \cong (\mathbb{Z}/n\mathbb{Z})^\times$$
where $(\mathbb{Z}/n\mathbb{Z})^\times$ is the group of units modulo $n$. A Galois element $\sigma_k$ ($\gcd(k, n) = 1$) acts by $\sigma_k(\zeta_n) = \zeta_n^k$.
Concrete examples:
$n = 3$ (3 floors): $(\mathbb{Z}/3\mathbb{Z})^\times = \{1, 2\}$. $\sigma_1 = e$ (identity), $\sigma_2: \zeta_3 \mapsto \zeta_3^2$.
- In $\mathbb{F}_7$, $\zeta_3 \equiv 2$: $\sigma_2(2) = 2^2 = 4$. So $\sigma_2$ sends "floor 2 → floor 4."
- In $\mathbb{F}_{13}$, $\zeta_3 \equiv 3$: $\sigma_2(3) = 3^2 = 9$. So $\sigma_2$ sends "floor 3 → floor 9."
- In $\mathbb{F}_5, \mathbb{F}_{17}$, $\zeta_3 \notin \mathbb{F}_p$ (hiding in the hidden room).
$n = 4$ (4 floors): $(\mathbb{Z}/4\mathbb{Z})^\times = \{1, 3\}$. $\sigma_1 = e$, $\sigma_3: \zeta_4 \mapsto \zeta_4^3 = \zeta_4^{-1} = \overline{\zeta_4}$ (complex conjugation).
- In $\mathbb{F}_5$, $\zeta_4 \equiv 2$: $\sigma_3(2) = 2^3 = 8 \equiv 3$. "Floor 2 → floor 3, floor 3 → floor 2" (swap of $i$ and $-i$).
- In $\mathbb{F}_{17}$, $\zeta_4 \equiv 4$: $\sigma_3(4) = 4^3 = 64 \equiv 13$. "Floor 4 → floor 13, floor 13 → floor 4."
$n = 5$ (5 floors): $(\mathbb{Z}/5\mathbb{Z})^\times = \{1, 2, 3, 4\} \cong \mathbb{Z}/4\mathbb{Z}$. Four Galois elements, giving 4 elevator patterns.
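The concrete examples above all follow one recipe: list the units modulo $n$, then raise a root of unity to the $k$th power inside $\mathbb{F}_p$. A small sketch of that recipe, assuming hypothetical helper names `units_mod` and `sigma` introduced only for illustration:

```python
from math import gcd


def units_mod(n):
    """Elements of (Z/nZ)^x, i.e. the indices k of the Galois elements sigma_k."""
    return [k for k in range(1, n) if gcd(k, n) == 1]


def sigma(k, x, p):
    """Image of a root of unity x in F_p under sigma_k: x -> x^k."""
    return pow(x, k, p)


print(units_mod(5))       # indices of the four Galois elements for n = 5
print(sigma(2, 2, 7))     # sigma_2 applied to zeta_3 = 2 in F_7
print(sigma(3, 4, 17))    # sigma_3 applied to zeta_4 = 4 in F_17
```

The same two functions reproduce every "floor → floor" statement in the list for $n = 3$ and $n = 4$.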
📐 For juniors/seniors: $S_n$ Galois groups and non-commutativity
The Galois group $(\mathbb{Z}/n\mathbb{Z})^\times$ of the cyclotomic equation $x^n = 1$ is always abelian (commutative). That is, the order of elevator operations does not matter.
However, for general polynomials, the Galois group can be non-abelian.
Example: $x^3 - 2 = 0$
The splitting field of this equation is $\mathbb{Q}(\sqrt[3]{2}, \zeta_3)$, and the Galois group is the symmetric group $S_3$ (a non-abelian group of order 6).
The elements of $S_3$:
| Element | Action ($\alpha = \sqrt[3]{2}, \beta = \zeta_3\sqrt[3]{2}, \gamma = \zeta_3^2\sqrt[3]{2}$) | Building metaphor |
|---|---|---|
| $e$ (identity) | $\alpha \to \alpha, \beta \to \beta, \gamma \to \gamma$ | Elevator stopped |
| $(12)$ | $\alpha \leftrightarrow \beta, \gamma \to \gamma$ | Swap just 2 floors |
| $(13)$ | $\alpha \leftrightarrow \gamma, \beta \to \beta$ | Swap a different pair |
| $(23)$ | $\beta \leftrightarrow \gamma, \alpha \to \alpha$ | Swap yet another pair |
| $(123)$ | $\alpha \to \beta \to \gamma \to \alpha$ | Round-and-round forward |
| $(132)$ | $\alpha \to \gamma \to \beta \to \alpha$ | Round-and-round backward |
Meaning of non-commutativity: Taking elevator A and then elevator B may land you on a different floor than taking B first and then A. This never happened with 2-floor buildings ($\mathbb{Z}/2\mathbb{Z}$) or cyclotomic equations ($(\mathbb{Z}/n\mathbb{Z})^\times$).
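Non-commutativity is easy to see by composing two of the permutations in the table in both orders. This is a minimal sketch; labelling the roots $\alpha, \beta, \gamma$ as `0, 1, 2` and the helper `compose` are conventions chosen here for illustration.

```python
def compose(f, g):
    """Return the permutation 'apply g first, then f' on roots labelled 0, 1, 2."""
    return tuple(f[g[i]] for i in range(len(g)))


# perm[i] is the image of root i, with alpha = 0, beta = 1, gamma = 2.
swap_ab = (1, 0, 2)   # (12): alpha <-> beta, gamma fixed
cycle = (1, 2, 0)     # (123): alpha -> beta -> gamma -> alpha

print(compose(swap_ab, cycle))  # ride the cycle first, then the swap
print(compose(cycle, swap_ab))  # ride the swap first, then the cycle
```

The two printed permutations differ: one fixes $\alpha$ and the other fixes $\beta$, so the order of the rides matters.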
Implication for AI: To handle sheaves corresponding to non-abelian Galois groups, the restriction maps $\mathcal{F}_{v \trianglelefteq e}$ must be chosen from non-abelian matrix groups (e.g., subgroups of $O(d)$). The SNNs of Bodnar et al. (2022) use the orthogonal group $O(d)$ for restriction maps, which is an example of a non-abelian group.
🔬 For graduate students: Galois groups of general degree-$n$ equations and $\ell$-adic representations
Let $f(x) \in \mathbb{Z}[x]$ be a monic irreducible polynomial of degree $n$, $K = \mathbb{Q}[x]/(f)$ a subfield of its splitting field, and $L/\mathbb{Q}$ the splitting field with $G = \mathrm{Gal}(L/\mathbb{Q}) \hookrightarrow S_n$.
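For primes $p$ not dividing the discriminant of $f$, the factorization type of $f \bmod p$ over $\mathbb{F}_p$ is determined by the Frobenius conjugacy class $[\mathrm{Frob}_p]$ in $G$ (Dedekind's theorem); the Chebotarev density theorem then describes how often each class occurs.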
$$f(x) \equiv \prod_{i=1}^{g_p} f_i(x) \pmod{p}, \quad \deg f_i = d_i$$
The conjugacy class of $\mathrm{Frob}_p$, viewed inside $S_n$, has cycle type $(d_1, d_2, \ldots, d_{g_p})$.
Concrete example ($n = 3$):
| Prime $p$ | Factorization of $x^3 - 2 \bmod p$ | Type of $\mathrm{Frob}_p$ | Conjugacy class in $G = S_3$ |
|---|---|---|---|
| $p = 3$ | $(x+1)^3$ | Ramified ($3$ divides the discriminant $-108$) | Not defined ($\mathrm{Frob}_p$ requires an unramified prime) |
| $p = 5$ | $(x-3)(x^2+3x+4)$ | $(1, 2)$ | Transposition class |
| $p = 7$ | Irreducible | $(3)$ | 3-cycle class |
| $p = 13$ | Irreducible | $(3)$ | 3-cycle class |
| $p = 17$ | $(x-8)(x^2+8x+13)$ | $(1, 2)$ | Transposition class |
By the Chebotarev density theorem, the density of primes corresponding to each conjugacy class of $G = S_3$ equals the class size divided by $|G|$: identity class density $1/6$, transposition class $3/6 = 1/2$, 3-cycle class $2/6 = 1/3$.
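These densities can be explored numerically. For $x^3 - 2$ and a prime $p \neq 2, 3$, the number of roots modulo $p$ (0, 1, or 3) identifies the Frobenius class. The sketch below counts classes over primes below an arbitrary bound of 2000; the function names and the bound are assumptions made for illustration, and the observed frequencies only approximate the limiting densities.

```python
def frobenius_class(p):
    """Classify Frob_p for x^3 - 2 by counting roots mod p (p must not be 2 or 3)."""
    if p in (2, 3):
        raise ValueError("p = 2, 3 are ramified; Frob_p is not defined")
    n_roots = sum(1 for x in range(p) if (x ** 3 - 2) % p == 0)
    return {0: "3-cycle", 1: "transposition", 3: "identity"}[n_roots]


def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]


counts = {"identity": 0, "transposition": 0, "3-cycle": 0}
primes = [p for p in primes_up_to(2000) if p > 3]
for p in primes:
    counts[frobenius_class(p)] += 1
for name, c in counts.items():
    print(name, round(c / len(primes), 3))  # compare with 1/6, 1/2, 1/3
```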
From the $\ell$-adic perspective: $G = S_3$ has three irreducible representations (trivial $\mathbf{1}$, sign $\epsilon$, standard 2-dimensional $\rho$), and the corresponding Artin $L$-functions are
$$L(s, \mathbf{1}) = \zeta(s), \quad L(s, \epsilon) = L(s, \chi_{-3}), \quad L(s, \rho)$$
each yielding a number-theoretically meaningful $L$-function. "Reading off each building's elevator pattern" as the "trace of the representation" and aggregating over all primes gives the $L$-function.
🏫🎓📐🔬 "Corridor connection rules" in AI — 6 rules and learning via loss functions
Let us reorganize the rules for connecting corridors when there are 3 or more solutions.
Rule 1: Correspondence by sign (for $x^2 = a$ only)
The simplest. Separate into the $+\sqrt{a}$ lineage and $-\sqrt{a}$ lineage. For 2-floor buildings only.
Rule 2: Correspondence by cyclic group generators (for $x^n = 1$)
Choose a primitive $n$th root $\zeta_n$ and label each floor as the "$k$th rotation position" via $\zeta_n^k$ ($k = 0, 1, \ldots, n-1$). Optimal for cyclotomic equations.
Rule 3: Correspondence by Galois group representations (general case)
Choose a representation $\rho: G \to GL(V)$ of the Galois group $G = \mathrm{Gal}(L/\mathbb{Q})$. Different choices of representation yield different "corridor connection patterns" (different $L$-functions). The most general and powerful method.
Rule 4: Correspondence by Frobenius conjugacy classes
Determine which conjugacy class of $G$ the Frobenius $\mathrm{Frob}_p$ belongs to for each prime $p = 3, 5, 7, 13, 17$. Primes whose Frobenius elements lie in the same conjugacy class have "the same type of light pattern."
Rule 5: $p$-adic lifting
Use Hensel's lemma to lift solutions in $\mathbb{F}_p$ to $\mathbb{Z}_p$ (the $p$-adic integer ring), constructing a $p$-adically continuous correspondence.
Rule 6 (AI-specific): Data-driven learning via loss functions
Rules 1–5 apply when the mathematical "correct answer" is known. But in real-world data, the "correct rule" is often unknown.
In Sheaf Neural Networks (SNNs):
- Assign a vector $x_v \in \mathbb{R}^d$ (stalk) to each node $v$ (building)
- Assign a restriction map $\mathcal{F}_{v \trianglelefteq e} \in \mathbb{R}^{d \times d}$ as a learnable parameter to each edge $(u, v)$ (corridor candidate)
- Loss function: $\mathcal{L} = \sum_{\text{edges } e=(u,v)} \|\mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v\|^2 + \lambda \cdot \mathcal{L}_{\text{task}}$
- First term: minimize "discrepancy" (consistency violation) between adjacent nodes
- Second term: task-specific loss (classification accuracy, prediction error, etc.)
- Update restriction map parameters by gradient descent, automatically discovering the optimal "corridor connection pattern"
In other words, AI solves the correspondence problem — which mathematicians handle by hand using "Galois groups," "Frobenius elements," "$p$-adic numbers," and "representation theory" — as a general-purpose optimization problem of "minimizing a loss function."
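To make the loss concrete, here is a minimal sketch in plain Python that evaluates the consistency term for given restriction maps. It is not the implementation from any SNN paper: the data layout (`x[v]` for stalk vectors, `maps[(v, e)]` for the map of node `v` into edge number `e`) and the function names are assumptions chosen for readability, and the gradient-descent update itself is left to an autodiff framework.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def consistency_loss(x, maps, edges):
    """Sum over edges e = (u, v) of ||F_u x_u - F_v x_v||^2.

    x[v] is the stalk vector at node v; maps[(v, e)] is the restriction
    map of node v into edge number e.
    """
    total = 0.0
    for e, (u, v) in enumerate(edges):
        fu = matvec(maps[(u, e)], x[u])
        fv = matvec(maps[(v, e)], x[v])
        total += sum((a - b) ** 2 for a, b in zip(fu, fv))
    return total


def total_loss(x, maps, edges, task_loss, lam=1.0):
    """Consistency term plus lam times a task loss computed elsewhere."""
    return consistency_loss(x, maps, edges) + lam * task_loss


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
x = {0: [1.0, 0.0], 1: [0.0, 1.0]}
edges = [(0, 1)]

print(consistency_loss(x, {(0, 0): identity, (1, 0): identity}, edges))
print(consistency_loss(x, {(0, 0): identity, (1, 0): swap}, edges))
```

With identity maps the two nodes disagree, while the swap map "translates" node 1's vector so that the discrepancy vanishes. Note that the consistency term alone is also minimized by setting every map to zero, which is one reason the task loss is needed.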
📐🔬 Mathematically interesting: When the restriction maps are constrained to the orthogonal group $O(d)$ (Barbero et al., 2022), the learned maps have the structure of a "connection Laplacian," corresponding to parallel transport in Riemannian geometry. This is structurally similar to the case where Galois representations are restricted to orthogonal representations (self-dual representations).
3.4 Corridors and Sheets — The Complete Picture of Correspondences
🏫 For high school students
To summarize everything so far, corridors are connected as follows:
- The sky castle (Galois group) holds the "prototype" of the rules for shuffling equation solutions
- The elevator in each building ($\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$) is how this prototype manifests in each world
- Solutions occupying the same position in the sky castle's elevator are connected by corridors
- A sequence of solutions connected by corridors is a "sheet"
- Sheet 1 (Corridor A): $\mathbb{F}_7$ floor 3 ↔ $\mathbb{F}_{17}$ floor 6 ($+\sqrt{2}$ lineage)
- Sheet 2 (Corridor B): $\mathbb{F}_7$ floor 4 ↔ $\mathbb{F}_{17}$ floor 11 ($-\sqrt{2}$ lineage)
- Buildings where solutions are visible on the ground floor ($\mathbb{F}_7, \mathbb{F}_{17}$): corridors connect ground floors horizontally
- Buildings where solutions exist only in hidden rooms ($\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_{13}$): corridors pass through the hidden rooms, invisible from the ground but visible from above (from the sky castle)
The entire structure of "how the laser beams from the sky castle hit each building" and "how the corridors are connected" — that is the true identity of a "Sheaf."
🎓🏫 Learning "corridor connections" in AI
In AI, instead of humans determining corridor connection rules mathematically, the rules can be learned automatically from data.
Specifically, in Sheaf Neural Networks (SNNs):
- Assign a vector $x_v$ (stalk data) to each node (building)
- Assign a restriction map $\mathcal{F}_{v \trianglelefteq e}$ (translation rule) as a learnable parameter to each edge (corridor candidate)
- Set the loss function as "discrepancy between adjacent nodes": $\|\mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v\|^2$
- Minimize this loss by gradient descent to learn the restriction maps (corridor connection rules)
AI thus automatically discovers "which floors of which buildings to connect by corridors so that the entire structure is maximally consistent" from data.
What mathematicians did by hand using "Galois groups," "Frobenius elements," and "$p$-adic numbers," AI performs approximately by the general-purpose method of "minimizing a loss function."
3.5 When the Sky Castle's Rules Descend to Each Building — 3, 4, 5, 6, and 7 Solutions
We now trace concretely, with actual numbers, how the sky castle (Galois group) prototype rules manifest as floor shuffles and elevator movements in each building $\mathbb{F}_3$, $\mathbb{F}_5$, $\mathbb{F}_7$, $\mathbb{F}_{13}$, $\mathbb{F}_{17}$, $\ldots$ when the equation has 3, 4, 5, 6, or 7 solutions.
🏫🎓 Case 4: 6 solutions — $x^6 = 1$
Sky castle rules (prototype): Galois group is $(\mathbb{Z}/6\mathbb{Z})^\times = \{1, 5\}$. Only 2 rules.
🏫 "Only 2 rules for 6 solutions?" This may seem strange. It is because $6 = 2 \times 3$, and $(\mathbb{Z}/6\mathbb{Z})^\times$ has order $\varphi(6) = 2$. The 6th roots of unity are built from "square roots ($\pm 1$)" and "cube roots," and the only $\mathbb{Q}$-symmetry is "complex conjugation" ($\zeta_6 \to \overline{\zeta_6} = \zeta_6^5$).
- Rule "1" (identity): Do nothing.
- Rule "5" (conjugation): $\zeta_6 \to \zeta_6^5$.
Manifestation in the $\mathbb{F}_7$ building (all 6 floors lit: 1, 2, 3, 4, 5, 6):
In $\mathbb{F}_7$, $|\mathbb{F}_7^\times| = 6$, so every nonzero element is a 6th root of unity. The primitive 6th root is $\zeta_6 \equiv 3$ (order 6: $3^1=3, 3^2=2, 3^3=6, 3^4=4, 3^5=5, 3^6=1$).
Rule "5" ($\zeta_6 \to \zeta_6^5$) descends as:
| Floor ($\zeta_6^k$) | Value in $\mathbb{F}_7$ | Destination via Rule "5" ($\zeta_6^{5k \bmod 6}$) | Destination value |
|---|---|---|---|
| $\zeta_6^0$ | 1 | $\zeta_6^0$ | 1 (fixed) |
| $\zeta_6^1$ | 3 | $\zeta_6^5$ | 5 |
| $\zeta_6^2$ | 2 | $\zeta_6^4$ | 4 |
| $\zeta_6^3$ | 6 | $\zeta_6^3$ | 6 (fixed) |
| $\zeta_6^4$ | 4 | $\zeta_6^2$ | 2 |
| $\zeta_6^5$ | 5 | $\zeta_6^1$ | 3 |
So: floors 1 and 6 are fixed ($1$ and $-1$ are real, so conjugation does not move them), and 3 ↔ 5 and 2 ↔ 4 are swapped.
Manifestation in the $\mathbb{F}_{13}$ building (6 lit floors: 1, 3, 4, 9, 10, 12):
Primitive 6th root $\zeta_6 \equiv 4$ ($4^1=4, 4^2=3, 4^3=12, 4^4=9, 4^5=10, 4^6=1$).
Rule "5" descends as: 1 → 1 (fixed), 4 → 10, 3 → 9, 12 → 12 (fixed), 9 → 3, 10 → 4. Same pattern: 1 and 12 fixed, 4↔10, 3↔9 swapped.
🏫🎓 Case 5: 7 solutions — $x^7 = 1$
Sky castle rules (prototype): Galois group $(\mathbb{Z}/7\mathbb{Z})^\times = \{1, 2, 3, 4, 5, 6\}$. 6 rules.
$(\mathbb{Z}/7\mathbb{Z})^\times \cong \mathbb{Z}/6\mathbb{Z}$ (cyclic), so Rule "3" generates all rules ($3^1=3, 3^2=2, 3^3=6, 3^4=4, 3^5=5, 3^6=1$).
In $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}$, the only lit ground floor is $x = 1$. All 7 solutions appear on the ground level only for $p \equiv 1 \pmod{7}$ ($p = 29, 43, 71, \ldots$).
Manifestation in the $\mathbb{F}_{29}$ building (7 lit floors: 1, 7, 16, 20, 23, 24, 25):
Primitive 7th root $\zeta_7 \equiv 7$ ($7^1=7, 7^2=20, 7^3=24, 7^4=23, 7^5=16, 7^6=25, 7^7=1$).
Rule "2" ($\zeta_7 \to \zeta_7^2$): $1 \to 1, 7 \to 20, 20 \to 23, 24 \to 25, 23 \to 7, 16 \to 24, 25 \to 16$
Rule "6" ($\zeta_7 \to \zeta_7^{-1}$, reversal): $1 \to 1, 7 \to 25, 20 \to 16, 24 \to 23, 23 \to 24, 16 \to 20, 25 \to 7$
🏫 Key point: With 7 floors, there are 6 types of elevator movement. Each corresponds to "rotating by how many steps." And all 6 can be generated by repeating a single "basic rule" (Rule "3").
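The claim that a single basic rule generates all the others can be verified by listing successive powers. A brief sketch, where `powers_mod` is a hypothetical helper written for this check:

```python
def powers_mod(g, n):
    """Successive powers g^1, g^2, ... mod n, stopping when 1 is reached.

    Assumes gcd(g, n) = 1; otherwise 1 is never reached and the loop would not end.
    """
    out, x = [], g % n
    while True:
        out.append(x)
        if x == 1:
            return out
        x = (x * g) % n


print(powers_mod(3, 7))  # Rule "3" visits all six rules
print(powers_mod(2, 7))  # Rule "2" visits only three of them
```

So Rule "3" is a generator of $(\mathbb{Z}/7\mathbb{Z})^\times$, while Rule "2" is not.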
🏫 Grand summary: Number of solutions vs. sky castle rules
| Solutions | Equation | Rules (excl. identity) | Galois group |
|---|---|---|---|
| 2 | $x^2 = 2$ | 1 | $\mathbb{Z}/2\mathbb{Z}$ |
| 3 | $x^3 = 1$ | 1 | $\mathbb{Z}/2\mathbb{Z}$ (fix $x=1$, swap remaining 2) |
| 4 | $x^4 = 1$ | 1 | $\mathbb{Z}/2\mathbb{Z}$ |
| 5 | $x^5 = 1$ | 3 | $\mathbb{Z}/4\mathbb{Z}$ |
| 6 | $x^6 = 1$ | 1 | $\mathbb{Z}/2\mathbb{Z}$ |
| 7 | $x^7 = 1$ | 5 | $\mathbb{Z}/6\mathbb{Z}$ |
📐 For juniors/seniors: Why the number of rules is irregular
The "number of rules" is $\varphi(n) - 1$, where $\varphi$ is Euler's totient function and the $-1$ removes the identity. The irregularity reflects $\varphi(n)$'s dependence on the prime factorization of $n$: $\varphi(n) = n \prod_{p \mid n} (1 - 1/p)$.
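The product formula translates directly into a short routine. This sketch applies the factor $(1 - 1/p)$ once for each prime divisor found by trial division; the function name `phi` is just a label chosen here.

```python
def phi(n):
    """Euler's totient via n * prod(1 - 1/p) over the primes p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p   # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                       # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result


for n in range(3, 8):
    print(n, phi(n) - 1)  # rules excluding the identity for x^n = 1
```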
🔬 For graduate students: Non-abelian Galois groups — the case of $x^n - 2$
For $x^n - 2 = 0$ (as opposed to the cyclotomic $x^n = 1$), the Galois group becomes non-abelian:
| Equation | Galois group | Order | Abelian? |
|---|---|---|---|
| $x^3 - 2 = 0$ | $S_3$ (symmetric group) | 6 | No |
| $x^4 - 2 = 0$ | $D_4$ (dihedral group) | 8 | No |
| $x^5 - 2 = 0$ | $F_{20}$ (Frobenius group) | 20 | No |
The use of the orthogonal group $O(d)$ for restriction maps in SNNs is partly motivated by the need to naturally represent such non-commutativity.
3.6 History: What Hard Problems Did Mathematicians Solve with This Landscape?
🏫🎓 For all levels
| Year | Mathematician | Achievement | Desert metaphor |
|---|---|---|---|
| 1940s | André Weil | Formulation of the Weil conjectures | Predicted: "There is a deep regularity in how each building lights up" |
| 1960s | Alexander Grothendieck | Creation of scheme theory and étale cohomology | Created the desert building landscape itself |
| 1974 | Pierre Deligne | Proof of the Weil conjectures | Proved the prediction about the intensity of each building's light |
| 1995 | Andrew Wiles | Proof of Fermat's Last Theorem | Proved a miraculous correspondence between "building light patterns" and "sky castle symmetries" |
The core insight of scheme theory: Not "solve $x^2 - 2 = 0$," but "view the totality of what $x^2 - 2 = 0$ reveals across all number worlds ($\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}, \ldots, \mathbb{Q}, \mathbb{R}, \mathbb{C}$) as a single geometric object (a scheme)."
3.7 Why Is the Étale Topology "Continuous"? — The Magic That Connects Discrete Worlds
🏫 For high school students: "Discrete" yet "connected"?
Between the $\mathbb{F}_3$ building and the $\mathbb{F}_7$ building, there is no $\mathbb{F}_4$ or $\mathbb{F}_{5.5}$ building. Buildings stand only at prime-number positions. So how can we claim they are "connected by corridors"?
Answer: The buildings are not connected to each other directly — they are connected via the sky castle.
The sky castle (absolute Galois group $\mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$) floats above every building. The castle fires "lasers" (Frobenius elements) at each building. The pattern differs from building to building, but all lasers originate from the same castle. The castle is a single connected structure.
🎓 For undergraduates: Why is $\mathrm{Spec}\,\mathbb{Z}$ connected?
$\mathrm{Spec}\,\mathbb{Z}$ has points $(2), (3), (5), (7), (11), (13), (17), \ldots$ and the zero ideal $(0)$. In the Zariski topology, the closed sets are the finite sets $V(n) = \{(p) \mid p \text{ divides } n\}$ for $n \neq 0$ (including the empty set $V(1)$) and $\mathrm{Spec}\,\mathbb{Z}$ itself.
Every point lies in the closure of the zero ideal $(0)$: $\overline{\{(0)\}} = \mathrm{Spec}\,\mathbb{Z}$. Because this generic point is "the shadow of every building," every nonempty open set contains $(0)$, so no two nonempty open sets are disjoint and $\mathrm{Spec}\,\mathbb{Z}$ is connected.
📐 For juniors/seniors: The profinite topology of the absolute Galois group
The absolute Galois group $G_\mathbb{Q} = \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ is a profinite group (projective limit of finite groups):
$$G_\mathbb{Q} = \varprojlim_{K/\mathbb{Q} \text{ finite Galois}} \mathrm{Gal}(K/\mathbb{Q})$$
It carries the profinite (Krull) topology: compact, Hausdorff, totally disconnected.
"Totally disconnected yet continuous?"
The "continuity" of a profinite group is different from that of the real line.
Saying that a homomorphism $\rho: G_{\mathbb{Q}} \to \mathrm{GL}_n(\mathbb{Q}_\ell)$ is continuous means continuity with respect to the profinite topology on $G_{\mathbb{Q}}$ and the $\ell$-adic topology on $\mathrm{GL}_n(\mathbb{Q}_\ell)$.
Concretely, the image of $\rho$ lies in a compact subgroup of $GL_n(\mathbb{Q}_\ell)$ (e.g., a conjugate of $GL_n(\mathbb{Z}_\ell)$), and each reduction $\rho \bmod \ell^m: G_\mathbb{Q} \to GL_n(\mathbb{Z}/\ell^m\mathbb{Z})$ is a homomorphism to a finite group (with open kernel).
The Frobenius elements $\mathrm{Frob}_p$ for $p = 3, 5, 7, 13, 17, \ldots$ (each defined up to conjugacy) are dense in $G_\mathbb{Q}$, a consequence of the Chebotarev density theorem. Every nonempty open subset of $G_\mathbb{Q}$ contains some prime's Frobenius.
This is the true identity of "discrete yet connected." The buildings (primes) are discrete, but the corresponding Frobenius elements are dense in $G_\mathbb{Q}$ (the sky castle), and $G_\mathbb{Q}$ is compact in the profinite topology. "Inside the castle, Frobenius elements are everywhere."
🔬 For graduate students: Equivalence of étale sheaves and Galois representations
The category of lisse $\ell$-adic étale sheaves on $X = \mathrm{Spec}\,\mathbb{Z}[1/S]$ is equivalent to the category of continuous $\ell$-adic representations of $\pi_1^{\text{ét}}(X, \bar{x})$:
$$\mathrm{Loc}_{\mathbb{Q}_\ell}(X_{\text{ét}}) \simeq \mathrm{Rep}_{\mathbb{Q}_\ell}^{\mathrm{cont}}(\pi_1^{\text{ét}}(X))$$
For $X = \mathrm{Spec}\,\mathbb{Z}[1/S]$, $\pi_1^{\text{ét}}(X) \cong G_{\mathbb{Q}, S}$ (the Galois group of the maximal extension of $\mathbb{Q}$ unramified at all primes outside $S$, a quotient of $G_\mathbb{Q}$).
This equivalence is the mathematically precise statement that "étale sheaves" and "Galois representations" are two faces of the same coin. The "continuity" of a sheaf is interpreted as "continuity of the corresponding Galois representation with respect to the profinite topology."
3.7b What Are $p$-adic Numbers? — The Deep Foundations Beneath the Buildings
🏫 For high school students: Redefining "closeness"
In the ordinary world of numbers, $1$ and $1.001$ are "close." $1$ and $1{,}000{,}001$ are "far apart." The smaller the absolute difference, the "closer."
In the $p$-adic world, "closeness" is completely redefined.
In the world of 7-adic numbers ($p = 7$): $1$ and $1 + 7 = 8$ are "close." $1$ and $1 + 7^2 = 50$ are even "closer." $1$ and $1 + 7^{10} = 282{,}475{,}250$ are extremely close.
Why? In 7-adic "closeness," the more times the difference is divisible by 7, the closer two numbers are.
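This notion of closeness is simple to compute: count how many times $p$ divides the difference. A minimal sketch using exact fractions; `valuation` and `padic_distance` are names introduced here for illustration.

```python
from fractions import Fraction


def valuation(n, p):
    """Largest k such that p^k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def padic_distance(a, b, p):
    """p-adic distance |a - b|_p = p^(-v_p(a - b)); the distance is 0 when a == b."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** valuation(a - b, p))


for other in (8, 50, 1 + 7 ** 10, 1_000_001):
    print(other, padic_distance(1, other, 7))
```

The last value shows the contrast with ordinary distance: $1{,}000{,}001$ is 7-adically as far from $1$ as two integers can be, because $10^6$ is not divisible by 7.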
Building metaphor: The $\mathbb{F}_7$ building (location 7) has not just 7 above-ground floors, but infinite underground floors. Underground level 1 is $\mathbb{Z}/7^2\mathbb{Z}$ (49 rooms), level 2 is $\mathbb{Z}/7^3\mathbb{Z}$ (343 rooms), and so on. The entire underground is the 7-adic integer ring $\mathbb{Z}_7$. Adding "reciprocal floors" gives the 7-adic number field $\mathbb{Q}_7$.
🎓 For undergraduates: The $p$-adic integer ring and "lifting"
$\mathbb{Z}_p$ is defined as the projective limit $\mathbb{Z}_p = \varprojlim_{n} \mathbb{Z}/p^n\mathbb{Z}$. Its elements have "$p$-adic expansions" $a_0 + a_1 p + a_2 p^2 + \cdots$ ($0 \leq a_i < p$).
Role in this article — "lifting": Lifting an $\mathbb{F}_p$ solution to $\mathbb{Z}_p$ (Hensel's lemma) was "Rule 5" for corridor connections.
Example: Lift $x = 3$ (solution of $x^2 = 2$ in $\mathbb{F}_7$) to $\mathbb{Z}_7$:
- $3^2 = 9 = 7 + 2$, so $3^2 \equiv 2 \pmod{7}$ ✓ (ground floor)
- By Hensel's lemma: $x \equiv 10 \pmod{49}$, and $10^2 = 100 \equiv 2 \pmod{49}$ ✓ (underground level 1)
- Continuing: $x \equiv 108 \pmod{343}$, and $108^2 = 11664 \equiv 2 \pmod{343}$ ✓ (underground level 2)
The limit of this infinite lifting gives the 7-adic expansion of $\sqrt{2}$. Indeed $\sqrt{2} \in \mathbb{Q}_7$ (the square root of 2 exists as a 7-adic number!).
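The lifting steps above can be reproduced with a Newton iteration carried out modulo increasing powers of $p$. The following is a sketch under stated assumptions ($p$ odd and the starting root nonzero mod $p$); the function name `hensel_sqrt` is hypothetical, and the modular inverse relies on the three-argument `pow` with exponent `-1`, available from Python 3.8.

```python
def hensel_sqrt(a, p, root, levels):
    """Lift a root of x^2 = a mod p to roots mod p^2, ..., p^levels.

    Uses the Newton step x <- x - (x^2 - a) / (2x), computed mod p^k.
    Assumes p is odd and root is nonzero mod p, so that 2x is invertible.
    """
    lifts, x = [root], root
    for k in range(2, levels + 1):
        modulus = p ** k
        inv = pow(2 * x, -1, modulus)            # modular inverse of 2x
        x = (x - (x * x - a) * inv) % modulus
        lifts.append(x)
    return lifts


print(hensel_sqrt(2, 7, 3, 3))   # ground floor, underground level 1, level 2
```

Starting from the other ground-floor solution, `hensel_sqrt(2, 7, 4, 3)`, lifts the $-\sqrt{2}$ lineage in the same way.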
📐 For juniors/seniors: Local fields and the global-local principle
$\mathbb{Q}_p$ is the fraction field of $\mathbb{Z}_p$ — the completion of $\mathbb{Q}$ with respect to the $p$-adic valuation. $\mathbb{Q}_p$ is a local field.
In arithmetic geometry, the "global-local principle" is central:
$$\text{Global problem (over $\mathbb{Q}$)} \longleftrightarrow \text{Collection of local problems (over each $\mathbb{Q}_p$ and $\mathbb{R}$)}$$
In $\ell$-adic cohomology $H^i_{\text{ét}}(X, \mathbb{Q}_\ell)$, $\ell$ is a fixed prime and $\mathbb{Q}_\ell$ is the coefficient field. Fixing $\ell$ and varying $p$ gives a unified $\ell$-adic description of "every building's light pattern."
🔬 For graduate students: $p$-adic Hodge theory and modern developments
$p$-adic Hodge theory (Fontaine, Faltings, Bhatt-Morrow-Scholze) constructs a "mysterious isomorphism" between $H^i_{\text{ét}}(X_{\overline{\mathbb{Q}_p}}, \mathbb{Q}_p)$ and $H^i_{\mathrm{dR}}(X/\mathbb{Q}_p)$ via Fontaine's period rings $B_{\mathrm{dR}}, B_{\mathrm{cris}}$.
Perfectoid spaces (Scholze, 2012) relate characteristic $0$ and characteristic $p$ via the "tilting equivalence."
🏫🎓📐🔬 AI applications of $p$-adic numbers
Application 1: $p$-adic neural networks. The ultrametric property $|x + y|_p \leq \max(|x|_p, |y|_p)$ of $\mathbb{Q}_p$ functions as a natural distance for hierarchical data (tree structures, phylogenetic trees, taxonomies).
Application 2: Hensel's lemma and local-global consistency in GNNs. Hensel's lemma gives conditions for an approximate solution ($\bmod p$) to lift to an exact solution in $\mathbb{Z}_p$. This is structurally similar to the condition for locally consistent node features to extend to a globally consistent assignment in GNNs. The sheaf condition (gluing condition) in SNNs can be read as a discrete analogue of this step-by-step lifting.
Application 3: $\ell$-adic cohomology and multimodal learning. Viewing information from all primes $p$ through a fixed "lens" $\ell$ mirrors multimodal learning, where different modalities (image, text, audio) are viewed through a fixed "common representation space."
3.8 How Étale Sheaves Solved Hard Problems in Number Theory
🏫 For high school students: Decoding "light patterns" solved great theorems
Analogy: Code-breaking. Think of each building's light pattern as one character of an encrypted message. The full message runs infinitely. Étale sheaves are the tool that describes the "grammar" of this code.
🎓🏫 Example 1: The Weil conjectures — "Light intensity follows a law"
Problem (Weil, 1949): How does the number of solutions $N_q$ of an equation over $\mathbb{F}_q$ grow as $q$ increases?
Solution (Deligne, 1974): The eigenvalues $\alpha_i$ of the Frobenius acting on $H^i_{\text{ét}}(X_{\overline{\mathbb{F}_q}}, \mathbb{Q}_\ell)$ satisfy $|\alpha_i| = q^{i/2}$.
In the desert metaphor: there is a strict upper bound on the "light intensity" (error in solution count) at each building, relative to the building's height (size of $p$).
🎓🏫 Example 2: Fermat's Last Theorem — "A miraculous match between two castles"
Problem (Fermat, 1637): For $n \geq 3$, $x^n + y^n = z^n$ has no positive integer solutions.
Solution (Wiles, 1995): Two entirely different étale sheaves (one from an elliptic curve, one from a modular form) produce identical light patterns at every building $\mathbb{F}_3, \mathbb{F}_5, \mathbb{F}_7, \mathbb{F}_{13}, \mathbb{F}_{17}, \ldots$ This "miraculous match" (the Taniyama-Shimura conjecture, proved by Wiles for the semistable case) implies Fermat's Last Theorem.
📐 For juniors/seniors: $\ell$-adic representations and automorphic representations
The $\ell$-adic Tate module $T_\ell(E) = \varprojlim E[\ell^n]$ of a rational elliptic curve $E/\mathbb{Q}$ defines a 2-dimensional $\ell$-adic representation $\rho_{E, \ell}: G_\mathbb{Q} \to GL_2(\mathbb{Z}_\ell)$. For primes $p \neq \ell$ of good reduction, the characteristic polynomial of $\mathrm{Frob}_p$ is $\det(I - \rho_{E,\ell}(\mathrm{Frob}_p) t) = 1 - a_p t + p t^2$, where $a_p = p + 1 - \#E(\mathbb{F}_p)$.
Taniyama-Shimura-Wiles theorem: The sequence $\{a_p\}_p$ matches the Fourier coefficients of a weight-2 cusp form $f = \sum a_n q^n$.
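The numbers $a_p$ can be computed by simply counting points. The sketch below does this for the curve $y^2 = x^3 - x$, which is an illustrative choice made here (it does not appear in the text above) and has good reduction at every odd prime. The function names are likewise hypothetical. The last printed column checks the Hasse bound $|a_p| \leq 2\sqrt{p}$, the elliptic-curve case of the Weil bound on Frobenius eigenvalues.

```python
def count_points(a, b, p):
    """Number of points on y^2 = x^3 + a*x + b over F_p, including the point at infinity."""
    square_counts = [0] * p
    for y in range(p):
        square_counts[(y * y) % p] += 1     # how many y produce each value of y^2
    affine = sum(square_counts[(x ** 3 + a * x + b) % p] for x in range(p))
    return affine + 1


def trace_of_frobenius(a, b, p):
    """a_p = p + 1 - #E(F_p)."""
    return p + 1 - count_points(a, b, p)


# Illustrative curve y^2 = x^3 - x, i.e. a = -1, b = 0.
for p in (3, 5, 7, 13, 17):
    ap = trace_of_frobenius(-1, 0, p)
    print(p, ap, ap * ap <= 4 * p)
```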
🔬 For graduate students: The Langlands program
Wiles's theorem is part of the Langlands program:
$$\{\text{$n$-dimensional $\ell$-adic reps of $G_\mathbb{Q}$}\} \longleftrightarrow \{\text{Automorphic reps of $GL_n(\mathbb{A}_\mathbb{Q})$}\}$$
Étale cohomology is the primary tool for geometrically constructing the left side. Shimura varieties $\mathrm{Sh}(G, X)$ bridge both sides via their étale cohomology.
3.9 How Discoveries in Number Theory Apply to AI
🏫🎓 For all levels: Five bridges from number theory to AI
Bridge 1: "All buildings' light patterns follow a law" (Weil conjectures) → Generalization theory
Both Weil's eigenvalue bounds and Watanabe's RLCT generalization bounds are stated through zeta functions attached to algebraic varieties; Watanabe's theory additionally relies on resolution of singularities.
Bridge 2: "Two castles match" (Taniyama-Shimura) → Domain adaptation
When two data distributions share "the same latent structure," knowledge transfer becomes possible — structurally the same question as matching Galois representations.
Bridge 3: Chebotarev density theorem → Graph sampling theory
"What fraction of buildings have each type of light pattern" is structurally similar to "how many nodes to sample for accurate graph-level statistics."
Bridge 4: Étale covers → Data augmentation
Étale morphisms (locally isomorphic but globally non-trivial) can be read as a mathematical analogue of data augmentation.
Bridge 5: Profinite structure of $G_\mathbb{Q}$ → Multiscale learning
The projective limit structure "coarse quotient → finer quotient" corresponds to progressive/multiscale training.
Chapter 4: How AI Is Incorporating Scheme Theory — A Complete Guide to the Literature
This chapter explains how algebraic geometry, algebraic topology, and category theory are being incorporated into AI, with a complete list of key papers and industrial applications.
4.1 The Lineage of Sheaf Neural Networks (SNNs)
🏫 For high school students
When AI learns from "graph" data (sets of points and lines), it traditionally used a simple method: "gather information from neighboring points, then update your own information" (GNNs).
But this method had a weakness: if you gather too much information, everyone ends up with the same information (oversmoothing). It is like a classroom where everyone whispers to each other until, eventually, everyone believes the same rumor.
SNNs solved this problem using the mathematics of "desert buildings." Each node (building) gets its own vector space (stalk), and when sending information to a neighbor, it passes through a "translation rule" (restriction map).
🎓 For undergraduates
Bodnar et al. (2022) "Neural Sheaf Diffusion" (NeurIPS 2022) introduced cellular sheaves on graphs $G = (V, E)$:
- Assign a vector space $\mathcal{F}(v) \cong \mathbb{R}^d$ (stalk) to each node $v$
- Assign a linear map $\mathcal{F}_{v \trianglelefteq e}: \mathcal{F}(v) \to \mathcal{F}(e)$ (restriction map) to each edge $e = (u, v)$
- Construct the sheaf Laplacian $\Delta_\mathcal{F}$
The coboundary operator $\delta_\mathcal{F}$ computes "discrepancy" between adjacent nodes:
$$(\delta_\mathcal{F} x)_e = \mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v$$
The sheaf Laplacian $\Delta_\mathcal{F} = \delta_\mathcal{F}^\top \delta_\mathcal{F}$ governs the diffusion equation $\dot{X} = -\Delta_\mathcal{F} X$.
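The construction can be written out in a few lines. The following is a dependency-free sketch, not the implementation of Bodnar et al.: it builds $\delta_\mathcal{F}$ as a dense matrix, forms $\Delta_\mathcal{F} = \delta_\mathcal{F}^\top \delta_\mathcal{F}$, and takes one explicit Euler step of $\dot{X} = -\Delta_\mathcal{F} X$. The data layout (`maps[(v, e)]` for the map of node `v` into edge number `e`), the function names, and the step size are all assumptions made for illustration.

```python
def coboundary(n_nodes, d, edges, maps):
    """Dense coboundary matrix with len(edges)*d rows and n_nodes*d columns."""
    delta = [[0.0] * (n_nodes * d) for _ in range(len(edges) * d)]
    for e, (u, v) in enumerate(edges):
        for r in range(d):
            for c in range(d):
                delta[e * d + r][u * d + c] += maps[(u, e)][r][c]
                delta[e * d + r][v * d + c] -= maps[(v, e)][r][c]
    return delta


def sheaf_laplacian(delta):
    """Return delta^T delta."""
    rows, cols = len(delta), len(delta[0])
    return [[sum(delta[k][i] * delta[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]


def diffusion_step(L, x, step=0.1):
    """One explicit Euler step of dx/dt = -L x."""
    Lx = [sum(l * xi for l, xi in zip(row, x)) for row in L]
    return [xi - step * lxi for xi, lxi in zip(x, Lx)]


edges = [(0, 1)]
same = {(0, 0): [[1.0]], (1, 0): [[1.0]]}     # identity maps: ordinary graph Laplacian
flip = {(0, 0): [[1.0]], (1, 0): [[-1.0]]}    # sign-flipping map on node 1
x = [1.0, -1.0]                               # two nodes with opposite features

for maps in (same, flip):
    L = sheaf_laplacian(coboundary(2, 1, edges, maps))
    print(diffusion_step(L, x))
```

With identity maps, diffusion pulls the two opposite features toward each other. With the sign-flipping map the same features already form a consistent section, so the step leaves them unchanged; this is the mechanism by which a suitable sheaf can preserve differences between neighbors.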
📐 For juniors/seniors
| Algebraic geometry sheaf | SNN implementation | PyTorch code |
|---|---|---|
| Base space $X$ | Graph $G = (V, E)$ | edge_index |
| Stalk $\mathcal{F}_x$ at point $x$ | Feature space $\mathbb{R}^d$ at node $v$ | x[v] ($d$-dim vector) |
| Restriction map $\rho_{U,V}$ | Linear map $W_{uv}$ on edge $(u,v)$ | nn.Linear(d, d) |
| Section $s \in \Gamma(U, \mathcal{F})$ | Consistent node feature assignment | Optimized x after loss minimization |
| Coboundary operator $\delta$ | Discrepancy computation | F_ue @ x[u] - F_ve @ x[v] |
Important note: The "sheaf" in Bodnar et al. (2022) is a cellular sheaf, belonging to algebraic topology — not the étale sheaf of algebraic geometry. They differ in abstraction level, but share the structure of "assigning a vector space to each point and connecting adjacent spaces by linear maps."
🔬 For graduate students
Bodnar et al.'s cellular sheaf is a covariant functor on the face-relation poset of the graph, assigning vector spaces $\mathcal{F}(v), \mathcal{F}(e)$ and restriction maps $\mathcal{F}_{v \trianglelefteq e}: \mathcal{F}(v) \to \mathcal{F}(e)$ (low-dimensional cell to high-dimensional cell). This differs from the general categorical definition of a presheaf as a contravariant functor $\mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$.
4.2 Sheaf Attention Networks — Fusing Sheaves with Transformer Attention
🏫 For high school students
Have you heard of "Transformers"? They are the core technology behind ChatGPT and Gemini. The secret of Transformers is the "attention" mechanism — an automatic computation of "which word should I focus on right now?"
Sheaf Attention Networks (SheafANs) incorporate the "desert building" mathematics into this attention mechanism. Ordinary attention computes only "how much to focus" (a scalar weight), but SheafAN learns "how to translate while focusing" (a matrix-valued weight = restriction map).
🎓 For undergraduates
Barbero et al. (2022b) "Sheaf Attention Networks" (NeurIPS 2022 Workshop NeurReps, Oral) extended GAT's attention mechanism to cellular sheaves.
Standard GAT attention:
$$\alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_j]))}{\sum_{k \in \mathcal{N}(i)} \exp(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_k]))}$$
Here $\alpha_{ij}$ is a scalar attention weight.
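For readers who prefer code to formulas, here is a plain-Python sketch of the GAT attention formula for a single node. It is meant as a reading aid rather than a reference implementation: the function names and argument layout are assumptions, and the LeakyReLU slope of 0.2 follows the value used in the original GAT paper.

```python
import math


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_attention(h, W, a, i, neighbors):
    """Scalar attention weights alpha_ij over j in neighbors.

    h: list of feature vectors, W: weight matrix (list of rows),
    a: attention vector whose length is twice the output dimension of W.
    """
    def transform(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    wh_i = transform(h[i])
    scores = []
    for j in neighbors:
        concat = wh_i + transform(h[j])               # concatenation [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    top = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy inputs: three nodes with 2-dimensional features.
h = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 1.0]
print(gat_attention(h, W, a, i=0, neighbors=[1, 2]))
```

Each neighbor receives one number, and the numbers sum to 1.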
SheafAN attention: Extends GAT's scalar weight $\alpha_{ij}$ to matrix-valued restriction maps $\mathcal{F}_{v \trianglelefteq e} \in \mathbb{R}^{d \times d}$.
Key result: When the sheaf is trivial ($d = 1$, restriction maps are identity), SheafAN reduces to standard GAT. That is, GAT is a special case of SheafAN.
📐 For juniors/seniors
Attentive Sheaf Diffusion (ASD) equation:
$$\frac{\partial X}{\partial t} = -\sigma(\Delta_\mathcal{F}(X)) X$$
The difference from standard Neural Sheaf Diffusion ($\dot{X} = -\Delta_\mathcal{F} X$) is that $\Delta_\mathcal{F}$ varies dynamically based on attention weights.
| | GAT | SheafAN |
|---|---|---|
| Attention weights | Scalar $\alpha_{ij} \in \mathbb{R}$ | Matrix $\mathcal{F}_{v \trianglelefteq e} \in \mathbb{R}^{d \times d}$ |
| Information propagation | $h_i' = \sum_j \alpha_{ij} \mathbf{W} h_j$ | $x_i' = x_i - \sum_{e = (i,j)} \mathcal{F}_{i \trianglelefteq e}^\top (\mathcal{F}_{i \trianglelefteq e} x_i - \mathcal{F}_{j \trianglelefteq e} x_j)$ |
| Oversmoothing | Occurs | Suppressed by sheaf structure |
| Stalk dimension | 1 (scalar) | $d$ (vector) |
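To make the sheaf side of the comparison concrete, here is a minimal sketch of the sheaf Laplacian acting on node features, $(\Delta_\mathcal{F} x)_i = \sum_{e=(i,j)} \mathcal{F}_{i \trianglelefteq e}^\top (\mathcal{F}_{i \trianglelefteq e} x_i - \mathcal{F}_{j \trianglelefteq e} x_j)$, followed by one explicit Euler step of $\dot{X} = -\Delta_\mathcal{F} X$. The function names, the `maps` dictionary keyed by `(node, edge)`, and the step size are assumptions for illustration; the restriction maps are fixed here, whereas the models above learn them, and no attention weighting or nonlinearity is included.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_t_vec(M, v):
    # Computes M^T v without building the transpose.
    return [sum(M[r][c] * v[r] for r in range(len(M))) for c in range(len(M[0]))]

def sheaf_laplacian_apply(x, edges, maps):
    """Apply the sheaf Laplacian to x, given as {node: stalk vector}."""
    out = {v: [0.0] * len(xv) for v, xv in x.items()}
    for e in edges:
        i, j = e
        Fi, Fj = maps[(i, e)], maps[(j, e)]
        # Disagreement of the two endpoints, measured in the edge stalk.
        diff = [p - q for p, q in zip(matvec(Fi, x[i]), matvec(Fj, x[j]))]
        for c, val in enumerate(mat_t_vec(Fi, diff)):
            out[i][c] += val
        for c, val in enumerate(mat_t_vec(Fj, diff)):
            out[j][c] -= val
    return out

def diffusion_step(x, edges, maps, step=0.5):
    """One explicit Euler step x <- x - step * (Laplacian applied to x)."""
    lap = sheaf_laplacian_apply(x, edges, maps)
    return {v: [a - step * b for a, b in zip(x[v], lap[v])] for v in x}

# Trivial sheaf on the path 0 - 1 - 2: d = 1 and every restriction map is [[1.0]].
edges = [(0, 1), (1, 2)]
trivial = {(v, e): [[1.0]] for e in edges for v in e}
x = {0: [1.0], 1: [2.0], 2: [4.0]}
lap_trivial = sheaf_laplacian_apply(x, edges, trivial)
x_next = diffusion_step(x, edges, trivial)
print(lap_trivial, x_next)
```

With the trivial sheaf the result coincides with the ordinary graph Laplacian $(D - A)x$, which is the sense in which the scalar setting is a special case. Swapping any `[[1.0]]` for a general $d \times d$ matrix changes what "agreement" between neighbors means.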
4.3 Watanabe's Singular Learning Theory (SLT) — Algebraic Geometry Solves the "Mystery of Generalization"
🏫 For high school students
Deep learning has a great mystery: "Why doesn't it overfit when there are more parameters than data points?"
Sumio Watanabe (Tokyo Institute of Technology) answered this in 2009. The key is "singularities" in algebraic varieties.
🎓 For undergraduates
Over parameter space, the set of minimizers of a neural network's loss $L(w)$, $W_0 = \{w \mid L(w) = \min L\}$, is in general not a single point. It forms an algebraic variety (more generally, an analytic set): a geometric shape that can have singularities.
Watanabe quantified the "sharpness" of these singularities using the Real Log Canonical Threshold (RLCT) $\lambda$:
- Regular models (linear regression, etc.): $\lambda = d/2$
- Singular models (neural networks): $\lambda < d/2$ — effective dimension is reduced by singularities
Smaller $\lambda$ = model behaves more "simply" = harder to overfit. Singularities automatically implement Occam's razor.
📐 For juniors/seniors
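RLCT definition: Let $K(w)$ be the Kullback-Leibler divergence from the true distribution to the model at parameter $w$, and let $\varphi(w)$ be the prior. The zeta function $\zeta(s) = \int_W K(w)^s \varphi(w) \, dw$ extends meromorphically to the complex plane; its largest pole is $s = -\lambda$, and the order of that pole is the multiplicity $m$.
Hironaka's resolution of singularities (theorem 1964; Fields Medal 1970): there is a map $w = g(\tilde{w})$, built from blow-ups, under which $K$ takes the normal crossing form $K(g(\tilde{w})) = \tilde{w}_1^{k_1} \cdots \tilde{w}_d^{k_d}$ in local coordinates, from which $\lambda$ is computed.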
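Once $K$ is in normal crossing form, reading off $\lambda$ is arithmetic. The sketch below works within a single coordinate chart and assumes, beyond what is stated above, that the Jacobian of the resolution map is also monomial, $|g'(\tilde{w})| \propto \prod_j |\tilde{w}_j|^{h_j}$, and that the prior is positive and bounded near the singularity. Under those assumptions the local zeta integral factorizes as $\prod_j \int_0^1 \tilde{w}_j^{k_j s + h_j} \, d\tilde{w}_j = \prod_j \frac{1}{k_j s + h_j + 1}$, so the poles sit at $s = -(h_j + 1)/k_j$, giving $\lambda = \min_j (h_j + 1)/k_j$, with $m$ the number of coordinates attaining the minimum. The function name and the exponent lists are hypothetical.

```python
from fractions import Fraction

def rlct_from_normal_crossing(k, h):
    """Return (lambda, m) for K = prod w_j^k_j with Jacobian prod w_j^h_j (one chart)."""
    # Coordinates with k_j = 0 do not appear in K and contribute no pole.
    candidates = [Fraction(hj + 1, kj) for kj, hj in zip(k, h) if kj > 0]
    lam = min(candidates)
    m = sum(1 for c in candidates if c == lam)
    return lam, m

# K(a, b) = a^2 * b^2 is already in normal crossing form, so the Jacobian is trivial.
lam_ab, m_ab = rlct_from_normal_crossing(k=[2, 2], h=[0, 0])

# A second toy case with unequal exponents.
lam_uneq, m_uneq = rlct_from_normal_crossing(k=[2, 4], h=[0, 0])
print(lam_ab, m_ab, lam_uneq, m_uneq)
```

In the first case $d = 2$, so a regular model would have $\lambda = d/2 = 1$, while the singular one has $\lambda = 1/2$ with multiplicity 2. A full computation takes the minimum over all charts of the resolution; this sketch handles only one.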
Watanabe's free energy formula:
$$F_n = nL_n(w_0) + \lambda \log n - (m - 1) \log \log n + O_p(1)$$
For regular models ($\lambda = d/2$), this reduces to BIC ($F_n \approx nL_n + \frac{d}{2} \log n$). For singular models, $\lambda < d/2$, giving a smaller penalty → harder to overfit.
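To get a feel for the sizes involved, the $n$-dependent penalty in the formula can be evaluated directly. The numbers below ($n = 10{,}000$, a two-parameter model, and a singular case with $\lambda = 1/2$, $m = 2$) are illustrative assumptions, not results from any experiment.

```python
import math

def complexity_penalty(lam, m, n):
    """lambda * log n - (m - 1) * log log n, the n-dependent penalty in F_n."""
    return lam * math.log(n) - (m - 1) * math.log(math.log(n))

n = 10_000
regular = complexity_penalty(lam=1.0, m=1, n=n)   # d = 2, so lambda = d/2 = 1
singular = complexity_penalty(lam=0.5, m=2, n=n)  # assumed singular case
print(f"regular (BIC-like): {regular:.3f}, singular: {singular:.3f}")
```

The singular penalty is smaller on both counts: the leading $\lambda \log n$ term is halved, and the multiplicity term subtracts a further $\log \log n$.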
4.4 Complete List of Key Papers and Industrial Applications
Sheaf-based AI models
| Paper | Year | Venue | Content | Industrial application |
|---|---|---|---|---|
| Hansen & Gebhart, "Sheaf Neural Networks" | 2020 | NeurIPS Workshop TDA | First introduction of cellular sheaves to GNNs | Proof of concept |
| Bodnar et al., "Neural Sheaf Diffusion" | 2022 | NeurIPS 2022 | First method to learn sheaf Laplacian from data; solved heterophily and oversmoothing | Recommendation systems, social networks |
| Barbero et al., "Sheaf NNs with Connection Laplacians" | 2022 | ICML Workshop TAG-ML | Riemannian geometry connection Laplacian for SNNs | Molecular property prediction |
| Barbero et al., "Sheaf Attention Networks" | 2022 | NeurIPS 2022 NeurReps (Oral) | Extended GAT attention to sheaves; proved GAT is a special case | Heterogeneous graph analysis |
| Duta et al., "Sheaf Hypergraph Networks" | 2023 | NeurIPS 2023 | Extended sheaf framework to hypergraphs | Protein interactions, co-authorship |
| Bundle Neural Networks | 2025 | ICLR 2025 | Flat vector bundles for computational efficiency | Large-scale graph learning |
| Copresheaf Topological Neural Networks | 2025 | NeurIPS 2025 | Copresheaf unification of sheaves, GNNs, Transformers | Universal framework |
Algebraic geometry-based AI models
| Paper | Year | Content | Application |
|---|---|---|---|
| Watanabe, Algebraic Geometry and Statistical Learning Theory | 2009 | Foundational theory of Bayesian learning via singularity resolution (RLCT) | Overfitting risk quantification |
| Wei, Murfet et al., "Deep Learning is Singular, and That's Good" | 2022 | Empirical demonstration that singularity causes generalization | Model selection |
| Hoogland & Murfet, "Phase Transitions in Neural Networks" | 2022–2024 | Phase transitions in learning explained by RLCT | Learning stability monitoring |
Category theory-based AI models
| Paper | Year | Venue | Content | Application |
|---|---|---|---|---|
| Gavranović et al., "Categorical Deep Learning" | 2024 | ICML 2024 | Monad algebras unify CNN/RNN/GNN/Transformer | Automated architecture design |
| Gavranović, "Fundamental Components of Deep Learning" (PhD) | 2024 | Strathclyde | 2-category of parametric maps unifies forward/backward pass | Theoretical foundation |
Geometric deep learning
| Paper | Year | Content | Application |
|---|---|---|---|
| Bronstein et al., "Geometric Deep Learning" | 2021 | Survey unifying CNN/GNN/Transformer via "symmetry" | Foundation for AlphaFold |
| Cohen & Welling, "Group Equivariant CNNs" | 2016 | Group-theoretic invariance in CNNs | Medical imaging (rotation-invariant tumor detection) |
4.5 Detailed Industrial Applications
🏫🎓📐🔬 For all levels
1. Drug discovery: Molecular graphs with atoms as nodes and bonds as edges; each atom's "chemical environment" is a stalk; restriction maps reflect bond types.
2. Financial fraud detection: Transaction network with account features as stalks; sheaf Laplacian spectral anomalies indicate fraud.
3. Supply chain optimization: Inventory/demand vectors as stalks at each node (factory, warehouse, store); restriction maps reflect logistics costs.
4. Social network analysis: When user relationships are heterophilic, standard GNNs degrade. SheafAN resolves this with sheaf structure.
5. Natural language processing (LLMs): Reinterpreting Transformer self-attention in the sheaf context; each token's "semantic space" as a stalk, attention weights as restriction maps.
6. Recommendation systems: Bipartite graphs of users and items with sheaf structure; restriction maps translate between "user preference space" and "item feature space."
Chapter 5: Watanabe's Singular Learning Theory (SLT) in Detail
5.1 Why This Theory Matters
🏫 For high school students
When you study for a test, which is better: "memorizing the textbook" or "understanding and being able to apply"?
Deep learning AI has enough memory (more parameters than data points) to memorize the textbook. Yet it can correctly answer questions it has never seen — as if it "understands." Why? Watanabe's theory answers this.
🎓 For undergraduates
In standard statistics, the Fisher information matrix is assumed to be non-degenerate. In neural networks it becomes degenerate (rank-deficient) at some parameters because of symmetries and redundancies (e.g., permuting hidden neurons does not change the output). Watanabe handles the resulting singularities using Hironaka's resolution of singularities from algebraic geometry.
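The redundancy is easy to see in a toy network. The sketch below uses a hypothetical two-hidden-unit model $f(x) = \sum_k v_k \tanh(w_k x)$, chosen only for illustration. The first check is the permutation symmetry mentioned above. The second, a hidden unit whose output weight is zero, is an addition not discussed above: it shows a continuous direction in parameter space along which the output does not change at all.

```python
import math

def two_unit_net(x, params):
    """f(x) = sum_k v_k * tanh(w_k * x), with params = [(w_1, v_1), (w_2, v_2)]."""
    return sum(v * math.tanh(w * x) for w, v in params)

inputs = [-2.0, -0.5, 0.0, 1.0, 3.0]

# 1. Permutation symmetry: swapping the two hidden neurons.
params = [(0.7, 1.5), (-1.2, 0.4)]
swapped = [params[1], params[0]]
outputs = [two_unit_net(x, params) for x in inputs]
outputs_swapped = [two_unit_net(x, swapped) for x in inputs]

# 2. Dead unit: with v_2 = 0, the input weight w_2 can be anything.
dead_a = [(0.7, 1.5), (-1.2, 0.0)]
dead_b = [(0.7, 1.5), (42.0, 0.0)]
outputs_dead_a = [two_unit_net(x, dead_a) for x in inputs]
outputs_dead_b = [two_unit_net(x, dead_b) for x in inputs]

print(outputs == outputs_swapped, outputs_dead_a == outputs_dead_b)
```

Different parameter vectors that define the same function cannot be told apart by any amount of data, which is why the usual "one true parameter" picture breaks down.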
📐 For juniors/seniors
Regular models: Loss landscape is a parabola ($L(w) \approx \frac{1}{2}(w - w_0)^\top H (w - w_0)$, positive-definite Hessian).
Singular models: Loss landscape is a flat valley (degenerate Hessian). Regular models are "buildings on sharp peaks," singular models are "buildings on vast plateaus."
Wide plateau → small RLCT $\lambda$ → model behaves "simply" → harder to overfit.
🔬 For graduate students
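WBIC (Widely Applicable Bayesian Information Criterion): defined as the posterior average of $nL_n(w)$ at inverse temperature $\beta = 1/\log n$, it satisfies $\mathrm{WBIC} \approx nL_n(w_0) + \lambda \log n$ and so generalizes BIC to singular models. In practice WBIC, and from it the RLCT $\lambda$, can be estimated numerically via MCMC sampling.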
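A small sanity check of this recipe, under loudly stated simplifications: the model is the regular one-parameter Gaussian $x \sim N(w, 1)$, the prior is flat on an assumed interval $[-5, 5]$, and the tempered posterior average is computed by a Riemann sum on a grid rather than by MCMC. For a regular model with $d = 1$ the estimate should come out close to $\lambda = d/2 = 0.5$. All function names and settings are hypothetical.

```python
import math
import random

LOG_2PI = math.log(2 * math.pi)

def summed_nll(w, xs):
    """n * L_n(w) for the model x ~ N(w, 1): the summed negative log-likelihood."""
    return sum(0.5 * (x - w) ** 2 + 0.5 * LOG_2PI for x in xs)

def wbic_on_grid(xs, lo=-5.0, hi=5.0, steps=2001):
    """Average of n * L_n(w) under the posterior tempered at beta = 1 / log n."""
    beta = 1.0 / math.log(len(xs))
    grid = [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]
    energies = [summed_nll(w, xs) for w in grid]
    base = min(energies)  # shift before exponentiating, for numerical stability
    weights = [math.exp(-beta * (e - base)) for e in energies]
    return sum(wt * e for wt, e in zip(weights, energies)) / sum(weights)

random.seed(0)
xs = [random.gauss(0.3, 1.0) for _ in range(100)]
n = len(xs)
w_hat = sum(xs) / n  # maximum-likelihood estimate of the mean
wbic = wbic_on_grid(xs)
lambda_hat = (wbic - summed_nll(w_hat, xs)) / math.log(n)
print(f"estimated lambda = {lambda_hat:.4f}")
```

Grid quadrature works only because the parameter space here is one-dimensional. For a real network the same average would have to be approximated by sampling, and the estimate would carry sampling noise.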
Phase transitions (Hoogland & Murfet, 2022–): As training progresses, parameters transition between "phases" (regions with different RLCTs). "Early training: high $\lambda$ (complex phase) → later training: low $\lambda$ (simple phase)" may be the source of generalization.
Chapter 6: The Connection with Category Theory
🏫 For high school students
Category theory is the "ultimate abstraction" of mathematics. It ignores the contents of "numbers" or "shapes" and looks only at "arrows (transformation rules) from A to B."
If we view an AI model as "an arrow transforming data $A$ into prediction $B$," then all AI models can be uniformly treated as arrows in a "category."
🎓 For undergraduates
A category $\mathcal{C}$ consists of objects and morphisms between them. Morphisms compose, and identity morphisms exist.
An AI model $f: X \to Y$ is a morphism from the input space $X$ to the output space $Y$.
📐 For juniors/seniors
Gavranović et al. (2024) formulate neural networks as monad algebra homomorphisms in a 2-category of parametric maps.
Key ideas:
- Neural net $f: A \to B$ as a monad algebra homomorphism between $(A, a)$ and $(B, b)$
- Geometric deep learning (Bronstein et al., 2021) is recovered when the monad is the group action monad $M(X) = G \times X$
- Equivariant neural nets ($f(g \cdot x) = g \cdot f(x)$) are precisely monad algebra homomorphisms
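The equivariance condition $f(g \cdot x) = g \cdot f(x)$ in the last item can be checked numerically in a simple case. In the sketch below the group is $\mathbb{Z}_n$ acting on a length-$n$ signal by cyclic shifts and $f$ is a circular convolution; both are choices made for this example, and the monad-algebra formalism itself is not implemented. A position-dependent map is included as a counterexample.

```python
def cyclic_shift(x, g):
    """Action of g in Z_n on a signal: move every entry g positions to the right."""
    n = len(x)
    return [x[(t - g) % n] for t in range(n)]

def circular_conv(x, kernel):
    """Circular convolution: out[t] = sum_s kernel[s] * x[(t - s) mod n]."""
    n = len(x)
    return [sum(kernel[s] * x[(t - s) % n] for s in range(len(kernel))) for t in range(n)]

def position_weighted(x):
    """A map that depends on absolute position, so it should not be equivariant."""
    return [t * v for t, v in enumerate(x)]

x = [1.0, 4.0, 2.0, 0.0, 3.0]
kernel = [0.5, 0.25, -1.0]

# f(g . x) versus g . f(x) for every group element g.
lhs = [circular_conv(cyclic_shift(x, g), kernel) for g in range(len(x))]
rhs = [cyclic_shift(circular_conv(x, kernel), g) for g in range(len(x))]
conv_is_equivariant = lhs == rhs

weighted_is_equivariant = (
    position_weighted(cyclic_shift(x, 1)) == cyclic_shift(position_weighted(x), 1)
)
print(conv_is_equivariant, weighted_is_equivariant)
```

The convolution commutes with every shift because its weights depend only on relative position, which is the weight-sharing constraint that equivariance imposes.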
🔬 For graduate students
Bodnar et al.'s cellular sheaf is a covariant functor $\mathcal{F}: \mathrm{Face}(G) \to \mathbf{Vect}_\mathbb{R}$ on the face-relation poset. In general category theory, a presheaf is a contravariant functor $\mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$; the cellular sheaf reverses the direction of restriction maps.
The SNN learning process can be interpreted as a neural-network analogue of sheafification, that is, of correcting violations of the gluing condition. Minimizing the loss $\|\delta_\mathcal{F} x\|^2$ is then the process of "correcting inter-node consistency violations."
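The consistency-correction reading can be tried on a toy example. The sketch below uses scalar stalks on a three-node path and fixed restriction maps of $+1$ and $-1$, chosen so that "consistent" means neighboring nodes carry opposite values, a caricature of heterophily. It then runs plain gradient descent on $\|\delta_\mathcal{F} x\|^2$ over the node features. The setup, names, and learning rate are assumptions; in an actual SNN the restriction maps are learned rather than fixed.

```python
def coboundary(x, edges, maps):
    """(delta x)_e = F_{u <= e} * x_u - F_{v <= e} * x_v for each edge e = (u, v)."""
    return [maps[(u, (u, v))] * x[u] - maps[(v, (u, v))] * x[v] for (u, v) in edges]

def energy(x, edges, maps):
    """Squared norm of the coboundary: total violation of edge-wise consistency."""
    return sum(r * r for r in coboundary(x, edges, maps))

def gradient_step(x, edges, maps, lr=0.1):
    """One gradient-descent step on the energy with respect to the node features."""
    new_x = dict(x)
    for (u, v), r in zip(edges, coboundary(x, edges, maps)):
        new_x[u] -= lr * 2 * maps[(u, (u, v))] * r
        new_x[v] += lr * 2 * maps[(v, (u, v))] * r
    return new_x

edges = [(0, 1), (1, 2)]
# For an edge (u, v): F_u = +1 and F_v = -1, so delta x = x_u + x_v vanishes iff x_u = -x_v.
maps = {}
for (u, v) in edges:
    maps[(u, (u, v))] = 1.0
    maps[(v, (u, v))] = -1.0

x = {0: 1.0, 1: 3.0, 2: 0.5}
initial_energy = energy(x, edges, maps)
for _ in range(200):
    x = gradient_step(x, edges, maps)
final_energy = energy(x, edges, maps)
print(initial_energy, final_energy, x)
```

The features settle on a global section of this sheaf, with adjacent nodes holding values of opposite sign, instead of being averaged toward a common value as they would be under the trivial sheaf.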
Chapter 7: References
Algebraic geometry / Arithmetic geometry
- R. Hartshorne, Algebraic Geometry, GTM 52, Springer, 1977.
- J.S. Milne, Étale Cohomology, Princeton Math. Series 33, 1980.
- K. Ueno, Introduction to Algebraic Geometry (in Japanese), Iwanami, 1996.
Geometric deep learning
- M.M. Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges," arXiv:2104.13478, 2021.
- J. Hansen & T. Gebhart, "Sheaf Neural Networks," NeurIPS Workshop TDA, 2020.
- C. Bodnar et al., "Neural Sheaf Diffusion," NeurIPS 2022.
- F. Barbero et al., "Sheaf NNs with Connection Laplacians," ICML Workshop TAG-ML, 2022.
Categorical deep learning
- B. Gavranović et al., "Categorical Deep Learning," ICML 2024.
- B. Gavranović, "Fundamental Components of Deep Learning," PhD Thesis, Strathclyde, 2024.
Singular learning theory
- S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge, 2009.
- S. Watanabe, Mathematical Theory of Bayesian Statistics, CRC Press, 2018.
- S. Wei et al., "Deep Learning is Singular, and That's Good," IEEE TNNLS, 2022.
Sheaf-based AI
- F. Barbero et al., "Sheaf Attention Networks," NeurIPS 2022 Workshop NeurReps (Oral).
- I. Duta et al., "Sheaf Hypergraph Networks," NeurIPS 2023.
- M. Hajij et al., "Copresheaf Topological Neural Networks," NeurIPS 2025.
Category theory
- S. Mac Lane, Categories for the Working Mathematician, 2nd ed., GTM 5, Springer, 1998.
Chapter 8: Study Roadmap
🏫 From high school to understanding the graduate-level explanations
Mathematics roadmap (estimated 3–5 years)
High school math
↓
Linear algebra (matrices, eigenvalues, vector spaces)
↓
Group theory basics (groups, subgroups, homomorphisms)
↓
Ring theory basics (rings, ideals, quotient rings)
↓
Point-set topology (open sets, continuous maps, compactness)
↓
Manifold theory (differential manifolds, tangent spaces)
↓
Homological algebra (chain complexes, exact sequences, derived functors)
↓
Algebraic geometry (schemes, sheaves, cohomology)
→ Recommended: Hartshorne, "Algebraic Geometry"
↓
Étale cohomology
→ Recommended: Milne, "Étale Cohomology"
AI roadmap (estimated 1–2 years)
Python basics
↓
NumPy / Pandas / Matplotlib
↓
Machine learning fundamentals (scikit-learn)
↓
Deep learning (PyTorch)
↓
Graph Neural Networks (PyG)
↓
Sheaf Neural Networks
→ Recommended: Bodnar et al. (2022) paper + code
↓
Geometric deep learning
→ Recommended: Bronstein et al. (2021) survey
🎓 For undergraduates (general education)
- Study group theory and ring theory
- Study point-set topology
- Implement GNNs in PyTorch (PyTorch Geometric tutorials)
- Read Bodnar et al. (2022) while running the code
📐 For juniors/seniors
- Learn homological algebra (exact sequences, derived functors)
- Read Hartshorne Chapter II (Schemes)
- Learn cellular sheaf theory from Hansen & Ghrist (2019)
- Read Part I of Gavranović (2024) PhD thesis
Chapter 9: Outlook — The Future at the Intersection of Mathematics and AI
🏫🎓📐🔬 For all levels
The 2020s mark a fundamental shift in the relationship between mathematics and AI.
Until the 2010s: The mathematics used in AI centered on linear algebra, probability theory, and optimization. Algebraic geometry and category theory were "pure mathematics," unrelated to AI.
The 2020s: The deepest parts of pure mathematics — sheaf theory, category theory, homological algebra, algebraic geometry — are beginning to rewrite the very design principles of AI.
🔬 For graduate students: Future research directions
- Direct AI application of étale sheaves: Current SNNs use cellular sheaves; research is advancing to directly introduce étale sheaves' "Galois-action-equipped fibers" into GNNs.
- AI applications of derived categories: Using the "global twist" information in sheaf cohomology groups ($H^i$) for AI anomaly detection.
- Topos theory and AI: As suggested by Gavranović (2024), viewing the space of AI architectures as a topos and studying its cohomology as describing "limits of expressiveness."
However, the following caveats are necessary:
- Ideas such as "LLM contradictions as cohomological holes" or "Frobenius-corresponding weight update rules" are still speculative as of 2026 and are not established theory.
- A model called "Étale Graph Neural Networks (ÉGNN)" does not, to my knowledge, exist as an established model (though related research is advancing rapidly).
- Distinguishing established mathematical facts from unestablished research ideas is crucial for the healthy development of this field.
🏫 A message for high school students
The castles of logic that mathematicians spent decades and centuries building with "paper and pencil" are now gaining physical form through "Python and GPUs," being repurposed as blueprints for humanity's most advanced intelligence models.
The $\mathbb{F}_3$ building, the $\mathbb{F}_5$ building, the $\mathbb{F}_7$ building, the $\mathbb{F}_{13}$ building, the $\mathbb{F}_{17}$ building: the mathematics that deciphers the "pattern of light and shadow" across these desert buildings is, right now, evolving the AI inside your smartphone to the next stage.
Algebraic geometry is difficult. But if, after reading this article, you can see even a glimpse of the "desert building landscape," then you are already standing at the entrance to this grand adventure.
This article was written in the hope of opening the gate of the "impregnable castle" of algebraic geometry to as many readers as possible.