// Note: This article is merely a rough summary of the author’s recent study of set theory. If you want to delve deeper, I strongly recommend finding a mathematical logic textbook for systematic study. Wikipedia is also good.
As we all know, what distinguishes mathematics from other disciplines most importantly is rigor. For example, the previous article introduced ordinals and cardinals that transcend infinity, where the slightest carelessness can easily lead to errors. As mathematical theories and proof processes become increasingly complex, mathematicians urgently need a system that can mechanically standardize proofs, making it convenient for machines to verify them. Before the invention of computers, someone proposed it—the formal system. This article will start with simple formal systems, introducing propositional logic, first-order logic, Peano arithmetic, and finally ZFC set theory, the formal system that appears most frequently in mainstream mathematics.
With the goal of learning by doing, I slowly built a formal system simulator called “Deductrium“ that runs on web pages, which I later turned into a game.
Introduction: The pq Formal System
Simply put, a formal system is a string collection game. First, you have some initial strings in hand, the system provides some mechanical rules, and by applying these rules to the strings you have, you get new strings, which can then be used with the rules to generate more strings. The book Gödel, Escher, Bach mentions the simplest “pq” formal system:
- All legal strings in this system can only contain these three characters: “s”, “p”, “q”;
- You have these initial strings: “pq”, and if X represents a string containing only “s”, then “XpqX” is an initial string, such as “spqs”, “sssspqssss”;
- Rule: Let X, Y, Z all be strings. If you have “XpYqZ”, then you can get “XpYsqZs”.
For example: Starting with the initial string (X is “s”) “spqs”, applying the rule (X is “s”, Y is “”, Z is “s”) we get “spsqss”, continuing to apply the rule (X is “s”, Y is “s”, Z is “ss”) we get “spssqsss”.
Now, can you tell which strings can be obtained in finite steps and which can never be obtained?
- sspq
- spspq
- sspsqsss
- sspssqssss
Perhaps you’ve already seen that “XpYqZ” represents exactly $X+Y=Z$, where numbers are represented by the count of “s”s. Note that the formal system itself is meaningless; it only gains meaning when we assign meaning to it. The formal system itself only has symbols and rules; any meaning attached to it doesn’t belong to the formal system but is something we externally assign.
Propositional Logic Formal System
Although formal systems themselves have no meaning, we certainly hope they can do meaningful things, such as being used for mathematical proofs. Below we construct a deductive formal system for propositional logic without negation:
- All legal strings in this system are:
- Let $n$ be a natural number, then propositional symbols $A_n$ are all legal strings;
- If X and Y are legal strings, then “(X → Y)” is also a legal string.
- Let X, Y, Z be any legal strings. The initial strings in this system are:
- “(X → (Y → X))” are all initial strings;
- “((X → (Y → Z)) → ((X → Y) → (X → Z)))” are all initial strings.
- Rule:
If you have both strings “(X → Y)” and “X”, then you can get string “Y”.
So far, those arrows and letters have no actual meaning. Below we externally assign specific logical meaning to the system, which can be translated as:
- Term translation: legal strings correspond to propositions, strings you have correspond to theorems, initial strings correspond to axioms.
- Symbol translation:
- Propositional symbol $A_n$ represents specific true or false propositions, whose content we don’t care about for now.
- “(X → Y)” represents the proposition “proposition X implies proposition Y”, i.e., “→” represents logical implication.
- Axiom schemas: For any propositions X, Y, Z:
- The proposition “(X → (Y → X))” is an axiom (a1)
- The proposition “((X → (Y → Z)) → ((X → Y) → (X → Z)))” is an axiom (a2)
- Inference rule (mp):
If propositions “(X → Y)” and “X” are both theorems, then proposition “Y” is also a theorem.
Let’s look at an example of a formal proof, proving that for any proposition “A”, “(A → A)” is a theorem:
- “(A → (A → A))” (a1 axiom)
- “(A → ((A → A) → A))” (a1 axiom)
- “((A → ((A → A) → A)) → ((A → (A → A)) → (A → A)))” (a2 axiom)
- “((A → (A → A)) → (A → A))” (mp rule applied to 3 and 2)
- “(A → A)” (mp rule applied to 4 and 1)
Obviously, every step here can be mechanically verified (such as by computer verification), which perfectly satisfies the wishes of rigorous mathematicians.
Here’s a thinking question for readers:
Prove that for any propositions “A”, “B”, “C”, if “(A → B)” and “(B → C)” are theorems, then “(A → C)” is also a theorem.
Hint: Starting from this (a2) axiom seems like a good choice: “((A → (B → C)) → ((A → B) → (A → C)))”.
a1, a2, and mp actually specify three things: “how to introduce the logical implication symbol”, “how to use the logical implication symbol”, and “how to eliminate the logical implication symbol”, allowing us to express and handle the logic of “what can be implied”, but we currently cannot express “what cannot be implied”. To enable the system to handle negative propositions, we introduce the negation symbol “¬” and add the following to the formal system:
- If X is a proposition, then “¬X” is also a proposition.
- Add axiom schema: If X and Y are any propositions, then “((¬X → ¬Y) → (Y → X))” is an axiom (a3)
The formal system with negation added is the complete propositional logic system. It can be proven that: Let A and B be any propositions, then in this system the following propositions are all theorems:
- (¬¬A → A) (Double negation elimination)
- (A → ¬¬A) (Double negation introduction)
- (A → (¬A → B)) (Principle of explosion)
- ((¬A → B) → ((¬A → ¬B) → A)) (Principle of proof by contradiction)
The proof method is to write out the derivation steps. Independently thinking of these derivations is somewhat difficult. If readers don’t want to do more of these brain-burning reasoning exercises, they can directly refer to the answers on Wikipedia.
Perhaps you’ve noticed that this system doesn’t even have the most basic logical connectives “and” and “or”, but actually “A or B” can be expressed through “(¬A → B)”, and “A and B” can be expressed through “¬(A → ¬B)”. We can also introduce new axioms to add “and” and “or” symbols to the system. Click to expand/collapse how to introduce new symbols and axioms.
Or you might think two symbols “→” and “¬” are too many, going to another kind of extremism: actually, NAND logic alone can express all propositional logic, such as this Sheffer stroke formal system, but it has a cost—too many inference rules.
It’s hard to understand these formal systems just by reading verbal descriptions. Since computers are best at doing these mechanical determinations anyway, I strongly recommend trying to experience them in Deductrium’s formal system. If you find starting Deductrium from scratch too difficult and can’t get to the if-and-only-if and logical symbols, I’ve left a Deductrium creative mode where you can directly use all axioms, symbol definitions, and metatheorems.
There are many ways to define propositional logic formal systems. More commonly, systems directly include “and”, “or”, and “not”. The formal system we introduced is called Hilbert-style, characterized by many axioms but only one inference rule (mp). This inference rule combined with axioms yields the very important deduction metatheorem, which tells us why this one rule is sufficient:
If B can be derived as a theorem in the formal system under the assumption that A is known to be a theorem, then the system can derive (A → B) as a theorem without any assumptions.
See Wikipedia for the proof. Its converse is simpler, being directly the mp rule:
If the system can derive (A → B) as a theorem without assumptions, then B can be derived as a theorem in the formal system under the assumption that A is known to be a theorem.
The deduction metatheorem and mp rule tell us that the external behavior of the symbol “→” is indeed the same as logical implication.
Systems that include both “and” and “or” often have more inference rules, but they are equivalent in some sense—starting from the same assumptions, they can derive the same things, although the intermediate steps may differ greatly.
Soundness and Completeness, Internal and External Theorems
I wonder if readers might doubt whether the formal system with three axiom schemas and one inference rule we just introduced can really handle propositional logic reasoning? This question can actually be divided into two aspects: First (soundness), are the propositions it derives all true propositions? Second (completeness), for all true propositions, can they all be derived from this formal system? For propositional logic, both answers are affirmative. The soundness proof only requires listing the truth tables of all propositions involved in axioms and inference steps and verifying their correctness; while the completeness proof is a bit complex. In the later stages of the game Deductrium, an “automatic proof” metatheorem ability will be unlocked—as long as the system exhaustively verifies that all truth table propositions for atomic propositions hold, there is a method to write out the inference steps within the system.
Note that the relationship between propositions inside and outside the formal system is actually quite subtle. For example, we proved:
In the propositional logic formal system, for any proposition A, “(A → A)” is a theorem.
A proven proposition is called a theorem, and the above statement itself is also a theorem—specifically, it’s a theorem about theorems (in the propositional logic formal system). Note that these two “theorems” are at two different levels and are not the same:
- Things derived by the formal system, we just call them “theorems”, or the system’s “internal theorems”. They are just concepts within a certain formal system. Although they somewhat resemble ordinary mathematical theorems, they are not the same thing. The way to verify internal theorems is to provide inference steps within the system.
- The theorem we proved is a theorem in the theory about formal systems. For convenience of distinction, it’s also called a “metatheorem” or “external theorem”. External theorems are the real mathematical theorems, just like the commutative law of addition in arithmetic or Taylor expansion theorem in advanced mathematics—they are all derived within the mathematical system we’ve learned.
The Ultimate Question: Which Came First, the Chicken or the Egg?
Some people might be confused: to mechanically verify that proofs of mathematical theorems are error-free, we need to define the entire mathematical system as a more complex meta-formal system to describe and determine it. And to verify the correctness of the larger meta-formal system, we need to construct a meta-meta system, mechanically verify meta-meta theorems… (P.S. If you’re interested in this nesting, you can refer to this story of Aladdin’s lamp and the creator god). Specifically, formal systems can be precisely defined using the language of set theory, but as we’ll see later, set theory is defined by formal systems! We’ve fallen into the chicken-and-egg dilemma.
To avoid infinite nesting, we no longer use formal systems or other mechanical methods to verify the correctness of proofs of mathematical theorems. Reading proofs of metatheorems can only rely on our mathematical experience. That is, we cannot keep digging to the bottom; we must stop at some level and choose to directly believe in them. This is like when we learn a foreign language (formal system), we need a native language (our mathematical experience) for teaching, and the ability to acquire the source native language is an innate instinct that no longer depends on any language.
However, isn’t our purpose to formalize all mathematical theory? Mathematical theory should certainly also include formal systems and mathematical logic themselves. Currently, people do it this way: define a formal system that is powerful enough to express propositions in almost any mathematical theory, then we manually prove that this system is sound. As long as we believe the proof process of this system’s soundness is error-free (this part indeed cannot be mechanically verified), we can happily use this system to verify any mathematical proof, even including the proof process of the system’s soundness. Here’s a quote from Gödel, Escher, Bach:
One can never give a final, absolute proof that a proof in some system is correct. Of course, one can give a proof of a proof, or a proof of a proof of a proof—but the validity of the outermost system always remains an unproven assumption, accepted on faith.
This is also much like compiler bootstrapping in computers—the C++ compiler is written in C++! This seems like another chicken-and-egg dilemma, but it’s not: First, manually write a program that translates assembly language into machine code. This program can only be written in machine code, and whether the machine code runs correctly depends on whether we believe the electrons in the CPU chip move according to physical laws. Once this program is written, we can use assembly language to write a more advanced C language compiler, then use C to write a more advanced C++ compiler. Once we have a C++ compiler, we can abandon the original C language version of the C++ compiler and completely switch to using C++ to compile the C++ compiler.
First-Order Logic
In propositional logic, we only use letters to represent “atomic propositions” and haven’t dealt with how to specifically describe their “internal structure”. Let’s now look at first-order logic. First, let’s introduce the syntactic aspects:
In first-order logic, besides the concept of propositions, we also introduce the concepts of “terms” and “predicates”: variables and constant symbols are “atomic terms”, n terms acted upon by an n-ary function yield a term, and acted upon by an n-ary predicate yield an “atomic proposition”. Note that first-order logic is a class of systems; choosing different constant symbols or predicates results in different formal systems. Here are two examples of first-order logic formal systems:
- If terms represent integers, then “0”, “1”, “-2” are constant symbols, “x” is a variable symbol; addition is a binary function, so “add(1,1)” is also a term; “>”, “<”, “=” are all binary predicate symbols, so “2>1”, “add(1,1)=2” are all atomic propositions. Atomic propositions can continue to be used to construct more complex propositions according to propositional logic construction methods, such as: “¬(add(x,0)=0 → x=1)”.
- If terms represent sets, then the empty set “$\phi$” is a constant symbol, “x” is a variable symbol, curly braces are an n-ary function for constructing new sets from given elements; intersection $\cap$ and union $\cup$ are binary functions, “$\in$”, “=”, “$\subset$” are predicates. For example, $\left\{x,y\right\}\cap\phi$ is a term, $\phi\in\left\{\phi\right\}$ is an atomic proposition.
The purpose of introducing variables is to express propositions with quantifiers like for all ($\forall$) and there exists ($\exists$), which is the biggest difference between first-order logic and propositional logic. We continue to add the following rules:
- If A is any proposition and x is any variable, then ($\forall$ x : A) is also a proposition
After adding the new symbol $\forall$, we also need to add axioms that can handle it. Let capital letters be any propositions, lowercase letters be any variables, and Greek letters be any terms:
- (a4) (($\forall$ x : A) → A[x/$\alpha$]), where A[x/$\alpha$] means replacing all free occurrences of x in A with term $\alpha$, and A must satisfy the condition of being substitutable. These concepts will be explained in detail in the optional reading section later.
- (a5) (($\forall$ x : (A → B)) → (($\forall$ x : A) → ($\forall$ x : B)))
- (a6) (A → ($\forall$ x : A)) if x does not occur free in A
- (a7) For any axiom K, ($\forall$ x : K) is also an axiom
These four axioms correspond to the elimination (a4), use (a5), and introduction (a6 and a7) of quantifiers.
We don’t separately introduce axioms about the existence symbol “$\exists$”, but treat it as an abbreviation for “¬$\forall$¬”.
Finally, let’s talk about predicates. Generally, we assume formal systems have the equality predicate “=”, and add two axiom schemas:
- (eq1) $\alpha$=$\alpha$
- (eq2) ($\alpha$=$\beta$ → (A → A’))
where proposition A’ is the result of replacing all or part of the free $\alpha$ in proposition A with $\beta$.
Using (eq1) and (eq2), it’s not difficult to prove that equality has reflexivity and transitivity, i.e.:
- $\alpha$=$\beta$ → $\beta$=$\alpha$
- $\alpha$=$\beta$ → ($\beta$=$\gamma$ → $\alpha$=$\gamma$)
Let me emphasize again: although words like “for all” and “there exists” appear in first-order logic, the formal system only has these mechanized axiom schemas; it cannot understand the meaning behind these quantifiers. Correct logic can be achieved through mechanization alone, although these axiom schemas are somewhat complex. Click here to expand/collapse specific details about substitution/free occurrence in axioms.
Finally, let’s talk about what “first-order logic” means. Propositional logic is also called “zeroth-order logic”; it only discusses relationships between propositions. First-order logic goes inside propositions and can discuss arbitrary terms. Higher-order logic can discuss arbitrary functions between terms, arbitrary predicates, arbitrary propositions, and other relationships. However, generally speaking, first-order logic is sufficient, so second-order and higher logics are rarely mentioned.
The famous Peano axiom system describing properties of natural numbers is constructed on the basis of first-order logic. Click here to expand/collapse details.
ZFC Set Theory
ZFC set theory considers everything, including elements of sets, to be sets. Building on first-order logic, it only adds the predicate “belongs to” symbol “$\in$”, along with 9 axioms about sets:
- Axiom of Extensionality (defines the condition for set equality)
- Axiom of Pairing (defines how to construct sets from elements)
- Axiom of Union (defines the existence of unions)
- Axiom of Power Set (defines the existence of power sets)
- Axiom of Infinity (defines the existence of sets containing natural numbers)
- Axiom Schema of Separation (allows us to filter elements from large sets by conditions to form subsets, equivalent to an array’s filter function)
- Axiom Schema of Replacement (provides a way to construct other sets from known sets and relations, not limited to subsets, equivalent to an array’s map function)
- Axiom of Regularity (eliminates infinitely nested sets like “{ { {…} } }”, eliminating the barber paradox)
- Axiom of Choice (allows us to select one element from each of infinitely many arbitrary sets to form a new set)
The specific forms of these axioms are somewhat complex and won’t be listed here; they can be found online (and also in the game Deductrium).
Note that the above 9 axioms are completely described by first-order logic and don’t include any symbols besides the membership symbol (such as empty set symbol, intersection/union/complement, etc.). However, mathematicians certainly wouldn’t only use these few symbols. Click here to expand/collapse about the specification problem of defining new symbols
The ZFC axiom system can not only simulate Peano axioms and prove relatively complex propositions like the infinity of primes, but can also prove mathematical theories in almost all fields such as analysis and topological geometry by introducing more symbol concept definitions.
There are also some details to note about set theory itself. For example, all sets don’t form a set, and all ordinals can’t form a set either. According to the definition of ordinals, the set formed by all ordinals should also be a larger ordinal, then we could continue stacking even larger ordinals on this basis, which contradicts the largest ordinal. Actually, the axiom of regularity in ZFC set theory excludes things formed by all ordinals from being sets, thus avoiding contradiction. Extending the set theory formal system further by introducing the concept of Class allows us to describe all ordinals: the thing they form is called a “proper class”. Proper classes are very similar to sets, with the only difference being that they cannot appear as elements in other sets or classes.
Set theory has a very close relationship with ordinals. Click here to expand/collapse how to describe the strength of ZFC set theory using ordinals.
Conclusion
Since there are abundant online resources about set theory and I’ve only recently learned about these topics, there are far more exciting aspects of set theory than what’s covered here. For example, this article doesn’t mention the specific content of the famous Gödel’s incompleteness theorem at all. If you want to explore deeper, please refer to any mathematical logic textbook. Although ZFC set theory is the most widely recognized and used formal axiom system in mathematics, when playing Deductrium to later stages, you’ll find that this Hilbert-style formal system’s derivations are really verbose. If you want to derive well-ordering and mathematical induction from the axiom of choice, although proof ideas can be found online, the workload of formally operating within the game system to finally derive them is too large. Apart from Metamath (a mathematical reasoning library based on first-order logic and ZFC set theory formal systems), the proof assistant tools used by mainstream mathematicians today use a completely different system—type theory. This is the hole to be filled in the next article.