# Identity types and how to use them

I had a snappy line at the end of my discussion of sums and products, so I figured it was as good a place as any to stop. After this post, I’ll need one more post to clean up a few things, and then my mini tutorial on type theory will be over. Today, we’re talking about equality, which is probably the hardest part of Martin-Löf type theory.

Equality in logic

When you first learn first-order logic, you learn how to generate a first order language which has:

• Variables
• “Logical symbols”, (,),$\rightarrow$,$\wedge$,$\vee$, $\forall$, $\exists$, and crucially equality, $=$.
• Relation symbols and function symbols. (Also constant, symbols, but these are “nullary” functions).

For a long time it struck me as somewhat strange that equality was included here; after all, isn’t equality just another relation? What makes equality so special? But as I’ve become more familiar with logic and models, and the subtle interplay between object language and metalanguage, I’ve slowly come to understand why equality is often treated as a logical symbol. The very short version is that working with equality almost always happens at the level of our logic (that is, the meta in first order logic). In particular, we want a few things to be true:

• We can substitute equal objects for equal; If $\varphi(x)$ holds, and $x=y$, then $\varphi(y)$ holds. This (Leibniz Law) seems silly and obvious, but it is something we need to prove, and it’s something we need to prove at the level of the logic (since it deals with formulas) if we use equality as a relation symbol, we need to prove this separately for every theory.
• We want equality to be “the least congruence” on a model. But not all congruences are even definable in a given theory, so it’s not clear that such a statement can be reasonably made anywhere besides the logical level.

The point of all this is to say that if we’re trying to give an account of (mathematical) logic, we really do need to include equalities. It was also to point out two important properties of equality:

• Minimality
• Leibniz Law

We will use the first of these properties (minimality) as the basis of our definition, but some features of type theory will allow us to state something more general and without making any appeal at all to congruences.

Minimality

One of the curious properties of type theory is that the logic happens at the object level. We encode logical prepositions as types, as objects in or system; and proofs as terms, again objects in our system. So, since we will give our account of equality at the logical level, it is also going to be an object: a type. That is, for any two terms $\mathsf{x}$ and $\mathsf{y}$, we will have a type $\mathsf{x=y}$.

But hold on: unlike in set theory, we can’t pick objects out of nowhere: to have an object means to have an object of some type. But it hardly makes sense to compare objects of different types, so we have the formation rule for equality (or identity, as type theorists usually call it):

Identity formation: If $x:A$ and $y:A$, then $x=y$ is a type. Note that I haven’t said this type is inhabited, only that it exists—thinking in terms of propositions, we’re saying that it makes sense to propose $x=y$ (although it may be false). Sometimes it’s helpful to denote the type at which this equality lives. In that case, I’ll write a subscript, $\mathsf{x}=_{\mathsf{A}}\mathsf{y}$.

Now we need an introduction rule. There’s an obvious candidate: anything must be equal to itself. That is, equality is reflexive. This seems like the bare minimum we could ask of equality, so we have

Identity introduction: for any $x:A$, there is a term $\mathsf{refl}:x=x$. When we need to be clear, I’ll put a subscript, $\mathsf{refl}_{x}$.

Next is the elimination rule, which is a little more subtle. Here we have to return to that notion of minimality: we will say that identity is the “least reflexive relation”. If you are a type theorists, you can already guess what the elimination rule will be from that statement. But if you’re reading this, I’m going to guess you’re not a type theorist. As is common in math, we have to express “minimality” in an indirect way. As a first approximation, consider a binary relation on some type $A$. We can view this as a function $R:A \rightarrow A \rightarrow \mathsf{Type}$ since it takes two elements of $A$ and returns a proposition. Since we can talk about equality, there are two ways to say that a relation is reflexive, which I’ll call “strong” and “weak” (this is my own terminology, so you won’t find anything by looking it up; you’ll probably find something about lattices):

• weak reflexivity: For any $x:A$, $R(x,x)$. That is, every term is related to itself. In typical type theory notation, we’d write this as $\prod_{x:A}R(x,x)$.
• strong reflexivity: For any $x,y:A$, we have $x=y \rightarrow R(x,y)$. That is, equal terms are related. Again in standard notation, $\prod_{x,y:A}(x=_{A}y \rightarrow R(x,y))$.

The second is “strong” because it implies the other: if $r$ is a proof that $R$ is strongly reflexive, then $\lambda x.r(x,x,\mathsf{refl})$ (the function taking $x$ to the proof that $R(x,x)$, since we know $\mathsf{refl}:x=x$) is a proof that $R$ is weakly reflexive. Our first attempt at an elimination rule:

Identity elimination (almost): if R is a weakly reflexive relation on A, then R is strongly reflexive. Or, more in the spirit of the other elimination rules I’ve given:

Let $R:A \rightarrow A \rightarrow\mathsf{Type}$, and $r:\prod_{x:A}R(x,x)$. Then there is $r' :\prod_{x,y:A}R(x,y)$.

We’ll skip the computation rule for now; I’ll come back to it when I give the full form of the elimination rule. In the meantime, this is actually enough to do a surprising amount

Symmetry, transitivity, and a little more

We can actually use this form of the elimination rule to show symmetry and transitivity.

Equality is symmetric.

Proof: The statement of symmetry is $\prod_{x,y:A}x=y\rightarrow y=x$. Observe that this precisely says that the “flipped identity type” $R(x,y) :\equiv y=x$ is “strongly reflexive”. So, to apply the elimination rule we only need to show that it is weakly reflexive. That is, $\prod_{x:A}x=x$. But this is exactly what $\mathsf{refl}$ gives us.

Equality is transitive.

Proof: This is similar: we want to show $\prod_{x,y,z:A}x=y\rightarrow y=z\rightarrow x=z$. Reordering the arguments, we want to show $\prod_{x,y:A}x=y\rightarrow\prod_{z:A}y=z\rightarrow x=z$. So, if we let $R(x,y) :\equiv \prod_{z:A}y=z\rightarrow x=z$, we want want to show that $R$ is strongly reflexive. But by identity elimination, we only need to show it’s weakly reflexive. That is, we need a function $\prod_{x:A}\prod_{z:A}x=z\rightarrow x=z$. From here we could eliminate again, but we might as well take the pointwise identity function $f(x,p) :\equiv p$. ($f :\equiv \lambda x.\lambda p . p$). And so we have that $R$ is weakly reflexive, and by identity elimination, we are done. (aside: we need to be careful when reordering arguments like we did here. We need to make sure that the later arguments don’t depend on earlier ones. Since $x=y$ doesn’t depend on $z$, this is safe.)

In fact, we can do more than this. Remember earlier I mentioned that a crucial property of equality is the Leibniz law: If $P(x)$ and $x=y$, then $P(y)$. We can actually prove this!

Let $x,y:A$ and $P:A\rightarrow \mathsf{Type}$. Then there is a function $x=y\rightarrow P(x)\rightarrow P(y)$.

Proof. Take $R(x,y) :\equiv P(x)\rightarrow P(y)$. This is weakly reflexive, since $\lambda p.p : P(x)\rightarrow P(x)$, and so by identity elimination is strongly reflexive. I.e., for all $x,y:A$, $x=y\rightarrow P(x)\rightarrow P(y)$.

A nice exercise is to rewrite the proofs of symmetry and transitivity using the Leibniz law (instead of using identity elimination directly).

We can do better still: we can actually show that for any function $f:A\rightarrow B$, if $x=_{A}y$, then $f(x)=_{B}f(y)$. Specialized to $B=\mathsf{Type}$, this says that not only do we have a function from $P(x)$ to $P(y)$, but that $P(x)=P(y)$. The proof is again by identity elimination: we only need to show that for any $x$, $f(x)=f(x)$, but we have $\mathsf{refl}_{f(x)}$.

Let’s back up a bit: when we write out all of these theorems formally in type-theoretic notation, we see that in each case we have a function:

• We have inversion $x=y\rightarrow y=x$, and we write the inverse of $p:x=y$ as $p^{-1}:y=x$,
• We have concatenation $x=y\rightarrow y=x \rightarrow x=z$, and we write the concatenation of $p:x=y$ and $q:y=z$ as $p\cdot q:x=z$
• We have a function which “transports elements of $P(x)$ along an equality” ($p:x=y$) to elements of $P(y)$. We write this function $\mathsf{transport}^P(p,u)$ (here $p:x=y$ and $u:P(x)$.)
• We can “apply functions to equalities” which we write $\mathsf{ap}_f(p)$ (or sometimes just $f(p)$), so $\mathsf{ap}_f:x=y\rightarrow f(x)=f(y)$.

If I had given you the computation rule, you could show that these have the properties we expect inverses, concatenation, and so on to have. (technical stuff: were we to prove these properties, we’d have proven that types form groupoids, with equalities as morphisms, and functions as functors. We almost have that type families are fibrations, but we’re not quite there yet.)

Dependent elimination

I said the elimination principle I gave wasn’t quite the full principle. So, what can’t we do?

Well, suppose we want to generalize $\mathsf{ap}$ to work on dependent functions. When we try to write out this statement, we run into a problem. Let $f:\prod_{x:A}B(x)$. We want to say $f(x) = f(y)$. But $f(x):B(x)$ and $f(y):B(y)$. A priori these are different types. But, we can solve this by transporting: since we have $p:x=y$, we can transport $f(x)$ along $p$ to get some element of $B(y)$. This element is what should be equal to $f(y)$. So the full statement becomes

Given a type $A$, a family $B:A\rightarrow\mathsf{Type}$, and a dependent function $f:\prod_{x:A}B(x)$, for any $x,y:A$, there is a function $\mathsf{apd}_f:\prod_{p:x=y}\mathsf{transport}^B(p,f(x)) = f(y)$.

Trying to prove this we run into a problem: our function $\mathsf{apd}_f$ is a dependent function. But we’ve only given an elimination rule for non-dependent functions. So, we need a dependent eliminator, just like we did for $\Sigma$ types last time. Morally, we eliminate exactly as we did before, but the type encoding “reflexivity” also depends now on the equality we give it. So, we have:

Identity elimination: Let $C:\prod_{x,y:A}(x=y\rightarrow \mathsf{Type})$ and suppose we have $c:\prod_{x:A}C(x,x,\mathsf{refl})$ (“$C$ is weakly reflexive”). Then we have $r:\prod_{x,y:A}\prod_{p:x=y}C(x,y,p)$ such that

Identity computation: $r(x,x,\mathsf{refl}) \equiv c(x)$.

Now, we can construct $\mathsf{apd}_f$ (almost) exactly how we constructed $\mathsf{ap}$. We need only construct $\prod_{x:A}\mathsf{transport}^B(\mathsf{refl},f(x)) = f(x)$. But a quick check of the definitions (using our new computation rule) shows us that $\mathsf{transport}^B(\mathsf{refl},f(x)) \equiv f(x)$ definitionally. So we can use $\mathsf{refl}$.

From here there’s a lot we can do. In fact, with the dependent sums and products we introduced last time, we now have all of the “usual” types of dependent type theory. But I think we’ve seen quite enough for one post.

# 15 More Minutes of Type Theory

In my last post, I gave a quick run-down of simple type theory with functions, products and sums; and briefly sketched the correspondence with propositional logic. We can extend type theory to accommodate predicate logic, which is today’s task. When we’re done, we’ll be one step away from Martin-Löf type theory: identity types.

Putting our intuitionist glasses back on, we want to give an account of “for all” and “exists” where a proof of “for all $x$, $P(x)$” and “exists $x$, $P(x)$” correspond to constructions. But first we need to understand predicates.

Classically, it’s common to interpret a predicate on $A$ as a subset of $A$, or of some Cartesian power ($A^n$). A well-known fact is that for any predicate (subset), $P\subseteq A$, there is a characteristic function $\chi_{P}:A \rightarrow 2$ to the two element set $\{0,1\}$ with $\chi_P(x) = 1$ precisely if $x\in P$. (Or in predicate notation, if $P(x)$.) Since classically, a proposition is either true or false, we can think of a predicate on $A$ as a function from $A$ to the set of “propositions”, where 1=true and 0=false.

As we saw last time, in type theory, we interpret propositions not as truth values (true or false), but as types, so it makes sense to interpret a predicate P on a type A as a function P:A→Type. There’s a certain sleight of hand here: while we interpret propositions to be types, we may not wish to interpret all types as propositions. For example, it seems a bit unnatural to think of ℕ as a proposition. Because of this, we will call a function P:A→Type a type family.

Type families

Type families do more than simply allow us to interpret predicates: If we have some type family P:A→Type, we have, for each a:A a type P(a). That is, this really is an indexed family of types, the same way that $\{X_i\mid i\in I\}$ is an indexed family of sets. If we squint right, this lays down a framework from which we might be able to come up with a type theoretic analog of the set-theoretic axioms of choice, comprehension and replacement (more on those some other time).

From a programming perspective, type families give us a more fine-grained analysis at the type level. For example, we can create a type of vectors of length n (for fixed n) which depends on n. This may seem abstract silly and rather, but n-branching trees (which have a maximum n) are common bases for data structures (e.g., B-trees), and these are usually implemented using an array of pointers which needs to remain the same length, or by hand-coding the requisite number of pointers. In either case, the programmer has to do extra work: either copy code for each case, or ensure (in the real world) that the array is never made larger or smaller. Using a type of length n vectors, we put this check at the type level: an incorrect number of children simply isn’t possible.

Type Families and Predicates

Since a proposition is true if it’s inhabited, a term p:P(a) is a “proof” that P(a) holds. For example we could define the type family isEven:N→Type by induction. In fact, let’s do it from first principles in agda:

data ⊥ : Set where

data ⊤ : Set where
* : ⊤

data ℕ : Set where
zero : ℕ
succ : ℕ → ℕ

{-# BUILTIN NATURAL ℕ #-}

¬ : Set → Set
¬ a = a → ⊥

isEven : ℕ → Set
isEven zero = ⊤
isEven (succ x) = ¬ (isEven x)

p : isEven 8
p = λ z → z (λ z₁ → z₁ (λ z₂ → z₂ (λ z₃ → z₃ *)))


The three blocks starting with data declare types (agda uses Set instead of Type); I’m using ⊥ for the empty type (no constructors, hence the empty “where”), ⊤ for the unit type (with a single constructor), and ℕ for the natural numbers (with two constructors zero and the “successor” function). Agda infers the elimination rule automatically, and gives us some nice syntax for elimination (the pattern matching seen in isEven). The next line tells Agda that I’ve re-implemented the naturals–this allows me to use numerals (like 8 below). After that is the definition of negation from before, and then our type family: zero is even, so we define isEven zero to be a type we know to be inhabited; if n is even, then n+1 isn’t even, so we just use negation.

Finally, there’s an example proof that 8 is even. It looks awful. Two things:

• Agda found that proof automatically in less time than it took me to blink.
• A full, fully formal proof that 8 is even in any formal system is scary. I’d be impressed to see such a readable proof in (e.g.) ZF.

Universal Quantifier

We have predicates, so we need to figure out how to quantify over them, in a “constructive” way. The universal quantifier says that given x:A, there is a proof b:B(x). This looks a lot like a function from the previous post, and in fact we’ll treat it almost the same: given a type family B:A→Type, we can form the type of dependent functions (also called the dependent product), written $\prod_{\mathtt{x:A}}\mathtt{B(x)}$. A dependent function is introduced the same way as a normal (non-dependent) function: if, given an arbitrary x:A we can form a term b:B(x), then we can introduce the dependent function λx.b(notice, though, that the “output type” depends on the input value). As with non-dependent functions we eliminate this by application, and (λx.b)(a) computes to b[x/a].

Dependent functions generalize functions, since a “non-dependent” function A→B can be thought of as $\prod_{\mathtt{x:A}}\mathtt{B(x)}$ where B(x) is constant. But now that we’ve generalized our dependent functions, our elimination rules aren’t good enough: it’s not enough to to be able to define normal functions out of types, we also need to be able to define dependent functions out of types. Intuitively, all we do is replace ‘type’ by ‘type family’, and ‘function’ by ‘dependent function;, and everything works out; the computation rules are even written the same. In practice, the dependent eliminators look much more terrifying than their non-dependent counter-parts. Rather than going through it, I’ll just give an example: the natural numbers

The usual way to write the proposition encoding induction (over the natural numbers) for a formula $\varphi$ is something like

$\varphi(0) \wedge (\forall n\in\mathbb{N} . \varphi(n) \rightarrow \varphi(n+1)) \rightarrow \forall n\in\mathbb{N} . \varphi(n)$

Which reads “If $\varphi(0)$ holds and for all $n\in\mathbb{N}$, if $\varphi(n)$ holds then so does $\varphi(n+1)$, then $\varphi$ for all $n\in\mathbb{N}$. Translating this into type theory using dependent products as “for all” and C instead of $\varphi$ gives us

$\mathtt{C}(0) \rightarrow (\prod_{n:\mathbb{N}}\mathtt{C}(n) \rightarrow\mathtt{C}(n+1)) \rightarrow \prod_{n:\mathbb{N}}\mathtt{C}(n)$.

Besides changing some symbols and formatting, the only difference is that we’ve Curried our function. In fact, the elimination rule for the natural numbers is induction.

Existential quantifier

If we insist on having constructive content, then we had better be able to back a claim that “there exists an $x$ such that $P(x)$” by exhibiting a witness that $P(x)$ holds. That is, the first thing an “existential constructions” should do is exhibit a thing; but it’s not enough to just give a thing and insist it satisfies our property, we must also demonstrate that this thing does, in fact, satisfy $P$. That is, to prove $\exists x. P(x)$, we need to give

• A witness $x$
• A proof that $P(x)$ holds.

As usual, our existential quantifier is limited to some domain; in our case, this will be a type, A. We can package our “existential construction” of $\exists x\in\mathtt{A}.P(x)$ as a pair (x,p) where

• x : A,
• p : P(x)

This motivates one more type constructor in our system: The sum type $\sum_{\mathtt{x:A}}\mathtt{P}(\mathtt{x}).$ We introduce elements of this type as pairs (x,p) with x:A and p:P(x), and we eliminate them just as we would for the product type A×B, except we have to deal with the extra dependency. In short, though, to define a (dependent!) function $f:\prod_{\mathtt{x}:\sum_{\mathtt{a:A}}\mathtt{P}(\mathtt{a})} \mathtt{C}(\mathtt{x})$ it suffices to define $f((\mathtt{a,p})$ for each a:A and p:P(a).

The most obvious functions to define from any pairs are the projections; in the case of Cartesian products A×B, we have functions

• $pr_1 : \mathtt{A\times B \rightarrow A}$
$pr_1(\mathtt{(a,b)}) :\equiv \mathtt{a}$

• $pr_2 : \mathtt{A\times B \rightarrow B}$
$pr_2(\mathtt{(a,b)}) :\equiv \mathtt{b}$

The definitions are the same for dependent pairs, but there’s a catch: for the second projection of $\sum_{\mathtt{x:A}}\mathtt{B(x)}$, what is the type of the second coordinate of (a,b)? It’s not B, since B is a type family, not a type. Rather, it’s B(a). That is, the output type of the second projection should depend on the input value. That is, the second projection is a dependent function. Thus, we have the dependent projections:

• $pr_1 : (\sum_{\mathtt{x:A}}\mathtt{B(x)}) \rightarrow \mathtt{A}$
$pr_1(\mathtt{(a,b)}) :\equiv \mathtt{a}$

• $pr_2 : \prod_{p:\sum_{\mathtt{x:A}}\mathtt{B(x)}} \mathtt{B}(pr_1(p))$
$pr_2(\mathtt{(a,b)}) :\equiv \mathtt{b}$

It’s always good to ensure things are well-typed. In this case, we need to see that $pr_1(\mathtt{(a,b)})\equiv \mathtt{a}$ (definitionally), to ensure that $\mathtt{B(a)}\equiv\mathtt{B}(pr_1(\mathtt{(a,b)}))$, but of course, this is true by the definition of $pr_1$.

Sums and products and exponents

Observe that just as the function type A → B is a special case of a dependent product type, the (binary) product type is a special case of the dependent sum type: If our type family B : A → Type is constant, then we recover exactly the product A×B, where we’re abusing notation and using B both for the constant type family and for its (unique) output value.

This struck me as deeply counter-intuitive when I first saw it; surely the dependent sum should generalize sums and the dependent product should generalize products. Of course, this is true; we can recover binary sums and products from dependent ones (hint: they’re binary!), but what’s more illuminating is to look at “type arithmetic”, or the more familiar set arithmetic.

If $A$ and $B$ are finite sets, there are a few neat observations you (hopefully) learn in your first set theory course:

• $\mid A + B\mid = \mid A\mid + \mid B\mid$ (where + is the disjoint union),
• $\mid A \times B\mid = \mid A\mid\cdot\mid B \mid$,
• $\mid A^B \mid = \mid A\mid ^{\mid B\mid}$ (where $A^B$ is the set of functions from $B$ to $A$, which we’ve called $B\rightarrow A$).

These equations actually motivate the definitions of these operations for cardinal arithmetic. Now, (dependent) sums and products represent operations of arbitrary arity, so the fact that functions are special cases of (dependent) products and (binary) products are special cases of (dependent) sums is really saying:

• Exponentiation is just multiplying something by itself lots of times.
• Multiplication is just adding something to itself lots of times.

And of course, every 4th grader knows both of these facts.

# Type theory in 15 minutes

I’ve posted a couple of things on (homotopy) type theory so far, and both times hinted at giving an actual introduction. Since I can finally feel justified in not doing work for a little bit, I’ll go ahead and do that now. I wanted to cover dependent type theory, but this takes longer than I had thought, so we’ll start with a less ambitious type theory: the simply typed lambda calculus.

Mathematics as construction

Type theory arises from an attempt to formalize L.E.J. Brouwer’s notion that mathematics is “an activity of mental construction.” The traditional way to formalize a mathematical discipline is to lay out a logical language which is intended to lay down the statements (true or false) that can be meaningfully expressed; and then posit some axioms which are intended to capture the basic truths of the system. From this, we “do” mathematics by proving certain statements to be true, and others to be false. That is, the foundation is logic, and mathematics is built on top of this foundation.

Brouwer viewed this as an inversion of the actual state of affairs: The notion of “mathematical construction” is basic, while logic is a sort of mathematical activity, and so must be built on top of this. To formalize this intuition, instead of presenting the valid statements and basic truths, we must posit the basic objects and basic constructions. Only then, if we chose, may we interpret certain objects as being “logical statements”, and certain objects as being “logical proofs”. To present type theory, then, we must determine

• What are the basic objects?
• What are the basic constructions?
• How do we interpret logic?

But before we even begin to answer those questions, we need to think a little more about what “construction” means.

Mathematical constructions: types and judgment

Intuitively, a construction should say something like “given an object x, we may form an object c(x).” For example: Given natural numbers x and y, we may form their sum, x+y. Observe that this statement requires that x and y are natural numbers; it makes no sense to form the sum of “green” and “whiskey”, or a group and a natural number. That is, constructions only make sense on objects of a given type. An object, by virtue of existing, has a type attached. We call our objects “terms”, and we write “the term x has type A” as x:A (we will also express this in English as “x is an inhabitant of A“). I said “by virtue of existing”, meaning that it doesn’t make sense to present an object without it having a type–as opposed to (ZF) set theory, where all object are taken from “the universe”, we must say what an object is when we introduce it.

This expression, x:A, is a typing judgment. It is not a proposition: after all, propositions are part of logic, and we haven’t gotten that far yet. Similarly, we do not “prove” a judgment, we infer it. The valid inferences are given by inference rules, which always take the form

Given judgments $\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_n$, we may infer the judgment $\mathcal{I}$.

So far, we have seen one sort of judgment: x:A. However, before we can even make such a judgment, we must know that A is a type. We express this judgment A:Type. By analogy with typing judgments, this seems to be saying that Type is a type. For now, this analogy is harmless.

There is another form of judgment which is somewhat more subtle. Introducing the sum x+y does us no good unless we can compute with it. That is, unless we can say what the term x+y means. For this, we introduce equality judgments, which give “judgmental” or “definitional” equality. We write x≡y to mean “x is judgmentally equal to y.” Judgmental equality is supposed to capture the idea that two terms are different “words” with the same meaning: one term may be substituted for another without change of meaning. For example “President of the United States” and “Head of State for the US” mean the same thing. However, even though Barack Obama is the President of the United States, we cannot freely substitute “Barack Obama” for “the President of the United States”, as demonstrated by the nonsensical “George Bush was Barack Obama before the President of the United States.”

Since x and y can only have the same meaning when they have the same type, x≡y only makes sense when x:A and y:A (for some type A). When we wish to be clear about the type under question, we will write x≡y:A. By the same analogy as above, we also have A≡B:Type.

That is, we have

• Typing judgments x:A and A:Type
• Equality judgments x≡y:A and A≡B:Type.

Presenting types

Intuitively, a type is the collection of all objects for which certain constructions make sense. Since we need to introduce the “basic” objects and “basic” constructions, this means for any type we need to say

• what “basic” (or canonical) terms inhabit that type (how to introduce inhabitants of that type);
• what “basic” constructions we may perform on terms of that type (how to eliminate inhabitants of that type);
• How the basic constructions “behave” on the basic terms (how to compute with the basic constructions).

These are given by 3 sorts of rules, called introduction, elimination and computation rules. The elimination and computation rules will capture the idea that a type is “completely determined by its canonical inhabitants”. That is, it suffices to define constructions by defining them on canonical terms. For reasons that will become clearer when I discuss dependent types, we will need a fourth sort of rule called a formation rule, which says under what conditions we can “form” a given type. Usually this is “the most obvious condition possible”.

A type is defined by these 4 sorts of rules; unlike sets, which are defined as the totality of their members, a type is defined by when it can be formed, what its canonical elements are, what basic constructions it admits and how these basic constructions behave.

The first example: functions

Since we can present the rest of our types by reference to function types, we start with these. Intuitively, function types represent the type of “valid constructions” which take terms of a given type, and return terms of another. We have

Function formation: Given types A and B (A:Type and B:Type), we may form the type of functions from A to B, written A→B.

Function introduction: What does it mean to have a “basic” function from A to B? Since functions represent constructions, we should have that whenever we have a way to construct a B from an arbitrary A, we may construct a function. Somewhat more rigorously: whenever it is the case that given x:A, we may construct b:B, then there is a function λx.b:B. This is called a lambda abstraction and is usually read out loud as “lambda x dot b”. Intuitively, this is supposed to be the function which returns b when applied to input x. There is a subtlety here: b is a term, but it may contain x as a “free” variable. For example, x+1 is a term of type N (the type of natural numbers), but in order to actually determine what this term means, we must first know what x means. So, λx.x+1 is a function which takes some x:N and returns x+1 (which also has type N).

Function elimination: Given a function f:A→B and a term x:A, we may apply f to x, to get f(x):B. For example, our function λx.x+1 may be applied to the number 0, which should return 1. This fact is captured by

Function computation: If, given an arbitrary x:A, we may construct b:B, and we have a term a:A, then (λx.b)(a) means “b, with x replaced with a.” We write this (λx.b)(a)≡b[x/a]. Using our example function, we have (λx.x+1)(0)≡0+1. When we actually have 0, + and 1 defined, we will see that this is in fact judgmentally equal to 1.

A brief note: This might all look technical and scary, but I’ve really said nothing interesting so far: I’ve only said that if we have a rule for “constructing” an object (of a certain type) whenever we have an object (of some other type), then we have a mathematical object which “captures” this construction. Moreover, this object behaves exactly as we should expect it to.

Functions of multiple arguments

Crucially, the definition of the function type only accepts a single input. To define a function, such as + which takes multiple inputs, we can abstract over multiple arguments. E.g., we have the (formally unsatisfying, but intuitively clear) function λx.λy.x+y. What is the type of this function? Well, it takes x:N, and returns λy.x+y. This in turn takes y:N and returns x+y. x+y:N, so λy.x+y:N→N, and so finally λx.λy.x+y:N→(N→N). That is, we have a function which takes an N and returns a function N→N. This is often called a Curried function (after Haskell Curry). Since we write functions of multiple arguments a lot, we will omit brackets, and assume → associates “to the right”. So N→N→N means a function which takes an N and returns a function N→N, while (N→N)→N is a function which takes a function N→N and returns an N.

The rest of the simple types

Which types are included in “the” simply typed lambda calculus depends on the purpose–some authors just have function types (and some base type), others add sums, or products, or N. We’ll stick with what we need to encode propositional logic: functions, sums, products, unit and empty.

The unit type is supposed to be a “one element type”, which we may as well call 1. We may always form 1.

We introduce a unique canonical term, let’s say *:1.

What does it mean to eliminate a term of type 1? The elimination rule is supposed to say how to define constructions “out of” 1. Since a construction “out of” 1 is the same thing as a function out of 1, we need to know how to define such a function. Intuitively, this should just select a term of some type A. That is,

unit elimination: Given a type A:Type and a term a:A, we have a function a*:1→A such that
unit computation: a*(*) ≡ a.

The empty type is supposed to be a type which we can never construct an inhabitant of. We will call this type 0 (and as with 1, we can always form it). Since we do not want to have any inhabitants, there is no introduction rule.

What about the elimination rule? Just as in set theory, we want to have a unique “empty” function from 0 to any other type. That is: given a type A, there is a function !:0→A, and since there is no introduction rule for 0, we have no computation rule: there is nothing to compute.

Product types are supposed to capture the Cartesian product of two types. So, the product A×B should have terms (a,b) with a:A and b:B. This gives us the formation and introduction rules as:

Given A:Type and B:Type we may form the product A×B:Type; given a:A and b:B, we introduce (a,b):A×B.

What about elimination? The obvious choice is the projections, but we can do a little better: a construction “out of” A×B should correspond to a function A×B→C for some type C; that is, function of two arguments. Recalling back to curried functions, then we should have a function A→B→C; more explicitly:

Product elimination: Given a type C:Type and a function f:A→B→C, there is a function f':A×B→C which behaves like f “on pairs”. that is, we have the computation rule f'((a,b)) ≡ f(a)(b). Then we can define the projections as p1 ≡ (λx.λy.x)' (that is, the function defined by elimination on λx.λy.x) and p2 ≡ (λx.λy.y)' Checking that this works:

p1((a,b)) ≡ (λx.λy.x)'((a,b)) ≡ (λx.λy.x)(a)(b) ≡ (λy.a)(b) ≡ a

The first equality is the definition of p1, the second by the computation rule for products, and the third and fourth by the computation rule for functions. (Notice that there’s no replacement to do in the last one, since y doesn’t occur in the expression a).

Finally, we want sum types, which we write A+B (and we can form this type under the same conditions as product and function types). These correspond to disjoint unions in set theory. Why not unions? because we cannot arbitrarily throw objects into new types (as we can in set theory), so we need to “tag” our terms with their “real” home. That is, given types A:Type and B:Type, and a term a:A, we have a term inl a: A+B, and likewise, given b:B, we have inr b:B (inl is read “in left” and inr is read “in right”). Notice that we have two introduction rules.

To define a function A+B→C, we need to know what to do with the “left” terms and what to do with the “right” terms. In other words, we need two constructions f:A→C and g:B→C. This allows us to construct a function f+g:A+B→C, with the computation rules (one for each introduction rule!), (f+g)(inl a) ≡ f(a) and (f+g)(inr b) ≡ g(b).

Encoding logic

To wrap up this (already long) post, let’s look at how to encode logic. According to our Brouwerian intuition, we should have that “prove a proposition” corresponds to “construct an object”–logical proof is a part of mathematical activity, and mathematical activity is all about constructing things. It seems natural to think that all proofs of a given proposition have the same “type”–the type of proofs of that proposition, so why not encode propositions as types, and say that a “proof” if that proposition is a term of the corresponding type?

We start with the two obvious propositions: “the” true proposition, and “the” false proposition. Since truth corresponds to being inhabited, we have an obvious candidate for true: 1, and since false things should never be provable, the false proposition should have no inhabitants: let’s take 0.

Implication: “A implies B” means if A is true, then B is true. That is, from a proof of A, we can construct a proof of B. This is exactly what A→B describes.

Conjunction: to show that “A and B are true” we need to know that A is true and that B is true. In other words, to construct a proof of “A and B“, we need a proof of A and a proof of B: this is exactly A×B

Disjunction: To prove “A or B” we need either A or B. As you can hopefully guess by now this is exactly A+B.

Finally, what about negation? The usual way to show something is false is to show that it leads to absurd consequences. That is, ¬A is true when from a proof of A, we can conclude something false. I.e., ¬A should correspond to A→0.

So, summing it up:

 True:  1
False:  0
A→B:  A→B
A∧B:  A×B
A∨B:  A+B
¬A:  A→0


If you compare the introduction and elimination rules for types, with the corresponding introduction and elimination rules in Natural Deduction, you see that (except for product elimination), they correspond exactly, with 0 elimination being the principle of explosion. Moreover, the usual elimination rules for conjunction are exactly the projections, and we can “prove” the elimination rule for products in natural deduction (That is, we have a derivation in ND of A∧B→C with open assumption A→B→C)

One thing we’re missing is double negation elimination: that is, we have intuitionistic or constructive propositional logic. For Brouwer, this was a Good Thing: proving that “A is true” by showing that A is not false doesn’t give us an actual witness that A is true. For example, the usual presentation of Euclid’s proof doesn’t (necessarily) give us an actual prime, it just tells us that there must be one somewhere. (For what it’s worth, the theorem is constructively valid, you just need to be more careful to present the proof constructively.)

There are other reasons that this is a good thing (and of course, reasons it’s a bad thing), but we’ll leave those for another day.

The primary difficulty in learning category theory is the need for lots of good examples; more than anything else in math, category theory is tremendously abstract, and without building intuition from examples, it’s just a bunch of arrows and dots. Besides familiar examples, like those from set theory, algebra and topology, it’s often helpful to look at minimal examples—examples which show the categorical structure, with as little “residual” structure as possible.

A wonderful source of minimal examples is order theory—since there is at most one arrow between any two objects in a preorder category, all diagrams commute in a preorder. As a result, we get to see what happens to categories (and functors) without worrying whether the morphisms in question behave nicely. But I’m getting ahead of myself—let’s look at some basic order theory from the classical perspective, and then see how it translates to category theory.

A preorder $\leq$ on a set $S$ is a reflexive, transitive relation. That is, for all $x\in S$, we have $x\leq x$ and for all $x,y,z\in S$ such that $x\leq y$ and $y\leq z$, we have $x\leq z$. Observe that a preorder is not necessarily anti-symmetric: we can have $x\leq y$ and $y\leq x$ without having $x=y$. Since $x\leq y \wedge y\leq x$ is clearly an equivalence relation, we can quotient a pre-order to arrive at a relation which is additionally anti-symmetric: if $x\leq y$ and $y\leq x$, then $x=y$.

A (weak) partial order on a set is a reflexive, transitive, anti-symmetric relation (and a set with a partial order is called a poset). Partial orders show up basically everywhere, and I assume if you’ve read this far you already know what they are, so I won’t bore you with examples, but of course many of the most important examples are collections of sets, where the ordering relation is $\subseteq$. Importantly, in many of these typical examples, we will take the closure of some element with respect to some ambient structure–e.g. the closure of a subspace, or the generated subalgebra (which is “closed” under the algebraic operations in question). These closure operators all have three important properties:

• They are monotone: if $x\subseteq y$ then $\overline{x}\subseteq \overline{y})$.

• They are extensive: $x\subseteq \overline{x}$.

• They are idempotent: We always have $\overline{\overline{x}}=\overline{x}$. In English, the closure of a closed set is closed.

Abstracting to a general partial order, we have that a closure operator $C$ on a poset $P$ is a function $C:P\rightarrow P$ which is monotone, extensive, and idempotent; we call $x\in P$ closed when $x=C(x)$ (equivalently, when $x=C(y)$ for some $y$). If the underlying poset is $\mathcal{P}(X)$ for some set $X$, we usually call it a closure operator on $X$. The most common (explicit) examples of closure operators come from algebra: given, say, a group (in fact, any algebraic structure will do) $G$, the subsets of $G$ form a poset under $\subseteq$. It’s easy to see that the operation $C$ taking a subset $X\subseteq G$ to its generated subalgebra $\langle X\rangle$ is a closure operator.

Let’s focus in on this group example: the subgroup generated by $X$ is the set of all elements $x_1\cdot x_2\cdot\ldots\cdot x_n\in G$, for some natural number $n$, where each $x_i$ is either in $X$, or is $x^{-1}$ for some $x\in X$. In particular, each element of $\langle X\rangle$ can be represented by some finite number of elements of $X$. That is, each $x\in\langle X\rangle$ is in some $\langle X'\rangle$ where $X'$ is a finite subset of $X$. In other words, we have for all $X\subseteq G$

$\langle X\rangle = \bigcup\{\langle X'\rangle \mid X'\subseteq X\text{ is finite}\}$ (*)

A closure operator $C$ on a set $S$ (i.e., on the powerset lattice) is called finitary or algebraic when this happens—that is, when (*) is satisfied by all subsets of $S$. It might be interesting to generalize this to an arbitrary poset, but I won’t do that here.

Ok. Time to look at all this from a categorical perspective. The first point is that from any pre-order $P$ we can define a category: We take as objects the elements of $P$, and when $x\le y$, we have a unique arrow $x\rightarrow y$. The arrow is usually left unnamed, since there’s at most one for any pair of objects. Reflexivity ensures the existence of $id_x$ (as the unique arrow $x\rightarrow x$), while transitivity ensures composites (where $(x\rightarrow y)\circ (y\rightarrow z)$ is the unique arrow guaranteed by $x\le z$.) Categories with at most one arrow between any pair of objects are called, naturally enough, poset categories.

Fact: Any diagram which we can draw in a poset commutes, since any parallel arrows are equal.

A category is skeletal when isomorphic objects are equal. A skeletal precategory is a poset—That is, if $P$ is a preorder, and whenever there are arrows $x\rightarrow y$ and $y\rightarrow x$, then $x=y$. (Exercise: anti-parallel arrows are mutually inverse in any pre-order.)

Finally, some motivation for this whole post: one of the takeaways from a first course in (classical) universal algebra is that algebraic structures and closure operators give the same information—every (algebraic) closure operator (on a powerset lattice) gives rise to an algebraic structure, and vice versa. It’s well-known that monads capture the notion of algebra in a categorical way, and finitary monads correspond precisely to algebraic theories. So, there should be some close relationship between closure operators and monads. Besides the obvious one via algebraic theories, there’s another one:

A monad on a poset is precisely a closure operator. This result is pretty fundamental, and it shows up in just about every category theory book, but somehow I missed it until pretty recently. (I’m new at this!)

A monad on a category $C$ is a functor $T:C\rightarrow C$ together with natural transformations $\eta:1_C\Rightarrow T$ and $\mu:T\circ T\Rightarrow T$ satisfying certain diagrams—since we’ll be dealing only with posets, we don’t care what these diagrams are.

Let’s see how this plays out on a poset $C$:

• $T:C\rightarrow C$ is a functor: we must have $x\le y$ implies $T(x)\rightarrow T(y)$. That is, $T$ is monotone
• There is a natural transformation $\eta: 1_C\Rightarrow T$. Such a natural transformation is given by an arrow $x\rightarrow T(x)$ for all $x\in C$. That is, for all $x\in C$ we have $x\le T(x)$. That is, $T$ is extensive.
• There is a natural transformation $\mu:T\circ T\Rightarrow T$. As in the last point, this becomes for all $x\in X$ we have $T(T(x))\le T(x)$. That is, $T$ is idempotent.

So, there we have it: a monad is precisely a closure operator. But we’re not done yet: an algebra for a monad $T$ on $C$ is an object $x\in C$ together with an arrow $T(x)\rightarrow x$ satisfying diagrams (that we again don’t care about). That is, in a poset, an algebra for a monad $T$ is an element with $T(x)\le x$—a closed element!

My original plan was to finish this off with some discussion of finitary monads, but I think what’s here is enough for now. Perhaps another time I’ll get back to it.

# On Cantor’s Diagonal (again)

Just like so many mathematical cranks, the diagonal argument has always fascinated me. But unlike the cranks, it’s not the argument itself that fascinates me, but the sociological phenomenon of it: Cantor’s theorem is without a doubt the single result in mathematics which generates the most crankery.

What has fascinated me is the mystery of why. Why is this result so hard to stomach for so many? Some have suggested that the answer has to do with intuitionism, which I have argued is not the case. The issue seems to be with the notions of “infinity” and “size”: the usual presentation of the diagonal argument seems to make use of infinitely many choices, and the result (as usually presented) says that the set of reals is “bigger” than the set of naturals, which is repugnant to people who don’t accept the notion of “completed infinity”.

My goal here is not to argue against finitists (I haven’t the patience), or to try to justify the idea that there are more reals than naturals (I have no interest in flogging a horse which died over a century ago). Rather, I want to argue that Cantor’s theorem (rather, the spacial case dealing with the naturals) makes sense without talking about infinite sets, and without talking about size.

To do this, we need to talk about sets without them being extensional (that is, without them being determined by their elements) and to talk about functions in a slightly different way than we normally do when presenting the diagonal argument. We do this by dealing with types.

Types are determined not by their members, but by introduction and elimination rules. Most types of interest are “inductive types”, where the introduction rules give us “canonical elements”, and the elimination rule says that to define a function out of the type, it suffices to say what happens on the canonical elements. Often, the canonical elements are generated according to some recursive rule: If $T$ is the type we are defining, we may define new members of $T$ by reference to old members of $T$ (and possibly by reference to members of other types).

The classical example is the type of natural numbers; the type $\mathbb{N}$ of natural numbers is determined by the following rules:

• Introduction rule: There is a member $0:\mathbb{N}$.
• Introduction rule: For any $n:\mathbb{N}$, there is a member $\mathsf{succ}(n) :\mathbb{N}$. That is, we can always take the “successor” of a natural number.
• Elimination rule: If we have a type $C$, a member $c_0:C$ and a function $c_s:\mathbb{N} \rightarrow (C\rightarrow C)$, there is a function $f:\mathbb{N}\rightarrow C$ such that $f(0)=c_0$ and for an arbitrary $n:\mathbb{N}$ we have $f(\mathsf{succ}(n)) = c_s(n,f(n))$. Here, $c_s(n,f(n))$ is shorthand for $(c_s(n))(f(n))$—since $c_s(n)$ is a function from $C$ to $C$, and $f(n)$ has type C (by the “inductive hypothesis”), we may apply $c_s(n)$ to $f(n)$.

We say that $\mathbb{N}$ is inductively generated by $0$ and $\mathsf{succ}$.

The point is that we don’t need $\mathbb{N}$ to be some “completed collection”—we only need to be able to recognize when something is a natural number, and once we’ve recognized something to be a natural number, to be able to perform the sorts of manipulations we expect to be able to perform on natural numbers.

Another classical example is the type of Booleans, which we call $\mathbf{2}$. It is “inductively” generated by members we call $\top:\mathbf{2}$ and $\bot:\mathbf{2}$. That is,

• $\top:\mathbf{2}$;
• $\bot:\mathbf{2}$;
• Given a type $C$ and members $c_\top:C$ and $c_\bot:C$, there is a function $f:\mathbf{2}\rightarrow C$ such that $f(\top)=c_\top$ and $f(\bot)=c_\bot$.

An example of a function constructed using the elimination rule for $\mathbf{2}$ is the negation, $\neg:\mathbf{2}\rightarrow \mathbf{2}$, which we define as follows: We define $c_\top:=\bot$ and $c_\bot:=\top$. Then the elimination rule says that we have a function (which we denote $\neg$) from $\mathbf{2}$ to $\mathbf{2}$ such that $\neg(\top)=\bot$ and $\neg(\bot)=\top$.

An important fact about inductive types is that canonical elements which take different forms are never equal. For example, $n:\mathbb{N}$ is not equal to $\mathsf{succ}(n)$, and $\top\neq\bot$.

The elimination rules actually generalize from functions to a type, to dependent functions on a type family. For $\mathbf{2}$, this becomes:

Given a type family $F$ indexed by $\mathbf{2}$, and members $f_\top : F(\top)$ and $f_\bot : F(\bot)$, there is a dependent function $f:\prod_{x:\mathbf{2}}F(x)$ such that $f(\top)=f_\top$ and $f(\bot)=f_\bot$.

Recall that in type theory, equality is a type family, double-indexed by members of a fixed type. In particular, we have a type family $D$ indexed by $\mathbf{2}$ defined as $D(x):= \neg(x) = x$. Going a step farther: the claim $x\neq y$ is to be regarded as a type (for $x$ and $y$ fixed), so we may define the type family $F$ indexed by $\mathbf{2}$ as $F(x):= \neg(x)\neq x$. Combined with the elimination rule for $\mathbf{2}$, the fact above that $\top\neq \bot$ gives us

Lemma. For an arbitrary $x:\mathbf{2}$, we have $\neg(x)\neq x$. That is, there is a dependent function $f:\prod_{x:\mathbf{2}}(\neg(x)\neq x)$.

Proof. We remarked above that we have $\top \neq\bot$. That is, we have some $p:\top\neq\bot$. By symmetry, we also have some $p' :\bot\neq\top$. Since $\neg(\top)$ is $\bot$ and $\neg(\bot)$ is $\top$, we have that $p:\neg(\bot)\neq\bot$ and $p':\neg(\top)\neq\top$. Setting $f_\top:=p'$ and $f_\bot:=p$, we may apply the elimination rule for $\mathbf{2}$ to get the desired $f$. $\square$.

We cannot interact with members of these types any way we want—we can only interact with them using the rules of type theory, or by choosing an arbitrary member (which we have no information about, other than its type). In particular, to define a function from $\mathbb{N}$ to another type $C$ we may either:

• Use the elimination rule, provided we have the necessary information (the type $C$, the member $c_0$ and the function $c_s$.)
• Use the introduction rule for function types: If, given an arbitrary member of $\mathbb{N}$ we may construct a member of $C$, then there is a function $f:\mathbb{N}\rightarrow C$.

The take away is that we cannot define a function out of $\mathbb{N}$ by saying “Do this to 0, this to 1, this to 2, and so on.” We must give an explicit rule which does not directly refer to the members, except in an arbitrary way, or according the “inductive structure” of $\mathbb{N}$.

Now, we need to interpret Cantor’s diagonal in this context. Rather than using real numbers, or defining a powerset, we will simply use function $\mathbb{N}\rightarrow\mathbf{2}$. We also must forego any mention of “size” or “cardinality” So our theorem becomes

Theorem. There is no surjective function from $\mathbb{N}$ to $\mathbb{N}\rightarrow\mathbf{2}$.

Proof. Let $f$ be an arbitrary function $f:\mathbb{N}\rightarrow (\mathbb{N}\rightarrow\mathbf{2})$. Note that we know nothing about $f$ other than its type. That is, we may only interact with it using the introduction and elimination rules for function types.

We define a function $\delta_f:\mathbb{N}\rightarrow\mathbf{2}$ by function introduction (“lambda abstraction”): $\delta_f:=\lambda (n:\mathbb{N}). \neg f(n,n)$. That is, $\delta_f$ takes a natural number $n$, a negates the value of applying $f(n)$ to $n$.

Claim. $\delta_f$ is not in the range of $f$. That is, there is no $n:\mathbb{N}$ such that $\delta_f=f(n)$.

Proof (of claim). Let $n:\mathbb{N}$ be arbitrary. By the lemma, we know that $\neg(f(n,n))\neq f(n,n)$. Since $\delta_f(n)$ is $\neg(f(n,n))$, we have that $\delta_f(n) \neq f(n,n)$. But then $\delta_f$ and $f(n)$ differ in some argument, namely $n$, so $\delta_f \neq f(n)$. Since $n$ was arbitrary, there is no $n$ such that $\delta_f = f(n)$. This proves the claim.

In other words, $f$ is not surjective. Since $f$ was arbitrary, we must have that no function $\mathbb{N}\rightarrow(\mathbb{N}\rightarrow\mathbf{2})$ is surjective. $\square$

And there we have it. The point is that we needn’t think of sets as “completed collection”, but merely “things generated by rules”, and we needn’t introduce the notion of “size” into the discussion at all: I don’t know what “the set of natural numbers” is (I just know I have a way of generating some natural numbers, and a way to construct a new object when I am given an arbitrary natural number) and I don’t know what “size” is (I just know what a surjective function is), and I still have a way to make sense of Cantor’s theorem.

Someone intent on arguing with my may suggest that I’ve moved the goalposts—after all, the real numbers aren’t really functions from $\mathbb{N}\rightarrow\mathbf{2}$. However, this misses the point: the argument can be reformulated only slightly to work for the Dedekind or Cauchy construction of the reals; both constructions are given in a style like this one—without appeal to infinity or choice or any other controversial infinitary principle—in Chapter 11 of the HoTT book.

The next post will be on a topic less mundane and tired.

# A quick rant on Cantor’s diagonal argument and intuitionism

Sorry for the delay… I’ve realized that I’ll need to completely rework my next post on forcing, and I don’t have the time right now. Hopefully soon(tm) I’ll be able to finish up and do something a little more fun. In the mean time, some thoughts on the famous diagonal argument.

When explaining the diagonal argument, and when trying to understand why so many students have such a hard time accepting it, intuitionism seems to come up a lot; the statement is usually something like “Many students implicitly use intuitionistic logic, which is why they have such a hard time accepting the proof” or “the diagonal argument uses proof by contradiction so requires classical logic.”

This claim always baffles me because the diagonal argument is inutionistically valid!

The misconception seems to arise from the misleading statement “Intuitionistic logic does not allow proofs by contradiction.” This statement is true, but many math students (and many mathematicians, it seems) have a vague notion of what proof by contradiction is.

Consider the following three “arguments” by contradiction:

1 Claim: foo.

Proof: Assume not foo. [Argument] Contradiction. Therefore foo.

2 Claim: not foo.

Proof: Assume the claim is false. That is, assume foo. [Argument] Contradiction. Therefore not foo.

3 Claim: not foo.

Proof: Assume foo. [Argument] Contradiction. Therefore not foo.

3 is intuitionistically valid, 2 is an incorrect argument (intuitionistically), but leads to a correct result, while 1 is not intuitionistically valid. Where is the difference?

In 1, we assume not foo, and then derive a contradiction. We haven’t yet shown that foo is true, instead, what we’ve shown is that not foo is false. We then jump to foo is true using your favorite classical principle (e.g., law of the excluded middle, or double-negation elimination)

In we make this jump directly. And then we derive a contradiction from the “assumption” not (not foo); however, we’ve actually derived it from foo directly: instead of assuming not (not foo), we should just assume foo directly, and show that this leads to a contradiction (and so must be false). This would give us 3, where we never jump from not foo is false to foo is true.

That little jump is what defines the “proof by contradiction” that intuitionists find objectionable. We haven’t argued that foo is true, we’ve argued that not foo is false. These are (to the intuitionist) different propositions. 1 is an actual proof by contradiction, 2 looks like a proof by contradiction, and 3 looks like 2.

To the classical mathematician, 2 and 3 look the same, and since classical mathematicians don’t usually distinguish between negative and positive propositions, 2 looks like 1, so the three forms of argument all look the same to the classical mathematician. And so the statement “intuitionistic logic doesn’t allow proof by contradiction” ends up looking to the classical mathematician like a heavy restriction indeed.

In fact, the *only* way to prove a negative statement in intuitionistic logic is to show a contradiction: the definition of not foo in intuitionistic logic is “foo implies a contradiction.” So 3 must be valid.

Now, Cantor’s Diagonal is usually presented as follows:

Claim: There is no bijection from the set of naturals to the set of reals.

Proof: Assume the claim is false. That is assume there is such a bijection f. [Costruct the diagonal] contradiction. Therefore the claim is true.

Observe that the form of this argument is that of 2. Instead we shouldn’t assume the claim is false, but rather assume directly such a bijection exists, and end up with an argument with the same form as 3. We can actually do one better, but this has (almost) nothing to do with intuitionism:

Instead of showing there is no bijection, Cantor’s argument actually shows that no function from the naturals to the reals is surjective. And it does so in a remarkably constructive way: if you give me any function f, I can give you a real number not in the range; that is, a term witnessing the fact that f is not surjective. The fact that we can construct an explicit term is usually a good indication that a proof is intuitionistically valid.

# Towards Forcing: Extensions of Structures

Last time

Unfortunately, I don’t have a result to work towards with this post. Instead, I want to give a rough idea of what forcing is, by analogy with other extensions of structures, in particular, field extensions and Cauchy completions.

The usual way to construct an extension $F$ over a base field $K$ is to construct $F$ as a quotient of the polynomial ring $K[x]$ by some (irreducible) ideal $I$. Abstracting a bit, the sketch of this construction is:

• Start with a base structure $M$. (In this case, our base field $K$.)
• Find an object which approximates some gap in our structure. (In this case, the ideal $I$ represents an approximation of a solution to some equation (or set of equations) in $K$.)
• Use some common operation to generate a new structure which contains the approximated object. (In this case, take the quotient $K[x]/I$.)

A similar thing happens when we complete a metric space:

• Start with a metric space $M$.
• Take the set $C$ of Cauchy sequences in $M$. Here, Cauchy sequences are approximations of points which don’t exist in $M$.
• Quotient $C$ by Cauchy equivalence.

Compactification is another example of this type of construction. The basic idea of forcing is similar, but we need to figure out:

• What is our base structure?
• How do we approximate an object which doesn’t exist?
• What operation do we use to finish the approximation?

The base structure is easy: we start with a model of set theory, $M$. For technical reasons, we need this to be a model of a “sufficiently large fragment of ZF”. What this means isn’t important, because we’ll just tacitly assume we’re always working with models of ZF (even though this isn’t quite justified.) We also want this to be a transitive model, because certain technical lemmas (that we’ll avoid) require this. Sorry, this is all a little sloppy; we’ll go into more detail next time.

What are the approximations? Forcing makes use of a poset $P$ (called a forcing notion) which exists in $M$ (That is, $P\in M$.) The elements of $P$ are the approximations of an object which doesn’t exist in $M$, and we want to add to $M$ the “limit” (which won’t exist) of a set of elements of $P$. (edit: I said supremum before, but that’s not really what we’re looking for.)

The subset of $P$ we use is what’s called a generic filter. A filter in $P$ is a subset $F$ of $P$ such that

• $F$ is non-empty;
• $F$ is upward closed. That is, if $p\in F$ and $p\le q$, then $q\in F$;
• Every two elements of $F$ are compatible. That is, for every $p,q\in F$, there is some $r$ such that $r\le p$ and $r\le q$.

Aside: this is nothing more than a generalization of the notion of filter for Boolean algebras (or lattices in general). Since in a Boolean algebra, a filter is dual to an ideal, by taking a filter we are morally doing the same thing as taking an ideal in a ring.

A subset $D$ of $P$ is called dense if for every $p\in P$ there is some $q\in D$ with $q\le p$ (that is, you can always find an element of $D$ below any element of $P$, and $D$ is called open if it is downward closed.

A filter $F$ is called generic over $M$ if additionally, $F$ intersects every dense open subset of $P$ which is an element of $M$.

Genericity expresses “non-existence of an object” in much the same way that non-convergence of a Cauchy sequence or irreducibility of an ideal expresses non-existence of an object.

It’s a fact (which I won’t prove), that for every poset $P\in M$, there is a generic filter $G$ over $M$. However, we want $G$ to not exist in $M$, and to ensure this, we require $P$ to be perfect; that is, for every $p\in P$, there are $q,r\le p$ which are incompatible. If $P$ is perfect, it is in fact the case that there are generic filters over $M$ that are not elements of $M$.

To complete the picture, we need to know how to generate the new structure. For this, we use relative constructability, which requires a more thorough explanation than I’m prepared to give in this post. We’ll leave a discussion of that, as well as some other important details for the next post. At any rate, when we’re finished, we’ll have a structure denoted $M[G]$, which has $M$ as a subset at $G$ as an element. Usually, by letting $c=\bigcup G$, we have that $c\not\in M$, but $c\in M[G]$. Then $c$ is a new element of our universe, and is the element “approximated” by $G$.

Before I leave off, I want to give a sketch of the first forcing argument: adding a new real number.

As our notion of forcing, we take what’s called the Cohen forcing (denoted $\mathbb{C}$), which is the set of all finite sequences of $\{0,1\}$ ordered by reverse inclusion; that is, $\mathbb{C}$ is the complete binary tree turned upside down. The idea is that the elements of $\mathbb{C}$ represent approximations of subsets of $\mathbb{N}$, where as we move down the tree (away from the root) we get a better approximation. Observe that $\mathbb{C}$ is perfect: We can extend any sequence $p$ by adding either 1 or 0, and these two extensions will be incompatible.

A filter in $\mathbb{C}$ is a branch in the tree (being just a filter it need not be a complete branch—it may cut off somewhere). A dense subset $D$ is one which extends every possibly sequence. That is, given any sequence, say $s=\langle 0,1,0\rangle$, we have some sequence in $D$ which extends $s$, maybe something like $\langle 0,1,0,0,1\rangle$.

Genericity ensure two things:

• The branch is a “complete” branch–it is a path all the way from the root to “the end”.
• The branch does not exist in $M$.

Once we have a generic filter $G$, when we construct $M[G]$, we get an infinite sequence $c=\{c_n:n\in\mathbb{N}\}$ which is the limit of the sequences in $G$. By genericity, we must have that $c\not\in M$. But this sequence corresponds to a real a number; if the real number existed in $M$, then so would $c$, so we know that $c$ is a new real number. Such a new real number is called a Cohen real.

We can use this idea to construct a model of ZF in which the continuum hypothesis fails: instead of just adding one real number, we can add as many real numbers as we want.

Next time, after we discuss relative constructibility and some other minutia of forcing, we will see exactly how to “refute” the continuum hypothesis.

# Circles, Contractible Spaces, and the Axiom of Choice

I’m still trying to decide how best to continue the forcing series, but while we wait, let’s talk about something that took me an embarrassingly long time to understand:

Why isn’t the circle contractible?

The topologists in the audience are probably thinking “because it has a hole, obviously!” but I’m talking about the circle and contractibility as defined by homotopy type theorists; when you approach these with intuition from logic rather than topology, seeing the “hole” is a little less obvious. To be fair, the type theoretic proof that the circle is not contractible is pretty straightforward, certainly easier than the traditional proof, but it feels a bit like sleight of hand, and I never found it very convincing by itself.

Type theory

We’ll use some of the same bit of type theory as last time. As a refresher, we have terms which always come with types. We write $a:A$ to say “The term $a$ has type $A$. We have lots of ways to form types and type $A$ will always come with ways of creating terms of type $A$ (called introduction rules) and ways of using terms of type $A$ (called elimination rules). Again, there are more rules, but we’re not too worried about them for now.

We’re still interested in dependent functions: Given a type $A$ and a family of types indexed by $A$ (say, $B(x)$ for $x:A$), we have a type of dependent functions from $A$ to $B$, which we write as $\displaystyle \prod_{x:A}B(x)$. The type $\displaystyle\prod_{x:A}B(x)$ can also be viewed as the quantified formula $\forall x\in A,\;B(x)$, when $B(x)$ represents a proposition. In fact, even when $B(x)$ doesn’t represent a proposition, we can think of a dependent function $d:\displaystyle \prod_{x:A}B(x)$ as a witness that “for all $x:A$, the type $B(x)$ is inhabited”.

As Martín Escardó points out, “types are secretly topological spaces and functions are secretly continuous.” This means that we should actually be thinking of the “for all” in the above as a slightly stronger statement: we have a way to continuously exhibit inhabitants of each $B(x)$.

As discussed last time, we have a type representing equality: given two terms of the same type, say $x,y:A$, there is a type $x=y$ (called the identity type), or more clearly $x=_{A}y$. There are two basic ways to think of this type

• Logically: As the proposition that $x$ is equal to $y$, in which case a term $p:x=y$ is a witness that $x$ is equal to $y$.
• Topologically: This type is the space of paths from the point $x$ to the point $y$, in which case a term $p:x=y$ is a path in $A$ from $x$ to $y$.

For now at least, we’ll mostly be using the second interpretation; after all, we’re talking about a homotopy-theoretic property today. The one last point about identity types is that given a term $x:A$, we have path $r(x):x=x$. The term $r(x)$ is called the reflexivity term or the constant path.

Contractibility

The topological intuition for identity types immediately motivates a definition for contractible: A type $T$ is contractible when it has a “center of contraction”– a point $c:T$ such that $\prod_{x:T}(x=c)$ is inhabited. This seems to be saying “there is a point $c:T$ such that for all $x:T$ there is a path $x=c$–that is, path-connectedness.

The astute reader will recall that just a few paragraphs ago, I said that definitions must be continuous, so this reading is misleading: a better reading is “for each $x:T$ we have a path $p_x:x=c$ which varies continuously with $x$“, which is actually a way to read the classical definition of contractible.

Contractible types may seem somewhat familiar from the last post: in fact, contractible types are propositions, and propositions are either contractible (“one inhabitant up to homotopy”) or empty. (Edit: As Andrej points out below, this is not exactly true, but its double negation is: no proposition is neither contractible nor empty.) Contractible types have a few nice properties, 2 in particular:

• Every contractible type is equivalent to the unit type 1 (which has a single constructor *).
• For any two contractible types, $T$ and $S$, there is a unique function (up to homotopy) $f:T\rightarrow S$ (the one which takes the center of $T$ to the center of $S$). When $T\equiv S$, then this $f$ is the identity function.

The circle and transporting along paths

It took me a long time to convince myself that $\prod_{x:T}(x=c)$ really represented some sort of continuous deformation; I never really believed all that stuff about types secretly being topological space and functions secretly being continuous. This meant I really saw the definition of contractible as “path-connected”

The prototypical example of a path-connected, non-contractible space is the circle, $\mathbb{S}^1$, so to see why my intuition was wrong, we may as well start with the circle. There are a lot of homotopy-equivalent ways to define the circle, but the central fact is that the circle is a loop. A loop is just a path, from some base point to itself, so we need a point which we may as well call base. Then the circle $\mathbb{S}$ is generated by:

• A term $\mathsf{base}:\mathbb{S}$ and
• A path $\mathsf{loop}:\mathsf{base}=_{\mathbb{S}}\mathsf{base}$

The meaning of “generated by” in the last sentence is captured by the following rule:

(Induction Principle for $\mathbb{S}$) Let $P(x)$ be a family of types indexed by $\mathbb{S}$, and suppose we have a term $b:P(\mathsf{base})$ and a path $l:b=_{P(\mathsf{base})}b$, lying over loop. Then we have a dependent function $f:\displaystyle \prod_{x:\mathbb{S}}P(x)$ such that $f(\mathsf{base})=b$ and $f(\mathsf{loop}) = l$.

To make sense of this, we need to know what it means to apply a function to a path, and what it means for a path to “lie over” another one. In explaining how to apply a function to a path, we end up answering what it means for a path to lie over another.

The application of a function to a path: If $x=_Ay$, it seems obvious that for any function $f:A\rightarrow B$ we should always have $f(x)=_Bf(y)$; in type theory, this means we should have a function $\mathsf{ap}_f:(x=_Ay)\rightarrow (f(x)=_Af(y))$ (“ap” is short for application”). For a non-dependent function, this is indeed the case, and by abuse of notation, we we write $f(p)$ instead of $\mathsf{ap}_f(p)$.

For a dependent function $f:\prod_{x:A}B(x)$, it gets a little trickier: since the target type of a dependent function depends on its input, even if $x=y$, it’s not immediately clear that $f(x)$ and $f(y)$, even have the same type. If they don’t have the same type, they cannot be compared, so we need to do more work.

Homotopically, a type family corresponds to fibration (with $B(x)$ being the fiber over $x$), and a dependent function is a section. That is, we have some “total space” (which type theorists write $\sum_{x:A}B(x)$), with subspaces $B(x)$ resting above each $x:A$, and a dependent function picks a point in $B(x)$ for each $x$.

Now, (still speaking homotopically) if we fix some point $u:B(x)$, a path $p:x=y$ lifts to a path $p_\sim:u=v_u$, for some $v_u:B(y)$ (notice the type) in this total space. We can then turn $p_\sim$ into a function $p_*:B(x)\rightarrow B(y)$ by sending $u$ to $v_u$. On the other hand, if we take a section $s$ of the fibration, the path $p$ gives rise to a path $s(p): s(x)=s(y)$ this path lies over $p$ in the sense that it projects from the total space down to $p$. Any path over $p$ factors through $p_\sim$: there is a path $q_p$ in $B(y)$, from $v_u$ to $s(y)$ such that the composition $p_\sim\cdot q_p$ is $s(p)$; or, in terms of our function $p_*$, we have a path $q_p: p_*(s(x)) = s(y)$.

Back to type theory: we can simulate this whole argument in homotopy theory, but the order we do things in changes: The big difference is that in type theory, we “stay within the fibers”; as a result, we forget about $p_\sim$ and the path I called $s(p)$ above. Instead, we work with the map $p_* : B(x)\rightarrow B(y)$ (which is easier to define in type theory than the lifting), and we use the map I called $q_p$ instead of the one I called $s(p)$; then $q_p$ becomes the path “lying over” $p$. Explicitly

When $p:x=_Ay$ and $q:p_*(f(x))=_{B(y)}f(y)$ for some dependent function $f:\prod_{x:A}B(x)$, we say that $q$ lies over $p$.

All of this is to say, in the case of a dependent function $f:\prod_{x:A}B(x)$, the theorem stating that “if $x=y$, then $f(x)=f(y)$“, becomes

For any dependent function $f:\displaystyle\prod_{x:A}B(x)$ and any path $p:x=y$, we have a path $\mathsf{apd}_f(p):p_*(f(x))=f(y)$.

In other words, while we don’t necessarily have an equality $f(x)=f(y)$ (since this isn’t guaranteed to make sense), if we transport f(x) along $p$, we do have an equality.

As with $\mathsf{ap}$, we will typically abuse notation and use $f(p)$ instead of $\mathsf{apd}_f(p)$.

Non-contractibility of the circle

Back to the induction principle for the circle: If we have a $\mathbb{S}$-index family $P(x)$, and we find some $b:P(\mathsf{base})$ and some $l$ lying over $\mathsf{loop}$, then we can create a map $f:\prod_{x:\mathbb{S}}P(x)$ such that $f(\mathsf{base})\equiv b$ and $f(\mathsf{loop})\equiv l$. This really is an induction principle: we have two generators, $\mathsf{base}$ and $\mathsf{loop}$, and by determining what happens with them, we know what happens with the whole structure. The requirement that $l$ lies over $b$ is so that $\mathsf{apd}_f(\mathsf{loop})$ has the correct type, namely $\mathsf{loop}_*(f(\mathsf{base}))=f(\mathsf{base})$.

The requirement that $l$ lies over $b$ is also why we can’t contract the circle: To say the circle is contractible, we want a term $c:\prod_{x:\mathbb{S}}x=\mathsf{base}$. (We could, of course, use some other point… but where would we get this other point from?) This is a perfect place to apply induction: we need to send the basepoint to some path $\mathsf{base}=\mathsf{base}$. We have two “obvious” choices: the reflexivity term, or the loop. Let’s just try $r(\mathsf{base})$. Then we need to send $\mathsf{loop}$ to some path $\mathsf{loop}_*(r(\mathsf{base}))=r(\mathsf{base})$.

This is a bit of an awkward place to be: it’s not clear (and intuitively fishy) that transporting along the loop brings you to where you started–after all, that’s how you prove that $\pi_1(\mathbb{S}^1)=\mathbb{Z}$. And of course, it’s not true. We could try to send $\mathbf{base}$ to $\mathbf{loop}$, but we’ll run into similar problems.

So, we can’t contract the circle (or at least, not in an obvious way), which is probably a good thing.

Pointwise paths, truncation

On the other hand, I’ve said a few times that $\prod$ can be interpreted to mean “for all”, and we should have “for all points in the circle, there is a path to the basepoint.” It wouldn’t be much of a circle otherwise. So how do we get this theorem without contracting the circle?

The answer is to use truncation. The propositional truncation or -1-truncation (“minus 1 truncation”) $\|A\|$ of a type $A$ is essentially the “quotient” of $A$ by the total relation: for each $x:A$ we have $|x|:A$, and we have paths $\prod_{x,y:\|A\|}x=y$. Intuitively, just connect every single point to every other point “continuously”. This turns out to be a proposition (either contractible or empty), and is inhabited iff $A$ is.

What if instead of $\prod_{x:\mathbb{S}}x=\mathsf{base}$ we tried $\prod_{x:\mathbb{S}}\|x=\mathsf{base}\|$? Again, we proceed by induction, and we send $\mathsf{base}$ to the reflexivity term $r(\mathsf{base})$–actually, this doesn’t have the correct type. We need to take $|r(\mathsf{base})|$–the “truncated version”. Now look at our function $\mathsf{loop}_*:\|\mathsf{base}=\mathsf{base}\|\rightarrow\|\mathsf{base}=\mathsf{base}\|$. Remember earlier I said there’s only one function (up to homotopy) between contractible types? Well, this one function must be the identity. So then we know that $\mathsf{loop}_*(|r(\mathsf{base})|)=|r(\mathsf{base})|$. And then we can apply the induction principle to get some term of type $\prod_{x:\mathbb{S}}\|x=\mathsf{base}\|$.

And the Axiom of Choice

The difference between $\prod_{x:\mathbb{S}}\|x=\mathsf{base}\|$ and $\prod_{x:\mathbb{S}}x=\mathsf{base}$ is subtle: the first says that each type $x=\mathsf{base}$ is inhabited, but we (may) have no algorithm for finding such an inhabitant; the second says that we can algorithmically determine these elements. It’s pretty strange that a constructive theory allows us to make non-constructive statements!

But a question arises: when can we say that $\prod_{x:\mathbb{S}}x=\mathsf{base}$ implies $\prod_{x:\mathbb{S}}\|x=\mathsf{base}\|$?

One of my favorite ways of expressing the axiom of choice is: “The product of a non-empty family of non-empty sets is non-empty”. Turned into type theory “a non-empty set” becomes a set with an inhabitant, but we should forget about “constructive content” to get “$A$ is non-empty” as $\|A\|$. We can then translate (this version of) AC into type theory as:

(Axiom of Choice) For any set $A$ and family $P(x)$ indexed by $A$, we have $\displaystyle{\left(\prod_{x:A}\|P(x)\|\right)}\rightarrow \|\displaystyle\prod_{x:A}P(x)\|$.

In the statement of AC and in the preceding paragraph I intentionally used the word “set” where you probably expected the word “type”. That’s because AC$\mathstrut_\infty$ (AC for all types) is inconsistent with univalence (after all, the circle is a counterexample). However, it’s consistent to assume the axiom of choice for sets, which we will leave for another day.

# Towards forcing: Non-well-founded models of Foundation

(An earlier version of this had a terribly false statement about transitive models of ZF. It has been fixed.)

My goal with the next few posts is to present one of my favorite forcing arguments:

Let $\mathbb{C}_\omega$ be the $\omega$-product of the Cohen forcing $\mathbb{C}$ with full support. Forcing with $\mathbb{C}_\omega$ collapses the reals.

We need a fair bit of background to get there. Instead of dumping a bunch of unmotivated information about forcing and collapsing cardinals on you just to get this one result, I’m going to take a more circuitous route and see some cool things along the way. Today, we’re talking about models of set theory. In particular:

(Assuming ZF is consistent,) there are non-well-founded models of ZF.

If you’re familiar with the ZF axioms, this may seem strange, since the axiom of foundation seems to say that the universe is well-founded. A bit of model theory will show us what’s going on.

A brief technical point: for all of the posts “Towards forcing”, the metasystem is ZF, so a theorem like “All foos are bar” really means $ZF\models \text{"all foos are bar"}$. This distinction may seem pedantic, but it does become important when discussing models of set theory. Since we’re discussing the model theory of ZF, using ZF as the metasystem brings certain technical problems: we cannot prove in ZF that ZF has any models. There are lots of ways around this; the easy (albeit unsatisfying) way is: Assume ZF is consistent. From here on, I’ll leave this implicit most of the time, so our theorem “all foos are bar” should be read $ZF\models Con(ZF)\Rightarrow \text{"all foos are bar"}$.

If you haven’t studied much logic or set theory, I may be starting to lose you. So let’s back up a bit: what’s a “model of set theory” anyway? First,

Definition. A relational structure is a set $U$ together with a binary relation $R^{V}\subseteq U\times U$.

The idea is that a “set theory” will be a set of rules governing a binary relation; ZF, for example is simply such a set of rules. A model of set theory is a relational structure: a set which represents the universe, together with the relation we consider membership. Of course, there are certain things we expect from sets, certain axioms we expect our membership relation to satisfy.

In order to talk about a “theory”, we need these axioms to be properties of the theory, rather than properties of the model; that is, they need to be first order. A first-order property is one which can be written as a (set of) first order formulas. A formula is first order if it only uses variables, logical symbols ($\wedge$, $\forall$, etc.) and whatever “extra symbols” we’ve added to our language. In this case, our only extra symbol is $R$. Then, given a (first order) formula $\varphi$, we can interpret $\varphi$ in a structure $(V,R^V)$ by replacing a quantifier $\forall x$ with $\forall x\in V$, and replacing the symbol $R$ with the relation $R^V$, to get a statement which the structure either satisfies or doesn’t.

With a first order theory, anything which has meaning “in the theory” can be interpreted in the same way as a formula.. For example, if I can define some set $S$ in ZF, then in any model $M$ of ZF (that is, in any relational structure satisfying all the ZF axioms), there is some “set” (object of $M$) $S^M$, called the interpretation of $S$. A convenient intuition is that “$M$ thinks that $S^M$ is $S$.”

We’ll be very liberal with what is required to call a relational structure a “model of set theory” and only impose the axiom of extensionality:

Definition. A model of set theory is a relational structure $(V,\in^V)$ which satisfies the axiom of extensionality. That is, $\forall x,y (\forall z (z\in^V x\leftrightarrow z\in^V y) \leftrightarrow x=y).$ In English: x is equal to y if and only if $z\in^V x$ precisely when $z\in^V y$.

This relation $\in^V$ doesn’t have to be “real” membership–it can be anything we like, so long as it’s extensional. Here’s an example: Let the universe be $\mathbb{N}$ and the “membership relation” be the usual ordering $<$. This is extensional, since for any $n,m\in\mathbb{N}$ we have either $n or $m, but not both.

And a non-example: Let $V$ be the Cantor tree (the complete binary tree), and for two vertices $u,v$, say $u\in^V v$ iff $v$ is a child of $u$. This is a relational structure, but it's not a model of set theory, since the left and right child of any node have the same "elements".

You’re probably familiar with induction for numbers, and perhaps “structural induction” on, say, the structure of formulas or graphs. Very often you can do induction over a relational structure in much the same way: Given a relational structure $(V,R)$ and a property $P$, we can prove $\forall x\in V, P(x)$ by proving for each $x$ that $(\forall y (yRx\rightarrow P(y))\rightarrow P(x)$. However, we need to impose a condition called “well-foundedness” on $R$ to avoid “infinite loops”.

Definition. A relational structure $(V,R)$ is called well-founded if for every subset $U\subseteq V$, there is an element $s\subseteq U$ such that $xRs \rightarrow x\not\in U$.

If we think of $R$ as being “order-like”, this is just saying that every subset of $V$ has an $R$-least element. The condition of well-foundedness ensures that there is no infinite descent in $V$. But notice, we make critical use of $V$ in the definition of well-foundedness. In fact, well-foundedness is not a first order property! In other words, we can’t say “V is well-founded” as an axiom.

To get around this, we need to come up with a local version of well-foundedness. Instead of taking all subsets of $V$, what if we only take the subsets $\{ y : yRx\}$ for some $x$? This allows us to write

Definition. The Axiom of Foundation is the formula $\forall x (\exists zRx \rightarrow (\exists yRx \forall z (\neg zRy\wedge zRx)))$. Again in English: For every element $x$ in our structure, if $x$ isn’t “empty”, then $x$ contains an element $y$ “disjoint” from $x$.

The example structure $(\mathbb{N},<)$ above satisfies the axiom of foundation. In fact, this structure is well-founded. An example which satisfies the axiom of foundation, but isn’t well-founded is the set $\mathbb{Z}^-$ of negative integers, with $x\in^{\mathbb{Z}^-}y$ iff $y=x+1$ (the predecessor relation). Here, every “set” has exactly one element, which is clearly disjoint from it so foundation is satisfied, but any $x\in\mathbb{Z}^-$ is the successor of some other non-negative integer, so the structure is not well-founded.

From the title of the post, we can stop there, but really, we’re interested in non-well-founded models of ZF, so we have to do a little more work.

A more interesting model of foundation is the structure $(V_\omega,\in)$. If you aren’t familiar with the cumulative hierarchy, $V_\omega$ is the set $\{x : x\in \mathcal{P}^n(\emptyset)\text{ for some }n\in\mathbb{N}\}$, and $\in$ is the real, actual membership relation (in our metasystem), just restricted to $V_\omega$. Besides being a model of all ZFC axioms except infinity, this structure has two very interesting properties:

• $V_\omega$ is a transitive set, meaning that every element $v\in V_\omega$ is a subset of $V_\omega$.
• The relation $\in$ is, well, actually the membership relation.

Such models are called transitive models. Transitive models are important because,

Theorem. (Mostowski Collapse). Let $(U,R)$ be an extensional, well-founded relational structure (i.e., a well-founded model of set theory). Then $(U,R)$ is canonically isomorphic to a unique transitive model. (“canonically isomorphic” means there is a unique isomorphism.)

I won’t go into the proof, but the sketch is: Define $\varphi:U\rightarrow V$ (where $V$ is the universe of sets) recursively by $\varphi(x)= \{\varphi(y) : yRx\}$. Some simple induction (and the axiom of replacement) gives us that the image of $\varphi$ is a transitive set, and the isomorphism follows directly. Uniqueness is proved by induction.

So far, we’ve been talking about models of set theory in general. Now let’s jump all the way to models of ZF: that is, relational structures that satisfy all the ZFC axioms. We can finally prove

Theorem. (If ZF is consistent), then ZF has non-well-founded models.

Proof. Suppose ZF is consistent. If there are no well-founded models then we are done. Otherwise, by Mostowski’s collapse, we may consider only those well-founded models which are transitive. The ordering given by $\in$ on transitive models is well-founded, so we have a minimal transitive model, $M$.

The interpretation of $M$ internal to $M$ must also be a model of ZF. But it cannot be transitive, as $M$ is the minimal transitive model. That is $M^M$ is a non-well-founded model of ZF.

There are several other ways to prove that ZF has non-well-founded models. Most of them are actually more intuitive, but they don’t hit the points I want to hit.

# A bit of type theory: Negation is a proposition

(This was originally posted to reddit, but since the response to it spawned this blog, I figured I’d repost it here. It could use some editing, but c’est la vie…)

So, I’ve wanted to write a post with a nice, simple but non-trivial logic result for a while now, but I’ve only now found a good reddit-friendly topic: In homotopy type theory, the negation of any type is a proposition.

Don’t let that cryptic sentence scare you! The point of the next ~8000 characters is to explain what that means, and why it’s true. First, a (very) rough sketch of type theory (skip below the line if you already know some type theory, or if you are optimistic.)

In type theory, we have terms and types. Terms inhabit types–to start, think of terms as “elements” and types as “sets”, but there are some crucial differences:

• A term always belongs to some type: there is no universe to arbitrarily draw types from, giving a term a priori means giving its type as well. We write a : A to say “the term a has type A.”
• Types are not extensional: rather than being determined by their members, a type is determined by introduction rules which say how to construct their terms, and elimination rules which say how to use their terms. (there are other rules, but let’s not worry about those for now)
• Reasoning is “internal” to the system (more below): The meta-system is actually a rather weak system of judgments, so unlike set theory, we don’t have the full power of logic at our back; instead we build our logic inside of our system.
• Equality is really weird. This is related to the bit about extensionality above, but all you really need to know for now is that equality in type theory is a “weak” notion, which often isn’t strong enough to prove “nice” things. Oh, and also: equality between terms is a type. That is, given two terms x,y:A, we have a type x=y (Or x=y:A, if we need to be clear about which type we’re working in).

Today, we’re interested in function types A->B and dependent function types Π(x:A) B(x). Given two types, we have a type of functions between them. I’ll assume this makes sense. A dependent function does not have an output type, but rather for each x:A a (possibly) different output type B(x). Whereas for a function f:A->B, we have f(x): B for all x:A, for a dependent function d: Π(x:A) B(x), we have d(x) : B(x). Don’t worry if this doesn’t make sense: we’ll only use the interpretation of Π as “for all” (discussed below). Also, importantly: the existence of a type doesn’t mean it is inhabited. So just because we have a type A->B doesn’t mean we have a term f:A->B.

We are also interested in the empty type 0. This has no introduction rules, and an elimination rule which is “given any typeC, there is a function f:0->C. [Actually: it says for any type family C(x), indexed by 0, we have a dependent functionf:Π(x:0) C(x).]

The “judgments” mentioned above take one of two forms: typing judgments (a : A above), and equality judgments (which we will ignore, because this is the “wrong” notion of equality). This means that if we build logic inside of the system, we need a proof to correspond to a judgment p:P, and from that comes an idea: let us treat propositions as types, andproofs as terms. Then a proposition P is just some (particular) type, and proving it means constructing a term (called a “witness” or “proof”) p:P This is called propositions as types: propositions are types, and connectives are type forming operations. For example, A->B is interpreted as “A implies B”, Π(x:A) B(x) is interpreted as “For all x in A, B(x)” and 0 is interpreted as “false”. Under this interpretation, the elimination rule for 0 is simply explosion.

This logic can be somewhat odd: the axiom of choice (naively translated) is a tautology, for example. Also, importantly: excluded middle is inconsistent with univalence, the thing that makes HoTT interesting. The problem seems to be that “proving” (giving a witness of) a generic type P, can contain a lot more information that just “P is inhabited” (notably,which inhabitant). So in HoTT, we do something clever: we define a type to be a proposition when we can prove any two inhabitants are equal. That is: A is a proposition if the type Π(x,y:A) x=y is inhabited. (This type is read “for all x,y in A, x=y”). You can think of this as “there is (at most) one term of type A“, but this isn’t exactly right. Better is “up to homotopy, there is (at most) one term of type A.” Fortunately, the logic of propositions tends to behave much closer to the “expected” logic than the logic of all types.

One last thing before we can make sense of the statement: Often in logic, the negation of a proposition P is defined to be “P implies false”. That is, given P, we can reach an absurdity. This idea translates to type theory as expected: given a type A, we define the negation of A, “¬A” to be the type A->0.
So, let’s repeat the statement: Given any type A, we have ¬A (that is, the type (A->0)) is a proposition (that is, we can prove that all elements of this type are equal.)

Before we prove it, let’s talk about why this is interesting: In a sentence: double negation shows up a lot. :)

Since type theory is naturally constructive, the proposition “not not A” is usually weaker than the proposition “A” (This goes for non-propositions as well). It turns out that very often, instead of proving “A implies B”, we can prove the stronger “not(not(A)) implies B”, which is good because negative types have useful properties , and because it means that very often we can replace a difficult-to-verify claim (A is inhabited) by an easier-to-verify (but classically equivalent) claim (A isn’t not inhabited). This is all related related to double-negation translation which is used to do classical logic in constructive settings. As a consequence of the above result, we get that the double negation of any type is a proposition, so double-negation translation is a “classical propositional truncation” (as compared to “propositional truncation” used in HoTT), which is interesting and probably useful [added to this post: in fact, if we assume the law of excluded middle, we can define the propositional truncation to simply be double negation.] Lastly, the intuition I gave above is that type theory has “too much information” to be a well-behaved logic, and here we see that by translating to classical logic by double negation, we are forgetting information, reinforcing that point. (In fact, simply by negating a type we are forgetting some information.)

Ok, how do we prove it? Besides the notion of proposition, we need one thing particular to homotopy type theory (which is actually reasonably common among non-homotopy type theorists as well): the axiom of function extensionality. Remember I said equality is a weak notion in type theory? Well it turns out that for two functions f,g: A->B even if we can prove that for all x:A, we have f(x)=g(x), we can’t prove that f=g. So we must assume this as an axiom. Translating all of this into type theory, we have:

Function extensionality: Given two types A and B, functions f,g: A->B, there is a term funext: (Π(x:A) f(x)=g(x)) -> f=g. That is, we have this function funext (which has some hidden parameters), which takes a proof that f and g are “pointwise equal”, and turns it into a proof that f=g.

Now, the proof is easy:
Claim: For any type A, we have that ¬A is a proposition.

Proof: We need to show that Π(x,y:¬A) x= y is inhabited. To this end, fix x,y: A -> 0, and we will use function extensionality to construct a term p:x=y. Let z:A. We wish to see that x(z)=y(z) in the type 0, but this holds vacuously (or rather, by elimination on 0, but think “vacuously”). Since z was arbitrary, we have a term h(x,y):Π(z:A) x(z)=y(z). Then funext(h) : x=y. Again, x and y were arbitrary, so we have a term f:Π(x,y:¬A) x=y, defined byf(x,y) := funext(h(x,y)). That is, we have “for all x,y:¬A, x=y”. That is, ¬A is a proposition.

Hurray! Hopefully you find this interesting/insightful. Let me know if you have any questions. Also, I said a few bold-faced lies to simplify things, but hopefully I didn’t say anything harmful.