I’ve posted a couple of things on (homotopy) type theory so far, and both times hinted at giving an actual introduction. Since I can finally feel justified in not doing work for a little bit, I’ll go ahead and do that now. I wanted to cover dependent type theory, but this takes longer than I had thought, so we’ll start with a less ambitious type theory: the simply typed lambda calculus.

**Mathematics as construction**

Type theory arises from an attempt to formalize L.E.J. Brouwer’s notion that mathematics is “an activity of mental construction.” The traditional way to formalize a mathematical discipline is to lay out a logical language which is intended to lay down the statements (true or false) that can be meaningfully expressed; and then posit some axioms which are intended to capture the basic truths of the system. From this, we “do” mathematics by proving certain statements to be true, and others to be false. That is, the foundation is logic, and mathematics is built on top of this foundation.

Brouwer viewed this as an inversion of the actual state of affairs: The notion of “mathematical construction” is basic, while logic is a sort of mathematical activity, and so must be built on top of this. To formalize this intuition, instead of presenting the valid statements and basic truths, we must posit the *basic objects* and *basic constructions*. Only then, if we chose, may we interpret certain objects as being “logical statements”, and certain objects as being “logical proofs”. To present type theory, then, we must determine

- What are the basic objects?
- What are the basic constructions?
- How do we interpret logic?

But before we even begin to answer those questions, we need to think a little more about what “construction” means.

**Mathematical constructions: types and judgment**

Intuitively, a construction should say something like “given an object `x`, we may form an object `c(x)`.” For example: given natural numbers `x` and `y`, we may form their sum, `x+y`. Observe that this statement requires that `x` and `y` are natural numbers; it makes no sense to form the sum of “green” and “whiskey”, or of a group and a natural number. That is, constructions only make sense on objects of a given *type*. An object, by virtue of existing, has a type attached. We call our objects “terms”, and we write “the term `x` has type `A`” as `x:A` (we will also express this in English as “`x` is an inhabitant of `A`“). I said “by virtue of existing”, meaning that it doesn’t make sense to present an object *without* it having a type–as opposed to (ZF) set theory, where all objects are taken from “the universe”, we must say *what* an object is when we introduce it.

This expression, `x:A`, is a *typing judgment*. It is *not* a proposition: after all, propositions are part of logic, and we haven’t gotten that far yet. Similarly, we do not “prove” a judgment, we *infer* it. The valid inferences are given by *inference rules*, which always take the form

*Given judgments `J1, …, Jn`, we may infer the judgment `J`.*

So far, we have seen one sort of judgment: `x:A`. However, before we can even make such a judgment, we must know that `A` *is a type*. We express this judgment as `A:Type`. By analogy with typing judgments, this seems to be saying that `Type` is a type. For now, this analogy is harmless.

There is another form of judgment which is somewhat more subtle. Introducing the sum `x+y` does us no good unless we can *compute* with it. That is, unless we can say what the term `x+y` *means*. For this, we introduce *equality judgments*, which give “judgmental” or “definitional” equality. We write `x≡y` to mean “`x` is judgmentally equal to `y`.” Judgmental equality is supposed to capture the idea that two terms are different “words” with the same *meaning*: one term may be substituted for another without change of meaning. For example, “President of the United States” and “Head of State for the US” *mean* the same thing. However, even though Barack Obama *is* the President of the United States, we cannot freely substitute “Barack Obama” for “the President of the United States”, as demonstrated by the nonsensical “George Bush was Barack Obama before the President of the United States.”

Since `x` and `y` can only have the same meaning when they have the same type, `x≡y` only makes sense when `x:A` and `y:A` (for some type `A`). When we wish to be clear about the type in question, we will write `x≡y:A`. By the same analogy as above, we also have `A≡B:Type`.

That is, we have

- Typing judgments `x:A` and `A:Type`
- Equality judgments `x≡y:A` and `A≡B:Type`

**Presenting types**

Intuitively, a type is the collection of all objects for which certain constructions make sense. Since we need to introduce the “basic” objects and “basic” constructions, this means for any type we need to say

- what “basic” (or *canonical*) terms inhabit that type (how to *introduce* inhabitants of that type);
- what “basic” constructions we may perform on terms of that type (how to *eliminate* inhabitants of that type);
- how the basic constructions “behave” on the basic terms (how to *compute* with the basic constructions).

These are given by 3 sorts of rules, called *introduction*, *elimination* and *computation* rules. The elimination and computation rules will capture the idea that a type is “completely determined by its canonical inhabitants”. That is, it suffices to define constructions by defining them on canonical terms. For reasons that will become clearer when I discuss dependent types, we will need a fourth sort of rule called a *formation* rule, which says under what conditions we can “form” a given type. Usually this is “the most obvious condition possible”.

A type is *defined by* these 4 sorts of rules; unlike sets, which are defined as the totality of their members, a type is defined by *when* it can be formed, *what* its canonical elements are, *what* basic constructions it admits and *how* these basic constructions behave.

*The first example: functions*

Since we can present the rest of our types by reference to function types, we start with these. Intuitively, function types represent the type of “valid constructions” which take terms of a given type, and return terms of another. We have

*Function formation*: Given types `A` and `B` (`A:Type` and `B:Type`), we may form the type of functions from `A` to `B`, written `A→B`.

*Function introduction*: What does it mean to have a “basic” function from `A` to `B`? Since functions represent constructions, we should have that whenever we have a way to construct a `B` from an *arbitrary* `A`, we may construct a function. Somewhat more rigorously: whenever it is the case that given `x:A`, we may construct `b:B`, then there is a function `λx.b:A→B`. This is called a *lambda abstraction* and is usually read out loud as “lambda x dot b”. Intuitively, this is supposed to be the function which returns `b` when applied to input `x`. There is a subtlety here: `b` is a term, but it may contain `x` as a “free” variable. For example, `x+1` is a term of type `N` (the type of natural numbers), but in order to actually determine what this term means, we must first know what `x` means. So, `λx.x+1` is a function which takes some `x:N` and returns `x+1` (which also has type `N`).

*Function elimination*: Given a function `f:A→B` and a term `x:A`, we may *apply* `f` to `x`, to get `f(x):B`. For example, our function `λx.x+1` may be applied to the number `0`, which should return `1`. This fact is captured by

*Function computation*: If, given an arbitrary `x:A`, we may construct `b:B`, and we have a term `a:A`, then `(λx.b)(a)` means “`b`, with `x` replaced with `a`.” We write this `(λx.b)(a)≡b[x/a]`. Using our example function, we have `(λx.x+1)(0)≡0+1`. When we actually have `0`, `+` and `1` defined, we will see that this is in fact *judgmentally* equal to `1`.
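As a quick illustration (Haskell is my choice of vehicle here; the names are mine, and Haskell’s `Integer` stands in for `N`), lambda abstraction, application, and the computation rule look like:

```haskell
-- Lambda abstraction: from the open term x + 1 we form the
-- closed function \x -> x + 1, the analogue of λx.x+1 : N → N.
successor :: Integer -> Integer
successor = \x -> x + 1

-- Elimination is application; the computation rule says that
-- (\x -> x + 1) 0 reduces to 0 + 1, i.e. to 1.
applied :: Integer
applied = successor 0
```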

A brief note: This might all look technical and scary, but I’ve really said nothing interesting so far: I’ve only said that if we have a rule for “constructing” an object (of a certain type) whenever we have an object (of some other type), then we have a mathematical object which “captures” this construction. Moreover, this object behaves exactly as we should expect it to.

*Functions of multiple arguments*

Crucially, the definition of the function type only allows a *single* input. To define a function, such as `+`, which takes multiple inputs, we can abstract over multiple arguments. E.g., we have the (formally unsatisfying, but intuitively clear) function `λx.λy.x+y`. What is the type of this function? Well, it takes `x:N`, and returns `λy.x+y`. This in turn takes `y:N` and returns `x+y`. Since `x+y:N`, we have `λy.x+y:N→N`, and so finally `λx.λy.x+y:N→(N→N)`. That is, we have a function which takes an `N` and returns a function `N→N`. This is often called a *Curried* function (after Haskell Curry). Since we write functions of multiple arguments a lot, we will omit brackets, and assume `→` associates “to the right”. So `N→N→N` means a function which takes an `N` and returns a function `N→N`, while `(N→N)→N` is a function which *takes a function* `N→N` and returns an `N`.
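In Haskell (a sketch; `Integer` again stands in for `N`), currying is the default, so `λx.λy.x+y` and the right-associativity of `→` transcribe directly:

```haskell
-- λx.λy.x+y : N → (N → N); the -> in the type associates to the right.
add :: Integer -> Integer -> Integer
add = \x -> \y -> x + y

-- Partial application: add 1 is itself a function N → N.
addOne :: Integer -> Integer
addOne = add 1

-- Contrast: (N → N) → N takes a *function* and returns a number.
applyToZero :: (Integer -> Integer) -> Integer
applyToZero f = f 0
```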

*The rest of the simple types*

Which types are included in “the” simply typed lambda calculus depends on the purpose–some authors just have function types (and some base type), others add sums, or products, or `N`. We’ll stick with what we need to encode propositional logic: functions, sums, products, unit and empty.

*The unit type* is supposed to be a “one element type”, which we may as well call `1`. We may always *form* `1`.

We *introduce* a unique canonical term, let’s say `*:1`.

What does it mean to *eliminate* a term of type `1`? The elimination rule is supposed to say how to define constructions “out of” `1`. Since a *construction* “out of” `1` is the same thing as a *function* out of `1`, we need to know how to define such a function. Intuitively, this should just select a term of some type `A`. That is,

*unit elimination*: Given a type `A:Type` and a term `a:A`, we have a function `a*:1→A` such that

*unit computation*: `a*(*) ≡ a`.
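In Haskell, the unit type is written `()`, and its unique canonical inhabitant is also written `()`; the eliminator `a*` might be sketched as follows (the name `unitElim` is mine):

```haskell
-- unit elimination: given a : A, build a* : 1 → A.
unitElim :: a -> (() -> a)
unitElim a = \() -> a
-- unit computation: unitElim a () evaluates to a.
```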

*The empty type* is supposed to be a type which we can never construct an inhabitant of. We will call this type `0` (and as with `1`, we can always form it). Since we do not want to have any inhabitants, there is no introduction rule.

What about the elimination rule? Just as in set theory, we want to have a unique “empty” function from `0` to any other type. That is: given a type `A`, there is a function `!:0→A`. And since there is no introduction rule for `0`, we have no computation rule: there is nothing to compute.
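Haskell can sketch this with an empty data declaration (the standard library offers the same thing as `Data.Void`’s `Void` and `absurd`; the names below are mine, and the empty case expression needs the `EmptyCase` extension on older GHCs):

```haskell
{-# LANGUAGE EmptyCase #-}

-- 0: a type with no introduction rule, hence no constructors.
data Empty

-- ! : 0 → A. There are no cases to write, which is exactly why
-- there is no computation rule.
fromEmpty :: Empty -> a
fromEmpty e = case e of {}
```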

*Product types* are supposed to capture the Cartesian product of two types. So, the product `A×B` should have terms `(a,b)` with `a:A` and `b:B`. This gives us the formation and introduction rules:

Given `A:Type` and `B:Type`, we may form the product `A×B:Type`; given `a:A` and `b:B`, we introduce `(a,b):A×B`.

What about elimination? The obvious choice is the projections, but we can do a little better: a construction “out of” `A×B` should correspond to a function `A×B→C` for some type `C`; that is, a function of two arguments. Recalling curried functions, we should then have a function `A→B→C`; more explicitly:

*Product elimination:* Given a type `C:Type` and a function `f:A→B→C`, there is a function `f':A×B→C` which behaves like `f` “on pairs”. That is, we have the computation rule `f'((a,b)) ≡ f(a)(b)`. Then we can *define* the projections as `p1 ≡ (λx.λy.x)'` (that is, the function defined by elimination on `λx.λy.x`) and `p2 ≡ (λx.λy.y)'`. Checking that this works:

`p1((a,b)) ≡ (λx.λy.x)'((a,b)) ≡ (λx.λy.x)(a)(b) ≡ (λy.a)(b) ≡ a`

The first equality is the definition of `p1`, the second is by the computation rule for products, and the third and fourth are by the computation rule for functions. (Notice that there’s no replacement to do in the last one, since `y` doesn’t occur in the expression `a`.)
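The elimination rule is Haskell’s `uncurry`, and the projections fall out exactly as above (a sketch; the names are mine):

```haskell
-- Product elimination: from f : A → B → C, build f' : A×B → C.
prodElim :: (a -> b -> c) -> ((a, b) -> c)
prodElim f (a, b) = f a b

-- The projections, defined by elimination on λx.λy.x and λx.λy.y.
p1 :: (a, b) -> a
p1 = prodElim (\x -> \_y -> x)

p2 :: (a, b) -> b
p2 = prodElim (\_x -> \y -> y)
```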

Finally, we want *sum types*, which we write `A+B` (and we can form this type under the same conditions as product and function types). These correspond to disjoint unions in set theory. Why not unions? Because we cannot arbitrarily throw objects into new types (as we can in set theory), so we need to “tag” our terms with their “real” home. That is, given types `A:Type` and `B:Type`, and a term `a:A`, we have a term `inl a:A+B`, and likewise, given `b:B`, we have `inr b:A+B` (`inl` is read “in left” and `inr` is read “in right”). Notice that we have *two* introduction rules.

To define a function `A+B→C`, we need to know what to do with the “left” terms and what to do with the “right” terms. In other words, we need two constructions `f:A→C` and `g:B→C`. This allows us to construct a function `f+g:A+B→C`, with the computation rules (one for each introduction rule!) `(f+g)(inl a) ≡ f(a)` and `(f+g)(inr b) ≡ g(b)`.
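In Haskell, `A+B` is `Either a b`, with `Left` and `Right` playing the roles of `inl` and `inr`; the eliminator `f+g` is the standard `either` function, sketched here under my own name `sumElim`:

```haskell
-- Sum elimination: one clause per introduction rule.
sumElim :: (a -> c) -> (b -> c) -> (Either a b -> c)
sumElim f _ (Left a)  = f a
sumElim _ g (Right b) = g b
```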

**Encoding logic**

To wrap up this (already long) post, let’s look at how to encode logic. According to our Brouwerian intuition, “proving a proposition” should correspond to “constructing an object”–logical proof is a part of mathematical activity, and mathematical activity is all about constructing things. It seems natural to think that all proofs of a given proposition have the same “type”–the type of proofs of that proposition–so why not *encode propositions as types*, and say that a “proof” of that proposition is a term of the corresponding type?

We start with the two obvious propositions: “the” true proposition, and “the” false proposition. Since truth corresponds to being inhabited, we have an obvious candidate for true: `1`. And since false things should never be provable, the false proposition should have no inhabitants: let’s take `0`.

Implication: “A implies B” means *if* `A` is true, *then* `B` is true. That is, from a proof of `A`, we can construct a proof of `B`. This is exactly what `A→B` describes.

Conjunction: to show that “`A` and `B` are true” we need to know that `A` is true and that `B` is true. In other words, to construct a proof of “`A` and `B`“, we need a proof of `A` and a proof of `B`: this is exactly `A×B`.

Disjunction: to prove “`A` or `B`” we need either a proof of `A` or a proof of `B`. As you can hopefully guess by now, this is exactly `A+B`.

Finally, what about negation? The usual way to show something is false is to show that it leads to absurd consequences. That is, `¬A` is true when, from a proof of `A`, we can conclude something false. I.e., `¬A` should correspond to `A→0`.

So, summing it up:

```
True: 1
False: 0
A→B: A→B
A∧B: A×B
A∨B: A+B
¬A: A→0
```

If you compare the introduction and elimination rules for types with the corresponding introduction and elimination rules in Natural Deduction, you will see that they correspond *exactly* (except for product elimination), with `0`-elimination being the principle of explosion. Moreover, the usual elimination rules for conjunction are exactly the projections, and we can “prove” the elimination rule for products in Natural Deduction (that is, we have a derivation in ND of `A∧B→C` with open assumption `A→B→C`).
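To make the correspondence concrete, here is the table transcribed into Haskell (with the caveat that Haskell’s bottom technically inhabits every type, so these are proofs only “up to nontermination”; all names are mine):

```haskell
-- False is 0: a type with no constructors.
data Empty

-- ¬A is A → 0.
type Not a = a -> Empty

-- Modus ponens, A → (A → B) → B: implication elimination is application.
modusPonens :: a -> (a -> b) -> b
modusPonens a f = f a

-- Conjunction elimination, A ∧ B → A: the first projection.
conjElim1 :: (a, b) -> a
conjElim1 (a, _b) = a

-- Disjunction introduction, A → A ∨ B.
disjIntro :: a -> Either a b
disjIntro = Left

-- Double negation introduction, A → ¬¬A (constructively valid; the
-- converse, ¬¬A → A, is what intuitionistic logic rejects).
doubleNegIntro :: a -> Not (Not a)
doubleNegIntro a k = k a
```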

One thing we’re missing is double negation elimination: that is, we have *intuitionistic* or *constructive* propositional logic. For Brouwer, this was a *Good Thing*: proving that “`A` is true” by showing that `A` is not false doesn’t give us an *actual* witness that `A` is true. For example, the usual presentation of Euclid’s proof of the infinitude of primes doesn’t (necessarily) give us an *actual* prime; it just tells us that there must be one *somewhere*. (For what it’s worth, the theorem *is* constructively valid, you just need to be more careful to present the proof constructively.)

There are other reasons that this is a good thing (and of course, reasons it’s a bad thing), but we’ll leave those for another day.

What exactly is a negation type in the context of something like Haskell? How would you encode/use something like this: “¬A: A→0”?

In Haskell, negation doesn’t exactly exist, since Haskell doesn’t have an empty type (bottom inhabits every type).

But, in an earlier part, I mentioned double negation, `(A→0)→0`. An element of type `(A→K)→K` is a continuation with output in `K`, so an element of the double negation of a type is somehow a continuation with no output.

Regardless, I’m interested in type theory more as a variety of logic than as a programming language, so negation is probably more important to me than to a Haskell programmer.
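To make that continuation reading concrete (a sketch, with names of my own choosing): replace `0` with an arbitrary answer type `r`, and double negation introduction becomes the operation that suspends a value until it is handed a continuation:

```haskell
-- (a -> r) -> r: a computation producing an a, expressed by saying
-- what it does with an arbitrary continuation a -> r.
type Suspended r a = (a -> r) -> r

-- Double negation introduction, read computationally: feed the
-- value to the continuation.
suspend :: a -> Suspended r a
suspend a k = k a
```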