Full Schemas and Categoricity
What I intend to do here is introduce the notion of a schema from scratch, complete with a formalized logic, and prove that a suitable version of arithmetic is in a suitable sense categorical. Today will be the first time that I have presented such a result by itself, instead of along with allied results about other logics. I have tried to restrict myself to the case at hand, but if some unnecessary generality seems to have crept in, it probably has: I've failed to expunge it in adapting earlier more general accounts.
Full Schemas
J. H. Harris—and, following him, McGee—says that the
rules of logic are open ended—they apply not just in the present
language but in any extension of it, whether that extension has been
envisioned or not.
We do in fact take our logical rules to be open
ended: Consider, for example, the rule %$\phi\land\psi\vdash\phi $%
or, for that matter, its informal counterpart in natural language. We
do not stop to reëvaluate the rule when moving to an expanded
vocabulary or even when introducing modalities—all the new formulas
are automatically to be allowed into the rule. As McGee puts it for
the case of
reductio ad absurdum:
| We do not accept reductio ad absurdum because we have surveyed the forms of expression found in English and found that its expressive power is circumscribed in such a way as to validate the rule. |
Thus, for example, the rule %$\phi\land\psi\vdash\phi $% of a
formalized logic in a language
%${\mathcal L}$% is not, if the rule is to reflect our ordinary usage,
| if %$ \phi\land\psi$% is a formula of %${\mathcal L}$%, then from %$\phi\land\psi$% infer %$\phi$%, |
but rather,
| if %$ \phi$% and %$ \psi$% are any formulas whatsoever, then from %$ \phi\land\psi$% infer %$\phi$%. |
The notion of 'any formula whatsoever' is, of course, hopelessly
vague. I see no reason to provide a precise account here, and indeed
some reason not to: since our formalism is going proxy for natural
language, we should surely avoid unnecessary commitment to a
particular semantics. That is all very well, but how can we live with
rules as imprecise as the one just suggested? The rule
expresses a very precise idea: It is used to express a commitment that
however we expand or modify our language in the future, we shall
always continue to uphold certain principles, in this case, the
conjunction elimination principle. Even imprecise rules have clear
cases of application, and we shall only need the rules of logic and
mathematics in clear cases in this work. Given that we shall only be
interested in clear cases, we have no present reason to worry about
the ambiguity of the formulation.
The "open ended" rules of logic are clearly in a certain sense very
general, but what is the nature of that generality? We surely do not,
if at all possible, wish to understand %\[\phi\land\psi\vdash\phi\]%
(or its natural language counterpart)
along the lines of
%\[(\forall \phi )(\forall \psi )(\phi\land\psi\vdash\phi ).\]%
It is not even clear that the universally quantified "formula" is
well formed: the turnstyle '%$\vdash$%' is not a part of the language,
but a symbol that is used to specify a rule—from the antecedent,
infer the consequent.
If the formula is not well formed, it is surely
not an appropriate way in which to understand the relevant
generality. But even if we brush such niceties aside, expressing the
generality of the open-ended rules of logic using quantification will
still be deeply problematic. To use such a formalism would be to
employ the formalism of logic in introducing that very formalism—an
undesirable circularity that can be avoided. One might even consider
it to be worse than a circularity, since it requires the comparatively
sophisticated notion of quantification even to introduce sentential
logic.
To make the circularity more stark, consider how we would apply the
rule universal universal specification:
%$^{\text{footnote}}$% %$^{\text{footnote}}$%
%\[(\forall \phi )(\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau ))\qquad{\text{UUS}}\]% to show that, say,
%\[(\forall u)P(u)\vdash P(c)\]%
is a valid inference. We would first apply UUS to UUS (circularity!) with
%$\phi = (\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau )),$%
%$x=\phi $%, and %$\tau =P(x)$% to obtain
%\[(\forall \phi )(\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau ))\vdash(\forall x)(\forall \tau )((\forall x)P(x) \vdash P (\tau )).\]%
We would then apply MP (Actually, meta--modus ponens,
since '%$\vdash$%' is not in the language. But we have already agreed
to put such considerations aside.) to UUS and the formula immediately
above to yield
%\[(\forall x)(\forall \tau )\,((\forall x)P \vdash P (\tau )),\]%
knocking off the first quantifier (%$(\forall \phi )$%), and then
apply UUS twice more. We would have to apply UUS to use the rule
UUS. The point is entirely parallel to one made by Quine in "Truth by
Convention." There is
nothing inconsistent about describing what we mean by universal
specification in this circular way, but it eliminates the possibility
of telling any straightforward story about how quantification could be
introduced or learned, and it ignores the existence of a
pre-quantificational mechanism that has generally been thought to
exist and to play a genuine role in our reasoning: the schema.
The schema is another, more primitive, form of generality that will do the
job of introducing logical rules without any attendant circularity: we
can take %$\phi\land\psi\vdash\phi $% and the other logical rules to
be schemas used to declare that any substitution instance is valid. In
that case, %$\phi $% and %$\psi $% are schematic variables, and to say
that the schema is open ended is to declare that the variables are
what I have elsewhere called full
schematic variables: 'full' is added to indicate that what counts as
an acceptable substitution instance is open ended and automatically
expands as the language in use expands.
We may use the following example to see how a schema differs in
conception from a universally quantified formula, even an implicitly
universally quantified formula. The sentence %$(\forall x)x=x$% is a
truth bearer, and it is indeed true. In contrast, the schema %$x=x$%,
with %$x$% a schematic variable is not a truth bearer at all, though
it does deserve some sort of honorific, since it can be used to
declare that all of its suitable instances are true. The free
schematic variable in a schematic formula, like an ordinary free
variable, prevents the formula from being a truth bearer. The
schematic variable, moreover, cannot be regarded as a free referential
variable: it does not take on values and has no domain. To assert that
a formula involving a schematic variable is an axiom schema does not
commit us to any true axioms involving a schematic—or any other kind
of—free variable: Rather, it commits us to closed instances of the
schema as axioms.
The idea that schemas are involved in the basis of logic has been
endorsed by Quine and schemas are arguably employed by Russell—his typical
ambiguity—and
Hilbert. Full schemas are used
by Feferman (who introduced them as early as 1977,
in discussing Gödel-type independence phenomena, by Tyler Burge
in his theory of truth, by McGee in his theory of "How
we learn mathematical language," and by Field, Parsons, and I—we have all taken mathematical induction to be a
full schematic principle. In each case, full schemas are used to
codify the idea that certain principles are open ended—that we are
committed to upholding them even through additions to and
modifications of our language. The same applies to the axiom of
replacement of set theory, as I shall argue below.
The case against assimilating schematic generality to that of
referential or substitutional quantification is strongest in the case
of a full schema: Full schemas, which are the ones of chief interest
to us here, are explicitly required to allow new instances as our
language expands, and so the very idea of a full schema is
incompatible with the notion of a fixed domain or substitution class.
Quantificational variables are only auxiliaries for quantification:
formulas with free variables are not truth bearers. One might
therefore wish to avoid the notion that anything is a consequence of a
formula with a free variable. One can, for example, infer %$0=0$% from
%$(\forall x)x=x$% as well as from %$x=x$%, and so perhaps one might
wish to argue that the schematic version of %$x=x$% is really just an
alternative notation for the more familiar universally quantified
sentence, perhaps construing the hypothesized implicit quantifier as
referential, perhaps as substitutional. That cannot be made to work
for several reasons.
A schematic variable cannot be viewed as a free variable, whether
substitutional or referential, one that could be bound by a
quantifier, since schematic and quantifiable variables have different
inferential roles: If %$x$% is a schematic variable, one can infer
%$0=0$% from %$x=x$%, but that is not so if %$x$% is a
quantifiable variable—in that case in some proof systems the formula
cannot appear in a proof at all, while in those that do allow it, the
inference is valid only if %$x$% does not occur free in any of the
premises of the argument. No such proviso is required in the case of a
schematic variable.
Schematic variables are more closely allied to substitutional than to
referential quantification: When we give a schema that is not full, we
typically restrict the schematic variable to all terms or formulas in
a language, not to a range of all objects or properties in a
domain. Intuitively, however, no schema has a domain of application
like the substitution class of a substitutional quantifier—a schema
isn't used to make a general assertion about a class of possible
instances. Instead, a schema provides a general method of obtaining
particular assertions about suitable instances without any need in
advance of a clear notion of all the suitable instances. Thus, for
example, one can infer %$(\forall n)\phi (n)$% from %$\phi (0)$% and
%$(\forall n)(\phi (n)\rightarrow \phi (Sn))$%, but not, if %$n$% is a
schematic letter, from %$\phi (0)$% and %$\phi (n)\rightarrow \phi (Sn)$%,
since the schema, rather than making an assertion about all
numbers, which is what is required to reach the conclusion, provides a
mechanism for making assertions about particular numbers.
To summarize, schematic variables are neither referential nor
substitutional variables. Schematic generality has important preformal
differences from the generality of substitutional universal
quantification. Finally no universally quantified sentences can have
the same consequences as does a suitable corresponding full schema.
Let us now look briefly at how schematic variables can be restricted
to terms that refer to objects in a particular domain. Just as in the
case of referential variables, we usually leave such restrictions
implicit. If a referential quantification over %$x$%,
%$(\forall x)\phi$% or %$(\exists x)\phi$%, is restricted to a domain %$\bf D$%,
we can make that explicit by using %$(\forall x)(D(x)\rightarrow\phi)$% and
%$(\exists x)(D(x)\land\phi)$% in place of the original formulations,
where I have used '%$D$%' as a predicate symbol with extension
%$\bf D$%. Similarly, a schema %$\phi$% may include a schematic letter
%$s$% that we intend to restrict to terms that denote objects in
%$\bf D$%. We can then write %$D(s)\rightarrow\phi$% to make the restriction
explicit. When a referential variable %$x$% is restricted to a domain
%$\bf D$% and a schematic variable %$s$% is restricted to terms that
denote objects in %$\bf D$%, we obtain
%\[(\exists x)x=s,\]%
or, when we make the restrictions explicit,
%\[D(s)\rightarrow(\exists x)(D(x)\land x=s).\]%
Full schemas codify the open-ended principles that will make it possible
to provide categorical axiomatizations of the natural numbers. We therefore
discuss the formalism of axiom schemas in sufficient detail to provide
the background for a categoricity proof. We shall only need to employ
schemas in one or two first-order schematic variables and two or three
second-order schematic variables, and we reserve the letters %$s$%,
%$t$%, %$P$%, %$Q$% and %$R$% for the purpose. We allow them to have
subscripts so that in principle we have infinitely many schematic
variables of each order. In our applications the schematic variables
will have various numbers of substitutable places. We leave it to the
reader to determine the number of substitutable places of a schematic
variable by counting the number of terms enclosed in parentheses
immediately following it. In this work we shall always allow extra
places for parameters. For a fully general treatment, we would have to
specify infinitely many schematic variables of each order and number
of places.
For any term or formula %$ \phi $%, variables %$x_{1},\dots ,x_{n}$%,
and terms %$t_{1},\dots ,t_{n}$%, let %$\phi \left[\!\!\left[\!
\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$% denote the result of simultaneously
replacing every free occurrence of every %$x_i$%, %$i=1,\dots ,n$%, in
%$\phi$% by %$t_i$% and renaming bound variables as necessary to
prevent collisions with the (free) variables of the terms
%$t_{1},\dots ,t_{n}$%. Given a schematic formula %$\Phi ({\bf P})$%,
%$n$% variables %$x_{1},\dots ,x_{n}$%, and a formula %$\phi$%, where
%$\bf P$% is a first- or second-order schematic variable that has
%$n$% substitutable places, we define %$\Phi\left[\!\!\left[\!
\phi \!\right]\!\!\right]}$% or, usually, but less
precisely, %$\Phi(\hat{x}_{1}\cdots\hat{x}_{n}\phi )$%, to be the
formula that results from %$\Phi ({\bf P})$% when, for every list of %$n$%
terms %$t_{1},\dots ,t_{n}$%, every occurrence of %${\bf P}(t_{1},\dots ,t_{n})$% in %$\Phi ({\bf P})$% is replaced by
%$\phi \left[\!\!\left[\!
\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$%, changing bound variable in %$ \Phi $%
as necessary to prevent collisions with the free variables of
%$\phi\left[\!\!\left[\!
\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$% that are not among
the (free) variables of the terms
%$t_{1},\dots ,t_{n}$%. (For example, %$ (\forall x) (\exists y)P(x)\left[\!\!\left[\!
\!\right]\!\!\right] = (\forall x) (\exists v)(\exists u) R(u,x,y)$%, where %$u$% replaced %$x$% to avoid a
collision in %$ (\exists x) R(x,z,y)\left[\!\!\left[\!
\!\right]\!\!\right]$% and %$v$% replaced
%$y$% to avoid a collision in
%$ (\forall x) (\exists y)P(x)\left[\!\!\left[\!
\!\right]\!\!\right]$%.) The resulting formula may
have free variables. We take them to be implicitly universally
quantified.
For any language %$\mathcal L$%, we introduce the %$\mathcal L$%
substitution rule: from any formula %$\Phi ({\bf P})$% in which
%$\bf P$% has %$n$% substitutable places, infer
%$\Phi (\hat{x}_{1}\cdots\hat{x}_{n}\phi )$%, where %${x_{1},\dots ,x_{n}}$%
are any %$n$% variables and %$\phi$% is any term of %$\mathcal{L} $%
if %$\bf P$% is a first-order schematic variable and any formula of
%$\mathcal L$% if %$\bf P$% is a second-order schematic variable.
We now associate a full schematic theory with any base theory
%$T_{0}$% in a language %$\mathcal L$%. The base theory %$T_{0}$% may
include some formulas with schematic variables in them. We do not
regard a theory as a set of sentences in a language %$\mathcal L$% as
is usual, but instead as a function from languages
%${\mathcal L}^{+}\supset{\mathcal L}$% to sets of axioms and rules. If the
schematic variable %$\bf P$%, whether of first or second order, has
%$n$% substitutable places, let %$\overline{x}$% stand for some fixed
%$n$%-tuple of distinct variables. The
full schematic theory associated with
%$T_0$%
over %$\mathcal L$%, %$T^{(+)}$%,
associates with each %${\mathcal L}^{+}\supset{\mathcal L}$% the
theory with axioms %$T_{0}$% and with the
%${\mathcal L}^{+}$%-substitution rule.
When %$T_{0}$% is empty, we refer to the corresponding theory as
full schematic logic.
Let us define what it is for a structure to be a model of a full
schematic theory in line with the redefinition of the
notion of a theory.
Let %$\phi $% be a full schematic formula in a language %${\mathcal L}$%. An
assignment %$s$%
satisfies the formula %$ \phi $%
with respect to full schematic logic in a structure
%$ {\mathfrak A} $%} if %$ {\mathfrak A} $% is a structure
for a language including %${\mathcal L}$%, %$s$% is an assignment for
the structure %$ {\mathfrak A} $%, and if %$s$% satisfies in the
familiar sense every schematic-variable-free deductive consequence
with respect to full schematic
logic of %$\phi $% in any language
%${\mathcal L}^{+}\supset{\mathcal L}$% in any expansion of
%$ {\mathfrak A}$% to a structure for a
language including %${\mathcal L}^{+}$%.
Let %$T $% be a theory in a language %${\mathcal L}$%. A structure
%${\mathfrak A} $% is a
model of the full schematic theory %$T$% if every assignment for
%${\mathfrak A}$% satisfies every formula in %$T$% with respect to
full schematic logic.
To say that
any expansion has a certain property is not the
universally quantified statement that
every expansion has the
property—that would require a notion of the class of all
possible expansions, a notion to which I am not entitled. It is rather
a full schematic metalinguistic statement that will let us infer,
whenever we are given an expansion, that it has the property. That is
a perfectly clear and formally precise account of a constraint on how
we may expand our language. It can be used to represent our
metaphysical commitments concerning models of a theory, that is for
describing what properties we are committed to any model maintaining
even as we revise our language.
ASIDE: The models of vagueness I introduced in the vagueness seminar
yesterday were for a fixed language, but it would be routine to modify
them to allow different languages at different leaves. In such a model
of vagueness, each full scheme would have each linguistic object of
suitable kind (that is, of the right order and number and obeying
whatever restrictions are imposed by the schema) of the total language
as a potential substituend at each node. Of course, that would include
different items at different nodes. I do not pursue the details
further because typical cases of adding new items to the language that
are of interest for our purposes are not cases in which it is vague
whether a given item is part of the language, but cases of genuine
change—additions to the language that are new and represent an
expansion of our linguistic resources. Such innovation is orthogonal
to the issue of vagueness.
The present definition is the correct one in the present setting. When
a theory is defined, not for a single language, but for a single
language and all of its expansions, it only seems natural to require
not only that a model of the theory in the original language satisfy
the theory, but also that expansions of the model continue to satisfy
the theory in the corresponding expanded languages.
Peano Arithmetic with Primitive Recursion
Though primitive recursion is not a part of the axiomatic basis of
Peano arithmetic, it is central to another standard arithmetic theory,
namely, primitive recursive arithmetic. The first theory I wish to
consider is a kind of amalgam of Peano arithmetic and primitive
recursive arithmetic. The fact that primitive recursive arithmetic is
a standard codification of what many have considered to be the most
fundamental and basic properties of the natural numbers serves nicely
to make the point that taking primitive recursion to be a fundamental
part of the basis of arithmetic is not an
ad hoc trick in defense of
categoricity, but simply a recognition of the importance in that
context of a long and deeply held view.
The language of our schematic theory PAPR%$^{(+)}$% (full schematic
Peano arithmetic with
primitive recursion), which will be a first-order theory with two
schemas, will be %$\{0,S\}$% plus a functor %$\mathbb P$% (where the
symbol is chosen arbitrarily) that takes two formulas of our language
plus four variables to a relation symbol of our language—we give a
more complete specification below. Since we take the functor itself
to be part of the language, relation symbols formed using it will be
relation symbols of the language and, if we expand the language with
new symbols then new relation symbols formed using them in
conjunction with the functor will also become relation symbols of the
augmented language.
The axioms will be the successor axiom of Peano arithmetic, the
induction schema, and an additional schema, the
primitive recursion schema:
%\[({\mathbb P}(P,Q)(0,w)\leftrightarrow P(w))\land ({\mathbb P}(P,Q)(Sx,z)\leftrightarrow (\exists y) ({\mathbb P}(P,Q)(x,y)\land Q(x,y,z))).\]%
I must explain how to substitute %$\hat{w}\phi$% for %$P$% and
%$\hat{x}\hat{y}\hat{z}\psi $% for %$Q$% in
%${\mathbb P}(P,Q)(t_{1},t_{2})$%, where %$w$%, %$x$%, %$y$%, and %$z$% are any
variables, %$\phi$% and %$\psi$% are any formulas, and %$t_1$% and
%$t_2$% are any terms. For any formula %$\theta$%, let
%$\Fv (\theta)$% be the set of free first-order variables in it. The
relation symbol
%${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi)$%
has %$n+2$% places, where %$n$% is the number of variables in
%\[p=(\Fv (\phi )-\{ w\})\cup (\Fv (\psi )-\{ x,y,z\} ).\]%
The atomic formula that results from the indicated substitution into
%${\mathbb P}(P,Q)(t_{1},t_{2})$% is %${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi )(t_{1},t_{2},\overline{x})$%, where
%$\overline{x}$% is the variables (parameters) in %$p$% listed in
alphabetical order. We interpret the %$\mathcal L$%-substitution
rule to apply to substitutions in the newly
extended sense and extend it to apply to both %$P$% and %$Q$%, and we
similarly extend the definitions that employ the rule.
Since we are taking the functor %$\mathbb P$% to be in the base
language, we shall be able to prove theorems about relation symbols
constructed from %$\mathbb P$% by induction. In particular, we shall
be able to prove that if there is a unique %$w$% such that %$\phi (w)$% and for every %$x$% and %$y$% there is a unique %$z$% such that
%$\psi (x,y,z)$%, then %${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi )(x,y,z)$% is a well-defined function—I
omit showing the parameters here and frequently below. Thus, we can
define addition by
%\[x+y=z&\leftrightarrow{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(y,z,x)\]%
and multiplication by
%\[x\cdot y=z&\leftrightarrow{\mathbb P}(\hat{w}w=0,\hat{x}\hat{y}\hat{z}{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(u,z,y))(y,z,x).\]%
PAPR%$^{(+)}$% has the conceptual advantage of employing a uniform
primitive recursion schema instead of treating each primitive
recursive definition separately. That, for example, makes it possible
to prove in schematic form that if there is a unique %$w$% such that
%$\phi (w)$% and for every %$x$% there is a unique %$y$%
such that %$\psi (x,y)$%, then
%${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\psi )(x,y)$% is a well-defined
function, instead of proving that separately for each primitive recursive
definition.
The fragment of PAPR%$^{(+)}$% in the fixed language
%\[\{0,S,{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z),{\mathbb P}(\hat{w}w=0,\hat{x}\hat{y}\hat{z}{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(u,z,y))\}\]%
(where the last two relation symbols are the primitive recursive
definitions of addition and multiplication given above is definitionally
equivalent to PA, that is, that fragment and PA share a common extension by
definitions. I shall therefore say, speaking somewhat loosely, that PA is
included in PAPR%$^{(+)}$%.
Analogously, PA%$^{(+)}$% is included in PAPR%$^{(+)}$%.
The theory PAPR%$^{(+)}$% does not include primitive recursive arithmetic for
the simple reason that primitive recursive arithmetic is formulated
in terms of function symbols instead of relation symbols. It is,
however, a straightforward exercise to show that primitive recursive
arithmetic is included in an extension by definitions of PAPR%$^{(+)}$%,
which I take to mean that PAPR%$^{(+)}$% incorporates primitive recursive
arithmetic.
It is a deeper and more surprising fact, due to Gödel, that all of
PAPR%$^{(+)}$% restricted to the base language is included in an
extension by definitions of PA, and hence that PAPR%$^{(+)}$%
restricted to that language is in effect
merely a somewhat redundant formulation of PA. It is presumably for
that reason that axiomatic theories like PAPR are not familiar.
Nonetheless, PAPR is an axiomatization that employs only core ideas
about the natural numbers.
When we consider adding a new unary relation %$R$% to the base
language of PAPR%$^{(+)}$% things become more interesting. (We
consider adding a unary relation for definiteness—the same kinds
of considerations would apply for any relation or function.) The
theory PAPR%$^{(+)}$% automatically adds the requisite new instances
of both the primitive recursion and induction schemas when the
language is expanded. Since we are adding the new symbols into the
induction schema, we must establish that the addition is acceptable. I
will offer two arguments for that. The first argument is direct: The use of
primitive recursion in concert with new relations and even across
domains without any presumption of a background theory is such a
well-established part of traditional mathematical practice that no
extra-mathematical objection to it could be tenable. It just is a part
of standard mathematical practice that we do take arbitrary primitive
recursive definitions to be unexceptionable, and we allow them into
the induction schema freely. Moreover, that mathematical practice long
antedates the idea of a background set theory, which provides some
additional support for the common view that the practice does not
require the presupposition of a background set theory.
The second argument is that primitive recursive definitions of
relations are acceptable because they are canonically equivalent to
positive inductive definitions, which I shall argue below are
acceptable on independent grounds.
The theory PAPR%$^{(+)}$% in the language formed by adding
the new predicate %$R$% to the base language has an extension by
definitions that has as parts both PA%$^{(+)}$% and the theory of
functions primitive recursive in %$R$%. The proofs are
straightforward extensions of the proofs of the comparable results
without the new predicate %$R$%. Those facts provide an additional
argument that the full theory PAPR%$^{(+)}$% is
the best codification of our preformal intuitions concerning the
natural numbers. It is the theory that extends in the way that
reflects our unreflective expectations: Anyone who understands
primitive recursion automatically understands primitive recursion in
additional relations. The idea is not so much a new one as an
automatic extension of the old one. It is the theories PAPR%$^{(+)}$%
that reflect that reality.
Perhaps surprisingly, PAPR%$^{(+)}$% is not an essentially new theory: it can
be obtained as an extension by definitions of PA%$^{(+)}$% by
straightforward application of Gödel's method.
It is only when we consider the relativized theory PAPR%$^{(+)U}$% that the
reason for considering PAPR%$^{(+)}$% and not just PA%$^{(+)}$%
becomes apparent. The theory PAPR%$^{(+)U}$% is PAPR%$^{(+)}$% with
the quantifiers in the axioms and schemas relativized to a unary
predicate %$U$%. Unrelativized formulas are allowed as substituends for
the schematic letters in the substitution rule. In addition, we take
PAPR%$^{(+)U}$% to have the formulas %$U(0)$% and
%$(\forall x)(U(x)\rightarrow U(Sx))$% as axioms. (The need for the
additional axioms could be avoided by formulating
the theory without constant or function symbols in the first place.
Comparable axioms for addition and multiplication are not needed:
they are theorems proved by induction.)
The two arguments mentioned earlier for believing the primitive recursion
schema to be an acceptable addition to PAPR%$^{(+)}$% are equally
arguments for believing the primitive recursion schema to be an
acceptable addition to PAPR%$^{(+)U}$%, where, of course, the schema
is now taken to be relativized to %$U$%.
The
raison d'être of PAPR%$^{(+)U}$% is that it makes it
possible to consider the natural numbers in conjunction with other
systems. For example, consider the natural
numbers along with a group. The
axioms may be taken to be PAPR%$^{(+)U}\cup{\text{G}}^{U'}$%,
where %${\text{G}}$% is the axioms for the group.
Natural number exponentiation is
%\[{\mathbb P}(\hat{w}w=e,\hat{x}\hat{y}\hat{z}y\times u=z)(x,z,y),\]%
and existence and uniqueness on the appropriate domains is readily
proved by induction. With PAPR%$^{(+)U}$% it is not necessary
to add anything new to get exponentiation.
Gödel's method does not apply here, since it does
not yield codes for sequences of members of a group, and so
PAPR%$^{(+)U}$% is truly stronger than PA%$^{(+)U}$% in the sense
that for an arbitrary theory %$T$% the theory
PAPR%$^{(+)U}\cup T^{U'}$% need not be an extension by definitions of
PA%$^{(+)U}\cup T^{U'}$%.
%$^{\text{footnote}}$%
The theory PAPR%$^{(+)U}$% is my candidate for a codification of our
basic preformal intuitions concerning the natural numbers. I shall not
attempt to give here all my reasons for taking such primitive
recursion to be central, resting content with noting that many such
functions have historically been employed without the special notice
that would have been required had they been essentially new (for
example, natural number exponentiation of fractions, polynomials, and
complex numbers and iteration of functions) and that those same
functions are introduced today in secondary schools without any idea
that a new method or procedure is being invoked: it is taken for
granted, I believe quite rightly, that such procedures are an
integral part of what has already been taught about the natural
numbers. Moreover, the procedures are acceptable because they are positive inductive definitions—see below.
Now that I have shown that PAPR%$^{(+)U}$% is a natural,
independently motivated theory of the natural numbers not introduced
in an
ad hoc way to solve the Skolem problem, we can see how
it helps with that problem:
Theorem.
Let %$T$% be the theory in the language
%${\mathcal A}^{2}\cup\
^{(+)}+\lnot{\text{Con(PA)}}$%, where Con(PA) is the sentence
asserting the consistency of PA used in Gödel's second
incompleteness theorem, is (if PA is consistent) consistent but it has
no models. Here is the easy proof: A model in the ordinary first-order
sense of this theory must be nonstandard, since PA is consistent,
but PA%$^{(+)}$% has no nonstandard models by the categoricity theorem.
I do not know of a nontrivial necessary and sufficient condition for a
full schematic theory to be satisfiable, certainly not one as
straightforward as simple consistency, but there is a useful general
sufficient condition. There is also an additional sufficient condition
useful for some special cases, including PA.
My main interest in full schematic theories has been their use in
characterizing mathematical domains. The full schematic principles of
interest concern characteristics of the domain that survive the
arbitrary addition of new relations. It is therefore natural to ask
when the addition of a theory %$A$% to a full schematic theory
%$T^{(+)}$% adds new relations without changing the domain. I shall
call such an addition an
acceptable addition. For example, adding
%$\lnot{\text{Con(PA)}}$% to a theory of arithmetic is not acceptable,
because satisfying it may require adding new elements (nonstandard
numbers) to the domain.
If an addition to a full schematic theory is an acceptable addition,
then any intended model of the full schematic theory can be expanded
to a model of the addition. Thus, if a full schematic theory is
satisfiable, so is the theory plus any acceptable addition.
The condition is sufficient, but it is surely not necessary. For example,
large cardinal axioms are not acceptable additions to set
theory—the whole point is that they add things to the
domain. Nonetheless, there may well be good reason to allow adding on
large cardinal axioms.
We can give the notion of an acceptable
addition a mathematically precise explication as follows: an addition
%$A$% to a theory %$T$% is
acceptable with respect to a class of models %$\Gamma$%
if every model of %$T$% in %$\Gamma$% has an
expansion to a model of %$A$%. Actually, we shall need a slightly
more general definition to allow additions with free variables for
parameters: an addition %$A$% to a theory %$T$% is acceptable with
respect to a class of ordered pairs of the form
%$\langle {\mathfrak A},s\rangle$%, where %${\mathfrak A}$% is a model and
%$s$% is an assignment for %${\mathfrak A}$%, if every pair
%$\langle {\mathfrak A},s\rangle$% in %$\Gamma$% is such that if
%${\mathfrak A}$% is a
model of %$T$% then %${\mathfrak A}$% has an expansion to a model in
which %$s$% satisfies %$A$%.
In the case of PA%$^{(+)}$%, the intended model is a smallest model.
It is isomorphic, in a suitable sense, to a substructure of every
structure that is a model of PA%$^{(+)}$% with respect to a fixed language.
That follows from the following theorem:
Theorem.
Let %$T$% be a theory that is complete for atomic sentences (that is,
if %$\sigma$% is an atomic sentence then either %$\sigma$% or
%$\lnot\sigma$% is a consequence of %$T$%).
Then, with respect to a fixed language, any model of %$T$%
in which every member of the domain is denoted by a closed term is a
smallest model of %$T$%.
The theory PA, and hence PA%$^{(+)}$%, satisfies the hypothesis:
The intended model is one in which every
member of the domain is denoted by a closed term. The atomic
sentences are all variable-free equations, and PA suffices to prove
all the true ones and to disprove all the false ones.
Proof:
Let %$T$% be a theory and %$\mathfrak A$% be a model as in the
hypothesis of the theorem. Let %$\Delta$% be the set of all atomic
and negation atomic sentences that are consequences of %$T$%. Then
%$\mathfrak A$% is a model of %$\Delta$%, and %$\Delta$% is a maximal
consistent set of atomic sentences. Thus, %$\Delta$% is the atomic
diagram of %$\mathfrak A$%. Since every model of %$T$% is a model of
%$\Delta$%, it follows that %$\mathfrak A$% is isomorphic to a
substructure of every model of %$T$%.
%$\square$%
Since PA%$^{(+)}$% has a smallest model, it will be useful to show
that any universal theory (that is, any set of formulas of the form
%$(\forall \overline{x})\phi$%, where %$\phi$% is free of
quantifiers) is an acceptable addition to any theory %$T^{(+)}$% with
respect to the class of smallest models of %$T^{(+)}$% in a fixed language
if %$T^{(+)}+A$% is consistent:
Theorem.
Suppose that some theory %$T$% has a smallest model %$\mathfrak A$%.
Then %${\mathfrak A}$% can be expanded to a model of any universal
theory %$A$% that is consistent with %$T$%.
Proof:
Since %$T+A$% is consistent, it has a model %$\mathfrak B$%. Since
%$\mathfrak A$% is a smallest model of %$T$%, we may as well assume
that %$\mathfrak A$% is a substructure of the reduct of %$\mathfrak
B$% to the language of %$\mathfrak A$%. The substructure of
%$\mathfrak B$% with universe the universe of %$\mathfrak A$%, call
it %${\mathfrak A}^+$%, is thus a model of %$T$%. Since
%${\mathfrak A}^+$% is a submodel of %$\mathfrak B$% and %$\mathfrak B$% is a
model of %$A$%, it follows that %${\mathfrak A}^+$% is a model of
%$A$%—the theory %$A$%, being universal, has the submodel property.
%$\square$%
By the Matijasevi%$\v{\text{c}}$%--Robinson--Davis resolution of Hilbert's tenth
problem, there is a universal sentence
%$\sigma$% in the language of PA that can be proved in PA to be
equivalent to Con(PA). Since %$\sigma$% is universal,
%${\text{PA}}^{(+)}+\sigma$% is satisfiable, by the theorem. The
preceding considerations now give us a justification for our
intuitive preference of PA+Con(PA) over PA%$+\lnot$%Con(PA): By
the theorem, if PA is consistent, PA%$^{(+)}+\sigma$%, and
therefore PA%$^{(+)}+$%Con(PA), is satisfiable in the base language,
and therefore, by the categoricity result, PA%$^{(+)}+\lnot$%Con(PA) is not.
Next, we show that one can always add certain inductive definitions
of new relations to any full schematic theory, that is, that such
additions are acceptable additions to any full schematic theory with
respect to the class of all ordered pairs of models and assignments
for them.
I shall introduce the notion of an inductive definition following
Moschovakis, though my terminology is perhaps closer to that of
Aczel.
An %$n$%-_ary operator_ %$\Gamma$% on a set %$A$% is a function
from %${\text{power}} (A^{n})$% to %${\text{power}} (A^{n})$%. An %$n$%-ary operator
%$\Gamma$% on %$A$% is
monotone if for all
%$X, Y\subseteq A^{n}$%, if %$X\subseteq Y$%, then %$\Gamma (X)\subseteq \Gamma (Y)$%.
A set %$B\subseteq A^{n}$% is a
fixed point of an
%$n$%-ary operator %$\Gamma$% on %$A$% if %$\Gamma (B)=B$%.
Theorem.
Every monotone operator %$\Gamma$% has a fixed point, and indeed a
least fixed point.
The proof is straightforward: Define, for any ordinal %$\xi$%,
%$I_{\Gamma}^{\xi}=\Gamma (\bigcup_{\eta <\xi }I_{\Gamma}^{\eta})$%
and let %$\kappa$% be the least ordinal whose predecessors form a set
of cardinality the cardinality of %$A$%. Then %$I_{\Gamma}^{\kappa ^{+}}$% is the least fixed point (compare Moschovakis).
Let %$\mathfrak A$% be a structure for the language %$\mathcal L$%
with universe %$A$%, let %$S$% be an %$n$%-placed relation symbol not in
the language %$\mathcal L$%, let %$\phi (x_{1}, \dots , x_{n},y_{1},\dots ,y_{m})$% be a formula in the language %${\mathcal L}\cup\{S\}$% with all free variables among those displayed, and let
%$a_{1}, \dots ,a_{m}$% be members of %$A$%. Then the %$n$%-ary operation
%$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% on %$A$%, the
operation defined on %$\mathfrak A$%
by %$\phi$%
with parameters
%$a_{1}, \dots ,a_{m}$%, is the operation that takes any
%$B\subseteq A^{n}$% to
%\[\left\{\langle b_{1}, \dots ,b_{n}\rangle\in A^{n}:({\mathfrak A},B)\models\phi\left[\!
\atop{b_{1},\dots , b_{n},a_{1}, \dots ,a_{m}}}\!\right]\right\},\]%
where %$({\mathfrak A}, B)$% is the expansion of %$\mathfrak A$%
to the language %${\mathcal L}\cup\{ S\}$% in which the
interpretation of %$S$% is %$B$% and where the material in square
brackets indicates the obvious assignment of objects to variables. If
%$S$% occurs positively in %$\phi$% (that is, if the connectives
%$\rightarrow$% and %$\leftrightarrow$% do not occur in %$\phi$% and
if no occurrence of %$S$% in %$\phi$% is inside the scope of a negation
sign), then %$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% is
monotone, as may be shown by an easy inductive proof
(compare Moschovakis). When %$S$% occurs positively in %$\phi$%, we say
that %$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% is an
operation defined by a positive formula. Note that %$B$% is a fixed
point of %$\Gamma_{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% if and
only if any assignment that assigns %$a_{1}, \dots ,a_{m}$% to
%$y_{1}, \dots ,y_{m}$% satisfies the formula
%\[\label{eqfp}(\forall x_{1},\dots x_{n})(S(x_{1},\dots x_{n})\leftrightarrow\phi )\]%
in the structure %$({\mathfrak A},B)$%.
Standard inductive definitions are (or at least can easily be put
into) the above form, but what one is normally interested in is not
just any fixed point, but the least fixed point. In the notation of
our axiomatization of the notion of a fixed point just above, let
%$Q$% be a new %$n$%-placed schematic variable. Then %$B$% is the
least fixed point of %$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% if and only if %$B$% is a fixed point of
%$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$% and any
assignment that assigns %$ a_{1}, \dots ,a_{m}$% to %$ y_{1}, \dots ,y_{m}$% satisfies the full schema
%\[(\forall x_{1})\dotsm (\forall x_{n})\left(Q( x_{1}, \dots ,x_{n})\leftrightarrow \phi \left[\!\!\left[\!
\!\right]\!\!\right]\right)\]%
%\[\qquad \rightarrow (\forall x_{1})\dotsm (\forall x_{n})(S( x_{1}, \dots ,x_{n})\rightarrow Q( x_{1}, \dots ,x_{n}) )\]%
in the structure %$( {\mathfrak A} ,B)$%. Similarly, %$B$% is the
greatest fixed point of
%$\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$%
if and only if the same condition obtains,
but with the formula above replaced by
%\[(\forall x_{1})\dotsm (\forall x_{n})\left(Q( x_{1}, \dots ,x_{n})\leftrightarrow \phi \left[\!\!\left[\!
\!\right]\!\!\right]\right)\]%
%\[\qquad \rightarrow (\forall x_{1})\dotsm (\forall x_{n})(Q( x_{1}, \dots ,x_{n})\rightarrow S( x_{1}, \dots ,x_{n}) ).\]%
As it happens, none of the proofs in this work require the
information that a given fixed point is least or greatest, and so the
conditions for a fixed point to be least or greatest play no role in
anything that follows.
I now show that every function defined by primitive recursion on the
natural numbers is the unique (and therefore
the least and the greatest) fixed point of an operator defined by a
formula in which the new symbol occurs positively.
Let %$T^{(+)}$% be a theory for the language %$\mathcal L$%, let %$S$%
be an %$n$%-placed relation symbol not in the language %$\mathcal L$%,
and let %$\phi (x_{1}, \dots , x_{n},y_{1}, \dots ,y_{m})$% be a
formula in the language %${\mathcal L}\cup \{S\}$% with all free
variables among those displayed in which all occurrences of %$S$% are
positive. Observe that the formula %$\Phi =$%
%\[(\forall x_{1},\dots x_{n})(S(x_{1},\dots x_{n})\leftrightarrow\phi )\]%
is an acceptable addition to %$T^{(+)}$% with respect to the class of
all pairs of models and assignments for them: By the theorem above,
for any model %$\mathfrak A$% of %$T^{(+)}$% and any assignment %$s$%
for %$\mathfrak A$%, there is a fixed point %$B$% of
%$\Gamma^{\mathfrak A}_{\phi,s(y_1) ,\dots ,s(y_m)}$%. Thus, %$s$%
satisfies %$T^{(+)}\cup\{\Phi\}$% in %$({\mathfrak A},B)$%. We have
shown:
Theorem.
Let %$T^{(+)}$% be a theory for the language %$\mathcal L$%, let %$S$%
be an %$n$%-placed relation symbol not in the language %$\mathcal L$%,
and let %$\phi (x_{1}, \dots , x_{n},y_{1}, \dots ,y_{m})$% be a
formula in the language %${\mathcal L}\cup \{S\}$% with all free
variables among those displayed in which all occurrences of %$S$% are
positive. If %$\mathfrak A$% is a model of %$T^{(+)}$% and
%$a_{1},\ldots ,a_{m}$% are members of the universe of
%$\mathfrak A$%, then there is an expansion of %$\mathfrak A$% in which
%$c_{1},\ldots ,c_{m}$% are new constant symbols that denote
%$a_{1},\ldots ,a_{m}$% that satisfies
%\[T^{(+)}\cup\{(\forall x_{1},\dots x_{n})(S(x_{1},\dots x_{n})\leftrightarrow\phi (x_{1},\ldots ,x_{n},c_{1},\ldots ,c_{m}))\}.\]%
The theorems on acceptable additions just proved provide powerful criteria for
determining what can be added to full schematic theories, but they
are not suited to play a foundational role. Their statements, not to
mention the proofs given, make strong use of a background set theory.
For present purposes, they are to be regarded merely as
ex post facto justifications, and perhaps generalizations, of principles
concerning acceptable additions that we take as basic, intuitive, and
well-established parts of mathematical practice: We can add any
universal theory consistent with arithmetic to arithmetic, and we can
add fixed points of operations defined by positive formulas to any
full schematic theory. I take it that the operative sense of "can"
in the last sentence is that the theory with the addition is
acceptable in the pretheoretic sense that it only adds new relations,
without the need for a change in domain or a change in the full
schematic character of the theory
The notion of acceptability is an intuitive one that cannot be made
mathematically precise without set-theoretic apparatus to which I am
not entitled for my foundational purposes, but all I need is the result
that the definition of a fixed point of an
operation defined by a positive formula is an acceptable addition to
any full schematic theory.
Take variables %$u,x,y,z$% to be implicitly bound to %$U$% and
variables %$x',y',z'$% to be implicitly bound to %$U'$%. That is, we
take, for example, quantifications %$(\forall x) \phi $% and
%$(\exists x') \phi $% to be abbreviations for %$ (\forall x) (U(x)\rightarrow \phi )$% and %$ (\exists x') (U'(x')\land \phi )$%,
respectively. This convention leaves us without a way to write
ordinary, unbounded quantifiers, but that is not a serious problem,
since we have virtually no use for such quantifiers.
Back to arithmetic
Introduce a new relation %$x<y$% defined as
%$ (\exists u) (u\ne 0\land x+u=y)$%, and we introduce %$x'<'y'$% in
a parallel way. We
can then introduce bounded quantifiers %$ (\forall x<y) $%, and so
forth, as usual. Thus, for example, %$ (\exists x'<y') \phi $% is an
abbreviation for %$(\exists x')(x'<'y'\land \phi )$%, which, in turn,
is an abbreviation for %$(\exists x')(U(x')\land (x'<'y'\land \phi ))$%,
a formula in which the quantifier is interpreted in its
ordinary sense.
Define a new binary relation %$I$% as follows, employing both the
unprimed and the primed vocabulary:
%\[(\forall x) (\forall x') (I(x,x') \leftrightarrow ((\forall y<x)(\exists y'<'x')I(y,y')\land (\forall y'<'x')(\exists y<x)I(y,y'))).\]%
Note that the definition of %$I$% is a definition of a fixed point of
an operation defined by a positive formula, and that it is therefore
an acceptable definition. The relation %$I$% is the isomorphism we defined
above using primitive recursion.
To see that primitive recursive definitions of relations
are acceptable, we note that the formula
%\[{\mathbb P}(P,Q)(x,y)\leftrightarrow (x=0\land P(y))\lor (\exists u)(\exists v) (x=Su\land {\mathbb P}(P,Q)(u,v)\land Q(u,v,y))\]%
is a positive inductive definition of %${\mathbb P}(P,Q)(x,y)$%. The
addition of %${\mathbb P}(P,Q)(x,y)$% is therefore acceptable, by the theorem on the acceptable addition of positive inductive definitions.
--
ShaughanLavine - 16 Mar 2005
Notes
,
,
:
:
:
:
:
:
:
,
: