A defense of syntactic freedom; or, First-order logic is better than you think.
In a recent email exhortation, @@@ Ian Horrocks asks us to restrict ourselves to well-understood principles and make sure that we are always working inside some subset of first-order logic (FOL). I entirely agree, but Ian and I come to different conclusions about what this means. Since my view is perhaps more radical than Ian's, and certainly less well documented, I explain it here and explain why some of the concerns that have been expressed about its internal coherence are unwarranted.
What counts as first order?
First, there is no such thing as the 'correct' or 'classical' or 'normal' first-order logic. There are probably as many variations of first-order logic as there are logicians, and among the dozens, perhaps hundreds, of logic textbooks that have been published almost no two use exactly the same logical system. The syntax can be written in a large variety of ways (infix, prefix, postfix; graphically like Peirce or like Frege; etc.); there are many widely different notions of first-order proofs (natural deduction systems, Gentzen tablaux, resolution-style proof systems, etc.) and there are even several semantic options. Some versions allow function symbols, others not; some use few connectives, some use only one, others allow many; and so on. I emphasize this rather elementary point only to provide some pre-emptive answer to the frequently heard complaint that any new idea violates the norms of 'normal' first-order logic. Of course all these varieties are profoundly alike in a certain sense (modulo a few well-known distinctions, such as the presence or absence of equality) but the issue we need to face is, how best to characterize this common quality? What makes a logic first-order?
One kind of answer, the one most often found in elementary textbooks, is purely syntactic. It characterizes first-order languages by their conformity to a kind of syntactic layering: there are individual symbols and relation symbols, and a language is first-order if it respects this layering by keeping these categories distinct and only allowing quantification over the individuals. In contrast, higher-order languages (which are well known to be less tractable in many ways) have a syntax which blurs or even eliminates this distinction by allowing relations to hold betwen other relations and allowing quantification over relations. On this view, then, the essentially first-order nature of FOL is essentially a syntactic restriction. This has the merit of being very easy to state and trivial to check. However, it misses the essential point.
Another kind of answer would be semantic: a language is first-order if it has a conventional Tarskian model theory in which individual names denote things in the universe and relation names denote relations over the universe. This is better, but it ignores the fact that one can give such a semantics to a notation which most people would consider non-first-order, such as type theory.
I wish to suggest a different answer, based on the relationship between the semantic and proof-theoretic qualities of the language. The reason why FOL plays such a central role in the foundations of mathematics and in knowledge representation is not because of a simple syntactic restriction. FOL has a fundamental logical character which make it in a sense into the canonical computational representational logic: it is the strongest possible relational logic which is both complete and compact. Relational means that the logic mentions entities and relations between them. Completeness means that proof theory and the semantics are in correspondence. Without this, of course, one could erect all kinds of semantic fantasies but they would have no bearing on the actual proofs that could be obtained. Compactness means that proofs can only use finite resources. Any conclusion from an infinite set of assumptions must be finitely derivable from some finite subset of them. This means that proofs are things that can actually be implemented and constructed by mortal people and physical machines.
Now, these are very general conditions; and, moreover, they make a certain basic sense. They are not arbitrary or capricious. Any logic which is going to be used by finite agents to represent content must be first-order in this sense, and so it seems interesting to characterize the nature of this strongest logic in this way. In
This allows for any surface syntactic form, any version of the many interdefineable combinations of syntactic features, and any finitistic proof theory to be counted as FOL. When it comes to semantics, the variety of options available at the syntax and proof-theory levels reduces to a single common picture which can also be characterized very elegantly. The semantics of FOL is a mathematical theory of how the language can be interpreted in a world, and it is the simplest possible such theory. That is, it makes the minimal assumptions about the nature of the world, consistent with its intended function of assigning a truth-value to every expression of the logic. Call this the principle of semantic agnosticism; the theory of truth should make the assumptions needed to determine the truth of any expression, but no more.
Applied to any logic with connectives such as and, or, not and implies, for example, this requires that the semantics provide truthtables for the connectives, in order to determine the truthvalues of expressions built up using the connectives. This is not to say that these truth-tables are the full meanings of these words; only that they are what the formal semantics is required to specify. Perhaps to assert (p implies q) is to say more than ((not p) or q); no matter, because that extra meaning in the word 'implies', if there is any, does not affect the truth-value of the expression in any particular world; and the logical semantics is concerned only with truth in this strict sense of correspondence to a world.
Developing FOL from semantic agnosticism.
In this section we will show that applying the agnosticism principle rigorously can guide the task of designing a model theory for a relational language so as to almost completely define both the model theory and the syntax of the language. As a deliberate policy, whenever the we will always prefer the option which imposes the fewest constraints on the
constraints of the language. The aim here is to identify Applied to the quantifiers, this principle requires that a world specify a set of entities to quantify over (to make sense of 'for all'), and for each predication of a relation expression to some arguments, whether or not the predication is true. This amounts to assigning to each thing which is denoted by any relational expression, a set of n-tuples: those sequences of arguments which make a predication of this relation true. We know they are finite, because the language is compact. Thus, the agnosticism principle requires that an interpretation define a nonempty set U, and a mapping IEXT from some subset R of U to U*, the set of finite sequences over U, and that any expression which occurs in the relation position in any atomic sentence denotes an entity in the subset R; and that is all. Any such structure will suffice to assign a unique truthvalue to any expression of a first-order language. Notice that nothing has been said here about the nature of the entities in the universe set. It is an essential part of the nature of FOL that it imposes no conditions on its universe other than it be nonempty; any such condition would limit its generality.
This condition has an inherent awkwardness, however, considered as a semantic condition, since it refers to the semantic forms of the language: IEXT is required to be defined on denotations of expressions which occur in a relational position in an atomic sentence. One could work within this awkwardness, but the language is usually tidied up in order to eliminate this condition. One way to do this is to simply declare a syntactic category of relation symbols, impose a syntactic restriction that all atomic expressions must begin with a relation symbol, and require all relation symbols to have denotations in the domain of IEXT. A simpler strategy, which amounts to a special case of the above, is to simply require that IEXT be defined on the entire universe. These are closely related: one can obtain the second from the first by assigning empty extensions to non-relation symbols. Since the second is simpler and requires no extra syntactic restrictions to be placed on the langauge, we will adopt it.
but not that it specify anything further about the nature of those entities. This