Acknowledgements.

This document represents the combined efforts of the SCL working group, a self-selected group comprising Murray Altheim, Bill Anderson, Pat Hayes, Chris Menzel, John F. Sowa, and Tanel Tammet. Contributions were also made by Michael Gruninger, Geoff Sutcliffe, Kenneth Murray, Jay Halcomb, Robert E. Kent, Elisa Kendall, David Fraenkel and Mark Stickel. The work was an outgrowth of earlier work by the KIF/CL working group comprising, in addition to the above, Adam Pease, Michael F. Uschold, Christopher A. Welty and David Whitten, with contributions from Mike Genesereth. The ancestor of this entire effort was KIF, authored by Mike Genesereth.

This style indicates notes and comments which are not part of the final text. Some are dated and may have been superceded by others with a later date.

This style indicates text that is likely to change or has been called into question.

Last modified 11/8/04

NOT YET DONE:
1. XCL syntax. But see here for early draft
2. datatyping. But see here for outline.
3. Mappings from other languages into SCL. See here.

NOTE. Following suggestions by Tanel Tammet, the language has been modified from earlier versions by introducing two syntactic styles for atoms and terms depending on whether the operator name is intended to denote. This allows a very considerable simplification of the model theory and a more transparent mapping into conventional logical syntax. Earlier versions should be disregarded.

SCL: Simple Common Logic

SCL is a first-order logical language intended for information exchange and transmission. SCL allows for a variety of different syntactic forms, called dialects, all expressible within a common XML-based syntax and all sharing a single semantics.

SCL Requirements

SCL has been designed with several requirements in mind, all arising from its intended role as a medium for transmitting logical content on the WWWeb.

1. Be a full first-order logic with equality.

1a. SCL syntax and semantics should provide for the full range of first-order syntactic forms, with their usual meanings. Any conventional first-order syntax should be directly translateable into SCL without loss or alteration of meaning.

2. Provide a general-purpose syntax for communicating logical expressions.

2a. There should be a standard XML syntax for communicating SCL content.

2b. The language should be able to express various commonly used 'syntactic sugarings' for logical forms

2c. The syntax should relate to existing logical standards and conventions; in particular, it should be capable of rendering any content expressible in RDF, RDFS or OWL.

2d. The syntax should provide for extendability to include future syntactic extensions to the language, such as modalities, extended quantifier forms, nonmonotonic constructions, etc..

2e. There should be at least one compact, human-readable syntax defined which can be used to express the entire language

2f. The notation should not make gratuitous or arbitrary assumptions about logical relationships between different expressions, particularly if these assumptions can be expressed in SCL directly.

3. Be 'web-savvy'

3a. The XML syntax must be compatible with the specs for XML, URI syntax and XML Schema, Unicode and other standards relevant to transmission of information on the WWWeb.

3b. URI references can be used as logical names in the language

3c.  URI references can be used to to give names to expressions and sets of expressions, to facilitate Web operations such as retrieval, importation and cross-reference.

3d. SCL must contain a general-purpose datatyping convention which handles the XSD datatype suite so as to be compatible with RDF, RDFS and OWL usage.

4. Be an open-network common logic

4a. Transmission of content between SCL-aware agents should not require negotiation about syntactic roles of symbols, or translations between syntactic roles.

4b. Any piece of SCL text should have the same meaning, and support the same inferences, everywhere on the network. Every name should have the same logical meaning at every node of the network.

4c. No ontology can limit the ability of another ontology to refer to any entity or to make assertions about any entity.

4d. The language should support ways to refer to a local universe of discourse and relate it to other universes.

4e. Users of SCL should be free to invent new names and use them in published ontologies.

SCL design overview

First, if we follow the convention whereby any language has a grammar, then SCL is a family of languages rather than a single language. The different SCL languages, referred to here as dialects, may differ sharply in their syntax, but they have a single uniform semantics. Membership in the family is defined by being intertranslateable with the other dialects while preserving meaning, rather than by having any particular syntactic form. Several existing languages can be considered to be SCL dialects.

One SCL dialect plays a special role. The XML syntax for SCL, here called XCL, is designed to serve three functions. It is a standard intercommunication notation for machine use on a network; it is the single 'reference syntax' into which all other dialects can be translated or embedded [requirement 2a], and relative to which the SCL semantics is defined; and it provides forms which can be used to express various extensions and relations to existing standards [requirements 2b, 2c, 2d ] and to support existing conventions regarding URI usage [requirements 2c, 3c, 3d and 4e ]. XCL syntax follows conventional XML usage and is partly inspired by MathML. It is not intended to be compact or human-readable. This document also independently defines a more conventional dialect, SCL Core, which is intended to be compact, human-readable [requirement 2e] and more similar to conventional machine-oriented logics, notably to KIF [ref KIF]. SCL Core syntax is used in giving examples.

SCL syntax summary

SCL follows conventional logical syntactic conventions. The particular syntactic primitives chosen are a compromise between a 'minimal' set of logical primitives and a 'full' set, and embody several commonly used abbreviated forms, such as allowing conjunctions and disjunctions with any number of arguments, and alternative forms for atomic sentences.

Most SCL expressions can be described as an operator applied to a set or sequence of operands. The core syntax uses sequences throughout, but other dialects can use sets in some cases. We list the basic syntactic forms here for convenience. An exact EBNF for the core syntax is given in section @@ .

 syntax category   description  general syntactic form or immediate syntactic constituents core syntax
  name  primitive  referring expression 

 a character string

abc
"a b)c"

  term denotes the value of a function applied to some arguments  a function term applied to a sequence of argument terms (f a b ...)
f(a b
...)

  atom(ic sentence) 

asserts that a relation holds between arguments   a  relation term applied to an argument sequence; or
a relation term applied to a set of role/argument pairs

(R a b ...)
R(a b
...)
(R (roleset (r1 a)(r2 b))
R((roleset (r1 a)(r2 b))

  equation asserts that its arguments are equal  two terms (= a b)
  boolean conjunction means that all its components are true  a set of conjuncts (and p q r ...)
 boolean  disjunction means one of its components is true  a set of disjuncts (or p q r ...)
boolean implication means that the antecedent implies the consequent  an antecedent and a consequent (implies p q)
boolean biconditional  means that each of two sentences implies the other  two sentences (iff p q)
boolean negation means that its inner sentence is false  a sentence (not p)
existential quantifier means that its body is true for some interpretation of its variables in the guard  an optional guarding name, a sequence of bindings, and a body which is a sentence (exists g (a (b c)...) p)
(exists (a (b c)
...) p)
universal quantifier means that its body is true for any interpretation of its variables in the guard  an optional guarding name, a sequence of bindings, and a body which is a sentence (forall g (a (b c)...) p)
(forall (a (b c)
...) p)
SCL text a collection of SCL sentences  a set of sentences p q r ...
module definition gives a name to a piece of SCL text and optionally provides some external information  a name, an optional header, and some SCL text (module a (header p q ...) r s ...)
argument sequence general form for arguments of terms and atoms

a sequence of terms, optionally followed by a sequence variable

a b c
a b c ...x
role/argument pair alternative form for atom arguments a role name together with an argument which is a term (r1 a)
binding general form for indicating variables bound by quantifiers a name; or
a name together with a type which is a term
a
(a type)

Most of SCL is a conventional first-order logical syntax in the style of KIF, ie with prefixed connectives and expression structure indicated by parentheses. Here the unusual features are briefly reviewed.

SCL names can be any character string, but since the core syntax uses whitespace, parentheses and a few other strings, the option of enclosing a name string inside double-quote marks is provided. There is no special syntactic distinction between variables and names - a variable is simply a name that has been bound by a quantifier - but such conventions can be used legally within the core syntax if found convenient by applications, eg KIF's use of a ? prefix, or the Prolog upper/lower case convention.

SCL allows two distinct basic forms for terms and atoms, which are rendered in the core syntax by the forms (P a b) and P(a b). Both of these represent the result of applying P to the sequence of arguments <a, b>, and they have the same truth conditions, but they differ in the conditions they impose on P. The first form requires the denotation of P to be in the universe of discourse; the second form requires it to not be in the universe of discourse. They are thus mutually incompatible, and their conjunction is always unsatisfiable. The first form treats the function or relation in exactly the same way as any other SCL name. It allows any term to be used to denote the function or relation, and allows the function or relation term to be bound by a quantifier and to be subject to normal processes of first-order reasoning such as equality substitution. The second form, corresponding to a conventional first-order syntax, requires that the function or relation be identified by a single name which cannot be bound by a quantifier. The advantages and disadvantages of the distinct forms are discussed later. Both forms can be used anywhere in SCL text, but the use of a single relation or function name in both forms in the same text is an error.

SCL atom and term syntax also allows for variadic functions and relations, which can take any finite number of arguments. Under these conditions it is useful to allow a limited form of schematic quantification over arbitrary sequences, in the style of KIF's use of sequence variables. SCL limits such variables to the last place in any argument sequence, and does not allow explicit quantification over sequence variables. Intuitively they are thought of as universally quantified at the 'top level' of any expression. As described below in more detail, this allows a useful degree of 'recursive' expressiveness in axiom schema, without taking the language beyond first-order, and while retaining some useful computational properties, such as the possibility of a general unification algorithm. The three-dot ellipsis mark ... is used as a prefix to indicate a sequence variable.

As an additional option, an SCL atom can be presented as a relation name followed by a set of role/argument pairs. Formally, this is syntactic sugar for an existential statement, following the pattern

(P (roleset (r1 a1)(r2 a2) ... )) 

is sugar for

(exists (x) (and (P x) (r1 x a1) (r2 x a2) ... ))

Intuitively, the x here indicates the presence of an entity which can be called the fact of P's holding, which is related to the things involved in the fact in the ways indicated by the roles. Relational roles were chosen in order to allow uses where a role may have multiple values. This form of atomic assertion is often used to

The SCL quantifier syntax allows a single quantifier to bind a sequence of names (variables). The quantifier syntax allows any name to be relativized by a term indicating a sort restriction, eg

(forall (a (b maleHumanBeing) c) <body>) 

is sugar for 

(forall (a b c) (implies (MaleHumanBeing b) <body>))

with a corresponding form for the existential using conjunction; and also allows for an optional guard. This is a little more complex to specify as sugar:

(forall g (a b ... c) <body> ) 

is sugar for 

(forall (a b ... c)(implies (g a b ... c x ... z) <body> )

where x ... z are all the names which are free in <body>; and similarly for the existential.

The motivation for including this as a syntactic option is that the use of guarded quantifiers provides a natural way to indicate contextuality of quantification within an ontology, as described in section @@@, and also this fits within a robustly decideable subcase of first-order logic. [refGuardFrag]

Finally, the top-level SCL module syntax provides for naming a piece of SCL text and optionally associating it with another special-purpose SCL text called a header.  Headers provide a range of abilities to relate ontologies to one another, but can be omitted by applications requiring only the transmission of bare logical text.

Rationale

Most of the design aspects of SCL can be understood by considering the following scenario. Two people, or more generally two agents, A and B, each have a logical formalization of some knowledge. They now wish to communicate their knowledge to a third agent C which will make use of the combined information so as to draw some conclusions. All three agents are using first-order logic, so they should be able to communicate this information fully, so that any inferences which C draws from A's input should also be derivable by A using basic logical principles, and vice versa; and similarly for C and B. The goal of SCL is to provide a logical framework which can support this kind of communication and use without requiring complex negotations between the agents (requirement 4a). This ought to be simple, but in practice there are many barriers to such communication.

First, A and B may have used different surface syntactic forms to express their knowledge. This is a well-known problem and various proposals have been made to solve it, usually by defining a standard syntax into which others can be translated, such as KIF [refKIF]. SCL approaches this by requiring all SCL dialects to be translateable into XCL. The advent and widespread adoption of XML technology has made this problem easier to solve, by makikng it possible to divorce the 'structured' specification of the langauge from any particular surface form. XCL allows the same SCL logical structrure to be assigned a variety of surface forms, and for various surface forms - dialects - to be transmitted unchanged within XCL markup if required. In many cases, the original surface form of the dialect can be regenerated from the XCL by adding suitable CSS style sheets.

Second, A and B may have made divergent assumptions about the logical signatures of their formalizations. Conventionally, a first-order language is defined relative to a particular signature, which is a lexicon sorted into disjoint sublexica of individual names, relation names and perhaps function names, the latter two each associated with a particular arity, i.e. a fixed number of arguments with which it must be provided. This provides a simple context-free method of checking well-formedness of terms and atomic sentences, and ensures that any expression obtained by substitution of well-formed terms for variables will be assigned a meaningful interpretation. However, it is not practical to assume that all users of a logic on an open network will adopt the same signature, particularly when new names can be invented and published at any time [requirement 4b]. It is common for one agent to use a relation name for a concept described by another as a function, for example, or for two agents to use the same relation with different argument orderings or even different numbers of arguments. More radically, a particular concept, such as marriage, might be represented by A as an individual, but by B as a relation. Often, one can give mappings between the logical forms of such divergent choices, many of which are widely familiar; but in a conventional first-order framework, these have to be considered metasyntactic mappings which translate between distinct first-order formalizations; and very few general frameworks exist to define such meta-syntactic mappings on a principled basis, in a way that allows reasoning agents to draw appropriate conclusions. The notion of signature is appropriate for textbook uses of logic and for stand-alone formalizations: but applied to an interchange situation, where sentences are distributed and transmitted over a network (in particular, on the WWWeb) and may be utilized in ways that are remote from their source, this would require either that a single global convention for signatures be adopted, which is impractical and violates requirement 4b, or else that communication between agents involve a negotation about how to translate one signature into another, in violation of requirement 4a.

Consider the situation of two agents A and B communicating information to C, and suppose that there is some concept used by both A and B but with different signatures. For example, A might use a relation married with two arguments (the people who are married), while B might use it with three (two people and a time-interval); or as a function from people to time-intervals. Cases like this can be handled very simply merely by allowing a symbol to play more than one role. SCL simply allows relation and function symbols to be variadic, i.e. to take any number of arguments, and it allows a relation name to be used as as function name and vice versa. Thinking in traditional terms, this amounts to allowing a kind of 'punning', often allowed in mechanized inference systems, where a single name is allowed to play several syntactic (and hence semantic) roles at once. SCL treats this more simply by conflating the various relational roles into a single notion of a variadic relation, and similarly for functions.

This is purely a syntactic 'permission' to use a name in multiple roles: SCL does not assume any particular relationship between the truth of sentences in which the name plays these various roles, so for example (R a) neither entails nor is entailed by (R a b) or (exists (x) (R a x)) or any other sentence constructed using R with two arguments; and similarly SCL sanctions no logical entailments between sentences using R as a relation and other sentences using R as a function. This is in conformity with requirement 2f.  Such conventions can be stated as axioms in SCL itself, but the SCL semantics does not impose them as a matter of logical necessity. Since there are no relevant logical principles which would mandate any such connection if distinct names were used for the various signature-distinct cases of the name, this punning does not violate the 'common logic' principle; for example, the various uses of a single name could be rendered lexically distinct by subscripting names with tags such as -rel-3 or -fun-2, and the meaning of the SCL would be unchanged.

It is also possible that A uses a name such as married as an individual name while B uses it as a relation name. For example, A might have an ontology of interpersonal states, perhaps categorized in some way. Identifying an individual with a relation may seem like a direct violation of first-order conventions, but it is important to understand that this is not necessarily an irreconcileable difference of opinion between A and B. The traditional Tarskian semantics for first-order logic requires that the universe of discourse be a nonempty set of things called 'individuals', but it does not mandate what kind of things must be in this set. In particular, there is no requirement that individuals be in any sense inherently non-relational in nature; in fact, it is quite common to introduce relations into a first-order theory by treating them as arguments to a 'dummy' relation, writing expressions such as Holds(R a b) rather than R(a b). Thus, A's use of a name to denote an individual, and B's use of the same name to denote a relation, need not be considered a difference of opinion: they can both be right, and can both correctly use first-order reasoning. In this case, however, the agent C who wishes to make use of both their sentences is faced with a dilemma, since traditional first-order signature-based syntax is unable to accomodate both points of view at once. SCL relaxes this rigidity and allows 'punning' between relation and individual names, as between relation and function names.

In this case, however, the requirement that C must be able to understand both A and B and still be able to reason appropriately has some more significant consequences. Since A can state equations between individuals and use terms to denote them, as for example in an equation such as

(= married (conjugalRelationBetween Joe Jane))

our 'common logic' requirement implies that C must be able to reason similarly, even when faced with an expression from B in which the name appears in a relation position:

(married Jack Jill)

The resulting expression has a term in a relation position and would be considered syntactically illegal in most traditional syntaxes for first-order logic:

((conjugalRelationBetween Joe Jane) Jack Jill)

However, it has a perfectly clear meaning which can be taken directly from the conventional first-order readings of the sentences used by A and B, viz.: the relational entity which is the value of the term (conjugalRelationBetween Joe Jane), being identical to married, holds true between Jack and Jill.

Similarly, since A may use existential generalization on this individual name, for example to infer

(exists (x) (= x (conjugalRelationBetween Joe Jane)))

then (by virtue of requirement 4a) C must be able to use similar logical inferences on the name, even when it occurs in sentences from B as a relation name, producing sentences even less conventional when viewed as first-order syntax:

(exists (x) (x Jack Jill))

SCL also allows this as legal syntax.

The resulting syntactic freedom allows a wide variety of alternative first-order axiomatic styles to co-exist within a common syntactic framework, with their meanings related by axioms, all expressed in a single uniform language. For example, the idea of marriage can be rendered in a formal ontology as a binary relation between persons, or a more complex relation between persons, legal or ecclesiastical authorities and times; or as as a legal state which 'obtains', or as a class or category of 'eventualities', or as a kind of event. Much the same factual information can be expressed with any of these ontological options, but resulting in highly divergent signatures in the resulting formal sentences. Rather than requiring that one of these be considered to be correct and the others translated into it by some external syntactic transformation, SCL allows them all to co-exist, with the connections between the various ontological frameworks to be expressed in the logic itself. For example, consider the following sentences (all written in the SCL core syntax):

(married Jack Jill)
(married (roleset: (husband Jack) (wife Jill)))
(exists (x)(and (married x) (= Jack (husband x)) (= Jill (wife x)) ))
(= (when (married Jack Jill)) (hour 3 (pm (thursday (week 12 (year 1997))))) )
(= (wife (married 32456)) Jill)
(ConjugalStatus married Jack)
((ConjugalStatus Jack) Jill)

These all express similar propositions about a Jack and Jill's being married, but in widely different ways. The logical name married appears here as a binary relation between people; a unary property of a state of affairs, with associated roles and fillers; a three-place relation between two persons and a time; a function from numbers to individuals; as an individual 'status', and finally, although not named explicitly, as the value of a function from persons to predicates on persons. In SCL all these forms can be used at the same time: the conjunction of the above set of sentences is syntactically correct SCL.

SCL also allows quantification over relations and functions. SCL syntax accepts the use of relation-valued functions, relations applied to other relations, and so on. The resulting syntax is reminiscent of a higher-order logic: but unlike traditional higher-order logic, SCL syntax is completely type-free. It requires no allocation of higher types to functionals such as ConjugalStatus, and imposes no type discipline. HO languages traditionally require more elaborate signatures than first-order languages, in order to prohibit 'circular' constructions involving self-application, such as

(rdfs:Class rdfs:Class)

SCL syntax, in contrast, is obtained from conventional FO syntax by removing signatureal restrictions, so that both SCL atomic sentences and terms can be described uniformly as a term followed by a sequence of argument terms. The above example, which expresses a basic semantic assumption of RDFS [RDFS], is legal (and meaningful) in SCL.

On being first-order

Readers familiar with conventional first-order syntax will recognize that this syntactic freedom is unusual, and may suspect that SCL is a higher-order logic in disguise. However, as has been previously noted on several occasions [ref HILOG][refHenkin], a superficially 'higher-order' syntax can be given a completely first-order semantics, and SCL indeed has a purely first-order semantics, and satisfies all the usual semantic criteria for a first-order language, such as compactness and the Skolem-Lowenheim property.

What makes a language semantically first-order is the way that its names and quantifiers are interpreted. A first-order interpretation has a single homogenous universe of individuals over which all quantifers range; any expression that can be substituted for a quantified variable is required to denote something in this universe. The only logical requirement on this universe is that is a non-empty set. Relations hold between finite sequences of individuals, and all that can be said about an individual is the relations it takes part in: there is no other structure. Individuals are 'logically atomic' in a first-order language: they have no logically relevant structure other than the relationships in which they participate. Expressed mathematically, a first-order intepretation is a purely relational structure. It is exactly this essential simplicity which gives first-order logic much of its power: it achieves great generality by refusing to countenance any restrictions on what kinds of 'thing' it talks about; it requires only that they stand in relationships to one another.

Contrast this with for example higher-order type theory, which requires an interpretation to be sorted into an infinite heirarchy of types, satisfying very strong conditions. In HOL, a quantification over a higher type is required to be understood as ranging over all possible entities of that type; often an infinite set. The goal of higher-order logic is to express logical truths about the domain of all relations over some basic universe, so a higher-order logic supports the use of comprehension principles which guarantee that relations exist. In contrast, SCL only allows the universe to contain relations, but imposes no conditions on what relations exist other than those required to ensure that every name has a referent.

The semantic technique used by SCL is an extension of the 'punning' by which it identifies relation and function symbols. A relation denotes a relational extension (a set of n-tuples over the domain), a function denotes a functional extension (a set of pairs of n-tuples and elements, one for each possible seqence), and an individual name denotes an element in the domain. In order to satisfy requirement 4b, any name must have a unique referent; in order to satisfy requirement 4c, it must be possible to interpret any name as playing any logical role. The obvious construction, then, is to associate the three kinds of denotation into a single entity and interpret the name as denoting it. In an SCL interpretation, a name always denotes an individual, and every individual has an associated relational and functional extension. In order to accomodate the situation in traditional logic where a relation or function name is not required to denote something in the universe of discourse used by the quantifiers, SCL allows the 'external' use of a name - in the core syntax, indicated by placing the name immediately before an open parenthesis at the beginning of an atom or term; in XCL, indicated by a special 'nondenoting' property - to assert that the function or relation mentioned by a term or atom is not in the universe of discourse. Such names are consided to be outside the normal logical operations of the language: they are not used to instantiate variables, do not support xistential generalizations, and are not subject to equality reasoning. This is an option in SCL syntax which corresponds exactly to a traditional first-order syntax. One obtains a traditional first-order transcription of any piece of SCL syntax by inserting a 'dummy' external function or relation name in the front of every other term or atom. We refer to this as a 'non-denoting name', although it is important to emphasise that this is not a lexical category but a syntactic role which is local to the text which uses the name in this special way: the name does, in fact, have a denotation, and may be used conventionally to refer to it in some other SCL text. Thus, one ontology may describe entities which are explicitly denied existence by another ontology, in accordance with requirement 4c. However, we do not allow a single ontology to be confused about its own universe of discourse: legal SCL text cannot have the same name used both as a normal (denoting) name and also as a nondenoting name, and nondenoting names must not be bound by quantifiers.

The third difficulty for logical communication arises because A and B may have been writing with different intended universes of discourse in mind. This kind of situation is quite common. If A is thinking about taxonomic classifications of animals, say, it is often difficult to bear in mind that the complement of the set of mammals may be taken by B, and hence by C, to include fruit, sodium molecules, styles of avant-garde paintings or the names of fictional characters in movies. Conventional first-order logic assumes a universe of discourse - the range of  the quantifiers - but provides no way to refer to it explicitly. On an open network, this amounts to a global assumption that every ontology has the same universe of discourse; an assumption which is embodied in 'global' terms for the universe incorporated into the syntax of existing languages, such as rdf:Resource and owl:Thing. SCL, in contrast, provides methods for an ontology to precisely state conditions on its own universe of discourse and to relate it to those of other ontologies by making statements in SCL itself, providing for a more nuanced expression of inter-ontology relationships.

This is done by distinguishing the universes of discourse assumed by the text in headers and bodies of modules. All names in the header and body are required to have the same meaning, but the quantifiers in the body are understood to range over a subset of the global universe. By convention, this subset is the relation extension of the ontology name itself, which can therefore be used to indicate membership in the local universe of discourse. In contrast, quantifiers in the header are understood to range over a common global universe implicitly shared with the header text of all other ontologies on the Web. Since the global universe of discourse is open-ended, ontologies cannot exclude anything from it, or impose conditions on it all (requirement 4c). This means that SCL text in headers is restricted in certain ways. all quantifiers must be guarded, and the use of nondenoting syntax (which can only be interpreted as excluding denotations from the local context) is illegal in an SCL header. Applications may either ignore it, treat it as an error, or treat it as having the same meaning as the corresponding text in the body of the module.

The use of headers is optional, so the subtleties of the header/body distinction can be ignored by applications which are concerned only to transmit first-order logical text between applications without reference to relationships between local ontology contexts.

SCL Core syntax

Any SCL Core, or core, expression is encoded as a sequence of Unicode characters as defined in ISO/IEC 10646-1. Any character encoding which supports the repertoire of ISO 10646-1 may be used, but UTF-8 (ISO 10646-1:2000 Annex D ) is preferred. Only characters in the US-ASCII subset are reserved for special use in the core itself, so that the language can be encoded as an ASCII text string if required. This document uses the ASCII encoding. Unicode characters outside that range are represented in ASCII text by a character coding sequence of the form \unnnn or \Unnnnnn where n is a hexadecimal digit character. When transforming an ASCII text string to a full-repertoire character encoding, such a sequence should be replaced by the corresponding direct encoding of the character.

The syntax of the core is designed to be readable, while being easy to process mechanically, and so as to allow the most lexical freedom in the character sequences which can be used as logical names. Sets are not used: all syntax is written as sequences of characters. The syntax is defined in terms of disjoint blocks of characters called lexical tokens (in the sense used in ISO/IEC 2382-15). A character stream can be converted into a stream of lexical tokens by a simple process of lexicalisation which checks for a small number of interlexical characters, which indicate the termination of one lexical token and possibly the beginning of the next lexical token. Any consecutive sequence of whitespace characters acts as a separator between lexical tokens. Certain characters are reserved for special use as the first character in a lexical item. The double-quote(U+0022) character is used to start end names which contain interlexical characters, the single-quote (apostrophe U+002C) character is used to start and end quoted strings, which are also lexical items which may contain interlexical characters, and the equality sign must be a single lexical item when it is the first character of an item.

The backslash \ (reverse solidus U+005C) character is reserved for special use. Followed by the letter u or U and a four- or six-digit hexadecimal code respectively, it is used to transcribe non-ASCII Unicode characters in an ASCII character stream, as explained above. Any string of this form plays the same SCL syntactic role in an ASCII string rendering as a single ordinary character. The combination \' (U+005C, U+002C) is used to encode a single quote inside an SCL quoted string, and similarly the combination \" (U+005C, U+0022) indicates a double quote inside a double-quoted enclosed name string.

The following syntax is written using Extended Backus-Naur Form (EBNF), as specified by International Standard ISO/IEC 14977. Literal chararacters are 'quoted', sequences of items are separated by commas, | indicates alternatives, { } indicates any sequence of expressions, - indicates an exception, [ ] indicates an optional item, and parentheses ( ) are used as grouping characters. Productions are terminated with ;.

The syntax is writen to apply to ASCII encodings. It also applies to full Unicode character encodings, with the change noted below to the category nonascii.

The syntax is presented here in two parts. The first deals with parsing character streams into lexical items: the second is the logical syntax of SCL Core, written assuming that lexical items have been isolated from one another by a lexical analyser. This way of presenting the syntax allows the logical form to be stated without complications arising from whitespace handling.

June 6. This syntax is a minimal tweak to the previous BNF in order to allow the nondenoting name construction. However it is not entirely satisfactory, as we ought to allow this option in the role-set and restricted-qantifier syntax as well. Also the lexicalization hack Ive used is not entirely right as for example foo(baz) is now illegal: it needs a whitespace after the nondenotingopen in order to be properly lexicalized: foo( baz). The fact is that by using this foo( versus (foo trick, the BNF really needs to distinguish left- from right- hand lexical breaking constructions, so should be rewritten rather extensively and given full whitespace handling. Sigh.

1. Lexical syntax

white = space U+0020 | tab U+0009 | line U+0010 | page U+0012 | return U+0013 ;

Single quote (apostrophe) is used to delimit quoted strings, and double quote to delimit enclosed names which obey special lexicalization rules. Quoted strings and enclosed names are the only SCL lexical items which can contain whitespace and parentheses. Parentheses outside quoted strings and enclosed names are self-delimiting; they are considered to be lexical tokens in their own right. Parentheses are the primary grouping device in SCL core syntax.

A name in SCL can be any sequence of unicode characters. If the name contains whitespace or parentheses, or begins with a three dots or a single or double quote character, or is identical to any SCL reserved word, then it must be rendered in this core syntax as an enclosed name. Other names can be rendered directly as name sequences. If preferred, all names in the core syntax can be rendered as enclosed names. A name sequence and the same string enclosed within name quotes are considered to be the same name. Enclosed name syntax added following suggestion by Tanel, allowing any present or future URI syntax to be used as a name.June 12

open = '(' ;
close = ')' ;
stringquote = ''' ;
namequote = '"' ;

char is all the remaining ASCII non-control characters, which can all be used to form lexical tokens (with some restrictions based on the first character of the lexical token). This includes all the alphanumeric characters.Note that double quote has now been removed from char.June 12

char = digit | '~' | '!' | '#' | '$' | '%' | '^' | '&' | '*' | '_' | '+' | '{' | '}' | '|' | ':' | '<' | '>' | '?' | '`' | '-' | '=' | '[' | ']' | '\' | ';'| ',' | '.' | '/' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' ;

digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hexa = digit | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' ;

Certain character sequences are used to indicate the presence of a single character.

nonascii
is the set of characters or character sequences which indicate a Unicode character outside the ASCII range. Note. For input using a full Unicode character encoding, this production should be ignored and nonascii should be understood instead to be the set of all non-control characters of Unicode outside the ASCII range which are supported by the character encoding. The use of the \uxxxx and \Uxxxxxx sequences in text encoded using a full Unicode character repertoire is deprecated.

innerstringquote is used to indicate the presence of a single-quote character inside a quoted string. A quoted string can contain any character, including whitespace; however, a single-quote character can occur inside a quoted string only as part of an innerstringquote, i.e. when immediately preceded by a backslash character. The occurrence of a single-quote character in the character stream of a quoted string marks the end of the quoted string lexical token unless it is immediately preceded by a backslash character.

Inside enclosed name strings, double quotes are treated exactly similarly. Innernamequote is used to indicate the presence of a double-quote character inside an enclosed name.

nonascii = '\u' , hexa, hexa, hexa, hexa | '\U' , hexa, hexa, hexa, hexa, hexa, hexa ;
innerstringquote = '\'' ;
innernamequote = '\"' ;

numeral = digit , { digit } ;

Sequence variables are a distinctive syntactic form with a special meaning in SCL. Note that a bare ellipsis '...' is a sequence variable.

seqvar = '...' , { char } ;

Single quotes are delimiters for quoted strings; double quotes for enclosed names. An enclosed name is simply a name which may contain 'illegal' characters. A quoted string is an SCL special form which denotes the text string inside the quotes, except that the combination \' indicates the presence of a single quote mark in the denoted string; and in an ASCII character encoding, the presence of a nonascii sequence indicates a Unicode character at the position in the denoted string. Any occurrence of the backslash character \ not immediately followed by the character ' or u or U simply indicates the backslash character itself. The combination \u or \U may be the initial part of a nonascii sequence which indicates a Unicode character in the denoted string, but if not part of such a sequence then it simply indicates itself. With these conventions, a quoted string denotes the string enclosed in the quote marks. For example, the quoted string 'a\(b\'c' denotes the string a\(b'c, and '\'a" (b\\\') c \\'' denotes the string: 'a" (b\\') c \'. Similar conventions apply to enclosed names with double-quote replacing single-quote.

Quoted strings and enclosed names require a different lexicalization algorithm than other parts of SCL core text, since parentheses and whitespace do not break a quoted text stream into lexical tokens.

When SCL Core text is enclosed inside markup which uses character escaping conventions, the SCL conventions here described are understood to apply to the SCL text described or indicated by the conventions in use, which should be applied first. Thus for example the content of the XML element: <scl-text>&apos;a\&apos;b&lt;c&apos</scl-text> is the SCL core syntax quoted string 'a\'b<c' which denotes the five-character text string a'b<c . Considered as bare SCL text, however, &apos;a\&apos;b&lt;c&apos would simply be a rather long SCL name.

quotedstring = stringquote, { white | open | close | char | nonascii | namequote | innerstringquote }, stringquote ;
enclosedname = namequote, { white | open | close | char | nonascii | stringquote | innernamequote }, namequote ;

reservedelement consists of the lexical tokens which are used to indicate the syntactic structure of SCL. These may not be used as names in SCL text.

reservedelement = '=' | 'and' | 'or' | 'iff' | 'implies' | 'forall' | 'exists' | 'not' | 'roleset:' | 'scl:imports' | 'scl:header' | 'scl:module' | 'scl:comment' ;

A namesequence is a lexical token which does not start with any of the special characters. Note that namesequences may not contain whitespace or parentheses, and may not start with a quote mark although they may contain it; and that numerals are syntactically similar to namesequences but are distinguished for semantic reasons.

namesequence = ( char , { char | stringquote } ) - ( reservedelement | numeral | seqvar ) ;

A name immediately followed by an open parenthesis, with no intermediate whitespace, signals a non-denoting function or relation name applied to a list of arguments. Since the SCL syntax also allows an open parenthesis to initiate an application, it is necessary to recognize this combination as a distinct lexical form in order to avoid ambiguity. This is a new feature of the syntax. June 12

nondenotingnameopen = ( namesequence | enclosedname ), open

lexbreak = open | close | white , { white } ;
nonlexbreak = numeral | quotedstring | seqvar | reservedelement | namesequence | enclosedname | nondenotingnameopen ;
lexicaltoken = open | close | nonlexbreak ;
charstream = { white }, { lexicaltoken, lexbreak } ;

Lexical tokens are divided into ten mutually disjoint categories: parentheses, the equality sign, numerals, quoted strings (which begin with '''), seqvars (which begin with '...'), reserved elements, enclosed names (which begin with '"'), namesequences, and the special combination of a name and an open parenthesis. Lexical tokens other than parentheses are separated by whitespace or parentheses. Thus, whitespace adjacent to a parenthesis is optional (except where it would break a nondenotingnameopen ). The task of a lexical analyser is to parse a character stream as a charstream, and to deliver the lexical tokens it finds, in order, to the next stage of parsing.

Note that occurrences of quote marks, equality or ellipses inside a name sequence lexical token have no particular lexical significance. However, a name quote, string quote or ellipsis initiating any lexical token requires that the token be parsed as a enclosedname, quoted string.or a sequence variable respectively. For example,
'
'a'bc '
is an error (cannot be parsed as a charstream) since the first single-quote symbol immediately follows a white, so initiates a quotedstring which is required to be terminated by the second stringquote, but there is no lexbreak to separate this from the next token. In contrast, the same string prefixed by a non-quote character, e.g., ' a'a'bc ' is a legal name sequence. Any char string may occur as an internal part of a name sequence or sequence variable; for example, 'forallIknow ' , 'a=b' , 'a...b' and '==>'look_here'<==' are all name sequences, '"for all I know" ', '"scl:imports"' and '"\"(')" ' are all legal enclosed names, and '...a', '...a...', '.....' and '...' are all legal seqvars. Identity of lexical tokens can be checked by a simple character-by-character string match.

A name is any lexical token which is understood to denote something in the domain of discourse.  The expression syntax, below, uses this category. It is not required for lexicalization.

name = namesequence | numeral | quotedstring | enclosedname ;

2. Core expression syntax

This part of the syntax is written so as to apply to a sequence of SCL lexical tokens rather than a character stream, so it ignores whitespace handling. A full EBNF syntax for SCL Core suitable for generating parsers of character-level data is given in appendix @@@..

Both terms and atomic sentences use the notion of a sequence of terms representing a vector of arguments to a function or relation.

termseq = { term } , [ seqvar ] ;

Names count as terms, and a complex (application) term consists of a function with a vector of arguments. The function may be denoted by a term inside the open parentheses, or indicated by a non-denoting name immediately preceding the open parenthesis.This is a new feature of the syntax. June 4. Terms may also have an associated comment enclosed inside matching quotes.

term = name | ( (open, term ) | nondenotingnameopen ), termseq, close ) | ( open, 'scl:comment', ( quotedstring | enclosedname ) , term, close ) ;

Equations are distinguished as a special category because of their special semantic role and special handling by many applications. Note that the equality sign is not a term.This is a change from a previous version of the syntax.

equation = open, '=', term, term, close

Atomic sentences are similar in structure to terms; but in addition the arguments to an atomic sentence may be represented using role-pairs consisting of a role-name and a term. Equations are considered to be atomic sentences.

atomsent = equation | ( (open, term ) | nondenotingnameopen ), ( termseq | ( open, term, open, 'roleset:' , { open, name, term, close } , close ) ), close ) ;

Boolean sentences require implication and iff to be binary, but allow and and or to have any number of arguments (including zero).

boolsent = ( open, ('and' | 'or') , { sentence }, close ) | ( open, ('implies' | 'iff') , sentence , sentence, close ) | ( open, 'not' , sentence, close ;

Quantifiers may bind any number of variables and may be guarded; and bound variables may be restricted to a named category.
The guarded-quantifier syntax - the optional name immediately after the quantifier - is a new option not in the previous version. May 12.

quantsent = open, ('forall' | 'exists') , [ name ] , open, { name | ( open, name, term, close )} , close, sentence, close ;

Like terms, sentences may have comments attached. Note that comments can be attached to sentences at any level, and to commented sentences.

sentence = atomsent | boolsent | quantsent | ( open, 'scl:comment', ( quotedstring | enclosedname ) , sentence , close ) ;

SCL text is a sequence of phrases, each of which is either a 'top-level' sentence, an importation, or a bare comment. We distinguish sentence from phrase because phrases are considered to bind sequence variables: there is no syntactic distinction. The name argument of an importation will usually be a URI.

phrase = sentence | open,'scl:imports', name , close | open, 'scl:comment', ( quotedstring | enclosedname ) , close ;
scltext = { phrase } ;

An SCL module is a named piece of text with an optional header containing SCL text which is intended to convey 'meta-information'. Quantifiers in headers are interpreted slightly differently than quantifiers in the body text. Details are given later in the document.

moduledefinition = open, 'scl:module' , name , [open, 'scl:header' , scltext , close ] , scltext , close ;

Specialname is a category of lexical tokens to which SCL attaches particular semantic meanings. Although they are not required to be distinguished for parsing purposes, the list of special forms in the SCL core syntax is appended here for convenience. Other SCL dialects may extend this category to include other name forms reserved for special use, and headers may declare other classes of special names. If a special name is used in any SCL text then its meaning cannot be over-ridden by any other interpretation of that text, so inappropriate uses of special names may produce inconsistencies. This collection of special names may be modified in the future.

specialname = 'scl:same' | 'scl:different' | 'scl:arity' | 'scl:Ind' | 'scl:Rel' | 'scl:Fun' | 'scl:Integer' | 'scl:String' | numeral | quotedstring;

As noted earlier, certain combinations of SCL text may be considered 'illegal' even though they are sanctioned by this EBNF syntax. Any text in which a name occurs both inside a nondenotingnameopen and also as a normal name is uninterpretable, and may be considered a syntactic error. Any module in which the header text contains any nondenotingnameopen is uninterpretable and may be considered to be illegal (or, may be re-parsed by replacing every such occurence of a nondenotingnameopen '<name>(' by the sequence '(<name>'.)

Discussion

Sequences and sequence variables

Finite sequences play a central role in SCL syntax and semantics. Atomic sentences consist of an application of one term, denoting a relation, to a finite sequence of other terms. Such argument sequences may be empty, but they must be present in the syntax, as an application of a relation term to an empty sequence does not have the same meaning as the relation term alone.

SCL also allows relations to be take any number of arguments: such a relation is called variadic. (Married is an example of a variadic relation.) SCL follows KIF also in providing a special syntactic form called sequence variables to allow generic assertions to be made which apply to arbitrary argument sequences. Sequence variables are indicated in the SCL core syntax by an ellipsis ... optionally followed by a distinguishing character string. If an argument sequence ends with a sequence variable, then the presence of the variable indicates that the sequence may be arbitrarily extended, and the sentence is required to hold for all the possible extensions.

An SCL axiom using sequence variables can be viewed as a first-order axiom scheme, standing in for a countably infinite set of first-order instances.

Sequence variables are quite a powerful device: in effect, they allow a finite sentence to assert infinitely many quantifications all at the same time. Indeed, if arbitrary quantification over sequence variables is permitted, then the resulting language is a subset of the infinitary logic Lw1w, of a higher expressiveness than first-order logic, and for example is not compact [refHayMenzel]. The restrictions on sequence variables in SCL have been chosen to restrict the language to first-order while providing much of the practical utility of sequence variables for writing 'recursive' axioms to define lists and other recursively defined constructions. In SCL, sequence variables are allowed to occur only in the last position in an argument sequence, and are not bound by quantifiers: this keeps the language first-order and makes term-unification decideable [refKutsia]. Sequence variables can be replaced by a more conventional notation, at some considerable cost in elegance, by introducing argument lists explicitly.

Since sequence variables are implicitly universally quantified, any top-level expression in which a sequence variable occurs has the same semantic import as the countably infinite conjunction of all the expressions obtained by replacing the sequence variable by a finite sequence of variables, all universally quantified at the top (phrase) level. For example, the phrase

(R ...)

has the same meaning as the infinite conjunction:

(and
(R)
(forall (x)(R x))
(forall (x y)(R x y))
(forall (x y z)(R x y z))
... )

This includes the 'trivial' case for zero arguments:
(forall ( ) (R))
where the vacuous quantifier has been omitted. To exclude this case, write the top-level atomic sentence with one quantified variable and a sequence variable:
(forall (x)(R x ...))

Sequence variables can be used to write 'recursive' axiom schemes. Rather than attempt a fully general description of this technique, we will illustrate it with a canonical example. Consider the following sentences:

(= nil (list))
(forall (x z) (iff (= x (list ...) (= (cons z x) (list z ...)))

which have the same meaning as the infinite set of sentences:

(= nil (list))
(forall (x z) (iff (= x (list) (= (cons z x) (list z))))
(forall (x z x1) (iff (= x (list x1)) (= (cons z x) (list z x1)) ))
(forall (x z x1 x2) (iff (= x (list x1 x2)) (= (cons z x) (list z x1 x2)) ))
(forall (x z x1 x2 x3) (iff (= x (list x1 x2 x3)) (= (cons z x) (list z x1 x2 x3)) ))
...
(forall (x z x1 x2
... xn) (iff (= x (list x1 ... xn) (= (cons z x) (list z x1 ... xn)) ))
... )

which clearly entails for any finite sequence t1 ... tn of terms, that

(= (cons t1 (cons t2 (cons ... (cons tn nil) ... ))) (list t1 ... tn) )

and thus provides the familiar LISP technique of writing a sequence of items as a cons-nil list, used more recently as the RDF 'collection' vocabulary [refRDF]. The tail-recursion represented by the axiom is re-expressed here as induction on the length of a proof.

SCL sequence variables do not actually provide a true recursive definition, as the 'schema' can be used only as a source of assertions and cannot itself be derived by using an inductive argument. The more powerful language obtained by allowing sequence variables to be bound by sequence quantifiers does provide true recursive powers, and is therefore in a higher expressive class than any first-order language. For example, it is not compact. A more tractable language can probably be obtained by adding an explicit 'least fixed point' operator, as in the mu-calculus [refMuCalc], but this idea goes beyond the scope of this document.

This general technique of imitating tail-recursive definitions by writing them as implications using sequence variables was pioneered by KIF [refKIF] and is one of the primary uses for sequence variables in SCL. What this construction shows, however, is that the use of sequence variables can be replaced, when required, by the use of explicit cons-nil lists as arguments, represented in SCL by nested terms. This convention is widely used in logic programming applications, and in RDF and OWL. The general rule can itself be stated in SCL by using a sequence variable:

(forall (x) (iff (x ...) (x (list ...)) ))

This thus provides a general technique for replacing the use of sequence variables by explicit quantification over argument lists. The costs of this translation are a considerable reduction in syntactic clarity and readability, and the need to allow lists as entities in the universe of discourse. The advantage is the ability of rendering arbitrary argument sequences using only a small number of primitives with a fixed, low arity. SCL can use either convention or both together, and can describe the relationship between them.

The above discussion uses a conventional LISP-like syntax. Using the RDF collection syntax, and using the relation/existential rather than function/equation axiomatic style more suitable to RDF, the example would look like this:

(= rdf:nil (list))
(forall (x z)(iff (= x (list ...)(exists (y) (and (rdf:first y z)(rdf:rest y x)(= y (list z ...))))

and the expansion would produce conjunctions describing the lists in terms of first/rest triples.

Role-set syntax, eventualities and selections

Atomic sentences which use the roleset: construction, for example

(Married (roleset: (wife Jill) (husband Jack)))

are considered to take a set of arguments in the form of role-value pairs. In this role-set syntax, the connection between the relation and its arguments is indicated by the role labels rather than by position in an argument sequence, which avoids the necessity of remembering the argument order when writing sentences, and provides some insurance against communication errors arising from sentences using different argument-order conventions.

In SCL, the roles are considered to be binary relations between an entity to which the main predicate applies and the argument, so for example the above atom is syntactic sugar for (is logically equivalent to) the existential conjunction:

(exists (x) (and (Married x) (wife x Jill) (husband x Jack) ))

where x indicates an entity which has been variously called an eventuality, fact, situation, event, state or trope, and might be rendered in English as "Jack and Jill's being married" or "the state of marriage between Jack and Jill". For SCL purposes however it is possible to think of x more prosaically as being a vector or sequence of arguments, and the role-names as selectors on the elements of this sequence.

This syntax allows arguments to be provided in a piecemeal fashion, and allows arguments to be omitted: such arguments are implicitly existentially quantified. In fact, any such atom entails a similar atom with some of the role-pairs omitted. Notice the difference in meaning between Married, (Married) and (Married (roleset:)). The first is a term which denotes the relational entity and makes no assertion. The second is an atomic sentence which asserts positively that Married is true of the empty sequence - which is probably false in the actual world - while the third, also an atomic sentence, merely asserts that Married is true of something, to which other things may be related by unknown roles. In effect, it asserts that some marriage exists, and so is true in the actual world.

This role-set convention does not itself specify any relationship between the same fact expressed as a role set and by using an argument sequence. Since SCL allows variadic relations, it is possible to use the same relation in both ways, and they can be related axiomatically. For example,

(forall (x y) (iff (Married x y) (exists (z) (and (Married z) (= x (wife z)) (= y (husband z))) ))

indicates the convention by which a wife must be the first argument of a marriage. Again, SCL provides for both kinds of syntax, separately or together, and can describe a relationship between them using SCL axioms.

 

Modules, headers, naming, importing and domains of discourse

SCL text is any set or sequence of SCL top-level sentences, called SCL phrases.

The SCL core syntax uses sequences throughout, for convenience in defining iterative syntactic operations, but no particular significance attaches to the ordering of the phrases in a text, and other SCL dialects may adopt a set-based syntactic convention.

An SCL module, or simply a module, is some SCL text together with an optional header consisting of SCL text, and which has a global name in the form of a Uniform Resource Identifier [refURI]. Headers have a special semantic role and are treated in a distinct way in the semantics: they provide a general method for encoding syntactic and other 'background' information about the module. A special header vocabulary is provided which is sufficient to describe many standard signatures and syntactic conventions; but any SCL text may be included or imported into a header.

The terminology of "text" and "module" is intended to be a neutral terminology used to refer to any coherent piece of SCL which can be published, transmitted or asserted. It is not intended to convey any particular categorization or to indicate an intended use or philosophical stance. Other terms in widespread use in various communities include "ontology", "knowledge base", "axiom set" , "set of sentences", "metadata", "data model" and "model", as well as such particular usages as "RDF graph" and "document".

The core also provides an importing syntax, which indicates that a named module is being asserted along with another module. The intended meanings of these constructions are fairly obvious, but the formal semantic statement requires that we assume the existence of a global module-naming convention, and that module names refer to entities in formal interpretations. This can all be stated abstractly, but is more directly conveyed by the observation that SCL uses the emerging semantic web conventions, as used in RDF and OWL, which are based on the W3C URI naming protocols [ref RFC2396].

This "top-level" header SCL syntax also provides for modules to declare their intended domain of discourse, to relate them explicitly to other domains of discourse, and to express intended restrictions on the syntactic roles of symbols. In order to make this possible, quantifiers in the header must be allowed to range over a wider universe of discourse than those in the body text of the module. For example, it is impossible in a conventional logical syntax to say that a name does not denote anything in the domain of discourse, since the very act of using the name places the referent in the domain. [Wittgenstein, tractatus final words.] In order to adequately relate two different assumptions about the universe of discourse, however, it may be essential to reason about things being included and excluded from a domain of discourse. SCL adopts a two-level convention to handle this: the universe of discourse of text in the body of a module is assumed to be the relational extension of the ontology name; but text in the header is interpreted as referring to a global universe of discourse shared with all other modules. The intended purpose of embodying this distinction is to allow headers to relate universes of discourse explicitly to one another, providing a more structured way to relate modules than simple importation. This also provides ways to relate vocabularies used in one module to their definition in another module, and can be used for example to declare the datatypes used by a module explicitly, and to express certain kinds of nonmonotonic constructions such as closed-world and unique-name properties of ontologies or databases.

SCL semantics

The semantics of SCL is defined conventionally in terms of a satisfaction relation between SCL text and structures called interpretations. Since SCL core syntax contains a number of 'syntactic sugar' constructions, we give the semantic conditions for a basic subset called the SCL kernel, and then translate the remaining syntactic constructions into the kernel.

A vocabulary a set of names, partitioned into two subsets, referred to as the denoting and nondenoting names of the vocabulary. If the subset of nondenoting names is empty then we call the vocabulary global. The vocabulary of an SCL text is the set of names occurring in the text, with a name occurring in a nondenotingnameopen considered to be a nondenoting name, and all other names considered to be denoting names. The vocabulary of an SCL module is the union of the vocabularies of all text in the module.

An interpretation I of a vocabulary V is a set UI , the universe, with a distinguished nonempty subset DI, the domain of discourse, or simply domain, and three mappings: relI from UI to subsets of DI*, funI from UI to functions DI*->DI, and intI from V to UI, such that intI(n) is in DI iff n is a denoting name in V. We will omit the subscripts and identify intI with I when no confusion arises.

Intuitively, D is the domain of discourse containing all the individual things the interpretation is 'about' and over which the quantifiers range. U is a potentially larger set of things which might also contain, for example, relations and functions which are not in the universe of discourse. One might describe these respectively as the local and global universes: as discussed, we interpret quantifiers in the header of any module to range over U rather than D. In accordance with requirements 4a, 4b and 4c, all names are interpreted in the same basic way, whether or not they are understood to denote something in the domain of discourse; this is why there is only a single interpretation mapping intI applying to all names regardless of their syntactic role.

The value of any kernel expression in the interpretation I is then given by following the rules in the semantic table. These are the conventional rules for interpreting first-order logical expressions containing numerals and quoted strings. A name map on a set of names is any mapping from the set into D; a sequence mapping on a set of sequence variables is any mapping from those variables to the sedt of finite sequences of elements of D, ie. to D*. If A is a name or sequence map on S, IA is the interpretation which is like A on names or sequence variables in S, but otherwise is like I: formally, UIA = UI, DIA = DI, relIA = relI, funIA = funI, and intIA(v) = A(v) when v is in S, otherwise intIA(v)= intI(v).

When A is a sequence map, this is an extension of the notion of interpretation. We use this to provide an interpretation of phrases containing sequence variables. The rest of the truth-conditions map transparently to this extension.

This notation associates to the left, so that IAB = (IA)B, meaning that if x is in the domain of B then intIAB(x)=B(x). In addition, we use the following notation: if s= <s1,...,sn> and t=<t1,....,tm> are finite sequences, then s;t is the concatenated sequence <s1,...,sn,t1,...,tm>

SCL interpretation I of an expression E
if E is a of the form and I(E) =
name numeral n intI(n) = the decimal value of n
quoted string 'string' intI('string') = string with every innerquote substring replaced by a single quote character '''
name intI(name)
term sequence (t1 ... tn) <I(t1),..., I(tn)>
(t1 ... tn ...v) <I(t1),..., I(tn)> ; I(...v)
term (f seq) or f(seq) x, where funI(f) contains <x, I(seq)>
(scl:comment comment term) I(term)
atom (= t1 t2) true if I(t1)= I(t2), otherwise false.
(r seq) or r(seq) true if relI(r) contains I(seq), otherwise false
boolean sentence (not s) true if I(s) = false, otherwise false
(and s1 ... sn) true if I(s1) = ... = I(sn) = true, otherwise false
(or s1 ... sn) false if I(s1) = ... = I(sn) = false, otherwise true
(implies s1 s2) false if I(s1) = true and I(s2) = false, otherwise true
(iff s1 s2) true if I(s1) = I(s2), otherwise false
quantified sentence (forall (var) body) if for every name map B on {var}, IB(body) = true, then true;
otherwise, false.
(exists (var) body) if for some name map B on {var}, IB(body) = true, then true;
otherwise, false.
sentence (scl:comment comment sentence) I(sentence)
phrase (scl:comment comment )

true

sentence

if for every sequence map A on the seqvars in sentence, IA(sentence) = true, then true;
otherwise, false

(scl:imports name) false if I(I(name)) = false, otherwise true
scl text s1 ... sn

true if I(s1) = ... = I(sn) = true, otherwise false

scl module
(simple text case)
(scl:module name body)

true iff I(name) = (the SCL text in body).

Apart from the handling of sequence variables and the built-in numeral and string conventions, this is a conventional first-order interpretation structure.

For a conventional syntax with no punning between relation and function names, all non-denoting names are often described as mapping directly to relation or function extensions. This is the special case of the above where rel and fun are the identity function on elements of U outside D.

The semantics for role-set atomic syntax is given by the translation:

(rel (roleset: (t1 a1) (t2 a2) ... (tn an)))  --->  (exists (x) (and (rel x)(t1 x a1)(t2 x a2)...(tn x an) ))

and for restricted and guarded quantification by the translations:

(forall ((type x) ... ) body )  --->    (forall (x ... )(implies (type x) body )))
(exists ((type x) ... ) body )  --->    (exists (x ... )(and (type x) body )))

(forall guard (x ... z) body )  --->  (forall (x ... z) (implies (guard x ... z u ... v) body))
(exists guard (x ... z) body )  --->  (exists (x ... z) (and (guard x ... z u ... v) body))

where u ... v are all the names which occur free in the body.

Notice that comments surrounding any piece of SCL are semantically transparent, i.e. they do not affect the meaning of the expressions they enclose; and top-level comments are considered to be trivially true.

The semantics given here for scl:imports applies only to importation of a labelled piece of SCL text into another SCL text. Notice that this simple case requires SCL texts to be individuals, but not that they be in the domain of discourse.

Importation of a module with a header is more subtle and is described in section @@@ below.

 

Mapping SCL text to other first-order notations

It may be found convenient to render SCL text in a form more suitable for input to applications designed to accept a more restrictive first-order syntax. This can be done in several stages, outlined here.

First, sequence variables, used to express general facts about variadic relations or functions, can be replaced by explicit recursions, if the notation allows this, or by using lists as described earlier, rendered in any of several suitable notations. The variadicity is thereby removed, since all such predications are unary, applied to a list. (If it is necessary to distinguish this use of lists from other uses in an ontology, the entire translation can be performed using a special 'argument-list' vocabulary.) This part of the translation may require the application to treat these lists in a special way which respects their intended meaning: it is not 'transparent'.

Second, all atoms and terms with a denoting relation or function term, with the core syntactic form (operator arg1 ... argn), can be prefixed with a 'dummy' nondenoting relation or function name. The recommended such names are holds-n for atomic sentences (meaning: the relation holds true of the arguments) and app-n for terms (meaning: the result of applying the function to the arguments). The resulting core syntactic form uses the nondenotingnameopen syntax: holds-n(operator arg1 ... argn) for an atom and app-n(operator arg1 ... argn) for a term. We will refer to this as the 'holds-app' translation.

This translation is transparent: any interpretation of the original maps to an interpretation of the translation, over the same domain of discourse, which satisfies the translation just when the first interpretation satisfies the original SCL. In fact one can view SCL syntax as being the holds-app translation, with a perfectly conventional first-order semantics, but with the 'dummy' holds-n and app-n relation and function names simply erased. Note that the holds-app translation leaves unchanged all SCL terms and atoms which already use a nondenoting function or relation name, and that the result of a holds-app translation from correct SCL text never has a name occurring both as an individual name and as a relation or function name. The result of a holds-app translation is, of course, itself legal SCL text.

The mapping between intepretations is fairly obvious: an SCL interpretation I of normal SCL text maps to an interpretation J of the corresponding hold-app translation just when they have the same domain, relJ and funJ are the identity mapping outside the domain of discourse, <x y1 ... yn> is in J(holds-n) iff <y1 ... yn> is in relI(x), and <<x y1 ... yn> z> is in J(app-n) iff  <<y1 ... yn> z> is in funI(x).

Function and relation name punning, and the use of a single name for functions and relations with different numbers of arguments, can be eliminated by some systematic renaming convention, such as by postfixing -fun and -rel to the function and relation cases, and -n, for some suitable numeral n, to indicate the number of arguments. This superficial transformation is also transparent; in fact it does not change the meaning of any SCL text, and is in any case not required by many first-order engines.

Finally, some notational conventions require a lexical distinction between variables and names. Since SCL places no lexical restrictions on the form of names, one can achieve this simply by using names which satisfy the lexical restrictions in all cases where a name is bound by a quantifier, for example by prefixing all bound names with a question mark to conform to KIF conventions. Note however that many such notations presume that free - unbound - variables are treated as universally quantified at the phrase level, similarly to how sequence variables are treated in SCL, but there is no such convention in SCL core syntax itself, so to use a variable-prefixed name in a position not bound by a quantifier in SCL does not automatically carry this meaning.

June 6. I wonder if we should revert to the KIF convention of explicit lexical variable marking? It would make life easier for applications, be only a small tweak to the core BNF, and would hardly change the MT at all. We need the top-level-phrase distinction in any case for the seqvars. Also the XCL could allow arbitrary variable lexicalization. After all, the MT effectively makes a name have a diffrent meaning once it gets bound, so the current syntax is suspiciously close to being dishonest on requirement 4b.

Give some examples of the holds/app translation.

Semantics of modules and headers

The general philosophy of SCL modules and headers is that the text in the body of a module is intended to be an ontology; that is, to describe a 'local world' of some kind; whereas the SCL text in the header of a module is intended to be interpreted globally, with its quantifiers applying across the entire universe of the whole network. This primary purpose of the header text is to relate the local domain of discourse, assumed in the body text, to other domains of discourse used by other modules on the network. In header text, domains of discourse are represented by names which denote the ontologies themselves, and which have as their relational extension all tuples of elements of the domain of discourse presumed by that ontology; and while all names in headers are interpreted in the same way as the body text, quantifiers in the header are understood to range over a wider, global universe of discourse.

Since header text is supposed to be asserted in the context of the entire Web, not all assertion forms are appropriate. For example, an ontology about fish might contain in its body text the assertion

(forall (x) (Fish x))

to indicate that everything it is discussing is a fish. To make such an assertion in a header, however, would be to claim that everything in the entire universe was a fish, which is absurd; but more to the present point, which would violate requirement 4c. In fact, following requirement 4c, any unconstrained universal quantifier in a header text is inappropriate, since whatever is said by any ontology about all things can be rationally denied by another ontology. We therefore impose some restrictions on the form of sentences used in any header text: all quantifiers must be guarded, and nondenoting name constructions are prohibited. That is, only denoting names may occur in header text. These restrictions ensure that header text does not violate the requirements.

It might be argued that there are some necessary truths which must be accepted by any ontology and could therefore be safely asserted in a global context. Yaddah yaddah @@@

 

 


June 6. The following material is from earlier versions has not yet been incorporated into the body of the text. It will require wordsmithing and tweaking to bring it into line.

Special name interpretations

In addition to the basic semantic conditions, interpretations are required to satisfy constraints on the interpretations of the SCL special names, summarized in the following table.

Special name satisfies condition
scl:same relI(scl:same) = {<x, ..., x>: x in UI }
scl:different relI(scl:different) = {<x1, ..., xn>: x1,...,xn all in UI and xi =/= xj for 1<= i,j <=n}
scl:Integer relI(scl:Integer) = {<n>: n an non-negative integer in UI}
scl:String relI(scl:String) = {<u>: u a Unicode character string in UI}
scl:Thing relI(scl:Thing) ={<x1,..., xn> : x1,...,xn all in UI }This will probably not be needed now.
scl:Rel relI(scl:Rel) = {<x1, ..., xn>: x1,..., xn all in RI }
scl:Fun relI(scl:Fun) = {<x1, ..., xn>: x1,..., xn all in FI }
scl:arity relI(scl:arity) ={<m, x1, ..., xn >: x1,...,xn all in RI and relationI(x1) ,..., relationI(xn) all contain an s with length(s)=m }

All SCL dialects and extensions are required to satisfy these general conditions. Other SCL dialects and extensions may also impose semantic conditions on other categories of special names; some examples are given later.

scl:same and scl:different are convenient abbreviations for conjunctions of n equations and n(n-1) negated equations respectively, where n is the number of arguments.

Note that scl:different is identical in meaning to the negation of scl:same only in the case where both are applied to precisely two arguments.

The conventions for strings and numerals can be viewed as special 'built-in' datatypes, following the datatype convention described below: the other items are special vocabulary primarily intended for use in module headers, also described later.

 

Special Name Theories

Headers may also be used to declare special names of various forms. By convention, a header may be specified to be (equivalent to) a special name theory which consists of an recursively enumerable set of ground axioms specifying the positive and negative cases of some lexical categorization, typically the lexical space of a datatype represented by a predicate. Such headers can be imported into other headers, serving there as a form of declaration that the lexical forms are considered to be special names in the module body. In particular, SCL recognizes the built-in datatypes in XML Schema Part 2: Datatypes [XML-SCHEMA2], listed in the following table:

XSD datatypes
xsd:string, xsd:boolean, xsd:decimal, xsd:float, xsd:double, xsd:dateTime, xsd:time, xsd:date, xsd:gYearMonth, xsd:gYear, xsd:gMonthDay, xsd:gDay, xsd:gMonth, xsd:hexBinary, xsd:base64Binary, xsd:anyURI, xsd:normalizedString, xsd:token, xsd:language, xsd:NMTOKEN, xsd:Name, xsd:NCName, xsd:integer, xsd:nonPositiveInteger, xsd:negativeInteger, xsd:long, xsd:int, xsd:short, xsd:byte, xsd:nonNegativeInteger, xsd:unsignedLong, xsd:unsignedInt, xsd:unsignedShort, xsd:unsignedByte, xsd:positiveInteger

For each such name xsd:sss, the corresponding special name header consists of the axioms

(xsd:sss (xsd:sss name))

for each name which is a legal lexical form for xsd:sss;

(not (xsd:sss (xsd:sss name)))

for each name which is not a legal lexical form for xsd:sss;

(= (xsd:sss name1) (xsd:sss name2))

for each pair of names which are legal lexical forms for xsd:sss and map to the same item in the value space of xsd:sss. The datatype name serves here both as a function (mapping the lexical form to the value) and as a predicate on the value space: this follows the convention used in RDF [refRDF]. In addition, we use the datatype URI reference as the name for the module, so that the appropriate datatype can be included in a header by writing for example:

(scl:import xsd:nonPositiveInteger)

in the header. Note that importing a special name theory into the body of a module amounts to including an infinite set of axioms, but importing it into a header only imposes the appropriate semantic conditions on the special names actually used in the the vocabulary of the body.

When the header of a module imports a special name theory, then all the legal lexical forms of the theory are considered to be special names in that module.