Sets:

RDF Terms, T = U union B union L.

Variables, V is disjoint from T

RDF Ground Terms, G = U union L

Pattern triple is element of (G union V) x (G union V) x (G union V) (Compare RDF triple)

Basic pattern (simple pattern?) is a set of pattern triples. (Compare RDF graph)

Note, no bnodes allowed in patterns. Variables take their place. Alternative: allow bnodes in patterns.

Substitution is a function from a subset of V, the domain of the substitution, dom(S), to T. Alternative: is mapping from (V union B) to T.

We can consider a substitution to be a map from V to (T union V) which is constant outside its domain. If v is in dom(S) then we say that v is bound by S; otherwise, it is unbound.

If X subset dom(S) and dom(S')=X and S'(v) = S(v) for all v in X then S' is the restriction of S to X, written S|X.

If S is a substitution then the result of replacing any v in a basic pattern P by S(v) is a pattern instance of P, written S(P).

Note: this is exactly like the RDF notion of instance, but maps variables to terms instead of bnodes to terms. Clearly any RDF instance of a pattern instance of P is itself a pattern instance of P. Clearly also if we allowed bnodes in queries, then allowing substitutions to apply to them would have exactly the same effect as instances of patterns in this sense.

A query consists of a result form together with a pattern. The simplest query combines a selection form with a basic pattern:

[select VAR where PAT ]

VAR a set of variables, PAT is a basic pattern. More complex queries can be formed by constructing more complex patterns, using pattern operators, and by using more elaborate result forms.

Answers to queries are defined in terms of answer substitutions which, when applied to the pattern, yield subgraphs of the target graph; the exact form of the answer depends on the result form, which can be viewed as a method of processing the answer substitution to yield an answer.

An answer substitution to a basic pattern PAT w.r.t. a target graph G is a substitution S with S(PAT) a subset of G.

(Lemma. S is an answer substitution iff there is a substitution S' with S'(PAT) simply entailed by G.
Proof:
If: S'(PAT) is simply entailed by G just when an instance I(S'(PAT)) of S'(PAT) is a subgraph of G. Then the composition lambda(x) I(S'(x)) satisfies the condition for an answer substitution.
Only if: Any graph simply entails any of its subgraphs.
QED)

More complex patterns.

Each pattern type has a characteristic definition of answer substitution. In some cases, this is defined in terms of the answer substitutions of sub-patterns; such definitions are understood to apply recursively. The recursions always bottom out in basic patterns.

1. Constraints. A pattern may contain a constraint, such as v==val, where val is an indication of a datatype value. In this case the answer must bind v to a typed literal whose datatype value is equal to val. Other constraints are defined in @@@; they all amount to boolean-valued expressions on datatype values. Answer substitutions are required to bind variables occurring in constraints to typed literals such that the constraint is true when applied to the value of the typed literal. In this way, patterns may constrain literals to have particular values rather than having particular syntactic forms. Note that a constraint can be considered to be a triple with a special predicate. An answer substitution

2. Optional patterns. An optional pattern is composed of an unconditional part, which is a basic pattern (is that right? Must this be a basic pattern?), and zero or more options, which are patterns, such that no variable occurs in more than one option; note however that a variable in the unconditional part may also occur in an option. When the options of an optional pattern are themselves optional patterns, the constraint on co-occurrence of variables applies at each level. An optional pattern has the form

PAT [PAT1] [PAT2] ... [PATn]

An answer substitution to this query is any answer substitution to any pattern formed by merging PAT with some subset of {PAT1, ... ,PATn}. (The subset may be empty.)

3. Alternative patterns. An alternative pattern is composed of two patterns conjoined by union.

PAT1 union PAT2

An answer substitution for this pattern is any answer substitution for either of the queries PAT1 or PAT2.

In general, alternatives and options may be nested recursively, forming complex pattern expressions; the definitions of answer also apply recursively, defining answers to complex nested queries in terms of answers to their components.

-------

The definition of answer for a query depends on the result form.

1. Selection form. An answer for a selection result form

[select VAR where PAT ]

is S|VAR where S is an answer substitution for PAT. (SPARQL provides a number of syntactic forms for serializing a substitution as an answer....)

Note, if PAT is an optional pattern then an answer substitution may fail to bind all the variables in VAR, since some of these variables may not occur in some subsets. In this case the answer should indicate the unbound status of any variables in VAR. (Is this right? If so, how is it done when the answer is presented as RDF?)

Note, the symbol '*' may be used in place of VAR, and is understood as shorthand for the set of all variables in PAT. (If the pattern is optional, does this mean shorthand for all the variables in the query form, or shorthand for those variables bound by the particular answer?)

The result set of a selection form query w.r.t. a target graph is the set of all answers to the query w.r.t. the graph, possibly with blank nodes changed. Blank nodes occurring in an answer are scoped to the answer: if the same blank node occurs in a different answer or in the target graph, there is no implication that these blank nodes must denote the same resource. (In this respect, answers are treated exactly like RDF graphs.)

A result is a bag of answers, all of which are in the query result set. If a select query contains the keyword DISTINCT, then the result is required to have no duplicates. If it contains the phrase LIMIT n, for an integer n, then the result is required to have no more than n items in it. Otherwise, the result is required to contain every member of the result set.

2. Construction form. The result for a construction result form

[construct * where PAT ]

is an RDF graph formed by merging the set of graphs

{S(PAT) : S is an answer substitution for PAT}.

Note that each component graph in this set is a subgraph of the target.

(Is this well-defined when PAT is anything more complicated than a basic pattern? In particular, what defines the graph when there are constraints? What if PAT is an optional?)

(Why not simplify the syntax here? We could just say [construct PAT] and leave out the 'where'.)

3. Template construction form. The result for a construction result form

[construct TEM where PAT ]

where TEM is a basic pattern, is the graph formed by merging the set of graphs

{graph(S(TEM)): S is an answer substitution for PAT}

where graph(P) is the subset of triples in P which do not contain variables. (Note that S(TEM) may be strictly neither a graph or a pattern, since it may contain both blank nodes and variables.)

(There is no need to make special provision for blank nodes, since this is handled automatically by the definition of merge.)

4. Describe construction form. The result for a construction result form

[describe TAR where PAT ]

where TAR is either a set of variables or a URI, is an RDF graph ... which might not have any particular relationship to any answer substitution or to the target graph, as far as I can tell. I give up on trying to define this.

5. Ask result form. The result for an ask result form

[ask PAT ]

is 'yes' if there is an answer substitution for PAT, and 'no' if there is no answer substitution.