let RDF-U be the set of all
RDF URIs [[SHOULD THAT BE URI REFERENCES??]]
let RDF-L be the set of all RDF Literals
let RDF-B be the set of all blank nodes
The set of RDF Terms, RDF-T, is RDF-U union RDF-L union RDF-B.
V is a set disjoint from RDF-T, members of V are called variables.
A substitution is a partial mapping from (V union RDF-B) to T. If S is a substitution we will say that S(x)=x when x is not in the domain of S. The empty substitution E is the mapping with E(x)=x for any x.
If S is a substitution then the result of replacing every v in P with S(v) is written S(P).
Note, this notation and definition applies to any structure that may contain variables or blank nodes.
A triple pattern is a triple in (RDF-T union V)x (RDF-T union V)x (RDF-T union V).
A basic graph pattern is a set of triple patterns.
A query consisting of a basic graph pattern P, applied to an RDF graph G, is called an elementary query. A solution for a basic graph pattern P (on G) is a substitution S on the variables in P such that S(P) is a subgraph of G, and P matches G when a solution for P on G exists.
1. A substitution S is a solution for a basic graph pattern P on G just when G simply entails S(P).
2. These definitions are unaffected by P and G having any blank nodes in common, so we can assume that they do not. In general, the scope of any blank nodes in any graph, graph pattern or query answer is assumed to be local to the graph, graph pattern or answer in which they occur.
3. An elementary query is not a legal SPARQL query, but the definitions given below build on this elementary case, which can be thought of as the base case of a large recursive definition.
A SPARQL query is a 4-tuple consisting of a graph pattern, defined below; a dataset description, which is a set of URIs; a set of solution sequence modifiers, defined below [[Where are these defined??]], and a result form, also defined below.
A pattern is one of:
a basic graph pattern;
a constrained pattern, which is a pattern plus a value constraint;
an optional pattern, which contains a pattern;
a group pattern, which is a finite set of patterns;
a union pattern, which is a finite set of patterns;
a dataset pattern, which is a pair of a graph identier and a pattern, where a graph identifier is an element of (V union RDF-U)
A dataset is a set containing an RDF graph, called the default graph, and zero or more named graphs, each being a pair of a URI and an RDF graph, such that none of the URIs occurs more than once in the dataset.
Note, queries are understood to be evaluated with respect to a dataset.
For each type of pattern and dataset, we extend the definition of solution for P on D:
If P is a basic graph pattern then a solution for P on D is any solution for P on some G with G in D or <x, G> in D. [[is that right? Do we include the named graphs even when the pattern does not mention datasets explicitly?]]
If P is a constrained pattern containing a pattern P' and a value constraint V then S is a solution for P when S is a solution for P' and S(V) evaluates to true.
If P is an optional pattern containing a pattern P', then a solution for P is either the empty substitution, or a solution for P'.
If P is a group pattern then S is a solution for P when S is a solution for every P' in P.
If P is a union pattern then S is a solution for P when S is a solution for some P' in P.
If P is a dataset pattern <i, P'>, then S is a solution for P on D when D contains a named graph <u, G> where S(i)=u and S is a solution for P' on G.
The solution set for P on D is the set of all solutions for P on D.
A solution sequence is a sequence of elements of the solution set. A distinct solution sequence is any solution sequence which has no repetitions; an ordered solution sequence is a solution sequence which satisfies some ordering condition. (Note, the ordering condition may not completely determine the order.) A limited solution sequence has at most a fixed number of members, called the bound. An offset solution sequence....[[not sure what to say here]] [[Also Im not sure how this relates to the missing 'solution sequence modifiers', but I bet they are related:-)]]
A result form is one of the strings 'construct', 'describe' or 'ask', or the string 'select' followed by a list of variables. The result form specifies the form in which the solutions of a query should be delivered.
Note, the variables listed after 'select' might include only a subset of the variables in the pattern, and cannot contain any blank nodes.
The answer to a basic query depends on the result form, and is defined in terms of the solution set:
if the result form is 'select' followed by v1,...,vn, then the answer is a table with n columns such that there is a row a1,...,an of the table if [[and only if]]?? there is an S in the solution set with S(vi)=ti for i from 1 to n;
if the result form is 'construct', then the answer is an RDF graph formed by merging S(P) for every S in the solution set;
if the result form is 'describe', then the solution is [[ I don't know what it is! ]];
if the result form is 'ask' then the answer is true if the solution set is non-empty, otherwise false.
Note the distinction between an empty answer - which may for example arise from a query pattern with no variables - and an empty set of answers, indicating that no matches are possible betwen the query and the dataset. The result form 'ask' will give true in the first case, but false in the second.