General Semantic Conditions for SPARQL basic graph pattern answers.

This section is normative for all extensions of SPARQL to more general entailment regimes. The particular conditions on SPARQL itself are instances of these general conditions, defined in the next section. We refer throughout to E-entailment; examples include rdf-entailment, rdfs-entailment, OWL-DL entailment applied to the RDF encoding of OWL-DL, etc..

Suppose Q is a query, B is a BGP in Q, and G is the source graph. Answers to the query Q are given by reference to a graph GQ, called the scoping graph, rather than to G itself. The scoping graph is fixed for the entire query, and each query is considered to be evaluated with respect to a distinct scoping graph.

Each SPARQL extension must specify precisely how the scoping graph GQ is defined relative to any source graph G. In SPARQL, GQ is graph-isomorphic to G but does not share any blank nodes with G or B. Another possibility is that GQ could be G itself, in an application where blank node IDs in a query answer are understood to identify actual blank nodes in the source graph, so retaining their identity across multiple queries. However, uses of bnodeIDs as common identifiers between the answer document and source graph, or between multiple queries, are not supported by the SPARQL protocols. Other possibilities include for example that GQ is a 'skolemized' version of G in which all bnodes have been replaced by IRIs, or that GQ is a subgraph of G from which all triples containing blank nodes have been removed, corresponding to a SPARQL extension which refuses to recognize blank nodes in the source graph.

A binding (for a basic graph pattern B) is a mapping from the query variables (occurring in B) to RDF terms.

Given a set {m1, m2, ... , mn} of bindings for B, an answer graph of that set is an RDF graph of the form

m1(B1) union m2(B2) ... union mn(Bn)

where {B1, B2, ... , Bn} is a set of graphs all equivalent to B but no two of which share any blank nodes. Clearly, all answer graphs of a given set are graph-equivalent, so we can speak of the answer graph, following the RDF equivalent graph convention.

The motive for these definitions is to ensure that blank node IDs in a basic graph pattern are treated as being locally scoped to the pattern, while blank node IDs that are introduced by the answer bindings reflect the same scope rules as they have in the scoping graph. Thus, two occurrences of the same blank node ID in two distinct answer bindings (to the same query) refer to the same blank node.

A set of bindings such that the scoping graph E-entails its answer graph is called an answer set for B, and the members of the set are called answers or answer bindings. A complete answer set is an answer set whose answer graph E-entails the answer graph of any other answer set.

Any SPARQL extension must provide complete answer sets for every query. However, restrictions on the form of the scoping graph may restrict the set of answers in significant ways, for example by eliminating from consideration blank node answer bindings which would otherwise be entailed.

Being a complete answer set is not sufficient to define a unique answer to a query. In general, complete answer sets are not unique, since one can add superfluous blank-node answer bindings while entailing and being entailed by the same graphs. It is necessary, therefore, to restrict the answer sets in some way so as to limit the potentially infinite amount of redundancy that the above conditions permit. This can be done in various ways, but every SPARQL extension must specify conditions which determine a unique complete answer set for each basic graph pattern query on every source graph. This can be done by restricting the semantic conditions, by imposing tighter syntactic conditions, or by giving an algorithmic description of how to compute an answer set. For example, in the above cases where the scoping graph contains no blank nodes, it would be appropriate to simply prohibit answer bindings to a blank node, and the complete answer set may then be unique. SPARQL itself requires that all answer bindings are to RDF terms which actually occur in the scoping graph.

Two different source graphs may be logically equivalent, i.e. each E-entails the other, yet give different answers to the same query. These answers will however be logically equivalent, in the sense that the corresponding answer graphs also E-entail each other.

To sum up: any SPARQL extension must specify (1) an entailment regime (2) how to determine the scoping graph defined by each source graph, and (3) how to remove or control the potentially infinite amount of possible redundancy, so that each query has a single answer set which is unique up to RDF graph isomorphism of the answer graph.

SPARQL Basic Graph Patterns

SPARQL follows the definitions in the previous section where E-entailment is simple entailment, the scoping graph GQ is some RDF graph which is graph-equivalent to G and shares no bnodes with G or with B, and the answer set is defined in terms of subgraph matching, as follows.

A pattern instance (for a basic graph pattern B and scoping graph GQ) is a mapping from query variables and blank nodes in B to RDF terms which occur in GQ. Clearly, every pattern instance defines a binding, which is the restriction of the pattern instance to the variables in B, called the answer m' of the pattern instance m . The SPARQL answer set for B is then the set of answers of every pattern instance m such that m(B) is a subgraph of GQ;

{m': m(B) is a subgraph of GQ}.

<<Proof that this is a 'complete answer set' in the above sense to follow shortly.The proof is elementary but tedious. Also various points in the text should be linked to examples in the testcase document.>>