The following replaces 5/5.1/5.2 :

----------------

5 Basic Graph Patterns

A basic graph pattern is a set of triple patterns. Basic graph patterns stand in the same relation to triple patterns that RDF graphs do to RDF triples, and much of the same terminology can be applied to them. In particular, two basic graph patterns are said to be *equivalent* if there is a bijection M between the terms of the triple patterns that maps blank nodes to blank nodes and maps variables, literals and IRIs to themselves, such that a triple ( s, p, o ) is in the first pattern if and only if the triple ( M(s), M(p) M(o) ) is in the second. This definition extends that for RDF graph equivalence to basic graph patterns by preserving variable names across equivalent patterns.

Basic graph patterns can be instantiated by replacing both variables and blank nodes by terms, giving two notions of instance. Following the terminology of [RDF semantics], an *RDF instance mapping* is a mapping from blank nodes to RDF terms. An *answer mapping* is a mapping from query variables to RDF terms. A *SPARQL instance mapping*, or simply an *instance mapping,* is a mapping from blank nodes and variables to RDF terms. Any instance mapping defines an answer mapping and an RDF instance mapping obtained by restricting it to query variables and blank nodes respectively.

5.1 SPARQL Basic Graph Pattern Matching

Suppose BGP is a basic graph pattern and G is an RDF graph. A *pattern solution* for BGP *from* G is an answer mapping P such that P(BGP) has an RDF instance which is a subgraph of G. Equivalently, such that P(BGP) is simply entailed by G.

This definition allows the solution mapping to bind a variable in BGP to a blank node in G. Since SPARQL treats blank node IDs in the answer document as scoped to the document, they cannot be understood as identifying nodes in the dataset. If DS is the dataset of a query, pattern solutions are therefore understood to be not from DS itself, but from an RDF graph, called the *scoping graph,* which is graph-equivalent to DS but shares no blank nodes with DS or with BGP. The same scoping graph is used for all answers to a single query. The scoping graph is purely a theoretical construct; in practice, the effect is obtained simply by the document scope conventions for blank node identifiers.

Since RDF blank nodes allow infinitely many redundant solutions for many patterns, there can be infinitely many pattern solutions (obtained by replacing blank nodes by different blank nodes.) It is necessary, therefore, to somehow delimit the set of answers. SPARQL adopts a simple subgraph matching criterion for this. A SPARQL answer is the restriction of a SPARQL instance mapping M to query variables, where M(BGP) is a subset of the scoping graph. There is one answer for each distinct such SPARQL instance mapping M.

This is optimised for ease of computation rather than redundancy elimination. It allows answer sets to contain redundancies even when the dataset is lean, and it allows logically equivalent datasets to yield distinct answer sets.

<<might give a couple of simple examples?>>

Appendix @@@ gives general conditions on query answer sets for other entailment regimes.

-----------------

This is the Appendix:

-----------------

General Framework for Basic Graph Matching

The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless or inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case.

An *entailment regime* is a transitive idempotent binary relation between subsets of RDF graphs and RDF graphs. A graph in the range of an entailment regime E is called *well-formed* for the regime.

Examples of entailment regimes include simple entailment [RDF-MT], RDF entailment [RDF-MT], RDFS entailment [RDF-MT], D-entailment [RDF-MT] and OWL-DL entailment [OWL-Semantics]. If E is an entailment regime then we will refer to E-entailment, E-consistency, etc.., following this naming convention.

Some entailment regimes can categorize some RDF graphs as inconsistent. For example, "-1"^^xsd:positiveInteger is inconsistent with respect to D-entailment. The effect of a query on an inconsistent graph is not covered by this specification, but **must** be specified by the particular SPARQL extension.

A SPARQL extension to E-entailment **must** satisfy the following conditions.

(1) The scoping graph SG corresponding to any consistent source document SD is uniquely specified and is E-equivalent to SD.

(2) For any basic graph pattern BGP and pattern solution P, P(BGP) is well-formed for E

(3) For any scoping graph SG and answer set {P_{1} ... P_{n}} for a basic graph pattern BGP,

SG E-entails (SG union P_{1}(BGP_{1}) union ... union P_{n}(BGP_{n}))

where {BGP_{1 .... }BGP_{n}}is a set of basic graph patterns all equivalent to BGP none of which share any blank nodes with any other or with SG.

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following **must** hold.

(4) Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and SD has a finite set of answers which is unique up to RDF graph equivalence.

Notes.

(a) SG will often be graph equivalent to SD, but restricting this to E-equivalence allows some forms of normalization, for example elimination of semantic redundancies, to be applied to the source documents before querying.

(b) The construction in condition 3 ensures that any blank nodes introduced by the answer mapping are used in a way which is internally consistent with the way that blank nodes occur in SG. This ensures that blank node identifiers occur in more than one answer in an answer set only when the blank nodes so identified are indeed identical in SG. If the extension does not allow answer bindings to blank nodes, then this condition can be simplified to the condition:

SG E-entails P(BGP) for each pattern solution P.

(c) These conditions do not impose the SPARQL requirement that SG share no bnodes with SD or BGP. In particular, it allows SG to actually be SD. This allows query protocols in which blank node identifiers retain their meaning between the query and the source document, or across multiple queries. Such protocols are not supported by the current SPARQL protocol specification, however.

(d) Since conditions 1 to 3 are only necessary conditions on answers, condition 4 allows cases where the set of legal answers can be restricted in various ways. For example, the current state of the art in OWL-DL querying focusses on the case where answer bindings to blank nodes are prohibited. We note that these conditions even allow the pathological 'mute' case where every query has an empty answer set.

(e) None of these conditions refer explicitly to instance mappings on blank nodes in BGP. For some entailment regimes, the existential interpretation of blank nodes cannot be fully captured by the existence of a single instance mapping. These conditions allow such regimes to give blank nodes in query patterns a 'fully existential' reading.

--------------

It is straightforward to show that SPARQL satisfies these conditions for the case where E is simple entailment, given that the SPARQL condition on SG is that it is graph-equivalent to SD but shares no bnodes with SD or BGP (which satisfies the first condition). The only condition which is nontrivial is (3).

Every answer P_{i} is the answer mapping restriction of a SPARQL instance M_{i} such that M_{i}(BGP_{i}) is a subgraph of SG. Since BGP_{i} and SG have no blank nodes in common, the range of M_{i} contains no blank nodes from BGP_{i}; therefore, the answer mapping P_{i} and RDF instance mapping I_{i} components of M_{i} commute, so M_{i}(BGP_{i}) = I_{i}(P_{i}(BGP_{i})). So

M_{1}(BGP_{1}) union ... union M_{n}(BGP_{n})

= I_{1}(P_{1}(BGP_{1})) union ... union I_{n}(P_{n}(BGP_{n}))

= [ I_{1} + ... + I_{n}]( P_{1}(BGP_{1}) union ... union P_{n}(BGP_{n}) )

since the domains of the I_{i} instance mappings are all mutually exclusive. Since they are also exclusive from SG,

SG union [ I_{1} + ... + I_{n}]( P_{1}(BGP_{1}) union ... union P_{n}(BGP_{n}) )

= [ I_{1} + ... + I_{n}](SG union P_{1}(BGP_{1}) union ... union P_{n}(BGP_{n}) )

i.e.

SG union P_{1}(BGP_{1}) union ... union P_{n}(BGP_{n})

has an instance which is a subgraph of SG, so is simply entailed by SG by the RDF interpolation lemma [RDF Semantics: http://www.w3.org/TR/rdf-mt/#interplemmaprf]. QED.