Interaction of GRAPH graph patterns and subqueries #2793

nkaralis · 2024-10-24T12:36:38Z

Version

5.2.0

Question

Hello,

I have some questions about the interaction of GRAPH graph patterns and subqueries.

I am using version 5.2.0.

Assume the scenario described below.

First, I load a graph into two separate named graphs.

LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph1> ;
LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph2>

Both graphs contain 16 triples.

The query below returns the triples found in both graphs, for a total of 32 solutions. Here, ?g is always unbound.

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT ?s ?p ?o  WHERE {
                ?s ?p ?o
            }
        }
    }
}

The query below also returns 32 results. In this case, ?g is always assigned a value (i.e., <http://www.example.org/graph1> or <http://www.example.org/graph2>).

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT * WHERE {
                ?s ?p ?o
            }
        }
    }
}

I have the following questions:

First, why do these queries return different results?
Second, why does the second query return 32 results?

For both queries, I was expecting 64 results: the Cartesian product of the results of the subquery (32 results) and the possible values for ?g (2 named graphs).

Thank you in advance.

rvesse · 2024-10-24T13:10:34Z

Can you provide details of your storage setup? For example:

Is this TDB2 or an in-memory dataset?
Do you have unionDefaultGraph enabled by any chance?
Providing the Fuseki config file if using Fuseki would be helpful

In algebra terms, the two queries compile to different algebra expressions, which likely explains the difference in results.

Your first query yields the following algebra:

(base <http://example/base/>
  (project (?s ?p ?o)
    (quadpattern (quad ?g ?s ?p ?o))))

While your second yields the following algebra:

(base <http://example/base/>
  (quadpattern (quad ?g ?s ?p ?o)))

Notice that with SELECT * in the inner query the project step is omitted from the generated algebra, so ?g stays bound, whereas the explicit projection to ?s ?p ?o in the first query drops ?g, leaving it always unbound. However, I'm not sure if this is the correct behaviour here; that is probably a question for @afs to answer.
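To make the effect of the project step concrete, here is a small Python model of the two algebra expressions above. It is only a sketch and not Jena code: the `quadpattern` and `project` helpers and the sample quads are made up for illustration.

```python
# Toy model of the two algebra expressions (illustrative only, not Jena code).

# Two sample quads (graph, subject, predicate, object); the values are placeholders.
quads = [
    ("http://www.example.org/graph1", "s1", "p1", "o1"),
    ("http://www.example.org/graph2", "s1", "p1", "o1"),
]

def quadpattern(quads):
    # (quadpattern (quad ?g ?s ?p ?o)): binds all four variables for each quad.
    return [{"g": g, "s": s, "p": p, "o": o} for (g, s, p, o) in quads]

def project(variables, solutions):
    # (project (vars) ...): keeps only the listed variables in each solution.
    return [{v: row[v] for v in variables if v in row} for row in solutions]

first = project(["s", "p", "o"], quadpattern(quads))  # inner SELECT ?s ?p ?o
second = quadpattern(quads)                           # inner SELECT *

print(first[0].get("g"))   # None: ?g was projected away
print(second[0].get("g"))  # http://www.example.org/graph1
```

In this model the projection sits on top of the quad pattern, so it removes ?g together with everything else that is not listed, which matches the unbound ?g reported for the first query.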

> For both queries, I was expecting 64 results: Cartesian product between the results of the subqueries (32 results) and the possible values for ?g (2 named graphs).

That shouldn't ever be the case. The way a GRAPH ?g clause is logically defined, the inner pattern is executed independently for each graph in the dataset and the results are unioned together. So each graph independently yields 16 results, and these union together to yield 32 results.
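The per-graph evaluation described here can be sketched in plain Python. This is a toy model of the semantics, not Jena's implementation; `make_triples`, `match_all`, `eval_graph_var` and the synthetic triples are assumptions made for illustration, with 16 placeholder triples per graph standing in for data.ttl.

```python
# Toy model of GRAPH ?g { P }: evaluate P against each named graph on its own,
# add the graph name as the ?g binding, and union the per-graph results.

def make_triples(n):
    # Placeholder triples standing in for the 16 triples of data.ttl.
    return [(f"s{i}", "p", f"o{i}") for i in range(n)]

dataset = {
    "http://www.example.org/graph1": make_triples(16),
    "http://www.example.org/graph2": make_triples(16),
}

def match_all(triples):
    # The pattern { ?s ?p ?o } yields one solution per triple.
    return [{"s": s, "p": p, "o": o} for (s, p, o) in triples]

def eval_graph_var(dataset, pattern):
    solutions = []
    for graph_name, triples in dataset.items():
        for row in pattern(triples):  # the pattern only sees this one graph
            solutions.append({**row, "g": graph_name})
    return solutions

results = eval_graph_var(dataset, match_all)
print(len(results))  # 32
```

Because the inner pattern never sees both graphs at once, there is no 32-row intermediate result to multiply by the two graph names; the total is 16 + 16 rather than 32 x 2.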

nkaralis · 2024-10-24T13:26:29Z

I am using Fuseki with TDB2.

# for starting the server
java -jar fuseki-server.jar --update --tdb2 --loc=databases/testing /endpoint

I am using the default config file found in apache-fuseki-5.2.0/run

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

## Fuseki Server configuration file.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   # Example::
   # Server-wide query timeout.   
   # 
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, 
   #                            then 60s timeout for the rest of query.
   #
   # See javadoc for ARQ.queryTimeout for details.
   # This can also be set on a per dataset basis in the dataset assembler.
   #
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000" ] ;

   # Add any custom classes you want to load.
   # Must have a "public static void init()" method.
   # ja:loadClass "your.code.Class" ;   

   # End triples.
   .

> That shouldn't ever be the case. The way a GRAPH ?g clause is logically defined, the inner pattern is executed independently for each graph in the dataset and the results are unioned together. So each graph independently yields 16 results, and these union together to yield 32 results.

I see. It makes sense, thank you

afs · 2024-10-27T10:07:10Z

@nkaralis, thanks for the report. The unbound ?g is a bug.

@rvesse's analysis is correct (a simpler reproduction is below). TDB is the main user of quad-based execution, but it's available for in-memory datasets as well:

## ==> Q.rq <==
SELECT * {
    GRAPH ?g { 
            SELECT ?s  { ?s ?p ?o }
    }
}

## ==> D.trig <==
PREFIX : <http://example/>

GRAPH :g1 { :s :p :o }

and

  sparql --optimize=false --engine=quad --data D.trig --query Q.rq

giving

--------------------------
| s                  | g |
==========================
| <http://example/s> |   |
--------------------------
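For comparison, here is what the per-graph reading of GRAPH described earlier in the thread would give for this reproduction, sketched as a toy Python model. It is not Jena code, and the helper names `inner_select_s` and `eval_graph_var` are hypothetical.

```python
# Toy model of the reproduction (illustrative only, not Jena code).

# D.trig: one named graph holding a single triple.
dataset = {
    "http://example/g1": [
        ("http://example/s", "http://example/p", "http://example/o"),
    ],
}

def inner_select_s(triples):
    # SELECT ?s { ?s ?p ?o }: match every triple, then keep only ?s.
    return [{"s": s} for (s, _p, _o) in triples]

def eval_graph_var(dataset, pattern):
    # GRAPH ?g { ... }: run the pattern per named graph, then bind ?g.
    return [
        {**row, "g": name}
        for name, triples in dataset.items()
        for row in pattern(triples)
    ]

rows = eval_graph_var(dataset, inner_select_s)
print(rows)  # [{'s': 'http://example/s', 'g': 'http://example/g1'}]
```

Here the inner projection is applied before ?g is attached, so ?g ends up bound to the graph name instead of being empty as in the table above.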

nkaralis added the question label Oct 24, 2024

afs added bug and removed question labels Oct 27, 2024
