Interaction of GRAPH graph patterns and subqueries #2793

Open
nkaralis opened this issue Oct 24, 2024 · 3 comments

@nkaralis

Version

5.2.0

Question

Hello,

I have some questions about the interaction of GRAPH graph patterns and subqueries.

I am using version 5.2.0.

Assume the scenario described below.

First, I load the same data file into two separate named graphs.

LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph1> ;
LOAD <https://raw.githubusercontent.com/w3c/rdf-tests/refs/heads/main/sparql/sparql11/functions/data.ttl> INTO GRAPH <http://www.example.org/graph2>

Each graph contains 16 triples.

The query below returns the triples found in both graphs, which results in 32 solutions. Here, ?g is always unbound.

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT ?s ?p ?o  WHERE {
                ?s ?p ?o
            }
        }
    }
}

The query below also returns 32 results. In this case, ?g is always bound to a value (i.e., <http://www.example.org/graph1> or <http://www.example.org/graph2>).

SELECT * WHERE {
    GRAPH ?g { 
        {
            SELECT * WHERE {
                ?s ?p ?o
            }
        }
    }
}

I have the following questions:

  • First, why do these queries return different results?
  • Second, why does the second query return 32 results?

For both queries, I was expecting 64 results: the Cartesian product of the subquery results (32 results) and the possible values for ?g (2 named graphs).

Thank you in advance.

@rvesse
Member

rvesse commented Oct 24, 2024

Can you provide details of your storage setup, e.g.:

  • Is this TDB2 or an in-memory dataset?
  • Do you have unionDefaultGraph enabled by any chance?
  • Providing the Fuseki config file if using Fuseki would be helpful

In algebra terms, these queries end up as different algebra expressions, which likely explains the difference in results.

Your first query yields the following algebra:

(base <http://example/base/>
  (project (?s ?p ?o)
    (quadpattern (quad ?g ?s ?p ?o))))

While your second yields the following algebra:

(base <http://example/base/>
  (quadpattern (quad ?g ?s ?p ?o)))

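Notice that with SELECT * in the inner query, the project step is omitted from the generated algebra, so ?g stays bound. With the explicit SELECT ?s ?p ?o, the (project (?s ?p ?o) ...) step wraps the quad pattern and removes ?g, which is why ?g is always unbound. However, I'm not sure if this is the correct behaviour here; probably a question for @afs to answer.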
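
To make the algebra difference concrete, here is a minimal Python sketch (not Jena code) that mimics the two plans above on a hypothetical two-quad dataset. The helpers quadpattern and project are hypothetical stand-ins named after the SSE operators, and graph_then_project is an assumed model of what GRAPH semantics call for: evaluate the subquery inside each graph, and only then bind ?g.

```python
# Illustrative model of the two algebra plans; not Jena's implementation.
# Quads are (g, s, p, o) tuples; solutions are dicts from variable name to value.

QUADS = [
    ("ex:graph1", "ex:a", "ex:p", "1"),
    ("ex:graph2", "ex:a", "ex:p", "1"),
]


def quadpattern(quads):
    """(quadpattern (quad ?g ?s ?p ?o)): each quad binds all four variables."""
    return [{"g": g, "s": s, "p": p, "o": o} for (g, s, p, o) in quads]


def project(variables, rows):
    """(project (vars...) ...): keep only the listed variables in each row."""
    return [{v: row[v] for v in variables if v in row} for row in rows]


def graph_then_project(quads, variables):
    """Assumed GRAPH ?g semantics: project within each graph, then bind ?g."""
    rows = []
    for name in sorted({q[0] for q in quads}):
        in_graph = [q for q in quads if q[0] == name]
        inner = project(variables, quadpattern(in_graph))
        rows.extend({**row, "g": name} for row in inner)
    return rows


# Inner SELECT *: no project step, so ?g survives.
plan_star = quadpattern(QUADS)

# Inner SELECT ?s ?p ?o: project sits above the quad pattern and strips ?g.
plan_explicit = project(["s", "p", "o"], quadpattern(QUADS))

# Expected result: the subquery is projected per graph, then ?g is bound.
plan_expected = graph_then_project(QUADS, ["s", "p", "o"])

for label, rows in [("SELECT *", plan_star),
                    ("SELECT ?s ?p ?o", plan_explicit),
                    ("expected", plan_expected)]:
    print(label, [row.get("g") for row in rows])
```

Running it prints None for ?g in every row of the explicit-projection plan, matching the unbound ?g in the first query, while the other two plans bind ?g to each graph name.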


For both queries, I was expecting 64 results: Cartesian product between the results of the subqueries (32 results) and the possbible values for ?g (2 named graphs).

That shouldn't ever be the case. The way a GRAPH ?g clause is logically defined, the inner pattern is evaluated independently for each named graph in the dataset and the results are union'd together with the . So each graph independently yields 16 results, and these union together to yield 32 results.
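
As an illustration of that definition, the following Python sketch (a simplified model, not Jena's implementation) evaluates GRAPH ?g { ?s ?p ?o } over a hypothetical dataset in which the same three triples are loaded into two named graphs. The graph names, triples and helper functions are made up for the example.

```python
# Illustrative model of SPARQL's GRAPH ?g evaluation; not Jena's implementation.
# A dataset is a dict mapping graph names to sets of (s, p, o) triples.


def eval_all_triples(graph):
    """Evaluate the pattern ?s ?p ?o against a single graph."""
    return [{"s": s, "p": p, "o": o} for (s, p, o) in sorted(graph)]


def eval_graph_var(dataset, inner, var="g"):
    """GRAPH ?var { inner }: evaluate inner once per named graph,
    bind ?var to that graph's name, and union the per-graph results."""
    results = []
    for name, graph in dataset.items():
        for row in inner(graph):
            results.append({**row, var: name})
    return results


# Hypothetical stand-in for data.ttl: three triples loaded into both graphs.
triples = {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2"), ("ex:c", "ex:p", "3")}
dataset = {"ex:graph1": set(triples), "ex:graph2": set(triples)}

rows = eval_graph_var(dataset, eval_all_triples)
print(len(rows))  # 6: 3 per graph, unioned (a Cartesian product would give 12)
print(sorted({row["g"] for row in rows}))  # ['ex:graph1', 'ex:graph2']
```

With the real data (16 triples per graph), the same logic gives 16 + 16 = 32 solutions, each with ?g bound to the graph it came from.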

@nkaralis
Author

I am using Fuseki with TDB2:

# for starting the server
java -jar fuseki-server.jar --update --tdb2 --loc=databases/testing /endpoint

I am using the default config file found in apache-fuseki-5.2.0/run:

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

## Fuseki Server configuration file.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   # Example::
   # Server-wide query timeout.   
   # 
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, 
   #                            then 60s timeout for the rest of query.
   #
   # See javadoc for ARQ.queryTimeout for details.
   # This can also be set on a per dataset basis in the dataset assembler.
   #
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000" ] ;

   # Add any custom classes you want to load.
   # Must have a "public static void init()" method.
   # ja:loadClass "your.code.Class" ;   

   # End triples.
   .

That shouldn't ever be the case, the way a GRAPH ?g clause is logically defined is that the inner pattern is executed independently for each graph in the dataset and the results are union'd together with the . So each graph independently yields 16 results and these union together to yield 32 results

I see. That makes sense, thank you.

@afs added the bug label and removed the question label on Oct 27, 2024
@afs
Member

afs commented Oct 27, 2024

@nkaralis -- thanks for the report. The unbound ?g is a bug.

@rvesse's analysis is correct (a simpler reproduction is below). TDB is the main user of quad-based execution, but it's available for in-memory datasets as well:

## ==> Q.rq <==
SELECT * {
    GRAPH ?g { 
            SELECT ?s  { ?s ?p ?o }
    }
}
## ==> D.trig <==
PREFIX : <http://example/>

GRAPH :g1 { :s :p :o }

and

  sparql --optimize=false --engine=quad --data D.trig --query Q.rq

giving

--------------------------
| s                  | g |
==========================
| <http://example/s> |   |
--------------------------
