
14.3. IRI Dereferencing

There are many cases when RDF data should be retrieved from remote sources only when really needed. For example, a scheduling application may read personal calendars from the personal sites of its users. Calendar data expires quickly, so there is no reason to reload it frequently in the hope that it will be queried before it expires.

Virtuoso extends SPARQL so that it is possible to download an RDF resource from a given IRI, parse it, and store the resulting triples in a graph. All three operations are performed during SPARQL query execution. The IRI of the graph that stores the triples is usually equal to the IRI from which the resource is downloaded, so the feature is named "IRI dereferencing".

There are two different use cases for this feature. In the simple case, a SPARQL query contains from clauses that enumerate the graphs to process, but there are no triples in DB.DBA.RDF_QUAD that correspond to some of these graphs. Query execution starts by dereferencing these graphs, and the rest runs as usual.

In the more sophisticated case, the query is executed many times in a loop. Every execution produces a partial result. The SPARQL processor checks the result for IRIs of resources that may contain relevant data but are not yet loaded into DB.DBA.RDF_QUAD, and loads them. After some iteration, the partial result is identical to the result of the previous iteration, because there is no more data to retrieve. As the last step, the SPARQL processor builds the final result set.
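The iterative case can be modeled outside the database to make the loop easier to follow. The following is a minimal Python sketch, not Virtuoso code: the WEB dictionary stands in for remote resources, the store dictionary stands in for DB.DBA.RDF_QUAD, and the max_depth and max_resources arguments are assumed to behave like an iteration cap and a cap on loaded resources. All IRIs, predicate names, and function names are hypothetical.

```python
# Hypothetical stand-in for the web: IRI -> triples that resource would yield.
WEB = {
    "http://myhost/user1.ttl": [
        ("user1", "FullName", "User One"),
        ("user1", "SeeAlso", "http://myhost/user2.ttl"),
        ("user1", "SeeAlso", "http://myhost/user3.ttl"),
    ],
    "http://myhost/user2.ttl": [("user2", "FullName", "User Two")],
    "http://myhost/user3.ttl": [
        ("user3", "FullName", "User Three"),
        ("user3", "SeeAlso", "http://myhost/user4.ttl"),
    ],
    "http://myhost/user4.ttl": [("user4", "FullName", "User Four")],
}


def run_query(store):
    """Return the partial result: (names found, IRIs that may hold more data)."""
    triples = [t for graph in store.values() for t in graph]
    names = {o for _, p, o in triples if p == "FullName"}
    more = {o for _, p, o in triples if p == "SeeAlso"}
    return names, more


def grab(seed_iris, max_depth=10, max_resources=100):
    """Re-run the query, loading newly discovered resources, until nothing changes."""
    store = {}            # graph IRI -> triples (models DB.DBA.RDF_QUAD)
    pending = list(seed_iris)
    previous = None
    names = set()
    for _ in range(max_depth):
        # Dereference every pending IRI that is not loaded yet.
        for iri in pending:
            if iri not in store and len(store) < max_resources:
                store[iri] = WEB.get(iri, [])
        names, more = run_query(store)
        # Stop when this partial result equals the previous one.
        if (names, more) == previous:
            break
        previous = (names, more)
        pending = sorted(more - set(store))
    return sorted(names), sorted(store)


if __name__ == "__main__":
    found, loaded = grab(["http://myhost/user1.ttl"])
    print(found)
    print(loaded)
```

Starting from user1.ttl alone, the sketch reaches user4.ttl through user3.ttl, and it stops one iteration after the last new resource is loaded, which mirrors the "identical partial result" stopping condition described above.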

14.3.1. IRI Dereferencing For FROM Clauses, "define get:..." Pragmas

Virtuoso extends the SPARQL syntax of from and from named clauses. It allows an additional list of options at the end of the clause: option ( param1 value1, param2 value2, ... ) where parameter names are QNames that start with the get: prefix and values are "precode" expressions, i.e. expressions that do not contain variables other than external parameters. Names of allowed parameters are listed below.


14.3.2. IRI Dereferencing For Variables, "define input:grab-..." Pragmas

Consider a set of personal data in which one resource can list many persons and point to resources where those persons are described in more detail. For example, the resource about user1 describes that user and also contains statements that user2 and user3 are persons and that more data can be found in user2.ttl and user3.ttl. In turn, user3.ttl can contain statements that user4 is also a person and that more data can be found in user4.ttl, and so on. The query should find as many users as possible and return their names and e-mails.

If all data about all users were loaded into the database, the query could be quite simple:

sparql select ?id ?fullname ?email
where {
    graph ?g {
        ?id a <Person> ;
          <FullName> ?fullname ;
          <EMail> ?email .
      } };

It is possible to enable IRI dereferencing in such a way that all appropriate resources are loaded during the query execution even if names of some of them are not known a priori.

sparql
  define input:grab-var "?more"
  define input:grab-depth 10
  define input:grab-limit 100
  define input:grab-base-iri "http://myhost/"
select ?id ?fullname ?email
where {
    graph ?g {
        ?id a <Person> ;
          <FullName> ?fullname ;
          <EMail> ?email .
        optional { ?id <SeeAlso> ?more } } };

The IRI dereferencing is controlled by the following pragmas:

The default resolver procedure is DB.DBA.RDF_GRAB_RESOLVER_DEFAULT(). Note that the function produces two absolute URIs, abs_uri and dest_uri. The default procedure returns two equal strings, but other resolvers may return different values, e.g., the primary and permanent location of the resource as dest_uri and the fastest known mirror location as abs_uri, thus saving HTTP retrieval time. A resolver can even signal an error to block the downloading of an unwanted resource.

DB.DBA.RDF_GRAB_RESOLVER_DEFAULT (
  in base varchar,         -- base IRI as specified by input:grab-base-iri pragma
  in rel_uri varchar,      -- IRI of the resource as it is specified by input:grab-iri or a value of a variable
  out abs_uri varchar,     -- the absolute IRI that should be downloaded
  out dest_uri varchar,    -- the graph IRI where triples should be stored after download
  out get_method varchar ) -- the HTTP method to use, should be "GET" or "MGET".
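
The resolver contract above (two inputs, three outputs, with the option to signal an error) can be illustrated with a small Python model. This is a sketch under assumptions, not the Virtuoso implementation: resolving rel_uri against base with urljoin, always returning "GET", the MIRRORS and BLOCKED_PREFIXES tables, and both function names are hypothetical choices made for illustration.

```python
from urllib.parse import urljoin

# Hypothetical tables for the custom resolver below.
MIRRORS = {"http://myhost/": "http://mirror.example/"}
BLOCKED_PREFIXES = ("http://blocked.example/",)


def resolver_default(base, rel_uri):
    """Model of a default resolver: abs_uri and dest_uri are equal strings."""
    absolute = urljoin(base, rel_uri)
    return absolute, absolute, "GET"   # (abs_uri, dest_uri, get_method)


def resolver_with_mirror(base, rel_uri):
    """Model of a custom resolver: download from a mirror, store under the primary IRI."""
    dest_uri = urljoin(base, rel_uri)
    # Signal an error to block the download of an unwanted resource.
    if dest_uri.startswith(BLOCKED_PREFIXES):
        raise ValueError("download blocked: " + dest_uri)
    abs_uri = dest_uri
    for primary, mirror in MIRRORS.items():
        if dest_uri.startswith(primary):
            # Same path, faster host; the graph IRI (dest_uri) stays unchanged.
            abs_uri = mirror + dest_uri[len(primary):]
            break
    return abs_uri, dest_uri, "GET"


if __name__ == "__main__":
    print(resolver_default("http://myhost/", "user2.ttl"))
    print(resolver_with_mirror("http://myhost/", "user2.ttl"))
```

Keeping dest_uri stable while varying abs_uri means the stored graph keeps the permanent IRI of the resource no matter which location was actually fetched.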