Thursday, 19 March 2015

How to explore a SPARQL endpoint?


In this article you will learn:
- how to explore a SPARQL endpoint with Ontorion™ SPARQL Tools for Excel
- about the SPARQL autocomplete tool from Cognitum
The latest release of Ontorion™ SPARQL Tools for Excel offers two new features that enrich the SPARQL experience. The first is the Explore SPARQL endpoint tool, which gives a quick overview of the data an endpoint contains. The second is the SPARQL autocomplete tool, which guides the user through writing a query with intelligent autocomplete hints.

Motivation

A growing amount of data is made public via SPARQL endpoints. You can explore the data by running SPARQL queries. One of the most famous SPARQL endpoints is DBpedia - a semantic version of Wikipedia. Many public institutions expose some of their data in such an open way as well. A good example is The Environment Agency of England and Wales, which publishes data about bathing water quality.
The basic building block of a SPARQL data set is a triple: a statement of the form subject-predicate-object. Apart from that, the structure of the data can be quite loosely defined, so exploring a new SPARQL endpoint for the first time can be difficult.


Explore SPARQL endpoint

To reproduce the steps described in this section:
- download Ontorion™ SPARQL Tools for Excel
- choose the tab Ontorion > Import from SPARQL
- enter your SPARQL endpoint address and press Explore SPARQL endpoint
- to get a preview of the query, choose Ontorion > Change query
Imagine you are given the SPARQL endpoint address http://environment.data.gov.uk/sparql/bwq/query and you have no idea what is inside.
What should you do? How about first checking which classes are available? Here we query for things that are declared to be classes using the OWL or RDF Schema vocabularies.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?class
WHERE {
     { ?class rdf:type owl:Class }
     UNION 
     { ?class rdf:type rdfs:Class }
      }
LIMIT 500

As a result you will get a list of 16 classes:

Let us compare it with a list of things that are used as if they were classes, that is, things that appear as the object of an rdf:type triple. To do so, we run the query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?class (COUNT(?instance) AS ?numberOfInstances)
WHERE {
     ?instance rdf:type ?class .
      }
GROUP BY ?class
ORDER BY DESC(?numberOfInstances)
LIMIT 500


You can see we get many more results. So why are some things used as classes even though they are not declared to be classes? There can be two reasons for that:

  • the data references external data sources, so it does not contain their definitions
  • depending on the endpoint, the data set might be loosely defined - it does not contain definitions for the classes it introduces
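
The comparison between declared and used classes can be sketched in a few lines of Python. This is only an illustration: the function name undeclared_classes is hypothetical, and the class addresses below are made-up example.org placeholders, not actual results from the endpoint.

```python
def undeclared_classes(declared, used):
    """Return the classes that are used via rdf:type but never declared."""
    declared_set = set(declared)
    # Keep the order of `used`, e.g. most frequently used classes first.
    return [cls for cls in used if cls not in declared_set]


# Placeholder data (assumption): results of the "declared" and "used" queries.
declared = ["http://example.org/def/BathingWater"]
used = [
    "http://example.org/def/BathingWater",
    "http://example.org/def/SampleAssessment",
    "http://example.org/external/Observation",
]

for cls in undeclared_classes(declared, used):
    print(cls)
```

Classes left over after this comparison are the candidates for the two explanations listed above.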

In a similar manner, we get few results for things that are declared to be OWL properties, while plenty of things are used as properties. The two queries below show the difference. The first query:

 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?objectProperty
WHERE {
      ?objectProperty rdf:type owl:ObjectProperty
}
LIMIT 500
... gets far fewer results than the second one:

SELECT ?property (COUNT(?subject) AS ?numberOfUses)
WHERE {
      ?subject ?property ?object .
      }
GROUP BY ?property
ORDER BY DESC(?numberOfUses)

SPARQL autocomplete tool

With the SPARQL autocomplete tool you get autocomplete hints as you type. The tool analyzes the syntax of the query being entered.

It also contains a list of well-known ontologies and their prefixes. The list is maintained using the LOV (Linked Open Vocabularies) endpoint.


What is more, you can get a preview of the content of the SPARQL endpoint as you type the query. The core of every SPARQL query is its triple patterns (subject-predicate-object), for example ?x ?y ?z or ?a rdfs:label ?b. The autocomplete tool queries the endpoint for triples matching the triple pattern that is being typed. The results shown are thus a sample of the endpoint content and may not include all of its data. Nevertheless, they give a good intuition of what the data looks like: which predicates are most common, which other ontologies are referenced, and so on.
How does it work in practice? Imagine you begin typing your triple pattern with the variable ?a. The autocomplete tool shows that, among other options, the second token in your query can be rdf:type. This helps you construct the triple pattern ?a rdf:type ?b. What is more, typing ?a rdf:type and loading autocomplete gives you a sample of things that appear as the object (third position) in triples whose predicate (second position) is rdf:type.
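
To make the idea concrete, here is a rough Python sketch of the kind of sampling query a tool could build from a partially typed triple pattern. It is an assumption about the general approach, not a description of how Cognitum's tool is implemented; the function name sample_query, the ?hint_s / ?hint_p / ?hint_o placeholder variables and the default limit of 20 are all made up for illustration.

```python
RDF_PREFIX = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>"


def sample_query(typed_tokens, prefixes=(RDF_PREFIX,), limit=20):
    """Build a SPARQL query that samples candidates for the next token."""
    # Hypothetical placeholder variables; assumed not to clash with user variables.
    placeholders = ["?hint_s", "?hint_p", "?hint_o"]
    typed = len(typed_tokens)
    if typed > 2:
        raise ValueError("the triple pattern is already complete")
    # Keep what was typed and pad the remaining positions with placeholders.
    pattern = list(typed_tokens) + placeholders[typed:]
    # The first missing position is the one we want suggestions for.
    target = placeholders[typed]
    lines = list(prefixes)
    lines.append(f"SELECT DISTINCT {target}")
    lines.append("WHERE { " + " ".join(pattern) + " . }")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Suggestions for the object position after typing "?a rdf:type".
print(sample_query(["?a", "rdf:type"]))
```

The LIMIT clause is what makes the result a sample rather than a complete listing, which matches the caveat that the hints may not cover all the data in the endpoint.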
Side note: as you can see in the screenshot above, with Ontorion™ SPARQL Tools for Excel you can now choose between two HTTP request methods, GET and POST, when sending your query.
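
For readers who want to see what the two request methods look like outside Excel, below is a minimal Python sketch using only the standard library. It follows the SPARQL 1.1 Protocol conventions (the query goes in the query parameter of the URL for GET, or in a URL-encoded form body for POST). The function name build_request is hypothetical, and the sketch only builds the request objects without sending them; it says nothing about how the Excel add-in issues its requests internally.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://environment.data.gov.uk/sparql/bwq/query"


def build_request(endpoint, query, method="GET"):
    """Build (but do not send) a SPARQL protocol request."""
    headers = {"Accept": "application/sparql-results+json"}
    params = urlencode({"query": query})
    if method == "GET":
        # GET: the URL-encoded query travels in the URL itself.
        return Request(f"{endpoint}?{params}", headers=headers, method="GET")
    if method == "POST":
        # POST: the same URL-encoded form travels in the request body.
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        body = params.encode("utf-8")
        return Request(endpoint, data=body, headers=headers, method="POST")
    raise ValueError(f"unsupported method: {method}")


query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
for method in ("GET", "POST"):
    request = build_request(ENDPOINT, query, method)
    print(request.get_method(), request.full_url)
```

Passing either object to urllib.request.urlopen would send it. POST is usually the safer choice for long queries, because some servers and proxies limit the length of URLs.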
