Thursday, September 19, 2013

Meet the ontop team!

Babak, Martin, Timi, Mariano, Mindaugas, Guohui

-ontop- v1.9 released!

We are happy to announce the availability of -ontop- v1.9

a Java framework to query RDBMS using SPARQL over RDF(s) and OWL ontologies. Get
it at:


HIGHLIGHTS OF CHANGES in v1.9

This release continues the project's refactoring and the cleanup of code, dependencies and internal documentation. We also added two very exciting new features. Finally, there are a bunch of implementation improvements related to both performance and bug fixing. Here is a summary:


  • FEATURE: Hybrid RDF graphs! This is a unique new feature in ontop that no other R2RML system has (to the best of our knowledge). Now you can combine data coming from mappings with plain RDF triples. This lets you store most of the data in the DB, but keep some facts in the ontology as triples (ABox facts). For more info see this post:

    http://ontop-obda.blogspot.it/2013/09/hybrid-rdf-graphs-or-hybrid-aboxes-as.html
  • FEATURE: ontop now supports mappings with URI templates in class or property positions! For more info check this link:
     
    http://ontop-obda.blogspot.it/2013/09/uri-templates-for-properties-and-classes.html
       
  • IMPROVEMENT: We cleaned up a lot of dependencies in ontop. You will see that our package is now half the size it was before.
  • IMPROVEMENT: Upgraded libraries. Now we link to Sesame 2.7.6, OWLAPI 3.4.5 and Protege 4.3
  • BUG FIXES: Critical bug fixes in CONSTRUCT and DESCRIBE queries, as well as in the code that matches URIs to URI templates. Several bug fixes in ontopPro and the SPARQL end-point.

   

NEW ONTOP TUTORIAL

We also prepared a new tutorial that guides you through the first steps of using ontop and ontopPro (the Protege 4 plugin), and how it can be used for data access and data integration. Find it here:

http://ontop-obda.blogspot.it/2013/09/new-ontop-tutorial.html
   

Cheers,
The -ontop- team

URI Templates for properties and classes

Since v1.9 ontop supports mappings with URI templates or variables in property or class locations. This means now you can write mappings like:

:person/{ID} rdf:type :Person{OCCUPATION}
SELECT ID, OCCUPATION FROM tbl_person

or like:

:person/{id} <{attribute}> {value}
SELECT id, attribute, value FROM tbl_data

These kinds of mappings are very useful when some of the vocabulary of the ontology is stored in the DB.

Semantics

The new mappings are just syntactic sugar for normal mappings. Internally, at initialisation time, ontop transforms these mappings into traditional mappings with fixed predicates/classes by inspecting the values in the DB. Because this happens only at initialisation time, later changes to the DB in the columns related to these mappings are not taken into account by the system. For example, suppose the table tbl_person is as follows:


ID OCCUPATION
1 Researcher
2 Researcher
3 Doctor
4 Driver
5 Driver

If your mapping looks like:

:person/{ID} rdf:type :{OCCUPATION}
SELECT ID, OCCUPATION FROM tbl_person

ontop will translate it into the following 3 mappings (one for each distinct value of OCCUPATION):

:person/{ID} rdf:type :Researcher
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Researcher'

:person/{ID} rdf:type :Doctor
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Doctor'

:person/{ID} rdf:type :Driver
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Driver'

If a new row is inserted in tbl_person in which a new occupation is introduced, e.g., (6,"Singer"), the system will not update itself. You need to restart it.
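To make the expansion concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not ontop's actual code: the function name expand_class_template and its parameters are hypothetical, and it only handles the single-table case from the example above.

```python
import sqlite3

def expand_class_template(conn, table, id_col, class_col):
    """Turn ':person/{ID} rdf:type :{CLASS_COL}' into one fixed-class
    mapping per distinct value found in class_col (hypothetical helper)."""
    # Table and column names are interpolated directly, so this sketch
    # assumes they come from a trusted mapping file.
    cur = conn.execute(
        f"SELECT DISTINCT {class_col} FROM {table} ORDER BY {class_col}"
    )
    mappings = []
    for (value,) in cur:
        target = f":person/{{{id_col}}} rdf:type :{value}"
        source = (
            f"SELECT {id_col}, {class_col} FROM {table} "
            f"WHERE {class_col}='{value}'"
        )
        mappings.append((target, source))
    return mappings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_person (ID INTEGER, OCCUPATION TEXT)")
conn.executemany(
    "INSERT INTO tbl_person VALUES (?, ?)",
    [(1, "Researcher"), (2, "Researcher"), (3, "Doctor"),
     (4, "Driver"), (5, "Driver")],
)

# Computed once, like ontop does at initialisation time.
mappings = expand_class_template(conn, "tbl_person", "ID", "OCCUPATION")
for target, source in mappings:
    print(target)
    print(source)
```

Because the list is computed once, a row such as (6, 'Singer') inserted afterwards does not show up in it until the function is called again, which corresponds to restarting the system.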

Limitations and Performance

The code that implements this is not very robust and the system may fail to create the real mappings if the original SQL query is not a simple SELECT-PROJECT-JOIN query. 

Also, each mapping of this form requires ontop to query the database to find the distinct values in the relevant columns. This could be expensive, depending on the database.

Give it a try and let us know how it goes!

Hybrid RDF Graphs (or, Hybrid ABoxes, as you want to see it ;-) )

This is an exciting new feature that allows you to combine virtual RDF (mappings) with real RDF (or, virtual ABoxes with ABox assertions). This is a unique feature in ontop. In other systems, either you have mappings and everything is about SPARQL to SQL, or you have triples and you have a triple store.

With hybrid RDF graphs you can have an ontology with axioms and data as follows (in Turtle syntax):

Axiomatic triples
:ceoOf rdfs:domain        :CEO .
:CEO   rdfs:subClassOf    :BusinessMan .
:ceoOf rdfs:subPropertyOf :worksFor .

Data triples
:Bill_Gates :ceoOf :Microsoft .

Mappings
:person/{ID} :knows :Bill_Gates
SELECT ID FROM tbl_microsoft_employees


Note how the mapping states that all people that are created from IDs in tbl_microsoft_employees know Bill Gates. Bill Gates is a sort of "global" individual. Moreover, we also know some things about Bill Gates, i.e., that he is the CEO of Microsoft. And we know some things about the business world, i.e., that the domain of ceoOf is CEO, that a CEO is a kind of BusinessMan, and that being the CEO of a company is one way of working for that company.

Now we can execute queries like the following and get the answers that we expect:

SELECT ?x ?y WHERE {
   ?x :knows ?y. ?y a :BusinessMan ; :worksFor :Microsoft 
}

As always, ontop will translate this SPARQL query into an SQL query, and in this particular case the query will look something like this:

SELECT "person/{ID}" as x, ":Bill_Gates" as y
FROM tbl_microsoft_employees

Notice that there is a lot going on here; this is not just query translation. There was reasoning going on, involving all the axioms in the ontology, the data triples and the mappings. In the end, we arrive at the simple, efficient query that we would write manually, and that will get you great performance even in the presence of large volumes of data.
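To see which facts the reasoning relies on, here is a small Python sketch. Note that it uses naive forward chaining purely for illustration; ontop does not materialise facts like this, it works by rewriting the query. The function saturate and the dictionary-based encoding of the axioms (one super-class or super-property per entry) are simplifying assumptions.

```python
RDF_TYPE = "rdf:type"

def saturate(facts, domains, sub_classes, sub_props):
    """Apply rdfs:domain, rdfs:subClassOf and rdfs:subPropertyOf
    until no new triples can be derived (illustrative only)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p in sub_props:                      # subPropertyOf
                new.add((s, sub_props[p], o))
            if p in domains:                        # domain
                new.add((s, RDF_TYPE, domains[p]))
            if p == RDF_TYPE and o in sub_classes:  # subClassOf
                new.add((s, RDF_TYPE, sub_classes[o]))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

facts = saturate(
    {(":Bill_Gates", ":ceoOf", ":Microsoft")},
    domains={":ceoOf": ":CEO"},
    sub_classes={":CEO": ":BusinessMan"},
    sub_props={":ceoOf": ":worksFor"},
)

# Individuals that satisfy "?y a :BusinessMan ; :worksFor :Microsoft"
ys = sorted(
    s for s, p, o in facts
    if p == RDF_TYPE and o == ":BusinessMan"
    and (s, ":worksFor", ":Microsoft") in facts
)
print(ys)
```

The only candidate for ?y is :Bill_Gates, which is why the remaining work for the database is just listing the IDs in tbl_microsoft_employees.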

Why use hybrid RDF graphs?

This functionality is useful when you have large volumes of data that would be inefficient to translate into RDF and that you want to keep in the original RDBMS, but at the same time you have some (not so large) data that you want to use during query answering. The smaller dataset is too little to bother inserting into the RDBMS and writing mappings for, or it simply belongs in the ontology, i.e., it is domain knowledge, not application data.

Limitations

This functionality is available only for classes and object properties. That is, you may not have data triples with literal values (data properties), such as:

:Bill_Gates :age "57"^^xsd:integer
:Bill_Gates :name "William Henry Gates"

Performance

Using hybrid RDF graphs may slow down the query rewriting process. The system deals with RDF triples as if they were mappings that require nothing from the DB. That means that all those facts are considered during SQL generation, and having too many of them may slow things down during query translation.

Free variables: In particular, query rewriting may become slow for queries that have "free classes" or "free properties" in the graph patterns, for example:

SELECT ?x ?p WHERE { ?x ?p :mariano }

or

SELECT ?x ?c WHERE { ?x rdf:type ?c }

If you are experiencing slow query rewriting because of this, try to avoid having these "free" patterns in isolation. Use them only if there is a "non-free" section of the query with which you can JOIN them. This will restrict the query and will limit the facts that are involved in answering your query, making everything faster. For example:

SELECT ?x ?c 
WHERE {?x :hasFather ?y. ?x :hasAge ?z. ?x rdf:type ?c }

JOIN order: At the moment, make sure that any triple patterns in SPARQL that are related to data triples are at the end of the query, especially those with free predicates. For example, this is not good:

SELECT ?x ?c 
WHERE { ?x rdf:type ?c . ?x :hasFather ?y. ?x :hasAge ?z. }

but this is good:

SELECT ?x ?c 
WHERE {?x :hasFather ?y. ?x :hasAge ?z. ?x rdf:type ?c }


A good join order is the one in which triple patterns which are more "restricted" come first. For virtual RDF graphs (pure mappings) this doesn't matter, but for Hybrid it might matter a lot. In the future we hope to improve this, but for the moment you should take it into account.
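If you generate queries programmatically, the rule of thumb above can be applied automatically. The following Python sketch is only an illustration under simple assumptions: triple patterns are plain (subject, predicate, object) tuples, variables start with "?", and the helper names is_free and reorder are made up for this example.

```python
def is_var(term):
    """SPARQL variables start with '?' in this simplified encoding."""
    return term.startswith("?")

def is_free(pattern):
    """A pattern is 'free' if its predicate is a variable, or if it is
    an rdf:type pattern whose class is a variable."""
    s, p, o = pattern
    return is_var(p) or (p == "rdf:type" and is_var(o))

def reorder(patterns):
    """Move free patterns to the end. sorted() is stable and False sorts
    before True, so the relative order is otherwise preserved."""
    return sorted(patterns, key=is_free)

bad = [
    ("?x", "rdf:type", "?c"),
    ("?x", ":hasFather", "?y"),
    ("?x", ":hasAge", "?z"),
]
good = reorder(bad)
print(" . ".join(" ".join(t) for t in good))
```

This only captures the "free patterns last" part of the advice; it does not try to estimate how restrictive the remaining patterns are.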

Number of data triples: The number of facts (data triples) will affect the performance of query rewriting. How much is "too big" and when query rewriting may become slow depends on your memory, machine, the SPARQL query and how much the ABox interacts with the TBox. However, the current implementation should allow for a few thousand ABox assertions on normal hardware.


Give this kind of modelling a try and let us know how it goes!

Wednesday, September 18, 2013

New ontop tutorial

A few weeks ago there was the Protege short course in Vienna. We were invited to give a talk on ontopPro and now I want to share the material.

It's a tutorial on how to create mappings using ontop, how inference (OWL 2 QL and RDFS) plays a role in answering SPARQL queries in ontop, and how ontop's support for on-the-fly SQL query translation enables scenarios of semantic data access and data integration.

The material includes all the worked-out mappings, SPARQL queries and SQL databases, and it's full of useful hints on how to use OBDA for different purposes.

Please take a look and send me any feedback you may have :)


Monday, September 2, 2013

Issues in last release

Hi, in the last release we introduced two critical bugs that are affecting DESCRIBE and general SELECT queries. In particular, depending on the URI templates used in mappings, queries might return empty results when they shouldn't.

We are now fixing these issues and will make a release ASAP.

Wednesday, July 31, 2013

v1.8 now available

We are happy to announce the availability of

-ontop- v1.8

a Java framework to query RDBMS using SPARQL over RDF(s) and OWL ontologies.

If you are interested in virtual RDF graphs, R2RML mappings and/or RDFS/OWL2QL reasoning, then -ontop- is for you. The system implements cutting-edge SPARQL-to-SQL translation and query optimisation techniques that allow it to answer SPARQL queries using SQL databases while obtaining great performance.

With -ontop-, you don't need to move your data from your DB to enjoy the benefits of the RDF data model, the SPARQL query language or RDF/OWL2QL inference. There is no need for expensive ETL or forward/backward chaining either.

The project went through a large refactoring (it's still ongoing) and this is our most stable release to date. Moreover, as always, we included many new features. The highlights of this new release include:
  • SOURCE CODE IS NOW AVAILABLE (See the new license terms)
  • Improved mapping syntax (see full changelog)
  • New command line tools to query, automatically generate mappings, generate RDF/OWL from mappings
  • Faster SQL translation
  • Improved the SQL queries generated by Quest by avoiding some operations when unnecessary (e.g., casting), and placing optimally the JOIN conditions in the SQL query. 
  • Major restructuring of code for performance and clarity of the API (this is still ongoing)
  • Many important bugfixes for CONSTRUCT, ASK and DESCRIBE queries
  • -ontopPro- is now compatible with Protege 4.3
  • Many bug fixes


Cheers,
The -ontop- development team

Friday, April 5, 2013

Build 2305 is available


The highlights of this new release include:

* Support for importing/exporting R2RML mappings
* Support for automated mapping generation as defined by the "Direct Mapping" W3C specification
* Various options for data materialization (similar to dump-rdf in D2RQ)
* Support for concurrent queries in the SPARQL endpoint through connection pooling
* Better support for Oracle, Postgres, SQL Server and DB2
* Improved the SQL queries generated by Quest by avoiding some operations when unnecessary (e.g., casting), and placing optimally the JOIN conditions in the SQL query.

Tuesday, March 5, 2013

New build in the oven

Following last release, we have been cooking the next release. Goodies to expect in this release are:

  • Support for multiple JDBC connections by connection pooling, which means better performance in multithreaded environments.
  • Support for multiple clients for the SPARQL end-point (btw, we are getting GREAT performance in comparison to single client benchmarks)
  • First implementation of R2RML and Direct Mapping! Yes, finally. There are still some rough edges, but most of it is already there. Documentation on how to use these will come with the release.
  • Several important bug-fixes for DB2 and SQL Server.

We are also running many new benchmarks, comparing OWLIM, Virtuoso, Stardog, D2RQ, Sparqlmap and Quest using several DBs as backend (MySQL, PostgreSQL, DB2). We'll start putting the results online as soon as possible. So far, things look good for Quest and -ontop-, even when using open source DBs such as MySQL.


Tuesday, January 29, 2013

New build almost ready, great performance enhancements

Quick update: we are now preparing the next build. We are quite excited since we finally finished implementing several optimizations that were missing from the mapping manipulation algorithm. The use of "OR" in WHERE clauses is much more robust now and we are now able to exploit FOREIGN KEY constraints.

The result is that in virtual RDF graph mode, with real databases and typical SPARQL queries, we rarely generate more than one SQL query, and that query often matches what you would have written yourself in SQL. And it is FAST, even with inference enabled (also for more complex SPARQL with variables everywhere, even if it takes more than one query).

We'll be posting benchmark numbers soon. Stay tuned!

Friday, January 11, 2013

Documentation and Examples

Quick update. We have reorganized the documentation section of the main site and of the technical wiki. Things should be much easier to find now.

We are also extending the "Examples" section with several of the scenarios that we use for testing, demonstration and benchmarking. We have already included the BSBM and FishMark scenarios that we use for benchmarks, and the IMDB-MovieOntology scenario that we use for demonstrations. Take a look at them, they are nice examples of very natural mappings.

The IMDB-MovieOntology scenario is particularly interesting since the database is real, very large and mappings turn out to be interesting. Take a look!