Friday, November 25, 2011

Quest and SNOMED and other huge ontologies

Today we started working with SNOMED. We are using it as a benchmark for our ontology loading and preprocessing algorithms. Even though SNOMED is strictly outside the OWL 2 QL profile, it seems to be a very good test bed for us, since most of the axioms in the ontology do fall in the OWL 2 QL fragment (with some minor syntax adjustments). Even better, it seems we will also be able to approximate the remaining axioms and still get complete inferences for ground instances (which is what often matters in the data-intensive applications we have in mind with Quest).

Things to do to fully support SNOMED:

  • Upgrade Quest to support OWLAPI 3.
  • Optionally, upgrade Quest to support Protege 4.1 (however, once the previous one is done, this should be straightforward).
  • Upgrade our ontology translation mechanism (from OWLAPI to our internal representation).
  • Benchmark and fix.
  • Possibly upgrade our semantic index implementation.
Once the loading performance is sorted out, we should be able to easily link massive amounts of data to SNOMED concepts and relationships with the techniques we already have. This would be especially useful for applications like semantic search with NLP concept tagging of resources, which is a common use of SNOMED and generates huge amounts of data assertions. One more step towards getting rid of forward/backward chaining!

Expect good performance in huge ontologies like this very soon!

Wednesday, November 2, 2011

Performance! Things to expect for version 1.7

Hi, we just came back from ISWC and it's time to get back to Quest and the OBDA plugin. There are several important things that we just started implementing and that you can expect in the 1.7 release. We would like to give you a small peek at them:

  • T-Mapping (virtual mode): As you know, right now the virtual mode in Quest is not efficient. This is because the way in which we generate SQL from the mappings of the OBDA model is very simple: it is basically a one-to-one implementation of the methods described in [1], which tend to generate too many SQL queries. However, we already have the theory to optimize the SQL generation step [2]. The technique is called T-Mappings for OBDA, and it is basically a query-preserving transformation of the mappings of the system that allows us to simplify both the rewriting process and the SQL generated by the system. Once T-Mappings are implemented in Quest, the system will get a dramatic boost in performance in the virtual OBDA setting, similar to the boost you get when you use the "Semantic Index" mode instead of the "direct" mode. No more exponential SQL queries!
  • SQL Analysis (virtual and classic mode): We just finished implementing the SQL API for the OBDALib and we are now integrating it with the SQL generation module of Quest. This will allow us to generate very efficient SQL, with little or no nesting at all.
  • Improved query containment detection.  
  • Bulk loading and external databases (classic mode): Currently, Quest in classic OBDA mode is limited by the amount of RAM you have in the system, for two reasons. First, Quest can only receive as input an OWLAPI OWLOntology object, and not a reference to the location of the ontology. This means that all the data has to be loaded before Quest can receive it; hence, RAM limit 1. To fix this, we will allow you to load an ontology and data using URL references and files. The second reason for the RAM limit is that once Quest loads the data, it stores it in an in-memory H2 database; hence, RAM limit 2. To fix this issue, we will allow you to instruct Quest to store the ontology in an external database, where you will not have such limits.
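
To give a feeling for the idea behind T-Mappings, here is a minimal sketch in Python. It is only an illustration and not the Quest implementation: it assumes an ontology made of plain subclass axioms and mappings that assign a list of SQL queries to each class, and the names saturate_mappings and sql_for_class, as well as the tables, are hypothetical. The point it shows is that the class hierarchy can be compiled into the mappings once, so that answering a query for a class needs a single UNION instead of a separate rewriting step over the ontology. The technique described in [2] also covers properties and further optimizations, none of which appear here.

```python
from collections import defaultdict


def saturate_mappings(subclass_of, mappings):
    """For each class, collect its own SQL sources plus those of all its subclasses.

    subclass_of: iterable of (subclass, superclass) pairs (hypothetical input format).
    mappings: dict from class name to a list of SQL queries returning its instances.
    """
    # children[c] holds the direct subclasses of c.
    children = defaultdict(set)
    for sub, sup in subclass_of:
        children[sup].add(sub)

    def descendants(cls, seen):
        # Depth-first walk down the hierarchy; 'seen' also guards against cycles.
        for child in children[cls]:
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    classes = set(mappings) | {c for pair in subclass_of for c in pair}
    saturated = {}
    for cls in classes:
        sources = []
        # The class itself first, then its descendants in a stable (sorted) order.
        for c in [cls] + sorted(descendants(cls, set())):
            for sql in mappings.get(c, []):
                if sql not in sources:  # drop exact duplicates
                    sources.append(sql)
        saturated[cls] = sources
    return saturated


def sql_for_class(cls, saturated):
    """Build one SQL query that returns all (inferred) instances of cls."""
    return "\nUNION\n".join(saturated[cls])


# Toy example with made-up tables.
subclass_of = [("Student", "Person"), ("PhDStudent", "Student")]
mappings = {
    "Person": ["SELECT id FROM person"],
    "Student": ["SELECT id FROM student"],
    "PhDStudent": ["SELECT id FROM phd"],
}
saturated = saturate_mappings(subclass_of, mappings)
print(sql_for_class("Person", saturated))
```

In this toy setting, asking for Person produces one query over the person, phd and student tables, and no reasoning over the ontology is needed at query time.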
Other improvements:
  • A Sesame implementation for Quest
  • Support for OWLAPI 3 and Protege 4.1
[1] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. on Data Semantics, X:133-173, 2008.
[2] Mariano Rodriguez-Muro and Diego Calvanese. Dependencies: Making ontology based data access work in practice. In Proc. of the 5th Alberto Mendelzon Int. Workshop on Foundations of Data Management (AMW 2011), volume 749 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/, 2011.