# Reviews

## On rules with existential variables: Walking the decidability line

This journal paper Bagetrulesexistentialvariables2011 is about the decidability of the entailment problem in knowledge bases using existential rules.

## A Generalized Framework for Ontology-based Data Access

This paper BotoevaGeneralizedFrameworkOntologybased proposes a generalization of the OBDA framework for solving the heterogeneous data integration problem (in theory) and for supporting a MongoDB source (in practice).

### General Framework

They use the following theoretical setting:

• OWL 2 QL as the ontology language;
• R2RML for the mapping specification (the "R" in R2RML stands for Relational). They use mappings of the form $$\phi(\bar x) \leadsto t(\bar x)$$, where $$t$$ is a triple (the property is an IRI and, for the type property, the object class is an IRI) and $$\phi$$ is an SQL query. The mapping semantics makes them sound;
• SPARQL as the query language;
• the supported source formats are JSON (MongoDB), XML, relational, … (they don't mention RDF).

They propose an approach using a relational schema $$[D]$$ containing constraints on the sources (including inter-source constraints). It is like defining views with constraints. They also define a query language $$\mathcal Q$$ for directly querying source instances.

The process successively translates the SPARQL query into:

1. $$IQ$$, an optimized version of the reformulation (reformulation step);
2. a set of queries in $$\mathcal Q$$ (rewriting step);
3. the combined answers of the previous queries (mediator work).

The optimization is due to an offline process on the mappings, the ontology and the source schema (constraints), explained in SequedaOBDAQueryRewriting2014.

### Ontop/MongoDB

They build a system, Ontop/MongoDB, supporting the above framework with a single MongoDB source. There is no need for a mediator, except for creating IRIs (Skolem functions).

They denormalize the BSBM benchmark data in order to translate it into collections of JSON documents. This operation induces some redundancy in the data, speeding up some queries and slowing down others.

They present the aggregation framework (MAQ) of MongoDB, which provides a way to query collections with the expressive power of Nested Relational Algebra (the nesting does not add expressive power). Such aggregations are declared as a sequence of operations in a map-reduce style. I noticed that filter and aggregation operations can be pushed into MongoDB.
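The map-reduce style of MAQ pipelines can be illustrated with a small sketch; the collection, field names and pipeline below are hypothetical, not taken from the paper, though `$match`, `$unwind` and `$group` are standard MongoDB aggregation stages:

```python
# Sketch of a MongoDB aggregation (MAQ) pipeline in map-reduce style.
# Collection and field names are hypothetical.

pipeline = [
    {"$match": {"type": "Book"}},   # filter pushed into MongoDB
    {"$unwind": "$reviews"},        # flatten the nested (denormalized) array
    {"$group": {                    # aggregate per book
        "_id": "$isbn",
        "nbReviews": {"$sum": 1},
    }},
]

def run_group_count(docs, match_field, match_value, key, array_field):
    """Pure-Python simulation of the pipeline above, for illustration."""
    counts = {}
    for doc in docs:
        if doc.get(match_field) != match_value:             # $match
            continue
        for _ in doc.get(array_field, []):                  # $unwind
            counts[doc[key]] = counts.get(doc[key], 0) + 1  # $group / $sum
    return counts
```

With pymongo one would run `db.books.aggregate(pipeline)`; the nested `reviews` array is where the denormalization-induced redundancy shows up.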

The Ontop/MongoDB system handles SPARQL queries including BGP, FILTER, JOIN, OPTIONAL and UNION. They use a translation from NRA to MAQ to translate the rewritten query into a MongoDB query (a one-to-one translation).

They compare Ontop/MongoDB against the Virtuoso system on the BSBM benchmark (1.2 billion triples). They show that, due to the denormalization of the data, their system is slower when it has to access non-contiguous disk portions, but faster by one order of magnitude otherwise.

## 07/02/2019 OBDA: Query Rewriting or Materialization? In Practice, Both!

This paper SequedaOBDAQueryRewriting2014a is about OBDA and a mapping-set saturation technique in the setting of an extension of the RDFS ontology language. They use a single relational source. They reduce the problem of query answering in OBDA to a query unfolding problem using GAV RDF mappings of the form:

• $$\alpha(x) \leadsto \triple{x}{\type}{\class}$$ where $$\alpha$$ is an FO query on the sources and $$\class$$ is a constant;
• $$\beta(x,y) \leadsto \triple{x}{\prop}{y}$$ where $$\beta$$ is an FO query on the sources and $$\prop$$ is a constant.
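The unfolding step on such GAV mappings can be sketched as matching a triple pattern against mapping heads and returning the corresponding source queries; the encoding of mappings and patterns below is mine, not the paper's:

```python
# Sketch of GAV unfolding: a triple pattern is answered by the body
# queries of the mappings whose head matches it. Encoding (tuples for
# triples, '?x'-style variables, SQL strings as bodies) is illustrative.

def unfold(pattern, mappings):
    """pattern and heads are (s, p, o) tuples; variables start with '?'."""
    def matches(head):
        # A pattern position matches if it is a variable or equals the
        # head constant at that position.
        return all(pv.startswith("?") or pv == hv
                   for pv, hv in zip(pattern, head))
    return [body for body, head in mappings if matches(head)]
```

For instance, the pattern `(?s, rdf:type, Book)` unfolds into the bodies of all mappings whose head types its subject as `Book`.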

They translate each triple of the ontology (an extension of RDFS) into an entailment rule following this list:

• $$\triple{\class_{1}}{\subclass}{\class_{2}}$$: $$\triple{x}{\type}{\class_{1}} \rightarrow \triple{x}{\type}{\class_{2}}$$ ;
• $$\triple{\prop_{1}}{\mathrm{subProp}}{\prop_{2}}$$: $$\triple{x}{\prop_{1}}{y} \rightarrow \triple{x}{\prop_{2}}{y}$$ ;
• $$\triple{\prop}{\domain}{\class}$$: $$\triple{x}{\prop}{y} \rightarrow \triple{x}{\type}{\class}$$ ;
• $$\triple{\prop}{\range}{\class}$$: $$\triple{x}{\prop}{y} \rightarrow \triple{y}{\type}{\class}$$ ;
• $$\triple{\prop}{\mathrm{equivProp}}{\prop'}$$: $$\triple{x}{\prop}{y} \leftrightarrow \triple{x}{\prop'}{y}$$ ;
• $$\triple{\class}{\mathrm{equivClass}}{\class'}$$: $$\triple{x}{\type}{\class} \leftrightarrow \triple{x}{\type}{\class'}$$ ;
• $$\triple{\prop}{\mathrm{inverse}}{\prop'}$$: $$\triple{x}{\prop}{y} \leftrightarrow \triple{y}{\prop'}{x}$$ ;
• $$\triple{\prop}{\type}{\mathrm{symProp}}$$: $$\triple{x}{\prop}{y} \rightarrow \triple{y}{\prop}{x}$$ ;
• $$\triple{\prop}{\type}{\mathrm{transProp}}$$: $$\triple{x}{\prop}{y} \wedge \triple{y}{\prop}{z} \rightarrow \triple{x}{\prop}{z}$$.

Transitivity is handled by allowing recursive queries in mapping bodies.
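The non-transitive rules above can be applied by simple forward chaining over a triple set; a minimal sketch, where the rule and triple encodings are mine, not the paper's:

```python
# Minimal forward chaining over triples for single-premise entailment
# rules (subclass, subproperty, domain, range). Encoding is illustrative.

TYPE = "rdf:type"

def entailment_rules(ontology):
    """Turn ontology triples into functions: triple -> derived triple or None."""
    rules = []
    for s, p, o in ontology:
        if p == "rdfs:subClassOf":
            rules.append(lambda t, c1=s, c2=o:
                         (t[0], TYPE, c2) if t[1:] == (TYPE, c1) else None)
        elif p == "rdfs:subPropertyOf":
            rules.append(lambda t, p1=s, p2=o:
                         (t[0], p2, t[2]) if t[1] == p1 else None)
        elif p == "rdfs:domain":
            rules.append(lambda t, pr=s, c=o:
                         (t[0], TYPE, c) if t[1] == pr else None)
        elif p == "rdfs:range":
            rules.append(lambda t, pr=s, c=o:
                         (t[2], TYPE, c) if t[1] == pr else None)
    return rules

def saturate(triples, rules):
    """Apply the rules to the triple set until fixpoint."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        for t in list(graph):
            for rule in rules:
                derived = rule(t)
                if derived and derived not in graph:
                    graph.add(derived)
                    changed = True
    return graph
```

The paper's point is precisely to avoid this data-level saturation by pushing the rules into the mappings, as defined next.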

They define the saturation of a mapping set in general terms:

Given a mapping set $$\mappings$$ over a source schema $$R$$ and an ontology $$\onto$$, $$\mappings^{\star}$$ is a saturation of $$\mappings$$ w.r.t. $$\onto$$ if for every instance $$I$$ of $$R$$: $$(\graph_{I}^{\mappings})^{\rules_{\onto}} = \graph_{I}^{\mappings^{\star}}$$, where $$\rules_{\onto}$$ is the set of entailment rules generated from $$\onto$$ and $$\graph$$ denotes the induced graph.

They define a set of straightforward generation rules that produce a saturation $$\mathrm{SAT}(\mappings, \onto)$$ of a mapping set $$\mappings$$ w.r.t. an ontology $$\onto$$ that doesn't contain transitivity constraints on properties. This saturation can be computed in $$O(|\mappings| \cdot |\onto|)$$.
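A sketch of how $$\mathrm{SAT}(\mappings, \onto)$$ might proceed, restricted to class subsumption for brevity; the representation of mappings as (body, head) pairs is my assumption:

```python
# Sketch of mapping-set saturation SAT(M, O), restricted to class
# subsumption. A mapping is (sql_body, head_triple); "x" stands for the
# answer variable. Encoding is illustrative, not the paper's.

def saturate_mappings(mappings, subclass_of):
    """subclass_of: list of (C1, C2) pairs meaning C1 rdfs:subClassOf C2."""
    saturated = list(mappings)
    seen = set(mappings)
    changed = True
    while changed:
        changed = False
        for body, (s, p, o) in list(saturated):
            if p != "rdf:type":
                continue
            for c1, c2 in subclass_of:
                if o == c1:
                    # Same body query, head lifted to the superclass.
                    new = (body, (s, "rdf:type", c2))
                    if new not in seen:
                        seen.add(new)
                        saturated.append(new)
                        changed = True
    return saturated
```

The fixpoint loop handles subsumption chains; the $$O(|\mappings| \cdot |\onto|)$$ bound presumably assumes the ontology's closure is available upfront.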

They wrap together the mappings having the same head by defining a view as the union of their body queries. Hence, they can translate each wrapped mapping into one SQL query on the relational source. This query produces triples of the same form as the wrapped mapping head, using the values contained in its view.
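The wrapping step could be sketched as grouping mappings by head and emitting one UNION view per head; the SQL strings and the (body, head) encoding below are hypothetical:

```python
# Sketch of wrapping mappings that share the same head into one view
# defined as the UNION of their body queries. Names are hypothetical.
from collections import defaultdict

def wrap_by_head(mappings):
    """mappings: list of (body_sql, head); returns head -> view SQL."""
    groups = defaultdict(list)
    for body, head in mappings:
        groups[head].append(body)
    return {head: " UNION ".join(bodies) for head, bodies in groups.items()}
```

Each resulting view is then a single SQL query producing all triples of that head's form.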

## The Berlin SPARQL Benchmark

The Berlin SPARQL Benchmark BizerBerlinSPARQLBenchmark aims to compare the performance of native RDF stores with that of SPARQL-to-SQL rewriters across architectures. BSBM is not designed for complex reasoning, but to measure SPARQL query performance against large amounts of RDF data.

BSBM is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products. It has a data generator in which the number of products is used as a scale factor. It can generate either a relational or an RDF representation of the same dataset.

The queries use most of the SPARQL features, so none of the proposed queries is a plain BGP.

View definitions are not mentioned, but the paper MontoyaSemLAVLocalAsViewMediation2014 proposes a set of views based on BSBM. These views do not seem to contain existential variables in their definitions.

## IBench

IBench creates metadata for GLAV integration systems (with data?), without queries.

Documentation: https://github.com/RJMillerLab/ibench/wiki/tutorial

### Strong Points

1. The described system reuses well-known wrappers from distributed environments to evaluate a query over heterogeneous sources such as Cassandra, MongoDB, MySQL, etc. Moreover, the only requirement for a source to be supported is the existence of a dedicated wrapper in the distributed environment.
2. Experiments are performed on a big dataset (up to 1.75 billion triples) distributed over 5 sources, each using a different engine.

### Weak Points

1. No ontology is used in the paper; there is no mention of any reasoning or schema constraints. This work is only about joins in an integration system that makes source data available as RDF triples.
2. The query fragment supported by the introduced system is not well defined in the paper, although such information can be found on the experiments web page: https://github.com/EIS-Bonn/Squerall/tree/master/evaluation.
3. The query answers seem to be not well defined, or not to follow the definition commonly used in integration systems (see below).
4. The evaluation section does not mention crucial information about the executed queries. This calls into question the pertinence of the proposed evaluation of the system.

This paper introduces a novel integration system for heterogeneous sources supporting SPARQL queries. The mediation is done through an intermediate data representation handled by a Big Data distributed system like Spark or Presto. As in OBDA systems, mappings are used to represent source content as RDF triples. The authors present an algorithm for joining the intermediate data representations of source data using a star-shaped decomposition of the query. They also present a transformation of mappings and queries in order to resolve some value mismatches between sources, hence allowing new joins. The evaluation is done on the BSBM benchmark distributed across 5 heterogeneous data sources.

#### Preliminaries

• typo in Data Entity and Relevant Entity definition: "matcheing" -> "matching"
• 2.3 (1). Supported queries should be clearly defined. In particular, the BGPs in the WHERE clauses seem to allow only variables as subjects; this is necessary to apply the star decomposition.
• 2.3 (2). "Relevant Entity Extraction. For every extracted star, entities that have attribute mappings to each of the properties of the star are determined. Such entities are relevant to the star" Now, suppose that we have the following star-shaped BGP:
?k bibo:isbn ?i .
?k dc:title ?t.


and the two following mappings:

<#ISBNMapping>
rml:logicalSource [
rml:source "bibo";
];
rr:subjectMap [
rr:template "{iri}";
rr:class schema:Book
];

rr:predicateObjectMap [
rr:predicate bibo:isbn;
rr:objectMap [rml:reference "isbn"]
] .

<#TitleMapping>
rml:logicalSource [
rml:source "dc";
];
rr:subjectMap [
rr:template "{iri}";
rr:class schema:Book
];

rr:predicateObjectMap [
rr:predicate dc:title;
rr:objectMap [rml:reference "title"]
] .


Since neither of the entity mappings bibo and dc has attribute mappings for both bibo:isbn and dc:title, there is no entity relevant to the query star. But we can imagine possible joins between bibo and dc on the column iri. Hence, in this context, a SPARQL query whose WHERE clause contains the above BGP could have answers in such an OBDA system. Why does the proposed answering approach seem to always return an empty answer? A clear definition of query answers with respect to the data sources and mappings seems necessary.

• 2.3 (3) typo: "Data Like" instead of "Data Lake" ?
• Figure 2 unnecessary bold font for rdf:type.
• 2.4 typo in the third sentence: "translates" -> "translate"
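The star decomposition discussed in 2.3 (1) can be sketched by grouping the triple patterns of a BGP on their subject variable; the encoding below is mine, not the paper's:

```python
# Sketch of star-shaped decomposition: group the triple patterns of a
# BGP by subject variable. Encoding is illustrative, not the paper's.
from collections import defaultdict

def star_decomposition(bgp):
    """bgp: list of (subject, predicate, object) patterns; returns
    subject -> list of patterns forming that star."""
    stars = defaultdict(list)
    for s, p, o in bgp:
        stars[s].append((s, p, o))
    return dict(stars)
```

This matches the observation above that subjects must be variables for the decomposition to apply: the example BGP on ?k forms a single star with two branches.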

#### Evaluation

• In the evaluation, the authors say that they store "the same data (experiments data) in a relational MySQL database". They also say that they record the time in MySQL. It would have been interesting to have the time comparison with the Presto-based and Spark-based Squerall in the paper or the GitHub repository.
• 5 queries out of 10 (Q1, Q3, Q4, Q5 and Q8) use the LIMIT keyword and limit the number of answers to at most 10. This choice is never mentioned in the paper and it really affects query execution time! No surprise then that these queries scale.
• The vertical axis tick numbers in Figure 3 (a) and (c) should be displayed on one line for readability.

# Bibliography

• [Bagetrulesexistentialvariables2011] Baget, Leclère, Mugnier & Salvat, On Rules with Existential Variables: Walking the Decidability Line, Artificial Intelligence, 175(9), 1620-1654 (2011). doi.
• [BotoevaGeneralizedFrameworkOntologybased] Botoeva, Calvanese, Cogrel, Corman & Xiao, A Generalized Framework for Ontology-Based Data Access, , 13 .
• [SequedaOBDAQueryRewriting2014] Sequeda, Arenas & Miranker, OBDA: Query Rewriting or Materialization? In Practice, Both!, 535-551, in: The Semantic Web – ISWC 2014, LNCS 8796, edited by Mika, Tudorache, Bernstein, Welty, Knoblock, Vrandečić, Groth, Noy, Janowicz & Goble, Springer International Publishing (2014). doi:10.1007/978-3-319-11964-9_34
• [SequedaOBDAQueryRewriting2014a] Sequeda, Arenas & Miranker, OBDA: Query Rewriting or Materialization? In Practice, Both!, 535-551, in: The Semantic Web – ISWC 2014, edited by Mika, Tudorache, Bernstein, Welty, Knoblock, Vrandečić, Groth, Noy, Janowicz & Goble, Springer International Publishing (2014)
• [BizerBerlinSPARQLBenchmark] Bizer & Schultz, The Berlin SPARQL Benchmark, , 28 .
• [MontoyaSemLAVLocalAsViewMediation2014] Montoya, Ibáñez, Skaf-Molli, Molli & Vidal, SemLAV: Local-As-View Mediation for SPARQL Queries, 33-58, in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII, Lecture Notes in Computer Science, edited by Hameurlain, Küng & Wagner, Springer Berlin Heidelberg (2014). doi:10.1007/978-3-642-54426-2_2