Planet RDF

It's triples all the way down

February 01

AKSW Group - University of Leipzig: AKSW Colloquium, 01.02.2016, Co-evolution of RDF Datasets

At today's colloquium, Natanael Arndt will discuss the paper “Co-evolution of RDF Datasets” by Sidra Faisal, Kemele M. Endris, Saeedeh Shekarpour and Sören Auer (2016, available on arXiv).

Link: http://arxiv.org/abs/1601.05270v1

Abstract: For many use cases it is not feasible to access RDF data in a truly federated fashion. For consistency, latency and performance reasons data needs to be replicated in order to be used locally. However, both a replica and its origin dataset undergo changes over time. The concept of co-evolution refers to mutual propagation of the changes between a replica and its origin dataset. The co-evolution process addresses synchronization and conflict resolution issues. In this article, we initially provide formal definitions of all the concepts required for realizing co-evolution of RDF datasets. Then, we propose a methodology to address the co-evolution of RDF datasets. We rely on a property-oriented approach for employing the most suitable strategy or functionality. This methodology was implemented and tested for a number of different scenarios. The result of our experimental study shows the performance and robustness aspect of this methodology.
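To make the propagation step described in the abstract more concrete, here is a minimal sketch of applying a changeset of added and deleted triples to a replica and its origin, with a naive conflict check. This is an illustration only, not the method from the paper; the Changeset class and the conflict rule are assumptions made for this example.

```python
# Minimal sketch of changeset propagation between an origin dataset and a
# replica. The Changeset representation and the conflict rule are illustrative
# assumptions, not the methodology from the paper.

class Changeset:
    def __init__(self, added=None, deleted=None):
        self.added = set(added or [])      # triples to add
        self.deleted = set(deleted or [])  # triples to remove

def apply_changeset(dataset: set, cs: Changeset) -> set:
    """Apply a changeset to a set of (s, p, o) triples."""
    return (dataset - cs.deleted) | cs.added

def conflicts(local: Changeset, remote: Changeset) -> set:
    """Naive conflict detection: a triple added on one side and deleted on the other."""
    return (local.added & remote.deleted) | (local.deleted & remote.added)

# Example: propagate changes in both directions unless they conflict.
origin = {("ex:alice", "foaf:name", '"Alice"')}
replica = set(origin)

remote = Changeset(added={("ex:alice", "foaf:age", '"30"')})   # change made at the origin
local = Changeset(added={("ex:alice", "foaf:mbox", '"a@b.c"')})  # change made at the replica

if not conflicts(local, remote):
    replica = apply_changeset(replica, remote)
    origin = apply_changeset(origin, local)
print(sorted(origin) == sorted(replica))  # both sides converged
```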

Posted at 14:53

AKSW Group - University of Leipzig: Holographic Embeddings of Knowledge Graphs

During the upcoming colloquium, Nilesh Chakraborty will give a short introduction to factorising RDF tensors and present a paper on “Holographic Embeddings of Knowledge Graphs”:

Holographic Embeddings of Knowledge Graphs

Authors: Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio
Abstract: Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. In extensive experiments we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction in knowledge graphs and relational learning benchmark datasets.
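For readers unfamiliar with the compositional operator mentioned in the abstract, circular correlation can be computed efficiently via the FFT. The sketch below illustrates the HolE scoring idea in NumPy with random placeholder embeddings; it shows only the operator and the score, not the training procedure from the paper.

```python
# Sketch of the HolE compositional operator: circular correlation of entity
# embeddings, scored against a relation embedding. Embeddings are random
# placeholders; this illustrates the operator, not the training procedure.
import numpy as np

def circular_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # [a * b]_k = sum_i a_i * b_{(i+k) mod d}, computed via FFT in O(d log d).
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def hole_score(e_s: np.ndarray, e_o: np.ndarray, r_p: np.ndarray) -> float:
    """Sigmoid of r_p . (e_s * e_o) as a plausibility score for a triple (s, p, o)."""
    x = r_p @ circular_correlation(e_s, e_o)
    return 1.0 / (1.0 + np.exp(-x))

d = 64
rng = np.random.default_rng(0)
e_s, e_o, r_p = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
print(hole_score(e_s, e_o, r_p))
```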

Posted at 13:32

January 25

AKSW Group - University of Leipzig: AKSW Colloquium, 25.01.2016, LargeRDFBench and Introduction To The Docker Ecosystem

At the upcoming colloquium, Muhammad Saleem will present his paper “LargeRDFBench: A Billion Triples Benchmark for SPARQL Endpoint Federation” about the benchmarking of federated SPARQL endpoints. The other talk will be an introduction to the Docker ecosystem by Tim Ermilov.

LargeRDFBench: A Billion Triples Benchmark for SPARQL Endpoint Federation

Authors: Muhammad Saleem, Ali Hasnain, Axel Ngonga
Abstract. Gathering information from the Web of Data is commonly carried out by using SPARQL query federation approaches. However, the fitness of current SPARQL query federation approaches for real applications is difficult to evaluate with current benchmarks as they are either synthetic, too small in size and complexity or do not provide means for a fine-grained evaluation. We propose LargeRDFBench, a billion-triple benchmark for SPARQL query federation which encompasses real data as well as real queries pertaining to real bio-medical use cases. We evaluate state-of-the-art SPARQL endpoint federation approaches on this benchmark with respect to their query runtime, triple pattern-wise source selection, result completeness and correctness. Our evaluation results suggest that the performance of current SPARQL query federation systems on simple queries (in terms of total triple patterns, query result set sizes, execution time, use of SPARQL features etc.) does not reflect the systems’ performance on more complex queries. Moreover, current federation systems seem unable to deal with real queries that involve processing large intermediate result sets or lead to large result sets.
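For context, the benchmark targets SPARQL endpoint federation, i.e. queries that combine data from several endpoints. The sketch below is only an illustration of what such a query looks like, not a LargeRDFBench query: it issues a SPARQL 1.1 SERVICE query from Python via SPARQLWrapper, and the endpoints, vocabulary and query are placeholder assumptions (whether SERVICE is honoured depends on the endpoint's configuration).

```python
# Hedged illustration of a federated SPARQL query of the kind such benchmarks
# target. Endpoints and query are assumptions, not part of LargeRDFBench.
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?label WHERE {
  ?city a dbo:City ;
        owl:sameAs ?wd .
  FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
  SERVICE <https://query.wikidata.org/sparql> {   # second endpoint in the federation
    ?wd rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }
}
LIMIT 5
"""

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["city"]["value"], row["label"]["value"])
```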

Introduction To The Docker Ecosystem

Presented by: Tim Ermilov
Slides are available online.

Each talk will last for 20 minutes. The audience will have 10 minutes to ask questions. There will be cookies and a coffee break after the talks for discussion as well.

 

Posted at 13:39

January 22

AKSW Group - University of Leipzig: HOBBIT project kick-off

HOBBIT, a new InfAI project within the EU’s “Horizon 2020” framework program, kicked off in Luxembourg on 18 and 19 January 2016.

The main goal of the HOBBIT project (@hobbit_project on Twitter) is to benchmark linked and big data systems and assess their performance using industry-relevant key performance indicators. To achieve this goal, the project develops 1) a holistic open-source platform and 2) eight industry-grade benchmarks for systems addressing different parts of the linked data lifecycle. These benchmarks will contain datasets based on industry-related, real-world data and can be scaled up to evaluate even Big Data solutions.

Our partners in this project are iMinds, AGT Group R&D GmbH, Fraunhofer IAIS, USU Software AG, Foundation for Research & Technology – Hellas (FORTH), National Center for Scientific Research “Demokritos” (NCSR), OpenLink Software, TomTom and Ontos AG.

Please continue reading the press release “New EU project develops a platform for benchmarking large linked datasets” by the University of Leipzig Press. The text is also available in English.

Find out more at http://project-hobbit.eu/ and by following us (@hobbit_project) on Twitter.

This project has received funding from the European Union’s H2020 research and innovation action program under grant agreement number 688227.


Posted at 13:52

January 17

Bob DuCharme: The past and present of hypertext

You know, links in the middle of sentences.

Posted at 15:58

January 14

AKSW Group - University of Leipzig: AKSW Colloquium, 18.01.2016, Natural Language Processing and Question Answering

At the upcoming colloquium, Ivan Ermilov and Konrad Höffner, members of AKSW, will present two papers from the natural language processing (NLP) and Question Answering (QA) research areas.

ClausIE: Clause-Based Open Information Extraction

Authors. Del Corro, Luciano, and Rainer Gemulla.
Abstract. We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of “useful” pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on noisy text as found in the web.
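ClausIE itself is not reproduced here, but the basic idea of extracting relations from a dependency parse can be illustrated with a much simpler sketch. The code below uses spaCy only as a stand-in dependency parser to pull naive (subject, verb, object) tuples; it omits the clause detection, clause typing and linguistic rules that ClausIE actually relies on.

```python
# Very simplified illustration of dependency-based open information extraction.
# This is NOT ClausIE: it only pulls naive (subject, verb, object) tuples and
# skips clause detection and typing entirely.
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def naive_extractions(text):
    doc = nlp(text)
    for sent in doc.sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
                for s in subjects:
                    for o in objects:
                        yield (s.text, token.lemma_, o.text)

print(list(naive_extractions("Bell makes electronic products.")))
# e.g. [('Bell', 'make', 'products')]
```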

A Joint Model for Question Answering over Multiple Knowledge Bases

Authors. Zhang, Yuanzhe, et al.
Abstract. As the amount of knowledge bases (KBs) grows rapidly, the problem of question answering (QA) over multiple KBs has drawn more attention. The most significant distinction between multiple KB-QA and single KB-QA is that the former must consider the alignments between KBs. The pipeline strategy first constructs the alignments independently, and then uses the obtained alignments to construct queries. However, alignment construction is not a trivial task, and the introduced noises would be passed on to query construction. By contrast, we notice that alignment construction and query construction are interactive steps, and jointly considering them would be beneficial. To this end, we present a novel joint model based on integer linear programming (ILP), uniting these two procedures into a uniform framework. The experimental results demonstrate that the proposed approach outperforms state-of-the-art systems, and is able to improve the performance of both alignment construction and query construction.
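To give a flavour of what a joint ILP formulation looks like (this is not the model from the paper), the toy sketch below couples alignment decisions and query-construction decisions in a single optimization using PuLP; all candidates, scores and constraints are made-up placeholders.

```python
# Toy sketch of a joint ILP in the spirit described above: alignment decisions
# and query-construction decisions are optimized together rather than in a
# pipeline. Scores, candidates and constraints are made-up placeholders.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary, LpStatus, value

# Candidate alignments between predicates of two KBs, with confidence scores.
alignments = {("kb1:birthPlace", "kb2:placeOfBirth"): 0.9,
              ("kb1:birthPlace", "kb2:residence"): 0.4}
# Candidate query triple patterns, each usable only if a given alignment is chosen.
query_parts = {"?x kb2:placeOfBirth ?y": ("kb1:birthPlace", "kb2:placeOfBirth"),
               "?x kb2:residence ?y": ("kb1:birthPlace", "kb2:residence")}

prob = LpProblem("joint_alignment_and_query", LpMaximize)
a = {k: LpVariable(f"a_{i}", cat=LpBinary) for i, k in enumerate(alignments)}
q = {k: LpVariable(f"q_{i}", cat=LpBinary) for i, k in enumerate(query_parts)}

# Objective: favour high-confidence alignments and query parts that use them.
prob += lpSum(alignments[k] * a[k] for k in a) + lpSum(q[k] for k in q)

# Each source predicate maps to at most one target predicate.
prob += lpSum(a[k] for k in a if k[0] == "kb1:birthPlace") <= 1
# A query part may only be selected if its alignment is selected (the coupling).
for part, needed in query_parts.items():
    prob += q[part] <= a[needed]

prob.solve()
print(LpStatus[prob.status],
      [k for k in a if value(a[k]) == 1],
      [k for k in q if value(q[k]) == 1])
```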

Each talk will last for 20 minutes. The audience will have 10 minutes to ask questions. There will be cookies and a coffee break after the talks for discussion as well.

Posted at 09:25

January 13

AKSW Group - University of Leipzig: LinkedGeoData: New RDF versions of OpenStreetMap datasets available

The AKSW research group is happy to announce that a new LinkedGeoData maintenance release with more than 1.2 billion triples based on the OpenStreetMap planet file from 2015-11-02 is now online. Enjoy!


Posted at 15:07

January 11

Orri Erling: New Semantic Publishing Benchmark Record

There is a new SPB (Semantic Publishing Benchmark) 256 Mtriple record with Virtuoso.

As before, the result has been measured with the feature/analytics branch of the v7fasttrack open source distribution, and it will soon be available as a preconfigured Amazon EC2 image. The updated benchmarks AMI with this version of the software will be out there within the next week, to be announced on this blog.

On the Cost of RDF Query Optimization

RDF query optimization is harder than the relational equivalent; first, because there are more joins, hence an NP-complete explosion of plan search space, and second, because cardinality estimation is harder and usually less reliable. The work on characteristic sets, pioneered by Thomas Neumann in RDF3X, uses regularities in structure for treating properties usually occurring in the same subject as columns of a table. The same idea is applied for tuning physical representation in the joint Virtuoso / MonetDB work published at WWW 2015.

The Virtuoso results discussed here, however, are all based on a single RDF quad table with Virtuoso's default index configuration.

Introducing query plan caching raises the Virtuoso score from 80 qps to 144 qps at the 256 Mtriple scale. The SPB queries are not extremely complex; lookups with many more triple patterns exist in actual workloads, e.g., Open PHACTS. In such applications, query optimization indeed dominates execution times. In SPB, data volumes touched by queries grow near linearly with data scale. At the 256 Mtriple scale, nearly half of CPU cycles are spent deciding a query plan. Below are the CPU cycles for execution and compilation per query type, sorted by descending sum of the times, scaled to milliseconds per execution. These are taken from a one minute sample of running at full throughput.

The test system is the same as used before in the TPC-H series: dual Xeon E5-2630 (Sandy Bridge), 2 x 6 cores x 2 threads, 2.3 GHz, 192 GB RAM.

We measure the compile and execute times, with and without using hash join. When considering hash join, the throughput is 80 qps. When not considering hash join, the throughput is 110 qps. With query plan caching, the throughput is 145 qps whether or not hash join is considered. Using hash join is not significant for the workload but considering its use in query optimization leads to significant extra work.

With hash join

Compile     Execute     Total       Query
3156 ms     1181 ms     4337 ms     Total
1327 ms       28 ms     1355 ms     query 01
 444 ms      460 ms      904 ms     query 08
 466 ms       54 ms      520 ms     query 06
 123 ms      268 ms      391 ms     query 05
 257 ms        5 ms      262 ms     query 11
 191 ms       59 ms      250 ms     query 10
   9 ms      179 ms      188 ms     query 04
 114 ms       26 ms      140 ms     query 07
  46 ms       62 ms      108 ms     query 09
  71 ms       25 ms       96 ms     query 12
  61 ms       13 ms       74 ms     query 03
  47 ms        2 ms       49 ms     query 02

Without hash join

Compile     Execute     Total       Query
1816 ms     1019 ms     2835 ms     Total
 197 ms      466 ms      663 ms     query 08
 609 ms       32 ms      641 ms     query 01
 188 ms      293 ms      481 ms     query 05
 275 ms       61 ms      336 ms     query 09
 163 ms       10 ms      173 ms     query 03
 128 ms       38 ms      166 ms     query 10
 102 ms        5 ms      107 ms     query 11
  63 ms       27 ms       90 ms     query 12
  24 ms       57 ms       81 ms     query 06
  47 ms        1 ms       48 ms     query 02
  15 ms       24 ms       39 ms     query 07
   5 ms        5 ms       10 ms     query 04

Considering hash join always slows down compilation, and sometimes improves and sometimes worsens execution. Some improvement in cost-model and plan-space traversal-order is possible, but altogether removing compilation via caching is better still. The results are as expected, since a lookup workload such as SPB has little use for hash join by nature.
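As a rough illustration of what plan caching amounts to (this is not Virtuoso's implementation), the optimizer can be memoized on a normalized form of the query text, so that repeated executions skip compilation entirely:

```python
# Minimal sketch of a query plan cache: memoize the (expensive) optimizer on a
# normalized form of the query text so repeated executions skip compilation.
# compile_plan and execute_plan are placeholders for the real engine hooks.
import re

_plan_cache = {}

def normalize(query: str) -> str:
    # Collapse whitespace; a real cache would also parameterize literals.
    return re.sub(r"\s+", " ", query.strip())

def get_plan(query: str, compile_plan):
    key = normalize(query)
    plan = _plan_cache.get(key)
    if plan is None:
        plan = compile_plan(query)   # expensive: cost model, join ordering, ...
        _plan_cache[key] = plan
    return plan

def run(query: str, compile_plan, execute_plan):
    return execute_plan(get_plan(query, compile_plan))
```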

The rationale for considering hash join in the first place is that analytical workloads rely heavily on this. A good TPC-H score is simply unfeasible without this as previously discussed on this blog. If RDF is to be a serious contender beyond serving lookups, then hash join is indispensable. The decision for using this however depends on accurate cardinality estimates on either side of the join.

Previous work (e.g., papers from FORTH around MonetDB) advocates doing away with a cost model altogether, since one is hard and unreliable with RDF anyway. The idea is not without its attraction but will lead to missing out on analytics or to relying on query hints for hash join.

The present Virtuoso thinking is that going to rule based optimization is not the preferred solution, but rather using characteristic sets for reducing triples into wider tables, which also cuts down on plan search space and increases reliability of cost estimation.
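A characteristic set is simply the set of properties that co-occur on a subject; subjects sharing a characteristic set behave like rows of one wide table, which is the basis for the physical-design idea above. A minimal sketch of computing characteristic sets from a list of triples (illustrative only, not Virtuoso's implementation):

```python
# Sketch of computing characteristic sets: group subjects by the exact set of
# properties they carry.
from collections import defaultdict

def characteristic_sets(triples):
    props_by_subject = defaultdict(set)
    for s, p, o in triples:
        props_by_subject[s].add(p)
    groups = defaultdict(list)
    for s, props in props_by_subject.items():
        groups[frozenset(props)].append(s)
    return groups

triples = [("ex:a", "foaf:name", '"A"'), ("ex:a", "foaf:age", '"30"'),
           ("ex:b", "foaf:name", '"B"'), ("ex:b", "foaf:age", '"25"'),
           ("ex:c", "foaf:name", '"C"')]
for cs, subjects in characteristic_sets(triples).items():
    print(sorted(cs), "->", subjects)
# {name, age} -> [ex:a, ex:b]   (one "wide table"),  {name} -> [ex:c]
```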

When looking at execution alone, we see that actual database operations are low in the profile, with memory management taking the top 19%. This is due to CONSTRUCT queries allocating small blocks for returning graphs, which is entirely avoidable.

Posted at 15:22

Dublin Core Metadata Initiative: Webinar: Creating Content Intelligence: Harmonized Taxonomy and Metadata in the Enterprise Context

2016-01-11, Many organizations have content dispersed across multiple independent repositories, often with a real lack of metadata consistency. The attention given to enterprise data is often not extended to unstructured content, widening the gap between the two worlds and making it near impossible to provide accurate business intelligence, good user experience, or even basic findability. How do you bring all those disparate efforts together to create content intelligence across the organization? This webinar will describe the benefits and challenges in developing metadata and taxonomy across multiple functional areas, creating a unified Enterprise Content Architecture (ECA). Hear Stephanie Lemieux, President and Primary Consultant at Dovecot Studio, talk about real enterprise metadata and taxonomy harmonization projects in different contexts, including a greeting card company, a media company, an automotive manufacturer and a consumer food manufacturer. See how they worked to harmonize across a number of diverse systems that supported multiple functions, from creative processes to manufacturing to reporting. For more information about the webinar and to register, go to http://dublincore.org/resources/training/#2016lemieux.

Posted at 08:06

Dublin Core Metadata Initiative: DC-2016 Call for Participation published

2016-01-11, The DC-2016 Call for Participation (CfP) has been published. DC-2016 will take place in Copenhagen and will be collocated with the ASIS&T Annual Meeting. The conference program will include a Technical Program of peer reviewed papers, project reports, and posters tracks. The Professional Program will include special sessions and panels, tutorials, workshops and best practice posters & demonstrations tracks. The Conference Committee is seeking submissions in all tracks. The CfP is published at http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/cfp.

Posted at 08:06

Dublin Core Metadata Initiative: Valentine Charles and Lars G. Svensson named DC-2016 Technical Program Co-Chairs

2016-01-11, DCMI is pleased to announce that Valentine Charles, Europeana, and Lars G. Svensson, Deutsche Nationalbibliothek, have agreed to serve as Co-Chairs of the DC-2016 Technical Program. In their capacity as Co-Chairs, Valentine and Lars will oversee the peer review processes for the conference. The DC-2016 Conference website is open at http://dcevents.dublincore.org/IntConf/dc-2016 and the Call for Participation has been published at http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/cfp.

Posted at 08:06

January 01

W3C Read Write Web Community Group: Read Write Web — Q4 Summary — 2015

Summary

A quiet end to the year in terms of discussions, but lots of work going on in implementations.  Perhaps this is a sign that read write standards for the web are entering a maturation process and 2016 will be a year of using them to see what they can do.

Many of the participants of this group attended TPAC 2015 in Sapporo, Japan, and by all accounts it was a very exciting experience, with the W3C moving towards working groups in Payments, among other things.  A good wrap-up of W3C data activity covered data on the web best practices, spatial data and CSV.

Most of the activity I noticed this quarter was oriented towards the maturing Solid specification, which has been reorganized into logical sections (thanks Amy!) and which I’ve presented as a small gitbook.

Communications and Outreach

Presentations on read write standards were given at the redecentralize conference.  One video is available here and all the talks on the Solid platform are now in a github archive.

 

Community Group

It was a quiet quarter on the mailing list, as I think more people are devoting time to implementations; perhaps 2016 could be a good chance for feedback as to which items the group would like to focus on.  There was also one interesting post on open badges.


Applications

Work on applications has picked up considerably this quarter, with much of the focus on client-side JavaScript apps for the Solid platform.  A great new library for building Solid apps is now available, called solid.js.  Additionally, I have tried to put together some basic tutorials for getting started with apps.

Two great apps built using this library are solid-inbox and plume.  Solid-inbox lets you see the items in your “inbox”, an area that users have designated for notifications.  Plume, a pet blogging project, allows you to create rudimentary blogs using Solid standards.

A new tool for writing Solid documents, dokie.li, is progressing well.  Originally designed to author academic papers, it is becoming more generic to allow any kind of document to be authored using linked data.

Signup and identity providers have been improved and it is now possible to add your own identity provider to create a more diverse system of decentralized web identity.  This can be done by running one of the Solid servers on your own machine, or by creating your own fork!

More great work from OpenLink in the form of the structured data sniffer, which turns much of the existing web into structured data.

I’ve personally been working on a proof-of-concept, alpha version of a social network that implements Solid, called solid.social.  Some rough notes also accompany the site, providing some screenshots and hopefully an idea of the direction things can go.  Other than that, some basic command line utilities for reading and writing were also written in the form of rdf-shell.


Last but not Least…

A new startup supporting decentralized read write technology, co-operating systems, was launched.  From the site:

“We envision a web where we can use applications tailor-made for each of us. These applications will navigate linked data seamlessly across organisational boundaries. They will allow us to choose where to host our information, with whom we share it, and how we identify ourselves. This will create a distributed social web which will foster innovative ideas, help transform them into projects and allow us to share resources securely.”

Looking forward to further updates on this work!

Posted at 21:33

December 30

Libby Miller: Olimex ESP 8266 dev with Arduino IDE

Bits and pieces of this are everywhere but I’ve not found it all in one place. The

Posted at 20:27

December 28

Libby Miller: A cheap BTLE button with Android

Thanks to the marvellous 

Posted at 12:38

December 20

Bob DuCharme: My new job

Lots of cutting edge technologies, 18 minutes from my home.

Posted at 14:27

December 18

Leigh Dodds: Digital public institutions for the information commons?

I’ve been thinking a bit about “the commons” recently. Specifically, the global information commons that is enabled and supported by Creative Commons (CC) licences. This covers an increasingly wide variety of content as

Posted at 18:58

December 17

W3C Data Activity: End of Year Bonanza!

Three of our data-centric Working Groups have rounded off their year and published new documents today. First of all, congratulations are due to the CSV on the Web Working Group whose work has reached Recommendation status. That means they have … Continue reading

Posted at 17:09

Ebiquity research group UMBC: UCO: A Unified Cybersecurity Ontology

Unified Cybersecurity Ontology

Zareen Syed, Ankur Padia, Tim Finin, Lisa Mathews and Anupam Joshi, UCO: Unified Cybersecurity Ontology, AAAI Workshop on Artificial Intelligence for Cyber Security (AICS), February 2016.

In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia which serves as the core for general knowledge in Linked Open Data cloud, we envision UCO to serve as the core for cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.

Posted at 02:01

December 13

AKSW Group - University of Leipzig: AKSW Colloquium, 14-12-2015, SERIMI

In the upcoming AKSW Colloquium, scheduled for the 14th of December at 3 PM, Mofeed Hassan will present the paper “SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets” by Samur Araujo et al.

Abstract

State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation depending on direct matching, which involves a direct comparison of instances in the source with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. Aiming at resolving this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show our approach greatly improves the quality of state-of-the-art systems; especially on difficult matching tasks.
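As a rough sketch of the class-based refinement idea (not SERIMI's actual algorithm), one can build a property profile of the class of interest from target-side data alone and discard candidates whose descriptions overlap too little with it; the profile, scoring and threshold below are assumptions made purely for illustration.

```python
# Toy sketch of class-based candidate refinement: build a property profile of
# the class of interest from target-side data only, then drop candidates whose
# properties overlap too little with that profile. Not SERIMI's actual scoring.
from collections import Counter

def class_profile(target_triples, class_members):
    """Relative frequency of properties used by known members of the class of interest."""
    counts = Counter(p for s, p, o in target_triples if s in class_members)
    total = max(sum(counts.values()), 1)
    return {p: c / total for p, c in counts.items()}

def refine(candidates, target_triples, profile, threshold=0.3):
    """Keep only candidates whose property set matches the class profile well enough."""
    kept = []
    for cand in candidates:
        props = {p for s, p, o in target_triples if s == cand}
        score = sum(profile.get(p, 0.0) for p in props)
        if score >= threshold:
            kept.append((cand, score))
    return kept
```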

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 21:18

Copyright of the postings is owned by the original blog authors. Contact us.