Planet RDF

It's triples all the way down

July 23

Leigh Dodds: Reputation data portability

Yesterday I went to the

Posted at 11:24

July 18

AKSW Group - University of Leipzig: AKSW Colloquium, 18.07.2016, AEGLE and node2vec

On Monday 18.07.2016, Kleanthi Georgala will give her Colloquium presentation on her paper “An Efficient Approach for the Generation of Allen Relations”, which was accepted at the European Conference on Artificial Intelligence (ECAI) 2016.


Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 1.

Afterwards, Tommaso Soru will present a paper considered the latest chapter of the Everything-2-Vec saga, which encompasses outstanding works such as Word2Vec and Doc2Vec. The paper, “node2vec: Scalable Feature Learning for Networks” [PDF] by Aditya Grover and Jure Leskovec, was accepted for publication at the International Conference on Knowledge Discovery and Data Mining (KDD), 2016 edition.

Posted at 12:56

July 11

Dublin Core Metadata Initiative: FINAL call for DC-2016 Presentations and Best Practice Posters/Demos

2016-07-11, The submission deadline of 15 July is rapidly approaching for the Presentations and Best Practice Posters and Demos tracks at DC-2016. Both presentations and posters/demos provide the opportunity for practitioners and researchers specializing in metadata design, implementation, and use to present their work at the International Conference on Dublin Core and Metadata Applications in Copenhagen. No paper is required for presentations or posters/demos. Accepted submissions in the Presentations track will have approximately 20-25 minutes to present and 5-10 minutes for questions and discussion. Proposal abstracts will be reviewed for selection by the Program Committee. The presentation slide decks and the poster images will be openly available as part of the permanent record of the DC-2016 conference. If you are interested in presenting at DC-2016, please submit a proposal abstract through the DC-2016 submission system before the 15 July deadline.

Posted at 23:59

Dublin Core Metadata Initiative: Dublin Core at 21

2016-07-11, Announcing an IFLA Satellite event: Friday, August 19 at OCLC in Dublin, OH. The Dublin Core originated in 1995 at a meeting at OCLC (in the very room where this IFLA Satellite event will take place). This special event will offer a historical view of DCMI through people who were there when the Web was young and Dublin Core was new and evolving rapidly. Presenters will include metadata experts with long ties to Dublin Core, including several who were at the original invitational meeting in 1995. A panel discussion will permit speakers to reflect on activities and trends past and present, and to project what the future will look like. Attendees are invited to attend a complimentary reception and special unveiling following the presentation portion of the day. The event is sponsored by the IFLA Information Technology Section and the Dublin Core Metadata Initiative, with support from OCLC.

Posted at 23:59

July 10

Egon Willighagen: Setting up a local SPARQL endpoint

... has never been easier, and I have to say, with Virtuoso it already was easy.

Step 1: download the jar and fire up the server
OK, you do need Java installed, which for many is still the case, despite Oracle doing their very best to totally ruin it for everyone. But seriously, visit the Blazegraph website (@blazegraph), download the jar, and type:

$ java -jar blazegraph.jar

It will give some output on the console, including the address of a webpage with a SPARQL endpoint, an upload form, etc.

That it tracks past queries is a nice extra.

Step 2: there is no step two

Step 3: OK, OK, you also want to try a SPARQL from the command line
Now, I have to say, the webpage of the SPARQL endpoint does not have a "Download CSV" button. That would be great to have, but doing it from the command line is not too hard either.

$ curl -i -H "Accept: text/csv" --data-urlencode \

But it would be nice if you would not have to copy/paste the query into a file, or go to the command line in the first place. Also, I had some trouble finding the correct SPARQL endpoint URL, as it seems to have changed at least twice in recent history, given the (outdated) documentation I found online (common problem; no complaint!).
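For reference, a complete version of that command would look something like the sketch below. The query file name (query.rq) and the endpoint path are my assumptions; as noted above, the endpoint URL has changed between Blazegraph versions, so check the console output of the server for the current one.

```shell
# POST a SPARQL query from a local file and ask for the results as CSV.
# /blazegraph/sparql is an assumed default; older releases used /bigdata/sparql.
curl -i -H "Accept: text/csv" \
     --data-urlencode query@query.rq \
     http://localhost:9999/blazegraph/sparql
```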

HT to Andra who first mentioned Blazegraph to me, and the Blazegraph team.

Posted at 18:57

July 02

Libby Miller: Working from home

A colleague asked me about my experiences working from home so I’ve made a few notes here.

I’m unusual in my department in that I work from home three or four days a week, and one or two in London, or very occasionally Salford. I started off in this job on an EU-funded project where everyone was remote, and so it made little difference where I was physically as long as we synced up regularly. Since then I’ve worked on multiple other projects where the other participants are mostly in one place and I’m elsewhere. That’s made it more difficult, but also, sometimes, better.

A buddy

Where everyone else is in one place, the main thing I need to function well is one or more buddies who are physically there, who remember to call me in for meetings and let me know anything significant that’s happening that I’m missing because I’m not physically there. The first of these is the most important. Being remote you are easily forgettable. Without Andrew, Dan, Joanne, Tristan, and now Henry and Tim, I’d sometimes be left out.

IRC or slack

I’ve used IRC for years for various remote things (we used to do “scheduled topic chats” 15 years ago on freenode for various Semantic Web topics), along with the various bots that keep you informed and help you share information easily – loggers and @Edd’s “chump” in particular, but also #swhack bots of many interesting kinds. I learned a huge amount from friends at W3C, who are mostly remote from each other and have made lots of tools and bots to help manage conference calls over many years.

Recently our team has started using Slack as well as IRC, so now I’m on both. Slack means that a much more diverse set of people are happy to participate, which is great. It can be very boring working on your own, and these channels make for a sense of community, as well as being useful for specific, timely exchanges of information.

Lots of time on organisation

I spend a lot of time figuring out where I need to be and making decisions about what’s most important, and what needs to be face to face and what can be a call. Also: trying to figure out how annoying I’m going to be to the other people in a meeting, and whether I’m going to be able to contribute successfully, or whether it’s best to skip it. I’ve had to learn to ignore the fomo.

I have a text based todo list, which can get a little out of control, but in general has high level goals for this week and next, goals for the day, as well as specific tasks that need to be done on any particular day or a particular time. I spend a little time each morning figuring these out, and making sure I have a good sense of my calendar (Dan Connolly taught me to do this!). In general, juggling urgent and project-managery and less-urgent exploratory work is difficult and I probably don’t do enough of the latter (and I probably don’t look far enough ahead, either). I sometimes schedule my day quite concretely with tasks at specific times to make sure I devote thinking time for specific problems, or when I have a ton to do, or a lot of task switching.

Making an effort not to work

Working at home means I could work any time, and having an interesting job means that I’d probably quite enjoy it, too. There’s a temptation to do the boring admin stuff in work and leave the fun stuff until things are quieter in the evenings or at the weekend. But I make an effort not to do this, and it helps that the team I work in don’t work late or at weekends. This is a good thing. We need downtime or we’ll get depleted (I did in my last job, a startup, where I also worked at home most of the time, and where we were across multiple timezones).

Weekends are fairly easy to not work in, evenings are harder, so I schedule other things to do where possible (Bristol Hackspace, cinema, watching something specific on TV, other technical personal projects).

Sometimes you just have to be there

I’m pretty good at doing meetings remotely, but we do a lot of workshops which involve getting up and doing things, writing things down on whiteboards etc. I also chair a regular meeting that I feel works better if I’m there. When I need to be there a few days, I’m lucky enough to be able to stay with some lovely friends, which means it’s a pleasure rather than annoying and boring to not be at home.

What I miss and downsides

What I miss is the unscheduled time working or just hanging out with people. When I’m in London my time is usually completely scheduled, which is pretty knackering. Socialising gets crammed into short trips to the pub. The commute means I lose my evening at least once a week and sometimes arrive at work filled with train-rage (I guess the latter is normal for anyone who commutes by rail).

Not being in the same place as everyone else day to day means that I miss some of the up- and down-sides of being physically there, which are mostly about spontaneity: I never get included in ad-hoc meetings, so I have more time to concentrate but also miss some interesting things; I don’t get distracted by fun or not-fun things, including bad moods in the organisation and gossip, but also impromptu games, fun trips out etc.

And finally…

For me, working from home in various capacities has given me opportunities I’d never have had, and I’m very lucky to be able to do it in my current role.

Posted at 17:11

Egon Willighagen: Two Apache Jena SPARQL query performance observations

Doing searches in RDF stores is commonly done with SPARQL queries. I have been using this with the semantic web translation of WikiPathways by Andra to find common content issues, sometimes combined with some additional Java code. For example, to find PubMed identifiers that are not numbers.

Based on Ryan's work on interactions, a more complex curation query I recently wrote, in reply to issues that Alex ran into when converting pathways to BioPax, finds interactions that convert a gene into another gene. Such interactions occurred in WikiPathways because graphically you do not see the difference. I originally had this query:

SELECT (str(?organismName) as ?organism) ?page
       ?gene1 ?gene2 ?interaction WHERE {
  ?gene1 a wp:GeneProduct .
  ?gene2 a wp:GeneProduct .
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf ?pathway .
  ?pathway foaf:page ?page ;
    wp:organismName ?organismName .
} ORDER BY ASC(?organism)
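For readers who want to try these queries, note that the excerpts omit the prefix declarations. Something along the following lines is assumed; the wp namespace IRI is my assumption based on the WikiPathways vocabulary, while dcterms and foaf are the standard namespaces:

```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
```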

This query properly found all gene-gene conversions to be fixed. However, it was also horribly slow in my JUnit/Apache Jena set up, while the same query runs very efficiently on the Virtuoso-based SPARQL endpoint. I had tried to speed it up in the past, but without much success. Instead, I ended up batching the testing on our Jenkins instance. But this got a bit silly, with at some point subsets of less than 100 pathways.

Observation #1
So, I turned to Twitter, and quite soon got three useful leads. The first two suggestions did not solve the problem, but helped me rule out possible causes. Of course, there is literature about optimizing SPARQL, like this recent paper by Antonis (doi:10.1016/j.websem.2014.11.003), but I haven't been able to convert that knowledge into practical steps either. After ruling out these options (though I kept the sameTerm() suggestion), I realized it had to be the first two triple patterns, the ones with the variables ?gene1 and ?gene2. So, I tried using FILTER there too, resulting in this query:

SELECT (str(?organismName) as ?organism) ?page
       ?gene1 ?gene2 ?interaction WHERE {
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf ?pathway .
  ?pathway foaf:page ?page ;
    wp:organismName ?organismName .
  FILTER (!sameTerm(?gene1, ?gene2))
  FILTER EXISTS { ?gene1 a wp:GeneProduct }
  FILTER EXISTS { ?gene2 a wp:GeneProduct }
} ORDER BY ASC(?organism)

That did it! The time to run the query halved. Not so surprising, in retrospect, but it all depends on the SPARQL engine and which parts of the query it runs first. Apparently, Jena's SPARQL engine starts at the top. This seems to be confirmed by the third comment I got. However, I always understood that an engine can also start at the bottom.

Observation #2
But that's not all. This speed-up made me wonder something else. The problem clearly seems to be the engine's approach to deciding which parts of the query to run first. So, what if I remove further choices about what to run first? That leads me to a second observation: it helps significantly if you reduce the number of subgraphs the engine has to "merge" later. Instead, where possible, use property paths. That, again, about halved the runtime of the query. I ended up with the query below, which, obviously, no longer gives me access to the pathway resources, but I can live with that:

SELECT (str(?organismName) as ?organism) ?pathway
       ?gene1 ?gene2 ?interaction WHERE {
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf/foaf:page ?pathway ;
    dcterms:isPartOf/wp:organismName ?organismName .
  FILTER (!sameTerm(?gene1, ?gene2))
  FILTER EXISTS {?gene1 a wp:GeneProduct}
  FILTER EXISTS {?gene2 a wp:GeneProduct}
} ORDER BY ASC(?organism)

I'm hoping these two observations may help others who use Apache Jena for unit and integration testing of RDF generation too.

Loizou, A., Angles, R., Groth, P., Mar. 2015. On the formulation of performant SPARQL queries. Web Semantics: Science, Services and Agents on the World Wide Web 31, 1-26. doi:10.1016/j.websem.2014.11.003

Posted at 13:07

July 01

W3C Read Write Web Community Group: Read Write Web — Q2 Summary — 2016


Decentralization is becoming more and more of a theme on the web, and this quarter witnessed the Decentralized Web Summit in San Francisco. Keynotes from Tim Berners-Lee and Vint Cerf are definitely worth checking out.

Some interesting work is coming up, as a Verified Claims Working Group has been proposed. The editor's draft is available for review.

In the Community Group there has been some discussion, but the main focus is on apps; a specification for Linked Data Notifications has also been started.

Communications and Outreach

Aside from the Decentralized Web Summit, some folks attended the ID2020 summit, which aims to help provide legal identifiers to everyone by 2030. There was much interest there in the idea of decentralized identifiers.

There was also a session at WWW 2016 entitled Building Decentralized Applications on the Social Web.


Community Group

A new spec has been created called “Linked Data Notifications”. The system combines the concept of an inbox with the idea of sending notifications to users. Please feel free to take a look at this work, provide feedback, or raise issues. There has been some light discussion on the mailing list, and a few apps and demos have been released. More below!



Lots of application work is going on in the solid and linkeddata GitHub repositories. Solid.js has been renamed to solid-client, and provides lots of useful features for dealing with Solid servers.

The Solid connections UI is an app to help manage your social graph.  Work has begun on an improved solid profile manager. A new signup widget is also being worked on.

The tabulator project has been modularized and split up into various components and apps (aka panes), so that anyone can build and skin a “data browser” capable of working together with web 3.0. Lots more interesting work is happening in the individual repos.

On the economic side, I have released version 0.1 of webcredits, which provides a linked data based ledger for apps, together with a demo of the functionality built on top. I'm also happy to report that I have got the Solid node server running on my phone, and it performs really well!


Last but not Least…

Nicola Greco has collected a cool set of papers, described as a “reading list”, of articles and papers relevant to the work of Solid. You can drop your own paper in by adding a new issue. Happy reading!


Posted at 16:41

Semantic Web Company (Austria): PoolParty at Connected Data London

Connected Data London “brings together the Linked and Graph Data communities”.

More and more Linked Data applications are emerging in the business world, and software companies are making it part of their business plan to integrate Graph Data into their data stories and features.

MarkLogic is opening a new wave in how enterprise databases should be used, pushing past the limits of closed, rigid structures to integrate more data. Neo4j explains how you can enrich existing data and follow new connections and leads in investigations such as the Panama Papers.

No wonder the communities in different locations gather to share, exchange and network around topics like Linked Data. In London, a new conference is emerging exactly for this purpose: Connected Data London. The conference sets the stage for industry leaders and early adopters as well as researchers to present their use cases and stories. You can hear talks from multiple domains about how they put Linked Data to a good use: space exploration, financial crime, bioinformatics, publishing and more.

The conference will close with an interesting panel discussion on “How to build a Connected Data capability in your organization.” You can hear from the specialists how this task is approached. And immediately after acquiring the know-how, you will need an easy-to-use and easy-to-integrate software product to help with your Knowledge Model creation and maintenance, as well as with Text Mining and Concept Annotating.

Semantic Web Company has an experienced team of professional consultants who can help you in all the steps of implementing the acquired know-how together with PoolParty.

In our dedicated slot we present how a Connected Data application is born from a Knowledge Model, and what the steps are to get there.

Register today!

Connected Data London – London, 12th July, Holiday Inn Mayfair

Posted at 13:33

June 29

AKSW Group - University of Leipzig: AKSW Colloquium, 04.07.2016. Big Data, Code Quality.

On the upcoming Monday (04.07.2016), the AKSW group will discuss topics related to the Semantic Web and Big Data, as well as programming languages and code quality. In particular, the following papers will be presented:

S2RDF: RDF Querying with SPARQL on Spark

by Alexander Schätzle et al.
Presented by: Ivan Ermilov

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it more and more infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge as these platforms are not designed for RDF processing from ground. Thus, existing Hadoop-based approaches often favor certain query pattern shape while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses its relational interface to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state of the art SPARQL-on-Hadoop approaches using the recent WatDiv test suite. S2RDF achieves sub-second runtimes for majority of queries on a billion triples RDF graph.

A Large Scale Study of Programming Languages and Code Quality in Github

by Baishakhi Ray et al.
Presented by: Tim Ermilov

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static v.s. dynamic typing, strong v.s. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages.

Each paper will be presented in 20 minutes, followed by 10 minutes of discussion. After the talks, there is more time for discussion in smaller groups, as well as coffee and cake. The colloquium starts at 3 p.m. and is located on the 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted at 10:34

June 27

AKSW Group - University of Leipzig: Accepted Papers of AKSW Members @ Semantics 2016

This year’s SEMANTiCS conference, which is taking place September 12-15, 2016 in Leipzig, recently invited the submission of research papers on semantic technologies. Several AKSW members seized the opportunity and got their submitted papers accepted for presentation at the conference.

These are listed below:

  • Executing SPARQL queries over Mapped Document Stores with SparqlMap-M (Jörg Unbehauen, Michael Martin )
  • Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store (Natanael Arndt, Norman Radtke and Michael Martin)
  • Towards Versioning of Arbitrary RDF Data (Marvin Frommhold, Ruben Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin)
  • DBtrends: Exploring query logs for ranking RDF data (Edgard Marx, Amrapali Zaveri, Diego Moussallem and Sandro Rautenberg)
  • MEX Framework: Automating Machine Learning Metadata Generation (Diego Esteves, Pablo N. Mendes, Diego Moussallem, Julio Cesar Duarte, Maria Claudia Cavalcanti, Jens Lehmann, Ciro Baron Neto and Igor Costa)

Another AKSW-driven event at SEMANTiCS 2016 will be the Linked Enterprise Data Services (LEDS) track, taking place September 13-14, 2016. This track is organized by the BMBF-funded LEDS project, which is part of the Entrepreneurial Regions program – a BMBF Innovation Initiative for the New German Länder. The focus is on discussing, with academic and industrial partners, new approaches to discover and integrate background knowledge into business and governmental environments.

SEMANTiCS 2016 will also host the 7th edition of the DBpedia Community Meeting on the last day of the conference (September 15 – ‘DBpedia Day‘). DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.

So come and join SEMANTiCS 2016, talk and discuss with us!

More information on the program can be found here.

LEDS is funded by the BMBF as part of the Wachstumskern Region program.

Posted at 10:50

June 26

AKSW Group - University of Leipzig: AKSW Colloquium, 27.06.2016, When owl:sameAs isn’t the Same + Towards Versioning for Arbitrary RDF Data

In the next Colloquium, on June 27th at 3 PM, two papers will be presented:

When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data

André Valdestilhas will present the paper “When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data” by Halpin et al. [PDF]:

Abstract:  In Linked Data, the use of owl:sameAs is ubiquitous in interlinking data-sets. There is however, ongoing discussion about its use, and potential misuse, particularly with regards to interactions with inference. In fact, owl:sameAs can be viewed as encoding only one point on a scale of similarity, one that is often too strong for many of its current uses. We describe how referentially opaque contexts that do not allow inference exist, and then outline some varieties of referentially-opaque alternatives to owl:sameAs. Finally, we report on an empirical experiment over randomly selected owl:sameAs statements from the Web of data. This theoretical apparatus and experiment shed light upon how owl:sameAs is being used (and misused) on the Web of data.

Towards Versioning for Arbitrary RDF Data

Afterwards, Marvin Frommhold will practice the presentation of his paper “Towards Versioning for Arbitrary RDF Data” (Marvin Frommhold, Rubén Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen, and Michael Martin) [PDF], which has been accepted at the main conference of SEMANTiCS 2016 in Leipzig.

Abstract: Coherent and consistent tracking of provenance data and in particular update history information is a crucial building block for any serious information system architecture. Version Control Systems can be a part of such an architecture enabling users to query and manipulate versioning information as well as content revisions. In this paper, we introduce an RDF versioning approach as a foundation for a full featured RDF Version Control System. We argue that such a system needs support for all concepts of the RDF specification including support for RDF datasets and blank nodes. Furthermore, we placed special emphasis on the protection against unperceived history manipulation by hashing the resulting patches. In addition to the conceptual analysis and an RDF vocabulary for representing versioning information, we present a mature implementation which captures versioning information for changes to arbitrary RDF datasets.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Further information about previous and future events is available on the AKSW website. As always, Bachelor and Master students are able to get points for attendance, and there is complimentary coffee and cake after the session.

Posted at 13:46

June 25

Egon Willighagen: New Paper: "Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources"

Andra Waagmeester published a paper on his work on a semantic web version of WikiPathways (doi:10.1371/journal.pcbi.1004989). The paper outlines the design decisions, shows the SPARQL endpoint, and gives several example SPARQL queries. These include federated queries, like a mashup with DisGeNET (doi:10.1093/database/bav028) and EMBL-EBI's Expression Atlas. That results in nice visualisations like this:

If you have the relevant information in the pathway, these pathways can help a lot in understanding what is biologically going on. And, of course, they are used for exactly that a lot.

Press release
Because press releases have become an interesting tool in knowledge dissemination, I wanted to learn what it involves to get one out. This involved the people at PLOS Computational Biology and the press offices of the Gladstone Institutes and our Maastricht University (press release 1, press release 2 EN/NL). There is one thing I learned in retrospect, and I am pissed with myself that I did not think of it: you should always have a graphic supporting your story. I have been doing this for a long time in my blog now (sometimes I still forget), but did not think of it for the press release. The press release was picked up by three outlets, though all basically ran it as we presented it to them.

But what makes me appreciate this piece of work, and WikiPathways itself, is how it creates a central hub of biological knowledge. Pathway databases capture knowledge not easily embedded in generally structured (relational) databases. As such, expressing this in the RDF format seems simple enough. The thing I really love about this approach is that your queries become machine-readable stories, particularly when you start using human-readable variants of SPARQL. And you can share these queries with the online scientific community with, for example, myExperiment.

There are two ways in which I have used SPARQL on WikiPathways data for metabolomics: 1. curation; 2. statistics. Data analysis is harder, because in the RDF world scientific lenses are needed to accommodate the chemical structural-temporal complexity of metabolites. For curation, we have long used SPARQL in unit tests to support the curation of WikiPathways. Moreover, I have manually used the SPARQL endpoint to find curation tasks. But now that the paper is out, I can blog about this more. For now, many example SPARQL queries can be found in the WikiPathways wiki. It features several queries showing statistics, but also some for curation. This is an example query I use to improve the interoperability of WikiPathways with Wikidata (also for BridgeDb):

SELECT ?metabolite WHERE {
  ?metabolite a wp:Metabolite .
  OPTIONAL { ?metabolite wp:bdbWikidata ?wikidata . }
  FILTER (!BOUND(?wikidata))
}

Feel free to give this query a go at the WikiPathways SPARQL endpoint!

This paper completes a nice triptych of WikiPathways papers in the past 6 months. Thanks to the whole community and the very many contributors! All three papers are linked below.

Waagmeester, A., Kutmon, M., Riutta, A., Miller, R., Willighagen, E. L., Evelo, C. T., Pico, A. R., Jun. 2016. Using the semantic web for rapid integration of WikiPathways with other biological online data resources. PLoS Comput Biol 12 (6), e1004989+.
Bohler, A., Wu, G., Kutmon, M., Pradhana, L. A., Coort, S. L., Hanspers, K., Haw, R., Pico, A. R., Evelo, C. T., May 2016. Reactome from a WikiPathways perspective. PLoS Comput Biol 12 (5), e1004941+.
Kutmon, M., Riutta, A., Nunes, N., Hanspers, K., Willighagen, E. L., Bohler, A., Mélius, J., Waagmeester, A., Sinha, S. R., Miller, R., Coort, S. L., Cirillo, E., Smeets, B., Evelo, C. T., Pico, A. R., Jan. 2016. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Research 44 (D1), D488-D494.

Posted at 09:44

June 23

Leigh Dodds: The state of open licensing

I spend a lot of time reading through licences and terms & conditions. Much more so than I thought I would when I first started getting involved with open data. After all, I largely just like making things with data.

But there’s still so much data that is

Posted at 19:03

June 22

AKSW Group - University of Leipzig: Should I publish my dataset under an open license?

Undecided? Stand back, we know flowcharts:

Did you ever try to apply the halting problem to a malformed flowchart?


Taken from my slides for my keynote at TKE:

Linguistic Linked Open Data, Challenges, Approaches, Future Work from Sebastian Hellmann

Posted at 09:41

June 21

Leigh Dodds: From services to products

Over the course of my career I’ve done a variety of consulting projects, as both an employee and a freelancer. I’ve helped found and run a small consulting team. And, through my experience leading engineering teams, I’ve gained some experience of designing products and platforms. I’ve been involved in a few discussions, particularly over the last 12 months or so, around how to generate repeatable products off the back of consulting engagements.

I wanted to jot down a few thoughts here based on my own experience and a bit of background reading. I don’t claim to have any special insight or expertise, but the topic is one that I’ve encountered time and again. And as I’m trying to write things down more frequently, I thought I’d share my perspective in the hope that it may be useful to someone wrestling with the same issues.

Please comment if you disagree with anything. I’m learning too.

What are Products and Services?

Let’s start with some definitions.

A service is a bespoke offering that typically involves a high level of expertise. In a consulting business you’re usually selling people or a team who have a particular set of skills that are useful to another organisation. While the expertise and skills being offered are common across projects, the delivery is usually highly bespoke and tailored to the needs of the specific client.

The outcomes of an engagement are also likely to be highly bespoke as you’re delivering to a custom specification. Custom software development, specially designed training packages, and research projects are all examples of services.

A product is a packaged solution to a known problem. A product will be designed to meet a particular need and will usually be designed for a specific audience. Products are often, but not always, software. I’m ignoring manufacturing here.

Products can typically be rapidly delivered as they can be installed or delivered via a well-defined process. While a product may be tailored for a specific client they’re usually very well-defined. Product customisation is usually a service in its own right. As is product support.

The Service-Product Spectrum

I think it’s useful to think of services and products as being at opposite ends of a spectrum.

At the service end of the spectrum, your offerings:

  • are highly manual, because you’re reliant on expert delivery
  • are difficult to scale, because you need to find people with skills and expertise that are otherwise in short supply
  • have low repeatability, because you’re inevitably dealing with bespoke engagements

At the product end of the spectrum, your offerings are:

  • highly automated, because you’re delivering a software product or following a well defined delivery process
  • scalable, because you need fewer (or at least different) skills to deliver the product
  • highly repeatable, because each engagement is well defined, has a clear life-cycle, etc.

Products are a distillation of expertise and skills.

Actually, there’s arguably a stage before service. Let’s call those “capabilities” to

Posted at 17:15

June 18

Leigh Dodds: “The Wizard of the Wash”, an open data parable

The fourth

Posted at 15:31

June 15

Dublin Core Metadata Initiative: Deadline of 15 July for DC-2016 Presentations and Best Practice Poster tracks

2016-06-15, The deadline of 15 July is approaching for abstract submissions for the Presentations on Metadata track and the Best Practice Poster track for DC-2016 in Copenhagen. Both tracks provide metadata practitioners and researchers the opportunity to present their work in Copenhagen. Neither track requires a paper submission. Submit your proposal abstract for either track. Selections for presentation in Copenhagen will be made by the DC-2016 Organizing Team.

Posted at 23:59

Dublin Core Metadata Initiative: DCMI announces Workshop series for DC-2016

2016-06-15, DCMI is proud to announce a series of four workshops as part of the Professional program at DC-2016. Both half-day and full-day workshops are available, and abstracts of the workshops are available. Delegates to DC-2016 may register for the International Conference and for the Workshops individually. Day and half-day rates for individual workshops are also available.

Posted at 23:59

Dublin Core Metadata Initiative: DCMI opens registration for DC-2016

2016-06-15, Registration for DC-2016 is now open. The International Conference takes place on 13-14 October and the Workshop Series on 15-16 October. Separate registrations are available for the Conference and the Workshops. DC-2016 in Copenhagen is collocated in the same venue with the ASIST Annual Meeting, which takes place from 14-18 October. Special rates for the ASIST meeting are available to DCMI members. The program for DC-2016 will include papers, project reports, posters (research and best practice) and presentations on metadata. In addition, there will be a series of topical special sessions and two days of workshops. For more information and to register, visit the DC-2016 conference website.

Posted at 23:59

June 14

Leigh Dodds: Discussion document: archiving open data

This is a brief post to highlight a short discussion document that I recently published about

Posted at 16:50

AKSW Group - University of Leipzig: TKE 2016 has announced their invited speakers

The 12th International Conference on Terminology and Knowledge Engineering (TKE 2016) has announced its invited speakers, including Dr. Sebastian Hellmann, Head of the AKSW/KILT research group at Leipzig University and Executive Director of the DBpedia Association at the Institute for Applied Informatics (InfAI) e.V. Sebastian Hellmann will give a talk on Challenges, Approaches and Future Work for Linguistic Linked Open Data (LLOD).

The theme of the 12th International Conference on Terminology and Knowledge Engineering will be ‘Term Bases and Linguistic Linked Open Data’. So the main aims of TKE 2016 will be to bring together researchers from these related fields, provide an overview of the state-of-the-art, discuss problems and opportunities, and exchange information. TKE 2016 will also cover applications, ongoing and planned activities, industrial uses and needs, as well as requirements coming from the new e-society.

The TKE 2016 conference will take place in Copenhagen, Denmark, from 22-24 June 2016. Further information about the program and the speakers confirmed so far can be found at the conference website.


Posted at 10:58

Leigh Dodds: What 3 Words? Jog on mate!


Posted at 07:11

June 12

AKSW Group - University of Leipzig: Two Papers accepted at ECAI 2016

Hello Community! We are very pleased to announce that two of our papers were accepted for presentation at the biennial European Conference on Artificial Intelligence (ECAI). ECAI is Europe’s premier venue for presenting scientific results in AI and will be held from August 29th to September 2nd in The Hague, Netherlands.


In more detail, we will present the following papers:

An Efficient Approach for the Generation of Allen Relations (Kleanthi Georgala, Mohamed Sherif, Axel-Cyrille Ngonga Ngomo)

Abstract: Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 100%.

Towards SPARQL-Based Induction for Large-Scale RDF Data Sets (Simon Bin, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo)

Abstract: We show how to convert OWL Class Expressions to SPARQL queries where the instances of that concept are (with restrictions sensible in the considered concept induction scenario) equal to the SPARQL query result. Furthermore, we implement and integrate our converter into the CELOE algorithm (Class Expression Learning for Ontology Engineering), where it replaces the traditional OWL reasoner into which most structured machine learning approaches assume knowledge to be loaded. This will foster the application of structured machine learning to the Semantic Web, since most data is readily available in triple stores. We provide experimental evidence for the usefulness of the bridge. In particular, we show that we can improve the runtime of machine learning approaches by several orders of magnitude. With these results, we show that machine learning algorithms can now be executed on data for which the use of in-memory reasoners was previously not possible.
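The core idea of the second paper, mapping class expressions to graph patterns, can be illustrated with a hedged sketch (this is not the authors' converter, and the constructors, function names and prefixes are invented for illustration): a named class becomes a type triple, an intersection conjoins patterns, and an existential restriction introduces a fresh variable.

```python
# Hypothetical, heavily simplified converter for a tiny subset of OWL
# class expressions, rendered as SPARQL basic graph patterns. The real
# CELOE/SPARQL bridge covers far more constructs; this only sketches the idea.

def to_sparql(expr, var="?x"):
    kind = expr[0]
    if kind == "class":                       # named class, e.g. :Person
        return f"{var} a {expr[1]} ."
    if kind == "and":                         # intersection: conjoin the patterns
        return " ".join(to_sparql(e, var) for e in expr[1:])
    if kind == "some":                        # existential restriction: exists p.C
        prop, filler = expr[1], expr[2]
        inner = var + "0"                     # fresh variable for the filler
        return f"{var} {prop} {inner} . " + to_sparql(filler, inner)
    raise ValueError(f"unsupported constructor: {kind}")

# Person AND (hasChild SOME Person): "persons who have a child that is a person"
expr = ("and", ("class", ":Person"), ("some", ":hasChild", ("class", ":Person")))
query = "SELECT DISTINCT ?x WHERE { " + to_sparql(expr) + " }"
print(query)
```

Running the resulting query against a triple store then plays the role that instance retrieval by an in-memory reasoner plays in classic concept induction.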

Come over to ECAI and enjoy the talks. For more information on the conference program and other papers please see here.

Sandra on behalf of AKSW

Posted at 20:22

Bob DuCharme: Emoji SPARQL😝!

If emojis have Unicode code points, then we can...

Posted at 16:46

AKSW Group - University of Leipzig: AKSW Colloquium, 13.06.2016, SPARQL query processing with Apache Spark

In the upcoming Colloquium, Simon Bin will discuss the paper “SPARQL query processing with Apache Spark” by H. Naacke et al., which has been submitted to ISWC 2016.

Abstract

The number of linked data sources and the size of the linked open data graph keep growing every day.  As a consequence, semantic RDF services are more and more confronted to various big data problems.  Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks.  Data management systems requiring these properties are rarely built from scratch but are rather designed on top of an existing cluster computing engine.  In this work, we consider the processing of SPARQL queries with Apache Spark.
We propose and compare five different query processing approaches based on different join execution models and Spark components.  A detailed experimentation, on real-world and synthetic data sets, emphasizes that two approaches tailored for the RDF data model outperform the other ones on all major query shapes, i.e. star, snowflake, chain and hybrid.
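To give a flavour of the join execution models being compared (this is illustrative only, not code from the paper): a star-shaped basic graph pattern, where all triple patterns share the same subject variable, reduces to joins on that subject. A minimal hash-join sketch in plain Python, with invented identifiers, the operation that the Spark-based approaches distribute across a cluster:

```python
# Illustrative only: evaluate a star-shaped BGP (all triple patterns share
# one subject variable) with a hash join over an in-memory triple list.

def star_join(triples, predicates):
    """Return subjects that have a value for every predicate in the star."""
    # Build phase: index triples by predicate (predicate -> set of subjects).
    by_pred = {}
    for s, p, o in triples:
        by_pred.setdefault(p, set()).add(s)
    # Probe phase: intersect the subject sets of all star predicates.
    result = None
    for p in predicates:
        subjects = by_pred.get(p, set())
        result = subjects if result is None else result & subjects
    return sorted(result or set())

triples = [
    (":a", ":type", ":Person"), (":a", ":name", '"Ann"'),
    (":b", ":type", ":Person"),
]
print(star_join(triples, [":type", ":name"]))  # [':a']
```

Snowflake, chain and hybrid shapes generalise this by joining on object-subject chains as well, which is where the choice of join model and Spark component starts to matter.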

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see the AKSW website for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 06:53

June 08

AKSW Group - University of Leipzig: AKSW at ESWC 2016


We are very pleased to report that 4 of our papers were accepted for presentation as full papers at ESWC 2016. These are

In addition, we organised the first HOBBIT community meeting. Many thanks to all who participated. Get involved in the project by going here. Our survey pertaining to benchmarking is still open and we’d love to have your feedback on what you would want benchmarking Linked Data to look like.

We also presented three research projects, i.e., HOBBIT, QAMEL and DIESEL, during the EU networking sessions. Many thanks for the fruitful discussions and ideas.

Finally, we thank all the systems which participated in QALD-6 and OKE and made these challenges so interesting. Little perk: we have yet to find a system that beats CETUS at the OKE challenge :)

FYI, a full list of accepted conference papers can be found here.


In addition to the main conference, we were active during the workshops. Axel gave the keynote at the Profiles workshop (many thanks to the organizers for the invite). The following papers were accepted as full papers.

  • DBtrends : Publishing and Benchmarking RDF Ranking Functions by Edgard Marx, Amrapali J. Zaveri, Mofeed Mohammed, Sandro Rautenberg, Jens Lehmann, Axel-Cyrille Ngonga Ngomo and Gong Cheng, SumPre2016 Workshop at ESWC 2016
  • Towards Sustainable view-based Extract-Transform-Load (ETL) Fusion of Open Data by Kay Mueller, Claus Stadler, Ritesh Kumar Singh and Sebastian Hellmann, LDQ2016 [pdf]
  • UPSP: Unique Predicate-based Source Selection for SPARQL Endpoint Federation by Ethem Cem Ozkan, Muhammad Saleem, Erdogan Dogdu and Axel-Cyrille Ngonga Ngomo, PROFILES Workshop at ESWC 2016 [pdf]
  • Federated Query Processing: Challenges and Opportunities by Axel-Cyrille Ngonga Ngomo and Muhammad Saleem, Keynote at PROFILES Workshop at ESWC 2016 [pdf]

Quo Vadis?

We are now looking forward to EDF 2016, where we will present HOBBIT as a poster as well as organise a post-conference event. Thereafter, you can meet us at ISWC 2016, where we will present two tutorials (Link Discovery and Federated SPARQL queries) and organise the BLINK workshop. Your submissions are welcome.


Posted at 14:38

May 31

AKSW Group - University of Leipzig: AKSW@LREC2016

Since the first edition held in Granada in 1998, LREC has become one of the major events on Language Resources (LRs) and Language Technologies (LT). At the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), held from 23-28 May 2016 in Portorož (Slovenia), the AKSW/KILT members Bettina Klimek, Milan Dojchinovski and Sebastian Hellmann took an active part. At the conference they presented their most recent research results and project outcomes in the areas of Linked Data and Language Technologies. With over 1250 paper submissions and 744 accepted papers, we are pleased to have contributed to the research field with the following contributions:

  • DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus, by Brümmer, Martin; Dojchinovski, Milan and Hellmann, Sebastian [PDF]
  • FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies, by Dojchinovski, Milan; Sasaki, Felix; Gornostaja,Tatjana;  Hellmann, Sebastian; Mannens, Erik; Salliau, Frank; Osella, Michele; Ritchie, Phil; Stoitsis, Giannis; Koidl, Kevin; Ackermann, Markus and Chakraborty, Nilesh [PDF]
  • Creating Linked Data Morphological Language Resources with MMoOn – The Hebrew Morpheme Inventory, by Klimek, Bettina and Arndt, Natanael and Krause, Sebastian and Arndt, Timotheus [PDF]
  • The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud, by McCrae, John P.; Chiarcos, Christian; Bond, Francis; Cimiano, Philipp; Declerck, Thierry; de Melo, Gerard; Gracia, Jorge; Hellmann, Sebastian; Klimek, Bettina; Moran, Steven; Osenova, Petya; Pareja-Lora, Antonio and Pool, Jonathan [PDF]

At the main conference Bettina Klimek gave an oral presentation of the Hebrew Morpheme Inventory, which is based on the MMoOn project. The audience showed high interest in the data and the underlying MMoOn ontology, including questions about possible applications such as creating MMoOn-based lemmatizers.

Bettina Klimek presenting @LREC2016

Milan Dojchinovski @LREC 2016


Further, Milan Dojchinovski gave two poster presentations summarizing the latest results from the FREME project. He presented “DBpedia Abstracts”, a large-scale, open, multilingual NLP training corpus. The presentation attracted considerable interest from the audience, particularly regarding the corpus’s use. Several requests for availability of the corpora in other languages (e.g. Welsh) were also received.

Milan also presented the latest developments within the FREME project and the framework itself. The presentation focused primarily on the technical aspects of the framework, its availability, its active use in real-world scenarios and the future plans.

Also, as active members of the Open Knowledge Foundation’s Working Group on Open Data in Linguistics (OWLG), Sebastian Hellmann and Bettina Klimek helped organize the 5th Workshop on Linked Data in Linguistics (LDL-2016), which was one of the LREC conference workshops. Around 50 participants attended the workshop, discussing topics dealing with managing, building and using linked language resources. In the workshop’s poster session Bettina Klimek introduced the MMoOn model for representing morphological language data to the many interested workshop attendees. In addition, Milan Dojchinovski presented results from the FREME project that relate to the research presented at the LDL workshop and to the Linked Data and Language Technologies community.

The LDL Workshop participants.

In continuation of OWLG-organized events, the First Workshop on Knowledge Extraction and Knowledge Integration (KEKI 2016) will take place on 17-18 October in conjunction with the 15th International Semantic Web Conference in Kobe (Japan). The topics of linguistic Linked Data creation and integration will be taken up in order to move the LLOD cloud to its next phase, in which innovative applications will be developed that overcome language barriers on the Web. Paper submission is still open until 1 July!

During the main conference days, 25-27 May 2016, Milan Dojchinovski and Felix Sasaki (FREME project coordinator) took part in the exhibition area with a booth dedicated to the FREME project. The ultimate goal of this participation was to meet people interested in understanding how the open framework deployed within the project may help in narrowing the gap between actual business needs and language and Linked Data technologies. For more on the FREME presence at LREC 2016 you can read here.

LREC has been a great event to meet the community, make new connections, discuss current research challenges, share ideas, and establish new collaborations. With that, we look forward to the next LREC conference, two years from now!

Posted at 17:42

May 30

Dublin Core Metadata Initiative: Call for Participation and Demos: NKOS Dublin Core workshop

2016-05-30, The 16th European Networked Knowledge Organization Systems (NKOS) Workshop will take place at the DC-2016 conference in Copenhagen. Proposals are invited for the following: (a) Presentations (typically 20 minutes plus discussion time, potentially longer if warranted) on work related to the themes of the workshop (see below). An option for a short 5-minute project report presentation is also possible; and (b) Demos on work related to the themes of the workshop. The submission deadline is Friday, 1 July 2016, with notification of acceptance by Tuesday, 16 August 2016. The Call for Participation can be found on the conference website and on the NKOS website.

Posted at 23:59

Dublin Core Metadata Initiative: Dublin Core at 21 (A celebration in Dublin, Ohio)

2016-05-30, IFLA Satellite Event. Dublin Core at 21 celebrates DC's amazing 21-year history and anticipates its future. The Dublin Core originated in 1995 at a meeting at OCLC (in the very room where this IFLA Satellite event will also take place). This special event will bring a historical view from key people who were there when the Web was young, and Dublin Core was new and evolving rapidly. But the Web does not stand still. Presentations will also provide information on the latest metadata standards-related work underway at DCMI and on OCLC's current work with metadata models, standards, and technologies advancing the state of the art for libraries and archives. Presenters will include metadata experts with long ties to Dublin Core, including several who were at the original invitational meeting in 1995. A panel discussion will permit speakers to reflect on activities and trends past and present. Attendees are invited to a complimentary reception and special unveiling following the presentation portion of the day. For more information and to register for this IFLA Satellite event, see the event website.

Posted at 23:59

Copyright of the postings is owned by the original blog authors. Contact us.