Planet RDF

It's triples all the way down

September 17

Dublin Core Metadata Initiative: A Successful DCMI 2018 Conference

The DCMI Annual Conference was held last week, hosted by the Faculty of Engineering of the University of Porto, Portugal. The conference was co-located with TPDL which meant that while many people arrived as part of one community, all left with the experience and appreciation of two! The full conference proceedings are now available, with copies of presentation slides where appropriate. Some photographs of the conference can be found on Flickr, tagged with 'dcmi18'.

Posted at 00:00

September 15

Egon Willighagen: Wikidata Query Service recipe: qualifiers and the Greek alphabet

Just because I need to look this up each time myself, I wrote up this quick recipe for getting information from statement qualifiers in Wikidata. Let's say I want to list all Greek letters, with the lower case letter in one column and the upper case letter in the other. This is what our data looks like:

So, let's start with a simple query that lists all letters in the Greek alphabet:

SELECT ?letter WHERE {
  ?letter wdt:P361 wd:Q8216 .
}

Of course, that only gives me the Wikidata entries, and not the Unicode characters we are after. So, let's add that Unicode character property:

SELECT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
}

Ah, that gets us somewhere:

But you see that the upper and lower case are still in separate rows, rather than columns. To fix that, we need access to those qualifiers. It is all there in the Wikidata RDF, but the model gives people a headache (so do many things, like math, but that does not mean we should stop doing them!). It all comes down to keeping notebooks and writing down your tricks; that's the scientific method (there is more to it than just keeping notebooks, though).

So, a lot of important information is put in the qualifiers, not just in the statements. Let's first get all statements for a Greek letter. We would do that with:

?letter ?pprop ?statement .

One thing we want to know about the property we're looking at is the entity linked to it. We do that by adding this bit:

?property wikibase:claim ?propp .

Of course, the property we are interested in is the Unicode character, so we can put that in directly:

wd:P487 wikibase:claim ?propp .

Next, the qualifiers for the statement. We want them all:

?statement ?qualifier ?qualifierVal .
?qualifierProp wikibase:qualifier ?qualifier .

And because we do not want any qualifier other than 'applies to part' (P518), we can put that in too:

?statement ?qualifier ?qualifierVal .
wd:P518 wikibase:qualifier ?qualifier .

Furthermore, we are only interested in lower case and upper case, and we can put that in as well (here for upper case, Q98912):

?statement ?qualifier wd:Q98912 .
wd:P518 wikibase:qualifier ?qualifier .

So, putting this together (here for the lower case, wd:Q8185162), we get this full query:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp .
  ?statement ?qualifier wd:Q8185162 .
  wd:P518 wikibase:qualifier ?qualifier .
}

We are not done yet, because in the example above the Unicode character still comes from the truthy wdt:P487 triple rather than from the statement itself. This needs to be integrated, and we need wikibase:statementProperty for that:

wd:P487 wikibase:statementProperty ?statementProp .
?statement ?statementProp ?unicode .

If we integrate that, we get this query, which is indeed getting complex:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier wd:Q8185162 ;
             ?statementProp ?unicode .  
  wd:P518 wikibase:qualifier ?qualifier .
}

But basically we have our template here, with three parameters:
  1. the property of the statement (here P487: Unicode character)
  2. the property of the qualifier (here P518: applies to part)
  3. the object value of the qualifier (here Q8185162: lower case)
If we use the SPARQL VALUES approach, we get the following template. Notice that I renamed the ?letter and ?unicode variables, but I left the wdt:P361 wd:Q8216 (= 'part of' 'Greek alphabet') clause in, so that this query does not time out:

SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 . # 'part of' 'Greek alphabet'
  VALUES ?qualifierObject { wd:Q8185162 }
  VALUES ?qualifierProperty { wd:P518 }
  VALUES ?statementProperty { wd:P487 }

  # template
  ?entityOfInterest ?pprop ?statement .
  ?statementProperty wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier ?qualifierObject ;
             ?statementProp ?statementDataValue .  
  ?qualifierProperty wikibase:qualifier ?qualifier .
}

So, there is our recipe, for everyone to copy/paste.
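Since the only things that change between uses of the recipe are those three parameters, filling them in can even be scripted. Here is a minimal Python sketch (the helper function and its names are mine, not part of any Wikidata tooling; the template text is copied from the query above):

```python
# Fill the recipe's three parameters into the SPARQL template.
# Literal braces in the SPARQL are escaped as {{ }} for str.format().
TEMPLATE = """SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {{
  ?entityOfInterest wdt:P361 wd:Q8216 . # 'part of' 'Greek alphabet'
  ?entityOfInterest ?pprop ?statement .
  wd:{stmt_prop} wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier wd:{qual_obj} ;
             ?statementProp ?statementDataValue .
  wd:{qual_prop} wikibase:qualifier ?qualifier .
}}"""

def build_query(stmt_prop, qual_prop, qual_obj):
    """Return the recipe query for one statement property / qualifier pair."""
    return TEMPLATE.format(stmt_prop=stmt_prop,
                           qual_prop=qual_prop,
                           qual_obj=qual_obj)

# Upper case Unicode characters of the Greek letters:
print(build_query("P487", "P518", "Q98912"))
```

The resulting string can be pasted into the Wikidata Query Service as-is.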

Completing the Greek alphabet example
OK, since I actually started with the upper and lower case Unicode characters for the Greek letters, let's finish that query too. Since we need both, we use the template twice:

SELECT DISTINCT ?entityOfInterest ?lowerCase ?upperCase WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
                ?statementProp2 ?upperCase .
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
}

Still one issue left to fix: some Greek letters have more than one upper case Unicode character, and we need to concatenate those. That requires a GROUP BY and the GROUP_CONCAT function, and we get this query:

SELECT DISTINCT ?entityOfInterest
  (GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ") AS ?lowerCases)
  (GROUP_CONCAT(DISTINCT ?upperCase; separator=", ") AS ?upperCases)
WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
                ?statementProp2 ?upperCase .
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
} GROUP BY ?entityOfInterest
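What GROUP_CONCAT does here can be mimicked in a few lines of Python, using sigma (which genuinely has two lower case forms, σ and ς) as the example; the rows below are hand-made stand-ins for the query result before grouping:

```python
from collections import defaultdict

# Toy rows standing in for (letter, lower case character) results:
rows = [
    ("alpha", "α"),
    ("beta", "β"),
    ("sigma", "σ"),
    ("sigma", "ς"),
]

# GROUP BY ?entityOfInterest
grouped = defaultdict(list)
for letter, ch in rows:
    if ch not in grouped[letter]:   # the DISTINCT inside GROUP_CONCAT
        grouped[letter].append(ch)

# GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ")
lower_cases = {letter: ", ".join(chars) for letter, chars in grouped.items()}
print(lower_cases)
```

So the two sigma rows collapse into a single "σ, ς" cell, which is exactly what the SPARQL aggregation does for us.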

Now, since most of my blog posts are not just for fun but typically also have a use case, allow me to shed light on the context. Since you are still reading, you're officially part of the secret society of brave followers of my blog. Tweet to my egonwillighagen account a message consisting of a series of letters followed by two numbers (no spaces) and another series of letters, where the two numbers indicate the number of letters at the start and the end, for example abc32yz or adasgfshjdg111x, and I will add you to my secret list of brave followers (and I will like the tweet; if you disguise the string to suggest it has some meaning, I will also retweet it). Only that string is allowed, and don't tell anyone what it is about, or I will remove you from the list again :) Anyway, my ambition is to make a Wikidata-based BINAS replacement.

One thing still missing is a human readable name. The frequently used SERVICE wikibase:label does a pretty decent job, and we end up with this table:

Posted at 09:12

September 13

AKSW Group - University of Leipzig: AskNow 0.1 Released

Dear all,

We are very happy to announce AskNow 0.1 – the initial release of Question Answering Components and Tools over RDF Knowledge Graphs.


The following components with corresponding features are currently supported by AskNow:

  • AskNow UI 0.1: The UI works as a platform for users to pose their questions to the AskNow QA system. It displays the answers based on whether the answer is an entity or a list of entities, a boolean, or a literal. For entities it shows the abstracts from DBpedia.

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, SOLIDE, WDAqua, BigDataEurope.

View this announcement on Twitter:

Kind regards,
The AskNow Development Team

Posted at 13:35

Dublin Core Metadata Initiative: Webinar: SKOS - Overview and Modeling of Controlled Vocabularies

This webinar is scheduled for Thursday, October 11, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. SKOS (Simple Knowledge Organization Systems) is the recommendation of the W3C to represent and publish datasets for classifications, thesauri, subject headings, glossaries and other types of controlled vocabularies and knowledge organization systems in general. The first part of the webinar includes an overview of the technologies of the semantic web and shows in detail the different elements of the SKOS model.

Posted at 00:00


September 11

W3C Blog Semantic Web News: JSON-LD Guiding Principles, First Public Working Draft

Coming to consensus is difficult in any working group, and doubly so when the working group spans a broad cross-section of the web community. Everyone brings their own unique set of experiences, skills and desires for the work, but at the end of the process, there can be only one specification.  In order to provide a framework in which to manage the expectations of both participants and other stakeholders, the JSON-LD WG started out by establishing a set of guiding principles.  These principles do not constrain decisions, but provide a set of core aims and established consensus to reference during difficult discussions.  The principles are lights to lead us back out of the darkness of never-ending debate towards a consistent and appropriately scoped set of specifications. A set of specifications that have just been published as First Public Working Drafts.

These principles start with the uncontroversial “Stay on target!”, meaning to stay focused on the overall mission of the group to ensure the ease of creation and consumption of linked data using the JSON format by the widest possible set of developers. We note that the target audience is software developers generally, not necessarily browser-based applications.

To keep the work grounded, the group also decided on the principle of requiring use cases with actual data that have support from at least two organizations (W3C members or otherwise). The use cases are intended as supporting evidence for the practicality and likely adoption of a proposed feature, not as a heavyweight requirements analysis process.

Adoption of specifications is always a concern, and to maximize the likelihood of uptake, we have adopted several principles around simplicity, usability and preferring phased or incremental solutions. To encourage experimentation and to try and reduce the chances of needing a breaking change in the future, we have adopted a principle of defining only what is conforming to the specification, and leaving all other functionality open. Extensions are welcome, they are just not official or interoperable.

Finally, and somewhat controversially, we adopted the principle that new features should be compatible with the RDF Data Model. While there are existing features that cannot be expressed in RDF that are possible in JSON-LD, we do not intend to increase this separation between the specifications and hope to close it as much as possible.

Using these guidelines, the working group has gotten off to a very productive start and quickly came to consensus on whether many features suggested to the JSON-LD Community Group were in scope for the work, including approving the much requested lists-of-lists functionality. This will allow JSON arrays to directly include JSON arrays as items in JSON-LD 1.1, enabling a complete semantic mapping for JSON structures such as GeoJSON, and full round-tripping through RDF. The publication of the FPWD documents is a testimony to the efforts of the Working Group, and especially those of Gregg Kellogg as editor.
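To give an idea of what lists of lists look like in practice: in expanded JSON-LD 1.1 form, a nested coordinate array can be written as @list objects inside @list objects. The snippet below is my own illustration (the property IRI is made up), not an example taken from the draft:

```python
import json

# A hand-written illustration of lists-of-lists in expanded JSON-LD 1.1:
# nested @list objects, as needed for GeoJSON-style coordinate arrays.
# The vocabulary IRI is a placeholder for this example.
doc = {
    "http://example.org/vocab#coordinates": [{
        "@list": [
            {"@list": [{"@value": 100.0}, {"@value": 0.0}]},
            {"@list": [{"@value": 101.0}, {"@value": 1.0}]},
        ]
    }]
}

print(json.dumps(doc, indent=2))
```

Under JSON-LD 1.0 the inner lists could not be represented; allowing the nesting is what makes the GeoJSON round trip through RDF possible.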

Posted at 22:19

September 08

Egon Willighagen: Also new this week: "Google Dataset Search"

There was a lot of Open Science news this week. The announcement of the Google Dataset Search was one of them:

Of course, I first tried searching for "RDF chemistry", which shows some of my data sets (and a lot more):

It picks up data from many sources, such as Figshare in this image. That means it also works (well, sort of, as Noel O'Boyle noticed) for supplementary information from the Journal of Cheminformatics.

It picks up metadata in several ways, among which schema.org annotations in JSON-LD. So, next week we'll see if we can get eNanoMapper extended to spit out compatible JSON-LD for its data sets, called "bundles".

Integrated with Google Scholar?
While the URL for the search engine does not suggest the service is more than a 20% project, we can hope it will stay around like Google Scholar has. But I do hope they will integrate it further with Scholar. For example, in the above figure it did pick up that I am the author of that data set (well, repurposed from an effort of Rich Apodaca), but it did not figure out that I am also on Scholar.

So, these data sets do not show up in your Google Scholar profile yet, but they should. Time will tell where this data search engine is going. There are many interesting features, and given the amount of online attention, they won't stop development just yet; I expect to discover more and better features in the coming months. Give it a spin!

Posted at 09:13

August 27

Bob DuCharme: Pipelining SPARQL queries in memory with the rdflib Python library

Using retrieved data to make more queries.

Posted at 13:55

August 18

Egon Willighagen: Compound (class) identifiers in Wikidata

Bar chart showing the number of compounds with a particular chemical identifier.
I think Wikidata is a groundbreaking project that will have a major impact on science. Among the reasons are the open license (CCZero), the very basic approach (Wikibase), and the superb community around it. For example, setting up your own Wikibase, including a cool SPARQL endpoint, is easily done with Docker.

Wikidata has many sub projects, such as WikiCite, which captures the primary literature. Another one is WikiProject Chemistry. The two nicely match up, I think, making a public database linking chemicals to literature (though very much remains to be done here); see my recent ICCS 2018 poster (doi:10.6084/m9.figshare.6356027.v1, paper pending).

But Wikidata is also a great resource for identifier mappings between chemical databases, something we need for our metabolism pathway research. The mappings, as you may know, are used in the latter via BridgeDb, and we have been using Wikidata as one of three sources for some time now (the others being HMDB and ChEBI). WikiProject Chemistry has a related ChemID effort, and while the wiki page does not show much recent activity, there is actually a lot of ongoing work (see plot). And I've been adding my bits.

Limitations of the links
But not every identifier in Wikidata has the same meaning. While they are all classified as 'external-id', the actual links may have different meanings. This, of course, is the essence of scientific lenses; see this post and the papers cited therein. One reason is the difference in what entries in the various databases mean.

Wikidata has an extensive model, defined by the aforementioned WikiProject Chemistry. For example, it has different concepts for chemical compounds (in fact, the hierarchy is pretty rich) and compound classes, and these are modeled differently. Furthermore, it has a model that formalizes that things with a different InChI are different, but even allows things with the same InChI to be different, if the need arises. It tries to accurately and precisely capture the certainty and uncertainty of the chemistry. As such, it is a powerful system for handling identifier mappings, because databases are not always clear, and the chemistry and biology behind the data even less so: what we measure experimentally is a characterization of chemicals, but what we put in databases and give names to are specific models (often chemical graphs).

That model differs from what other (chemical) databases use, or seem to use, because databases do not always indicate what a record actually represents. But I think the following is a fair guess.

ChEBI (and the matching ChEBI ID) has entries for chemical classes (e.g. fatty acid) and specific compounds (e.g. acetate).

PubChem, ChemSpider, UniChem
These three resources use the InChI as their central asset. While they do not really have the concept of compound classes as such (though increasingly they have classifications), they do have entries where the stereochemistry is undefined or unknown. Each has its own way of linking to other databases, which normally involves a good deal of structure normalization (see e.g. doi:10.1186/s13321-018-0293-8 and doi:10.1186/s13321-015-0072-8).

HMDB (and the matching P2057) has a biological perspective; the entries reflect the biology of a chemical. Therefore, for most compounds they focus on the neutral form. This makes linking to and from databases where the compound is not neutral less precise.

CAS registry numbers
CAS (and the matching P231) is pretty unique itself: it has identifiers for substances (see Q79529), a much broader notion than chemical compounds, and it comes with its own set of unique features. For example, solutions of some compounds, by design, have the same identifier. Previously, formaldehyde and formalin had different Wikipedia/Wikidata pages, both with the same CAS registry number.

Limitations of the links #2
Now, returning to our starting point: limitations in linking databases. If we want FAIR mappings, we need to be as precise as possible. That may mean more steps, but we can always simplify at will, while a computer can never make the links more complex again (well, not without making assumptions, etc.).

And that is why Wikidata is so suitable for linking all these chemical databases: it can distinguish differences when needed, and make them explicit. It makes mappings between the databases more FAIR.

Posted at 12:46


August 07

AKSW Group - University of Leipzig: Jekyll RDF Tutorial Screencast

Since 2016 we have been developing Jekyll-RDF, a plugin for the famous static website generator Jekyll. With Jekyll-RDF we took the slogan of Jekyll, “Transform your plain text into static websites and blogs”, and transformed it into “Transform your RDF Knowledge Graph into static websites and blogs”. This enables people without deep programming knowledge to publish data encoded in complicated RDF structures on the web, in an easy to browse format.

To ease your start with Jekyll-RDF I've created a tutorial screencast that teaches you all the basics necessary to create a simple Jekyll page from an RDF knowledge base. I hope you enjoy it and find it helpful!


Posted at 09:11

August 04

Egon Willighagen: WikiPathways Summit 2018

I was not there when WikiPathways was founded; I only joined in 2012, and I found my role in the area of metabolic pathways of this open knowledge base (CC0, to be precise) of biological processes. This autumn, a WikiPathways Summit 2018 is organized in San Francisco to celebrate the 10th anniversary of the project, and everyone interested is kindly invited to join for three days of learning about WikiPathways, integrations and use cases, data curation, and hacking on this great Open Science project.

Things that I would love to talk about (besides making metabolic pathways FAIR and openly available) are the integrations with other platforms (Reactome, RaMP, MetaboLights, Pathway Commons, PubChem, Open PHACTS (using the RDF), etc.), Wikidata interoperability, and future interoperability with platforms like AOPWiki, Open Targets, BRENDA, Europe PMC, etc.

Posted at 06:54

August 01

Libby Miller: Neue podcast in a box, part 1

Ages ago I wrote a post on how to create a physical podcast player (“podcast in a box”) using Radiodan. Since then, we’ve completely rewritten the software, so those instructions can be much improved and simplified. Here’s a revised technique, which will get you as far as reading an RFID card. I might write a part 2, depending on how much time I have.

You’ll need:

  • A Pi 3B or 3B+
  • An 8GB or larger class 10 microSD card
  • A cheap USB soundcard
  • A speaker with a 3.5mm jack
  • A power supply for the Pi
  • An MFRC522 RFID reader
  • A laptop and microSD card reader / writer

The idea of Radiodan is that as much as possible happens inside web pages. A server runs on the Pi. One webpage is opened headlessly on the Pi itself (internal.html) – this page will play the audio; another can be opened on another machine to act as a remote control (external.html).

They are connected using websockets, so each can access the same messages – the RFID service talks to the underlying peripheral on the Pi, making the data from the reader available.

Here’s what you need to do:

1. Set up the Pi as per these instructions (“setting up your Pi”)

You need to burn a microSD card with the latest Raspbian with Desktop to act as the Pi's brain, and the easiest way to do this is with Etcher. Once that's done, the easiest way to do the rest of the install is over ssh, and the quickest way to get that in place is to edit two files while the card is still in your laptop (I'm assuming a Mac):

Enable ssh by typing:

touch /Volumes/boot/ssh

Add your wifi network by adding a file called wpa_supplicant.conf to the boot volume, with these contents (replace AP_NAME and AP_PASSWORD with your wifi details):

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="AP_NAME"
    psk="AP_PASSWORD"
}


Then eject the card, put the card in the Pi, attach all the peripherals except for the RFID reader and switch it on. While on the same wifi network, you should be able to ssh to it like this:

ssh pi@raspberrypi.local

password: raspberry.

Then install the Radiodan software using the provisioning script like this:

curl | sudo bash

2. Enable SPI on the Pi

Don’t reboot yet; type:

sudo raspi-config

Under interfaces, enable SPI, then shut the Pi down

sudo halt

and unplug it.

3. Test Radiodan and configure it

If all is well and you have connected a speaker via a USB soundcard, you should hear it say “hello” as it boots.

Please note: Radiodan does not work with the default 3.5mm jack on the Pi. We’re not sure yet why. But USB soundcards are very cheap, and work well.

There’s one app available by default for Radiodan on the Pi. To use it,

  1. Navigate to http://raspberrypi.local/radio
  2. Use the buttons to play different audio clips. If you can hear things, then it’s all working



shut the Pi down and unplug it from the mains.

4. Connect up the RFID reader to the Pi

like this

Then start the Pi up again by plugging it in.

5. Add the piab app

Dan has made a very fancy mechanism for using Samba to drag and drop apps to the Pi, so that you can develop on your laptop. However, because we’re using RFID (which only works on the Pi), we may as well do everything on there. So, ssh to it again:

ssh pi@raspberrypi.local
cd /opt/radiodan/rde/apps/
git clone

This is currently a very minimal app, which just allows you to see all websocket messages going by, and doesn’t do anything else yet.

6. Enable the RFID service and piab app in the Radiodan web interface

Go to http://raspberrypi.local:5020

Enable “piab”, clicking ‘update’ beneath it. Enable the RFID service, clicking ‘update’ beneath it. Restart the manager (red button) and then install dependencies (green button), all within the web page.



Reboot the Pi (e.g. ssh in and sudo reboot). This will enable the RFID service.

7. Test the RFID reader

Open http://raspberrypi.local:5000/piab and open developer tools for that page. Place a card on the RFID reader. You should see a json message in the console with the RFID identifier.


The rest is a matter of writing javascript / html code to:

  • Associate a podcast feed with an RFID (e.g. a web form in external.html that allows the user to add a podcast feed url)
  • Parse the podcast feed when the appropriate card id is detected by the reader
  • Find the latest episode and play it using internal.html (see the radio app example for how to play audio)
  • Add more fancy options, such as remembering where you were in an episode, stopping when the card is removed etc.
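As a rough sketch of the data flow those steps describe, here is some illustrative Python. The message shape ({"uid": ...}), the card id, and the feed URLs are all assumptions for the example, and a real implementation would fetch and parse the RSS feed (and run in the browser pages as JavaScript) rather than use hard-coded tables:

```python
import json

# Hypothetical store mapping RFID identifiers to podcast feed URLs,
# as a user would configure via a form in external.html.
feeds = {"04a1b2c3": "https://example.org/podcast/feed.xml"}

# Hypothetical parsed feed, newest episode first; a real implementation
# would fetch and parse the RSS feed here.
episodes = {
    "https://example.org/podcast/feed.xml": [
        {"title": "Episode 42", "audio": "https://example.org/ep42.mp3"},
        {"title": "Episode 41", "audio": "https://example.org/ep41.mp3"},
    ]
}

def handle_rfid_message(raw):
    """React to an RFID websocket message: look up the card's feed and
    return the latest episode's audio URL (or None for an unknown card)."""
    msg = json.loads(raw)                 # e.g. '{"uid": "04a1b2c3"}' -- assumed shape
    feed_url = feeds.get(msg.get("uid"))
    if feed_url is None:
        return None
    return episodes[feed_url][0]["audio"]  # newest first

print(handle_rfid_message('{"uid": "04a1b2c3"}'))
```

internal.html would then play the returned URL, exactly as the radio app example plays its audio clips.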

As you develop, you can see the internal page on http://raspberrypi.local:5001 and the external page on http://raspberrypi.local:5000. You can reload the app using the blue button on http://raspberrypi.local:5020.

Many more details about the architecture of Radiodan are available; full installation instructions and instructions for running it on your laptop are here; docs are here; code is in github.

Posted at 09:11

July 22

Bob DuCharme: Dividing and conquering SPARQL endpoint retrieval

With the VALUES keyword.

Posted at 16:52

July 20

AKSW Group - University of Leipzig: DBpedia Day @ SEMANTiCS 2018

Don’t miss the 12th edition of the DBpedia Community Meeting in Vienna, the city with the highest quality of life in the world. The DBpedia Community will get together for the DBpedia Day on September 10, the first day of the SEMANTiCS Conference which will be held from September 10 to 13, 2018.

What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting! Please submit your presentations, posters, demos or other forms of contributions through our web form.


  • Keynote#1: Dealing with Open Domain Data by Javier David Fernández García (WU)
  • Keynote #2: Linked Open Data cloud – act now before it’s too late by Mathieu d’Aquin (NUI Galway)
  • DBpedia Showcase Session
  • DBpedia Association Hour
  • Special Chapter Session with DBpedia language chapters from different parts of Europe


  • Attending the DBpedia Community Meeting costs €50 (excl. registration fee and VAT). DBpedia members get free admission. Please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.
  • Please check all details here!

We are looking forward to meeting you in Vienna!


Posted at 12:37

July 19

Dublin Core Metadata Initiative: DCMI 2018 Conference Programme Published

The conference programme for DCMI 2018 has now been published. With an exciting mixture of 3 keynotes, papers, presentations, workshops and working meetings, this year's conference promises to continue the excellent tradition of DCMI annual events. (and all this in the wonderful location of Porto, Portugal!). Register now!

Posted at 00:00

July 05

Dublin Core Metadata Initiative: Webinar: The Current State of Automated Content Tagging: Dangers and Opportunities

This webinar is scheduled for Thursday, July 19, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. There are real opportunities to use technology to automate content tagging, and there are real dangers that automated content tagging will sometimes inappropriately promote and obscure content. We’ve all heard talks about AI, but little detail about how these applications actually work. Recently I’ve been working with clients to explore the current state of the art of so-called AI technology, and to trial several of these tools with research, policy and general news content.

Posted at 00:00

July 04

Dublin Core Metadata Initiative: Webinar: The Role of Dublin Core Metadata in the Expanding Digital and Analytical Skill Set Required by Data-Driven Organizations

This webinar is scheduled for Thursday, July 12, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Many areas of our world are being subject to digitalisation as leaders and policymakers embrace the possibilities that can be harnessed through the capturing and exploiting of data. New business models are being developed, and new revenue streams are being uncovered that require a solid and recognised data competence capacity.

Posted at 00:00


June 26

AKSW Group - University of Leipzig: SANSA 0.4 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quad format
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)

Noteworthy changes or updates since the previous release are:

  • Parser performance has been improved significantly e.g. DBpedia 2016-10 can be loaded in <100 seconds on a 7 node cluster
  • Support for a wider range of data partitioning strategies
  • A better unified API across data representations (RDD, DataFrame, DataSet, Graph) for triple operations
  • Improved unit test coverage
  • Improved distributed statistics calculation (see ISWC paper)
  • Initial scalability tests on 6 billion triple Ethereum blockchain data on a 100 node cluster
  • New SPARQL-to-GraphX rewriter aiming at providing better performance for queries exploiting graph locality
  • Numeric outlier detection tested on DBpedia (en)
  • Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team


Posted at 16:33

June 17

Bob DuCharme: Running and querying my own Wikibase instance

Querying it, of course, with SPARQL.

Posted at 16:17

June 12

Dublin Core Metadata Initiative: Registration for DCMI 2018 now open

We are pleased to announce that registration is now open for the annual Dublin Core conference to be held in Porto, Portugal, 10-13 September 2018. This year's conference is co-located with TPDL 2018. By registering once for the DCMI conference you will automatically have free access to the TPDL conference programme as well! Click here for more details about DCMI 2018, and how to register for the conference. We look forward to seeing you in Porto!

Posted at 10:03

June 04

Ebiquity research group UMBC: paper: Attribute Based Encryption for Secure Access to Cloud Based EHR Systems

Attribute Based Encryption for Secure Access to Cloud Based EHR Systems

Maithilee Joshi, Karuna Joshi and Tim Finin, Attribute Based Encryption for Secure Access to Cloud Based EHR Systems, IEEE International Conference on Cloud Computing, San Francisco CA, July 2018


Medical organizations find it challenging to adopt cloud-based electronic medical records services due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient-centric approach for EHR management where the responsibility of authorizing data access is handled at the patients’ end. This, however, creates a significant overhead for the patient who has to authorize every access of their health record. This is not practical given the multiple personnel involved in providing care and that at times the patient may not be in a state to provide this authorization. Hence there is a need to develop a proper authorization delegation mechanism for safe, secure and easy cloud-based EHR management. We have developed a novel, centralized, attribute-based authorization mechanism that uses Attribute Based Encryption (ABE) and allows for delegated secure access of patient records. This mechanism transfers the service management overhead from the patient to the medical organization and allows easy delegation of cloud-based EHRs’ access authority to the medical providers. In this paper, we describe this novel ABE approach as well as the prototype system that we have created to illustrate it.

Posted at 15:28

Dublin Core Metadata Initiative: Webinar: Introduction to Metadata Application Profiles

This webinar is scheduled for Thursday, June 14, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Successful data sharing requires that users of your data understand the data format, the data semantics, and the rules that govern your particular use of terms and values. Sharing often means the creation of “cross-walks” that transfer data from one schema to another using some or all of this information.

Posted at 00:00

May 29

Leigh Dodds: Observations on the web

Eight years ago I was invited to a workshop. The Office for National Statistics were gathering together people from the statistics and linked data communities to talk about publishing statistics on the web.

At the time there was lots of ongoing discussion within and between the two communities around this topic. With a particular emphasis on government statistics.

I was invited along to talk about how publishing linked data could help improve discovery of related datasets.

Others were there to talk about other related projects. There were lots of people there from the SDMX community who were working hard to standardise how statistics can be exchanged between organisations.

There’s a short write-up that mentions the workshop, some key findings and some follow on work.

One general point of agreement was that statistical data points or observations should be part of the web.

Every number, like the current population of Bath & North East Somerset, should have a unique address or URI. So people could just point at it. With their browsers or code.

Last week the ONS launched the beta of a new API that allows you to create links to individual observations.
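The idea is simple to sketch: an observation link is a dataset URL narrowed down by one chosen value per dimension, so each number gets its own stable address. The path and dimension names below are purely illustrative, not the documented ONS API:

```python
from urllib.parse import urlencode

# Hypothetical sketch of addressing a single observation: a dataset
# endpoint plus one selected value per dimension identifies exactly
# one number. Path and parameter names are illustrative only.
def observation_url(base, dataset, dimensions):
    query = urlencode(sorted(dimensions.items()))
    return f"{base}/datasets/{dataset}/observations?{query}"

url = observation_url(
    "https://api.example.org/v1",
    "mid-year-pop-est",
    # E06000022 is the GSS code for Bath & North East Somerset
    {"time": "2017", "geography": "E06000022", "sex": "all"},
)
```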

Seven years on they’ve started delivering on the recommendations of that workshop.

Agreeing that observations should have URIs was easy. The hard work of doing the digital transformation required to actually deliver it has taken much longer.

Proof-of-concept demos have been around for a while. We made one at the ODI.

But the patient, painstaking work to change processes and culture to create sustainable change takes time. And in the tech community we consistently underestimate how long that takes, and how much work is required.

So kudos to Laura, Matt, Andy, Rob, Darren Barnee and the rest of the present and past ONS team for making this happen. I’ve seen glimpses of the hard work they’ve had to put in behind the scenes. You’re doing an amazing and necessary job.

If you’re unsure as to why this is such a great step forward, here’s a user need I learned at that workshop.

Amongst the attendees was a designer who worked on data visualisations. He would spend a great deal of time working with data to get it into the right format and then designing engaging, interactive views of it.

Often there were unusual peaks and troughs in the graphs and charts which needed some explanation. Maybe there had been an external event that impacted the data, or a change in methodology. Or a data quality issue that needed explaining. Or maybe just something interesting that should be highlighted to users.

What he wanted was a way for the statisticians to give him that context, so he could add notes and explanations to the diagrams. He was doing this manually and it was a lot of time and effort.

For decades statisticians have been putting these useful insights into the margins of their work. Because of the limitations of the printed page and spreadsheet tables this useful context has been relegated into footnotes for the reader to find for themselves.

But by putting this data onto the web, at individual URIs, we can deliver those numbers in context. Everything you need to know can be provided with the statistic, along with pointers to other useful information.

Giving observations unique URIs frees statisticians from the tyranny of the document. And might help us all to share and discuss data in a much richer way.

I’m not naive enough to think that linking data can help us address issues with fake news. But it’s hard for me to imagine how being able to more easily work with data on the web isn’t at least part of the solution.

Posted at 20:13

May 28

Bob DuCharme: RDF* and SPARQL*

Reification can be pretty cool.
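For readers new to the topic: RDF* lets you make statements about statements without the classic four-triple reification pattern, by embedding a triple as the subject or object of another, and SPARQL* lets you query those embedded triples directly. A small sketch (the data and values are invented for illustration):

```sparql
# RDF* data (Turtle* syntax), shown here as comments:
#   ex:porto ex:population 214349 .
#   << ex:porto ex:population 214349 >> ex:source ex:census2011 .
#
# SPARQL* query: find population claims together with their source
PREFIX ex: <http://example.org/>
SELECT ?pop ?src WHERE {
  << ex:porto ex:population ?pop >> ex:source ?src .
}
```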

Posted at 14:36

May 23

Dublin Core Metadata Initiative: JTDC 2018 - A Doctoral Consortium

Update: The deadlines for submissions have been revised, so there is still time to submit! New dates: July 01, 2018 – submissions deadline (submissions by email); July 08, 2018 – notifications of acceptance; September 10, 2018 – JTDC 2018 workshop session. In addition to the main DCMI conference (co-located with TPDL) in Porto in September, DCMI is also collaborating with TPDL to host JTDC 2018, described as a "Doctoral Consortium".

Posted at 12:03

May 03

Dublin Core Metadata Initiative: Webinar: Understanding and Testing Models

This webinar is scheduled for Tuesday, May 22, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Every structured exchange requires consensus about the structure. The Shape Expressions (ShEx) language captures these structures in an intuitive and powerful syntax. From metadata description (e.g. DDI) to data description (e.g. FHIR), ShEx provides a powerful schema language to develop, test and deploy shared models for RDF data.
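As a taste of the syntax (a generic illustration, not an example from the webinar): a ShEx shape lists the predicates a conforming node must or may carry, with value constraints and cardinalities.

```shex
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Nodes matching <PersonShape> must have exactly one string name
# and may have any number of mailbox IRIs.
<PersonShape> {
  foaf:name xsd:string ;
  foaf:mbox IRI *
}
```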

Posted at 10:15

Dublin Core Metadata Initiative: Webinar: A Linked Data Competency Framework for Educators and Learners

Update: the recording of this webinar is now available on YouTube This webinar is scheduled for Thursday, May 10, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Linked Data is recognized as one of the underpinnings for open data, open science, and data-driven research and learning in the Semantic Web era. Questions still exist, however, about what should be expected as Linked Data related knowledge, skills, and learning outcomes, and where to find relevant learning materials.

Posted at 10:15

May 02

Over the past few years we have seen a number of application areas benefit from markup. Discussions have often centered around the importance of ease of use, simplicity and adoption for publishers and webmasters. While those principles will continue to guide our work, it is also important to make it easier to consume structured data, by building applications and making more use of the information it carries. We are therefore happy to welcome the new Data Commons initiative, which is devoted to sharing such datasets, beginning with a corpus of fact check data based on the ClaimReview markup as adopted by many fact checkers around the world. We expect that this work will benefit the wider ecosystem around structured data by encouraging use and re-use of related datasets.
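For reference, ClaimReview is a small schema.org vocabulary embedded in fact-check pages, usually as JSON-LD; a minimal illustrative instance (names and values invented) looks like:

```json
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "Example claim being checked",
  "author": { "@type": "Organization", "name": "Example Fact Checker" },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "alternateName": "False"
  }
}
```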

Posted at 20:41

Copyright of the postings is owned by the original blog authors. Contact us.