Planet RDF

It's triples all the way down

November 18

Bob DuCharme: Extracting RDF data models from Wikidata

That's "models", plural.

Posted at 14:41

November 17

Egon Willighagen: New paper: "Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform"

Figure from the article showing the interactive Open PHACTS documentation.

Ryan, a PhD candidate in our group, is studying how to represent and use interaction information in pathway databases, and WikiPathways specifically. His paper Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform (doi:10.12688/f1000research.13197.2) was recently accepted in F1000Research. It extends work started by, among others, Andra (see doi:10.1371/journal.pcbi.1004989).

The paper describes the application programming interface (API) methods of the Open PHACTS REST API for accessing interaction information, e.g. to learn which genes are upstream or downstream in the pathway. This information can be used in pharmacological research. The paper discusses example queries and demonstrates how the API methods can be called from HTML+JavaScript and Python.

Posted at 10:30

November 03

Egon Willighagen: Fwd: "We challenge you to reuse Additional Files (a.k.a. Supplementary Information)"

Download statistics of J. Cheminform. Additional Files show a clear growth.

Our challenge to you to reuse additional files was posted on the BMC (formerly BioMed Central) Research in progress blog:
Since our open-access portfolio in BMC and SpringerOpen started collaborating with Figshare, Additional Files and Supplementary Information have been deposited in journal-specific Figshare repositories, and files available for the Journal of Cheminformatics alone have been viewed more than ten thousand times. Yet what is the best way to make the most of this data and reuse the files? Journal of Cheminformatics challenges you to think about just that with their new upcoming special issue.
We already know you are downloading the data frequently and more every year, so let us know what you're doing with that data!

For example, I would love to see more data from these additional files end up in databases, such as Wikidata, but any reuse in RDF form would interest me.

Posted at 09:45

October 28

Bob DuCharme: SPARQL full-text Wikipedia searching and Wikidata subclass inferencing

Wikipedia querying techniques inspired by a recent paper.

Posted at 17:37

October 24

Dublin Core Metadata Initiative: Webinar: SKOS - Overview and Modeling of Controlled Vocabularies

IMPORTANT UPDATE: Due to unforeseen circumstances, we have had to postpone this webinar. Please look out for a further announcement giving the new date and time. SKOS (Simple Knowledge Organization Systems) is the W3C recommendation for representing and publishing datasets of classifications, thesauri, subject headings, glossaries and other types of controlled vocabularies and knowledge organization systems. The first part of the webinar includes an overview of semantic web technologies and shows in detail the different elements of the SKOS model.

Posted at 00:00

October 22

AKSW Group - University of Leipzig: AKSW at in São Paulo

From October 1st until 6th a delegation from the AKSW Group, Leipzig University of Applied Sciences (HTWK), eccenca GmbH, and the Max Planck Institute for Human Cognitive and Brain Sciences went to São Paulo, Brazil to meet people from the Web Technologies Study Center and evaluate future collaboration.
To get to know our mutual research interests we held the Workshop on Linked Data Management.

The Workshop on Linked Data Management (Workshop sobre Gestão de Dados Abertos) was co-located with the annual conference of the Brazilian World Wide Web Consortium (Conferencia 2018) in São Paulo.
During the workshop 11 talks were given by researchers from the Brazilian hosts and the German delegation.
By presenting our research areas, open questions, and visions to each other, we could identify overlapping research interests and complementary areas of expertise.
A recurring hypothesis was that Open Data is a very powerful method to foster participation, accessibility, and collaboration across areas.
The presentations made visible the potential in three areas: research data in the digital humanities, the accessibility of educational resources and the organization of educational infrastructures, and participation in public administration and government.
A recurring topic in the presentations was the need for collaboration among actors and stakeholders, which gives rise to a need for methodologies and systems that support this collaboration.
An asset for potential future cooperation in this research area between the Brazilian and the German side is the mutually complementary interests and experience of the groups.
The Brazilian side has an existing involvement with public administration, government, and education, particularly with the special needs that arise from a developing country's perspective.
The German side has a strong background in the creation and operation of data management systems and infrastructures, as well as in data integration.
We are currently establishing useful communication channels and collaboration platforms which allow efficient joint work across time zone, language, and continental borders, to foster the cooperation between the two groups.
To reach a common understanding of our interests and skills, the first subject of collaboration is a common extended documentation of the initial workshop. Following this documentation, a requirements engineering process will be started to identify the concrete needs and potentials on both sides for a common future project.
The second workshop, planned for June 2019, will focus on the results of this discussion.

After the workshop we also visited the DFG Office in Latin America to discuss possible research collaboration between German institutions and institutions in São Paulo.

The Open Data Management Workshop and the visit of the German delegation are funded by the German Research Foundation (DFG) in cooperation with the São Paulo Research Foundation (FAPESP) under grant agreement number 388784229.

Also read about our trip at the HTWK news portal (German).

Posted at 07:37

October 05

Libby Miller: Etching on a laser cutter

I’ve been struggling with this for ages, but yesterday at Hackspace – thanks to Barney (and I now realise, Tiff said this too and I got distracted and never followed it up) – I got it to work.

The issue was this: I’d been assuming that everything you lasercut had to be a vector DXF, so I was tracing bitmaps using Inkscape in order to make a suitable SVG, converting to DXF, loading it into the lasercut software at hackspace, downloading it and – boom – “the polyline must be closed” for etching: no workie. No matter what I did to the export in Inkscape or how I edited it, it just didn’t work.

The solution is simply to use a black and white png, with a non-transparent background. This loads directly into lasercut (which comes with JustAddSharks lasers) and…just…works.

As a bonus and for my own reference – I got good results with 300 speed / 30 power (below 30 didn’t seem to work) for etching (3mm acrylic).


Posted at 12:45

October 01

Ebiquity research group UMBC: paper: Early Detection of Cybersecurity Threats Using Collaborative Cognition

The CCS Dashboard’s sections provide information on sources and targets of network events, file operations monitored and sub-events that are part of the APT kill chain. An alert is generated when a likely complete APT is detected after reasoning over events.

Early Detection of Cybersecurity Threats Using Collaborative Cognition

Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi and Tim Finin, Early detection of Cybersecurity Threats using Collaborative Cognition, 4th IEEE International Conference on Collaboration and Internet Computing, Philadelphia, October 2018.


The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend more than 100 days in a system before being detected. This paper describes a novel, collaborative framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning integrated with different machine learning techniques. Our Cognitive Cybersecurity System ingests information from various textual sources and stores them in a common knowledge graph using terms from an extended version of the Unified Cybersecurity Ontology. The system then reasons over the knowledge graph that combines a variety of collaborative agents representing host and network-based sensors to derive improved actionable intelligence for security administrators, decreasing their cognitive load and increasing their confidence in the result. We describe a proof of concept framework for our approach and demonstrate its capabilities by testing it against a custom-built ransomware similar to WannaCry.

Posted at 13:20

September 28

Gregory Williams: Property Path use in Wikidata Queries

I recently began taking a look at the Wikidata query logs that were published a couple of months ago and wanted to look into how some features of SPARQL were being used on Wikidata. The first thing I’ve looked at is the use of property paths: how often paths are used, what path operators are used, and with what frequency.

Using the “interval 3” logs (2017-08-07–2017-09-03 representing ~78M successful queries [1]), I found that ~25% of queries used property paths. The vast majority of these use just a single property path, but there are queries that use as many as 19 property paths:

Pct.       Count      Paths used in query
74.3048%   58161337   0
24.7023%   19335490   1
0.6729%    526673     2
0.2787%    218186     4
0.0255%    19965      3
0.0056%    4387       7
0.0037%    2865       8
0.0030%    2327       9
0.0011%    865        6
0.0008%    604        11
0.0006%    434        5
0.0005%    398        10
0.0002%    156        12
0.0001%    110        15
0.0001%    101        19
0.0001%    56         13
0.0000%    12         14
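
As a quick check on the table above, the share of queries using at least one property path can be recomputed from the counts. This is only a minimal sketch: the dictionary simply copies the counts from the table, and the names PATH_COUNTS and share_with_paths are made up for illustration.

```python
# Counts copied from the table above: paths used in a query -> number of queries.
PATH_COUNTS = {
    0: 58161337, 1: 19335490, 2: 526673, 4: 218186, 3: 19965,
    7: 4387, 8: 2865, 9: 2327, 6: 865, 11: 604, 5: 434, 10: 398,
    12: 156, 15: 110, 19: 101, 13: 56, 14: 12,
}


def share_with_paths(counts):
    """Return the percentage of queries that use at least one property path."""
    total = sum(counts.values())
    with_paths = total - counts.get(0, 0)
    return 100.0 * with_paths / total


total_queries = sum(PATH_COUNTS.values())
print(total_queries)                            # 78273966
print(round(share_with_paths(PATH_COUNTS), 2))  # 25.7
```

So the "~78M queries" and "~25%" figures are consistent with the per-row counts.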

I normalized IRIs and variable names used in the paths so that I could look at just the path operators and the structure of the paths. The types of path operators used skew heavily towards * (ZeroOrMore) as well as sequence and inverse paths that can be rewritten as simple BGPs. Here are the structures representing at least 0.1% of the paths in the dataset:

Pct.       Count      Path Structure
49.3632%   10573772   ?s <iri1> * ?o .
39.8349%   8532772    ?s <iri1> / <iri2> ?o .
4.6857%    1003694    ?s <iri1> / ( <iri2> * ) ?o .
1.8983%    406616     ?s ( <iri1> + ) / ( <iri2> * ) ?o .
1.4626%    313290     ?s ( <iri1> * ) / <iri2> ?o .
1.1970%    256401     ?s ( ^ <iri1> ) / ( <iri2> * ) ?o .
0.7339%    157212     ?s <iri1> + ?o .
0.1919%    41110      ?s ( <iri1> / ( <iri2> * ) ) / ( ^ <iri3> ) ?o .
0.1658%    35525      ?s <iri1> / <iri2> / <iri3> ?o .
0.1496%    32035      ?s <iri1> / ( <iri1> * ) ?o .
0.1124%    11889      ?s ( <iri1> / <iri2> ) / ( <iri3> * ) ?o .
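
To illustrate what "can be rewritten as simple BGPs" means for the sequence and inverse paths, here is a minimal sketch that turns such a path into plain triple patterns by introducing fresh variables. It is not the tooling used for the analysis: the function name path_to_bgp and the ?_v1-style variable names are assumptions, and it only handles the normalized, parenthesis-free form with placeholders like <iri1>.

```python
from itertools import count


def path_to_bgp(subject, path, obj):
    """Rewrite a path built only from '/' and '^' into triple patterns.

    Returns None for paths using operators such as * or +, which have no
    fixed-length rewrite. Assumes normalized placeholders like <iri1>,
    so the IRIs themselves contain no '/' characters.
    """
    if any(ch in path for ch in "*+?|()!"):
        return None
    steps = [step.strip() for step in path.split("/")]
    fresh = (f"?_v{i}" for i in count(1))
    triples = []
    current = subject
    for index, step in enumerate(steps):
        # The last step ends at the object; earlier steps end at a fresh variable.
        target = obj if index == len(steps) - 1 else next(fresh)
        if step.startswith("^"):
            # An inverse step swaps subject and object.
            triples.append(f"{target} {step[1:].strip()} {current} .")
        else:
            triples.append(f"{current} {step} {target} .")
        current = target
    return triples


print(path_to_bgp("?s", "<iri1> / <iri2>", "?o"))
# ['?s <iri1> ?_v1 .', '?_v1 <iri2> ?o .']
print(path_to_bgp("?s", "^ <iri1>", "?o"))
# ['?o <iri1> ?s .']
print(path_to_bgp("?s", "<iri1> *", "?o"))
# None
```

The ZeroOrMore paths that dominate the table are exactly the ones this rewrite cannot handle.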

There are also some rare but interesting uses of property paths in these logs:

Pct.       Count      Path Structure
0.0499%    5274       ?s ( ( <iri1> / ( <iri2> * ) ) / ( <iri3> / ( <iri2> * ) ) ) / ( <iri4> / ( <iri2> * ) ) ?o .
0.0015%    157        ?s ( <iri1> / <iri2> / <iri3> / <iri4> / <iri5> / <iri6> / <iri7> / <iri8> / <iri9> ) * ?o .
0.0003%    28         ?s ( ( ( ( <iri1> / <iri2> / <iri3> ) ? ) / ( <iri4> ? ) ) / ( <iri5> * ) ) / ( <iri6> / ( <iri7> ? ) ) ?o .

Without further investigation it’s hard to say if these represent meaningful queries or are just someone playing with SPARQL and/or Wikidata, but I found them curious.

  1. These numbers don’t align exactly with the Wikidata query dumps as there were some that I couldn’t parse with my tools.

Posted at 17:06

September 23

Ebiquity research group UMBC: talk: Design and Implementation of an Attribute Based Access Controller using OpenStack Services

Design and Implementation of an Attribute Based Access Controller using OpenStack Services

Sharad Dixit, Graduate Student, UMBC
10:30am Monday, 24 September 2018, ITE346

With the advent of cloud computing, industries began a paradigm shift from traditional computing towards cloud computing, as it fulfilled organizations’ present requirements such as on-demand resource allocation, lower capital expenditure, scalability and flexibility. With that shift, however, came a variety of security and user data breach issues. To address these issues, organizations have started to implement hybrid clouds, where the underlying cloud infrastructure is set up by the organization and is accessible from anywhere around the world because of the distinguishable security edges provided by it. However, most cloud platforms provide a Role Based Access Controller, which is not adequate for complex organizational structures. A novel mechanism is proposed using OpenStack services and semantic web technologies to develop a module which evaluates the user’s and project’s multi-varied attributes and runs them against access policy rules defined by an organization before granting access to the user. Henceforth, an organization can deploy our module to obtain robust and trustworthy access control based on multiple attributes of a user and the project the user has requested in a hybrid cloud platform like OpenStack.

Posted at 19:44

Bob DuCharme: Panic over "superhuman" AI

Robot overlords not on the way.

Posted at 16:27

September 22

Libby Miller: Simulating crap networks on a Raspberry Pi

I’ve been having trouble with libbybot (my Raspberry Pi / lamp based presence robot) in some locations. I suspect this is because the Raspberry Pi 3’s inbuilt wifi antenna isn’t as strong as that in, say a laptop, so wifi problems that go unnoticed most of the time are much more obvious.

The symptoms are these:

  • Happily listening / watching remotely
  • Stream dies
  • I get a re-notification that libbybot is online, but can’t connect to it properly

My hypothesis is that the Raspberry Pi is briefly losing wifi connectivity, Chromium auto-reconnects, but the webRTC stream doesn’t re-initiate.

Anyway, the first step to mitigating the problem was to try and emulate it. There were a couple of ways I could have gone about this. One was to use network shaping tools on my laptop to try and emulate the problems by messing with the receiving end. A more realistic way would be to shape the traffic on the Pi itself, as that’s where the problem is occurring.

Searching for network shaping tools – and specifically dropped packets and network latency – led me to dummynet, the FreeBSD traffic shaper that is configured through ipfw. However, this is tightly coupled to the kernel and doesn’t seem suitable for the Raspberry Pi.

On the laptop, there is a tool for network traffic shaping on Mac OS – it used to be ipfw, but since 10.10 (details) it’s been an app called network link conditioner, available as part of Mac OS X developer tools.

Before going through the xcode palaver for something that wasn’t really what I wanted, I had one last dig for an easier way, and indeed there is: wondershaper led me to using tc to limit the bandwidth which in turn led to iptables for dropped packets.

But. None of these led to the behaviour that I wanted, in fact libbybot (which uses RTCMulticonnection for webRTC) worked perfectly under most conditions I could simulate. The same when using tc with netem, which can emulate network-wide delays – all fine.

Finally I twigged that the problem was probably a several-second network outage, and for that you can use iptables again. In this case using it to stop the web page (which runs on port 8443) being accessed from the Pi. Using this I managed to emulate the symptoms I’d been seeing.

Here are a few of the commands I used, for future reference.

The final, useful command: emulate a dropped network on a specific port for 20 seconds using iptables output command:

echo "stopping external to 8443"
iptables -A OUTPUT -p tcp --dport 8443 -j DROP
sleep 20
echo "restarting external to 8443"
iptables -D OUTPUT -p tcp --dport 8443 -j DROP
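
The same outage can also be scripted so that the DROP rule is always removed again, even if the script is interrupted halfway through the sleep. This is only a sketch, assuming iptables is on the PATH and the script runs as root; the function names drop_rule and simulate_outage are made up for illustration.

```python
import subprocess
import time


def drop_rule(action, port):
    """Build the iptables arguments; action is "-A" (add) or "-D" (delete)."""
    return ["iptables", action, "OUTPUT", "-p", "tcp",
            "--dport", str(port), "-j", "DROP"]


def simulate_outage(port=8443, seconds=20, run=subprocess.run, sleep=time.sleep):
    """Drop outgoing TCP traffic to `port` for `seconds`, then restore it."""
    run(drop_rule("-A", port), check=True)
    try:
        sleep(seconds)
    finally:
        # Always remove the rule again, even after Ctrl-C during the sleep.
        run(drop_rule("-D", port), check=True)


# Usage (needs root): simulate_outage(port=8443, seconds=20)
```

The run and sleep parameters are only there so the function can be tried out without touching the real firewall.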

Other things I tried: drop 30% of (input or output) packets randomly, using the iptables statistic module

sudo iptables -A INPUT -m statistic --mode random --probability 0.30 -j DROP

sudo iptables -A OUTPUT -m statistic --mode random --probability 0.30 -j DROP

list current iptables rules

iptables -L

clear all (flush)

iptables -F

Delay all packets by 100ms using tc and netem

sudo tc qdisc add dev wlan0 root netem delay 100ms

change that to 2000ms (with 10ms of jitter and 25% correlation)

sudo tc qdisc change dev wlan0 root netem delay 2000ms 10ms 25%

All the tc rules go away when you reboot.

Context and links:

tc and netem: openWRT: Netem (Network emulator)

iptables: Using iptables to simulate service interruptions by Matt Parsons, and The Beginner’s guide to iptables, the linux firewall


Posted at 12:40

September 17

Dublin Core Metadata Initiative: A Successful DCMI 2018 Conference

The DCMI Annual Conference was held last week, hosted by the Faculty of Engineering of the University of Porto, Portugal. The conference was co-located with TPDL which meant that while many people arrived as part of one community, all left with the experience and appreciation of two! The full conference proceedings are now available, with copies of presentation slides where appropriate. Some photographs of the conference can be found on Flickr, tagged with 'dcmi18'.

Posted at 00:00

September 15

Egon Willighagen: Wikidata Query Service recipe: qualifiers and the Greek alphabet

Just because I need to look this up each time myself, I wrote up this quick recipe for how to get information from statement qualifiers from Wikidata. Let's say I want to list all Greek letters, with the lower case letter in one column and the upper case letter in the other. This is what our data looks like:

So, let's start with a simple query that lists all letters in the Greek alphabet:

SELECT ?letter WHERE {
  ?letter wdt:P361 wd:Q8216 .
}

Of course, that only gives me the Wikidata entries, and not the Unicode characters we are after. So, let's add that Unicode character property:

SELECT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
}

Ah, that gets us somewhere:

But you see that the upper and lower case are still in separate rows, rather than columns. To fix that, we need access to those qualifiers. It's all in there in the Wikidata RDF, but the model is giving people a headache (so do many things, like math, but that does not mean we should stop doing it!). It all comes down to keeping notebooks, writing down your tricks, etc. It's called the scientific method (there is more to that, than just keeping notebooks, tho).

So, a lot of important information is put in qualifiers, and not just the statements. Let's first get all statements for a Greek letter. We would do that with:

?letter ?pprop ?statement .

One thing we want to know about the property we're looking at, is the entity linked to that. We do that by adding this bit:

?property wikibase:claim ?pprop .

Of course, the property we are interested in is the Unicode character, so we can put that directly in:

wd:P487 wikibase:claim ?pprop .

Next, the qualifiers for the statement. We want them all:

?statement ?qualifier ?qualifierVal .
?qualifierProp wikibase:qualifier ?qualifier .

And because we do not want any qualifier but 'applies to part', we can put that in too:

?statement ?qualifier ?qualifierVal .
wd:P518 wikibase:qualifier ?qualifier .

Furthermore, we are only interested in lower case and upper case, and we can put that in as well (for upper case):

?statement ?qualifier wd:Q98912 .
wd:P518 wikibase:qualifier ?qualifier .

So, putting this together for lower case (wd:Q8185162), we now get this full query:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?pprop .
  ?statement ?qualifier wd:Q8185162 .
  wd:P518 wikibase:qualifier ?qualifier .
}

We are not done yet, because you can see in the above example that we get the unicode character differently from the statement. This needs to be integrated, and we need the wikibase:statementProperty for that:

wd:P487 wikibase:statementProperty ?statementProp .
?statement ?statementProp ?unicode .

If we integrate that, we get this query, which is indeed getting complex:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?pprop ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier wd:Q8185162 ;
             ?statementProp ?unicode .
  wd:P518 wikibase:qualifier ?qualifier .
}

But basically we have our template here, with three parameters:
  1. the property of the statement (here P487: Unicode character)
  2. the property of the qualifier (here P518: applies to part)
  3. the object value of the qualifier (here Q8185162: lower case)
If we use the SPARQL VALUES approach, we get the following template. Notice that I renamed the variables of ?letter and ?unicode. But I left the wdt:P361 wd:Q8216 (='part of' 'Greek alphabet') in, so that this query does not time out:

SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 . # 'part of' 'Greek alphabet'
  VALUES ?qualifierObject { wd:Q8185162 }
  VALUES ?qualifierProperty { wd:P518 }
  VALUES ?statementProperty { wd:P487 }

  # template
  ?entityOfInterest ?pprop ?statement .
  ?statementProperty wikibase:claim ?pprop ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier ?qualifierObject ;
             ?statementProp ?statementDataValue .
  ?qualifierProperty wikibase:qualifier ?qualifier .
}

So, there is our recipe, for everyone to copy/paste.
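
For those who prefer to fill in the three parameters from a script, the template can be wrapped in a small helper. This is a minimal sketch and not part of the original recipe: the names QUALIFIER_TEMPLATE and qualifier_query are made up, and the default restriction is the 'part of' 'Greek alphabet' pattern used above. It only builds the query text; sending it to the Wikidata Query Service is left out.

```python
import re

# The recipe as a Python format string; doubled braces are literal SPARQL braces.
QUALIFIER_TEMPLATE = """\
SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {{
  {restriction}
  VALUES ?qualifierObject {{ wd:{qualifier_object} }}
  VALUES ?qualifierProperty {{ wd:{qualifier_property} }}
  VALUES ?statementProperty {{ wd:{statement_property} }}

  # ?pprop joins the entity's statement link to the property's claim predicate
  ?entityOfInterest ?pprop ?statement .
  ?statementProperty wikibase:claim ?pprop ;
                     wikibase:statementProperty ?statementProp .
  ?statement ?qualifier ?qualifierObject ;
             ?statementProp ?statementDataValue .
  ?qualifierProperty wikibase:qualifier ?qualifier .
}}"""


def qualifier_query(statement_property, qualifier_property, qualifier_object,
                    restriction="?entityOfInterest wdt:P361 wd:Q8216 ."):
    """Fill the recipe with property ids (P...) and an item id (Q...)."""
    for pid in (statement_property, qualifier_property):
        if not re.fullmatch(r"P\d+", pid):
            raise ValueError(f"not a property id: {pid!r}")
    if not re.fullmatch(r"Q\d+", qualifier_object):
        raise ValueError(f"not an item id: {qualifier_object!r}")
    return QUALIFIER_TEMPLATE.format(
        restriction=restriction,
        statement_property=statement_property,
        qualifier_property=qualifier_property,
        qualifier_object=qualifier_object,
    )


# Unicode character (P487), applies to part (P518), lower case (Q8185162)
print(qualifier_query("P487", "P518", "Q8185162"))
```

Keeping a restriction in the query matters for the same reason as in the hand-written version: without it the query is likely to time out.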

Completing the Greek alphabet example
OK, now since I actually started with the upper and lower case Unicode character for Greek letters, let's finish that query too. Since we need both, we need to use the template twice:

SELECT DISTINCT ?entityOfInterest ?lowerCase ?upperCase WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?pprop ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?pprop2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
}

Still one issue left to fix. Some Greek letters have more than one upper case Unicode character. We need to concatenate those. That requires a GROUP BY and the GROUP_CONCAT function, and we get this query:

SELECT DISTINCT ?entityOfInterest
  (GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ") AS ?lowerCases)
  (GROUP_CONCAT(DISTINCT ?upperCase; separator=", ") AS ?upperCases)
WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?pprop ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?pprop2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
} GROUP BY ?entityOfInterest
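
To make clear what GROUP BY with GROUP_CONCAT(DISTINCT ...) does to the result rows, here is a small emulation in plain Python. It is only a sketch: the rows are made-up sample data rather than actual query output, and the helper name group_concat is an assumption. Unlike SPARQL, which does not guarantee an order inside the concatenation, this version keeps the order in which values first appear.

```python
from collections import defaultdict


def group_concat(rows, key, column, separator=", "):
    """Join the distinct values of `column` for each value of `key`."""
    grouped = defaultdict(list)
    for row in rows:
        value = row[column]
        if value not in grouped[row[key]]:  # DISTINCT
            grouped[row[key]].append(value)
    return {group: separator.join(values) for group, values in grouped.items()}


# Hypothetical rows: one letter with two upper case characters, one with one.
rows = [
    {"letter": "theta", "lowerCase": "θ", "upperCase": "Θ"},
    {"letter": "theta", "lowerCase": "θ", "upperCase": "ϴ"},
    {"letter": "alpha", "lowerCase": "α", "upperCase": "Α"},
]

print(group_concat(rows, "letter", "lowerCase"))
# {'theta': 'θ', 'alpha': 'α'}
print(group_concat(rows, "letter", "upperCase"))
# {'theta': 'Θ, ϴ', 'alpha': 'Α'}
```

Two rows for the same letter collapse into one, with the differing column joined by the separator.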

Now, since most of my blog posts are not just fun, but typically also have a use case, allow me to shed light on the context. Since you are still reading, you're officially part of the secret society of brave followers of my blog. Tweet to my egonwillighagen account a message consisting of a series of letters followed by two numbers (no spaces) and another series of letters, where the two numbers indicate the number of letters at the start and the end, for example, abc32yz or adasgfshjdg111x, and I will add you to my secret list of brave followers (and I will like the tweet; if you disguise the string to suggest it has some meaning, I will also retweet it). Only that string is allowed and don't tell anyone what it is about, or I will remove you from the list again :) Anyway, my ambition is to make a Wikidata-based BINAS replacement.

So, we only have a human readable name. The frequently used SERVICE wikibase:label does a pretty decent job and we end up with this table:

Posted at 09:12

September 13

AKSW Group - University of Leipzig: AskNow 0.1 Released

Dear all,

we are very happy to announce AskNow 0.1 – the initial release of Question Answering Components and Tools over RDF Knowledge Graphs.


The following components with corresponding features are currently supported by AskNow:

  • AskNow UI 0.1: The UI interface works as a platform for users to pose their questions to the AskNow QA system. The UI displays the answers based on whether the answer is an entity or a list of entities, boolean or literal. For entities it shows the abstracts from DBpedia.

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, SOLIDE, WDAqua, BigDataEurope.

Kind regards,
The AskNow Development Team

Posted at 13:35

Dublin Core Metadata Initiative: Webinar: SKOS - Overview and Modeling of Controlled Vocabularies

This webinar is scheduled for Thursday, October 11, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. SKOS (Simple Knowledge Organization Systems) is the recommendation of the W3C to represent and publish datasets for classifications, thesauri, subject headings, glossaries and other types of controlled vocabularies and knowledge organization systems in general. The first part of the webinar includes an overview of the technologies of the semantic web and shows in detail the different elements of the SKOS model.

Posted at 00:00

September 11

W3C Blog Semantic Web News: JSON-LD Guiding Principles, First Public Working Draft

Coming to consensus is difficult in any working group, and doubly so when the working group spans a broad cross-section of the web community. Everyone brings their own unique set of experiences, skills and desires for the work, but at the end of the process, there can be only one specification.  In order to provide a framework in which to manage the expectations of both participants and other stakeholders, the JSON-LD WG started out by establishing a set of guiding principles.  These principles do not constrain decisions, but provide a set of core aims and established consensus to reference during difficult discussions.  The principles are lights to lead us back out of the darkness of never-ending debate towards a consistent and appropriately scoped set of specifications. A set of specifications that have just been published as First Public Working Drafts.

These principles start with the uncontroversial “Stay on target!”, meaning to stay focused on the overall mission of the group to ensure the ease of creation and consumption of linked data using the JSON format by the widest possible set of developers. We note that the target audience is software developers generally, not necessarily browser-based applications.

To keep the work grounded, we also adopted as important principles that proposals require use cases with actual data and that they have support from at least two organizations (W3C members or otherwise). The use cases are intended to be supporting evidence for the practicality and likely adoption of a proposed feature, not a heavyweight requirements analysis process.

Adoption of specifications is always a concern, and to maximize the likelihood of uptake, we have adopted several principles around simplicity, usability and preferring phased or incremental solutions. To encourage experimentation and to try and reduce the chances of needing a breaking change in the future, we have adopted a principle of defining only what is conforming to the specification, and leaving all other functionality open. Extensions are welcome, they are just not official or interoperable.

Finally, and somewhat controversially, we adopted the principle that new features should be compatible with the RDF Data Model. While there are existing features that cannot be expressed in RDF that are possible in JSON-LD, we do not intend to increase this separation between the specifications and hope to close it as much as possible.

Using these guidelines, the working group has gotten off to a very productive start and came to very quick consensus on whether many features suggested to the JSON-LD Community Group were in scope for the work, including approving the much requested lists-of-lists functionality. This will allow JSON arrays to directly include JSON arrays as items in JSON-LD 1.1, enabling a complete semantic mapping for JSON structures such as GeoJSON, and full round-tripping through RDF. The publication of the FPWD documents is a testimony to the efforts of the Working Group, and especially those of Gregg Kellogg as editor.
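
To give an idea of what lists of lists look like in practice, here is a minimal sketch of a GeoJSON-style polygon written as JSON-LD, built as a plain Python dictionary. It assumes the JSON-LD 1.1 behaviour described above, where a term declared with "@container": "@list" may hold nested arrays; the context is an illustrative choice rather than text taken from the working drafts, and the helper nesting_depth is made up for this example.

```python
import json

# A polygon whose coordinates are arrays of arrays of arrays (rings of positions).
polygon = {
    "@context": {
        "@vocab": "https://purl.org/geojson/vocab#",
        "type": "@type",
        # Assumption: in JSON-LD 1.1 a @list container may contain nested lists.
        "coordinates": {"@container": "@list"},
    },
    "type": "Polygon",
    "coordinates": [
        [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 0.0]],
    ],
}


def nesting_depth(value):
    """Return how deeply JSON arrays are nested inside `value`."""
    if not isinstance(value, list):
        return 0
    return 1 + max((nesting_depth(item) for item in value), default=0)


print(nesting_depth(polygon["coordinates"]))  # 3
print(json.dumps(polygon, indent=2))
```

No JSON-LD processor is involved here; the point is only the shape of the document that such a processor would have to accept.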

Posted at 22:19

September 08

Egon Willighagen: Also new this week: "Google Dataset Search"

There was a lot of Open Science news this week. The announcement of the Google Dataset Search was one of them:

 Of course, I first tried searching for "RDF chemistry" which shows some of my data sets (and a lot more):

It picks up data from many sources, such as Figshare in this image. That means it also works (well, sort of, as Noel O'Boyle noticed) for supplementary information from the Journal of Cheminformatics.

It picks up metadata in several ways, among which So, next week we'll see if we can get eNanoMapper extended to spit compatible JSON-LD for its data sets, called "bundles".

Integrated with Google Scholar?
While the URL for the search engine does not suggest the service is more than a 20% project, we can hope it will stay around like Google Scholar has been. But I do hope they will further integrate it with Scholar. For example, in the above figure, it did pick up that I am the author of that data set (well, repurposed from an effort of Rich Apodaca), but it did not figure out that I am also on Scholar.

So, these data sets do not show up in your Google Scholar profile yet, but they must. Time will tell where this data search engine is going. There are many interesting features, and given the amount of online attention, they won't stop development just yet, and I expect to discover more and better features in the next months. Give it a spin!

Posted at 09:13

August 27

Bob DuCharme: Pipelining SPARQL queries in memory with the rdflib Python library

Using retrieved data to make more queries.

Posted at 13:55

August 18

Egon Willighagen: Compound (class) identifiers in Wikidata

Bar chart showing the number of compounds with a particular chemical identifier.
I think Wikidata is a groundbreaking project, which will have a major impact on science. Among the reasons are the open license (CCZero), the very basic approach (Wikibase), and the superb community around it. For example, setting up your own Wikibase, including a cool SPARQL endpoint, is easily done with Docker.

Wikidata has many subprojects, such as WikiCite, which captures the collective of primary literature. Another one is the WikiProject Chemistry. The two nicely match up, I think, making a public database linking chemicals to literature (though much still needs to be done here); see my recent ICCS 2018 poster (doi:10.6084/m9.figshare.6356027.v1, paper pending).

But Wikidata is also a great resource for identifier mappings between chemical databases, something we need for our metabolism pathway research. The mappings, as you may know, are used in the latter via BridgeDb, and we have been using Wikidata as one of three sources for some time now (the others being HMDB and ChEBI). WikiProject Chemistry has a related ChemID effort, and while the wiki page does not show much recent activity, there is actually a lot of ongoing effort (see plot). And I've been adding my bits.

Limitations of the links
But not every identifier in Wikidata has the same meaning. While they are all classified as 'external-id', the actual links may have different meanings. This, of course, is the essence of scientific lenses, see this post and the papers cited therein. One reason here is the difference in what entries in the various databases mean.

Wikidata has an extensive model, defined by the aforementioned WikiProject Chemistry. For example, it has different concepts for chemical compounds (in fact, the hierarchy is pretty rich) and compound classes, and these are modeled differently. Furthermore, it has a model that formalizes that things with a different InChI are different, but even allows things with the same InChI to be different, if the need arises. It tries to accurately and precisely capture the certainty and uncertainty of the chemistry. As such, it is a powerful system to handle identifier mappings, because databases are not clear, and chemical and biological data is even less so: we experimentally measure a characterization of chemicals, but what we put in databases and give names to are specific models (often chemical graphs).

That model differs from what other (chemical) databases use, or seem to use, because databases do not always indicate what they actually have in a record. But I think this is a fair guess.

ChEBI (and the matching ChEBI ID) has entries for chemical classes (e.g. fatty acid) and specific compounds (e.g. acetate).

PubChem, ChemSpider, UniChem
These three resources use the InChI as central asset. While they do not really have the concept of compound classes so much (though increasingly they have classifications), they do have entries where stereochemistry is undefined or unknown. Each one has its own way of linking to other databases, which normally includes tons of structure normalization (see e.g. doi:10.1186/s13321-018-0293-8 and doi:10.1186/s13321-015-0072-8).

HMDB (and the matching P2057) has a biological perspective; the entries reflect the biology of a chemical. Therefore, for most compounds, they focus on the neutral forms of compounds. This makes linking to/from other databases where the compound is not neutral chemically less precise.

CAS registry numbers
CAS (and the matching P231) is pretty unique itself: it has identifiers for substances (see Q79529), which covers much more than chemical compounds, and it comes with its own set of unique features. For example, solutions of some compound, by design, have the same identifier. Previously, formaldehyde and formalin had different Wikipedia/Wikidata pages, both with the same CAS registry number.

Limitations of the links #2
Now, returning to our starting point: limitations in linking databases. If we want FAIR mappings, we need to be as precise as possible. Of course, that may mean we need more steps, but we can always simplify at will, whereas we can never have a computer make the links more complex (well, not without making assumptions, etc.).

And that is why Wikidata is so suitable to link all these chemical databases: it can distinguish differences when needed, and make that explicit. It makes mappings between the databases more FAIR.
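To make the identifier mapping idea concrete, here is a minimal Python sketch that builds a SPARQL query pairing two external identifiers on the same Wikidata item. P231 (CAS registry number) and P2057 (HMDB ID) are the properties named in this post; the function name and query shape are assumptions for illustration, and the wdt: prefix is assumed to be predefined, as it is on the Wikidata Query Service.

```python
def mapping_query(source_property, target_property, limit=10):
    """Build a SPARQL query listing items that carry both identifiers."""
    return (
        "SELECT ?item ?source ?target WHERE {\n"
        f"  ?item wdt:{source_property} ?source ;\n"
        f"        wdt:{target_property} ?target .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# CAS registry number (P231) to HMDB ID (P2057), both named in the post.
query = mapping_query("P231", "P2057")
print(query)
```

A query like this only returns the raw links. Whether a given CAS-to-HMDB pair really means "the same thing" is exactly the question the sections above raise.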

Posted at 12:46

August 09

Egon Willighagen:

Posted at 11:59

August 07

AKSW Group - University of Leipzig: Jekyll RDF Tutorial Screencast

Since 2016 we have been developing Jekyll-RDF, a plugin for the famous Jekyll static website generator. With Jekyll-RDF we took the slogan of Jekyll “Transform your plain text into static websites and blogs” and transformed it to “Transform your RDF Knowledge Graph into static websites and blogs”. This enables people without deep programming knowledge to publish data, which is encoded in complicated RDF structures, on the web in an easy to browse format.

To ease your start with Jekyll-RDF I’ve created a Tutorial Screencast that teaches you all the basics necessary to create a simple Jekyll page from an RDF knowledgebase. I hope that you enjoy it and that it is helpful for you!


Posted at 09:11

August 04

Egon Willighagen: WikiPathways Summit 2018

I was not there when WikiPathways was founded; I only joined in 2012 and I found my role in the area of metabolic pathways of this Open knowledge base (CC0, to be precise) of biological processes. This autumn, the WikiPathways Summit 2018 is being organized in San Francisco to celebrate the 10th anniversary of the project, and everyone interested is kindly invited to join for three days of learning about WikiPathways, integrations, and use cases, data curation, and hacking on this great Open Science project.

Things that I would love to talk about (besides making metabolic pathways FAIR and Openly available) are the integrations with other platforms (Reactome, RaMP, MetaboLights, Pathway Commons, PubChem, Open PHACTS (using the RDF), etc.), Wikidata interoperability, and future interoperability with platforms like AOPWiki, Open Targets, BRENDA, Europe PMC, etc.

Posted at 06:54

August 01

Libby Miller: Neue podcast in a box, part 1

Ages ago I wrote a post on how to create a physical podcast player (“podcast in a box”) using Radiodan. Since then, we’ve completely rewritten the software, so those instructions can be much improved and simplified. Here’s a revised technique, which will get you as far as reading an RFID card. I might write a part 2, depending on how much time I have.

You’ll need:

  • A Pi 3B or 3B+
  • An 8GB or larger class 10 microSD card
  • A cheapo USB soundcard
  • A speaker with a 3.5mm jack
  • A power supply for the Pi
  • An MFRC522 RFID reader
  • A laptop and microSD card reader / writer

The idea of Radiodan is that as much as possible happens inside web pages. A server runs on the Pi. One webpage is opened headlessly on the Pi itself (internal.html) – this page will play the audio; another can be opened on another machine to act as a remote control (external.html).

They are connected using websockets, so each can access the same messages – the RFID service talks to the underlying peripheral on the Pi, making the data from the reader available.

Here’s what you need to do:

1. Set up the Pi as per these instructions (“setting up your Pi”)

You need to burn a microSD card with the latest Raspbian with Desktop to act as the Pi’s brain, and the easiest way to do this is with Etcher. Once that’s done, the easiest way to do the rest of the install is over ssh, and the quickest way to get that in place is to edit two files while the card is still in your laptop (I’m assuming a Mac):

Enable ssh by typing:

touch /Volumes/boot/ssh

Add your wifi network to boot by adding a file called


contents: (replace AP_NAME and AP_PASSWORD with your wifi details)

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev


Then eject the card, put the card in the Pi, attach all the peripherals except for the RFID reader and switch it on. While on the same wifi network, you should be able to ssh to it like this:

ssh pi@raspberrypi.local

password: raspberry.

Then install the Radiodan software using the provisioning script like this:

curl | sudo bash

2. Enable SPI on the Pi

Don’t reboot yet; type:

sudo raspi-config

Under interfaces, enable SPI, then shut the Pi down

sudo halt

and unplug it.

3. Test Radiodan and configure it

If all is well and you have connected a speaker via a USB soundcard, you should hear it say “hello” as it boots.

Please note: Radiodan does not work with the default 3.5mm jack on the Pi. We’re not sure yet why. But USB soundcards are very cheap, and work well.

There’s one app available by default for Radiodan on the Pi. To use it,

  1. Navigate to http://raspberrypi.local/radio
  2. Use the buttons to play different audio clips. If you can hear things, then it’s all working



Then shut the Pi down and unplug it from the mains.

4. Connect up the RFID reader to the Pi

like this

Then start the Pi up again by plugging it in.

5. Add the piab app

Dan has made a very fancy mechanism for using Samba to drag and drop apps to the Pi, so that you can develop on your laptop. However, because we’re using RFID (which only works on the Pi), we may as well do everything on there. So, ssh to it again:

ssh pi@raspberrypi.local
cd /opt/radiodan/rde/apps/
git clone

This is currently a very minimal app, which just allows you to see all websocket messages going by, and doesn’t do anything else yet.

6. Enable the RFID service and piab app in the Radiodan web interface

Go to http://raspberrypi.local:5020

Enable “piab”, clicking ‘update’ beneath it. Enable the RFID service, clicking ‘update’ beneath it. Restart the manager (red button) and then install dependencies (green button), all within the web page.



Reboot the Pi (e.g. ssh in and sudo reboot). This will enable the RFID service.

7. Test the RFID reader

Open http://raspberrypi.local:5000/piab and open developer tools for that page. Place a card on the RFID reader. You should see a json message in the console with the RFID identifier.


The rest is a matter of writing javascript / html code to:

  • Associate a podcast feed with an RFID (e.g. a web form in external.html that allows the user to add a podcast feed url)
  • Parse the podcast feed when the appropriate card id is detected by the reader
  • Find the latest episode and play it using internal.html (see the radio app example for how to play audio)
  • Add more fancy options, such as remembering where you were in an episode, stopping when the card is removed etc.
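
The feed-handling steps in the list above can be sketched roughly as follows. The real app would be JavaScript running in the Radiodan pages; this Python version only illustrates the logic of mapping a card to a feed and picking the latest episode from an RSS 2.0 feed. The card identifier, the example.org URLs, the sample feed and the function names are all hypothetical, and fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical mapping from RFID card identifiers to podcast feed URLs.
CARD_FEEDS = {"04a1b2c3": "https://example.org/podcast/feed.xml"}


def feed_for_card(card_id):
    """Look up the feed URL for a card, or None if the card is unknown."""
    return CARD_FEEDS.get(card_id)


def latest_episode_url(rss_text):
    """Return the enclosure URL of the most recent item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    episodes = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        enclosure = item.find("enclosure")
        if pub_date is None or enclosure is None:
            continue  # skip items without a date or an audio file
        episodes.append((parsedate_to_datetime(pub_date), enclosure.get("url")))
    if not episodes:
        return None
    return max(episodes, key=lambda episode: episode[0])[1]


# Made-up feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Episode 1</title>
    <pubDate>Mon, 02 Jul 2018 09:00:00 +0000</pubDate>
    <enclosure url="https://example.org/ep1.mp3" type="audio/mpeg" length="1"/>
  </item>
  <item>
    <title>Episode 2</title>
    <pubDate>Mon, 09 Jul 2018 09:00:00 +0000</pubDate>
    <enclosure url="https://example.org/ep2.mp3" type="audio/mpeg" length="1"/>
  </item>
</channel></rss>"""

print(feed_for_card("04a1b2c3"))
print(latest_episode_url(SAMPLE_FEED))
```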

As you develop, you can see the internal page on http://raspberrypi.local:5001 and the external page on http://raspberrypi.local:5000. You can reload the app using the blue button on http://raspberrypi.local:5020.

Many more details about the architecture of Radiodan are available; full installation instructions and instructions for running it on your laptop are here; docs are here; code is in github.

Posted at 09:11

July 22

Bob DuCharme: Dividing and conquering SPARQL endpoint retrieval

With the VALUES keyword.
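
One way to read the title, sketched in Python under stated assumptions: split a long list of resource URIs into batches and put each batch into a VALUES clause, so that an endpoint is asked several small questions instead of one large one. The function name, the triple pattern and the example.org URIs are hypothetical and are not taken from the post itself.

```python
def values_queries(uris, batch_size):
    """Build one SPARQL query per batch, restricting ?s with a VALUES clause."""
    queries = []
    for start in range(0, len(uris), batch_size):
        batch = uris[start:start + batch_size]
        values = " ".join(f"<{uri}>" for uri in batch)
        queries.append(
            "SELECT ?s ?p ?o WHERE {\n"
            f"  VALUES ?s {{ {values} }}\n"
            "  ?s ?p ?o .\n"
            "}"
        )
    return queries


# Hypothetical resources, split into batches of two.
uris = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.org/c",
]
for query in values_queries(uris, 2):
    print(query)
    print()
```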

Posted at 16:52

July 20

AKSW Group - University of Leipzig: DBpedia Day @ SEMANTiCS 2018

Don’t miss the 12th edition of the DBpedia Community Meeting in Vienna, the city with the highest quality of life in the world. The DBpedia Community will get together for the DBpedia Day on September 10, the first day of the SEMANTiCS Conference which will be held from September 10 to 13, 2018.

What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting! Please submit your presentations, posters, demos or other forms of contributions through our web form.


  • Keynote #1: Dealing with Open Domain Data by Javier David Fernández García (WU)
  • Keynote #2: Linked Open Data cloud – act now before it’s too late by Mathieu d’Aquin (NUI Galway)
  • DBpedia Showcase Session
  • DBpedia Association Hour
  • Special Chapter Session with DBpedia language chapters from different parts of Europe


  • Attending the DBpedia Community Meeting costs €50 (excl. registration fee and VAT). DBpedia members get free admission. Please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.
  • Please check all details here!

We are looking forward to meeting you in Vienna!


Posted at 12:37

July 19

Dublin Core Metadata Initiative: DCMI 2018 Conference Programme Published

The conference programme for DCMI 2018 has now been published. With an exciting mixture of 3 keynotes, papers, presentations, workshops and working meetings, this year's conference promises to continue the excellent tradition of DCMI annual events (and all this in the wonderful location of Porto, Portugal!). Register now!

Posted at 00:00

July 05

Dublin Core Metadata Initiative: Webinar: The Current State of Automated Content Tagging: Dangers and Opportunities

This webinar is scheduled for Thursday, July 19, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. There are real opportunities to use technology to automate content tagging, and there are real dangers that automated content tagging will sometimes inappropriately promote and obscure content. We’ve all heard talks about AI, but little detail about how these applications actually work. Recently I’ve been working with clients to explore the current state of the art of so-called AI technology, and to trial several of these tools with research, policy and general news content.

Posted at 00:00

July 04

Dublin Core Metadata Initiative: Webinar: The Role of Dublin Core Metadata in the Expanding Digital and Analytical Skill Set Required by Data-Driven Organizations

This webinar is scheduled for Thursday, July 12, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Many areas of our world are being subjected to digitalisation as leaders and policymakers embrace the possibilities that can be harnessed through the capturing and exploiting of data. New business models are being developed, and new revenue streams are being uncovered that require a solid and recognised data competence capacity.

Posted at 00:00

Copyright of the postings is owned by the original blog authors. Contact us.