Planet RDF

It's triples all the way down

October 05

Libby Miller: Etching on a laser cutter

I’ve been struggling with this for ages, but yesterday at Hackspace – thanks to Barney (and I now realise, Tiff said this too and I got distracted and never followed it up) – I got it to work.

The issue was this: I’d been assuming that everything you lasercut had to be a vector DXF, so I was tracing bitmaps in Inkscape to make a suitable SVG, converting that to DXF, loading it into the lasercut software at Hackspace, downloading it and – boom – “the polyline must be closed” for etching: no workie. No matter what I did to the export in Inkscape or how I edited it, it just didn’t work.

The solution is simply to use a black and white png, with a non-transparent background. This loads directly into lasercut (which comes with JustAddSharks lasers) and…just…works.

As a bonus and for my own reference – I got good results with 300 speed / 30 power (below 30 didn’t seem to work) for etching (3mm acrylic).

 

Posted at 12:45

October 01

Ebiquity research group UMBC: paper: Early Detection of Cybersecurity Threats Using Collaborative Cognition

The CCS Dashboard’s sections provide information on sources and targets of network events, file operations monitored and sub-events that are part of the APT kill chain. An alert is generated when a likely complete APT is detected after reasoning over events.


Early Detection of Cybersecurity Threats Using Collaborative Cognition

Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi and Tim Finin, Early Detection of Cybersecurity Threats Using Collaborative Cognition, 4th IEEE International Conference on Collaboration and Internet Computing, Philadelphia, October 2018.

 

The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend more than 100 days in a system before being detected. This paper describes a novel, collaborative framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning integrated with different machine learning techniques. Our Cognitive Cybersecurity System ingests information from various textual sources and stores them in a common knowledge graph using terms from an extended version of the Unified Cybersecurity Ontology. The system then reasons over the knowledge graph that combines a variety of collaborative agents representing host and network-based sensors to derive improved actionable intelligence for security administrators, decreasing their cognitive load and increasing their confidence in the result. We describe a proof of concept framework for our approach and demonstrate its capabilities by testing it against a custom-built ransomware similar to WannaCry.

Posted at 13:20

September 28

Gregory Williams: Property Path use in Wikidata Queries

I recently began taking a look at the Wikidata query logs that were published a couple of months ago and wanted to look into how some features of SPARQL were being used on Wikidata. The first thing I’ve looked at is the use of property paths: how often paths are used, what path operators are used, and with what frequency.
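For readers who have not used the feature, here is a minimal illustrative query (my own sketch, not taken from the logs) combining a sequence path with a ZeroOrMore path, two of the operators that dominate the tables below:

# Illustrative only: items whose class is "human" (Q5) or any subclass of it,
# via a sequence over P31 (instance of) and a ZeroOrMore path over P279 (subclass of).
SELECT ?item WHERE {
  ?item wdt:P31/wdt:P279* wd:Q5 .
}
LIMIT 10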

Using the “interval 3” logs (2017-08-07–2017-09-03 representing ~78M successful queries1), I found that ~25% of queries used property paths. The vast majority of these use just a single property path, but there are queries that use as many as 19 property paths:

Pct. Count Number of Paths
74.3048% 58161337 0 paths used in query
24.7023% 19335490 1 paths used in query
0.6729% 526673 2 paths used in query
0.2787% 218186 4 paths used in query
0.0255% 19965 3 paths used in query
0.0056% 4387 7 paths used in query
0.0037% 2865 8 paths used in query
0.0030% 2327 9 paths used in query
0.0011% 865 6 paths used in query
0.0008% 604 11 paths used in query
0.0006% 434 5 paths used in query
0.0005% 398 10 paths used in query
0.0002% 156 12 paths used in query
0.0001% 110 15 paths used in query
0.0001% 101 19 paths used in query
0.0001% 56 13 paths used in query
0.0000% 12 14 paths used in query

I normalized IRIs and variable names used in the paths so that I could look at just the path operators and the structure of the paths. The type of path operators used skews heavily towards * (ZeroOrMore) as well as sequence and inverse paths that can be rewritten as simple BGPs. Here are the structures representing at least 0.1% of the paths in the dataset:

Pct. Count Path Structure
49.3632% 10573772 ?s <iri1> * ?o .
39.8349% 8532772 ?s <iri1> / <iri2> ?o .
4.6857% 1003694 ?s <iri1> / ( <iri2> * ) ?o .
1.8983% 406616 ?s ( <iri1> + ) / ( <iri2> * ) ?o .
1.4626% 313290 ?s ( <iri1> * ) / <iri2> ?o .
1.1970% 256401 ?s ( ^ <iri1> ) / ( <iri2> * ) ?o .
0.7339% 157212 ?s <iri1> + ?o .
0.1919% 41110 ?s ( <iri1> / ( <iri2> * ) ) / ( ^ <iri3> ) ?o .
0.1658% 35525 ?s <iri1> / <iri2> / <iri3> ?o .
0.1496% 32035 ?s <iri1> / ( <iri1> * ) ?o .
0.1124% 11889 ?s ( <iri1> / <iri2> ) / ( <iri3> * ) ?o .

There are also some rare but interesting uses of property paths in these logs:

Pct. Count Path Structure
0.0499% 5274 ?s ( ( <iri1> / ( <iri2> * ) ) / ( <iri3> / ( <iri2> * ) ) ) / ( <iri4> / ( <iri2> * ) ) ?o .
0.0015% 157 ?s ( <iri1> / <iri2> / <iri3> / <iri4> / <iri5> / <iri6> / <iri7> / <iri8> / <iri9> ) * ?o .
0.0003% 28 ?s ( ( ( ( <iri1> / <iri2> / <iri3> ) ? ) / ( <iri4> ? ) ) / ( <iri5> * ) ) / ( <iri6> / ( <iri7> ? ) ) ?o .

Without further investigation it’s hard to say if these represent meaningful queries or are just someone playing with SPARQL and/or Wikidata, but I found them curious.

  1. These numbers don’t align exactly with the Wikidata query dumps as there were some that I couldn’t parse with my tools.

Posted at 17:06

September 23

Ebiquity research group UMBC: talk: Design and Implementation of an Attribute Based Access Controller using OpenStack Services

Design and Implementation of an Attribute Based Access Controller using OpenStack Services

Sharad Dixit, Graduate Student, UMBC
10:30am Monday, 24 September 2018, ITE346

With the advent of cloud computing, industries began a paradigm shift from the traditional way of computing towards cloud computing, as it fulfils organizations’ present requirements such as on-demand resource allocation, lower capital expenditure, scalability and flexibility, but with that it brought a variety of security and user data breach issues. To solve the issues of user data and security breach, organizations have started to implement hybrid clouds, where the underlying cloud infrastructure is set up by the organization and is accessible from anywhere around the world because of the distinguishable security edges provided by it. However, most cloud platforms provide a Role Based Access Controller, which is not adequate for complex organizational structures. A novel mechanism is proposed that uses OpenStack services and semantic web technologies to develop a module which evaluates a user’s and a project’s multi-varied attributes and runs them against access policy rules defined by an organization before granting access to the user. Hence, an organization can deploy our module to obtain robust and trustworthy access control based on multiple attributes of a user and the project the user has requested in a hybrid cloud platform like OpenStack.

Posted at 19:44

Bob DuCharme: Panic over "superhuman" AI

Robot overlords not on the way.

Posted at 16:27

September 22

Libby Miller: Simulating crap networks on a Raspberry Pi

I’ve been having trouble with libbybot (my Raspberry Pi / lamp based presence robot) in some locations. I suspect this is because the Raspberry Pi 3’s inbuilt wifi antenna isn’t as strong as the one in, say, a laptop, so wifi problems that go unnoticed most of the time are much more obvious.

The symptoms are these:

  • Happily listening / watching remotely
  • Stream dies
  • I get a re-notification that libbybot is online, but can’t connect to it properly

My hypothesis is that the Raspberry Pi is briefly losing wifi connectivity, Chromium auto-reconnects, but the webRTC stream doesn’t re-initiate.

Anyway, the first step to mitigating the problem was to try and emulate it. There were a couple of ways I could have gone about this. One was use network shaping tools on my laptop to try and emulate the problems by messing with the receiving end. A more realistic way would be to shape the traffic on the Pi itself, as that’s where the problem is occurring.

Searching for network shaping tools – and specifically for dropped packets and network latency – led me to dummynet, the FreeBSD traffic shaper configured via ipfw. However, this is tightly coupled to the kernel and doesn’t seem suitable for the Raspberry Pi.

On the laptop, there is a tool for network traffic shaping on Mac OS – it used to be ipfw, but since 10.10 (details) it’s been an app called network link conditioner, available as part of Mac OS X developer tools.

Before going through the xcode palaver for something that wasn’t really what I wanted, I had one last dig for an easier way, and indeed there is: wondershaper led me to using tc to limit the bandwidth which in turn led to iptables for dropped packets.

But. None of these led to the behaviour that I wanted; in fact libbybot (which uses RTCMulticonnection for webRTC) worked perfectly under most conditions I could simulate. The same when using tc with Netem, which can emulate network-wide delays – all fine.

Finally I twigged that the problem was probably a several-second network outage, and for that you can use iptables again. In this case using it to stop the web page (which runs on port 8443) being accessed from the Pi. Using this I managed to emulate the symptoms I’d been seeing.

Here are a few of the commands I used, for future reference.

The final, useful command: emulate a dropped network on a specific port for 20 seconds using iptables output command:

#!/bin/bash
echo "stopping external to 8443"
iptables -A OUTPUT -p tcp --dport 8443 -j DROP
sleep 20
echo "restarting external to 8443"
iptables -D OUTPUT -p tcp --dport 8443 -j DROP

Other things I tried: drop 30% of (input or output) packets randomly, using iptables’ statistic module

sudo iptables -A INPUT -m statistic --mode random --probability 0.30 -j DROP

sudo iptables -A OUTPUT -m statistic --mode random --probability 0.30 -j DROP

list current iptables rules

iptables -L

clear all (flush)

iptables -F

Delay all packets by 100ms using tc and netem

sudo tc qdisc add dev wlan0 root netem delay 100ms

change that to 2000ms

sudo tc qdisc change dev wlan0 root netem delay 2000ms 10ms 25%

All the tc rules go away when you reboot.
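They can also be removed straight away by deleting the qdisc – this should be the counterpart of the add command above:

sudo tc qdisc del dev wlan0 root netem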

Context and links:

tc and netem: openWRT: Netem (Network emulator)

iptables: Using iptables to simulate service interruptions by Matt Parsons, and The Beginner’s guide to iptables, the linux firewall

 

Posted at 12:40

September 17

Dublin Core Metadata Initiative: A Successful DCMI 2018 Conference

The DCMI Annual Conference was held last week, hosted by the Faculty of Engineering of the University of Porto, Portugal. The conference was co-located with TPDL which meant that while many people arrived as part of one community, all left with the experience and appreciation of two! The full conference proceedings are now available, with copies of presentation slides where appropriate. Some photographs of the conference can be found on Flickr, tagged with 'dcmi18'.

Posted at 00:00

September 15

Egon Willighagen: Wikidata Query Service recipe: qualifiers and the Greek alphabet

Just because I need to look this up each time myself, I wrote up this quick recipe for how to get information from statement qualifiers in Wikidata. Let’s say I want to list all Greek letters, with the lower case letter in one column and the upper case letter in the other. This is what our data looks like:


So, let’s start with a simple query that lists all letters in the Greek alphabet:

SELECT ?letter WHERE {
  ?letter wdt:P361 wd:Q8216 .
}

Of course, that only gives me the Wikidata entries, and not the Unicode characters we are after. So, let's add that Unicode character property:

SELECT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
}

Ah, that gets us somewhere:



But you see that the upper and lower case are still in separate rows, rather than columns. To fix that, we need access to those qualifiers. It’s all there in the Wikidata RDF, but the model gives people a headache (so do many things, like math, but that does not mean we should stop doing it!). It all comes down to keeping notebooks, writing down your tricks, etc. It’s called the scientific method (there is more to it than just keeping notebooks, tho).

Qualifiers
So, a lot of important information is put in qualifiers, not just in the statements themselves. Let’s first get all statements for a Greek letter. We would do that with:

?letter ?pprop ?statement .

One thing we want to know about the property we’re looking at is the entity linked to it. We do that by adding this bit:

?property wikibase:claim ?propp .

Of course, the property we are interested in is the Unicode character, so we can put that in directly:

wd:P487 wikibase:claim ?propp .

Next, the qualifiers for the statement. We want them all:

?statement ?qualifier ?qualifierVal .
?qualifierProp wikibase:qualifier ?qualifier .

And because we do not want just any qualifier but specifically ‘applies to part’ (P518), we can put that in too:

?statement ?qualifier ?qualifierVal .
wd:P518 wikibase:qualifier ?qualifier .

Furthermore, we are only interested in lower case and upper case, and we can put that in as well (for upper case):

?statement ?qualifier wd:Q98912 .
wd:P518 wikibase:qualifier ?qualifier .

So, if we want both upper and lower case, we now get this full query:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp .
  ?statement ?qualifier wd:Q8185162 .
  wd:P518 wikibase:qualifier ?qualifier .
}

We are not done yet, because in the above query the Unicode character still comes from the truthy wdt:P487 triple, separately from the statement. This needs to be integrated, and we need wikibase:statementProperty for that:

wd:P487 wikibase:statementProperty ?statementProp .
?statement ?statementProp ?unicode .

If we integrate that, we get this query, which is indeed getting complex:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier wd:Q8185162 ;
             ?statementProp ?unicode .  
  wd:P518 wikibase:qualifier ?qualifier .
}

But basically we have our template here, with three parameters:
  1. the property of the statement (here P487: Unicode character)
  2. the property of the qualifier (here P518: applies to part)
  3. the object value of the qualifier (here Q98912: upper case)
If we use the SPARQL VALUES approach, we get the following template. Notice that I renamed the ?letter and ?unicode variables. But I left the wdt:P361 wd:Q8216 (= ‘part of’ ‘Greek alphabet’) in, so that this query does not time out:

SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 . # 'part of' 'Greek alphabet'
  VALUES ?qualifierObject { wd:Q8185162 }
  VALUES ?qualifierProperty { wd:P518 }
  VALUES ?statementProperty { wd:P487 }

  # template
  ?entityOfInterest ?pprop ?statement .
  ?statementProperty wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier ?qualifierObject ;
             ?statementProp ?statementDataValue .  
  ?qualifierProperty wikibase:qualifier ?qualifier .
}

So, there is our recipe, for everyone to copy/paste.

Completing the Greek alphabet example
OK, now since I actually started with the upper and lower case Unicode character for Greek letters, let's finish that query too. Since we need both, we need to use the template twice:

SELECT DISTINCT ?entityOfInterest ?lowerCase ?upperCase WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .  
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .  
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
}

Still one issue left to fix: some Greek letters have more than one upper case Unicode character, and we need to concatenate those. That requires a GROUP BY and the GROUP_CONCAT function, giving this query:

SELECT DISTINCT ?entityOfInterest
  (GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ") AS ?lowerCases)
  (GROUP_CONCAT(DISTINCT ?upperCase; separator=", ") AS ?upperCases)
WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .  
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .  
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
} GROUP BY ?entityOfInterest

Now, since most of my blog posts are not just fun, but typically also have a use case, allow me to shed light on the context. Since you are still reading, you’re officially part of the secret society of brave followers of my blog. Tweet to my egonwillighagen account a message consisting of a series of letters followed by two numbers (no spaces) and another series of letters, where the two numbers indicate the number of letters at the start and the end, for example, abc32yz or adasgfshjdg111x, and I will add you to my secret list of brave followers (and I will like the tweet; if you disguise the string to suggest it has some meaning, I will also retweet it). Only that string is allowed and don’t tell anyone what it is about, or I will remove you from the list again :) Anyway, my ambition is to make a Wikidata-based BINAS replacement.

So, the only thing still missing is a human readable name. The frequently used SERVICE wikibase:label does a pretty decent job, and we end up with this table:
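For reference, since the label service itself is not shown in the post, here is roughly how it slots into the final query above (standard Wikidata Query Service usage; the ?entityOfInterestLabel variable is filled in automatically by the service):

SELECT DISTINCT ?entityOfInterest ?entityOfInterestLabel
  (GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ") AS ?lowerCases)
  (GROUP_CONCAT(DISTINCT ?upperCase; separator=", ") AS ?upperCases)
WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
                ?statementProp2 ?upperCase .
    wd:P518 wikibase:qualifier ?qualifier2 .
  }

  # The label service fills ?entityOfInterestLabel with the English label.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} GROUP BY ?entityOfInterest ?entityOfInterestLabel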


Posted at 09:12

September 13

AKSW Group - University of Leipzig: AskNow 0.1 Released

Dear all,

we are very happy to announce AskNow 0.1 – the initial release of Question Answering Components and Tools over RDF Knowledge Graphs.

Website: http://asknow.sda.tech/
Demo: http://asknowdemo.sda.tech
GitHub: https://github.com/AskNowQA

The following components with corresponding features are currently supported by AskNow:

  • AskNow UI 0.1: The UI works as a platform for users to pose their questions to the AskNow QA system. It displays the answer according to whether it is an entity, a list of entities, a boolean or a literal. For entities it shows the abstracts from DBpedia.
    Github: https://github.com/AskNowQA/AskNowUI

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, SOLIDE, WDAqua, BigDataEurope.

View this announcement on Twitter: https://twitter.com/AskNowQA/status/1040205350853599233

Kind regards,
The AskNow Development Team
(http://asknow.sda.tech/people/)

Posted at 13:35

Dublin Core Metadata Initiative: Webinar: SKOS - Overview and Modeling of Controlled Vocabularies

This webinar is scheduled for Thursday, October 11, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. SKOS (Simple Knowledge Organization Systems) is the recommendation of the W3C to represent and publish datasets for classifications, thesauri, subject headings, glossaries and other types of controlled vocabularies and knowledge organization systems in general. The first part of the webinar includes an overview of the technologies of the semantic web and shows in detail the different elements of the SKOS model.

Posted at 00:00


September 11

W3C Blog Semantic Web News: JSON-LD Guiding Principles, First Public Working Draft

Coming to consensus is difficult in any working group, and doubly so when the working group spans a broad cross-section of the web community. Everyone brings their own unique set of experiences, skills and desires for the work, but at the end of the process, there can be only one specification.  In order to provide a framework in which to manage the expectations of both participants and other stakeholders, the JSON-LD WG started out by establishing a set of guiding principles.  These principles do not constrain decisions, but provide a set of core aims and established consensus to reference during difficult discussions.  The principles are lights to lead us back out of the darkness of never-ending debate towards a consistent and appropriately scoped set of specifications. A set of specifications that have just been published as First Public Working Drafts.

These principles start with the uncontroversial “Stay on target!”, meaning to stay focused on the overall mission of the group to ensure the ease of creation and consumption of linked data using the JSON format by the widest possible set of developers. We note that the target audience is software developers generally, not necessarily browser-based applications.

To keep the work grounded, we also decided on the important principle of requiring use cases with actual data that have support from at least two organizations (W3C members or otherwise). The use cases are intended to be supporting evidence for the practicality and likely adoption of a proposed feature, not a heavyweight requirements analysis process.

Adoption of specifications is always a concern, and to maximize the likelihood of uptake, we have adopted several principles around simplicity, usability and preferring phased or incremental solutions. To encourage experimentation and to try and reduce the chances of needing a breaking change in the future, we have adopted a principle of defining only what is conforming to the specification, and leaving all other functionality open. Extensions are welcome, they are just not official or interoperable.

Finally, and somewhat controversially, we adopted the principle that new features should be compatible with the RDF Data Model. While there are existing features that cannot be expressed in RDF that are possible in JSON-LD, we do not intend to increase this separation between the specifications and hope to close it as much as possible.

Using these guidelines, the working group has gotten off to a very productive start and came to very quick consensus on whether many of the features suggested to the JSON-LD Community Group were in scope for the work, including approving the much requested lists-of-lists functionality. This will allow JSON arrays to directly include JSON arrays as items in JSON-LD 1.1, enabling a complete semantic mapping for JSON structures such as GeoJSON, and full round-tripping through RDF. The publication of the FPWD documents is a testimony to the efforts of the Working Group, and especially those of Gregg Kellogg as editor.
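As a rough sketch of what the lists-of-lists feature enables (illustrative only; the context below is loosely modelled on GeoJSON-LD and is not taken from the working group's drafts), a JSON-LD 1.1 document can map nested JSON arrays to nested lists via an @list container:

{
  "@context": {
    "geojson": "https://purl.org/geojson/vocab#",
    "coordinates": { "@id": "geojson:coordinates", "@container": "@list" }
  },
  "@type": "geojson:LineString",
  "coordinates": [ [102.0, 0.0], [103.0, 1.0] ]
}

In JSON-LD 1.0 the nested arrays would have been rejected; in 1.1 they expand to a list whose members are themselves lists, which is what makes a complete mapping of structures like GeoJSON coordinates, and their round-tripping through RDF, possible.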

Posted at 22:19

September 08

Egon Willighagen: Also new this week: "Google Dataset Search"

There were a lot of Open Science news items this week. The announcement of the Google Dataset Search was one of them:


 Of course, I first tried searching for "RDF chemistry" which shows some of my data sets (and a lot more):


It picks up data from many sources, such as Figshare in this image. That means it also works (well, sort of, as Noel O'Boyle noticed) for supplementary information from the Journal of Cheminformatics.

It picks up metadata in several ways, among which schema.org annotations. So, next week we’ll see if we can get eNanoMapper extended to spit out compatible JSON-LD for its data sets, called “bundles”.

Integrated with Google Scholar?
While the URL for the search engine does not suggest the service is more than a 20% project, we can hope it will stay around like Google Scholar has. But I do hope they will further integrate it with Scholar. For example, in the above figure it did pick up that I am the author of that data set (well, repurposed from an effort of Rich Apodaca), but it did not figure out that I am also on Scholar.

So, these data sets do not show up in your Google Scholar profile yet, but they must. Time will tell where this data search engine is going. There are many interesting features, and given the amount of online attention, they won't stop development just yet, and I expect to discover more and better features in the next months. Give it a spin!

Posted at 09:13

August 27

Bob DuCharme: Pipelining SPARQL queries in memory with the rdflib Python library

Using retrieved data to make more queries.

Posted at 13:55

August 18

Egon Willighagen: Compound (class) identifiers in Wikidata

Bar chart showing the number of compounds
with a particular chemical identifier.
I think Wikidata is a groundbreaking project, which will have a major impact on science. One of the reasons is the open license (CCZero), the very basic approach (Wikibase), and the superb community around it. For example, setting up your own Wikibase including a cool SPARQL endpoint, is easily done with Docker.

Wikidata has many sub-projects, such as WikiCite, which captures the collective primary literature. Another one is WikiProject Chemistry. The two nicely match up, I think, making a public database linking chemicals to literature (tho very much still needs to be done here); see my recent ICCS 2018 poster (doi:10.6084/m9.figshare.6356027.v1, paper pending).

But Wikidata is also a great resource for identifier mappings between chemical databases, something we need for our metabolism pathway research. The mappings, as you may know, are used in the latter via BridgeDb and we have been using Wikidata as one of three sources for some time now (the others being HMDB and ChEBI). WikiProject Chemistry has a related ChemID effort, and while the wiki page does not show much recent activity, there is actually a lot of ongoing effort (see plot). And I’ve been adding my bits.

Limitations of the links
But not every identifier in Wikidata has the same meaning. While they are all classified as ‘external-id’, the actual link may carry a different meaning. This, of course, is the essence of scientific lenses, see this post and the papers cited therein. One reason here is the difference in what entries in the various databases mean.

Wikidata has an extensive model, defined by the aforementioned WikiProject Chemistry. For example, it has different concepts for chemical compounds (in fact, the hierarchy is pretty rich) and compound classes, and these are modeled differently. Furthermore, it has a model that formalizes that things with a different InChI are different, but even allows things with the same InChI to be different, if the need arises. It tries to accurately and precisely capture the certainty and uncertainty of the chemistry. As such, it is a powerful system for handling identifier mappings, because databases are not always clear about what their entries mean, and chemical and biological data even less so: we experimentally measure a characterization of chemicals, but what we put in databases and give names are specific models (often chemical graphs).

That model differs from what other (chemical) databases use, or seem to use, because databases do not always indicate what they actually have in a record. But I think the following is a fair guess.

ChEBI
ChEBI (and the matching ChEBI ID) has entries for chemical classes (e.g. fatty acid) and specific compounds (e.g. acetate).

PubChem, ChemSpider, UniChem
These three resources use the InChI as their central asset. While they do not really have the concept of compound classes so much (though increasingly they have classifications), they do have entries where stereochemistry is undefined or unknown. Each has its own way of linking to other databases, which normally includes tons of structure normalization (see e.g. doi:10.1186/s13321-018-0293-8 and doi:10.1186/s13321-015-0072-8)

HMDB
HMDB (and the matching P2057) has a biological perspective; the entries reflect the biology of a chemical. Therefore, for most compounds, they focus on the neutral forms. This makes linking to/from other databases, where the compound is chemically not neutral, less precise.

CAS registry numbers
CAS (and the matching P231) is pretty unique itself: it has identifiers for substances (see Q79529), much more than for chemical compounds, and comes with its own set of unique features. For example, solutions of some compound, by design, have the same identifier. Previously, formaldehyde and formalin had different Wikipedia/Wikidata pages, both with the same CAS registry number.

Limitations of the links #2
Now, returning to our starting point: limitations in linking databases. If we want FAIR mappings, we need to be as precise as possible. Of course, that may mean we need more steps, but we can always simplify at will, whereas we can never have a computer make the links more complex (well, not without making assumptions, etc).

And that is why Wikidata is so suitable for linking all these chemical databases: it can distinguish differences when needed, and make that explicit. It makes mappings between the databases more FAIR.


Posted at 12:46

August 07

AKSW Group - University of Leipzig: Jekyll RDF Tutorial Screencast

Since 2016 we have been developing Jekyll-RDF, a plugin for the famous Jekyll static website generator. With Jekyll-RDF we took the slogan of Jekyll, “Transform your plain text into static websites and blogs”, and transformed it into “Transform your RDF Knowledge Graph into static websites and blogs”. This enables people without deep programming knowledge to publish data that is encoded in complicated RDF structures on the web, in an easy to browse format.

To ease your start with Jekyll-RDF I’ve created a Tutorial Screencast that teaches you all the basics necessary to create a simple Jekyll page from an RDF knowledgebase. I hope that you enjoy it and that it is helpful for you!

crosspost: https://natanael.arndt.xyz/2018/08/07/jekyll-rdf-tutorial-screencast

Posted at 09:11

August 04

Egon Willighagen: WikiPathways Summit 2018

I was not there when WikiPathways was founded; I only joined in 2012, and I found my role in the area of metabolic pathways of this Open knowledge base (CC0, to be precise) of biological processes. This autumn, a WikiPathways Summit 2018 is organized in San Francisco to celebrate the 10th anniversary of the project, and everyone interested is kindly invited to join for three days of learning about WikiPathways, integrations and use cases, data curation, and hacking on this great Open Science project.


Things that I would love to talk about (besides making metabolic pathways FAIR and Openly available) are the integrations with other platforms (Reactome, RaMP, MetaboLights, Pathway Commons, PubChem, Open PHACTS (using the RDF), etc, etc), Wikidata interoperability, and future interoperability with platforms like AOPWiki, Open Targets, BRENDA, Europe PMC, etc, etc, etc.

Posted at 06:54

August 01

Libby Miller: Neue podcast in a box, part 1

Ages ago I wrote a post on how to create a physical podcast player (“podcast in a box”) using Radiodan. Since then, we’ve completely rewritten the software, so those instructions can be much improved and simplified. Here’s a revised technique, which will get you as far as reading an RFID card. I might write a part 2, depending on how much time I have.

You’ll need:

  • A Pi 3B or 3B+
  • An 8GB or larger class 10 microSD card
  • A cheapo USB soundcard (e.g.)
  • A speaker with a 3.5mm jack
  • A power supply for the Pi
  • An MFRC522 RFID reader
  • A laptop and microSD card reader / writer

The idea of Radiodan is that as much as possible happens inside web pages. A server runs on the Pi. One webpage is opened headlessly on the Pi itself (internal.html) – this page will play the audio; another can be opened on another machine to act as a remote control (external.html).

They are connected using websockets, so each can access the same messages – the RFID service talks to the underlying peripheral on the Pi, making the data from the reader available.

Here’s what you need to do:

1. Set up the Pi as per these instructions (“setting up your Pi”)

You need to burn a microSD card with the latest Raspbian with Desktop to act as the Pi’s brain, and the easiest way to do this is with Etcher. Once that’s done, the easiest way to do the rest of the install is over ssh, and the quickest way to get that in place is to edit two files while the card is still in your laptop (I’m assuming a Mac):

Enable ssh by typing:

touch /Volumes/boot/ssh

Add your wifi network to boot by adding a file called

/Volumes/boot/wpa_supplicant.conf

contents: (replace AP_NAME and AP_PASSWORD with your wifi details)

country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
  ssid="AP_NAME"
  psk="AP_PASSWORD"
  key_mgmt=WPA-PSK
}

Then eject the card, put the card in the Pi, attach all the peripherals except for the RFID reader and switch it on. While on the same wifi network, you should be able to ssh to it like this:

ssh pi@raspberrypi.local

password: raspberry.

Then install the Radiodan software using the provisioning script like this:

curl https://raw.githubusercontent.com/andrewn/neue-radio/master/deployment/provision | sudo bash

2. Enable SPI on the Pi

Don’t reboot yet; type:

sudo raspi-config

Under interfaces, enable SPI, then shut the Pi down

sudo halt

and unplug it.

3. Test Radiodan and configure it

If all is well and you have connected a speaker via a USB soundcard, you should hear it say “hello” as it boots.

Please note: Radiodan does not work with the default 3.5mm jack on the Pi. We’re not sure yet why. But USB soundcards are very cheap, and work well.

There’s one app available by default for Radiodan on the Pi. To use it,

  1. Navigate to http://raspberrypi.local/radio
  2. Use the buttons to play different audio clips. If you can hear things, then it’s all working

 

radiodan_screenshot1

Shut the Pi down and unplug it from the mains.

4. Connect up the RFID reader to the Pi

like this

Then start the Pi up again by plugging it in.

5. Add the piab app

Dan has made a very fancy mechanism for using Samba to drag and drop apps to the Pi, so that you can develop on your laptop. However, because we’re using RFID (which only works on the Pi), we may as well do everything on there. So, ssh to it again:

ssh pi@raspberrypi.local
cd /opt/radiodan/rde/apps/
git clone http://github.com/libbymiller/piab

This is currently a very minimal app, which just allows you to see all websocket messages going by, and doesn’t do anything else yet.

6. Enable the RFID service and piab app in the Radiodan web interface

Go to http://raspberrypi.local:5020

Enable “piab”, clicking ‘update’ beneath it. Enable the RFID service, clicking ‘update’ beneath it. Restart the manager (red button) and then install dependencies (green button), all within the web page.

radiodan_screenshot2

radiodan_screenshot4

Reboot the Pi (e.g. ssh in and sudo reboot). This will enable the RFID service.

7. Test the RFID reader

Open http://raspberrypi.local:5000/piab and open developer tools for that page. Place a card on the RFID reader. You should see a json message in the console with the RFID identifier.

radiodan_screenshot5

The rest is a matter of writing javascript / html code to:

  • Associate a podcast feed with an RFID (e.g. a web form in external.html that allows the user to add a podcast feed url)
  • Parse the podcast feed when the appropriate card id is detected by the reader
  • Find the latest episode and play it using internal.html (see the radio app example for how to play audio)
  • Add more fancy options, such as remembering where you were in an episode, stopping when the card is removed etc.

As you develop, you can see the internal page on http://raspberrypi.local:5001 and the external page on http://raspberrypi.local:5000. You can reload the app using the blue button on http://raspberrypi.local:5020.

Many more details about the architecture of Radiodan are available; full installation instructions and instructions for running it on your laptop are here; docs are here; code is in github.

Posted at 09:11

July 22

Bob DuCharme: Dividing and conquering SPARQL endpoint retrieval

With the VALUES keyword.

Posted at 16:52

July 20

AKSW Group - University of Leipzig: DBpedia Day @ SEMANTiCS 2018

Don’t miss the 12th edition of the DBpedia Community Meeting in Vienna, the city with the highest quality of life in the world. The DBpedia Community will get together for the DBpedia Day on September 10, the first day of the SEMANTiCS Conference which will be held from September 10 to 13, 2018.


What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting! Please submit your presentations, posters, demos or other forms of contributions through our web form.


Highlights/Sessions

  • Keynote#1: Dealing with Open Domain Data by Javier David Fernández García (WU)
  • Keynote #2: Linked Open Data cloud – act now before it’s too late by Mathieu d’Aquin (NUI Galway)
  • DBpedia Showcase Session
  • DBpedia Association Hour
  • Special Chapter Session with DBpedia language chapters from different parts of Europe

Tickets

  • Attending the DBpedia Community Meeting costs €50 (excl. registration fee and VAT). DBpedia members get free admission. Please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.
  • Please check all details here!

We are looking forward to meeting you in Vienna!

 

Posted at 12:37

July 19

Dublin Core Metadata Initiative: DCMI 2018 Conference Programme Published

The conference programme for DCMI 2018 has now been published. With an exciting mixture of 3 keynotes, papers, presentations, workshops and working meetings, this year's conference promises to continue the excellent tradition of DCMI annual events (and all this in the wonderful location of Porto, Portugal!). Register now!

Posted at 00:00

July 05

Dublin Core Metadata Initiative: Webinar: The Current State of Automated Content Tagging: Dangers and Opportunities

This webinar is scheduled for Thursday, July 19, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. There are real opportunities to use technology to automate content tagging, and there are real dangers that automated content tagging will sometimes inappropriately promote and obscure content. We’ve all heard talks about AI, but little detail about how these applications actually work. Recently I’ve been working with clients to explore the current state of the art of so-called AI technology, and to trial several of these tools with research, policy and general news content.

Posted at 00:00

July 04

Dublin Core Metadata Initiative: Webinar: The Role of Dublin Core Metadata in the Expanding Digital and Analytical Skill Set Required by Data-Driven Organizations

This webinar is scheduled for Thursday, July 12, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Many areas of our world are being subjected to digitalisation as leaders and policymakers embrace the possibilities that can be harnessed through the capturing and exploiting of data. New business models are being developed, and new revenue streams are being uncovered that require a solid and recognised data competence capacity.

Posted at 00:00

June 26

AKSW Group - University of Leipzig: SANSA 0.4 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quad format
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)

Noteworthy changes or updates since the previous release are:

  • Parser performance has been improved significantly e.g. DBpedia 2016-10 can be loaded in <100 seconds on a 7 node cluster
  • Support for a wider range of data partitioning strategies
  • A better unified API across data representations (RDD, DataFrame, DataSet, Graph) for triple operations
  • Improved unit test coverage
  • Improved distributed statistics calculation (see ISWC paper)
  • Initial scalability tests on 6 billion triple Ethereum blockchain data on a 100 node cluster
  • New SPARQL-to-GraphX rewriter aiming at providing better performance for queries exploiting graph locality
  • Numeric outlier detection tested on DBpedia (en)
  • Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

 

Posted at 16:33

June 17

Bob DuCharme: Running and querying my own Wikibase instance

Querying it, of course, with SPARQL.

Posted at 16:17

June 12

Dublin Core Metadata Initiative: Registration for DCMI 2018 now open

We are pleased to announce that registration is now open for the annual Dublin Core conference to be held in Porto, Portugal, 10-13th September 2018. This year's conference is co-located with TPDL 2018. By registering once for the DCMI conference you will automatically have free access to the TPDL conference programme as well! Click here for more details about DCMI 2018, and how to register for the conference. We look forward to seeing you in Porto!

Posted at 10:03

June 04

Ebiquity research group UMBC: paper: Attribute Based Encryption for Secure Access to Cloud Based EHR Systems

Attribute Based Encryption for Secure Access to Cloud Based EHR Systems

Maithilee Joshi, Karuna Joshi and Tim Finin, Attribute Based Encryption for Secure Access to Cloud Based EHR Systems, IEEE International Conference on Cloud Computing, San Francisco CA, July 2018

 

Medical organizations find it challenging to adopt cloud-based electronic medical records services, due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient centric approach for EHR management where the responsibility of authorizing data access is handled at the patients’ end. This however creates a significant overhead for the patient who has to authorize every access of their health record. This is not practical given the multiple personnel involved in providing care and that at times the patient may not be in a state to provide this authorization. Hence there is a need of developing a proper authorization delegation mechanism for safe, secure and easy cloud-based EHR management. We have developed a novel, centralized, attribute based authorization mechanism that uses Attribute Based Encryption (ABE) and allows for delegated secure access of patient records. This mechanism transfers the service management overhead from the patient to the medical organization and allows easy delegation of cloud-based EHR’s access authority to the medical providers. In this paper, we describe this novel ABE approach as well as the prototype system that we have created to illustrate it.

Posted at 15:28

Dublin Core Metadata Initiative: Webinar: Introduction to Metadata Application Profiles

This webinar is scheduled for Thursday, June 14, 2018, 14:00 UTC (convert this time to your local timezone here) and is free for DCMI members. Successful data sharing requires that users of your data understand the data format, the data semantics, and the rules that govern your particular use of terms and values. Sharing often means the creation of “cross-walks” that transfer data from one schema to another using some or all of this information.

Posted at 00:00

Copyright of the postings is owned by the original blog authors. Contact us.