Planet RDF

It's triples all the way down

May 20

Peter Shaw: Visualising RDF with Incontext

Surfing around the internet I recently discovered SURF’s InContext Visualiser, which I think is a neat way to visualise RDF relationships, especially OAI-ORE aggregated publications.

I also discovered that people have already created a set of WordPress plugins (see: http://ep-books.ehumanities.nl/ ) to visualise books and other similar publications. However blogs do not fit into a book/chapter model.

However, there is already a schema for publishing blog data, and my lh-rdf plugin already exposes most publicly available WordPress blog data as RDF using that format, so it was an obvious next step to get the visualiser working with the LH RDF output. I have done so, and hopefully you think the output is cool.

http://shawfactor.com/wp-content/plugins/lh-rdf/visualisation.php

I have bundled this visualiser with the lh-rdf plugin, and in time I will polish it up and add shortcode support so it can be more easily embedded in posts and pages.

Posted at 15:16

May 19

Norm Walsh: Numbered program listings

Putting line numbers in program listings is harder than it looks.

Posted at 20:38

Ebiquity research group UMBC: Google releases dataset linking strings and concepts

Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.

“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.

The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”

The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
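
To make the "weights that measure degrees of association" concrete, here is a minimal sketch of turning (text, url, count) triples into per-string relative frequencies. The rows and counts are invented for illustration and are not taken from the released data; the function name is also hypothetical.

```python
from collections import defaultdict

# Hypothetical (text, url, count) rows in the shape the announcement describes.
# The strings and counts are invented for illustration only.
rows = [
    ("mercury", "http://en.wikipedia.org/wiki/Mercury_(planet)", 60),
    ("mercury", "http://en.wikipedia.org/wiki/Mercury_(element)", 40),
    ("big apple", "http://en.wikipedia.org/wiki/New_York_City", 25),
]

def association_weights(rows):
    """Turn raw counts into per-string relative frequencies, i.e. an estimate of P(url | text)."""
    totals = defaultdict(int)
    for text, _url, count in rows:
        totals[text] += count
    weights = defaultdict(dict)
    for text, url, count in rows:
        weights[text][url] = count / totals[text]
    return weights

weights = association_weights(rows)
# The most strongly associated concept for the string "mercury".
best_for_mercury = max(weights["mercury"], key=weights["mercury"].get)
```

Normalising per string is only one possible reading of "degree of association"; the paper may weight things differently.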

Posted at 16:02

Ebiquity research group UMBC: Google Knowledge Graph: first impressions

Google’s Knowledge Graph showed up for me this morning; it’s been slowly rolling out since the announcement on Wednesday. It builds on lots of research from human language technology (e.g., entity recognition and linking) and the semantic web (graphs of linked data). The slogan, “things not strings”, is brilliant and easily understood.

My first impression is that it’s fast, useful and a great accomplishment but leaves lots of room for improvement and expansion. That last bit is a good thing, at least for those of us in the R&D community. Here are some comments based on some initial experimentation.

GKG only works on searches that are simple entity mentions like people, places, organizations. It doesn’t do products (Toyota Camry), events (World War II), or diseases (diabetes) but does recognize that ‘Mercury’ could be a planet or an element.

It’s a bit aggressive about linking: when searching for “John Smith” it zeros in on the 17th century English explorer. Poor Professor Michael Jordan never gets a chance, and providing context by adding Berkeley just suppresses the GKG sidebar. “Mitt” goes right to you know who. “George Bush” does lead to a disambiguation sidebar, though. Given that GKG doesn’t seem to allow for context information, the only disambiguating evidence it has is popularity (i.e., pagerank).

Speaking of context, the GKG results seem not to draw on user-specific information, like my location or past search history. When I search for “Columbia” from my location here in Maryland, it suggests “Columbia University” and “Columbia, South Carolina” and not “Columbia, Maryland” which is just five miles away from me.

Places include not just GPEs (geo-political entities) but also locations (Mars, Patapsco river) and facilities (MOMA, empire state building). To the GKG, the White House is just a place.

Organizations seem like a weak spot. It recognizes schools (UCLA) but company mentions seem not to be directly handled, not even for “Google”. A search for “NBA” suggests three “people associated with NBA” and “National Basketball Association” is not recognized. Forget finding out about the Cult of the Dead Cow.

Mike Bergman has some insights based on his exploration of the GKG in Deconstructing the Google Knowledge Graph

The use of structured and semi-structured knowledge in search is an exciting area. I expect we will see much more of this showing up in search engines, including Bing.

Posted at 14:43

Semantic Web Company (Austria): Has Google hi-jacked the Semantic Web?

Just recently Google has launched the ‘Knowledge Graph‘ (GKG) which “understands real-world entities and their relationships to one another: things, not strings.” Has Google hi-jacked the idea of the ‘Semantic Web’ or at least its vocabulary?

Sean Golliher has compared the most central concepts of the SemWeb community to the wording of Google in his blog post, for instance: Google doesn´t talk about ‘Linked data’ or ‘URIs’ but rather about ‘things and their relationships’. We don´t know if Google uses standards like RDF but obviously a lot of concepts and ideas developed by the SemWeb community in recent years were implemented in GKG. Some people complain that Google should clearly state that this is an implementation of the ‘Semantic Web’ (which was not invented by Google), others say that most concepts like ‘taxonomies’ have been around for hundreds of years anyway.

I believe that both sides now have a great chance to work together: Whether Google’s goal, to “build the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do”, can be reached or not is a matter of the intelligence of the employees. A lot of potential can be found within the semantic web community: If Google gives credit where it is due, semantic web people will be a bit more inspired to support an eco-system built around GKG – and it won´t be long until an ‘Open Knowledge Graph’ fits together with Google´s revenue model.

Posted at 08:51

May 18

Leigh Dodds:

Posted at 15:33

May 17

Ebiquity research group UMBC: Google Knowledge Graph: things, not strings

Google announced its “knowledge graph” today and describes it as “an intelligent model—in geek-speak, a ‘graph’ — that understands real-world entities and their relationships to one another: things, not strings. … It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.” Information from the knowledge graph will initially augment search results — the feature is already being rolled out to US English users. A short video explains more.

A CNET article quotes KG project manager Jack Menzel: Menzel pitches Knowledge Graph without using the word “semantic” even once. While he says, “I dream of the semantic Web,” he takes pains to point out that what Google is announcing today is not what people talk about when they discuss semantic Web concepts. “We do continue to work on how to make search semantic,” he says, “but talking about it brings out the crazy people.” I hope this did not come out the way he intended it to.

Posted at 03:54

May 16

W3C Blog Semantic Web News: Interview: BBC on Publishing and Linked Data

I chatted recently with Olivier Thereaux, Yves Raimond (senior technologist in R&D), and Silver Oliver (data architect) of the BBC about the Web, publishing, and linked data.

Ian: The BBC is prolific and large. How do you view yourselves?

Silver: The BBC is primarily a broadcasting organization. Content is developed or commissioned within different editorial domains (such as News or Music or Sports) then distributed through diverse channels (TV, Radio, web, apps, etc). This fragmentation exists also on the web, with development of individual sites being largely delegated to dedicated teams.

Ian: How do you move beyond silos?

Yves: We have a lot of data that we are now using to draw connections among various BBC TV and radio programs and entities in other domains, like music or nature. We also expose the corresponding data. For example the programmes site exposes data views giving details about all the music tracks played in a given radio programme, and those details link to (and draw from) artist profiles on the BBC's music site… which themselves are also available as data views.

Olivier: We also reuse data that's available on the Web (e.g., from musicbrainz and wikipedia). Because the public is curating the information they can update it more rapidly than we could on our own. In a way, the Web is our Content Management System.

Ian: What are you using to aggregate and expose the data?

Yves: For the programmes and music site we use a relational database internally but then we expose the information in RDF.

Olivier: And we benefit from the ways that people have innovated around the RDF data we expose. When people play with the interfaces and massage the data, we can build on their experience.

Ian: Why not use RDF internally?

Yves: I think the main reason is that the people who originally built these sites were unaware of RDF, or were concerned about using an unfamiliar technology on such a big project. But we use it with other projects.

Ian: How has your use of data affected reporting?

Silver: In the past our editorial efforts have been captured in whole HTML documents. This causes problems for reuse in new data views and across platforms and applications (including IPTV). The key is in working with existing editorial workflows to capture a sub-set of machine readable information. In its simplest form this might be a byline and small number of tags the story is about.

Ian: How do reporters use the data to make connections between stories?

Silver: Connections have always happened, but it didn't scale. Linking between sports and news was a manual process and reliant on a journalist's knowledge of BBC output. But now we have rich data models behind the scenes. These models help the BBC editorial staff represent their understanding of the world and our audience's interests, and let us make connections in a scalable fashion.

Olivier: The data is a substrate that pre-populates a lot of the site, and then journalists can focus on the stories and not re-entering the data bits.

Silver: In sport, for example, we pay for the sport data (fixtures, results and statistics) then we write stories about match reports, and tagging ensures that everything gets linked properly. That's how we built the sites for the 2010 World Cup or the 2012 Olympics.

Ian: Do the reporters add data to the system directly?

Silver: Yes, we ask them to tag the stories they pull together so that we can put those stories into different contexts (or aggregations). We were quite happy to realize the natural curatorial process was already happening, we just needed to give people a way to capture data.

Ian: You mentioned buying and using data from various sources, including commercial ones. Do you make use of data provenance information?

Yves: We need to be very transparent about where our data comes from. Our reporters, partners, official organisations, sometimes our audience too.

Olivier: There is an interesting tension between making use of provenance information and ensuring user privacy. These days people expect to receive personalized content. To achieve that we make use of "attention data": what you watch or like. We have been looking at how to guarantee that we uphold privacy while at the same time asking for the minimal amount of information to tailor the best experience. That's probably less about "Do Not Track" and closer to the spirit of W3C's older P3P technology. On the other hand, we want to know whether information is reliable. This is challenging for user-generated content in particular: who is the user? how much do we trust them?

Ian: Do you think making provenance information available to readers can help digital literacy?

Silver: We had an interesting debate internally whether to include links from health stories to the journals that published the original research. Some felt that readers would not be interested in the links or would find the research complex. Others encouraged the links so that the community could respond to our articles with their own interpretations, including challenging the articles from various angles. This, in turn, would generate more discussion and perspectives from a much larger audience.

Ian: How did it turn out?

Silver: In stories about politics we have begun to include links to relevant legislation. And we are exploring how to extend the linking to pull in data from these sources to weave into BBC story-telling. For example data about committees that commented on bills, which members of parliament commented, and so on. These data models allow us to make more connections among stories, as we discussed earlier.

Ian: This sounds like a linked data project!

Silver: Internally we have wholesale signed up for and understood the value of linked data as a way to manage our organizational complexity. We will draw data from various sources and use RDF to stitch them together. We can make use of the information in ways we could not do before because it was either too costly or unmanageable. Semantic Web technology is now core to our strategy as an enterprise.

Ian: Have you measured cost savings by using Semantic Web technology?

Silver: It's still too early to say. There were costs associated with our initial projects, since we needed to acquire expertise. But we have since been able to roll out highly trafficked BBC content using Semantic Web technology.

Ian: Thank you all so much for your time!

Posted at 08:24

May 11

schema.org: Schema.org markup for external lists


The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail. There are many situations where the use of existing controlled vocabularies, standards and datasets would improve schema.org markup. This is the role of the schema.org "external enumerations" mechanism.

We introduce "external enumerations" with a simple example - countries - and encourage implementors to join the schema.org community in W3C's 'Web Schemas' group where the full details are being discussed.

Each schema.org type (such as Person, PostalAddress) is associated with a set of properties, such as "nationality", "addressCountry". In turn, each property has one or more expected types; in this case, both the "nationality" of a Person, and the "addressCountry" of a PostalAddress expect to have a Country value. Rather than adding large lists of specific countries to schema.org, instead we encourage the use of external lists. We will publish a set of well-known authority lists, linked to the types and properties they are used with. To get started, we take simple Wikipedia links as an example of such an authority. Other more specialist examples (such as IPTC codes) will follow.

Taking our existing Movie example in Microdata, let's add nationality details for one of the actors. To do this, we simply add a link:


<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
<span itemprop="description">Jack Sparrow and Barbossa embark on[...]</span>
<div itemprop="actor" itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Johnny Depp</span>
  <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
</div>
</div>
 
Here we use  'http://en.wikipedia.org/wiki/United_States' to stand for the specific country. Other authorities also publish useful structured data about countries and have stable URLs that could be used. For example, we could use the UN FAO's GeoPolitical Ontology, and their URL for the USA. From a schema.org perspective, we do not take account of any types and properties defined by these external sites, since it is important to support a variety of quite different authority lists, who often have different ways of modeling things. Each external authority essentially supplies a set of URI/URL item identifiers that can be dropped into schema.org markup.
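
As a rough illustration of how a consumer might pick up these external identifiers, the sketch below collects (itemprop, href) pairs from link elements using Python's standard html.parser. It is not a Microdata processor: it ignores itemscope nesting entirely, and the class name and trimmed sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser

# Trimmed copy of the Movie example; only the parts needed for the sketch.
MARKUP = """
<div itemscope itemtype="http://schema.org/Movie">
<div itemprop="actor" itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Johnny Depp</span>
  <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
</div>
</div>
"""

class LinkPropertyCollector(HTMLParser):
    """Collect (itemprop, href) pairs from <link> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> also reach this method.
        if tag != "link":
            return
        attr_map = dict(attrs)
        if "itemprop" in attr_map and "href" in attr_map:
            self.links.append((attr_map["itemprop"], attr_map["href"]))

collector = LinkPropertyCollector()
collector.feed(MARKUP)
external_values = collector.links
```

The href is treated as an opaque identifier, which matches the point above that schema.org does not take account of the types and properties defined by the external site.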

We've shown here the use of Wikipedia links for identifying members of the Country type. Take a look at the detailed document for discussion on how to use this with Microdata's 'itemid' attribute, if you want to describe the Country (or other object) in further detail. The W3C wiki also gives other examples, and shows how the markup would look in RDFa Lite

While there are more details to work out as we start to apply this idea across schema.org, we wanted to share this initial example.  The basic idea is very simple: everywhere in schema.org where external lists will help, we will need to have a specific schema.org type (like Country), for which the external authority supplies identifiers. In some cases, we will have to add new types to support this. Beyond the basics presented here, there are various technical details of syntax, discussion of exactly which authorities and URI identifiers to use, and so on. We welcome suggestions (here or via the Web Schemas group) for existing enumerations that would be useful additions, and feedback on the general approach.

Posted at 20:49

May 10

Dave Beckett: Undugg

Digg just announced that Digg Engineering Team Joins SocialCode and The Washington Post reported SocialCode hires 15 employees from Digg.com

This acquihire does NOT include me. I will be changing jobs shortly but have nothing further to announce at this time.

I wish my former Digg colleagues the best of luck in their new roles. I had a great time at Digg and learned a lot about working in a small company, social news, analytics, public APIs and the technology stack there.

Posted at 15:20

Michael Hausenblas: Turning tabular data into entities

Two widely used data formats on the Web are CSV and JSON. In order to enable fine-grained access in a hypermedia-oriented fashion I’ve started to work on

Posted at 15:03

W3C Semantic Web News: Linked Data Platform Working Group Launched

The W3C launched the new Linked Data Platform (LDP) Working Group to promote the use of linked data on the Web. Per its charter, the group will explain how to use a core set of services and technologies to build powerful applications capable of integrating public data, secured enterprise data, and personal data. The platform will be based on proven Web technologies including HTTP for transport, and RDF and other Semantic Web standards for data integration and reuse. The group will produce supporting materials, such as a description of use cases, a list of requirements, and a test suite and/or validation tools to help ensure interoperability and correct implementation.

Posted at 09:57

May 08

W3C Semantic Web News: Three RDFa Specifications are Proposed Recommendations

The RDF Web Applications Working Group has published three Proposed Recommendations for RDFa Core 1.1, RDFa Lite 1.1 and XHTML+RDFa 1.1.

Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents. RDFa Lite 1.1 provides a simple subset of RDFa for novice web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language. The group also published a draft of the RDFa 1.1 Primer today.

Posted at 17:14

May 07

Dublin Core Metadata Initiative: Presentations Available for "Five Years On" Seminar at the British Library

2012-05-07, Presentations from the successful DCMI-UK regional meeting "Five Years On" at the British Library on 26-27 April 2012 are available at http://dcevents.dublincore.org/index.php/BibData/fyo. Additional resources from the Seminar and the collocated meetings of the DCMI Bibliographic Metadata Task Group and the Vocabulary Management Community will be added to the website over the next several weeks.

Posted at 23:59

Dublin Core Metadata Initiative: Maintenance revision of DCMI Metadata Terms

2012-05-07, The DCMI Usage Board has approved a revision of the usage note for the element Subject. This maintenance release has been undertaken in preparation for the routine five-year review of ANSI/NISO Z39.85-2007 ("The Dublin Core Metadata Element Set").

Posted at 23:59

Semantic Web Company (Austria): PoolParty PowerTagging – bringing semantics to enterprises

PoolParty PowerTagging (PPP) is on its way: By extending Confluence´s label management, new application scenarios which make use of content recommendation and semantic indexing will be supported soon. PPP will be published at this year´s Atlassian Summit and at SemTechBiz in San Francisco at the beginning of June.

The Problem: weak semantics

Tagging is still not a very popular task, especially in corporate environments. Many users don´t see the benefit of creating metadata to describe the actual content. A typical counter-argument to social tagging is that there are too many words for the same thing. “Even if I am tagging very hard my colleagues won´t necessarily find my pages because they will use different words to search for the content. I don´t have enough time to insert ‘New York City’, ‘NYC’, ‘Big Apple’ etc. as labels”.

The result: Tagging facilities of enterprise software platforms like Confluence are rarely used and don´t help to index content at all. Search is mostly based on classical full-text indexing. Semantic search as seen more and more on the WWW has still not entered the enterprise realm.

The Solution: thesaurus based indexing

W3C´s Semantic Web technology stack provides means to define controlled vocabularies like thesauri, and as a result more and more tools and datasets make use of standards like SKOS. Tagging based on thesauri means that concepts are attached to pages and documents rather than putting labels on them. Labels like ‘New York City’, ‘NYC’ and ‘Big Apple’ refer to the same concept, so it should be sufficient to use one of these terms for labeling; all the other names of that concept should be attached automatically.

PoolParty PowerTagging is able to analyse each Confluence page and to insert concepts from a thesaurus and all of their names automatically. Users can curate all suggested tags or they can also index their spaces automatically, resulting in a semantic index which makes search more comfortable than ever before.
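
The idea of attaching a concept and all of its names can be sketched in a few lines. This is a toy illustration only, not how PowerTagging works internally; the thesaurus contents, concept identifiers and function name are all made up for the example.

```python
# Hypothetical thesaurus: concept identifier -> every known label for that concept.
THESAURUS = {
    "concept:new-york-city": ["New York City", "NYC", "Big Apple"],
    "concept:semantic-web": ["Semantic Web", "SemWeb"],
}

# Reverse index from lower-cased label to its concept.
LABEL_TO_CONCEPT = {
    label.lower(): concept
    for concept, labels in THESAURUS.items()
    for label in labels
}

def expand_tag(tag):
    """Return (concept, all labels) for a tag, or (None, [tag]) if the tag is unknown."""
    concept = LABEL_TO_CONCEPT.get(tag.strip().lower())
    if concept is None:
        return None, [tag]
    return concept, THESAURUS[concept]

concept, labels = expand_tag("nyc")
```

A user who types only "nyc" ends up with the concept and its other names attached, which is the behaviour the paragraph above describes.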

Usage: enhanced collaboration with enterprise knowledge models

There are two main application scenarios which can be realised on top of Confluence and its PowerTagging extension:

  • Semantic Search: Fully integrated with Confluence´s built-in Lucene based search facility, users no longer have to type in search phrases literally: Even if only ‘New York City’ is mentioned on a page on a word-by-word basis, it´s sufficient to search for ‘Big Apple’ or ‘NYC’ and results will be generated. This feature is especially interesting for domains in which a lot of technical terms or abbreviations are commonly used or for enterprises in multi-lingual environments.
  • Content recommendation: Identifying similar and semantically matching contents especially in larger Confluence instances is a crucial task: Imagine you´re working for a recruiting company and you would like to match a new open position with all people in your applicant database. Or: Imagine you´re working on technical documentation and you can provide your customers automatically with further readings. Or: Imagine you´re working on a slidedeck and you´ll see instantly if some of your colleagues have worked on similar issues recently.
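
Both scenarios can be approximated with a small concept index. The sketch below is an assumption-laden toy, not the Confluence or Lucene integration: the page titles, concept identifiers, and the choice of Jaccard overlap as the similarity measure are all invented for illustration.

```python
# Hypothetical label index and per-page concept sets.
LABEL_TO_CONCEPT = {
    "new york city": "c:nyc", "nyc": "c:nyc", "big apple": "c:nyc",
    "recruiting": "c:recruiting", "hiring": "c:recruiting",
}
PAGE_CONCEPTS = {
    "Office move": {"c:nyc"},
    "Open positions": {"c:recruiting", "c:nyc"},
    "Interview guide": {"c:recruiting"},
}

def semantic_search(query):
    """Resolve the query to a concept and return the pages tagged with it."""
    concept = LABEL_TO_CONCEPT.get(query.strip().lower())
    if concept is None:
        return []
    return sorted(p for p, concepts in PAGE_CONCEPTS.items() if concept in concepts)

def jaccard(a, b):
    """Share of concepts two pages have in common (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(page):
    """Rank other pages by concept overlap with the given page, best first."""
    source = PAGE_CONCEPTS[page]
    scored = [
        (jaccard(source, concepts), other)
        for other, concepts in PAGE_CONCEPTS.items()
        if other != page
    ]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [other for score, other in ranked if score > 0]

hits = semantic_search("Big Apple")
similar = recommend("Office move")
```

Searching for "Big Apple" finds pages that were only ever tagged via another label, and recommendation falls out of comparing concept sets rather than raw text.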

Don´t re-invent the wheel again and again. Save time and money. PPP will help to fulfill these tasks when creating rich contents more efficiently than ever before. You can link similar contents within Confluence automatically and you can fetch further readings even from the WWW, for example from Wikipedia.

If you are interested in trying out PowerTagging, please drop us a note and we will be happy to support you!

Posted at 16:13

Learn Linked Data: Understanding RDF serialisation formats

A couple of weeks ago, I wrote an Introduction To RDF. In that article, we discussed expressing a single RDF triple (which defined the location of my blog) as a blob and line diagram. But what if we want to describe more than one aspect of a resource?… or if we want to model the relationships between many resources? It probably wouldn’t come as a surprise that to do that, you’d need to create multiple triples. As I mentioned in that previous post, blob and line diagrams only work well for small numbers of triples, and they’re only really good for humans.

To communicate RDF between computer systems, we need something different. In this article, we’ll see how to serialise a graph (i.e. a collection, or set) of RDF triples in different ways, and explore the advantages and disadvantages of each.

In order to avoid me having to make up a contrived example, I’m going to use some data from the OpenDataCommunities site*, which contains a bunch of open data about English Local Authorities. The URI http://opendatacommunities.org/id/metropolitan-district-council/manchester identifies the Manchester Metropolitan District Council. If you click on that link, you’ll get an HTML representation of the data that the site holds about that authority.

But the site also offers the data in other formats via links at the bottom of the page. Let’s discuss each of those formats in turn.

RDF/XML

RDF is most commonly expressed in an XML format: RDF/XML.

A Single Resource as RDF/XML

The RDF/XML for what we know about the Manchester authority looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="http://opendatacommunities.org/def/local-government/" xmlns:ns1="http://data.ordnancesurvey.co.uk/ontology/admingeo/" xmlns:ns2="http://statistics.data.gov.uk/def/administrative-geography/" xmlns:ns3="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns4="http://www.w3.org/2000/01/rdf-schema#" xmlns:ns5="http://www.w3.org/2002/07/owl#" xmlns:ns6="http://xmlns.com/foaf/0.1/">
  <ns0:MetropolitanDistrictCouncil rdf:about="http://opendatacommunities.org/id/metropolitan-district-council/manchester">
    <ns1:gssCode>E08000003</ns1:gssCode>
    <ns1:hasCensusCode>00BN</ns1:hasCensusCode>
    <ns0:billingAuthorityCode>E4203</ns0:billingAuthorityCode>
    <ns0:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000018821"/>
    <ns0:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/157-Manchester-City-Council"/>
    <ns2:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/B"/>
    <ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
    <ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <ns4:label>Manchester</ns4:label>
    <ns5:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00BN"/>
    <ns6:page rdf:resource="http://www.manchester.gov.uk/"/>
  </ns0:MetropolitanDistrictCouncil>
</rdf:RDF>

Multiple Resources as RDF/XML

A couple of local authorities serialised as RDF/XML look like this:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:j.0="http://data.ordnancesurvey.co.uk/ontology/admingeo/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.1="http://xmlns.com/foaf/0.1/"
    xmlns:j.2="http://opendatacommunities.org/def/local-government/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.3="http://statistics.data.gov.uk/def/administrative-geography/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
    <j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000015692"/>
    <j.0:hasCensusCode>42UB</j.0:hasCensusCode>
    <j.2:billingAuthorityCode>E3531</j.2:billingAuthorityCode>
    <rdfs:label>Babergh</rdfs:label>
    <j.1:page rdf:resource="http://www.babergh.gov.uk"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/DistrictCouncil"/>
    <j.0:gssCode>E07000200</j.0:gssCode>
    <owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/42UB"/>
    <j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/G"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham">
    <owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00AB"/>
    <j.0:gssCode>E09000002</j.0:gssCode>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
    <j.2:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LondonBoroughCouncil"/>
    <j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000010949"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <j.0:hasCensusCode>00AB</j.0:hasCensusCode>
    <j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/H"/>
    <j.1:page rdf:resource="http://www.lbbd.gov.uk/"/>
    <j.2:billingAuthorityCode>E5030</j.2:billingAuthorityCode>
    <rdfs:label>Barking and Dagenham</rdfs:label>
  </rdf:Description>
</rdf:RDF>

MIME Type

RDF/XML is typically requested and sent over the Web using a MIME type of application/rdf+xml.
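
As a sketch of what requesting that MIME type looks like from code, the snippet below builds an HTTP request with an Accept header using Python's standard library. It only constructs the request and does not send it. Whether the OpenDataCommunities server honours the Accept header on this URI is an assumption here; the article itself only mentions format links on the HTML page.

```python
from urllib.request import Request

RESOURCE_URI = "http://opendatacommunities.org/id/metropolitan-district-council/manchester"

def rdfxml_request(uri):
    """Build (but do not send) an HTTP request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})

request = rdfxml_request(RESOURCE_URI)
# Passing this object to urllib.request.urlopen() would perform the fetch.
# Whether the server responds with RDF/XML is an assumption, not verified here.
```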

RDF/XML Summary

In RDF/XML, the triples for each resource are contained within <rdf:Description> nodes (or, as in the single-resource example above, a node named after one of the resource's types), with a sub-node for each property and its value (full spec here).

This format has the advantage that most programming languages have support for XML, and you can make use of XML namespaces to avoid having to use full URIs everywhere, which keeps the size down. On the other hand, I don’t find it that easy to read manually.
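
To show what "most programming languages have support for XML" buys you, here is a minimal sketch that pulls triples out of flat RDF/XML with Python's built-in ElementTree. It handles only the simple shape used in the examples above (top-level nodes with rdf:about, property sub-nodes with either text or rdf:resource) and is nowhere near a conformant RDF/XML parser; for real data a proper RDF library is the safer choice. The sample document is a cut-down version of the Babergh data and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
    <rdfs:label>Babergh</rdfs:label>
    <foaf:page rdf:resource="http://www.babergh.gov.uk"/>
  </rdf:Description>
</rdf:RDF>"""

def full_uri(tag):
    """ElementTree names tags as '{namespace}local'; join them into one URI."""
    return tag[1:].replace("}", "", 1)

def simple_triples(rdfxml):
    """Extract (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(rdfxml)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # A node that is not rdf:Description implies an rdf:type triple.
        if node.tag != f"{{{RDF_NS}}}Description":
            triples.append((subject, RDF_NS + "type", full_uri(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = prop.text  # literal value
            triples.append((subject, full_uri(prop.tag), obj))
    return triples

triples = simple_triples(SAMPLE)
```

Note that this sketch does not distinguish literals from URIs in its output, which a real parser would.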

Turtle

Turtle (Terse RDF Triple Language) is an RDF-specific subset of Tim Berners-Lee’s Notation3 language.

A Single Resource as Turtle

The RDF triples about that single Manchester authority could be represented in Turtle like this:

@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://opendatacommunities.org/id/metropolitan-district-council/manchester>
   a gov:MetropolitanDistrictCouncil, gov:LocalAuthority, gov:CivilAdministrativeAuthority;
   rdfs:label "Manchester";
   admingeo:gssCode "E08000003";
   admingeo:hasCensusCode "00BN";
   gov:billingAuthorityCode "E4203";
   gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000018821>;
   gov:openlyLocalUrl <http://openlylocal.com/councils/157-Manchester-City-Council>;
   statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/B>;
   owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00BN>;
   foaf:page <http://www.manchester.gov.uk/> .

Multiple Resources as Turtle

Multiple local authorities could be serialised in Turtle like this:

@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://opendatacommunities.org/id/district-council/babergh>
    a gov:LocalAuthority, gov:DistrictCouncil, gov:CivilAdministrativeAuthority ;
    rdfs:label "Babergh" ;
    admingeo:gssCode "E07000200" ;
    admingeo:hasCensusCode "42UB" ;
    gov:billingAuthorityCode "E3531" ;
    gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000015692> ;
    statsgeo:region  <http://statistics.data.gov.uk/id/government-office-region/G> ;
    owl:sameAs <http://statistics.data.gov.uk/id/local-authority/42UB> ;
    foaf:page <http://www.babergh.gov.uk> .

<http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham>
    a gov:LocalAuthority, gov:LondonBoroughCouncil, gov:CivilAdministrativeAuthority ;
    rdfs:label "Barking and Dagenham" ;
    admingeo:gssCode "E09000002" ;
    admingeo:hasCensusCode "00AB" ;
    gov:billingAuthorityCode "E5030" ;
    gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000010949> ;
    gov:openlyLocalUrl <http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham> ;
    statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/H> ;
    owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00AB> ;
    foaf:page <http://www.lbbd.gov.uk/> .

Mime Type

Turtle is typically requested and sent over the Web using a MIME type of text/turtle.

Turtle Summary

The URI for each resource is followed by the predicates and objects of the triples about it (essentially, as key-value pairs). Each pair is separated by a semi-colon, and the information about a resource is closed off by a dot. Note that the whitespace is not important here, but for readability, the predicates and objects are often indented and appear on separate lines.
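The layout described above (subject first, then predicate-object pairs separated by semi-colons, closed by a dot) can be sketched as a tiny serialiser. This is only an illustration: to_turtle is a hypothetical helper of mine, it expects predicates and objects that are already valid Turtle terms, it does no escaping, and it does not emit the @prefix lines.

```python
def to_turtle(subject, properties):
    """Serialise one resource as a Turtle block.

    `properties` maps a predicate (a prefixed name or <URI>) to a list of
    objects that are already written as Turtle terms.
    """
    # Several objects for the same predicate are separated by commas.
    pairs = [
        f"    {predicate} {', '.join(objects)}"
        for predicate, objects in properties.items()
    ]
    # Pairs are separated by semi-colons; the whole block ends with a dot.
    return f"<{subject}>\n" + " ;\n".join(pairs) + " .\n"


manchester = {
    "a": ["gov:MetropolitanDistrictCouncil", "gov:LocalAuthority"],
    "rdfs:label": ['"Manchester"'],
    "foaf:page": ["<http://www.manchester.gov.uk/>"],
}
print(to_turtle(
    "http://opendatacommunities.org/id/metropolitan-district-council/manchester",
    manchester,
))
```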

Turtle is my favourite RDF serialisation format: It’s fairly easy to read for humans due to the lack of punctuation noise, it groups together the triples about each resource, and it allows you to define common @prefixes. Its terseness also helps keep the amount of bandwidth required to communicate RDF down to a minimum, and it’s well supported by RDF toolkits and libraries.

Note that the standard encoding of Turtle is UTF-8 (though escaped Unicode is allowed in Turtle as well). Turtle literals can contain line breaks, using the """long literal""" approach. For more information, see the Turtle W3C submission.

N-Triples

N-Triples is a simplified version of Turtle.

A Single Resource as N-Triples

Here’s the RDF for the Manchester authority again, this time as N-triples.

<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2000/01/rdf-schema#label> "Manchester" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/157-Manchester-City-Council> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://xmlns.com/foaf/0.1/page> <http://www.manchester.gov.uk/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/00BN> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "00BN" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E4203" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000018821> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E08000003" .

Multiple Resources as N-Triples

<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0931" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UB> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/38-Allerdale-Borough-Council> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000013065> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://xmlns.com/foaf/0.1/page> <http://www.allerdale.gov.uk> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000026" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UB" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000012941> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0932" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UC> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://xmlns.com/foaf/0.1/page> <http://www.barrowbc.gov.uk/> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000027" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UC" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2000/01/rdf-schema#label> "Barrow-in-Furness" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .

Mime Type

N-triples are typically requested and sent over the Web using a MIME type of text/plain or application/n-triples.

N-Triples summary

With N-triples, each triple appears on its own line, terminated by a dot.

N-triples’ simplicity makes it easy for software to parse and generate, but it lacks some of the features of RDF/XML and Turtle (such as support for nested resources). Due to the repetition of the resource URIs, it’s not as compact as Turtle, and the triples for each resource aren’t necessarily grouped together which makes it harder to read by eye.
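As a rough illustration of how little is needed to parse this format, here is a sketch of a line-based parser built on a single regular expression. The names TRIPLE and parse_ntriples are mine, and the pattern only covers what appears in the listings above: URI subjects and predicates, and objects that are either URIs or plain literals. Blank nodes, language tags and datatypes are deliberately left out, and escape sequences inside literals are not decoded.

```python
import re

# subject URI, predicate URI, then either an object URI or a plain literal.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, (kind, value)) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        subject, predicate, uri, literal = match.groups()
        obj = ("uri", uri) if uri is not None else ("literal", literal)
        triples.append((subject, predicate, obj))
    return triples


sample = """
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2000/01/rdf-schema#label> "Manchester" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://xmlns.com/foaf/0.1/page> <http://www.manchester.gov.uk/> .
"""
for triple in parse_ntriples(sample):
    print(triple)
```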

It’s also worth mentioning here that, as N-triples is a text/plain format, literals are only allowed to contain US-ASCII characters: non-ASCII Unicode characters need to be escaped (in contrast to Turtle, whose standard encoding is UTF-8). Also, in N-triples, line-breaks always need to be escaped.
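The escaping rule can be sketched like this. The function name escape_ntriples_literal is hypothetical, and the escape forms used (\n, \r, \t, \", \\, \uXXXX and \UXXXXXXXX) are the ones I understand the ASCII-only N-Triples definition to use, so check them against the spec before relying on this.

```python
def escape_ntriples_literal(value):
    """Return `value` as a quoted, ASCII-only N-Triples literal."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")  # line breaks are never written literally
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= code <= 0x7E:
            out.append(ch)  # printable ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")
        else:
            out.append(f"\\U{code:08X}")
    return '"' + "".join(out) + '"'


print(escape_ntriples_literal("Zürich\nSwitzerland"))
```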

This section of the W3C Turtle submission gives a useful comparison between Turtle and N-Triples. You can also find more details about N-Triples here.

JSON

RDF can also be expressed in JavaScript Object Notation (JSON).

A Single Resource as JSON

{"http://opendatacommunities.org/id/metropolitan-district-council/manchester":{"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Manchester"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/157-Manchester-City-Council"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.manchester.gov.uk/"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/00BN"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"00BN"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E4203"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000018821"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E08000003"}]}}

Multiple Resources as JSON

{"http://opendatacommunities.org/id/district-council/allerdale":{"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0931"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UB"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/38-Allerdale-Borough-Council"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000013065"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Allerdale"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.allerdale.gov.uk"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000026"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UB"}]},"http://opendatacommunities.org/id/district-council/barrow-in-furness":{"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000012941"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0932"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UC"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.barrowbc.gov.uk/"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000027"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UC"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Barrow-in-Furness"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}]}}

Mime Type

JSON is typically requested and sent over the Web using a MIME type of application/json.

JSON Summary

This format serialises the triples as nested JavaScript objects. The top-level object has one key per resource, namely its URI. The value for each resource is an object with one key per predicate, and the value for each predicate is an array holding the objects of the triples with that predicate. Each entry in that array is itself a JavaScript object with properties for its type (i.e. URI or literal) and its actual value.
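Because the structure is just nested objects and arrays, turning it back into triples takes only a few lines. Here is a sketch using Python's standard json module. The function name triples_from_rdf_json is my own, and the sample is a trimmed-down version of the Manchester listing rather than the full data.

```python
import json


def triples_from_rdf_json(text):
    """Yield (subject, predicate, object_type, object_value) tuples."""
    data = json.loads(text)
    for subject, predicates in data.items():      # one key per resource
        for predicate, objects in predicates.items():  # one key per predicate
            for obj in objects:                   # one entry per triple
                yield subject, predicate, obj["type"], obj["value"]


sample = """
{"http://opendatacommunities.org/id/metropolitan-district-council/manchester": {
  "http://www.w3.org/2000/01/rdf-schema#label":
    [{"type": "literal", "value": "Manchester"}],
  "http://xmlns.com/foaf/0.1/page":
    [{"type": "uri", "value": "http://www.manchester.gov.uk/"}]
}}
"""
for triple in triples_from_rdf_json(sample):
    print(triple)
```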

It’s not an official standard (the example above is an output from the rdf.rb ruby library, which is based on Talis’s RDF JSON spec), but many Linked Data sites are now supporting JSON as a serialisation format. It’s becoming popular due to the ease with which this kind of data can be consumed in JavaScript web applications. It’s not as easy to read by eye as Turtle (to me, at least), but it’s still not too bad in that respect. The limited punctuation also helps to keep the size down.

Conclusion

In this article, we’ve covered the most common serialisation formats for RDF triples. Hopefully it has helped you to understand the benefits and pitfalls of each, so that you can make an informed decision about which to choose for publishing or consuming your Linked Data.

If you’d be interested in more articles like this, please subscribe to our RSS feed, Twitter updates, or email alerts. We’ll try to keep the articles coming fairly regularly.

*In the interest of full disclosure: I should probably mention that my company worked on the OpenDataCommunities site.

Posted at 13:12

Learn Linked Data: Understanding RDF serialisation formats

A couple of weeks ago, I wrote an Introduction To RDF. In that article, we discussed expressing a single RDF triple (which defined the location of my blog) as a blob and line diagram. But what if we want to describe more than one aspect of a resource, or model the relationships between many resources? It probably won’t come as a surprise that to do that, you’d need to create multiple triples. As I mentioned in that previous post, blob and line diagrams only work well for small numbers of triples, and they’re only really good for humans.

To communicate RDF between computer systems, we need something different. In this article, we’ll see how to serialise a graph (i.e. a collection, or set) of RDF triples in different ways, and explore the advantages and disadvantages of each.

In order to avoid me having to make up a contrived example, I’m going to use some data from the OpenDataCommunities site*, which contains a bunch of open data about English Local Authorities. The URI http://opendatacommunities.org/id/metropolitan-district-council/manchester identifies the Manchester Metropolitan District Council. If you click on that link, you’ll get an HTML representation of the data that the site holds about that authority.

But the site also offers the data in other formats via links at the bottom of the page. Let’s discuss each of those formats in turn.

RDF/XML

RDF is most commonly expressed in an XML format: RDF/XML.

A Single Resource as RDF/XML

The RDF/XML for what we know about the Manchester authority looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="http://opendatacommunities.org/def/local-government/" xmlns:ns1="http://data.ordnancesurvey.co.uk/ontology/admingeo/" xmlns:ns2="http://statistics.data.gov.uk/def/administrative-geography/" xmlns:ns3="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns4="http://www.w3.org/2000/01/rdf-schema#" xmlns:ns5="http://www.w3.org/2002/07/owl#" xmlns:ns6="http://xmlns.com/foaf/0.1/">
  <ns0:MetropolitanDistrictCouncil rdf:about="http://opendatacommunities.org/id/metropolitan-district-council/manchester">
    <ns1:gssCode>E08000003</ns1:gssCode>
    <ns1:hasCensusCode>00BN</ns1:hasCensusCode>
    <ns0:billingAuthorityCode>E4203</ns0:billingAuthorityCode>
    <ns0:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000018821"/>
    <ns0:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/157-Manchester-City-Council"/>
    <ns2:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/B"/>
    <ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
    <ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <ns4:label>Manchester</ns4:label>
    <ns5:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00BN"/>
    <ns6:page rdf:resource="http://www.manchester.gov.uk/"/>
  </ns0:MetropolitanDistrictCouncil>
</rdf:RDF>

Multiple Resources as RDF/XML

A couple of local authorities serialised as RDF/XML look like this:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:j.0="http://data.ordnancesurvey.co.uk/ontology/admingeo/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.1="http://xmlns.com/foaf/0.1/"
    xmlns:j.2="http://opendatacommunities.org/def/local-government/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.3="http://statistics.data.gov.uk/def/administrative-geography/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
    <j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000015692"/>
    <j.0:hasCensusCode>42UB</j.0:hasCensusCode>
    <j.2:billingAuthorityCode>E3531</j.2:billingAuthorityCode>
    <rdfs:label>Babergh</rdfs:label>
    <j.1:page rdf:resource="http://www.babergh.gov.uk"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/DistrictCouncil"/>
    <j.0:gssCode>E07000200</j.0:gssCode>
    <owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/42UB"/>
    <j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/G"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham">
    <owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00AB"/>
    <j.0:gssCode>E09000002</j.0:gssCode>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
    <j.2:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LondonBoroughCouncil"/>
    <j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000010949"/>
    <rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
    <j.0:hasCensusCode>00AB</j.0:hasCensusCode>
    <j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/H"/>
    <j.1:page rdf:resource="http://www.lbbd.gov.uk/"/>
    <j.2:billingAuthorityCode>E5030</j.2:billingAuthorityCode>
    <rdfs:label>Barking and Dagenham</rdfs:label>
  </rdf:Description>
</rdf:RDF>

Mime Type

RDF/XML is typically requested and sent over the Web using a MIME type of application/rdf+xml.

RDF/XML Summary

In RDF/XML, the triples for each resource are contained within <rdf:Description> nodes, with a sub-node for each property and its value (full spec here).

This format has the advantage that most programming languages have support for XML, and you can make use of XML namespaces to avoid having to use full URIs everywhere, which keeps the size down. On the other hand, I don’t find it that easy to read manually.

Turtle

Turtle (Terse RDF Triple Language) is an RDF-specific subset of Tim Berners-Lee’s Notation3 language.

A Single Resource as Turtle

The RDF triples about that single Manchester authority could be represented in Turtle like this:

@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://opendatacommunities.org/id/metropolitan-district-council/manchester>
   a gov:MetropolitanDistrictCouncil, gov:LocalAuthority, gov:CivilAdministrativeAuthority;
   rdfs:label "Manchester";
   admingeo:gssCode "E08000003";
   admingeo:hasCensusCode "00BN";
   gov:billingAuthorityCode "E4203";
   gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000018821>;
   gov:openlyLocalUrl <http://openlylocal.com/councils/157-Manchester-City-Council>;
   statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/B>;
   owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00BN>;
   foaf:page <http://www.manchester.gov.uk/> .

Multiple Resources as Turtle

Multiple local authorities could be serialised in Turtle like this:

@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://opendatacommunities.org/id/district-council/babergh>
    a gov:LocalAuthority, gov:DistrictCouncil, gov:CivilAdministrativeAuthority ;
    rdfs:label "Babergh" ;
    admingeo:gssCode "E07000200" ;
    admingeo:hasCensusCode "42UB" ;
    gov:billingAuthorityCode "E3531" ;
    gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000015692> ;
    statsgeo:region  <http://statistics.data.gov.uk/id/government-office-region/G> ;
    owl:sameAs <http://statistics.data.gov.uk/id/local-authority/42UB> ;
    foaf:page <http://www.babergh.gov.uk> .

<http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham>
    a gov:LocalAuthority, gov:LondonBoroughCouncil, gov:CivilAdministrativeAuthority ;
    rdfs:label "Barking and Dagenham" ;
    admingeo:gssCode "E09000002" ;
    admingeo:hasCensusCode "00AB" ;
    gov:billingAuthorityCode "E5030" ;
    gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000010949> ;
    gov:openlyLocalUrl <http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham> ;
    statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/H> ;
    owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00AB> ;
    foaf:page <http://www.lbbd.gov.uk/> .

Mime Type

Turtle is typically requested and sent over the Web using a MIME type of text/turtle.
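A client usually asks for a particular serialisation by sending the MIME type in the HTTP Accept header. The sketch below only builds such a request with Python's standard urllib and does not send it. Whether a given site honours the header is an assumption on my part, and build_request and the MIME_TYPES table are illustrative names.

```python
from urllib.request import Request

# The MIME types mentioned so far in this article.
MIME_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_request(uri, fmt):
    """Build (but do not send) a GET request asking for `fmt`."""
    return Request(uri, headers={"Accept": MIME_TYPES[fmt]})


request = build_request(
    "http://opendatacommunities.org/id/metropolitan-district-council/manchester",
    "turtle",
)
print(request.get_header("Accept"))
```

To actually fetch the data you would pass the request to urllib.request.urlopen and read the response body.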

Turtle Summary

The URI for each resource is followed by the predicates and objects of the triples about it (essentially, as key-value pairs). Each pair is separated by a semi-colon, and the information about a resource is closed off by a dot. Note that the whitespace is not important here, but for readability, the predicates and objects are often indented and appear on separate lines.

Turtle is my favourite RDF serialisation format: It’s fairly easy to read for humans due to the lack of punctuation noise, it groups together the triples about each resource, and it allows you to define common @prefixes. Its terseness also helps keep the amount of bandwidth required to communicate RDF down to a minimum, and it’s well supported by RDF toolkits and libraries.
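To make the @prefix mechanism concrete, here is a small sketch of how a prefixed name expands back to the full URI that the triple really contains. The expand function is a hypothetical helper, and it ignores the finer points of Turtle's grammar (escapes, the empty prefix, relative URIs). The special case for a reflects Turtle's shorthand for rdf:type.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The same prefixes as in the Turtle listings above.
PREFIXES = {
    "gov": "http://opendatacommunities.org/def/local-government/",
    "admingeo": "http://data.ordnancesurvey.co.uk/ontology/admingeo/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a name such as 'gov:governs' to its full URI."""
    if prefixed_name == "a":
        return RDF_TYPE  # Turtle's keyword for rdf:type
    prefix, _, local = prefixed_name.partition(":")
    # An undeclared prefix raises KeyError, much as a Turtle parser would reject it.
    return prefixes[prefix] + local


print(expand("gov:governs"))
print(expand("a"))
```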

Note that the standard encoding of Turtle is UTF-8 (though escaped Unicode is allowed in Turtle as well). Turtle literals can contain line breaks, using the """long literal""" approach. For more information, see the Turtle W3C submission.

N-Triples

N-Triples is a simplified version of Turtle.

A Single Resource as N-Triples

Here’s the RDF for the Manchester authority again, this time as N-triples.

<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2000/01/rdf-schema#label> "Manchester" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/157-Manchester-City-Council> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://xmlns.com/foaf/0.1/page> <http://www.manchester.gov.uk/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/00BN> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "00BN" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E4203" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000018821> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E08000003" .

Multiple Resources as N-Triples

<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0931" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UB> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/38-Allerdale-Borough-Council> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000013065> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://xmlns.com/foaf/0.1/page> <http://www.allerdale.gov.uk> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000026" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UB" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000012941> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0932" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UC> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://xmlns.com/foaf/0.1/page> <http://www.barrowbc.gov.uk/> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000027" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UC" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2000/01/rdf-schema#label> "Barrow-in-Furness" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .

Mime Type

N-triples are typically requested and sent over the Web using a MIME type of text/plain or application/n-triples.

N-Triples summary

With N-triples, each triple appears on its own line and is terminated by a dot.

N-triples’ simplicity makes it easy for software to parse and generate, but it lacks some of the features of RDF/XML and Turtle (such as support for nested resources). Due to the repetition of the resource URIs, it’s not as compact as Turtle, and the triples for each resource aren’t necessarily grouped together which makes it harder to read by eye.
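Because every line is a complete statement, regrouping the triples by resource takes only a few lines of code. The following is a minimal Python sketch, not a full parser: the function name `group_by_subject` is hypothetical, and it assumes subjects are URIs and objects are either URIs or plain literals (no blank nodes, language tags or datatypes).

```python
import re
from collections import defaultdict

# Matches one simple N-Triples statement: <subject> <predicate> object .
# Assumption: subjects are URIs and objects are URIs or plain literals.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"(?:[^"\\]|\\.)*")\s*\.\s*$'
)


def group_by_subject(ntriples_text):
    """Return {subject: [(predicate, object), ...]} in first-seen order."""
    grouped = defaultdict(list)
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("unsupported N-Triples line: " + line)
        subject, predicate, obj = match.groups()
        grouped[subject].append((predicate, obj))
    return dict(grouped)


sample = """\
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2000/01/rdf-schema#label> "Barrow-in-Furness" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000026" .
"""

# Print each resource once, followed by its predicate/object pairs.
for subject, pairs in group_by_subject(sample).items():
    print(subject)
    for predicate, obj in pairs:
        print("    " + predicate + " " + obj)
```

For real data a proper RDF library is the safer choice; the point here is only that the line-per-triple layout makes this kind of processing straightforward.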

It’s also worth mentioning here that, because N-triples is served as text/plain, literals are only allowed to contain US-ASCII characters: non-ASCII Unicode characters need to be escaped (in contrast to Turtle, whose standard encoding is UTF-8). Also, in N-triples, line-breaks always need to be escaped.
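As a rough illustration of that escaping, here is a small Python sketch. The helper name `escape_ntriples_literal` is hypothetical; it assumes the `\uXXXX` / `\UXXXXXXXX` escape forms with uppercase hex digits and treats anything outside printable US-ASCII as needing an escape.

```python
def escape_ntriples_literal(text):
    """Return `text` as a quoted, ASCII-only N-Triples literal."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")  # line breaks are always escaped
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= code <= 0x7E:
            out.append(ch)  # printable US-ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)  # four hex digits
        else:
            out.append("\\U%08X" % code)  # eight hex digits
    return '"' + "".join(out) + '"'


print(escape_ntriples_literal("Zürich"))
print(escape_ntriples_literal("line one\nline two"))
```

The first call prints the literal with `ü` replaced by `\u00FC`; the second prints a single line containing a backslash followed by `n` where the line break was.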

This section of the W3C Turtle submission gives a useful comparison between Turtle and N-Triples. You can also find more details about N-Triples here.

JSON

RDF can also be expressed in JavaScript Object Notation (JSON).

A Single Resource as JSON

{"http://opendatacommunities.org/id/metropolitan-district-council/manchester":{"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Manchester"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/157-Manchester-City-Council"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.manchester.gov.uk/"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/00BN"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"00BN"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E4203"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000018821"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E08000003"}]}}

Multiple Resources as JSON

{"http://opendatacommunities.org/id/district-council/allerdale":{"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0931"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UB"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/38-Allerdale-Borough-Council"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000013065"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Allerdale"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.allerdale.gov.uk"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000026"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UB"}]},"http://opendatacommunities.org/id/district-council/barrow-in-furness":{"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000012941"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0932"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UC"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.barrowbc.gov.uk/"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000027"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UC"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Barrow-in-Furness"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}]}}

Mime Type

JSON is typically requested and sent over the Web using a MIME type of application/json.

JSON Summary

This format serialises the triples by defining JavaScript objects, the identifier for each being the URI for the resource. Sub-objects are created for each predicate, the values being an array of the triples for that predicate. For each object of the triple, there is a JavaScript object with properties for its type (i.e. URI or literal), and actual value.
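To make that structure concrete, the sketch below walks a document of this shape in Python and emits one N-triples line per triple. The function names are hypothetical, the sample is a shortened copy of the Manchester example, and the literal quoting assumes plain ASCII values without line breaks, language tags or datatypes.

```python
import json


def quote_literal(value):
    # Assumption: literal values are plain ASCII without line breaks, so
    # only backslashes and double quotes need escaping in this sketch.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rdf_json_to_ntriples(document):
    """Return N-Triples lines for a dict shaped like the JSON above."""
    lines = []
    # Outer keys are subject URIs, inner keys are predicate URIs.
    for subject, predicates in document.items():
        for predicate, objects in predicates.items():
            # Each predicate maps to an array of {"type", "value"} objects.
            for obj in objects:
                if obj["type"] == "uri":
                    term = "<" + obj["value"] + ">"
                elif obj["type"] == "literal":
                    term = quote_literal(obj["value"])
                else:
                    raise ValueError("unsupported object type: " + obj["type"])
                lines.append(
                    "<" + subject + "> <" + predicate + "> " + term + " ."
                )
    return lines


sample_json = '''{"http://opendatacommunities.org/id/metropolitan-district-council/manchester":
  {"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Manchester"}],
   "http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.manchester.gov.uk/"}]}}'''

for line in rdf_json_to_ntriples(json.loads(sample_json)):
    print(line)
```

The nesting of the loops mirrors the nesting of the JSON: resource, then predicate, then the array of objects for that predicate.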

It’s not an official standard (the example above is an output from the rdf.rb ruby library, which is based on Talis’s RDF JSON spec), but many Linked Data sites are now supporting JSON as a serialisation format. It’s becoming popular due to the ease with which this kind of data can be consumed in JavaScript web applications. It’s not as easy to read by eye as Turtle (to me, at least), but it’s still not too bad in that respect. The limited punctuation also helps to keep the size down.

Conclusion

In this article, we’ve covered the most common serialisation formats for RDF triples. Hopefully it has helped you to understand the benefits and pitfalls of each, so that you can make an informed decision about which to choose for publishing or consuming your Linked Data.

If you’d be interested in more articles like this, please subscribe to our RSS feed, Twitter updates, or email alerts. We’ll try to keep the articles coming fairly regularly.

*In the interest of full disclosure: I should probably mention that my company worked on the OpenDataCommunities site.

Posted at 13:12

May 04

W3C Semantic Web News: The PROV ontology – an update

The W3C Provenance working group has released a new set of working drafts for the PROV standard. In this post we present a brief overview of the PROV ontology (PROV-O) using an example from the PROV primer.
The core classes of PROV include prov:Entity, prov:Activity and prov:Agent. Using these, one can describe the provenance of a resource in a brief, step-by-step manner. The example below, as explained in the primer, shows how the chart ex:chart1 in a fictional news article about crime figures has been made by ex:derek, who has also composed data items ex:dataSet1 and ex:regionList.

example provenance graph

An entity in PROV is a physical, digital, conceptual, or other kind of thing; real or imaginary. An entity should have some fixed aspects in order to state some provenance information about it. An activity is something that actually occurred over a period of time, and an agent is something or someone which was responsible for or otherwise associated with what happened in an activity. Entities can be attributed to agents who were responsible for their generation.
For the chart ex:chart1 we can start by stating who created the chart (prov:wasAttributedTo) and how it was created (prov:wasGeneratedBy). We can then say more by pointing out which data sources were used to create the chart, and then provide details about the provenance of these data sources, just like what we did for the chart.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/article1?prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:chart1      a                      prov:Entity ;
               prov:wasGeneratedBy    ex:illustrate ;
               prov:wasAttributedTo   ex:derek .

ex:derek       a                      prov:Person, prov:Agent ;
               foaf:givenName         "Derek" ;
               foaf:mbox              <mailto:derek@example.org> ;
               prov:actedOnBehalfOf   ex:chartgen .

ex:chartgen    a                      prov:Organization, prov:Agent ;
               foaf:name              "Chart Generators Inc" .

ex:illustrate  a                      prov:Activity ;
               prov:used              ex:composition ;
               prov:wasAssociatedWith ex:derek .

ex:composition a                      prov:Entity ;
               prov:wasGeneratedBy    ex:compose .

ex:compose     a                      prov:Activity ;
               prov:used              ex:dataSet1, ex:regionList ;
               prov:wasAssociatedWith ex:derek .

ex:dataSet1    a                      prov:Entity .
ex:regionList  a                      prov:Entity .
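To show how such activity-driven statements can be followed programmatically, here is a small Python sketch. It is an illustration only: the triples are copied by hand from the example above as plain strings, the function name `upstream_entities` is hypothetical, and reading "generated by an activity that used X" as "may depend on X" is an assumption of this sketch, not an inference defined by PROV.

```python
# A hand-copied subset of the example, with prefixed names as plain strings.
triples = [
    ("ex:chart1", "prov:wasGeneratedBy", "ex:illustrate"),
    ("ex:illustrate", "prov:used", "ex:composition"),
    ("ex:composition", "prov:wasGeneratedBy", "ex:compose"),
    ("ex:compose", "prov:used", "ex:dataSet1"),
    ("ex:compose", "prov:used", "ex:regionList"),
]


def upstream_entities(entity, triples):
    """List entities reached by following wasGeneratedBy, then used."""
    found = []
    seen = {entity}
    queue = [entity]
    while queue:
        current = queue.pop(0)
        # Activities that generated the current entity.
        activities = [
            o for s, p, o in triples
            if s == current and p == "prov:wasGeneratedBy"
        ]
        for activity in activities:
            # Entities that those activities used.
            for s, p, o in triples:
                if s == activity and p == "prov:used" and o not in seen:
                    seen.add(o)
                    found.append(o)
                    queue.append(o)
    return found


print(upstream_entities("ex:chart1", triples))
# prints ['ex:composition', 'ex:dataSet1', 'ex:regionList']
```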

PROV-O is based on an activity-driven model. It is generic and can describe any provenance where the individual steps and agents are known. If more specifics are needed, PROV-O can be used as an extension or bridging point for defining or aligning domain specific subclasses:

@prefix ext:   <http://schema.example.com/prov-extension#>.
ext:DataSet    rdfs:subClassOf prov:Entity .
ext:Illustrate rdfs:subClassOf prov:Activity .

PROV-O allows binary relations like prov:used and prov:wasAssociatedWith to be qualified using their corresponding involvement classes, such as prov:Use and prov:Association, in order to specify additional attributes about these relations, like role, time, location or other domain-specific attributes:

ex:chart1 prov:wasGeneratedBy ex:illustrate ;
    prov:qualifiedGeneration [
       a prov:Generation ;
       prov:activity ex:illustrate ; # object of qualified wasGeneratedBy
       prov:atTime "2011-07-16T01:52:02Z"^^xsd:dateTime ;
       prov:atLocation <http://dbpedia.org/resource/Madrid> ;
       ext:colours ext:red, ext:blue ;
       ext:tool <http://dbpedia.org/resource/gephi>
] .

Sometimes not enough details are known to describe complete activity-agent-entity interactions, or doing so becomes too verbose. PROV-O provides options to describe some indirect entity-entity and agent-agent relations, which are also important for understanding the history of the resources and can be regarded as shortcuts for the above activity-driven statements. PROV includes a predefined set of such relations for common use cases, such as derivation, attribution, quotation, responsibility, specialization and dictionaries.
The example below captures the core information from our earlier example, but does not show details such as how the region list was combined with the dataset.

ex:chart1 prov:wasDerivedFrom  ex:dataSet1 ;
          prov:tracedTo        ex:regionList ;
          prov:wasAttributedTo ex:derek .

For the purpose of tracking provenance, an entity in PROV has some fixed aspects as well as some changeable aspects. For instance, a new crime chart created by using an updated data set could be regarded as a new entity, ex:chart2, and have the following provenance information:

ex:chart2   prov:wasDerivedFrom ex:dataSet2 ;
            prov:wasRevisionOf  ex:chart1 .
ex:dataSet2 prov:wasRevisionOf  ex:dataSet1 .
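As an illustration, a revision chain like this can be walked back to the original entity. The Python sketch below is a simplification under stated assumptions: the mapping is copied by hand from the statements above, the function name `revision_history` is hypothetical, and each entity is assumed to have at most one prov:wasRevisionOf statement.

```python
# entity -> the entity it is a revision of (hand-copied from the example)
revision_of = {
    "ex:chart2": "ex:chart1",
    "ex:dataSet2": "ex:dataSet1",
}


def revision_history(entity, revision_of):
    """Return the entity followed by its earlier revisions, newest first."""
    history = [entity]
    seen = {entity}
    while history[-1] in revision_of:
        previous = revision_of[history[-1]]
        if previous in seen:
            break  # guard against accidental cycles in the data
        seen.add(previous)
        history.append(previous)
    return history


print(revision_history("ex:chart2", revision_of))
# prints ['ex:chart2', 'ex:chart1']
```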

These kinds of structures in PROV allow asserters to transition from a high-level overview of an entity’s history to a granular provenance trace.
We hope that you will explore the PROV models and consider adopting the future standards in your products. For an in-depth introduction to PROV, see the PROV primer; for ontology details and the OWL file, see PROV-O; and for the underlying data model, see the PROV Data Model.
The W3C Provenance working group is seeking feedback from the wider community on the PROV working drafts. Please send any comments to public-prov-wg@w3.org (subscribe, archives) or use the Twitter hashtag #provwg.

Posted at 09:26

May 03

Leigh Dodds: Layered Data: A Paper & Some Commentary

Two years ago I wrote a short paper about “layering” data but for various reasons never got round to putting it online. The paper tried to capture some of my thinking at the time about the opportunities and approaches for publishing and aggregating data on the web. I’ve finally got around to uploading it and you can read it here.

I’ve made a couple of minor tweaks in a few places but I think it stands up well, even given the recent pace of change around data publishing and re-use. I still think the abstraction that it describes is not only useful but necessary to take us forward on the next wave of data publishing.

Rather than edit the paper to bring it completely up to date with recent changes, I thought I’d publish it as is and then write some additional notes and commentary in this blog post.

You’re probably best off reading the paper, then coming back to the notes here. The illustration referenced in the paper is also now up on slideshare.

RDF & Layering

I see that the RDF Working Group, prompted by Dan Brickley, is now exploring the term. I should acknowledge that I also heard the term “layer” in conjunction with RDF from Dan, but I’ve tried to explore the concept from a number of perspectives.

The RDF Working Group may well end up using the term “layer” to mean a “named graph”. I’m using the term much more loosely in my paper. In my view an entire dataset could be a layer, as well as some easily identifiable sub-set of it. My usage might therefore be closer to Pat Hayes’s concept of a “Surface”, but I’m not sure.

I think that RDF is still an important factor in achieving the goal I outlined of allowing domain experts to quickly assemble aggregates through a layering metaphor. Or, if not RDF, then I think it would need to be based around a graph model, ideally one with a strong notion of identity. I also think that mechanisms to encourage sharing of both schemas and annotations are useful. It’d be possible to build such a system without RDF, but I’m not sure why you’d go to the effort.

User Experience

One of the things that appeals to me about the concept of layering is that there are some nice ways to create visualisations and interfaces to support the creation, management and exploration of layers. It’s not hard to see how, given some descriptive metadata for a collection of layers, you could create:

  • A drag-and-drop tool for creating and managing new composite layers
  • An inspection tool that would let you explore how the dataset for an application or visualisation has been constructed, e.g. to explore provenance or to support sharing and customization. Think “view source” for data aggregation.
  • A recommendation engine that suggested new useful layers that could be added to a composite, including some indication of what additional query options might become available

There’s been some useful work done on describing datasets within the Linked Data community: VoiD and DCat for example. However there’s not yet enough data routinely available about the structure and relationships of individual datasets, nor enough research into how to provide useful summaries.

This is what prompted my work on an RDF Report Card to try and move the conversation forward beyond simply counting triples.

To start working with layers, we need to understand what each layer contains and how they relate to and complement one another.

Linked Data & Layers

In the paper I suggest that RDF & Linked Data alone aren’t enough and that we need systems, tools and vocabularies for capturing the required descriptive data and enabling the kinds of aggregation I envisage.

I also think that the Linked Data community is spending far too much effort on creating new identifiers for the same things and worrying how best to define equivalences.

I think the leap of faith that’s required, and that people like the BBC have already taken, is that we just need to get much more comfortable re-using other people’s identifiers and publishing annotations. Yes, there will be times when identifiers diverge, but there’s a lot to be gained, especially in terms of efficiency around data curation from just focusing on the value-added data, not re-publishing any copy of a core set of facts.

There are efficiency gains to be had from existing businesses, as well as faster routes to market for startups, if they can reliably build on some existing data. I suspect that there are also businesses that currently compete with one another — because they’re having to compile or re-compile the same core data assets — that could actually complement one another if they could instead focus on the data curation or collection tasks at which they excel.

Types of Data

In the paper I set out seven different facets which I think cover the majority of types of data that we routinely capture and publish. I think the classification could be debated, but I think it’s a reasonable first attempt.

The intention is to try and illustrate that we can usefully group together different types of data. And organisations may be particularly good at creating or collecting particular types of data. There’s scope for organisations to focus on being really good in a particular area and by avoiding needless competition around collecting and re-collecting the same core facts, there are almost certainly efficiency gains and cost savings to be had.

I’m sure there must be some prior work in this space, particularly around the core categories, so if anyone has pointers please share them.

There are also other ways to usefully categorise data. One area that springs to mind is how the data itself is collected, i.e. its provenance. E.g. is it collected automatically by sensors, or as a side-effect of user activity, or entered by hand by a human curator? Are those curators trained or are they self-selected contributors? Is the data derived from some form of statistical analysis?

I had toyed with provenance as a distinct facet, but I think it’s an orthogonal concern.

Layering & Big Data

A lot has happened in the last two years and I winced a bit at all of the Web 2.0 references in the paper. Remember that? If I were writing this now then the obvious trend to discuss as context to this approach is Big Data.

Chatting with Matt Biddulph recently he characterised a typical Big Data analysis as being based on “Activity Data” and “Reference Data”. Matt described reference data as being the core facts and information on top of which the activity data — e.g. from users of an application — is added. The analysis then draws on the combination to create some new insight, i.e. more data.

I referenced Matt’s characterisation in my Strata talk (with acknowledgement!). Currently Linked Data does really well in the Reference category but there’s not a great deal of Activity data. So while it’s potentially useful in a Big Data world, there’s a lot of value still not being captured.

I think Matt’s view of the world chimes well with both the layered data concept and the data classifications that I’ve proposed. Most of the facets in the paper really define different types of Reference data. The outcome of a typical Big Data analysis is usually a new facet, an obvious one being “Comparative” data, e.g. identifying the most popular, most connected, most referenced resources in a network.

However, there’s clearly a difference in approach between typical Big Data processing and the graph models that I think underpin a layered view of the world.

MapReduce workflows seem to work best with more regular data, however newer approaches like Pregel illustrate the potential for “graph-native” Big Data analysis. But setting that aside, there’s no real contention as a layering approach to combining data doesn’t say anything about how the data must actually be used: it can be easily projected out into structures that are amenable for indexing and processing in different ways.

Kasabi

Looking at the last section of the paper it should be obvious that much of the origin of this analysis was early preparation for Kasabi.

I still think that there’s a great deal of potential to create a marketplace around data layers and tools for interacting with them. But we’re not there yet, for several reasons. Firstly, it’s taken time to get the underlying platform in place to support that. We’ve done that now and you can expect more information on that from more official sources shortly. Secondly, I underestimated how much effort is still required to move the market forward: there’s still lots to be done to support organisations in opening up data before we can really explore more horizontal marketplaces. But that is a topic for another post.

This has been quite a ramble of a blog post but hopefully there are some useful thoughts here that chime with your own experience. Let me know what you think.

Posted at 18:08

W3C Semantic Web News: What is new in the Fourth Working Draft of the PROV provenance model?

The Provenance Working Group has released the fourth public working draft of its data model. The purpose of this blog is to summarize the changes that occurred since the third working draft.

From an editorial perspective, three significant changes took place since the last release.

  1. The document has been reorganized into three separate documents. The data model document focuses on defining the vocabulary, in terms of its types and relations. A second document lists the constraints that should be checked to determine if provenance descriptions are valid. Finally, a third document presents the details of the PROV notation aimed at human consumption.
  2. Each concept is defined with a simple English definition. A few starting points of the data model are presented early and used to illustrate the data model on an example.
  3. The types and relations of the data model are structured in a set of six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) collections forming a logical structure for its members; (6) a simple annotation mechanism.

As far as the data model is concerned, our aim to simplify its various concepts has paid off, resulting in a data model that is more mature and stable. Key highlights include:

  • We simplified the notion of derivation (and its subtypes);
  • We clarified what identifiers denote;
  • We introduced a notion of entity invalidation, an event that marks the end of an entity’s lifetime;
  • We dropped the idea that accounts can be nested.

The fourth working draft includes these changes, and we feel that the data model, expressed according to various technologies, e.g. RDF, XML, JSON, is now usable. Examples of provenance can be expressed concisely for simple use cases, but the model is also expressive enough to tackle sophisticated ones. Tools are now being developed to manipulate PROV representations. Ultimately, the data model offers a vocabulary, consisting of 22 or so terms. The use of this vocabulary is essentially unconstrained. To help developers, a notion of valid provenance has been defined; a set of constraints has to be satisfied for provenance assertions to be valid.

The PROV Working Group has decided to produce a synchronized release of most of its documents, including a PROV primer, and a PROV ontology. See Paul’s blog for an overview of these documents.

Work on the next working draft has already begun. Our aim for the data model is to address the remaining technical issues related to provenance of provenance and to simplify the data model further. When complete, I will blog again about it.

 

Posted at 17:14

W3C Semantic Web News: PROV: synchronized and ready for your input

Back in January, the Provenance Working Group released a series of draft specifications: the PROV family of specs. Since then, we’ve been working hard to simplify, organize and improve those specifications to enable the interchange of provenance information on the Web. The group is happy to release a complete set of specifications for modeling provenance information designed for the Web. These specifications are synchronized and can be read and used as a whole.

We’ve included starting points, more examples, clearer guidance and a modular structure. In the rest of this post we’ll walk you through the 3 specifications that make up this family.

An Overview

The best place to start is the PROV-PRIMER, which provides an intuitive overview of how to use PROV to describe provenance. Key concepts are illustrated using a newspaper scenario and examples are given in Turtle.

If you’re aiming to use PROV in a Semantic Web application, your next stop will be PROV-O. This is the OWL version of the PROV Data Model and defines all the classes and properties to interchange provenance. The document provides not only a reference to the ontology itself but also a guide to its components, including examples in Turtle. You can download the ontology itself, which has extensive documentation.

To help ensure that PROV can be used in a variety of technology settings, the model itself is defined in a serialization-independent way in the PROV-DM. This document is the core reference for PROV and provides natural language definitions and examples of all concepts in PROV. It is complemented by PROV-N and PROV-CONSTRAINTS. PROV-N is a helpful notation particularly designed for people to write examples of PROV. Finally, PROV-CONSTRAINTS provides definitions of constraints and inferences that can be applied to provenance information represented in PROV.

Approaching Last Call

The working group is now looking for your feedback. We believe that we are approaching a spec that’s ready to be used and spread around the Web. This is your chance to give us input. Will it work for your application? Is anything missing? Does something not fit? Send us your input at public-prov-comments@w3.org. We look forward to seeing many prov: in your applications and pages.

Posted at 16:54

W3C Semantic Web News: Five Provenance Drafts Published

The Provenance Working Group published 5 Working Drafts today related to the PROV data model. Provenance information can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgments about information to determine whether to trust it, verifying that the process and steps used to obtain a result comply with given requirements, and reproducing how something was generated. The PROV model is used to represent provenance records, which contain descriptions of the entities and activities involved in producing and delivering or otherwise influencing a given object.

  • PROV-DM: The PROV Data Model introduces the provenance concepts found in PROV and defines PROV-DM types and relations.
  • Constraints of the Provenance Data Model introduces a further set of concepts useful for understanding the PROV data model and defines inferences that are allowed on provenance statements and validity constraints that PROV instances should follow. These inferences and constraints are useful for readers who develop applications that generate provenance or reason over provenance. (First Public Working Draft)
  • PROV-N: The Provenance Notation allows serializations of PROV instances to be created in a compact manner. (First Public Working Draft)
  • PROV-O: The PROV Ontology expresses the PROV Data Model using the OWL2 Web Ontology Language (OWL2).
  • PROV Model Primer provides an intuitive introduction and guide to the PROV specification for provenance on the Web.

Posted at 14:23

May 02

W3C Semantic Web News: Three SPARQL 1.1 Last Call Drafts Published

The SPARQL Working Group published three Last Call Working Drafts:

  • SPARQL 1.1 Overview, which provides an introduction to a set of W3C specifications that facilitate querying and manipulating RDF graph content on the Web or in an RDF store.
  • SPARQL 1.1 Graph Store HTTP Protocol, which describes the use of HTTP operations for the purpose of managing a collection of RDF graphs in the REST architectural style.
  • SPARQL 1.1 Query Results CSV and TSV Formats, which describes the use of CSV (comma separated values) and TSV (tab separated values) for expressing SPARQL query results from SELECT queries.

Comments are welcome through 01 June.

The group is also planning to release a 2nd Last Call working draft of the SPARQL 1.1 Query Language shortly, after which we plan to advance all Recommendation track drafts directly to Proposed Recommendation in the next iteration. To this end, the group is currently gathering implementation reports and would appreciate reports from the community of implementations of any of the SPARQL 1.1 specifications.

Posted at 07:32

April 30

Sebastian Trueg: Nepomuk Tasks: KActivityManager Crash

After a little silence during which I was occupied with Easter and OpenLink related work I bring you news about the second

Posted at 16:31

W3C Read Write Web Community Group: Read Write Web — Monthly Open Thread — (April 2012)

Summary

A big month all round.  April saw over 2500 people attend the WWW conference in Lyon.  An excellent wrap-up from Yves Raimond is available here.  If you haven’t had a chance to see it yet, the keynote is still available online.

“One of the main messages from the panel is that structured web data is already mainstream – Yahoo! reports that 25% of all web pages contain RDFa data and 7% contain Microdata. ”

 

Communications and Outreach

Our Google page launch was well received, with 59 likes so far.  Thanks to everyone who has helped or contributed.  Jürgen Jakobitsch has challenged us to reach 120 circles by the end of the year, so keep spreading the word!

The Read Write Web CG blog is now syndicated out planet rdf, which is a chance for a wider linked data audience, to see what’s going on in the RWW.

 

Community Group

The CG welcomes new participants from MetaSolutions, Institut Telecom, University of Leipzig, Seoul National University and the University of Florida. I know that some of the new members are top experts in the payments and online currencies field, so it’s great to have that expertise on board!

The wiki has been updated in some areas. As with most wikis, small incremental changes seem to work best. We have a new page covering several Social Systems and a stub page used to collect Screencasts. Additionally, the Global Square (Occupy movement) have told us they are keen to use RWW standards in their upcoming Drupal-based social project.

Applications

The big news this month is that the first RWW interoperation between heterogeneous social networks was achieved, using the Semantic Pingback protocol. Congratulations to Andrei Sambra (My Profile) and Kingsley (OpenLink Data Spaces) for reaching this big milestone. In the coming weeks, we hope to see more social networks join the system via pingback, including work being done on bergnet and tabulator/data.fm, and hopefully many more!

Great work from the team behind My Profile for putting this fantastic new RWW social system live. The source code is also available on GitHub. For those who have not yet seen it, you can sign up here:

https://my-profile.eu/profile.php

Although My Profile is WebID based, it should be pointed out that its universal nature allows it to be extended to almost any login method. The philosophy of using HTTP URIs to describe things, which has served Facebook and others so well, is just a starting point rather than a closed loop.

The system includes (FOAF) profile creation and editing, a personal wall, subscriptions, private messaging, a public wall, certificate generation, friends list and lookup, federated login using your own FOAF, an application platform, cross-platform messaging and much more. Do check it out!

 

Last but not least…

A picture speaks a thousand words.  Kudos to Sarven Capadisli for this innovative use of his WebID in the DERI cafe!

Posted at 12:55

Norm Walsh: Easthampton extension

How far does the Easthampton extension go?

Posted at 11:10

Copyright of the postings is owned by the original blog authors. Contact us.