It's triples all the way down
Surfing around the internet, I recently discovered SURF’s InContext Visualiser, which I think is a neat way to visualise RDF relationships, especially OAI-ORE aggregated publications.
I also discovered that people have already created a set of WordPress plugins (see: http://ep-books.ehumanities.nl/ ) to visualise books and other similar publications. However, blogs do not fit into a book/chapter model.
That said, there is already a schema for publishing blog data, and my lh-rdf plugin already exposes most publicly available WordPress blog data as RDF using that format, so it was an obvious next step to get the visualiser working with the LH RDF output. I have done so, and hopefully you think the output is cool.
http://shawfactor.com/wp-content/plugins/lh-rdf/visualisation.php
I have bundled this visualiser with the lh-rdf plugin, and in time I will polish it up and add shortcode support so it can be embedded more easily in posts and pages.
Posted at 15:16
Posted at 20:38
Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.
“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.
The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”
The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
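To make the (text, url, count) structure concrete, here is a minimal sketch of how such triples could be turned into association weights. It assumes a hypothetical tab-separated layout and uses invented sample rows; the released files may well be laid out differently, so treat the parsing step as an illustration rather than a reader for the real data.

```python
from collections import defaultdict

# Hypothetical sample rows in an assumed "text<TAB>url<TAB>count" layout.
# The strings, URLs and counts below are invented for illustration only.
SAMPLE = (
    "Big Apple\thttp://en.wikipedia.org/wiki/New_York_City\t90\n"
    "Big Apple\thttp://en.wikipedia.org/wiki/Big_Apple\t10\n"
    "NYC\thttp://en.wikipedia.org/wiki/New_York_City\t50\n"
)


def load_triples(lines):
    """Parse (text, url, count) triples from tab-separated lines."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        text, url, count = line.split("\t")
        triples.append((text, url, int(count)))
    return triples


def association_weights(triples):
    """Normalise the counts for each text string into relative frequencies."""
    totals = defaultdict(int)
    for text, _url, count in triples:
        totals[text] += count
    weights = defaultdict(dict)
    for text, url, count in triples:
        weights[text][url] = count / totals[text]
    return weights


triples = load_triples(SAMPLE.splitlines())
weights = association_weights(triples)

# List the candidate concepts for one string, strongest association first.
for url, weight in sorted(weights["Big Apple"].items(), key=lambda kv: -kv[1]):
    print(f"{weight:.2f}  {url}")
```

Normalising per string is only one way to read the counts as "degrees of association"; the paper may define its scores differently.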
Posted at 16:02

Google’s Knowledge Graph showed up for me this morning; it’s been slowly rolling out since the announcement on Wednesday. It builds on lots of research from human language technology (e.g., entity recognition and linking) and the semantic web (graphs of linked data). The slogan, “things not strings”, is brilliant and easily understood.
My first impression is that it’s fast, useful and a great accomplishment but leaves lots of room for improvement and expansion. That last bit is a good thing, at least for those of us in the R&D community. Here are some comments based on some initial experimentation.
GKG only works on searches that are simple entity mentions like people, places, organizations. It doesn’t do products (Toyota Camry), events (World War II), or diseases (diabetes) but does recognize that ‘Mercury’ could be a planet or an element.
It’s a bit aggressive about linking: when searching for “John Smith” it zeroes in on the 17th century English explorer. Poor Professor Michael Jordan never gets a chance, and providing context by adding Berkeley just suppresses the GKG sidebar. “Mitt” goes right to you know who. “George Bush” does lead to a disambiguation sidebar, though. Given that GKG doesn’t seem to allow for context information, the only disambiguating evidence it has is popularity (i.e., pagerank).
Speaking of context, the GKG results seem not to draw on user-specific information, like my location or past search history. When I search for “Columbia” from my location here in Maryland, it suggests “Columbia University” and “Columbia, South Carolina” and not “Columbia, Maryland” which is just five miles away from me.
Places include not just GPEs (geo-political entities) but also locations (Mars, Patapsco river) and facilities (MOMA, empire state building). To the GKG, the White House is just a place.
Organizations seem like a weak spot. It recognizes schools (UCLA) but company mentions seem not to be directly handled, not even for “Google”. A search for “NBA” suggests three “people associated with NBA” and “National Basketball Association” is not recognized. Forget finding out about the Cult of the Dead Cow.
Mike Bergman has some insights based on his exploration of the GKG in Deconstructing the Google Knowledge Graph
The use of structured and semi-structured knowledge in search is an exciting area. I expect we will see much more of this showing up in search engines, including Bing.
Posted at 14:43
Google recently launched the ‘Knowledge Graph’ (GKG), which “understands real-world entities and their relationships to one another: things, not strings.” Has Google hijacked the idea of the ‘Semantic Web’, or at least its vocabulary?
Sean Golliher has compared the most central concepts of the SemWeb community to the wording of Google in his blog post. For instance, Google doesn’t talk about ‘Linked data’ or ‘URIs’ but rather about ‘things and their relationships’. We don’t know if Google uses standards like RDF, but obviously a lot of concepts and ideas developed by the SemWeb community in recent years were implemented in GKG. Some people complain that Google should clearly state that this is an implementation of the ‘Semantic Web’ (which was not invented by Google); others say that most concepts like ‘taxonomies’ have been around for hundreds of years anyway.
I believe that both sides now have a great chance to work together: whether Google’s goal, to “build the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do”, can be reached or not is a matter of the intelligence of the employees. A lot of potential can be found within the semantic web community: if Google gives credit where it is due, semantic web people will be a bit more inspired to support an eco-system built around GKG, and it won’t be long before an ‘Open Knowledge Graph’ fits together with Google’s revenue model.
Posted at 08:51
Posted at 15:33
Google announced its “knowledge graph” today and describes it as “an intelligent model—in geek-speak, a ‘graph’ — that understands real-world entities and their relationships to one another: things, not strings. … It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.” Information from the knowledge graph will initially augment search results — the feature is already being rolled out to US English users. A short video explains more.
A CNET article quotes KG project manager Jack Menzel: Menzel pitches Knowledge Graph without using the word “semantic” even once. While he says, “I dream of the semantic Web,” he takes pains to point out that what Google is announcing today is not what people talk about when they discuss semantic Web concepts. “We do continue to work on how to make search semantic,” he says, “but talking about it brings out the crazy people.” I hope this did not come out the way he intended it to.
Posted at 03:54
I chatted recently with Olivier Thereaux, Yves Raimond (senior technologist in R&D), and Silver Oliver (data architect) of the BBC about the Web, publishing, and linked data.
Ian: The BBC is prolific and large. How do you view yourselves?
Silver: The BBC is primarily a broadcasting organization. Content is developed or commissioned within different editorial domains (such as News or Music or Sports) then distributed through diverse channels (TV, Radio, web, apps, etc). This fragmentation exists also on the web, with development of individual sites being largely delegated to dedicated teams.
Ian: How do you move beyond silos?
Yves: We have a lot of data that we are now using to draw connections among various BBC TV and radio programs and entities in other domains, like music or nature. We also expose the corresponding data. For example the programmes site exposes data views giving details about all the music tracks played in a given radio programme, and those details link to (and draw from) artist profiles on the BBC's music site… which themselves are also available as data views.
Olivier: We also reuse data that's available on the Web (e.g., from musicbrainz and wikipedia). Because the public is curating the information they can update it more rapidly than we could on our own. In a way, the Web is our Content Management System.
Ian: What are you using to aggregate and expose the data?
Yves: For the programmes and music site we use a relational database internally but then we expose the information in RDF.
Olivier: And we benefit from the ways that people have innovated around the RDF data we expose. When people play with the interfaces and massage the data, we can build on their experience.
Ian: Why not use RDF internally?
Yves: I think the main reason is that the people who originally built these sites were unaware of RDF, or were concerned about using an unfamiliar technology on such a big project. But we use it with other projects.
Ian: How has your use of data affected reporting?
Silver: In the past our editorial efforts have been captured in whole HTML documents. This causes problems for reuse in new data views and across platforms and applications (including IPTV). The key is in working with existing editorial workflows to capture a sub-set of machine readable information. In its simplest form this might be a byline and a small number of tags describing what the story is about.
Ian: How do reporters use the data to make connections between stories?
Silver: Connections have always happened, but it didn't scale. Linking between sports and news was a manual process and reliant on a journalist's knowledge of BBC output. But now we have rich data models behind the scenes. These models help the BBC editorial staff represent their understanding of the world and our audience's interests, and let us make connections in a scalable fashion.
Olivier: The data is a substrate that pre-populates a lot of the site, and then journalists can focus on the stories and not re-entering the data bits.
Silver: In sport, for example, we pay for the sport data (fixtures, results and statistics), then we write stories about match reports, and tagging ensures that everything gets linked properly. That's how we built the sites for the 2010 World Cup and the 2012 Olympics.
Ian: Do the reporters add data to the system directly?
Silver: Yes, we ask them to tag the stories they pull together so that we can put those stories into different contexts (or aggregations). We were quite happy to realize the natural curatorial process was already happening, we just needed to give people a way to capture data.
Ian: You mentioned buying and using data from various sources, including commercial ones. Do you make use of data provenance information?
Yves: We need to be very transparent about where our data comes from: our reporters, partners, official organisations, and sometimes our audience too.
Olivier: There is an interesting tension between making use of provenance information and ensuring user privacy. These days people expect to receive personalized content. To achieve that we make use of "attention data": what you watch or like. We have been looking at how to guarantee that we uphold privacy while at the same time asking for the minimal amount of information to tailor the best experience. That's probably less about "Do Not Track" and closer to the spirit of W3C's older P3P technology. On the other hand, we want to know whether information is reliable. This is challenging for user-generated content in particular: who is the user? how much do we trust them?
Ian: Do you think making provenance information available to readers can help digital literacy?
Silver: We had an interesting debate internally whether to include links from health stories to the journals that published the original research. Some felt that readers would not be interested in the links or would find the research complex. Others encouraged the links so that the community could respond to our articles with their own interpretations, including challenging the articles from various angles. This, in turn, would generate more discussion and perspectives from a much larger audience.
Ian: How did it turn out?
Silver: In stories about politics we have begun to include links to relevant legislation. And we are exploring how to extend the linking to pull in data from these sources to weave into BBC story-telling. For example data about committees that commented on bills, which members of parliament commented, and so on. These data models allow us to make more connections among stories, as we discussed earlier.
Ian: This sounds like a linked data project!
Silver: Internally we have wholesale signed up for and understood the value of linked data as a way to manage our organizational complexity. We will draw data from various sources and use RDF to stitch them together. We can make use of the information in ways we could not do before because it was either too costly or unmanageable. Semantic Web technology is now core to our strategy as an enterprise.
Ian: Have you measured cost savings by using Semantic Web technology?
Silver: It's still too early to say. There were costs associated with our initial projects, since we needed to acquire expertise. But we have since been able to roll out highly trafficked BBC content using Semantic Web technology.
Ian: Thank you all so much for your time!
Posted at 08:24
<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
<span itemprop="description">Jack Sparrow and Barbossa embark on[...]</span>
<div itemprop="actor" itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Johnny Depp</span>
<link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
</div>
</div>
Posted at 20:49
Digg just announced that Digg Engineering Team Joins SocialCode and The Washington Post reported SocialCode hires 15 employees from Digg.com
This acquihire does NOT include me. I will be changing jobs shortly but have nothing further to announce at this time.
I wish my former Digg colleagues the best of luck in their new roles. I had a great time at Digg and learned a lot about working in a small company, social news, analytics, public APIs and the technology stack there.
Posted at 15:20
Two widely used data formats on the Web are CSV and JSON. In order to enable fine-grained access in a hypermedia-oriented fashion I’ve started to work on
Posted at 15:03
The W3C launched the new Linked Data Platform (LDP) Working Group to promote the use of linked data on the Web. Per its charter, the group will explain how to use a core set of services and technologies to build powerful applications capable of integrating public data, secured enterprise data, and personal data. The platform will be based on proven Web technologies including HTTP for transport, and RDF and other Semantic Web standards for data integration and reuse. The group will produce supporting materials, such as a description of use cases, a list of requirements, and a test suite and/or validation tools to help ensure interoperability and correct implementation.
Posted at 09:57
The RDF Web Applications Working Group has published three Proposed Recommendations for RDFa Core 1.1, RDFa Lite 1.1 and XHTML+RDFa 1.1.
Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents. RDFa Lite 1.1 provides a simple subset of RDFa for novice web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language. The group also published a draft of the RDFa 1.1 Primer today.
Posted at 17:14
Posted at 23:59
PoolParty PowerTagging (PPP) is on its way: by extending Confluence’s label management, new application scenarios which make use of content recommendation and semantic indexing will be supported soon. PPP will be published at this year’s Atlassian Summit and at SemTechBiz in San Francisco at the beginning of June.
Tagging is still not a very popular task, especially in corporate environments. Many users don’t see the benefit of creating metadata to describe the actual content. A typical counter-argument to social tagging is that there are too many words for the same thing. “Even if I am tagging very hard my colleagues won’t necessarily find my pages because they will use different words to search for the content. I don’t have enough time to insert ‘New York City’, ‘NYC’, ‘Big Apple’ etc. as labels.”
The result: tagging facilities of enterprise software platforms like Confluence are rarely used and don’t help to index content at all. Search is mostly based on classical full-text indexing. Semantic search, as seen more and more on the WWW, has still not entered the enterprise realm.
The Solution: thesaurus-based indexing
W3C’s Semantic Web technology stack provides means to define controlled vocabularies such as thesauri, and as a result more and more tools and data sets make use of standards like SKOS. Tagging based on thesauri means that concepts, rather than plain labels, are attached to pages and documents. Labels like ‘New York City’, ‘NYC’ and ‘Big Apple’ refer to the same concept, so it should be sufficient to use any one of these terms for labeling; all the other names of that concept should be attached automatically.
PoolParty PowerTagging is able to analyse each Confluence page and to insert concepts from a thesaurus and all of their names automatically. Users can curate all suggested tags, or they can index their spaces automatically, resulting in a semantic index which makes search more comfortable than ever before.
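The idea of attaching concepts rather than labels can be sketched in a few lines. This is not PoolParty's implementation, only a toy illustration: the thesaurus, the concept identifiers and the `tag_page` helper are all hypothetical, and the matching is a plain case-insensitive whole-word search.

```python
import re

# Hypothetical mini-thesaurus: concept identifier -> every known label.
THESAURUS = {
    "concept:new-york-city": ["New York City", "NYC", "Big Apple"],
    "concept:london": ["London", "The Big Smoke"],
}


def tag_page(text, thesaurus):
    """Return {concept: all labels} for each concept mentioned in the text.

    A concept counts as mentioned when any one of its labels occurs as a
    whole word or phrase, ignoring case.
    """
    found = {}
    for concept, labels in thesaurus.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                # One matching label is enough: attach every name of the concept.
                found[concept] = list(labels)
                break
    return found


tags = tag_page("Our NYC office moved last week.", THESAURUS)
for concept, labels in tags.items():
    print(concept, "->", ", ".join(labels))
```

A page that only mentions ‘NYC’ ends up indexed under all three names, which is the behaviour the paragraph above describes.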
Usage: enhanced collaboration with enterprise knowledge models
There are two main application scenarios which can be realised on top of Confluence and its PowerTagging extension:
Don´t re-invent the wheel again and again. Save time and money. PPP will help to fulfill these tasks when creating rich contents more efficiently than ever before. You can link similar contents within Confluence automatically and you can fetch further readings even from the WWW like from Wikipedia.
If you are interested in trying out PowerTagging, please drop us a note and we will be happy to support you!
Posted at 16:13
A couple of weeks ago, I wrote an Introduction To RDF. In that article, we discussed expressing a single RDF triple (which defined the location of my blog) as a blob and line diagram. But what if we want to describe more than one aspect of a resource, or model the relationships between many resources? It probably won’t come as a surprise that to do that, you need to create multiple triples. As I mentioned in that previous post, blob and line diagrams only work well for small numbers of triples, and they’re only really good for humans.
To communicate RDF between computer systems, we need something different. In this article, we’ll see how to serialise a graph (i.e. a collection, or set) of RDF triples in different ways, and explore the advantages and disadvantages of each.
In order to avoid me having to make up a contrived example, I’m going to use some data from the OpenDataCommunities site*, which contains a bunch of open data about English Local Authorities. The URI http://opendatacommunities.org/id/metropolitan-district-council/manchester identifies the Manchester Metropolitan District Council. If you click on that link, you’ll get an HTML representation of the data that the site holds about that authority.

But the site also offers the data in other formats via links at the bottom of the page. Let’s discuss each of those formats in turn.
RDF is most commonly expressed in an XML format: RDF/XML.
The RDF/XML for what we know about the Manchester authority looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="http://opendatacommunities.org/def/local-government/" xmlns:ns1="http://data.ordnancesurvey.co.uk/ontology/admingeo/" xmlns:ns2="http://statistics.data.gov.uk/def/administrative-geography/" xmlns:ns3="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns4="http://www.w3.org/2000/01/rdf-schema#" xmlns:ns5="http://www.w3.org/2002/07/owl#" xmlns:ns6="http://xmlns.com/foaf/0.1/">
<ns0:MetropolitanDistrictCouncil rdf:about="http://opendatacommunities.org/id/metropolitan-district-council/manchester">
<ns1:gssCode>E08000003</ns1:gssCode>
<ns1:hasCensusCode>00BN</ns1:hasCensusCode>
<ns0:billingAuthorityCode>E4203</ns0:billingAuthorityCode>
<ns0:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000018821"/>
<ns0:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/157-Manchester-City-Council"/>
<ns2:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/B"/>
<ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
<ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<ns4:label>Manchester</ns4:label>
<ns5:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00BN"/>
<ns6:page rdf:resource="http://www.manchester.gov.uk/"/>
</ns0:MetropolitanDistrictCouncil>
</rdf:RDF>
A couple of local authorities serialised as RDF/XML look like this:
<?xml version="1.0"?>
<rdf:RDF
xmlns:j.0="http://data.ordnancesurvey.co.uk/ontology/admingeo/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.1="http://xmlns.com/foaf/0.1/"
xmlns:j.2="http://opendatacommunities.org/def/local-government/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:j.3="http://statistics.data.gov.uk/def/administrative-geography/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
<rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
<j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000015692"/>
<j.0:hasCensusCode>42UB</j.0:hasCensusCode>
<j.2:billingAuthorityCode>E3531</j.2:billingAuthorityCode>
<rdfs:label>Babergh</rdfs:label>
<j.1:page rdf:resource="http://www.babergh.gov.uk"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/DistrictCouncil"/>
<j.0:gssCode>E07000200</j.0:gssCode>
<owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/42UB"/>
<j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/G"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
</rdf:Description>
<rdf:Description rdf:about="http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham">
<owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00AB"/>
<j.0:gssCode>E09000002</j.0:gssCode>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
<j.2:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LondonBoroughCouncil"/>
<j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000010949"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<j.0:hasCensusCode>00AB</j.0:hasCensusCode>
<j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/H"/>
<j.1:page rdf:resource="http://www.lbbd.gov.uk/"/>
<j.2:billingAuthorityCode>E5030</j.2:billingAuthorityCode>
<rdfs:label>Barking and Dagenham</rdfs:label>
</rdf:Description>
</rdf:RDF>
RDF/XML is typically requested and sent over the Web using a MIME type of application/rdf+xml.
In RDF/XML, the triples for each resource are contained within <rdf:Description> nodes, with a sub-node for each property and its value (full spec here).
This format has the advantage that most programming languages have support for XML, and you can make use of XML namespaces to avoid having to use full URIs everywhere, which keeps the size down. On the other hand, I don’t find it that easy to read manually.
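Because RDF/XML is ordinary XML, a general-purpose XML parser is enough to pull triples out of simple documents like the ones above. The following is a minimal sketch using Python's standard library; the `extract_triples` and `to_uri` helpers are hypothetical names, and the code only covers the flat pattern used in this article (one node per subject, one child per property). A real RDF toolkit handles many more RDF/XML constructs, such as blank nodes, datatypes and nesting.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A cut-down version of the Babergh example from this article.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
    <rdfs:label>Babergh</rdfs:label>
    <foaf:page rdf:resource="http://www.babergh.gov.uk"/>
  </rdf:Description>
</rdf:RDF>"""


def to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from flat RDF/XML.

    Objects are returned as plain strings, so URIs and literals are not
    distinguished in this simplified sketch.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:  # each child of rdf:RDF describes one subject
        subject = node.get(f"{{{RDF_NS}}}about")
        # A node that is not rdf:Description is a typed node: its tag is the rdf:type.
        if node.tag != f"{{{RDF_NS}}}Description":
            triples.append((subject, RDF_NS + "type", to_uri(node.tag)))
        for prop in node:  # each sub-node is one property and its value
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, to_uri(prop.tag), obj))
    return triples


for triple in extract_triples(SAMPLE):
    print(triple)
```

The typed-node branch is there because the single Manchester example uses `<ns0:MetropolitanDistrictCouncil>` as the subject node instead of `<rdf:Description>`.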
Turtle (Terse RDF Triple Language) is an RDF-specific subset of Tim Berners-Lee’s Notation3 language.
The RDF triples about that single Manchester authority could be represented in Turtle like this:
@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester>
a gov:MetropolitanDistrictCouncil, gov:LocalAuthority, gov:CivilAdministrativeAuthority;
rdfs:label "Manchester";
admingeo:gssCode "E08000003";
admingeo:hasCensusCode "00BN";
gov:billingAuthorityCode "E4203";
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000018821>;
gov:openlyLocalUrl <http://openlylocal.com/councils/157-Manchester-City-Council>;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/B>;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00BN>;
foaf:page <http://www.manchester.gov.uk/> .
Multiple local authorities could be serialised in Turtle like this:
@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://opendatacommunities.org/id/district-council/babergh>
a gov:LocalAuthority, gov:DistrictCouncil, gov:CivilAdministrativeAuthority ;
rdfs:label "Babergh" ;
admingeo:gssCode "E07000200" ;
admingeo:hasCensusCode "42UB" ;
gov:billingAuthorityCode "E3531" ;
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000015692> ;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/G> ;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/42UB> ;
foaf:page <http://www.babergh.gov.uk> .
<http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham>
a gov:LocalAuthority, gov:LondonBoroughCouncil, gov:CivilAdministrativeAuthority ;
rdfs:label "Barking and Dagenham" ;
admingeo:gssCode "E09000002" ;
admingeo:hasCensusCode "00AB" ;
gov:billingAuthorityCode "E5030" ;
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000010949> ;
gov:openlyLocalUrl <http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham> ;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/H> ;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00AB> ;
foaf:page <http://www.lbbd.gov.uk/> .
Turtle is typically requested and sent over the Web using a MIME type of text/turtle.
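In practice, "requesting" a format by MIME type usually means sending it in an HTTP Accept header. Here is a small sketch of building such a request with Python's standard library. The `build_rdf_request` helper is a hypothetical name, and whether a given server honours the header (rather than only offering format-specific links) is an assumption you would need to check for each site.

```python
import urllib.request


def build_rdf_request(uri, mime_type="text/turtle"):
    """Build (but do not send) a GET request asking for the given MIME type."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


req = build_rdf_request(
    "http://opendatacommunities.org/id/metropolitan-district-council/manchester"
)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

Passing "application/rdf+xml" as the second argument would ask for RDF/XML instead.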
The URI for each resource is followed by the predicates and objects of the triples about it (essentially, as key-value pairs). Each pair is separated by a semi-colon, and the information about a resource is closed off by a dot. Note that the whitespace is not important here, but for readability, the predicates and objects are often indented and appear on separate lines.
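The grouping rule is easy to see in code. This is a minimal sketch, not a full Turtle writer: the `to_turtle` helper is a hypothetical name, the terms are assumed to be already written as valid Turtle terms (`<...>` URIs, prefixed names or quoted literals), and no @prefix lines are emitted.

```python
def to_turtle(triples):
    """Group (subject, predicate, object) triples by subject, Turtle-style.

    Each subject is written once, its predicate-object pairs are separated
    by semi-colons, and the last pair is closed off with a dot.
    """
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault(subject, []).append((predicate, obj))

    blocks = []
    for subject, pairs in grouped.items():
        lines = [subject]
        for i, (predicate, obj) in enumerate(pairs):
            terminator = " ." if i == len(pairs) - 1 else " ;"
            lines.append(f"    {predicate} {obj}{terminator}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


manchester = "<http://opendatacommunities.org/id/metropolitan-district-council/manchester>"
print(to_turtle([
    (manchester, "rdfs:label", '"Manchester"'),
    (manchester, "admingeo:gssCode", '"E08000003"'),
    (manchester, "foaf:page", "<http://www.manchester.gov.uk/>"),
]))
```

The indentation and line breaks are purely cosmetic; the semi-colons and the final dot are what carry the structure.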
Turtle is my favourite RDF serialisation format: it’s fairly easy for humans to read due to the lack of punctuation noise, it groups together the triples about each resource, and it allows you to define common @prefixes. Its terseness also helps keep the bandwidth required to communicate RDF to a minimum, and it’s well supported by RDF toolkits and libraries.
Note that the standard encoding of Turtle is UTF-8 (though escaped Unicode is allowed in Turtle as well). Turtle literals can contain line breaks, using the """long literal""" approach. For more information, see the Turtle W3C submission.
N-Triples is a simplified version of Turtle.
Here’s the RDF for the Manchester authority again, this time as N-Triples.
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2000/01/rdf-schema#label> "Manchester" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/157-Manchester-City-Council> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://xmlns.com/foaf/0.1/page> <http://www.manchester.gov.uk/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/00BN> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "00BN" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E4203" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000018821> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E08000003" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0931" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UB> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/38-Allerdale-Borough-Council> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000013065> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://xmlns.com/foaf/0.1/page> <http://www.allerdale.gov.uk> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000026" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UB" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000012941> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0932" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UC> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://xmlns.com/foaf/0.1/page> <http://www.barrowbc.gov.uk/> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000027" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UC" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2000/01/rdf-schema#label> "Barrow-in-Furness" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
N-triples are typically requested and sent over the Web using a MIME type of text/plain or application/n-triples.
With N-triples, each triple appears on its own line and is terminated by a dot.
N-triples’ simplicity makes it easy for software to parse and generate, but it lacks some of the features of RDF/XML and Turtle (such as support for nested resources). Due to the repetition of the resource URIs, it’s not as compact as Turtle, and the triples for each resource aren’t necessarily grouped together which makes it harder to read by eye.
It’s also worth mentioning here that, as N-triples is a text/plain format, literals are only allowed to contain US-ASCII characters: non-ASCII Unicode characters need to be escaped (in contrast to Turtle, whose standard encoding is UTF-8). Also, in N-triples, line-breaks always need to be escaped.
This section of the W3C Turtle submission gives a useful comparison between Turtle and N-Triples. You can also find more details about N-Triples here.
RDF can also be expressed in JavaScript Object Notation (JSON).
{"http://opendatacommunities.org/id/metropolitan-district-council/manchester":{"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Manchester"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/157-Manchester-City-Council"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.manchester.gov.uk/"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/00BN"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"00BN"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E4203"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000018821"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E08000003"}]}}
{"http://opendatacommunities.org/id/district-council/allerdale":{"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0931"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UB"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/38-Allerdale-Borough-Council"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000013065"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Allerdale"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.allerdale.gov.uk"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000026"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UB"}]},"http://opendatacommunities.org/id/district-council/barrow-in-furness":{"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000012941"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0932"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UC"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.barrowbc.gov.uk/"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000027"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UC"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Barrow-in-Furness"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}]}}
JSON is typically requested and sent over the Web using a MIME type of application/json.
This format serialises the triples as nested JavaScript objects. Each resource is keyed by its URI, and its value is an object keyed by predicate URI; each predicate maps to an array holding the objects of the triples with that subject and predicate. Each entry in the array is itself a JavaScript object with properties for its type (i.e. URI or literal) and its actual value.
It’s not an official standard (the example above is an output from the rdf.rb ruby library, which is based on Talis’s RDF JSON spec), but many Linked Data sites are now supporting JSON as a serialisation format. It’s becoming popular due to the ease with which this kind of data can be consumed in JavaScript web applications. It’s not as easy to read by eye as Turtle (to me, at least), but it’s still not too bad in that respect. The limited punctuation also helps to keep the size down.
In this article, we’ve covered the most common serialisation formats for RDF triples. Hopefully it has helped you to understand the benefits and pitfalls of each, so that you can make an informed decision about which to choose for publishing or consuming your Linked Data.
If you’d be interested in more articles like this, please subscribe to our RSS feed, Twitter updates, or email alerts. We’ll try to keep the articles coming fairly regularly.
*In the interest of full disclosure: I should probably mention that my company worked on the OpenDataCommunities site.
Posted at 13:12
A couple of weeks ago, I wrote an Introduction To RDF. In that article, we discussed expressing a single RDF triple (which defined the location of my blog) as a blob and line diagram. But what if we want to describe more than one aspect of a resource?… or if we want to model the relationships between many resources? It probably wouldn’t come as a surprise that to do that, you’d need to create multiple triples. As I mentioned in that previous post, blob and line diagrams only work well for small numbers of triples, and they’re only really good for humans.
To communicate RDF between computer systems, we need something different. In this article, we’ll see how to serialise a graph (i.e. a collection, or set) of RDF triples in different ways, and explore the advantages and disadvantages of each.
In order to avoid me having to make up a contrived example, I’m going to use some data from the OpenDataCommunities site*, which contains a bunch of open data about English Local Authorities. The URI http://opendatacommunities.org/id/metropolitan-district-council/manchester identifies the Manchester Metropolitan District Council. If you click on that link, you’ll get an HTML representation of the data that the site holds about that authority.

But the site also offers the data in other formats via links at the bottom of the page. Let’s discuss each of those formats in turn.
RDF is most commonly expressed in an XML format: RDF/XML.
The RDF/XML for what we know about the Manchester authority looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns0="http://opendatacommunities.org/def/local-government/" xmlns:ns1="http://data.ordnancesurvey.co.uk/ontology/admingeo/" xmlns:ns2="http://statistics.data.gov.uk/def/administrative-geography/" xmlns:ns3="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns4="http://www.w3.org/2000/01/rdf-schema#" xmlns:ns5="http://www.w3.org/2002/07/owl#" xmlns:ns6="http://xmlns.com/foaf/0.1/">
<ns0:MetropolitanDistrictCouncil rdf:about="http://opendatacommunities.org/id/metropolitan-district-council/manchester">
<ns1:gssCode>E08000003</ns1:gssCode>
<ns1:hasCensusCode>00BN</ns1:hasCensusCode>
<ns0:billingAuthorityCode>E4203</ns0:billingAuthorityCode>
<ns0:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000018821"/>
<ns0:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/157-Manchester-City-Council"/>
<ns2:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/B"/>
<ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
<ns3:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<ns4:label>Manchester</ns4:label>
<ns5:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00BN"/>
<ns6:page rdf:resource="http://www.manchester.gov.uk/"/>
</ns0:MetropolitanDistrictCouncil>
</rdf:RDF>
A couple of local authorities serialised as RDF/XML look like this:
<?xml version="1.0"?>
<rdf:RDF
xmlns:j.0="http://data.ordnancesurvey.co.uk/ontology/admingeo/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.1="http://xmlns.com/foaf/0.1/"
xmlns:j.2="http://opendatacommunities.org/def/local-government/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:j.3="http://statistics.data.gov.uk/def/administrative-geography/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
<rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
<j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000015692"/>
<j.0:hasCensusCode>42UB</j.0:hasCensusCode>
<j.2:billingAuthorityCode>E3531</j.2:billingAuthorityCode>
<rdfs:label>Babergh</rdfs:label>
<j.1:page rdf:resource="http://www.babergh.gov.uk"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/DistrictCouncil"/>
<j.0:gssCode>E07000200</j.0:gssCode>
<owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/42UB"/>
<j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/G"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
</rdf:Description>
<rdf:Description rdf:about="http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham">
<owl:sameAs rdf:resource="http://statistics.data.gov.uk/id/local-authority/00AB"/>
<j.0:gssCode>E09000002</j.0:gssCode>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"/>
<j.2:openlyLocalUrl rdf:resource="http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LondonBoroughCouncil"/>
<j.2:governs rdf:resource="http://data.ordnancesurvey.co.uk/id/7000000000010949"/>
<rdf:type rdf:resource="http://opendatacommunities.org/def/local-government/LocalAuthority"/>
<j.0:hasCensusCode>00AB</j.0:hasCensusCode>
<j.3:region rdf:resource="http://statistics.data.gov.uk/id/government-office-region/H"/>
<j.1:page rdf:resource="http://www.lbbd.gov.uk/"/>
<j.2:billingAuthorityCode>E5030</j.2:billingAuthorityCode>
<rdfs:label>Barking and Dagenham</rdfs:label>
</rdf:Description>
</rdf:RDF>
RDF/XML is typically requested and sent over the Web using a MIME type of application/rdf+xml.
In RDF/XML, the triples for each resource are contained within <rdf:Description> nodes, with a sub-node for each property and its value (full spec here).
This format has the advantage that most programming languages have support for XML, and you can make use of XML namespaces to avoid having to use full URIs everywhere, which keeps the size down. On the other hand, I don’t find it that easy to read manually.
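To illustrate the point about XML support, here is a minimal sketch that pulls triples out of flat RDF/XML like the examples above, using only Python’s standard library. It is not a real RDF/XML parser: it assumes every resource is a direct child of rdf:RDF with an rdf:about attribute, and it ignores nesting, blank nodes, datatypes and language tags. The function name triples_from_rdf_xml and the trimmed sample document are my own, for illustration only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag

def triples_from_rdf_xml(text):
    """Return (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(text)
    triples = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        # A typed node (anything other than rdf:Description) implies an rdf:type triple.
        if node.tag != "{%s}Description" % RDF_NS:
            triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            # rdf:resource means the object is a URI; otherwise use the element text.
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, expand(prop.tag), obj))
    return triples

# A trimmed-down version of the Babergh example (illustrative subset).
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://opendatacommunities.org/id/district-council/babergh">
    <rdfs:label>Babergh</rdfs:label>
    <foaf:page rdf:resource="http://www.babergh.gov.uk"/>
  </rdf:Description>
</rdf:RDF>"""

for triple in triples_from_rdf_xml(rdf_xml):
    print(triple)
```

For anything beyond a quick experiment, a dedicated RDF toolkit is the safer choice, since RDF/XML allows many equivalent ways of writing the same graph.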
Turtle (Terse RDF Triple Language) is an RDF-specific subset of Tim Berners-Lee’s Notation3 language.
The RDF triples about that single Manchester authority could be represented in Turtle like this:
@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester>
a gov:MetropolitanDistrictCouncil, gov:LocalAuthority, gov:CivilAdministrativeAuthority;
rdfs:label "Manchester";
admingeo:gssCode "E08000003";
admingeo:hasCensusCode "00BN";
gov:billingAuthorityCode "E4203";
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000018821>;
gov:openlyLocalUrl <http://openlylocal.com/councils/157-Manchester-City-Council>;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/B>;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00BN>;
foaf:page <http://www.manchester.gov.uk/> .
Multiple local authorities could be serialised in Turtle like this:
@prefix gov: <http://opendatacommunities.org/def/local-government/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix statsgeo: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://opendatacommunities.org/id/district-council/babergh>
a gov:LocalAuthority, gov:DistrictCouncil, gov:CivilAdministrativeAuthority ;
rdfs:label "Babergh" ;
admingeo:gssCode "E07000200" ;
admingeo:hasCensusCode "42UB" ;
gov:billingAuthorityCode "E3531" ;
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000015692> ;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/G> ;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/42UB> ;
foaf:page <http://www.babergh.gov.uk> .
<http://opendatacommunities.org/id/london-borough-council/barking-and-dagenham>
a gov:LocalAuthority, gov:LondonBoroughCouncil, gov:CivilAdministrativeAuthority ;
rdfs:label "Barking and Dagenham" ;
admingeo:gssCode "E09000002" ;
admingeo:hasCensusCode "00AB" ;
gov:billingAuthorityCode "E5030" ;
gov:governs <http://data.ordnancesurvey.co.uk/id/7000000000010949> ;
gov:openlyLocalUrl <http://openlylocal.com/councils/19-London-Borough-of-Barking-Dagenham> ;
statsgeo:region <http://statistics.data.gov.uk/id/government-office-region/H> ;
owl:sameAs <http://statistics.data.gov.uk/id/local-authority/00AB> ;
foaf:page <http://www.lbbd.gov.uk/> .
Turtle is typically requested and sent over the Web using a MIME type of text/turtle.
The URI for each resource is followed by the predicates and objects of the triples about it (essentially, as key-value pairs). Each pair is separated by a semi-colon, and the information about a resource is closed off by a dot. Note that the whitespace is not important here, but for readability, the predicates and objects are often indented and appear on separate lines.
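The layout described above (subject first, then predicate-object pairs separated by semi-colons and closed off by a dot) can be sketched in a few lines of Python. This is only an illustration of the shape of the format, not a complete Turtle writer: the to_turtle helper is hypothetical, and it expects predicates and objects to be passed in already written as valid Turtle terms.

```python
def to_turtle(subject, pairs, prefixes):
    """Serialise one resource as Turtle: the subject, then its predicate-object pairs."""
    lines = ["@prefix %s: <%s> ." % (name, uri) for name, uri in prefixes.items()]
    lines.append("<%s>" % subject)
    body = ["    %s %s" % (predicate, obj) for predicate, obj in pairs]
    # Pairs are separated by semi-colons; the last one is closed off by a dot.
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)

prefixes = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
pairs = [
    ("rdfs:label", '"Babergh"'),
    ("foaf:page", "<http://www.babergh.gov.uk>"),
]

print(to_turtle("http://opendatacommunities.org/id/district-council/babergh", pairs, prefixes))
```

The indentation in the output is purely cosmetic, which matches the point about whitespace not being significant.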
Turtle is my favourite RDF serialisation format: it’s fairly easy to read for humans due to the lack of punctuation noise, it groups together the triples about each resource, and it allows you to define common @prefixes. Its terseness also helps keep the amount of bandwidth required to communicate RDF down to a minimum, and it’s well supported by RDF toolkits and libraries.
Note that the standard encoding of Turtle is UTF-8 (though escaped Unicode is allowed in Turtle as well). Turtle literals can contain line breaks, using the """long literal""" approach. For more information, see the Turtle W3C submission.
N-Triples is a simplified version of Turtle.
Here’s the RDF for the Manchester authority again, this time as N-triples.
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2000/01/rdf-schema#label> "Manchester" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/157-Manchester-City-Council> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://xmlns.com/foaf/0.1/page> <http://www.manchester.gov.uk/> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/00BN> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "00BN" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E4203" .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000018821> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/metropolitan-district-council/manchester> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E08000003" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0931" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UB> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/openlyLocalUrl> <http://openlylocal.com/councils/38-Allerdale-Borough-Council> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000013065> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://xmlns.com/foaf/0.1/page> <http://www.allerdale.gov.uk> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000026" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/allerdale> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UB" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/governs> <http://data.ordnancesurvey.co.uk/id/7000000000012941> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/LocalAuthority> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://opendatacommunities.org/def/local-government/billingAuthorityCode> "E0932" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2002/07/owl#sameAs> <http://statistics.data.gov.uk/id/local-authority/16UC> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://opendatacommunities.org/def/local-government/DistrictCouncil> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://xmlns.com/foaf/0.1/page> <http://www.barrowbc.gov.uk/> .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> "E07000027" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode> "16UC" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://www.w3.org/2000/01/rdf-schema#label> "Barrow-in-Furness" .
<http://opendatacommunities.org/id/district-council/barrow-in-furness> <http://statistics.data.gov.uk/def/administrative-geography/region> <http://statistics.data.gov.uk/id/government-office-region/B> .
N-triples are typically requested and sent over the Web using a MIME type of text/plain or application/n-triples.
With N-triples, each triple appears on its own line and is terminated by a dot.
N-triples’ simplicity makes it easy for software to parse and generate, but it lacks some of the features of RDF/XML and Turtle (such as support for nested resources). Due to the repetition of the resource URIs, it’s not as compact as Turtle, and the triples for each resource aren’t necessarily grouped together which makes it harder to read by eye.
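To give a feel for how little work parsing takes, here is a rough Python sketch that reads N-triples with a single regular expression. It is deliberately simplified: it only understands URI subjects and predicates, with either a URI or a plain quoted literal as the object, so blank nodes, language tags and datatypes are out of scope. The parse_ntriples name is my own, and escape sequences inside literals are left undecoded.

```python
import re

# One statement: <subject> <predicate> then <object-uri> or "literal", then a dot.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)

def parse_ntriples(text):
    """Return a list of (subject, predicate, object, kind) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a supported N-Triples line: " + line)
        subject, predicate, uri, literal = match.groups()
        if uri is not None:
            triples.append((subject, predicate, uri, "uri"))
        else:
            # Escapes such as \n or \u00E9 are kept as written.
            triples.append((subject, predicate, literal, "literal"))
    return triples

sample = """
<http://opendatacommunities.org/id/district-council/allerdale> <http://www.w3.org/2000/01/rdf-schema#label> "Allerdale" .
<http://opendatacommunities.org/id/district-council/allerdale> <http://xmlns.com/foaf/0.1/page> <http://www.allerdale.gov.uk> .
"""

for triple in parse_ntriples(sample):
    print(triple)
```

Because every line is a complete statement, a file can be processed one line at a time, without holding the whole graph in memory.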
It’s also worth mentioning here that, as N-triples is a text/plain format, literals are only allowed to contain US-ASCII characters: non-ASCII Unicode characters need to be escaped (in contrast to Turtle, whose standard encoding is UTF-8). Also, in N-triples, line-breaks always need to be escaped.
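As a sketch of what that escaping involves, the Python function below turns an arbitrary string into an ASCII-only N-triples literal, using \uXXXX for characters outside printable ASCII (and \UXXXXXXXX beyond the Basic Multilingual Plane). The function name and the sample string are made up for illustration; check the N-Triples documentation for the authoritative list of escapes.

```python
def escape_ntriples_literal(value):
    """Escape a Python string for use as an ASCII-only N-Triples literal."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= code <= 0x7E:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            out.append("\\U%08X" % code)
    return '"' + "".join(out) + '"'

# A hypothetical label with an accented character and a line-break.
print(escape_ntriples_literal("Café\nManchester"))
```

In Turtle, the same string could be written directly in UTF-8, with the line-break kept inside a long literal.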
This section of the W3C Turtle submission gives a useful comparison between Turtle and N-Triples. You can also find more details about N-Triples here.
RDF can also be expressed in JavaScript Object Notation (JSON).
{"http://opendatacommunities.org/id/metropolitan-district-council/manchester":{"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Manchester"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/157-Manchester-City-Council"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.manchester.gov.uk/"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/00BN"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"00BN"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E4203"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/MetropolitanDistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000018821"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E08000003"}]}}
{"http://opendatacommunities.org/id/district-council/allerdale":{"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0931"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UB"}],"http://opendatacommunities.org/def/local-government/openlyLocalUrl":[{"type":"uri","value":"http://openlylocal.com/councils/38-Allerdale-Borough-Council"}],"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000013065"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Allerdale"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.allerdale.gov.uk"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000026"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UB"}]},"http://opendatacommunities.org/id/district-council/barrow-in-furness":{"http://opendatacommunities.org/def/local-government/governs":[{"type":"uri","value":"http://data.ordnancesurvey.co.uk/id/7000000000012941"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"type":"uri","value":"http://opendatacommunities.org/def/local-government/CivilAdministrativeAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/LocalAuthority"},{"type":"uri","value":"http://opendatacommunities.org/def/local-government/DistrictCouncil"}],"http://opendatacommunities.org/def/local-government/billingAuthorityCode":[{"type":"literal","value":"E0932"}],"http://www.w3.org/2002/07/owl#sameAs":[{"type":"uri","value":"http://statistics.data.gov.uk/id/local-authority/16UC"}],"http://xmlns.com/foaf/0.1/page":[{"type":"uri","value":"http://www.barrowbc.gov.uk/"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode":[{"type":"literal","value":"E07000027"}],"http://data.ordnancesurvey.co.uk/ontology/admingeo/hasCensusCode":[{"type":"literal","value":"16UC"}],"http://www.w3.org/2000/01/rdf-schema#label":[{"type":"literal","value":"Barrow-in-Furness"}],"http://statistics.data.gov.uk/def/administrative-geography/region":[{"type":"uri","value":"http://statistics.data.gov.uk/id/government-office-region/B"}]}}
JSON is typically requested and sent over the Web using a MIME type of application/json.
This format serialises the triples as nested JavaScript objects. Each resource is keyed by its URI, and its value is an object keyed by predicate URI; each predicate maps to an array holding the objects of the triples with that subject and predicate. Each entry in the array is itself a JavaScript object with properties for its type (i.e. URI or literal) and its actual value.
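The nesting is perhaps easier to see when built up from a flat list of triples. The Python sketch below groups triples by subject and then by predicate, producing the same shape as the JSON above. The triples_to_rdf_json helper and the short triple list are illustrative assumptions, not part of any library.

```python
import json

def triples_to_rdf_json(triples):
    """Group (subject, predicate, object, kind) tuples by subject, then by predicate."""
    graph = {}
    for subject, predicate, obj, kind in triples:
        # Each predicate holds an array, so repeated predicates simply append.
        values = graph.setdefault(subject, {}).setdefault(predicate, [])
        values.append({"type": kind, "value": obj})
    return graph

ALLERDALE = "http://opendatacommunities.org/id/district-council/allerdale"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (ALLERDALE, "http://www.w3.org/2000/01/rdf-schema#label", "Allerdale", "literal"),
    (ALLERDALE, RDF_TYPE,
     "http://opendatacommunities.org/def/local-government/DistrictCouncil", "uri"),
    (ALLERDALE, RDF_TYPE,
     "http://opendatacommunities.org/def/local-government/LocalAuthority", "uri"),
]

print(json.dumps(triples_to_rdf_json(triples), indent=2))
```

Going the other way is just as direct: iterate over subjects, then predicates, then the array entries, and each entry yields one triple.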
It’s not an official standard (the example above is an output from the rdf.rb ruby library, which is based on Talis’s RDF JSON spec), but many Linked Data sites are now supporting JSON as a serialisation format. It’s becoming popular due to the ease with which this kind of data can be consumed in JavaScript web applications. It’s not as easy to read by eye as Turtle (to me, at least), but it’s still not too bad in that respect. The limited punctuation also helps to keep the size down.
In this article, we’ve covered the most common serialisation formats for RDF triples. Hopefully it has helped you to understand the benefits and pitfalls of each, so that you can make an informed decision about which to choose for publishing or consuming your Linked Data.
If you’d be interested in more articles like this, please subscribe to our RSS feed, Twitter updates, or email alerts. We’ll try to keep the articles coming fairly regularly.
*In the interest of full disclosure: I should probably mention that my company worked on the OpenDataCommunities site.
Posted at 13:12
The W3C Provenance
working group has released a
new set of working drafts for the PROV standard. In this post
we present a brief overview of the PROV ontology
(PROV-O) using an example from the PROV
primer.
The core classes of PROV include the prov:Entity,
prov:Activity
and prov:Agent.
Using these, one can describe the provenance of a resource in a
brief, step-by-step manner. The example below, as
explained in the primer, shows how the chart ex:chart1 in a
fictional news article about crime figures has been made by
ex:derek, who has also composed data items ex:dataSet1 and
ex:regionList.
An entity
in PROV is a physical, digital, conceptual, or other kind of
thing; real or imaginary. An entity should have some fixed aspects
in order to state some provenance information about it. An activity
is something that actually occurred over a period of time, and an
agent
is something or someone which was responsible for or otherwise
associated with what happened in an activity. Entities can be
attributed to agents who were responsible for their
generation.
For the chart ex:chart1 we can start by stating who created the
chart (prov:wasAttributedTo) and how it was created
(prov:wasGeneratedBy). We can then say more by pointing out which
data sources were used to create the chart, and then provide
details about the provenance of these data sources, just as
we did for the chart.
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/article1?prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:chart1 a prov:Entity ;
    prov:wasGeneratedBy ex:illustrate ;
    prov:wasAttributedTo ex:derek .
ex:derek a prov:Person, prov:Agent ;
    foaf:givenName "Derek" ;
    foaf:mbox <mailto:derek@example.org> ;
    prov:actedOnBehalfOf ex:chartgen .
ex:chartgen a prov:Organization, prov:Agent ;
    foaf:name "Chart Generators Inc" .
ex:illustrate a prov:Activity ;
    prov:used ex:composition ;
    prov:wasAssociatedWith ex:derek .
ex:composition a prov:Entity ;
    prov:wasGeneratedBy ex:compose .
ex:compose a prov:Activity ;
    prov:used ex:dataSet1, ex:regionList ;
    prov:wasAssociatedWith ex:derek .
ex:dataSet1 a prov:Entity .
ex:regionList a prov:Entity .
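To see how the activity-driven statements chain together, here is a rough Python sketch that starts at ex:chart1 and alternately follows prov:wasGeneratedBy and prov:used links back to the inputs. The triples are copied by hand from the Turtle above as plain tuples, and the helper names objects and upstream_entities are invented for this illustration; the walk is just a traversal of the asserted statements, not an inference defined by PROV.

```python
# Generation and usage statements from the Turtle example, as plain tuples.
TRIPLES = [
    ("ex:chart1", "prov:wasGeneratedBy", "ex:illustrate"),
    ("ex:illustrate", "prov:used", "ex:composition"),
    ("ex:composition", "prov:wasGeneratedBy", "ex:compose"),
    ("ex:compose", "prov:used", "ex:dataSet1"),
    ("ex:compose", "prov:used", "ex:regionList"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def upstream_entities(entity, triples=TRIPLES):
    """Collect every entity used, directly or indirectly, to produce `entity`."""
    found = []
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        # Entity -> generating activity -> entities that activity used.
        for activity in objects(current, "prov:wasGeneratedBy", triples):
            for used in objects(activity, "prov:used", triples):
                if used not in seen:
                    seen.add(used)
                    found.append(used)
                    stack.append(used)
    return found


sources = upstream_entities("ex:chart1")
print(sources)
```

For the statements above the walk reaches ex:composition first and then the two data items that ex:compose used.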
PROV-O is based on an activity-driven model. It is generic and can describe any provenance where the individual steps and agents are known. If more specifics are needed, PROV-O can be used as an extension or bridging point for defining or aligning domain specific subclasses:
@prefix ext: <http://schema.example.com/prov-extension#> .
ext:DataSet rdfs:subClassOf prov:Entity .
ext:Illustrate rdfs:subClassOf prov:Activity .
PROV-O allows binary relations like prov:used and prov:wasAssociatedWith to be qualified using their corresponding involvement classes, such as prov:Use and prov:Association, in order to specify additional attributes about these relations, like role, time, location or other domain-specific attributes:
ex:chart1 prov:wasGeneratedBy ex:illustrate ;
    prov:qualifiedGeneration [
        a prov:Generation ;
        prov:activity ex:illustrate ; # object of qualified wasGeneratedBy
        prov:atTime "2011-07-16T01:52:02Z"^^xsd:dateTime ;
        prov:atLocation <http://dbpedia.org/resource/Madrid> ;
        ext:colours ext:red, ext:blue ;
        ext:tool <http://dbpedia.org/resource/gephi>
    ] .
Sometimes not enough details are known to describe complete
activity-agent-entity interactions, or doing so becomes too verbose.
PROV-O provides options to describe some indirect entity-entity and
agent-agent relations, which are also important for understanding
the history of the resources and can be regarded as shortcuts for the
above activity-driven statements. PROV includes a predefined set of such
relations for common use cases, such as derivation, attribution,
quotation, responsibility, specialization and dictionaries.
The example below captures the core information from our earlier
example, but does not show details such as how the region list was
combined with the dataset.
ex:chart1 prov:wasDerivedFrom ex:dataSet1 ;
    prov:tracedTo ex:regionList ;
    prov:wasAttributedTo ex:derek .
For the purpose of tracking provenance, an entity in PROV has some fixed aspects as well as some changeable aspects. For instance a new crime chart created by using an updated data set could be regarded as a new entity, ex:chart2, and have the following provenance information:
ex:chart2 prov:wasDerivedFrom ex:dataSet2 ;
    prov:wasRevisionOf ex:chart1 .
ex:dataSet2 prov:wasRevisionOf ex:dataSet1 .
These kinds of structures in PROV allow asserters to transition
from a high-level overview of an entity’s history to a granular
provenance trace.
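As a small illustration of moving along such a trace, the prov:wasRevisionOf links from the ex:chart2 example can be followed back to the earliest known version. This is only a sketch in Python: the tuples are copied by hand from the statements above, the function name revision_history is invented, and it assumes each entity has at most one asserted predecessor.

```python
# Revision statements from the ex:chart2 example, as plain tuples.
REVISIONS = [
    ("ex:chart2", "prov:wasRevisionOf", "ex:chart1"),
    ("ex:dataSet2", "prov:wasRevisionOf", "ex:dataSet1"),
]


def revision_history(entity, triples=REVISIONS):
    """Follow prov:wasRevisionOf links back from `entity`, newest first.

    Assumes at most one predecessor per entity (an assumption of this
    sketch, not a rule of PROV) and stops if a cycle is encountered.
    """
    previous = {s: o for s, p, o in triples if p == "prov:wasRevisionOf"}
    history = [entity]
    while history[-1] in previous and previous[history[-1]] not in history:
        history.append(previous[history[-1]])
    return history


chart_history = revision_history("ex:chart2")
print(chart_history)
```

An entity with no asserted revisions, such as ex:chart1, simply yields a one-element history.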
We hope that you will explore the PROV models and consider adopting
the future standards in your products. For an in-depth introduction
to PROV, see the PROV primer; for ontology details and the OWL file,
see PROV-O; and for the underlying data model, see the PROV Data
Model.
The W3C Provenance working group is seeking feedback from the wider
community on the PROV working drafts. Please send any comments to
public-prov-wg@w3.org
(subscribe,
archives)
or use the Twitter hashtag #provwg.
Posted at 09:26
Two years ago I wrote a short paper about “layering” data but for various reasons never got round to putting it online. The paper tried to capture some of my thinking at the time about the opportunities and approaches for publishing and aggregating data on the web. I’ve finally got around to uploading it and you can read it here.
I’ve made a couple of minor tweaks in a few places but I think it stands up well, even given the recent pace of change around data publishing and re-use. I still think the abstraction that it describes is not only useful but necessary to take us forward on the next wave of data publishing.
Rather than edit the paper to bring it completely up to date with recent changes, I thought I’d publish it as is and then write some additional notes and commentary in this blog post.
You’re probably best off reading the paper, then coming back to the notes here. The illustration referenced in the paper is also now up on slideshare.
I see that the RDF Working Group, prompted by Dan Brickley, is now exploring the term. I should acknowledge that I also heard the term “layer” in conjunction with RDF from Dan, but I’ve tried to explore the concept from a number of perspectives.
The RDF Working Group may well end up using the term “layer” to mean a “named graph”. I’m using the term much more loosely in my paper. In my view an entire dataset could be a layer, as well as some easily identifiable sub-set of it. My usage might therefore be closer to Pat Hayes’s concept of a “Surface”, but I’m not sure.
I think that RDF is still an important factor in achieving the goal I outlined of allowing domain experts to quickly assemble aggregates through a layering metaphor. Or, if not RDF, then I think it would need to be based around a graph model, ideally one with a strong notion of identity. I also think that mechanisms to encourage sharing of both schemas and annotations are useful. It’d be possible to build such a system without RDF, but I’m not sure why you’d go to the effort.
One of the things that appeals to me about the concept of layering is that there are some nice ways to create visualisation and interfaces to support the creation, management and exploration of layers. It’s not hard to see how, given some descriptive metadata for a collection of layers, you could create:
There’s been some useful work done on describing datasets within the Linked Data community: VoiD and DCat for example. However there’s not yet enough data routinely available about the structure and relationships of individual datasets, nor enough research into how to provide useful summaries.
This is what prompted my work on an RDF Report Card to try and move the conversation forward beyond simply counting triples.
To start working with layers, we need to understand what each layer contains and how they relate to and complement one another.
In the paper I suggest that RDF & Linked Data alone aren’t enough and that we need systems, tools and vocabularies for capturing the required descriptive data and enabling the kinds of aggregation I envisage.
I also think that the Linked Data community is spending far too much effort on creating new identifiers for the same things and worrying how best to define equivalences.
I think the leap of faith that’s required, and that people like the BBC have already taken, is that we just need to get much more comfortable re-using other people’s identifiers and publishing annotations. Yes, there will be times when identifiers diverge, but there’s a lot to be gained, especially in terms of efficiency around data curation, from just focusing on the value-added data rather than re-publishing another copy of a core set of facts.
There are efficiency gains to be had from existing businesses, as well as faster routes to market for startups, if they can reliably build on some existing data. I suspect that there are also businesses that currently compete with one another — because they’re having to compile or re-compile the same core data assets — that could actually complement one another if they could instead focus on the data curation or collection tasks at which they excel.
In the paper I set out seven different facets which I think cover the majority of types of data that we routinely capture and publish. I think the classification could be debated, but I think it’s a reasonable first attempt.
The intention is to try and illustrate that we can usefully group together different types of data. And organisations may be particularly good at creating or collecting particular types of data. There’s scope for organisations to focus on being really good in a particular area and by avoiding needless competition around collecting and re-collecting the same core facts, there are almost certainly efficiency gains and cost savings to be had.
I’m sure there must be some prior work in this space, particularly around the core categories, so if anyone has pointers please share them.
There are also other ways to usefully categorise data. One area that springs to mind is how the data itself is collected, i.e. its provenance. E.g. is it collected automatically by sensors, or as a side-effect of user activity, or entered by hand by a human curator? Are those curators trained or are they self-selected contributors? Is the data derived from some form of statistical analysis?
I had toyed with provenance as a distinct facet, but I think it’s an orthogonal concern.
A lot has happened in the last two years and I winced a bit at all of the Web 2.0 references in the paper. Remember that? If I were writing this now then the obvious trend to discuss as context to this approach is Big Data.
Chatting with Matt Biddulph recently he characterised a typical Big Data analysis as being based on “Activity Data” and “Reference Data”. Matt described reference data as being the core facts and information on top of which the activity data — e.g. from users of an application — is added. The analysis then draws on the combination to create some new insight, i.e. more data.
I referenced Matt’s characterisation in my Strata talk (with acknowledgement!). Currently Linked Data does really well in the Reference category but there’s not a great deal of Activity data. So while it’s potentially useful in a Big Data world, there’s a lot of value still not being captured.
I think Matt’s view of the world chimes well with both the layered data concept and the data classifications that I’ve proposed. Most of the facets in the paper really define different types of Reference data. The outcome of a typical Big Data analysis is usually a new facet, an obvious one being “Comparative” data, e.g. identifying the most popular, most connected, most referenced resources in a network.
However there’s clearly a difference in approach between typical Big Data processing and the graph models that I think underpin a layered view of the world.
MapReduce workflows seem to work best with more regular data; however, newer approaches like Pregel illustrate the potential for “graph-native” Big Data analysis. But setting that aside, there’s no real contention, as a layering approach to combining data doesn’t say anything about how the data must actually be used: it can be easily projected out into structures that are amenable to indexing and processing in different ways.
Looking at the last section of the paper it should be obvious that much of the origin of this analysis was early preparation for Kasabi.
I still think that there’s a great deal of potential to create a marketplace around data layers and tools for interacting with them. But we’re not there yet, for several reasons. Firstly, it’s taken time to get the underlying platform in place to support that. We’ve done that now and you can expect more information on that from more official sources shortly. Secondly, I underestimated how much effort is still required to move the market forward: there’s still lots to be done to support organisations in opening up data before we can really explore more horizontal marketplaces. But that is a topic for another post.
This has been quite a ramble of a blog post but hopefully there are some useful thoughts here that chime with your own experience. Let me know what you think.
Posted at 18:08
The Provenance Working Group has released the fourth public working draft of its data model. The purpose of this blog post is to summarize the changes that have occurred since the third working draft.
From an editorial perspective, three significant changes took place since the last release.
As far as the data model is concerned, our aim to simplify its various concepts has paid off, resulting in a data model that is more mature and stable. Key highlights include:
The fourth working draft includes these changes, and we feel that the data model, expressed according to various technologies, e.g. RDF, XML, JSON, is now usable. Examples of provenance can be expressed concisely for simple use cases, but the model is also expressive enough to tackle sophisticated ones. Tools are now being developed to manipulate PROV representations. Ultimately, the data model offers a vocabulary consisting of 22 or so terms. The use of this vocabulary is essentially unconstrained. To help developers, a notion of valid provenance has been defined: a set of constraints has to be satisfied for provenance assertions to be valid.
The PROV Working Group has decided to produce a synchronized release of most of its documents, including a PROV primer, and a PROV ontology. See Paul’s blog for an overview of these documents.
Work on the next working draft has already begun. Our aim for the data model is to address the remaining technical issues related to provenance of provenance and to simplify the data model further. When it is complete, I will blog about it again.
Posted at 17:14
Back in January, the Provenance Working Group released a series of draft specifications: the PROV family of specs. Since then, we’ve been working hard to simplify, organize and improve those specifications to enable the interchange of provenance information on the Web. The group is happy to release a complete set of specifications for modeling provenance information designed for the Web. These specifications are synchronized and can be read and used as a whole.
We’ve included starting points, more examples, clearer guidance and a modular structure. In the rest of this post we’ll walk you through the 3 specifications that make up this family.
The best place to start is the PROV-PRIMER, which provides an intuitive overview of how to use PROV to describe provenance. Key concepts are illustrated using a newspaper scenario and examples are given in Turtle.
If you’re aiming to use PROV in a Semantic Web application, your next stop will be PROV-O. This is the OWL version of the PROV Data Model and defines all the classes and properties to interchange provenance. The document provides not only a reference to the ontology itself but also a guide to its components, including examples in Turtle. You can download the ontology itself, which has extensive documentation.
To help ensure that PROV can be used in a variety of technology settings, the model itself is defined in a serialization-independent way in the PROV-DM. This document is the core reference for PROV and provides natural language definitions and examples of all concepts in PROV. It is complemented by PROV-N and PROV-CONSTRAINTS. PROV-N is a helpful notation particularly designed for people to write examples of PROV. Finally, PROV-CONSTRAINTS provides definitions of constraints and inferences that can be applied to provenance information represented in PROV.
The working group is now looking for your feedback. We believe that we are approaching a spec that’s ready to be used and spread around the Web. This is your chance to give us input. Will it work for your application? Is anything missing? Does something not fit? Send us your input at public-prov-comments@w3.org. We look forward to seeing many prov: in your applications and pages.
Posted at 16:54
The Provenance Working Group published 5 Working Drafts today related to the PROV data model. Provenance information can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgments about information to determine whether to trust it, verifying that the process and steps used to obtain a result comply with given requirements, and reproducing how something was generated. The PROV model is used to represent provenance records, which contain descriptions of the entities and activities involved in producing and delivering or otherwise influencing a given object.
Posted at 14:23
The SPARQL Working Group published three Last Call Working Drafts:
Comments are welcome through 01 June.
The group is also planning to release a 2nd Last Call working draft of the SPARQL 1.1 Query Language shortly, after which we plan to advance all Recommendation track drafts directly to Proposed Recommendation in the next iteration. To this end, the group is currently gathering implementation reports and would appreciate reports from the community on implementations of any of the SPARQL 1.1 specifications.

Posted at 07:32
After a little silence, during which I was occupied with Easter and OpenLink-related work, I bring you news about the second
Posted at 16:31
A big month all round. April saw over 2500 people attend the WWW conference in Lyon. An excellent wrap-up from Yves Raimond is available here. If you haven’t had a chance to see it yet, the keynote is still available online.
“One of the main messages from the panel is that structured web data is already mainstream – Yahoo! reports that 25% of all web pages contain RDFa data and 7% contain Microdata.”

Our Google page launch was well received, with 59 likes so far. Thanks to everyone who has helped or contributed. Jürgen Jakobitsch has challenged us to reach 120 circles by the end of the year, so keep spreading the word!
The Read Write Web CG blog is now syndicated out to Planet RDF, which is a chance for a wider Linked Data audience to see what’s going on in the RWW.

The CG welcomes new participants from MetaSolutions, Institut Telecom, University of Leipzig, Seoul National University and the University of Florida. I know that some of the new members are top experts in the payments and online currencies field, so it’s great to have that expertise on board!
The wiki has been updated in some areas. As with most wikis, small incremental changes seem to work best. We have a new page covering several Social Systems and a stub page used to collect Screencasts. Additionally, the Global Square (Occupy movement) have told us they are keen to use RWW standards in their upcoming Drupal-based social project.
The big news this month is that the first RWW interoperation between heterogeneous social networks was achieved, using the Semantic Pingback protocol. Congratulations to Andrei Sambra (My Profile) and Kingsley (OpenLink Data Spaces) for reaching this big milestone. In the coming weeks, we hope to see more social networks join the system via pingback, including work being done on bergnet and tabulator/data.fm, and hopefully many more!
Great work from the team behind My Profile for putting this fantastic new RWW social system live. The source code is also available on github. For those that have not yet seen it, you can sign up here:
https://my-profile.eu/profile.php
Although My Profile is WebID based, it should be pointed out that its universal nature allows it to be extended to almost any login method. The philosophy of using HTTP URIs to describe things, which has served Facebook and others so well, is just a starting point rather than a closed loop.
The system includes (FOAF) profile creation and editing, personal wall, subscriptions, private messaging, public wall, certificate generation, friends list and lookup, federated login using your own FOAF, an application platform, cross-platform messaging and much more. Do check it out!

A picture speaks a thousand words. Kudos to Sarven Capadisli for this innovative use of his WebID in the DERI cafe!
Posted at 12:55
Posted at 11:10