Planet RDF

It's triples all the way down

November 24

Leigh Dodds: Who is the intended audience for open data?

This post is part of my ongoing series:

Posted at 17:29

November 23

AKSW Group - University of Leipzig: AKSW Colloquium, 23-11-2015, CVtec and Patty

CVtec and model-driven semantification by Andreas Nareike

In this presentation, I will give a short introduction to our project CVtec. CVtec is concerned with knowledge management for technical facilities and uses methods of model-driven software development. Although CVtec currently uses relational databases to persist data, we are researching possibilities to move towards a semantic data model. I will give an overview of our different approaches and hope to get some valuable feedback.

René Speck will present PATTY: A Taxonomy of Relational Patterns with Semantic Types by Ndapandula Nakashole, Gerhard Weikum and Fabian Suchanek (Max Planck Institute for Informatics, Saarbrücken, Germany).

This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 12:28

November 21

Ebiquity research group UMBC: Semantic Interpretation of Structured Log Files


Piyush Nimbalkar, Semantic Interpretation of Structured Log Files, M.S. thesis, University of Maryland, Baltimore County, August, 2015.

Log files comprise a record of different events happening in various applications, operating systems and even network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events for auditing and forensics in case of malicious activities or system attacks. Software such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls and network devices generates logs with useful information that can be used to protect against such attacks. Analyzing log files can help in proactively avoiding attacks against the systems. While there are existing tools that do a good job when the format of log files is known, the challenge lies in cases where log files are from unknown devices and of unknown formats. We propose a framework that takes any log file and automatically produces a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column in a log file and map it to concepts either from a general purpose KB like DBpedia or from domain specific ontologies such as IDS. We also identify relationships between various columns in such log files. Converting large and verbose log files into such semantic representations will help in better search, integration and rich reasoning over the data.

Posted at 15:44

November 17

Bob DuCharme: 13 ways to make your writing look more professional

Simple copyediting things.

Posted at 19:35

November 15

Leigh Dodds: Managing risks when publishing open data

A question that I frequently encounter when talking to organisations about publishing open data is: “what if someone misuses or misunderstands our data?”

These concerns stem from several different sources:

  • that the data might be analysed incorrectly, drawing incorrect conclusions that might be attributed to the publisher
  • that the data has known limitations and this might reflect on the publisher’s abilities, e.g. exposing issues with their operations
  • that the data might be used against the publisher in some way, e.g. to paint them in a bad light
  • that the data might be used for causes with which the publisher does not want to be aligned
  • that the data might harm the business activities of the publisher, e.g. by allowing someone to replicate a service or product

All of these are understandable and reasonable concerns. And the truth is that when publishing open data you are giving up a great deal of control over your data.

But the same is true of publishing any information: there will always be cases of accidental and wilful misuse of information. Short of not sharing information at all, all organisations already face this risk. It’s just that open data, which anyone can access, use and share for any purpose, really draws this issue into the spotlight.

In this post I wanted to share some thoughts about how organisations can manage the risks associated with publishing open data.

Risks of not sharing

Firstly, it’s worth noting that the risks of not sharing data are often unconsciously discounted.

There’s increasing evidence that holding on to data can hamper innovation whereas

Posted at 11:24

November 13

Leigh Dodds: Fictional data

The phrase “fictional data” popped into my head recently, largely because of odd connections between a couple of projects I’ve been working on.

It’s stuck with me because, if you set aside the literal meaning of “data that doesn’t actually exist”, there are some interesting aspects to it. For example, the phrase could apply to:

  1. data that is deliberately wrong or inaccurate in order to mislead – lies or spam
  2. data that is deliberately wrong as a proof of origin or claim of ownership – e.g. inaccuracies introduced into maps to identify their sources, or

Posted at 18:19

November 10

Libby Miller: Radiodan Part 2: Unexpectedly discovering latent user needs

As I explained in the

Posted at 12:00

Libby Miller: Radiodan Part 2: Drawing customisations to discover latent user needs

As I explained in the

Posted at 12:00

November 08

Ebiquity research group UMBC: Supporting Situationally Aware Cybersecurity Systems

Zareen Syed, Tim Finin, Ankur Padia and M. Lisa Mathews, Supporting Situationally Aware Cybersecurity Systems, Technical Report, Computer Science and Electrical Engineering, UMBC, 30 September 2015.

In this report, we describe the Unified Cyber Security Ontology (UCO) to support situational awareness in cyber security systems. The ontology is an effort to incorporate and integrate heterogeneous information available from different cyber security systems and the most commonly used cyber security standards for information sharing and exchange. The ontology has also been mapped to a number of existing cyber security ontologies as well as to concepts in the Linked Open Data cloud. Similar to DBpedia, which serves as the core of the Linked Open Data cloud, we envision UCO serving as the core of a specialized cyber security Linked Open Data cloud which will evolve and grow over time as additional cybersecurity datasets become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cyber security ontology that has been mapped to general world ontologies to support broader and more diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.

Posted at 22:59

November 07

AKSW Group - University of Leipzig: AKSW Colloquium, 09-11-2015, Versioning of Arbitrary RDF Data (PhD progress report) and GraphLab Platform

GraphLab Platform – Overview and History by Simon Bin

GraphLab is a graph-based distributed computation framework. It was developed from 2009 at Carnegie Mellon University, where at the time it competed with Hadoop on graph processing. The typical example algorithm demonstrated with it is the PageRank calculation. It still appears today in the Spark GraphX documentation as a filler for the computation step. We will look at the architecture, sample code, and what has become of GraphLab today.

Versioning of Arbitrary RDF Data (PhD progress report) by Marvin Frommhold


A major challenge of B2B data networks is the efficient synchronization of data between the participants; this is especially true for Linked Data based networks. Exchanging only the differences has proved to be very bandwidth- and memory-friendly. Unfortunately, there is a lack of robust and highly efficient versioning and synchronization protocols for Linked Data, which hinders a wide adoption of Linked Data in B2B communication. For this reason we are developing a versioning system for arbitrary RDF data as part of the LUCID and LEDS research projects. The system will be a feature of the eccenca Linked Data Suite. A big challenge in versioning RDF data is blank node support. Our approach creates patches which allow blank nodes to be addressed without the need to make changes to the original dataset. This forms the foundation for comprehensive versioning of any RDF data, which enables efficient data exchange in a distributed network.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 10:43

November 06 what's new?

[starburst visualization of the type hierarchy]
It's time for a round-up of recent developments at

We have just published version 2.2. As usual this combines many small fixes with a mix of new vocabulary, as well as efforts to improve the integration and documentation of our existing vocabulary. And as always you can read the full details in our releases page, which in turn links to our issue tracker for even more details. Here are some highlights:

  • We made a number of improvements relating to the description of services, including the addition of providerMobility to indicate dynamic locations and OfferCatalog for hierarchical collections of offers, as well as the notion of a GeoCircle to make it possible to describe service availability in terms of distance from a point or postcode.
  • A new type: ExhibitionEvent for describing exhibitions (e.g. in museums, galleries), alongside a property workFeatured that indicates a CreativeWork featured in an Event. This is quite a typical change: it generalizes existing vocabulary - workPerformed, workPresented - to cover more scenarios with less terminology (see the query sketch after this list).
  • Added an inverse of the makesOffer property: offeredBy, to simplify the description of not-for-profit offers (e.g. library book lending).
  • Improved our support for feed-oriented structured data by adding DataFeed and DataFeedItem.
  • Introduced a new type to represent barcodes.
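
As an illustration (not part of the release notes), here is a minimal SPARQL sketch of how data marked up with some of these new terms might be queried, assuming such data has been loaded into a SPARQL store; the property names follow the published vocabulary, but the query itself is only a sketch:

PREFIX schema: <>

SELECT ?exhibition ?work ?library
WHERE {
  # exhibitions and the creative works they feature
  # (ExhibitionEvent and workFeatured are new in 2.2)
  ?exhibition a schema:ExhibitionEvent ;
              schema:workFeatured ?work .

  # offers on the featured work, described from the provider side via the
  # new inverse property offeredBy (e.g. a library lending a book)
  OPTIONAL {
    ?offer schema:itemOffered ?work ;
           schema:offeredBy ?library .
  }
}
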
These are just a small sample of the vocabulary changes introduced in v2.2. This release also includes non-vocabulary improvements, such as a simpler feedback form (available from every page in the 'more...' section) and some updates to the FAQ on documentation re-use and https. We are aware that the technical nature of our issue tracking site on Github is not ideal for some people, and hope that the improved feedback form will make it easier for the project to listen to a broader audience.

Finally, the illustration above is included here as a reminder that there is more to collaboration than fixing bugs and adding new vocabulary. The interactive version applies the D3 visualization toolkit to exploring the type hierarchy. Thanks to Fabio Valsecchi (who made this starburst demo), Gregg Kellogg and Sandro Hawke for their investigations in this area. We are collecting visualization ideas and links in our issue tracker. Another area where we encourage collaboration is finding even simpler ways of sharing structured data. In particular we would like to draw attention to the CSV on the Web work at W3C, which offers new ways of mapping between tabular datasets and descriptions. To join our discussions on vocabularies, visualization, syntax issues and more, you can join the community group at W3C.

Posted at 15:13

Semantic Web Company (Austria): If you like “Friends” you probably also will like “Veronica’s Closet” (find out with SPARQL why)

In a previous blog post I discussed the power of SPARQL to go beyond data retrieval to analytics. Here I look into how to implement a product recommender entirely in SPARQL. Products are considered to be similar if they share relevant characteristics, and the higher the overlap the higher the similarity. In the case of movies or TV programs there are static characteristics (e.g. genre, actors, director) and dynamic ones like viewing patterns of the audience.

The static part we can look up in resources like DBpedia. If we look at the data related to the resource <> (which represents the TV show “Friends”), we can use, for example, the associated subjects (see the predicate dcterms:subject). In this case we find, for example, <> or <>. If we want to find other TV shows that are related to the same subjects, we can do this with the following query:

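For example (a sketch, not necessarily the exact query from the post; the restriction to dbo:TelevisionShow and the variable names are assumptions):

PREFIX dcterms: <>
PREFIX dbo:     <>
PREFIX dbr:     <>

SELECT ?showB ?subjCountShowAB ?subjCountShowA ?subjCountShowB
       ((?subjCountShowAB / (?subjCountShowA + ?subjCountShowB - ?subjCountShowAB)) AS ?subjScore)
WHERE {
  # (1) number of subjects of show A ("Friends")
  { SELECT (COUNT(DISTINCT ?subjA) AS ?subjCountShowA)
    WHERE { dbr:Friends dcterms:subject ?subjA . } }

  # (2) TV shows sharing at least one subject with "Friends",
  #     counting the shared subjects
  { SELECT ?showB (COUNT(DISTINCT ?subj) AS ?subjCountShowAB)
    WHERE {
      dbr:Friends dcterms:subject ?subj .
      ?showB a dbo:TelevisionShow ;
             dcterms:subject ?subj .
      FILTER (?showB != dbr:Friends)
    }
    GROUP BY ?showB }

  # (3) number of subjects of each candidate show B
  { SELECT ?showB (COUNT(DISTINCT ?subjB) AS ?subjCountShowB)
    WHERE { ?showB dcterms:subject ?subjB . }
    GROUP BY ?showB }
}
ORDER BY DESC(?subjScore)
LIMIT 20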

The query can be executed at the DBpedia SPARQL endpoint (using the default graph). Read from the inside out, the query does the following:
  1. Count the number of subjects related to TV show “Friends”.
  2. Get all TV shows that share at least one subject with “Friends” and count how many they have in common.
  3. For each of those related shows count the number of subjects they are related to.
  4. Now we can calculate the relative overlap in subjects, which is (number of shared subjects) / (number of subjects for “Friends” + number of subjects for the other show – number of common subjects).

This gives us a score of how related one show is to another one. The results are sorted by score (the higher the better) and these are the results for “Friends”:

Show                      subjCount ShowAB   subjCount ShowA   subjCount ShowB   subj Score
Will_&_Grace                     10                 16                18          0.416667
Sex_and_the_City                 10                 16                21          0.37037
Seinfeld                         10                 16                23          0.344828
Veronica’s_Closet                 7                 16                12          0.333333
The_George_Carlin_Show            6                 16                 9          0.315789
Frasier                           8                 16                18          0.307692

In the first line of the results we see that “Friends” is associated with 16 subjects (that is the same in every line), “Will & Grace” with 18, and they share 10 subjects. That results in a score of 0.416667. Other characteristics to look at are the actors starring in a show, the creators (authors), or the executive producers.

We can pack all of this into one query and retrieve similar TV shows based on shared subjects, starring actors, creators, and executive producers. The inner queries retrieve the shows that share some of those characteristics, count numbers as shown before, and calculate a score for each dimension. The individual scores can be weighted; in the example here the creator score is multiplied by 0.5 and the executive producer score by 0.75 to adjust the influence of each of them.

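As a sketch of the combined query (again not necessarily the post's exact query; dbo:executiveProducer, dbo:starring and dbo:creator are assumed property names), the pattern below combines just two of the four dimensions, subjects and executive producers. The star and creator scores would be added in the same way, weighted 1.0 and 0.5 respectively, and the weighted sum is divided by 4, which is consistent with the integrated scores in the table below.

PREFIX dcterms: <>
PREFIX dbo:     <>
PREFIX dbr:     <>

SELECT ?showB ?subjScore ?execprodScore
       (((?subjScore + 0.75 * ?execprodScore) / 4) AS ?integratedScore)
WHERE {
  # subject-overlap score, computed as in the previous query
  { SELECT ?showB ((?subjAB / (?subjA + ?subjB - ?subjAB)) AS ?subjScore)
    WHERE {
      { SELECT (COUNT(DISTINCT ?s) AS ?subjA)
        WHERE { dbr:Friends dcterms:subject ?s . } }
      { SELECT ?showB (COUNT(DISTINCT ?s) AS ?subjAB)
        WHERE {
          dbr:Friends dcterms:subject ?s .
          ?showB a dbo:TelevisionShow ;
                 dcterms:subject ?s .
          FILTER (?showB != dbr:Friends)
        }
        GROUP BY ?showB }
      { SELECT ?showB (COUNT(DISTINCT ?s) AS ?subjB)
        WHERE { ?showB dcterms:subject ?s . }
        GROUP BY ?showB }
    } }

  # executive-producer-overlap score (0 if there is no overlap)
  OPTIONAL {
    { SELECT ?showB ((?epAB / (?epA + ?epB - ?epAB)) AS ?rawExecprodScore)
      WHERE {
        { SELECT (COUNT(DISTINCT ?p) AS ?epA)
          WHERE { dbr:Friends dbo:executiveProducer ?p . } }
        { SELECT ?showB (COUNT(DISTINCT ?p) AS ?epAB)
          WHERE {
            dbr:Friends dbo:executiveProducer ?p .
            ?showB a dbo:TelevisionShow ;
                   dbo:executiveProducer ?p .
            FILTER (?showB != dbr:Friends)
          }
          GROUP BY ?showB }
        { SELECT ?showB (COUNT(DISTINCT ?p) AS ?epB)
          WHERE { ?showB dbo:executiveProducer ?p . }
          GROUP BY ?showB }
      } }
  }
  BIND (COALESCE(?rawExecprodScore, 0.0) AS ?execprodScore)
}
ORDER BY DESC(?integratedScore)
LIMIT 20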

This results in:

Show                              subj Score   star Score   creator Score   execprod Score   integrated Score
The_Powers_That_Be_(TV_series)      0.17391       0.0           1.0              0.0           0.1684782608
Veronica’s_Closet                   0.33333       0.0           0.0              0.428571      0.1636904761
Family_Album_(1993_TV_series)       0.14285       0.0           0.666667         0.0           0.1190476190
Jesse_(TV_series)                   0.28571       0.0           0.0              0.181818      0.1055194805
Will_&_Grace                        0.41666       0.0           0.0              0.0           0.1041666666
Sex_and_the_City                    0.37037       0.0           0.0              0.0           0.0925925925
Seinfeld                            0.34482       0.0           0.0              0.0           0.0862068965
Work_It_(TV_series)                 0.13043       0.0           0.0              0.285714      0.0861801242
Better_with_You                     0.25          0.0           0.0              0.125         0.0859375
Dream_On_(TV_series)                0.16666       0.0           0.333333         0.0           0.0833333333
The_George_Carlin_Show              0.31578       0.0           0.0              0.0           0.0789473684
Frasier                             0.30769       0.0           0.0              0.0           0.0769230769
Everybody_Loves_Raymond             0.30434       0.0           0.0              0.0           0.0760869565
Madman_of_the_People                0.3           0.0           0.0              0.0           0.075
Night_Court                         0.3           0.0           0.0              0.0           0.075
                                    0.25          0.0           0.0              0.0625        0.07421875
Monty_(TV_series)                   0.15          0.14285       0.0              0.0           0.0732142857
Go_On_(TV_series)                   0.13043       0.07692       0.0              0.111111      0.0726727982
The_Trouble_with_Larry              0.19047       0.1           0.0              0.0           0.0726190476
Joey_(TV_series)                    0.21739       0.07142       0.0              0.0           0.0722049689

Each line shows the individual scores for each of the predicates used and, in the last column, the final score. You can also try out the query with “House” <> or “Suits” <> and get shows related to those.

This approach can be used for any similar data where we want to obtain similar items based on characteristics they share. One could, for example, compare persons (by profession, interests, …) or consumer electronics products like photo cameras (resolution, storage, size or price range).

Posted at 12:40

Ebiquity research group UMBC: Extracting Structured Summaries from Text Documents

Extracting Structured Summaries
from Text Documents

Dr. Zareen Syed
Research Assistant Professor, UMBC

10:30am, Monday, 9 November 2015, ITE 346, UMBC

In this talk, Dr. Syed will present unsupervised approaches for automatically extracting structured summaries composed of slots and fillers (attributes and values) and important facts from articles, thus effectively reducing the amount of time and effort spent on gathering intelligence by humans using traditional keyword based search approaches. The approach first extracts important concepts from text documents and links them to unique concepts in Wikitology knowledge base. It then exploits the types associated with the linked concepts to discover candidate slots and fillers. Finally it applies specialized approaches for ranking and filtering slots to select the most relevant slots to include in the structured summary.

Compared with the state of the art, Dr. Syed’s approach is unrestricted, i.e., it does not require manually crafted catalogue of slots or relations of interest that may vary over different domains. Unlike Natural Language Processing (NLP) based approaches that require well-formed sentences, the approach can be applied on semi-structured text. Furthermore, NLP based approaches for fact extraction extract lexical facts and sentences that require further processing for disambiguating and linking to unique entities and concepts in a knowledge base, whereas, in Dr. Syed’s approach, concept linking is done as a first step in the discovery process. Linking concepts to a knowledge base provides the additional advantage that the terms can be explicitly linked or mapped to semantic concepts in other ontologies and are thus available for reasoning in more sophisticated language understanding systems.

Posted at 02:48

November 03

Semantic Web Company (Austria): ADEQUATe for the Quality of Open Data

The ADEQUATe project builds on two observations: an increasing amount of Open Data is becoming available as an important resource for emerging businesses, and the integration of such open, freely re-usable data sources into organisations’ data warehouse and data management systems is seen as a key success factor for competitive advantage in a data-driven economy.

The project now identifies crucial issues which have to be tackled to fully exploit the value of open data and the efficient integration with other data sources:

  1. the overall quality issues with meta data and the data itself
  2. the lack of interoperability between data sources

The project’s approach is to address these issues at an early stage, when the open data is freshly provided by either governmental organisations or others.

The ADEQUATe project works with a combination of data-driven and community-driven approaches to address the above mentioned challenges. This includes 1) the continuous assessment of the data quality of Open Data portals based on a comprehensive list of quality metrics, 2) the application of a set of (semi-)automatic algorithms in combination with crowdsourcing approaches to improve identified quality issues, and 3) the use of Semantic Web technologies to transform legacy Open Data sources (mainly common text formats) into Linked Data.

So the project intends to research and develop novel automated and community-driven data quality improvement techniques and then integrate pilot implementations into existing Open Data portals. Furthermore, a quality assessment & monitoring framework will evaluate and demonstrate the impact of the ADEQUATe solutions for the above mentioned business case.

About: ADEQUATe is funded by the Austrian FFG under the Programme ICT of the Future. The project is run by Semantic Web Company together with the Institute for Information Business of Vienna University of Economics & Business and the Department for E-Governance and Administration at the Danube University Krems. The project started in August 2015 and will run until March 2018.



Posted at 10:19

November 01

AKSW Group - University of Leipzig: AKSW Colloquium, 2 November, 3pm, Automating Geo-spatial RDF Dataset Integration and Enrichment

On November 2nd at 3 PM, Mohamed Sherif will present the progress of his PhD titled “Automating Geo-spatial RDF Dataset Integration and Enrichment”.


Within this thesis, we will spur the transition from islands of isolated Geographic Information Systems (GIS) to enriched geo-spatial Linked Data sets with which geographic information can easily be integrated and processed. To achieve this goal, we will provide concepts, approaches and use cases that facilitate the combination and manipulation of geographic information with other data types that are already present on the Linked Data Web. Moreover, we will provide means to automate the proposed approaches by applying unsupervised machine learning algorithms or weakly supervised algorithms.


About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 19:38

October 31

Ebiquity research group UMBC: The KELVIN Information Extraction System

In this week’s ebiquity lab meeting (10:30am Monday Nov 2), Tim Finin will describe recent work on the Kelvin information extraction system and its performance in two tasks in the 2015 NIST Text Analysis Conference. Kelvin has been under development at the JHU Human Language Center of Excellence for several years. Kelvin reads documents in several languages and extracts entities and relations between them. This year it was used for the Coldstart Knowledge Base Population and Trilingual Entity Discovery and Linking tasks. Key components in the tasks are a system for cross-document coreference and another that links entities to entries in the Freebase knowledge base.

Posted at 03:41

October 29

Ebiquity research group UMBC: Lyrics Augmented Multi-modal Music Recommendation

Lyrics Augmented Multi-modal
Music Recommendation

Abhay Kashyap

1:00pm Friday 30 October, ITE 325b

In an increasingly mobile and connected world, digital music consumption has rapidly increased. More recently, faster and cheaper mobile bandwidth has given the average mobile user the potential to access large troves of music through streaming services like Spotify and Google Music that boast catalogs with tens of millions of songs. At this scale, effective music recommendation is critical for music discovery and personalized user experience.

Recommenders that rely on collaborative information suffer from two major problems: the long tail problem, which is induced by popularity bias, and the cold start problem caused by new items with no data. In such cases, they fall back on content to compute similarity. For music, content based features can be divided into acoustic and textual domains. Acoustic features are extracted from the audio signal while textual features come from song metadata, lyrical content, collaborative tags and associated web text.

Research in content based music similarity has largely been focused on the acoustic domain, while text based features have been limited to metadata, tags and shallow methods for web text and lyrics. Song lyrics house information about the sentiment and topic of a song that cannot be easily extracted from the audio. Past work has shown that even shallow lyrical features improved on audio-only features and, in some tasks like mood classification, outperformed them. In addition, lyrics are easily available, which makes them a valuable resource and warrants a deeper analysis.

The goal of this research is to fill the lyrical gap in existing music recommender systems. The first step is to build algorithms to extract and represent the meaning and emotion contained in the song’s lyrics. The next step is to effectively combine lyrical features with acoustic and collaborative information to build a multi-modal recommendation engine.

For this work, the genre is restricted to Rap because it is a lyrics-centric genre and techniques built for Rap can be generalized to other genres. It was also the highest streamed genre in 2014, accounting for 28.5% of all music streamed. Rap lyrics are scraped from dedicated lyrics websites, while the semantic knowledge base comprising artists, albums and song metadata comes from the MusicBrainz project. Acoustic features are used directly from EchoNest, while collaborative information like tags, plays, co-plays etc. come from

Preliminary work involved extraction of compositional style features like rhyme patterns and density, vocabulary size, and simile and profanity usage from over 10,000 songs by over 150 artists. These features are available for users to browse and explore through interactive visualizations. Song semantics were represented using off-the-shelf neural language based vector models (doc2vec). Future work will involve building novel language models for lyrics and latent representations for attributes that are driven by collaborative information for multi-modal recommendation.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Pranam Kolari (WalmartLabs), Cynthia Matuszek and Tim Oates

Posted at 17:57

October 27

Tetherless World Constellation group RPI: Do we have a magic flute for K-12 Web Science?

In early July of 2015, the Tetherless World Constellation (TWC) opened its doors to four young men from the 2015 summer program of the Rensselaer Research Experience for High School Students. The program covered a period of four weeks and each student was asked to choose a small, focused topic for research experience. Each of them was also asked to prepare a poster and present it in public at the end of the program.

Web Science was the discipline chosen by the four high school students at TWC. Before their arrival, several professors, research scientists and graduate students formed a mentoring group, and I was officially assigned the task of mentoring two of the four students. Such a fresh experience! And then a question came up: do we have a curriculum of Web Science for high school students? And for a period of four weeks? We do have excellent textbooks for the Semantic Web, Data Science, and more, but most of them are not for high school students. Also, the ‘research centric’ nature of the summer program indicated that we should not focus only on teaching but perhaps needed to spend more time on advising a small research project.

My simple plan was: in week 1 we focused on basic concepts; in weeks 2 and 3 the students were assigned a specific topic taken from an existing project; and in week 4 we focused on result analysis, wrap-up and poster preparation. A Google Doc was used to record the basic concepts, technical resources and assignments we introduced and discussed in week 1. I thought those materials might be a bit much for the students, but to my surprise they took them up really fast, which gave me the confidence to assign them research topics from ongoing projects. One of the students was asked to do statistical analysis of records on the Deep Carbon Observatory Data Portal and presented the results in interactive visualizations. The other student worked on the visualization of geologic time and connections to Web resources such as Wikipedia. Technologies used were an RDF database, SPARQL queries, JavaScript, D3.js and the JSON data format.

I hope the short program has sparked the students’ interest in exploring Web Science further and more deeply. Some of them will soon graduate from high school and go on to university. I wish them good luck!

Posted at 15:59

October 21

Dublin Core Metadata Initiative: DCMI Webinar: Two-Part Series on with Richard Wallis

2015-10-21, Join independent consultant Richard Wallis, former Technology Evangelist for OCLC and currently working with Google on, for this two-part, in-depth webinar mini-series titled " in Two Parts: From Use to Extension". The webinar series examines the use of and its extension in the bibliographic and wider domains. Part 1 of the series, titled "Fit For a Bibliographic Purpose", on 18 November 2015: (a) traces the history of the vocabulary, plus its applicability to the bibliographic domain, and (b) covers the Schema Bib Extend W3C Community Group--why it was set up, how it approached the creation of bibliographic extension proposals, and how those proposals were shaped. The second, more technical webinar in the series, on 2 December 2015, explains the extension mechanism for external and reviewed/hosted extensions and their relationship to the core vocabulary. Wallis will take an in-depth look at, demonstrate, and share experiences in designing and creating a potential extension to the vocabulary. He will step through the process of creating the required vocabulary definition and examples files on a local system using a few simple tools, then sharing them on a publicly visible temporary cloud instance before proposing them to the group. For more information and to register for this free webinar, visit

Posted at 23:59

Dublin Core Metadata Initiative: UNESP joins DCMI as Institutional Member

2015-10-21, DCMI is pleased to announce that São Paulo State University (Universidade Estadual Paulista—UNESP) has joined DCMI as an Institutional Member. UNESP is one of the largest and most important Brazilian universities, with distinguished achievements in teaching, research and extension services. UNESP is supported by State funds and along with USP (Universidade de São Paulo) and Unicamp (Universidade Estadual de Campinas) offers free public higher education in São Paulo State. UNESP was founded in 1976 and is the most successful model of a multicampus university in Brazil supporting intense and diversified activities in São Paulo—the most developed State in Brazil. UNESP's influence is recognized through the level of regional development where its campuses are located—one in the State capital and 23 others strategically distributed throughout the State. UNESP has appointed Flávia Bastos, Coordinator of Libraries, General Coordination at the São Paulo State University, as its representative to the DCMI Governing Board. For information on your organization joining DCMI, see the membership page at

Posted at 23:59

October 20

Norm Walsh: Design

Design vs. use.

Posted at 23:00

Jeen Broekstra: Sesame 4: boilerplate elimination

Sesame 4.0 was officially released today. This first new major release of the Sesame RDF framework in over 7 years focuses on major usability improvements in the core APIs. In particular, its goals are to reduce boilerplate code in Sesame-based projects, and to facilitate easier, more flexible, and streaming access to RDF data and query results. To this end, release 4 uses several new features of the Java 8 platform, such as lambda expressions, the Stream API, and the AutoCloseable interface.

The Sesame Programmers Manual gives an overview of what these changes entail. However, to illustrate how much of a difference these changes can make for even very simple code, I will show off some of the features in more detail here.

To set the stage, here’s a simple Java program using Sesame 2, which creates a new (in-memory) Sesame database, loads a file into it, queries it using SPARQL, and then uses the result of that query to isolate a subset of the data, which it then retrieves and writes to a new RDF file on disk:

public class SimpleExample {

  public static void main(String[] args) {

    Repository rep = new SailRepository(new MemoryStore());
    try {

      RepositoryConnection conn = rep.getConnection();
      try {
        // load a FOAF file (in Turtle format) to our in-memory database
        File foafData = new File("/path/to/example/foaf.ttl");
        conn.add(foafData, null, RDFFormat.TURTLE);

        // SPARQL query to retrieve all people with the given name "Bob".
        String queryStr = "SELECT ?x WHERE { ?x a foaf:Person ; foaf:givenName \"Bob\"; ?p ?y .} ";

        TupleQueryResult result = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryStr).evaluate();
        try {
          while (result.hasNext()) {
            BindingSet bs =;
            URI x = (URI)bs.getValue("x");

            // retrieve all data with this person x as the subject, and
            // put it in a Model
            Model m = Iterations.addAll(conn.getStatements(x, null, null, true), new LinkedHashModel());
            // write this data to disk using the person's last name as the filename.
            try {
              if (m.contains(x, FOAF.FAMILY_NAME, null)) {
                Literal familyName = m.filter(x, FOAF.FAMILY_NAME, null).objectLiteral();
                File outputFile = new File("/path/to/example/" + familyName.getLabel() + ".ttl");
                Rio.write(m, new FileOutputStream(outputFile), RDFFormat.TURTLE);
              }
            }
            catch (ModelException e) {
              e.printStackTrace();
            }
            catch (RDFHandlerException e) {
              e.printStackTrace();
            }
          }
        }
        finally {
          result.close();
        }
      }
      catch (RDFParseException e) {
        e.printStackTrace();
      }
      catch (IOException e) {
        e.printStackTrace();
      }
      catch (QueryEvaluationException e) {
        e.printStackTrace();
      }
      catch (MalformedQueryException e) {
        e.printStackTrace();
      }
      finally {
        conn.close();
      }
    }
    catch (RepositoryException e) {
      e.printStackTrace();
    }
  }
}

That is quite a hefty bit of code for such a relatively simple task. Of course, some of the exception handling I’m doing here is more elaborate than strictly necessary – we could simply catch the super type of all Sesame exceptions, OpenRDFException, or even just add throws Exception to our main method and be done with it. But the reason I spell it out in such detail is that I want to make a point: if you catch every possible exception Sesame 2 throws at you, code can quickly become unwieldy. And throwing everything upward is not always possible or desirable.

Since in Sesame 2, all these exceptions are checked exceptions, you are forced, at coding time, to figure out what to do with them, even in cases where it’s absolutely clear that nothing exceptional will ever happen. For example, in the above code (line 38) a ModelException is caught. If we look at why this is potentially thrown, we can find that this is only the case if the Model from which we’re trying to retrieve the family name does not actually contain any family name. However, we already checked in our code (using an if-statement, line 30) that it does contain that, so we can be quite sure that this exception will never actually be thrown. Nevertheless, we are forced to deal with it.

If we look at the above code and count the lines that are actually doing our business versus the lines that are simply boilerplate, we arrive at 10-20 lines of actual business code (creating the database, loading the data, doing the query, processing the result and writing the file). The other lines are all boilerplate code (try-catch blocks, mostly).

Sesame 4 improves on this situation in a number of ways. Here’s the same program, but done using several new Sesame 4 features.

public class SimpleExample {

  public static void main(String[] args) {

    Repository rep = new SailRepository(new MemoryStore());

    try (RepositoryConnection conn = rep.getConnection()) {
      // load a FOAF file (in Turtle format) to our in-memory database
      File foafData = new File("/path/to/example/foaf.ttl");
      conn.add(foafData, null, RDFFormat.TURTLE);

      // SPARQL query to retrieve all people with the given name "Bob".
      String queryStr = "SELECT ?x WHERE { ?x a foaf:Person ; foaf:givenName \"Bob\"; ?p ?y .} ";

      try (TupleQueryResult result = conn.prepareTupleQuery(queryStr).evaluate()) {
        List<BindingSet> list = QueryResults.asList(result);
        for (BindingSet bs : list) {
          IRI x = (IRI)bs.getValue("x");

          // retrieve all data with this person x as the subject, and put
          // it in a Model
          Model m = QueryResults.asModel(conn.getStatements(x, null, null));

          Models.objectLiteral(m.filter(x, FOAF.FAMILY_NAME, null)).ifPresent(familyName -> {
            File outputFile = new File("/path/to/example/" + familyName.getLabel() + ".ttl");
            try {
              Rio.write(m, new FileOutputStream(outputFile), RDFFormat.TURTLE);
            }
            catch (FileNotFoundException e) {
              e.printStackTrace();
            }
          });
        }
      }
    }
    catch (IOException e) {
      e.printStackTrace();
    }
  }
}

As you can see, this is already quite a bit shorter, and more to the point, than the Sesame 2 equivalent. Because in Sesame 4 the exceptions are no longer checked, we can get rid of all the enforced catch clauses. In addition we have used a few simple new features and convenience functions to shorten other bits of code. To name a few:

  • AutoCloseable and try-with-resources (lines 8 and 16)
    Instead of explicitly wrapping all code that operates on a RepositoryConnection or a QueryResult in a try block and adding a finally-clause to ensure that connections and results are closed when we are done, we rely on the new try-with-resources feature that Java 8 allows on resources that implement AutoCloseable.
  • Use of lambda expressions (line 25-33) allows us to shorten our business logic, making the entire operation on the family name essentially a single line (yes I know the body of the lambda here is not a single line – you know what I mean though, I hope :)).
  • Use of Optional parameters in the Model API (line 25). By using Optional.ifPresent() we eliminate the need for the separate existence-check for the statement.
  • Several methods now have sensible defaults for parameters, which means you can often leave them out. This leads to shorter, more readable code. For example, preparing the query (line 16) no longer requires the QueryLanguage parameter (if you don’t specify it, we assume it’s SPARQL), and retrieving statements (getStatements, line 32) no longer requires the boolean includeInferred parameter.

With these simple measures, we have nearly halved the number of lines in this program, and overall made it easier to read (and therefore, to maintain).

So, haven’t I cheated here? After all, the original code caught all these exceptions and printed out their stack traces, and in this new code we’re simply ignoring those exceptions completely. So up to a point this new program will behave differently from the original. However, the point I’m making is that the above is valid in Sesame 4. You can choose to completely ignore most exceptions (because they are unchecked exceptions, now). Of course, if you want you still can catch these exceptions, but it’s no longer really necessary if you just want to fire-and-forget.

Sesame 4 is not just about getting shorter code, of course. A lot of the introduced functionality is to increase usability and flexibility. For example, if you find it a pain to have to open and close RepositoryConnections every time you wish to do something on a Repository, there now is a utility class that allows you to ignore this completely. For example, to load our initial RDF file without actually opening a RepositoryConnection first, we can do something like this:

Repository rep = new SailRepository(new MemoryStore());

// load a FOAF file (in Turtle format) to our in-memory database
File foafData = new File("/path/to/example/foaf.ttl");
Repositories.consume(rep, conn -> {
  try {
    conn.add(foafData, null, RDFFormat.TURTLE);
  }
  catch (IOException e) {
    e.printStackTrace();
  }
});

This isn’t necessarily a lot shorter, although it can be if you are dealing with more complex transactions: Repositories.consume can take care of committing and/or rolling back your transaction for you. However, regardless of whether it is shorter, it saves you the trouble of juggling with the RepositoryConnection object yourself.

There is more to go into, but this should give a little taster of what Sesame 4 offers. I plan to write several more tech blogs in the near future about Sesame’s new features, and how leveraging Java 8 really helps you to focus on the business end of your code (and not the bureaucracy/boilerplate surrounding it).

Posted at 03:58

October 17

Libby Miller: Hackspace Hat quick install (or: audio and video streaming from a Raspberry Pi to a remote or local WebRTC-compatible-browser)

I’ve been distracted by other things, but just in case it’s useful to anyone, here’s how to make a

Posted at 15:28

Bob DuCharme: Data wrangling, feature engineering, and dada

And surrealism, and impressionism...

Posted at 14:58

Copyright of the postings is owned by the original blog authors. Contact us.