Planet RDF

It's triples all the way down

December 18

Norm Walsh: XInclude 1.1 (Last Call mark II)

Implementation experience, there's nothing like it.

Posted at 13:05

December 15

Ebiquity research group UMBC: Semantics for Privacy and Shared Context

Roberto Yus, Primal Pappachan, Prajit Das, Tim Finin, Anupam Joshi, and Eduardo Mena, Semantics for Privacy and Shared Context, Workshop on Society, Privacy and the Semantic Web-Policy and Technology, held at Int. Semantic Web Conf., Oct. 2014.

Capturing, maintaining, and using context information helps mobile applications provide better services and generates data useful in specifying information sharing policies. Obtaining the full benefit of context information requires a rich and expressive representation that is grounded in shared semantic models. We summarize some of our past work on representing and using context models and briefly describe Triveni, a system for cross-device context discovery and enrichment. Triveni represents context in RDF and OWL and reasons over context models to infer additional information and detect and resolve ambiguities and inconsistencies. A unique feature, its ability to create and manage “contextual groups” of users in an environment, enables their members to share context information using wireless ad-hoc networks. Thus, it enriches the information about a user’s context by creating mobile ad hoc knowledge networks.
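The kind of reasoning described above, inferring additional context from shared facts, can be sketched with plain subject-predicate-object triples and one forward-chaining rule. This is a toy illustration with an invented vocabulary, not Triveni's actual ontology or implementation:

```python
# Sketch: context facts as RDF-style triples, plus one forward-chaining
# rule that infers shared context for co-located users.
# The predicates here are hypothetical, not Triveni's actual model.

def infer_colocated(triples):
    """Add (a, 'sharesContextWith', b) for every pair of users in the same location."""
    by_location = {}
    for s, p, o in triples:
        if p == "locatedIn":
            by_location.setdefault(o, []).append(s)
    inferred = set(triples)
    for users in by_location.values():
        for a in users:
            for b in users:
                if a != b:
                    inferred.add((a, "sharesContextWith", b))
    return inferred

facts = {
    ("alice", "locatedIn", "room42"),
    ("bob", "locatedIn", "room42"),
    ("carol", "locatedIn", "lobby"),
}
enriched = infer_colocated(facts)
```

In a real system the rule would be expressed over OWL classes and run by a reasoner; the point is only that enrichment adds triples that were never asserted directly.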

Posted at 17:01

December 13

Bob DuCharme: Hadoop

What it is and how people use it: my own summary.

Posted at 14:13

Libby Miller: Catwigs: A conversation with your project

Posted at 10:47

December 11

schema.org: v1.92: Music, Video Games, Sports, Itemlist, breadcrumbs and more!

We are happy to announce version v1.92 of schema.org. With this update we "soft launch" a substantial collection of improvements that will form the basis for a version 2.0 release in early 2015. There remain a number of site-wide improvements, bugfixes and clarifications that we'd like to make before we feel ready to use the name "v2.0". However, the core vocabulary improvements are stable and available for use from today. As usual, see the release notes page for details.

Please get in touch via the W3C Web Schemas group or our Github issue tracker if you'd like to share feedback with us and the wider community. We won't go into the details of each update in today's blog post, but there are a lot of additions and fixes, and more coming in 2015. Many thanks to all those who contributed to this release!

Posted at 15:28

December 10

Dublin Core Metadata Initiative: DCMI/ASIST Webinar: The Libhub Initiative: Increasing the Web Visibility of Libraries

2014-12-10, In this webinar, Eric Miller, President of Zepheira, will talk about the transition libraries must make to achieve Web visibility, explain recent trends that support these efforts, and introduce the Libhub Initiative -- an active exploration of what can happen when libraries begin to speak the language of the Web. As a founding sponsor, Zepheira's introduction of the Libhub Initiative creates an industry-wide focus on the collective visibility of libraries and their resources on the Web. Libraries and memory organizations have rich content and resources that the Web cannot see or use. The Libhub Initiative aims to find common ground for libraries, providers, and partners to publish and use data with non-proprietary Web standards. Libraries can then communicate in a way Web applications understand and Web users can see, through the use of enabling technology like Linked Data and shared vocabularies such as schema.org and BIBFRAME. The Libhub Initiative uniquely prioritizes the linking of these newly exposed library resources to each other and to other resources across the Web, a critical requirement of increased Web visibility. Additional information about the webinar and registration is available online.

Posted at 23:59

Libby Miller: Huffduffer / Radiodan Digression – NFC control

I’d like to be able to change the URL of the RSS feed using NFC (~RFID). This is a tiny bit of over-engineering, but could also be very cool. I have a couple of NFC boards I’ve been planning on playing with for a while.

One’s an

Posted at 17:37

December 09

Frederick Giasson: Open Semantic Framework 3.1 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.1. This new version includes a set of fixes made to different components of the framework over the last few months. The biggest change is the deployment of OSF using Virtuoso Open Source version 7.1.0.

We also created a new API for Clojure developers called clj-osf. Finally, we created a new Open Semantic Framework web portal that better describes the project and is hopefully easier to use and more modern.

Quick Introduction to the Open Semantic Framework

What is the Open Semantic Framework?

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is designed as an integrated content platform accessible via the Web, which provides needed knowledge management capabilities to enterprises. OSF is made available under the Apache 2 license.

OSF can integrate and manage all types of content – unstructured documents, semi-structured files, spreadsheets, and structured databases – using a variety of best-of-breed data indexing and management engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.
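As a rough illustration of the canonical-model idea described above, here is a sketch (the field names, identifiers, and functions are invented for illustration and are not OSF's actual API) of how heterogeneous records reduce to a single triple representation that one query routine can then serve:

```python
# Sketch: normalizing records from different sources (a spreadsheet row,
# a document) into one canonical triple set, so the same faceting and
# tagging tools apply to all content. Identifiers are illustrative only.

def to_triples(subject, record):
    """Flatten a key/value record into (subject, predicate, object) triples."""
    return {(subject, key, value) for key, value in record.items()}

spreadsheet_row = {"title": "Q3 budget", "format": "xlsx"}
document = {"title": "Annual report", "format": "pdf"}

graph = to_triples("urn:item:1", spreadsheet_row) | to_triples("urn:item:2", document)

# A toy "faceted search" over the unified graph: everything matching a
# given property value, regardless of the original source format.
pdfs = {s for (s, p, o) in graph if p == "format" and o == "pdf"}
```

In OSF proper the canonical model is RDF with ontology-backed schemas, but the payoff is the same: one representation, one set of query and management tools.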

A new Open Semantic Framework website

The OSF 3.1 release also triggered the creation of a new website for the project. We wanted something leaner and more modern, and I think that is what we delivered. We also reworked the content: we wrote about a series of use cases, and we better aggregated and presented the information for each web service endpoint.

A new OSF sandbox

We also created an OSF sandbox where people can test each web service endpoint and see how each feature works. All of the web services are open to users. The sandbox is not meant to be stable, considering that everybody has access to all endpoints. However, the sandbox server will be recreated on a periodic basis. If the sandbox is totally broken and users experience issues, they can always request a re-creation of the server directly on the OSF mailing list.

Each of the web service pages on the new OSF portal has a Sandbox section where you can see code examples of how to use the endpoint and how to send requests to the sandbox. Instructions for using the sandbox server are also available.

A new OSF API for Clojure: clj-osf

The OSF release 3.1 also includes a new API for Clojure developers: clj-osf.

clj-osf is a Domain Specific Language (DSL) that lowers the barrier to using the Open Semantic Framework.

To use the DSL, you only have to configure your application to use a specific OSF endpoint. Here is an example of how to do this for the Sandbox server:

;; Define the OSF Sandbox credentials (or your own):
(require '[clj-osf.core :as osf])

(osf/defosf osf-test-endpoint {:protocol :http
                               :domain ""
                               :api-key "EDC33DA4D977CFDF7B90545565E07324"
                               :app-id "administer"})

(osf/defuser osf-test-user {:uri ""})

Then you can send simple OSF web service queries. Here is an example that sends a search query to return records of type foaf:Person that also match the keyword “bob”:

(require '[clj-osf.search :as search])

(search/search
 (search/query "bob")
 (search/type-filters ["http://xmlns.com/foaf/0.1/Person"]))

A complete set of clj-osf examples is available on the OSF wiki.

Finally the complete clj-osf DSL documentation is available here.

A community effort

This new release of the OSF Installer is another effort of the growing Open Semantic Framework community. The upgrade of the installer to deploy the OSF stack using Virtuoso Open Source version 7.1.0 has been created by William (Bill) Anderson.

Deploying a new OSF 3.1 Server

Using the OSF Installer

OSF 3.1 can easily be deployed on an Ubuntu 14.04 LTS server using the osf-installer application, by executing the following commands in your terminal:

mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/


chmod 755


./osf-installer --install-osf -v

Using an Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

Existing OSF installations can be upgraded using the OSF Installer. However, note that the upgrade won’t deploy Virtuoso Open Source 7.1.0 for you. All of the code will be upgraded, but Virtuoso will remain at the version you were last using on your instance. All of the OSF 3.1 code is compatible with previous versions of Virtuoso, but you won’t benefit from the latest improvements to Virtuoso (in terms of performance) and its latest SPARQL 1.1 implementation. If you want to upgrade Virtuoso to version 7.1.0 on an existing OSF instance, you will have to do this by hand.

To upgrade the OSF codebase, the first thing is to upgrade the installer itself:

# Upgrade the OSF Installer

Then you can upgrade the components using the following commands:

# Upgrade the OSF Web Services
/usr/share/osf-installer/osf --upgrade-osf-web-services="3.1.0"

# Upgrade the OSF WS PHP API
/usr/share/osf-installer/osf --upgrade-osf-ws-php-api="3.1.0"

# Upgrade the OSF Tests Suites
/usr/share/osf-installer/osf --upgrade-osf-tests-suites="3.1.0"

# Upgrade the Datasets Management Tool
/usr/share/osf-installer/osf --upgrade-osf-datasets-management-tool="3.1.0"

# Upgrade the Data Validator Tool
/usr/share/osf-installer/osf --upgrade-osf-data-validator-tool="3.1.0"

Posted at 17:08

Libby Miller: Huffduffer Radiodan part 4 – server side Radiodan API and buttons

Server side calls

Andrew has been helping me with server-side radiodan calls. I was close with

Posted at 15:30

December 08

Libby Miller: Huffduffer Radiodan part 3 – using the switch with Radiodan code

Yesterday I got a microswitch working – today I want to make it turn on the radio. Given that I can only start my podcasts in client-side mode at the moment, I’ll have to bodge it a bit so I can actually see it working (Andrew is going to help me understand the server side API better tomorrow).

So my plan is to

1. Switch back to the radiodan “magic button” radio
2. Make my switch replace the default on button
3. Make my switch turn the radio on when it’s open

and if I have time

4. attach a simple rotary encoder for volume
5. attach an RGB LED for the status lights

For the magic button radio, we have had some simple PCBs made to make the soldering easier for the two RGB LED / rotary encoder buttons, but the board doesn’t fit in my new box, and anyway, I want to see how difficult it is without the PCB.

It’s 3pm now and I have about an hour free.

Switch back to the radiodan “magic button” radio

This is easy. I’ve sshed in and do this:

pi@radiodan-libby ~ $ sudo radiodan-device-type

Current device type: radiodan-type-huffduffer
Device type required. Valid types radiodan-type-example, radiodan-type-huffduffer, radiodan-type-magic

so I do

pi@radiodan-libby ~ $ sudo radiodan-device-type radiodan-type-magic

and reboot

Once we’re back up, if I go to

Posted at 17:08

December 07

Libby Miller: Huffduffer Radiodan part 2 – a switch

I wanted to make a nice box for the

Posted at 17:10

December 05

Libby Miller: Huffduffer Radiodan

I’ve been wanting a physical radio that plays podcasts for a long time, and it’s something we’ve discussed quite a lot in the

Posted at 10:15

December 02

Sebastian Trueg: YouID Identity Claim


Posted at 08:20

November 28

Dublin Core Metadata Initiative: DCMI invites public comment on draft LRMI in RDF

2014-11-28, DCMI invites public comment on a draft RDF specification for LRMI version 1.1. The draft RDF specification is available online. The one-month public comment period runs from 1 December 2014 through 31 December 2014. The RDF specification is intended to embody the current Learning Resource Metadata Initiative version 1.1 term declarations.

Posted at 23:59

Dublin Core Metadata Initiative: DCMI joins as inaugural member of DLMA

2014-11-28, DCMI, along with IMS Global and the International Digital Publishing Forum (IDPF), announces the formation of the Digital Learning Metadata Alliance (DLMA). The DLMA will focus on coordinating the adoption and development of existing metadata standards in support of digital learning and education. For more information about DLMA, see the IMS Global press release.

Posted at 23:59

Semantic Web Company (Austria): Automatic Semantic Tagging for Drupal CMS launched

REEEP [1] and CTCN [2] have recently launched Climate Tagger, a new tool to automatically scan, label, sort and catalogue datasets and document collections. Climate Tagger now incorporates a Drupal Module for automatic annotation of Drupal content nodes. Climate Tagger addresses knowledge-driven organizations in the climate and development arenas, providing automated functionality to streamline, catalogue and link their Climate Compatible Development data and information resources.

Climate Tagger

Climate Tagger for Drupal is a simple, FREE and easy-to-use way to integrate the well-known Reegle Tagging API [3], originally developed in 2011 with the support of CDKN [4] (now part of the Climate Tagger suite as the Climate Tagger API), into any web site based on the Drupal Content Management System [5]. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus, developed by experts in multiple fields and continuously updated to remain current (explore the thesaurus online). The thesaurus is available in English, French, Spanish, German and Portuguese, and can connect content on different portals published in these different languages.

Climate Tagger for Drupal can be fine-tuned to individual (and existing) configuration of any Drupal 7 installation by:

  • determining which content types and fields will be automatically tagged
  • scheduling “batch jobs” for automatic updating (also for already existing contents; where the option is available to re-tag all content or only tag with new concepts found via a thesaurus expansion / update)
  • automatically limiting and managing volumes of tag results based on individually chosen scoring thresholds
  • blending with manual tagging
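A toy sketch of what such threshold-based, thesaurus-driven tagging amounts to (the concepts, labels, and scoring here are invented for illustration and are not the Climate Tagger API):

```python
# Sketch: score each thesaurus concept by how often its labels occur in
# the text, and keep only concepts above a configurable threshold,
# mirroring the module's scoring-threshold configuration.
# The thesaurus entries below are made up for the example.

THESAURUS = {
    "solar energy": ["solar", "photovoltaic"],
    "wind energy": ["wind", "turbine"],
}

def tag(text, threshold=1):
    words = text.lower().split()
    tags = {}
    for concept, labels in THESAURUS.items():
        score = sum(words.count(label) for label in labels)
        if score >= threshold:
            tags[concept] = score
    return tags

tags = tag("New photovoltaic and solar installations outpace wind capacity")
```

Raising the threshold trims weaker matches, which is the effect of the "individually chosen scoring thresholds" configuration described above.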

“Climate Tagger [6] brings together the semantic power of Semantic Web Company’s PoolParty Semantic Suite [7] with the domain expertise of REEEP and CTCN, resulting in an automatic annotation module for Drupal 7 with an accuracy never seen before” states Martin Kaltenböck, Managing Partner of Semantic Web Company [8], which acts as the technology provider behind the module.

“Climate Tagger is the result of a shared commitment to breaking down the ‘information silos’ that exist in the climate compatible development community, and to providing concrete solutions that can be implemented right now, anywhere,” said REEEP Director General Martin Hiller. “Together with CTCN and SWC we have laid the foundations for a system that can be continuously improved and expanded to bring new sectors, systems and organizations into the climate knowledge community.”

For the Open Data and Linked Open Data communities, a Climate Tagger plugin for CKAN [9] has also been published. It was developed by NREL [10] in cooperation with the CTCN, and harnesses the same taxonomy and expert-vetted thesaurus behind Climate Tagger, helping connect open data to climate compatible content through the simultaneous use of these tools.

REEEP Director General Martin Hiller and CTCN Director Jukka Uosukainen will be talking about Climate Tagger at the COP20 side event hosted by the Climate Knowledge Brokers Group in Lima [11], Peru, on Monday, December 1st at 4:45pm.

Further reading and downloads

About REEEP:

REEEP invests in clean energy markets in developing countries to lower CO2 emissions and build prosperity. Based on a strategic portfolio of high impact projects, REEEP works to generate energy access, improve lives and economic opportunities, build sustainable markets, and combat climate change.

REEEP understands market change from a practice, policy and financial perspective. We monitor, evaluate and learn from our portfolio to understand opportunities and barriers to success within markets. These insights then influence policy, increase public and private investment, and inform our portfolio strategy to build scale within and replication across markets. REEEP is committed to open access to knowledge to support entrepreneurship, innovation and policy improvements to empower market shifts across the developing world.

About the CTCN

The Climate Technology Centre & Network facilitates the transfer of climate technologies by providing technical assistance, improving access to technology knowledge, and fostering collaboration among climate technology stakeholders. The CTCN is the operational arm of the UNFCCC Technology Mechanism and is hosted by the United Nations Environment Programme (UNEP) in collaboration with the United Nations Industrial Development Organization (UNIDO) and 11 independent, regional organizations with expertise in climate technologies.

About Semantic Web Company

Semantic Web Company (SWC) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide in improving their information and data management. Its products have outstanding capabilities to extract meaning from structured and unstructured data by making use of linked data technologies.

Posted at 14:30

Semantic Web Company (Austria): Introducing the Linked Data Business Cube

With the increasing availability of semantic data on the World Wide Web and its reutilization for commercial purposes, questions arise about the economic value of interlinked data and the business models that can be built on top of it. The Linked Data Business Cube provides a systematic approach to conceptualizing business models for Linked Data assets. Similar to an OLAP cube, the Linked Data Business Cube provides an integrated view on stakeholders (x-axis), revenue models (y-axis) and Linked Data assets (z-axis), making it possible to systematically investigate the specificities of various Linked Data business models.

(Figure: the full Linked Data Business Cube)


Mapping Revenue Models to Linked Data Assets

By mapping revenue models to Linked Data assets we can modify the Linked Data Business Cube as illustrated in the figure below.

(Figure: the Linked Data Business Cube with revenue models mapped to assets)

The figure indicates that as the business value of a resource increases, so do the opportunities to derive direct revenues. Assets that are easily substitutable generate few incentives for direct revenues but can be used to trigger indirect revenues. This basically applies to instance data and metadata. On the other hand, assets that are unique and difficult to imitate or substitute, i.e. in terms of the competence and investments necessary to provide the service, carry the highest potential for direct revenues. This applies to assets like content, services and technology. Generally speaking, the higher the value proposition of an asset – in terms of added value – the higher the willingness to pay.

Ontologies seem to function as a “mediating layer” between “low-incentive assets” and “high-incentive assets”. This means that ontologies as a precondition for the provision and utilization of Linked Data can be capitalized in a variety of ways, depending on the business strategy of the Linked Data provider.

It is important to note that each revenue model has specific merits and flaws and requires certain preconditions to work properly. Additionally they often occur in combination as they are functionally complementary.

Mapping Revenue Models to Stakeholders

A Linked Data ecosystem is usually comprised of several stakeholders that engage in the value creation process. The cube can help us to elaborate the most reasonable business model for each stakeholder.

(Figure: the Linked Data Business Cube with revenue models mapped to stakeholders)

Summing up, Linked Data generates new business opportunities, but the commercialization of Linked Data is very context specific. Revenue models change in accordance with the various assets involved and the stakeholders who make use of them. Knowing these circumstances is crucial to establishing successful business models, but doing so requires a holistic and interconnected understanding of the value creation process and of the specific benefits and limitations Linked Data generates at each step of the value chain.

Read more: Asset Creation and Commercialization of Interlinked Data

Posted at 12:09

November 24

AKSW Group - University of Leipzig: Highlights of the 1st Meetup on Question Answering Systems – Leipzig, November 21st

On November 21st, the AKSW group hosted the 1st meetup on “Question Answering” (QA) systems. In this meeting, researchers from AKSW/University of Leipzig, CITEC/University of Bielefeld, Fraunhofer IAIS/University of Bonn, DERI/National University of Ireland and the University of Passau presented recent results of their work on QA systems. The following themes were discussed during the meeting:

  • Ontology-driven QA on the Semantic Web. Christina Unger presented Pythia system for ontology-based QA. Slides are available here.
  • Distributed Semantic Models for achieving scalability & consistency in QA. André Freitas presented TREO and EasyESA, which employ a vector-based approach to semantic approximation.
  • Template-based QA. Jens Lehmann presented TBSL for Template-based Question Answering over RDF Data.
  • Keyword-based QA. Saeedeh Shekarpour presented SINA approach for semantic interpretation of user queries for QA on interlinked data.
  • Hybrid QA over Linked Data. Ricardo Usbeck presented HAWK for hybrid question answering using Linked Data and full-text indexes.
  • Semantic Parsing with Combinatory Categorial Grammars (CCG). Sherzod Hakimov. Slides are available here.
  • QA on statistical Linked Data. Konrad Höffner presented LinkedSpending and RDF Data Cube vocabulary to apply QA on statistical Linked Data.
  • WDAqua (Web Data and Question Answering) project. Christoph Lange presented the WDAqua project which is part of the EU’s Marie Skłodowska-Curie Action Innovative Training Networks. WDAqua focuses on answering different aspects of the question, “how can we answer complex questions with web data?”
  • OKBQA (Open Knowledge Base & Question-Answering). Axel-C. Ngonga Ngomo presented OKBQA which aims to bring cutting edge experts in knowledge base construction and application in order to create an extensive architecture for QA systems which has no restriction on programming languages.
  • Open QA. Edgard Marx presented open source question answering framework that unifies QA approaches from several domain experts.

The meetup decided to convene biannually to fuse efforts. All agreed to investigate existing architectures for question answering systems, in order to offer a promising, collaborative architecture for future endeavours. Join us next time! For more information, contact Ricardo Usbeck.

Ali and Ricardo on behalf of the QA meetup

Posted at 09:09

November 21

Libby Miller: Product Space and Workshops

I ran a workshop this week for a different bit of the organisation. It’s a bit like holding a party. People expect to enjoy themselves (and this is an important part of the process). But workshops also have to have outcomes and goals and the rest of it. And there’s no booze to help things along.

I always come out of them feeling a bit deflated. Even if others found them enjoyable and useful, the general stress of organising and the responsibility of it all mean that I don’t, plus I have to co-opt colleagues into quite complicated and full-on roles as facilitators, so they can’t really enjoy the process either.

This time we were trying to think more creatively about future work. There are various things niggling me about it, and I want to think about how to improve things next time, while it’s still fresh in my mind.

One of the goals was – in the terms I’ve been thinking of – to explore the space in which products could exist, more thoroughly. Excellent articles by

Posted at 11:12

November 20

AKSW Group - University of Leipzig: Announcing GERBIL: General Entity Annotator Benchmark Framework

Dear all,

We are happy to announce GERBIL – a General Entity Annotator Benchmark Framework; a demo is available online! With GERBIL, we aim to establish a highly available, easily quotable and reliable focal point for Named Entity Recognition and Named Entity Disambiguation (Entity Linking) evaluations:

  • GERBIL provides persistent URLs for experimental settings. By these means, GERBIL also addresses the problem of archiving experimental results.
  • The results of GERBIL are published in a human-readable as well as a machine-readable format. By these means, we also tackle the problem of reproducibility.
  • GERBIL provides 11 different datasets and 9 different entity annotators. Please talk to us if you want to add yours.

To ensure that the GERBIL framework is useful to both end users and tool developers, its architecture and interface were designed with the following principles in mind:

  • Easy integration of annotators: We provide a web-based interface that allows annotators to be evaluated via their NIF-based REST interface. We provide a small NIF library for an easy implementation of the interface.
  • Easy integration of datasets: We also provide means to gather datasets for evaluation directly from data services such as DataHub.
  • Extensibility: GERBIL is provided as an open-source platform that can be extended by members of the community both to new tasks and different purposes.
  • Diagnostics: The interface of the tool was designed to provide developers with means to easily detect aspects in which their tool(s) need(s) to be improved.
  • Portability of results: We generate human- and machine-readable results to ensure maximum usefulness and portability of the results generated by our framework.
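For a concrete sense of the NIF-based interface mentioned above, here is a sketch that builds a minimal NIF context document in Turtle, the kind of payload exchanged with an annotator's REST endpoint. The prefix and properties follow the NIF core ontology; the document URI is invented, and real endpoints add further metadata:

```python
# Sketch: serialize a text as a minimal NIF 2.0 context resource in
# Turtle. The base URI "http://example.org/doc" is a placeholder;
# offsets follow NIF's character-index URI scheme.

def nif_context(text, base="http://example.org/doc"):
    end = len(text)
    return f"""@prefix nif: <http://persistence.uni-leipzig.de/nlp2rdf/ontologies/nif-core#> .

<{base}#char=0,{end}>
    a nif:Context ;
    nif:isString "{text}" ;
    nif:beginIndex "0" ;
    nif:endIndex "{end}" .
"""

doc = nif_context("Leipzig is a city in Germany.")
```

An annotator consuming this document would return the same context enriched with entity annotations anchored to character offsets, which is what makes the interface tool-independent.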

We are looking for your feedback!

Best regards,

Ricardo Usbeck for The GERBIL Team

Posted at 09:08

November 17

Jeen Broekstra: Insight Visualised

An InsightNG thought canvas (click to open)

The past few years I have been heavily involved in a new concept for e-learning and knowledge discovery, called InsightNG. Recently, we released the first public beta of our platform. It is free to sign up for while we are in beta, so by all means give it a try; we would love to hear what you think of it. Or, if you want a quick look at InsightNG to get an idea of what it’s about, visit the public canvas on The Semantic Web that I created.

So what is InsightNG? One way to say what we’re doing is that we are visualising insight. An InsightNG ‘Thought Canvas’ is an interactive map of its creator’s learning process for a particular complex problem or topic. Where a simple search presents you with results without context, we constantly relate everything to the wider scope of your thinking, associating things and building relations between websites, articles, pictures and videos (any sort of web resource), but also more abstract/’real world’ things such as concepts, tasks, events, people, etc. Insight is gained, not by getting individual search results, but by seeing all the little puzzle pieces in this broader context. When it ‘clicks’, when you go “Aha, I get it!”, you have gained insight.

The philosophy behind building these canvases is that knowledge is, at a fundamental level, a social thing, and a never-ending journey. We do not expect you to just sit there and dream a canvas up from scratch – instead we immediately engage in a dialog: while you add new elements and relations to your canvas, our platform constantly analyses what is being added and how the pieces relate to each other, and it uses this to discover suggestions from various external sources. We scour the Web in general, Wikipedia, dedicated sources like the Springer Open Access journal library, and more. Our suggestion discovery engine uses a combination of KR/Semantic Web technologies and text mining/NLP strategies to find results that are not just keyword matches but contextually relevant suggestions: we weigh and measure everything, trying to determine how well it fits in with the broader scope of your canvas, and then rate it and feed it back to you. In short: we’re trying to be smart about what we recommend.
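A toy sketch of the difference between plain keyword matching and the kind of context-weighted suggestion scoring described above (the scoring function is invented for illustration and is not InsightNG's actual engine):

```python
# Sketch: rate a candidate suggestion by its overlap with the whole
# canvas vocabulary rather than a single query term, so results that fit
# the broader context outrank narrow keyword hits.

def context_score(candidate_terms, canvas_terms):
    """Fraction of the canvas vocabulary that the candidate overlaps with."""
    canvas = set(canvas_terms)
    overlap = set(candidate_terms) & canvas
    return len(overlap) / len(canvas)

canvas = ["semantic", "web", "rdf", "linked", "data"]
good = context_score(["rdf", "linked", "data", "tutorial"], canvas)
weak = context_score(["data", "privacy"], canvas)
```

Both candidates match the keyword "data", but only the first fits the canvas broadly, so it scores higher, which is the behaviour a contextual recommender wants.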

Thought Canvases are a great way to explore any topic where you wish to learn more, increase your understanding or awareness, or even just record your thoughts – a great mental exercise for achieving clarity. You can share canvases with friends or coworkers to show your findings and get their feedback. The fact that our discovery engine continues to look for additional related content for you means that revisiting an older canvas can often be very rewarding, as we will have found new interesting stuff for you to look at.

InsightNG is a very broadly applicable visual learning/discovery tool, and we hope you’ll give it a try and tell us what you think of our first beta. There’s a brief interactive tutorial available in the tool itself that automatically starts when you first sign up, and of course we have some more in-depth information as well, such as a Best Practice guide to creating a Canvas, and a more theoretical explanation of our ICE (Inquire, Connect, Enlighten) Methodology.

Posted at 23:29

AKSW Group - University of Leipzig: @BioASQ challenge gaining momentum

BioASQ is a series of challenges aiming to bring us closer to the vision of machines that can answer the questions of biomedical professionals and researchers. The second BioASQ challenge started in February 2014. It comprised two different tasks: Large-scale biomedical semantic indexing (Task 2a) and biomedical semantic question answering (Task 2b).

In total, 216 users and 142 systems registered with the automated evaluation system of BioASQ in order to participate in the challenge; 28 teams (with 95 systems) finally submitted their suggested solutions and answers. The final results were presented at the BioASQ workshop at the Cross Language Evaluation Forum (CLEF), which took place between September 23 and 26 in Sheffield, U.K.

The Awards Went To The Following Teams

Task 2a (Large-scale biomedical semantic indexing):

  • Fudan University (China)
  • NCBI (USA)
  • Aristotle University of Thessaloniki (Greece) and (USA)

Task 2b (Biomedical semantic question answering):

  • Fudan University (China)
  • NCBI (USA)
  • University of Alberta (Canada)
  • Seoul National University (South Korea)
  • Toyota Technological Institute (Japan)
  • Aristotle University of Thessaloniki (Greece) and (USA)

Best Overall Contribution:

  • NCBI (USA)
The second BioASQ challenge continued the impressive achievements of the first one, pushing the research frontiers in biomedical indexing and question answering. The systems that participated in both tasks of the challenge achieved a notable increase in accuracy over the first year. Among the highlights is the fact that the best systems in Task 2a again outperformed the very strong baseline MTI system provided by NLM. This is despite the fact that the MTI system itself has been improved by incorporating ideas proposed by last year’s winning systems. The end of the second challenge also marks the end of the European Commission’s financial support for BioASQ. We would like to take this opportunity to thank the EC for supporting our vision. The main project results (incl. frameworks, datasets and publications) can be found on the project showcase page.

Nevertheless, the BioASQ challenge will continue with its third round, BioASQ3, which will start in February 2015. Stay tuned!

About BioASQ

The BioASQ team combines researchers with complementary expertise from 6 organisations in 3 countries: the Greek National Center for Scientific Research “Demokritos” (coordinator), participating with its Institutes of ‘Informatics & Telecommunications’ and ‘Biosciences & Applications’; the German IT company Transinsight GmbH; the French University Joseph Fourier; the German research group for Agile Knowledge Engineering and Semantic Web at the University of Leipzig; the French University Pierre et Marie Curie‐Paris 6; and the Department of Informatics of the Athens University of Economics and Business in Greece (visit the BioASQ project partners page). Moreover, biomedical experts from several countries assist in the creation of the evaluation data, and a number of key players in industry and academia from around the world participate in the project's advisory board.
BioASQ started in October 2012 and was funded for two years by the European Commission as a support action (FP7/2007-2013: Intelligent Information Management, Targeted Competition Framework; grant agreement n° 318652). More information can be found at:
Project Coordinator: George Paliouras (

Posted at 14:29

November 16

Libby Miller: Bitmap to SVG

Creating an SVG from a bitmap is pretty easy in Inkscape. Copy and paste your bitmap into a new Inkscape file, select it, do

Path -> Trace Bitmap
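
For a scriptable version of the same trace, the standalone potrace tool can do it from the command line (an assumption: potrace is installed and the bitmap is in a format it reads, such as BMP or PNM, so convert PNGs first). A minimal Python wrapper might look like:

```python
# Scripted equivalent of Inkscape's Path > Trace Bitmap, delegating to the
# standalone potrace tool (assumed installed and on PATH).
import subprocess

def potrace_cmd(src: str, dest: str) -> list[str]:
    # potrace --svg selects the SVG backend; -o names the output file.
    return ["potrace", src, "--svg", "-o", dest]

def trace_bitmap(src: str, dest: str) -> None:
    subprocess.run(potrace_cmd(src, dest), check=True)

# Example (requires potrace on PATH):
# trace_bitmap("logo.bmp", "logo.svg")
```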

Posted at 15:13

November 13

Orri Erling: LDBC: Making Semantic Publishing Execution Rules

LDBC SPB (Semantic Publishing Benchmark) is based on the BBC Linked Data use case. Thus the data modeling and transaction mix reflect the BBC's actual utilization of RDF. But a benchmark is not only a condensation of current best practice. The BBC Linked Data is deployed on Ontotext GraphDB (formerly known as OWLIM).

So, in SPB we wanted to address substantially more complex queries than the lookups that the BBC linked data deployment primarily serves. Diverse dataset summaries, timelines, and faceted search qualified by keywords and/or geography are examples of the online user experience that SPB needs to cover.

SPB is not an analytical workload, per se, but we still find that the queries fall broadly in two categories:

  • Some queries are centered on a particular search or entity. The data touched by the query does not grow at the same rate as the dataset.
  • Some queries cover whole cross sections of the dataset, e.g., find the most popular tags across the whole database.

These different classes of queries need to be separated in the metric; otherwise the short lookups dominate at small scales and the large queries at large scales.
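
To see why, consider a toy model with purely hypothetical numbers (lookup cost held flat, analytic cost growing roughly linearly with scale). A single aggregate queries-per-second figure swings with the analytic cost at large scale, even though lookup performance is unchanged:

```python
# Toy model: why one aggregate q/s metric hides one workload class.
# All numbers are hypothetical, chosen only to show the shape of the effect.
def aggregate_qps(scale: int) -> float:
    lookups, lookup_ms = 4_000, 100            # many fast lookups, flat cost
    analytics, analytic_ms = 13, 2_000 * scale  # few analytics, ~linear cost
    total_s = (lookups * lookup_ms + analytics * analytic_ms) / 1_000
    return (lookups + analytics) / total_s

print(round(aggregate_qps(1), 2))   # mostly reflects the cheap lookups
print(round(aggregate_qps(32), 2))  # dragged down by the few big analytics
```

The query counts barely change between the two scales, yet the aggregate rate drops by 3x purely because of the analytics, which is exactly the distortion a separated metric avoids.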

Another guiding factor of SPB was the BBC's and others' express wish to cover operational aspects such as online backups, replication, and fail-over in a benchmark. True, most online installations have to deal with these, yet these things are as good as absent from present benchmark practice. We will look at these aspects in a different article; for now, I will just discuss the matter of workload mix and metric.

Normally, the lookup and analytics workloads are divided into different benchmarks. Here, we will try something different. There are three things the benchmark does:

  • Updates - These sometimes insert a graph, sometimes delete and re-insert the same graph, sometimes just delete a graph. These are logarithmic to data size.

  • Short queries - These are lookups that most often touch on recent data and can drive page impressions. These are roughly logarithmic to data scale.

  • Analytics - These cover a large fraction of the dataset and are roughly linear to data size.

A test sponsor can decide on the query mix within certain bounds. A qualifying run must sustain a minimum, scale-dependent update throughput and must execute a scale-dependent number of analytical query mixes, or run for a scale-dependent duration. The minimum update rate, the minimum number of analytics mixes and the minimum duration all grow logarithmically to data size.
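
As a sketch of what "grows logarithmically to data size" means here (a hypothetical formula, not the actual SPB run rules; only the 7 updates/s unit-scale minimum mentioned below is taken from the text):

```python
# Hypothetical sketch of a logarithmically scale-dependent minimum.
# The real formulas are defined by the SPB run rules.
import math

def min_update_rate(scale_factor: int, base: float = 7.0) -> float:
    # base rate at unit scale, plus a log2 term as the data grows
    return base * (1 + math.log2(scale_factor))

for sf in (1, 4, 32):
    print(sf, min_update_rate(sf))  # 7.0 at SF 1, growing slowly with SF
```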

Within these limits, the test sponsor can decide how to mix the workloads. Publishing several results emphasizing different aspects is also possible. A given system may be especially good at one aspect, leading the test sponsor to accentuate this.

The benchmark has been developed and tested at small scales, between 50 and 150M triples. Next we need to see how it actually scales; there we expect to see how the two query sets behave differently. One effect that we see right away when loading data is that creating the full-text index on the literals is in fact the longest-running part. For an SF 32 (1.6 billion triples) SPB database we have the following space consumption figures:

  • 46,886 MB of RDF literal text
  • 23,924 MB of full text index for RDF literals
  • 23,598 MB of URI strings
  • 21,981 MB of quads, stored column-wise with default index scheme

Clearly, applying column-wise compression to the strings is the best move for increasing scalability. The literals are individually short, so compressing literal by literal will do little or nothing, but applying compression by the column is known to give a 2x size reduction with Google Snappy.
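
The effect is easy to reproduce. The sketch below uses Python's stdlib zlib as a stand-in for Snappy, on made-up literal strings: compressed one by one, short strings gain little (the per-call overhead can even exceed the savings), while the same strings compressed as one column shrink dramatically:

```python
# Per-literal vs. column-wise compression of short strings.
# zlib stands in for Snappy; the literals are synthetic examples.
import zlib

literals = [f"Article headline number {i} about current events"
            for i in range(10_000)]

raw = sum(len(s) for s in literals)
per_literal = sum(len(zlib.compress(s.encode())) for s in literals)
column = len(zlib.compress("\n".join(literals).encode()))

print(raw, per_literal, column)  # column-wise comes out far smaller
```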

The full text index does not get much from column store techniques, as it already consists of words followed by space efficient lists of word positions. The above numbers are measured with Virtuoso column store, with quads column-wise and the rest row-wise. Each number includes the table(s) and any extra indices associated to them.

Let's now look at a full run at unit scale, i.e., 50M triples.

The run rules stipulate a minimum of 7 updates per second. The updates are comparatively fast, so we set the update rate to 70 updates per second. This is seen not to take too much CPU. We run 2 threads of updates, 20 of short queries, and 2 of long queries. The minimum run time for the unit scale is 10 minutes, so we do 10 analytical mixes, as this is expected to take a little over 10 minutes. The run stops by itself when the last of the analytical mixes finishes.

The interactive driver reports:

Seconds run : 2,144
        2 agents

        68,164 inserts (avg :   46  ms, min :    5  ms, max :   3002  ms)
         8,440 updates (avg :   72  ms, min :   15  ms, max :   2471  ms)
         8,539 deletes (avg :   37  ms, min :    4  ms, max :   2531  ms)

        85,143 operations (68,164 CW Inserts   (98 errors), 
                            8,440 CW Updates   ( 0 errors), 
                            8,539 CW Deletions ( 0 errors))
        39.7122 average operations per second

        20 agents

        4120  Q1   queries (avg :    789  ms, min :   197  ms, max :   6,767   ms, 0 errors)
        4121  Q2   queries (avg :     85  ms, min :    26  ms, max :   3,058   ms, 0 errors)
        4124  Q3   queries (avg :     67  ms, min :     5  ms, max :   3,031   ms, 0 errors)
        4118  Q5   queries (avg :    354  ms, min :     3  ms, max :   8,172   ms, 0 errors)
        4117  Q8   queries (avg :    975  ms, min :    25  ms, max :   7,368   ms, 0 errors)
        4119  Q11  queries (avg :    221  ms, min :    75  ms, max :   3,129   ms, 0 errors)
        4122  Q12  queries (avg :    131  ms, min :    45  ms, max :   1,130   ms, 0 errors)
        4115  Q17  queries (avg :  5,321  ms, min :    35  ms, max :  13,144   ms, 0 errors)
        4119  Q18  queries (avg :    987  ms, min :   138  ms, max :   6,738   ms, 0 errors)
        4121  Q24  queries (avg :    917  ms, min :    33  ms, max :   3,653   ms, 0 errors)
        4122  Q25  queries (avg :    451  ms, min :    70  ms, max :   3,695   ms, 0 errors)

        22.5239 average queries per second. 
        Pool 0, queries [ Q1 Q2 Q3 Q5 Q8 Q11 Q12 Q17 Q18 Q24 Q25 ]

        45,318 total retrieval queries (0 timed-out)
        22.5239 average queries per second

The analytical driver reports:

        2 agents

        14    Q4   queries (avg :   9,984  ms, min :   4,832  ms, max :   17,957  ms, 0 errors)
        12    Q6   queries (avg :   4,173  ms, min :      46  ms, max :    7,843  ms, 0 errors)
        13    Q7   queries (avg :   1,855  ms, min :   1,295  ms, max :    2,415  ms, 0 errors)
        13    Q9   queries (avg :     561  ms, min :     446  ms, max :      662  ms, 0 errors)
        14    Q10  queries (avg :   2,641  ms, min :   1,652  ms, max :    4,238  ms, 0 errors)
        12    Q13  queries (avg :     595  ms, min :     373  ms, max :    1,167  ms, 0 errors)
        12    Q14  queries (avg :  65,362  ms, min :   6,127  ms, max :  136,346  ms, 2 errors)
        13    Q15  queries (avg :  45,737  ms, min :  12,698  ms, max :   59,935  ms, 0 errors)
        13    Q16  queries (avg :  30,939  ms, min :  10,224  ms, max :   38,161  ms, 0 errors)
        13    Q19  queries (avg :     310  ms, min :      26  ms, max :    1,733  ms, 0 errors)
        12    Q20  queries (avg :  13,821  ms, min :  11,092  ms, max :   15,435  ms, 0 errors)
        13    Q21  queries (avg :  36,611  ms, min :  14,164  ms, max :   70,954  ms, 0 errors)
        13    Q22  queries (avg :  42,048  ms, min :   7,106  ms, max :   74,296  ms, 0 errors)
        13    Q23  queries (avg :  48,474  ms, min :  18,574  ms, max :   93,656  ms, 0 errors)
        0.0862 average queries per second. 
        Pool 0, queries [ Q4 Q6 Q7 Q9 Q10 Q13 Q14 Q15 Q16 Q19 Q20 Q21 Q22 Q23 ]

        180 total retrieval queries (2 timed-out)
        0.0862 average queries per second

The metric would be 22.52 qi/s, 310 qa/h, 39.7 u/s @ 50Mt (SF 1), i.e., interactive queries per second, analytical queries per hour, and updates per second.
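
Two of the components follow directly from the driver reports above:

```python
# Recomputing metric components from the driver output quoted above.
updates = 68_164 + 8_440 + 8_539    # CW inserts + updates + deletions = 85,143
seconds = 2_144                     # interactive driver's "Seconds run"
print(round(updates / seconds, 4))  # 39.7122 u/s, matching the report

qa_per_s = 0.0862                   # analytical driver's average q/s
print(round(qa_per_s * 3_600))      # ~310 qa/h
```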

The SUT is dual Xeon E5-2630, all in memory. The platform utilization is steadily above 2000% CPU (over 20/24 hardware threads busy on the DBMS). The DBMS is Virtuoso Open Source (v7fasttrack at, feature/analytics branch).

The minimum update rate of 7/s was sustained, but fell short of the target of 70/s. In this run, most demand was put on the interactive queries. Different thread allocations would give different ratios of the metric components. The analytics mix, for example, is about 3x faster without other concurrent activity.

Is this good or bad? I would say that this is passable, but better can certainly be accomplished.

The initial observation is that Q17 is the worst of the interactive lot; a 3x improvement is easily accomplished by avoiding a basic stupidity. The query does the evil deed of checking for a substring in a URI. This is done in the wrong place and accounts for most of the time. The query is meant to test geo retrieval but ends up doing something quite different. Optimizing it right would by itself almost double the interactive score. There are some timeouts in the analytical run, which as such disqualifies the run: this is not a fully compliant result, but it is close enough to give an idea of the dynamics. So we see that the experiment is definitely feasible and reasonably defined, and that the dynamics seen make sense.

As an initial comment on the workload mix, I'd say that interactive should have a few more very short point-lookups, to stress compilation times and give a higher absolute score of queries per second.

Adjustments to the mix will depend on what we find out about scaling. As with SNB, it is likely that the workload will shift a little so this result might not be comparable with future ones.

In the next SPB article, we will look closer at performance dynamics and choke points and will have an initial impression on scaling the workload.

Posted at 21:19

Copyright of the postings is owned by the original blog authors. Contact us.