Planet RDF

It's triples all the way down

May 03

AKSW Group - University of Leipzig: AKSW Colloquium, 09.05.2016: Hebrew MMoOn inventory, federated SPARQL query processing

In this week’s colloquium, Bettina Klimek will give a practice talk of the paper ‘Creating Linked Data Morphological Language Resources with MMoOn – The Hebrew Morpheme Inventory’, which she will present at the LREC 2016 conference, 23-28 May 2016, in Portorož, Slovenia.

Abstract

The development of standard models for describing general lexical resources has led to the emergence of numerous lexical datasets of various languages in the Semantic Web. However, there are no models that describe the domain of morphology in a similar manner. As a result, there are hardly any language resources of morphemic data available in RDF to date. This paper presents the creation of the Hebrew Morpheme Inventory from a manually compiled tabular dataset comprising around 52,000 entries. It is an ongoing effort of representing the lexemes, word-forms and morphological patterns together with their underlying relations based on the newly created Multilingual Morpheme Ontology (MMoOn). It will be shown how segmented Hebrew language data can be granularly described in a Linked Data format, thus serving as an exemplary case for creating morpheme inventories of any inflectional language with MMoOn. The resulting dataset is described a) according to the structure of the underlying data format, b) with respect to the Hebrew language characteristic of building word-forms directly from roots, c) by exemplifying how inflectional information is realized and d) with regard to its enrichment with external links to sense resources.
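
The paper describes the dataset rather than its serialization, but to give a rough idea of what such a morpheme inventory looks like as RDF, here is a minimal rdflib sketch; the namespaces, classes and properties below are illustrative placeholders, not the actual MMoOn vocabulary.

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF

    # Hypothetical namespaces -- the real MMoOn vocabulary defines its own IRIs.
    MMOON = Namespace("http://example.org/mmoon/")
    HEB = Namespace("http://example.org/mmoon/heb/inventory/")

    g = Graph()
    g.bind("mmoon", MMOON)

    # A root, a lexeme derived from it, and one of the lexeme's word-forms.
    g.add((HEB.root_ktb, RDF.type, MMOON.Root))
    g.add((HEB.lexeme_katav, RDF.type, MMOON.Lexeme))
    g.add((HEB.lexeme_katav, MMOON.hasRoot, HEB.root_ktb))
    g.add((HEB.wordform_katavti, RDF.type, MMOON.Wordform))
    g.add((HEB.wordform_katavti, MMOON.belongsToLexeme, HEB.lexeme_katav))
    g.add((HEB.wordform_katavti, MMOON.grammaticalMeaning, Literal("1st person singular past")))

    print(g.serialize(format="turtle"))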

 

As a second talk, Muhammad Saleem will present his thesis titled “Efficient Source Selection For SPARQL Endpoint Federation”. This thesis addresses two key areas of federated SPARQL query processing: (1) efficient source selection, and (2) comprehensive SPARQL benchmarks to test and rank federated SPARQL engines as well as triple stores.

Posted at 11:57

Ebiquity research group UMBC: chmod 000 Freebase

rip freebase

He’s dead, Jim.

Google recently shut down the query interface to Freebase. All that is left of this innovative service is the ability to download a few final data dumps.

Freebase was launched nine years ago by Metaweb as an online source of structured data collected from Wikipedia and many other sources, including individual, user-submitted uploads and edits. Metaweb was acquired by Google in July 2010 and Freebase subsequently grew to have more than 2.4 billion facts about 44 million subjects. In December 2014, Google announced that it was closing Freebase and four months later it became read-only. Sometime this week the query interface was shut down.

I’ve enjoyed using Freebase in various projects in the past two years and found that it complemented DBpedia in many ways. Although its native semantics differed from that of RDF and OWL, it was close enough to allow all of Freebase to be exported as RDF.  Its schema was larger than DBpedia’s and the data tended to be a bit cleaner.

Google generously decided to donate the data to the Wikidata project, which began migrating Freebase’s data to Wikidata in 2015. The Freebase data also lives on as part of Google’s Knowledge Graph. Google recently allowed very limited querying of its Knowledge Graph, and my limited experimenting with it suggests that it has Freebase data at its core.

Posted at 01:22

May 02

Dublin Core Metadata Initiative: DC-2016 Paper Submission Deadline Extended

2016-05-02, Upon request, DCMI is extending the submission deadline for papers, project reports, and posters for the DC-2016 Technical Program from 13 May to 27 May 2016. The Call for Participation can be found on the conference website at http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/cfp.

Posted at 23:59

Dublin Core Metadata Initiative: Elsevier's Bradley Allen to deliver DC-2016 keynote

2016-05-02, Bradley P. Allen, Chief Architect at Elsevier, will deliver the keynote address at DC-2016 in Copenhagen, Denmark. Brad has been a serial entrepreneur who has built and led teams in the design, development, launch and operation of innovative Web businesses. He works at the nexus of information retrieval, linked data and machine learning technologies, and is two for three on successful startup exits. Brad leads the Architecture group within Elsevier's technology organization, focusing on aligning technology vision and roadmap with corporate strategy, helping core development teams build and evolve products and infrastructure, and guiding Elsevier Labs' collaborative research into the future of scientific and medical publishing. For more information about the conference, visit http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/ and subscribe to receive notifications. Conference registration will open June 1.

Posted at 23:59

Dublin Core Metadata Initiative: DCMI Webinar: Modeling and Publishing of Controlled Vocabularies for the UNESKOS Project [Presented in Spanish]

2016-05-02, DCMI/ASIS&T Webinar: This webinar with Juan Antonio Pastor Sánchez, University of Murcia, presents the modeling and publishing process of the vocabularies for the UNESKOS project by applying Semantic Web technologies. More specifically, the vocabularies represented are the UNESCO Thesaurus and the Nomenclature for fields of Science and Technology. Both vocabularies are published as RDF datasets with a structure that allows their query and reuse according to the principles of Linked Open Data. The webinar will demonstrate the application of the ISO-25964 standard to represent the UNESCO Thesaurus using SKOS and the ISO-THES ontology. Technological solutions used for the project will also be discussed. For more information about the webinar and to register, visit http://dublincore.org/resources/training/#2016sanchez.

The webinar presents the modeling and publishing processes of the UNESKOS project vocabularies, applying Semantic Web technologies. More specifically, the vocabularies represented are the UNESCO Thesaurus and the Nomenclature for Science and Technology. Both vocabularies are published as RDF datasets with a structure that facilitates their query and reuse according to Linked Open Data principles. It will also be shown how the ISO-25964 standard has been applied to represent the UNESCO Thesaurus using SKOS together with the ISO-THES ontology. The technological solutions used for publishing and querying both vocabularies will be discussed as well. Registration: http://dublincore.org/resources/training/#2016sanchez.

Posted at 23:59

May 01

Ebiquity research group UMBC: Representing and Reasoning with Temporal Properties/Relations in OWL/RDF

Representing and Reasoning with Temporal Properties/Relations in OWL/RDF

Clare Grasso

10:30-11:30 Monday, 2 May 2016, ITE346

OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.
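
The abstract does not commit to a particular modeling pattern, but the problem it describes, a binary property such as worksFor that only holds during a temporal interval, is commonly handled by reifying the relationship into an intermediate node that carries the start and end times. A minimal, hypothetical rdflib sketch of that n-ary pattern (invented example.org terms, not a vocabulary from the talk):

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/")
    g = Graph()

    # Instead of a plain triple (alice worksFor acme), introduce an
    # "employment" node so a temporal interval can be attached to it.
    g.add((EX.emp1, RDF.type, EX.Employment))
    g.add((EX.emp1, EX.employee, EX.alice))
    g.add((EX.emp1, EX.employer, EX.acme))
    g.add((EX.emp1, EX.startDate, Literal("2014-01-01", datatype=XSD.date)))
    g.add((EX.emp1, EX.endDate, Literal("2016-05-01", datatype=XSD.date)))

    # Where did alice work on 2015-06-01? ISO dates compare lexicographically,
    # so plain string comparison is enough for this toy query.
    q = """
    PREFIX ex: <http://example.org/>
    SELECT ?employer WHERE {
        ?e a ex:Employment ;
           ex:employee ex:alice ;
           ex:employer ?employer ;
           ex:startDate ?start ;
           ex:endDate ?end .
        FILTER (STR(?start) <= "2015-06-01" && STR(?end) >= "2015-06-01")
    }
    """
    for row in g.query(q):
        print(row.employer)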

Posted at 21:13

April 30

Leigh Dodds: 101100

Today I am 101100.

That’s

Posted at 08:10

April 25

Leigh Dodds: Data marketplaces, we hardly knew ye

I’m on a panel at the ODI lunchtime lecture this week, where I’m hoping to help answer the question of “

Posted at 16:21

April 23

Bob DuCharme: Playing with a proximity beacon

Nine-dollar devices send URLs to your phone over Bluetooth.

Posted at 13:30

April 22

AKSW Group - University of Leipzig: AKSW Colloquium, 25.04.2016, DISPONTE, Workbench for Big Data Dev

In this colloquium, Frank Nietzsche will present his master thesis titled “Game Theory- distributed solving”

Game theory analyzes the behavior of individuals in complex situations. One popular game in Europe and North America with such a complex situation is Skat. For the analysis of the game, the counterfactual regret minimization algorithm (CFR algorithm) was applied. Unfortunately, there is no guarantee that the algorithm works in three-person games; in general, it is difficult to solve three-person games. In addition, the algorithm calculates only an epsilon-Nash equilibrium, whereas for Skat a Perfect Bayesian equilibrium would be a better solution. In fact, the Perfect Bayesian equilibria form a subset of the Nash equilibria. This raises the question of whether a Perfect Bayesian equilibrium can be calculated using the CFR algorithm. The analysis of this problem will be the last part of the presentation.
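
The core update behind CFR is regret matching. The toy sketch below (my own illustration, not code from the thesis, and nowhere near Skat-scale) shows regret matching in self-play on rock-paper-scissors; the averaged strategy should approach the mixed Nash equilibrium of one third per action, and full CFR applies the same kind of update at every information set of an extensive-form game.

    import random

    # Regret matching in self-play on rock-paper-scissors.
    ACTIONS = 3  # rock, paper, scissors
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's utility

    def strategy_from_regret(regret):
        positive = [max(r, 0.0) for r in regret]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

    def train(iterations=100000):
        regret = [0.0] * ACTIONS
        strategy_sum = [0.0] * ACTIONS
        for _ in range(iterations):
            strategy = strategy_from_regret(regret)
            strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
            my_action = random.choices(range(ACTIONS), weights=strategy)[0]
            opp_action = random.choices(range(ACTIONS), weights=strategy)[0]  # self-play
            # Accumulate regret: how much better each alternative would have done.
            for a in range(ACTIONS):
                regret[a] += PAYOFF[a][opp_action] - PAYOFF[my_action][opp_action]
        total = sum(strategy_sum)
        return [s / total for s in strategy_sum]

    print(train())  # tends towards [1/3, 1/3, 1/3]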

In the second talk of the colloquium, Dr. Michael Martin will announce the student theses on the AKSW website.

 

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 14:05

April 19

Ebiquity research group UMBC: Context-Sensitive Policy Based Security in Internet of Things

Prajit Kumar Das, Sandeep Nair, Nitin Kumar Sharma, Anupam Joshi, Karuna Pande Joshi, and Tim Finin, Context-Sensitive Policy Based Security in Internet of Things, 1st IEEE Workshop on Smart Service Systems, co-located with IEEE Int. Conf. on Smart Computing, St. Louis, 18 May 2016.

According to recent media reports, there has been a surge in the number of devices that are being connected to the Internet. The Internet of Things (IoT), also referred to as Cyber-Physical Systems, is a collection of physical entities with computational and communication capabilities. The storage and computing power of these devices is often limited and their designs currently focus on ensuring functionality and largely ignore other requirements, including security and privacy concerns. We present the design of a framework that allows IoT devices to capture, represent, reason with, and enforce information sharing policies. We use Semantic Web technologies to represent the policies, the information to be shared or protected, and the IoT device context. We discuss use-cases where our design will help in creating an “intelligent” IoT device and ensuring data security and privacy using context-sensitive information sharing policies.
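
The paper describes the framework at the design level. Purely as an illustration of the idea of a context-sensitive sharing policy expressed in RDF and enforced with a SPARQL query, here is a hedged sketch; the classes and properties are invented for the example and are not the ontology used in the paper.

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/iot#")  # invented vocabulary for illustration
    g = Graph()

    # Policy: location data may be shared only while the device is on a trusted network.
    g.add((EX.locationPolicy, RDF.type, EX.SharingPolicy))
    g.add((EX.locationPolicy, EX.protects, EX.LocationData))
    g.add((EX.locationPolicy, EX.requiresContext, EX.TrustedNetwork))

    # Current device context.
    g.add((EX.myPhone, EX.connectedTo, EX.homeWifi))
    g.add((EX.homeWifi, RDF.type, EX.TrustedNetwork))

    # The enforcement point asks whether the required context is satisfied.
    result = g.query("""
        PREFIX ex: <http://example.org/iot#>
        ASK {
            ?policy ex:protects ex:LocationData ;
                    ex:requiresContext ?ctxClass .
            ex:myPhone ex:connectedTo ?ctx .
            ?ctx a ?ctxClass .
        }
    """)
    print("share location:", result.askAnswer)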

Posted at 04:13

April 18

AKSW Group - University of Leipzig: AKSW Colloquium, 18.04.2016, DISPONTE, Workbench for Big Data Dev

In this week’s Colloquium, today 18th of April at 3 PM, Patrick Westphal will present the paper ‘Probabilistic Description Logics under the Distribution Semantics’ by Riguzzi et al.

Abstract

Representing uncertain information is crucial for modeling real world domains. In this paper we present a technique for the integration of probabilistic information in Description Logics (DLs) that is based on the distribution semantics for probabilistic logic programs. In the resulting approach, that we called DISPONTE, the axioms of a probabilistic knowledge base (KB) can be annotated with a real number between 0 and 1. A probabilistic knowledge base then defines a probability distribution over regular KBs called worlds and the probability of a given query can be obtained from the joint distribution of the worlds and the query by marginalization. We present the algorithm BUNDLE for computing the probability of queries from DISPONTE KBs. The algorithm exploits an underlying DL reasoner, such as Pellet, that is able to return explanations for queries. The explanations are encoded in a Binary Decision Diagram from which the probability of the query is computed. The experimentation of BUNDLE shows that it can handle probabilistic KBs of realistic size.
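
For intuition about the distribution semantics described above, here is a toy, hand-rolled illustration (not the BUNDLE algorithm, and with plain propositional facts standing in for DL axioms): every probabilistic axiom is independently included or excluded, each combination is a world, and the probability of a query is the sum of the probabilities of the worlds that entail it.

    from itertools import product

    # Probabilistic "axioms" as (name, probability). In DISPONTE these would be
    # annotated DL axioms; here they are just two facts in a toy KB.
    axioms = [("bird(tweety)", 0.9), ("birds_fly", 0.8)]

    def entails_query(world):
        # The query flies(tweety) holds in any world containing both axioms.
        return "bird(tweety)" in world and "birds_fly" in world

    prob = 0.0
    for choices in product([True, False], repeat=len(axioms)):
        world = {name for (name, _), keep in zip(axioms, choices) if keep}
        world_prob = 1.0
        for (_, p), keep in zip(axioms, choices):
            world_prob *= p if keep else (1 - p)
        if entails_query(world):
            prob += world_prob

    print(prob)  # 0.9 * 0.8 = 0.72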

The second talk of the colloquium will be on the Spark/HDFS Big Data Workbench, which enables developers to easily set up an HDFS/Spark cluster and run Spark jobs over it (presented by Ivan Ermilov).

Posted at 08:17

April 11

AKSW Group - University of Leipzig: AKSW Colloquium, 11.04.2016, METEOR with DBnary

In this week’s Colloquium, today 11th of April at 3 PM, Diego Moussallem will present the paper by Zied Elloumi et al. titled “METEOR for Multiple Target Languages using DBnary” [PDF].

Abstract

This paper proposes an extension of METEOR, a well-known MT evaluation metric, for multiple target languages using an in-house lexical resource called DBnary (an extraction from Wiktionary provided to the community as a Multilingual Lexical Linked Open Data). Today, the use of the synonymy module of METEOR is only exploited when English is the target language (use of WordNet). A synonymy module using DBnary would allow its use for the 21 languages (covered up to now) as target languages. The code of this new instance of METEOR, adapted to several target languages, is provided to the community. We also show that our DBnary augmented METEOR increases the correlation with human judgements on the WMT 2013 and 2014 metrics dataset for English-to-(French, Russian, German, Spanish) language pairs.

Posted at 07:07

April 04

AKSW Group - University of Leipzig: AKSW Colloquium, 04.04.2016, AMIE + Structured Feedback

In this week’s Colloquium, today 4th of April at 3 PM, Lorenz Bühmann will present the paper by Galárraga et al. titled “AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases” [PDF].

Abstract

Recent advances in information extraction have led to huge knowledge bases (KBs), which capture knowledge in a machine-readable format. Inductive Logic Programming (ILP) can be used to mine logical rules from the KB. These rules can help deduce and add missing knowledge to the KB. While ILP is a mature field, mining logical rules from KBs is different in two aspects: First, current rule mining systems are easily overwhelmed by the amount of data (state-of-the-art systems cannot even run on today’s KBs). Second, ILP usually requires counterexamples. KBs, however, implement the open world assumption (OWA), meaning that absent data cannot be used as counterexamples. In this paper, we develop a rule mining model that is explicitly tailored to support the OWA scenario. It is inspired by association rule mining and introduces a novel measure for confidence. Our extensive experiments show that our approach outperforms state-of-the-art approaches in terms of precision and coverage. Furthermore, our system, AMIE, mines rules orders of magnitude faster than state-of-the-art approaches.
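
To see why the open world assumption forces a different confidence measure, consider the toy calculation below. It contrasts standard confidence, which treats every unmatched prediction as a counterexample, with a partial-completeness style confidence in the spirit of AMIE's measure (my own simplified reading of the idea, not the paper's exact definition), which only counts predictions about subjects for which something is known.

    # Rule under test: livesIn(x, y) <= worksIn(x, y)
    works_in = {("alice", "berlin"), ("bob", "paris"), ("carol", "rome")}
    lives_in = {("alice", "berlin"), ("bob", "lyon")}   # carol's residence is unknown

    predictions = works_in                               # all body instantiations
    support = len(predictions & lives_in)                # correct predictions

    # Standard confidence: unknown facts silently count as counterexamples.
    std_conf = support / len(predictions)

    # PCA-style confidence: only count predictions for subjects whose residence
    # is known at all, so carol's unknown residence is not held against the rule.
    known_subjects = {x for (x, _) in lives_in}
    pca_denominator = sum(1 for (x, _) in predictions if x in known_subjects)
    pca_conf = support / pca_denominator

    print(std_conf)  # 0.33...
    print(pca_conf)  # 0.5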

Subsequently, Natanael Arndt will practice the presentation of his paper “Structured Feedback: A Distributed Protocol for Feedback and Patches on the Web of Data” (Natanael Arndt, Kurt Junghanns, Roy Meissner, Philipp Frischmuth, Norman Radtke, Marvin Frommhold and Michael Martin) [PDF], which has been accepted for presentation at the WWW2016 workshop Linked Data on the Web (LDOW2016) in Montréal.

Abstract

The World Wide Web is an infrastructure to publish and retrieve information through web resources. It evolved from a static Web 1.0 to a multimodal and interactive communication and information space which is used to collaboratively contribute and discuss web resources, which is better known as Web 2.0. The evolution into a Semantic Web (Web 3.0) proceeds. One of its remarkable advantages is the decentralized and interlinked data composition. Hence, in contrast to its data distribution, workflows and technologies for decentralized collaborative contribution are missing. In this paper we propose the Structured Feedback protocol as an interactive addition to the Web of Data. It offers support for users to contribute to the evolution of web resources, by providing structured data artifacts as patches for web resources, as well as simple plain text comments. Based on this approach it enables crowd-supported quality assessment and web data cleansing processes in an ad-hoc fashion most web users are familiar with.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 11:51

April 03

Ebiquity research group UMBC: Policies For Oblivious Cloud Storage Using Semantic Web Technologies

Policies For Oblivious Cloud Storage Using Semantic Web Technologies

Vaishali Narkhede
10:30am, Monday, 4 April 2016, ITE 346, UMBC

Consumers want to ensure that their enterprise data is stored securely and obliviously on the cloud, such that the data objects or their access patterns are not revealed to anyone, including the cloud provider, in the public cloud environment. We have created a detailed ontology describing the oblivious cloud storage models and role-based access controls that should be in place to manage this risk. We have also implemented the ObliviCloudManager application that allows users to manage their cloud data using oblivious data structures. This application uses a role-based access control model and collection-based document management to store and retrieve data efficiently. Cloud consumers can use our system to define policies for storing data obliviously and manage storage on untrusted cloud platforms, even if they are not familiar with the underlying technology and concepts of oblivious data structures.

Posted at 14:47

April 02

Leigh Dodds: On accessibility of data

My third open data “parable”. You can read the

Posted at 10:46

April 01

W3C Read Write Web Community Group: Read Write Web — Q1 Summary — 2016

Summary

The start of this year has seen much of the discussion on read write technology move from the mailing list to specific repos, issue trackers and, particularly, gitter chatrooms. Most of the work that I have noticed has been focused around the Solid standard.

An interesting new Linked Data project called GIESER kicked off in March, described as “an open cloud-based platform for integrating geospatial data with sensor data from cyberphysical systems based on semantic and Big Data technologies.” OpenLink also announced the release of their JavaScript-based RDF Editor.

Discussion in the group ticked up slightly, but most of the work has been focused on implementations of servers, libraries and apps.

Communications and Outreach

The Qatar Computing Research Institute (QCRI) was paid a visit and continues to support the development of read and write standards on the web, particularly Crosscloud and Solid, as well as their own app platform, meccano.

There was a successful hack day at MIT and new interest in read write apps, with a short tutorial used to show attendees how to write a pastebin app. For those interested in hacking on read write apps, please join us on gitter every Friday for a coding session, where we try to come up with new and interesting ideas.

 

Community Group

We welcome Dmitri Zagidulin to the group, who has started working full time with the team at MIT on Solid (and doing an amazing job!).  On the mailing list, there was some discussion based around “Solid Cookies“, and about connecting two LDP servers together.

Applications

The Solid specification continues to improve, in order to cater for a growing set of applications.  It has now been organized into a number of smaller self-contained specifications.  Some work has been done on authentication, and discovery methods have been documented for inbox, storage and application type registries.
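
As a rough illustration of what that discovery looks like in practice (a hedged sketch only; the exact predicates and document layout are defined by the evolving Solid specifications, so treat the IRIs below as an approximation): a WebID profile document links to the user's storage root and inbox, and a client simply dereferences the profile and follows those links.

    from rdflib import Graph, Namespace, URIRef

    PIM = Namespace("http://www.w3.org/ns/pim/space#")
    LDP = Namespace("http://www.w3.org/ns/ldp#")

    # A minimal, made-up WebID profile document as a client might retrieve it.
    profile_ttl = """
    @prefix pim: <http://www.w3.org/ns/pim/space#> .
    @prefix ldp: <http://www.w3.org/ns/ldp#> .
    <https://alice.example/profile/card#me>
        pim:storage <https://alice.example/> ;
        ldp:inbox   <https://alice.example/inbox/> .
    """

    g = Graph()
    g.parse(data=profile_ttl, format="turtle")

    me = URIRef("https://alice.example/profile/card#me")
    print("storage root:", g.value(me, PIM.storage))
    print("inbox:", g.value(me, LDP.inbox))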

On the server side we have seen a lot of work go into maturing the JavaScript Solid server, ldnode, and on the client library side solid.js has also added new features and documentation. In addition, a tutorial has been added for rdflib.js.

Some new apps have been added or started, mostly still in early stages. Two of the apps I created during hack days are 2048, the popular puzzle game, and a simple markdown editor that saves files to Solid storage. Briefly: Zagel is the start of a chat app, work has begun on a Solid welcome app, and a Solid Signup system has been created. More apps have been created in the form of contactorator, a contacts app; a simple slideshow integrated into Tabulator; errol, which provides notifications for Solid inboxes; and midichlorian, a tool for fetching files from Solid servers.

Last but not Least…

Interesting food for thought. Kuzzle describes itself as an:

open-source back-end solution for various applications. It combines a high level API, a database, a real-time engine, subscription and notification mechanisms as well as some advanced search features. The API is accessible through several standard protocols.

Posted at 13:47

March 30

Semantic Web Company (Austria): Insights into Nature’s Data Publishing Portal

In recent years, Nature has adopted linked data technologies on a broader scale. Andreas Blumauer was intrigued to discover more about the strategy and technologies behind this move. He had the opportunity to talk with Michele Pasin and Tony Hammond, the architects of Nature’s data publishing portal.

 

Semantic Puzzle: Nature’s data publishing portal is one of the most renowned ones in the linked data community. Could you talk a bit about its history? Why was this project initiated and who have been the brains behind it since then?

Michele Pasin: We have been involved with semantic technologies at Macmillan since 2010. At the time it was primarily my colleague Tony Hammond who saw the potential of these technologies for metadata management and data sharing. Tony set up the data.nature.com portal in April 2012 (and expanded it in July 2012), in the context of a broader company initiative aimed at moving towards a ‘digital first’ publication workflow.

The data.nature.com platform was essentially a public RDF output of some of the metadata embedded in our XML articles archive. This included a SPARQL endpoint for data about articles published by NPG from 1845 through to the present day. Additionally the datasets include NPG product and subject ontologies. These datasets are available under a Creative Commons Zero waiver.
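
That public endpoint has since been retired (as discussed later in the interview), but querying it worked like any other SPARQL endpoint. The sketch below uses the SPARQLWrapper library against the historical endpoint URL, with a placeholder property rather than NPG's actual ontology terms, so it is illustrative only.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Illustrative only: the data.nature.com endpoint has been retired, and the
    # dc:title property stands in for whatever the NPG ontologies actually used.
    endpoint = SPARQLWrapper("http://data.nature.com/sparql")
    endpoint.setQuery("""
        PREFIX dc: <http://purl.org/dc/terms/>
        SELECT ?article ?title WHERE {
            ?article dc:title ?title .
        } LIMIT 5
    """)
    endpoint.setReturnFormat(JSON)

    results = endpoint.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["article"]["value"], "-", binding["title"]["value"])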

The data.nature.com platform was only for external use though, so it was essentially detached from the products end users would see on nature.com. Still, it allowed us to develop a better understanding of how to make use of these tools within our existing technology stack. It is important to remember that over the years the company has invested a considerable amount of resources in an XML-centered architecture, so finding a solution that could leverage the legacy infrastructure with these new technologies has always been a fundamental requirement for us.

More recently, in 2013 we started working on a new hybrid linked data platform, this time with a much stronger focus on supporting our internal applications. That’s pretty much around the time I joined the company. In essence, we made the point that in order to achieve stronger interoperability levels within our systems we had to create an architecture where RDF is core to the publishing workflow as much as XML is. (By the way, if you are interested in the details of this, we presented a paper about it at ISWC 2014.) As part of this phase, we also built a more sophisticated set of ontologies used for encoding the semantics of our data, together with improved versions of the datasets previously released.

The nature.com ontologies portal came out in early 2015 as the result of this second phase of work. On the portal one can find extensive documentation about all of our models, as well as periodic downloads in various RDF formats. The idea is to make it easier for people – both within the enterprise and externally – to access, understand and reuse our linked data.

At the same time, since the user engagement level on data.nature.com was not as good as expected, we decided to terminate that service. We plan to keep releasing periodic snapshots of the datasets and the ontologies we are using, but we will not offer a public endpoint in the immediate future.

Semantic Puzzle: As one of your visions you’re stating that your “primary reason for adopting linked data technologies is quite simply better metadata management”. How did you deal with metadata before you started with this transition? What has changed since then, also from a business point of view?

Michele Pasin: Our pre–linked data approach to dealing with metadata and enterprise taxonomies is probably not unheard of, especially within similarly sized companies: a vast array of custom-made solutions, varying from simple Word documents sitting on someone’s computer, to Excel spreadsheets or, in the best of cases, database tables in one of our production systems. Of course, there were also a number of ad-hoc applications/scripts responsible for the reading/updating of these metadata sources, as often they would be critical to one or more systems in the publishing workflow (e.g. think of the journal’s master list, or the list of approved article-types).

It is worth stressing that the lack of a unified technical infrastructure was a key problem, of course, but not the only one. In fact, I would argue that addressing the lack of a centralized data governance approach was even more crucial. For example, most often you would not know who or which department was in charge of a particular controlled vocabulary or metadata specification. In some cases, no single source of truth was actually available, because different people or groups were in charge of specific aspects of a single specification (due to their differing interests).

Hence you need a certain amount of management buy-in to implement such a wide-ranging approach to metadata; moving to a single platform and technical solution based on linked data was fundamental, but an equally fundamental organizational change was also needed. Even more so, if one considers that this is not a time-boxed project but rather an ongoing process, an approach which pays off only as much as you can guarantee that as new products and services get launched, they all subscribe to the same metadata management ‘philosophy’.

Semantic Puzzle: One of the promises of Linked Data is that by “using a common data model and a common naming architecture, users can begin to realize the benefits and efficiencies of web scaling.” Could you describe a bit more in detail into which eco-system your content workflows and publishing processes are embedded (internally and externally) and why the use of standards is important for this?

Tony Hammond: We operate with an XML-based workflow for documents where we receive XML from our suppliers and store that within an XML database (MarkLogic). Increasingly we are beginning to move towards a dynamic publishing solution from that database. We are also using the database to provide a full-text search across all our content. In the past we had various workflows and a small number of different DTDs to reconcile, although we are currently converging on a single DTD. To facilitate search across this mixed XML content we abstracted certain key metadata elements into a common header. This was managed organically and was somewhat unpredictable both in terms of content model and naming.

By moving to a linked data solution for managing our metadata which is based on a single, core ontology we bypass our normalized metadata header and start to build on a new simpler data model (triples) with a common naming architecture. In effect, we have moved from a nominally normalized metadata to a super-normalized metadata which uses web standards for data (URI, RDF, OWL).

Semantic Puzzle: Your contents are also multimedia (image, video, …). How do you embed this non-textual contents into your linked data ecosystem? Which gateways, tools and connectors are used to bridge your linked data environment with multimedia?

Tony Hammond: Some years ago we embarked on a new initiative internally to streamline our production workflows. Our brief was to support a distributed content warehouse where digital assets would be stored in various locations. The idea was to abstract out our storage concerns and to maintain pointers to the various storage subsystems along with other physical characteristics required for accessing that storage.

In practice our main content was housed as XML documents within a MarkLogic XML database and associated media assets (e.g. images) were primarily stored on the filesystem with some secondary asset types (e.g. videos) being sourced from cloud services.

To relate a physical asset (e.g. an XML document, or a JPEG file) to the underlying concept (e.g. an article, or an image) we made use of XMP packets (a technology developed by Adobe Systems and standardized through ISO) which as simple RDF/XML descriptions allowed us to capture metadata about physical characteristics and to relate those properties to our data model. An XMP packet is a description of one physical resource and could be simply linked to the related conceptual resource.

We started this project with an RDF triplestore for maintaining and querying our metadata, but over time we moved towards a hybrid technology where our semantic descriptions were buried within XML documents as RDF/XML descriptions and could be queried within an XML context using XQuery to deliver a highly performant JSON API. These semantic descriptions enclosed minimal XMP documents which described the storage entities.

Semantic Puzzle: Nature links its datasets to external ones, e.g. to DBpedia or MeSH. Who exactly is benefiting from this and how?

Michele Pasin:  I would say that there are at least two reasons why we did this. First, we wanted to maximize the potential reuse of our datasets and models within the semantic web. Building owl:sameAs relationships to other vocabularies, or marking up our ontology classes and properties with subclass/subproperty relationships pointing to external vocabularies is a way to be good ‘linked data citizens’. Moreover, this is a deliberate attempt to counterbalance one of our key design principles: minimal commitment to external vocabularies. This approach to data modeling means that we tend to create our own models and define them within our own namespaces, rather than building production-level software against third party ontologies. It is worth pointing out that this is not because we think our ontologies are better – but because we want our data architecture to reflect as closely as possible the ontological commitment of a publishing enterprise with decades of established business practices, naming conventions etc. In other words, we aimed at creating a very cohesive and robust domain model, one which is resilient to external changes but that also supports semantic interoperability by providing a number of links and mappings to other semantic web standards.

Pointing to external vocabularies is a way to be good ‘linked data citizens’
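
In practice, a mapping layer of this kind is just a handful of extra triples published alongside the core ontology. A hedged sketch (placeholder IRIs, not the actual nature.com ontology terms; the DBpedia class is also only illustrative):

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import OWL, RDFS

    NPG = Namespace("http://example.org/npg/ontology/")  # placeholder, not the real namespace
    DBO = Namespace("http://dbpedia.org/ontology/")

    g = Graph()
    # Keep the core model in our own namespace, then link outwards.
    g.add((NPG.Article, RDFS.subClassOf, DBO.Article))   # illustrative external class
    g.add((NPG.subject_photosynthesis, OWL.sameAs,
           URIRef("http://dbpedia.org/resource/Photosynthesis")))

    print(g.serialize(format="turtle"))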

The second reason for creating these links is to enable more innovative discovery services. For example, a nature.com subject page about photosynthesis could surface encyclopedic materials automatically retrieved from DBpedia; or it could provide links to highly cited articles retrieved from PubMed using MeSH mappings. This just scratches the surface of what one could do. The real difficulty is how to do it in such a way that the overall user experience improves, rather than adding to the information overload the majority of internet users already have to deal with. So at the moment, while the data people (us) are focusing on building a rich network of entities for our knowledge graph, the UX and front end teams are exploring design and interaction models that truly take advantage of these functionalities. Hopefully we see these activities continue to converge!

Semantic Puzzle: How do you deal with data quality management in general, and how can linked data technologies help to improve it?

Tony Hammond: We can distinguish between two main types of data: documents and ontologies. (And by ontologies we also comprehend thesauri and taxonomies.) Our documents are created by our suppliers using XML and are amenable to some data validations. We use automated DTD validation in our new workflow and by hand DTD validation in the older workflows. We also use Schematron rulesets to validate certain data points but these address only certain elements. We have a couple hundred Schematron rules which implement various business rules and are also synchronized with our ontologies.

Our ontologies, on the other hand, are by their nature more curated datasets. These are mastered as RDF Turtle files and stored within GitHub. They are currently maintained by hand, although we are beginning now to transition some of our taxonomies to the PoolParty taxonomy manager. We have a build process for deploying the ontologies to our XML database where they are combined with our XML documents. During this build process we both validate the RDF as well as running SPIN rules over the datasets, which can validate data elements as well as expanding the dataset with new triples from rules-based inferencing.
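
SPIN rules are essentially SPARQL queries attached to classes, so the flavour of rules-based inferencing described here can be illustrated with a plain SPARQL CONSTRUCT run over a toy dataset (a generic sketch, not Springer Nature's actual build tooling or rules):

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:nature   ex:publishes  ex:article1 .
    ex:article1 ex:hasSubject ex:photosynthesis .
    """, format="turtle")

    # A SPIN-style rule expressed as SPARQL CONSTRUCT: if a journal publishes an
    # article about a subject, infer that the journal covers that subject.
    rule = """
    PREFIX ex: <http://example.org/>
    CONSTRUCT { ?journal ex:coversSubject ?subject }
    WHERE     { ?journal ex:publishes ?article .
                ?article ex:hasSubject ?subject . }
    """
    for triple in g.query(rule):
        g.add(triple)

    print(g.serialize(format="turtle"))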

Semantic Puzzle: For a publisher like Nature it is somehow “natural” that Linked Data is used. How could other industries make use of these principles for information management?

Tony Hammond: The main reason for using linked data is not to do with publishing the data (and indeed many other data models are generally used for data publishing), but with the desire to join one dataset with other datasets – or rather, the data within a dataset to the data within other datasets. It is for this reason that we make use of URIs as common (global) names for data points. Linking data is not just a goal in publishing data but applies equally when consuming data from various sources and integrating over those data sources within an internal environment. Indeed, arguably, the biggest use case for linked data is within private enterprises rather than surfaced on the open web. Once that point is appreciated there is no restriction on any industry in being more disposed to using linked data than any other, and it is used as a means to maximize the data surface that a company operates over.

The biggest use case for linked data is within private enterprises rather than surfaced on the open web

Semantic Puzzle: Where are the limits of Linked Data from your perspective, and do you believe they will ever be exceeded?

Tony Hammond: The limits to using linked data are more to do with top-down vs bottom-up approaches in dealing with data, i.e. linked data vs big data, or data curation vs data crunching. Linked data makes use of global names (URIs), schemas, ontologies. It is highly structured, organized data.

Now, whether it is feasible to bring this level of organization to data at large or whether data crunching will provide the appropriate insights over the data is an open question. Our expectation is that we will still need to use ontologies – and hence linked data – as an organizing principle, or reference, to guide us in processing large datasets and for sharing those data organizations. The question may be how much human curation is required in assembling these ontologies.

Michele Pasin: On a more practical level, I’d say that the biggest problem with linked data is still its rather limited adoption on a large scale. I’m referring in particular to the data publishing and reuse aspect. On this front, we really struggled to get the levels of uptake the business was expecting from us. Consider this: we have been publishing metadata for our entire archive since 2012 (approx. 1.2m documents, resulting in almost half a billion triples). However very few people made use of these data, either in the form of bulk downloads or via the SPARQL API we once hosted (and that was then retired due to low usage). This is in stark contrast with other – arguably less flexible – services we make available, e.g. the OpenSearch APIs, or a JSON REST service, which often see significant traffic.

Last year we gave a paper at the Linked Science workshop (affiliated with ISWC 2015) with the specific intent to address the problem within that community. What seemed to emerge is that possibly this has to do with the same reason why this technology has been so useful to us. RDF is an extremely flexible and powerful model, however, when it comes to data consumption and access, the average user cares more about simplicity than flexibility. Also, outside linked data circles we all know that the standard tech for APIs is JSON and REST, rather than RDF and SPARQL.

Lowering the bar to the adoption of semantic tech

The good news though is that we are seeing more initiatives aimed at bridging these two worlds. One that we are keeping an eye on, for example, is JSON-LD. The way this format hides various RDF complexities behind a familiar JSON structure makes it an ideal candidate for a linked data publishing product with a much wider user base. Which is exactly what we are looking for: lowering the bar to the adoption of semantic tech.
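
A tiny example of why JSON-LD helps here: the document below is ordinary JSON to a REST client, but its @context maps the keys to RDF terms, so an RDF-aware consumer can expand it into triples. The vocabulary and identifiers are illustrative.

    import json

    # Plain JSON to a REST client; linked data to an RDF-aware one.
    doc = {
        "@context": {
            "@base":   "http://example.org/articles/",
            "title":   "http://purl.org/dc/terms/title",
            "creator": "http://purl.org/dc/terms/creator",
        },
        "@id": "article-123",
        "title": "A made-up article about photosynthesis",
        "creator": "Jane Doe",
    }
    print(json.dumps(doc, indent=2))

    # Expanding this with any JSON-LD processor yields triples such as:
    #   <http://example.org/articles/article-123>
    #       <http://purl.org/dc/terms/title> "A made-up article about photosynthesis" .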

 

About Michele Pasin

Michele Pasin is an information architect and product manager with a focus on enterprise metadata management and semantic technologies.

Michele currently works for Springer Nature, a publishing company resulting from the May 2015 merger of Springer Science+Business Media and Holtzbrinck Publishing Group’s Nature Publishing Group, Palgrave Macmillan, and Macmillan Education.

He has recently taken up the role of product manager for the knowledge graph project, an initiative whose goal is to bring together various preexisting linked data repositories, plus a number of other structured and unstructured data sources, into a unified, highly integrated knowledge discovery platform. Before that, he worked on projects like nature.com’s subject pages (a dynamic section of the website that allows users to navigate content by topic) and the nature.com ontologies portal (a public repository of linked open data).

He holds a PhD in semantic web technologies from the Knowledge Media Institute (The Open University, UK) and advanced degrees in logic and philosophy of language from the University of Venice (Italy). Previously, he was a research associate at King’s College Department of Digital Humanities (London), where he worked on a number of cultural informatics projects such as the People of Medieval Scotland and the Art of Making in Antiquity. Online portfolio: http://www.michelepasin.org/projects/

Michele Pasin will give a keynote at this year’s SEMANTiCS conference.

About Tony Hammond

Tony Hammond is a data architect with a primary focus in the general area of machine-readable description technologies. He has been actively involved in developing industry standards for network identifiers and metadata frameworks. He has had experience working on both sides of the scientific publishing information chain, from international research centres to leading publishing houses. His background is in physics with astrophysics.

Tony currently works for Springer Nature, a publishing company resulting from the May 2015 merger of Springer Science+Business Media and Holtzbrinck Publishing Group’s Nature Publishing Group, Palgrave Macmillan, and Macmillan Education.

Posted at 09:05

March 29

Dublin Core Metadata Initiative: DCMI Webinar: Publishing SKOS concept schemes with Skosmos (2016-04-06)

2016-03-29, With more and more thesauri, classifications, and other knowledge organization systems being published as Linked Data using SKOS, the question arises how best to make them available on the web. While just publishing the Linked Data triples is possible using a number of RDF publishing tools, those tools are not very well suited for SKOS data, because they cannot support term-based searching and lookup. This free webinar presents Skosmos, an open source web-based SKOS vocabulary browser that uses a SPARQL endpoint as its back-end. It can be used by e.g. libraries and archives as a publishing platform for controlled vocabularies such as thesauri, lightweight ontologies, classifications and authority files. The Finnish national thesaurus and ontology service Finto, operated by the National Library of Finland, is built using Skosmos. Osma Suominen, National Library of Finland, will describe what kind of infrastructure is necessary for Skosmos and how to set it up for your own SKOS data. He will also present examples where Skosmos is being used around the world. The webinar is presented in partnership with AIMS: Agricultural Information Management Standards. For information on how to register for this free webinar, go to http://dublincore.org/resources/training/#2016skos2r (space limited).
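
Skosmos performs its term lookups through the SPARQL back-end mentioned above. The basic operation, searching a SKOS vocabulary by label, looks roughly like the rdflib sketch below, run here over a tiny in-memory vocabulary rather than Skosmos itself.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/vocab/> .
    ex:c1 a skos:Concept ; skos:prefLabel "Photosynthesis"@en ; skos:altLabel "Carbon assimilation"@en .
    ex:c2 a skos:Concept ; skos:prefLabel "Respiration"@en .
    """, format="turtle")

    # Label-based concept lookup: the kind of query a vocabulary browser needs
    # and that generic RDF publishing tools rarely offer out of the box.
    q = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ?label WHERE {
        ?concept a skos:Concept ;
                 skos:prefLabel|skos:altLabel ?label .
        FILTER (CONTAINS(LCASE(STR(?label)), "photo"))
    }
    """
    for row in g.query(q):
        print(row.concept, row.label)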

Posted at 23:59

AKSW Group - University of Leipzig: International Semantic Web Community meets in Leipzig, Sept. 12-15, 2016

At the annual SEMANTiCS Conference, experts from academia and industry meet to discuss semantic computing, its benefits and future business implications. Since 2005, SEMANTiCS has been attracting the opinion leaders in semantic web and big data technology, ranging from information managers and software engineers, to commerce experts and business developers as well as researchers and IT architects, when it comes to defining the future of information technology.

SEMANTiCS 2016 takes place from September 12th to 15th at the second oldest university in Germany, Leipzig University. Leipzig University hosts several groups, in particular AKSW, focused on Linked Data and the Semantic Web, and is therefore THE European hotspot when it comes to graph-based technologies and knowledge engineering.

Do you want to be a part of the SEMANTiCS Conference and get in touch with the following audiences?

  • IT professionals & IT architects
  • Software developers
  • Knowledge Management Executives
  • Innovation Executives
  • R&D Executives

Calls are open now. Industrial presentations offer a platform to reach a huge network of practitioners and users to get feedback, and academic submissions are published in the well-known ACM ICPS series (deadline 21st April, 23% acceptance rate). To submit your contribution, please visit the calls section on our website. To attend the workshops, the tutorials or to enjoy the talks in one of the offered sessions, please visit our registration site.

You want to partner with SEMANTiCS 2016? Then get a sponsor package or become an exhibitor! For more details, please click here.

We are looking forward to meeting you! Come and join us in Leipzig!

To stay up to date, follow us on Facebook and Twitter (@SemanticsConf) or visit our website for the latest news.

Posted at 09:40

March 24

Leigh Dodds: A key difference between open data and open source

In “

Posted at 21:35

AKSW Group - University of Leipzig: AKSW takes part in BMWi-funded GEISER project

The AKSW group is the technical lead of the recently started GEISER (from sensor data towards internet-based geospatial services) project funded by the Federal Ministry for Economic Affairs and Energy (BMWi) under grant agreement number 01MD16014E. The GEISER project will run from March 1st, 2016 to February 28th, 2019.

Many applications of cyberphysical systems rely on an integration of geospatial data and sensor data. In the engineering industry, dynamic mission planning of service technicians and locating suppliers can benefit from such integrated data. Other potential applications include intelligent parking and refueling by finding available parking spots and fuel pumps or charging spots nearby. Sensors of satellite navigation systems in cars and intelligent fuel pumps, connected charging points and industrial machinery generate terabytes of industry-relevant data every day. Combining many data sources is the most promising approach, but this is difficult. Relevant geospatial data is distributed among structured (e.g., sensors), semi-structured (e.g., OpenStreetMap) and unstructured (e.g., Twitter) data sources. Due to the significant volume and variety of data sources, innovative solutions are required for the acquisition of geospatial data, integrating them with sensor data and building intelligent services on top.

The GEISER project aims to design and implement innovative functionality for developing services for transforming, storing, integrating and processing geospatial and sensor data. Here, machine learning approaches will be applied for tasks such as computing topological relations between resources and the time-efficient generation of link specifications. The resulting tools will be integrated as microservices in an open cloud-based platform. The AKSW group of Universität Leipzig particularly works on the extraction and integration of geospatial data. We will develop and evaluate scalable methods for analysing, extracting and fusing RDF from various data sources.
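
The project description stays at a high level, but one of the named tasks, computing a topological relation between resources and materializing it as a link, can be illustrated with a toy sketch: check a simple within-bounding-box relation between a sensor-derived point and an OpenStreetMap-style area, then emit an RDF triple. All namespaces and identifiers below are invented and have nothing to do with the actual GEISER platform.

    from rdflib import Graph, Namespace, URIRef

    GEO = Namespace("http://example.org/geo#")  # invented vocabulary

    # Toy resources: a charging point (from sensor data) and a city district
    # (from OpenStreetMap-style data), with WGS84-style coordinates.
    charging_point = {"iri": URIRef("http://example.org/cp/42"), "lat": 51.34, "lon": 12.37}
    district = {"iri": URIRef("http://example.org/osm/leipzig-centre"),
                "bbox": (51.32, 12.34, 51.36, 12.40)}  # (min_lat, min_lon, max_lat, max_lon)

    def within(point, bbox):
        min_lat, min_lon, max_lat, max_lon = bbox
        return min_lat <= point["lat"] <= max_lat and min_lon <= point["lon"] <= max_lon

    g = Graph()
    if within(charging_point, district["bbox"]):
        # Materialize the discovered topological relation as a link.
        g.add((charging_point["iri"], GEO.within, district["iri"]))

    print(g.serialize(format="turtle"))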

Our partners in this project are USU Software AG (Coordinator), Yellow Map, metaphacts GmbH, Fraunhofer IAIS and TomTom.

The project kick-off meeting will take place on March 14th in Karlsruhe at the office of USU Software AG, so stay tuned for further project updates and follow us on the AKSW blog for the latest news.

The project is funded by the Federal Ministry for Economic Affairs and Energy (BMWi).

Posted at 13:03

Leigh Dodds: left-pad and the data commons

Yesterday, Javascript developers around the world were

Posted at 11:32

March 23

Semantic Web Company (Austria): International Semantic Web Community meets in Leipzig, Sept. 12-15, 2016

At the annual SEMANTiCS Conference, experts from academia and industry meet to discuss semantic computing, its benefits and future business implications. Since 2005, SEMANTiCS has been attracting the opinion leaders in semantic web and big data technology, ranging from information managers and software engineers, to commerce experts and business developers as well as researchers and IT architects, when it comes to defining the future of information technology.

SEMANTiCS 2016 takes place from September 12th to 15th at the second oldest university in Germany, Leipzig University. Leipzig University hosts several groups, in particular AKSW, focused on Linked Data and the Semantic Web, and is therefore THE European hotspot when it comes to graph-based technologies and knowledge engineering.

Do you want to be a part of the SEMANTiCS Conference and get in touch with the following audiences?

  • IT professionals & IT architects
  • Software developers
  • Knowledge Management Executives
  • Innovation Executives
  • R&D Executives

Calls are open now. Industrial presentations offer a platform to reach a huge network of practitioners and users to get feedback, and academic submissions are published in the well-known ACM ICPS series (deadline 21st April, 23% acceptance rate). To submit your contribution, please visit the calls section on our website. To attend the workshops, the tutorials or to enjoy the talks in one of the offered sessions, please visit our registration site.

You want to partner with SEMANTiCS 2016? Then get a sponsor package or become an exhibitor! For more details, please click here.

To stay up to date, follow us on Facebook and Twitter (@SemanticsConf) or visit our website for the latest news.

Posted at 09:27

Copyright of the postings is owned by the original blog authors. Contact us.