Planet RDF

It's triples all the way down

August 28

Bob DuCharme: Converting between MIDI and RDF: readable MIDI and more fun with RDF

Listen to my fun!

Posted at 17:24

August 24

Frederick Giasson: Winnipeg City’s NOW [Data] Portal

The Winnipeg City’s NOW (Neighbourhoods Of Winnipeg) Portal is an initiative to create a complete neighbourhood web portal for its citizens. At the core of the project is a set of about 47 fully linked, integrated and structured datasets of things of interest to Winnipeggers. The focal point of the portal is Winnipeg’s 236 neighbourhoods, which define its main structure. The portal has six main sections: topics of interest, maps, history, census, images and economic development. It is meant to be used by citizens to find things of interest in their neighbourhood, to learn about its history, to see images of those things of interest, to find tools that help economic development, and so on.

The NOW portal is not new; Structured Dynamics was also its main technical contractor for its first release in 2013. However, we have just finished helping the City of Winnipeg’s NOW team migrate their older NOW portal from OSF 1.x to OSF 3.x and from Drupal 6 to Drupal 7, and we trained them on the new system. Major improvements accompany this upgrade, but the user interface design is essentially the same.

I will first introduce each major section of the portal and explain its main features. Then I will discuss the improvements that came with the upgrade.

Datasets

A NOW portal user won’t notice any of this, but the main feature of the portal is the data it uses. The portal manages 47 (and growing) fully structured, integrated and linked datasets of things of interest to Winnipeggers. What the portal does is manage entities. Each kind of entity (swimming pools, parks, places, images, addresses, streets, etc.) is defined with multiple properties and values. Several of the entities reference entities in other datasets (for example, an assessment parcel from the Assessment Parcels dataset references neighbourhood entities and property address entities from their respective datasets).

The fact that these datasets are fully structured and integrated means that we can leverage these characteristics to create a powerful search experience: filtering the information on any of the properties, biasing the searches depending on where a keyword match occurs, and so on.

Here is the list of all 47 datasets that currently exist in the portal:

  1. Aboriginal Service Providers
  2. Arenas
  3. Neighbourhoods of Winnipeg City
  4. Streets
  5. Economic Development Images
  6. Recreation & Leisure Images
  7. Neighbourhoods Images
  8. Volunteer Images
  9. Library Images
  10. Parks Images
  11. Census 2006
  12. Census 2001
  13. Winnipeg Internal Websites
  14. Winnipeg External Websites
  15. Heritage Buildings and Resources
  16. NOW Local Content Dataset
  17. Outdoor Swimming Pools
  18. Zoning Parcels
  19. School Divisions
  20. Property Addresses
  21. Wading Pools
  22. Electoral wards of Winnipeg City
  23. Assessment Parcels
  24. Libraries
  25. Community Centres
  26. Police Service Centers
  27. Community Gardens
  28. Leisure Centres
  29. Parks and Open Spaces
  30. Community Committee
  31. Commercial real estates
  32. Sports and Recreation Facilities
  33. Community Characterization Areas
  34. Indoor Swimming Pools
  35. Neighbourhood Clusters
  36. Fire and Paramedic Stations
  37. Bus Stops
  38. Fire and Paramedic Service Images
  39. Animal Services Images
  40. Skateboard Parks
  41. Daycare Nurseries
  42. Indoor Soccer Fields
  43. Schools
  44. Truck Routes
  45. Fire Stations
  46. Paramedic Stations
  47. Spray Parks Pads

Structured Search

The most useful feature of the portal, to me, is its full-text search engine. It is simple, clean and quite effective. The search engine is configured to return the most relevant results a NOW portal user may be searching for. For example, it will positively bias results that come from specific datasets, or matches that occur in specific property values. The goal of this biasing is to improve the quality of the returned results. This is relatively easy to do because the context of the portal is well known and everything is fully structured, so we can easily boost the scoring of search results.

Another major gain is that all the search results are fully templated. The search engine does not simply return a title and a description for each hit: it templates all the information the system has about the matched result and displays the most relevant parts of it to the user directly in the search results.

For example, if I search for an indoor swimming pool, in most cases it is probably because I want to call the front desk to get some information about the pool. This is why key pieces of information are displayed directly in the search results. That way, most users won’t even have to click on a result: the information they were looking for appears directly on the search results page.

Here is an example of a search for the keywords main street. As you can see, you get different kinds of results. Each result is templated to show the core information about that entity. You can focus on particular kinds of entities, or filter by their location in specific neighbourhoods.

[Screenshot: search results for “main street”]

Templated Search Results

Now let’s look at some of the kinds of entities that can be searched on the portal and how they are presented to users.

Here is an example of an assessment parcel located in the St. John’s neighbourhood. The address, the value, the type and the location of the parcel on a map are displayed directly in the search results.

Another kind of entity that can be searched is the property address. These are located on a map, and the value of the parcel and building and the zoning of the address are displayed. The property is also linked to its assessment parcel entity, which can be clicked to get additional information about the parcel.

Another interesting type of entity that can be searched is the street. What is interesting in this case is that you get the complete outline of the street directly on a map. That way you know where it starts, where it ends and where it is located in the city.

There are more than a thousand geo-localized images of all sorts of things in the city that can be searched. A thumbnail of the image and the location of the thing it depicts appear in the search results.

If you were searching for a nursery for your newborn child, you could quickly see the name, the location on a map and the phone number of the nursery directly in the search result.

These are just a few examples of the roughly fifty different kinds of entities that can appear this way in the search results.

Mapping

The mapping tool is another powerful feature of the portal. You can search just as you would with the full-text search engine (the top search box on the portal); however, you only get results that can be geo-localized on a map. You can also simply browse the entities of a dataset, or filter entities by their properties and values. You can persist entities you find on the map and save the map for future reference.

The example below shows someone who searched for a street (main street) and persisted it on the map, then searched for other things such as nurseries and selected the ones near the persisted street. That way they can visualize the different entities known to the portal on a map to better understand where things are located in the city, what exists near a certain location or within a neighbourhood, and so on.

[Screenshot: the mapping tool with persisted search results]

Census Analysis

Census information is vital to the good development of a city. It is necessary for understanding the trends of a sector and who populates it, so that the city and other organizations can properly plan their projects to have as much impact as possible.

This is one of the reasons why one of the main sections of the site is dedicated to census data. Key census indicators have been configured in the portal. Users can select different kinds of regions (neighbourhood clusters, community areas and electoral wards) to get the numbers for each of these indicators, and they can select multiple regions to compare them with each other. Chart and table views are available for presenting the census data.

History, Images & Points of Interest

The City took the time to write the history of each of its neighbourhoods. In addition to that, they hired professional photographers to photograph the points of interest of the city, geo-localize them and write a description for each photo. Because of this dedication, users of the portal can learn a lot about the city in general and about the neighbourhood they live in. This is what the History and Images sections of the website are about.

Historic buildings are displayed on a map and they can be browsed from there.

Images of points of interests in the neighbourhood are also located on a map.

Find Your Neighbourhood

Ever wondered which neighbourhood you live in? No problem: go to the home page, enter your address in the Find your Neighbourhood section and you will know right away. From there you can learn more about your neighbourhood, such as its history and points of interest.

Your address will be located on a map, and your neighbourhood will be outlined around it. Not only will you know which neighbourhood you live in, you will also know where you live within it. From there you can click on the name of the neighbourhood to get to the neighbourhood’s page and start learning more about it: its history, photos of the points of interest that exist in it, etc.

Browsing Content by Topic

Because all the content of the portal is fully structured, it is easy to browse it using a well-defined topic structure. The City developed its own ontology, which is used to help users browse the content of the portal by topic of interest. In the example below, I clicked the Economic Development node and then the Land use topic. Finally I clicked the Map button to display things related to land use: in this case, zoning and assessment parcels are displayed to the user.

This is another way to find meaningful and interesting content from the portal.

Depending on the topic you choose, and the kind of information related to that topic, you may end up with different options like a map, a list of links to documents related to that topic, etc.

Export Content

Now that I have given an overview of the main features of the portal, let’s get back to the geeky things. The first thing I said about this portal is that, at its core, all the information it manages is fully structured, integrated and linked data. If you get to the page of an entity, you can see the underlying data that exists about it in the system: simply click the Export tab at the top of the entity’s page, and you will have access to the description of that entity in multiple different formats.
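
The same kind of per-entity description could, in principle, also be pulled programmatically. As a minimal sketch, assuming the underlying OSF triple store were exposed through a SPARQL endpoint (the entity URI below is a made-up placeholder, not a real portal identifier):

# Illustrative only: list every property and value the store holds for one entity.
SELECT ?property ?value
WHERE {
  <http://example.org/now/entity/some-assessment-parcel> ?property ?value .
}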

In the future, the City should (or at least I hope will) make the whole set of datasets fully downloadable. Right now you only have access to that information via the per-entity export feature. I say “hope” because the NOW portal is completely disconnected from another initiative by the City: data.winnipeg.ca, which uses Socrata. The problem is that barely any of the NOW datasets are available on data.winnipeg.ca, and the ones that do appear are the raw versions (semi-structured, undocumented, unintegrated and unlinked). None of the normalization, integration and linkage work done by the NOW team has been leveraged to improve the data.winnipeg.ca dataset catalog.

New with the upgrades

Those who are familiar with the NOW portal will notice a few changes. The user interface did not change that much, but multiple little things got improved in the process. I will cover the most notable of these changes.

The major changes happened in the backend of the portal. The data management in OSF for Drupal 7 is incompatible with what was available in Drupal 6. The management of entities became easier, and the configuration of OSF networks became a breeze. A revisioning system has been added, the user interface is more intuitive, and so on. There is no comparison possible. However, portal users won’t notice any of this, since these are all site administrator functions.

The first thing users will notice is the completely new full-text search engine. The underlying search engine is almost the same, but the presentation is far better. Each entity type now has its own template, so results are displayed in a way suited to that type. Results should usually be much more relevant, and filtering is easier and cleaner. The search experience is much better in my view.

The overall site performance is much better since different caching strategies have been put in place in OSF 3.x and OSF for Drupal. This means that most of the features of the portal should react more swiftly.

Every type of entity managed by the portal is now templated: its webpage is laid out in a specific way to optimize the information it conveys to users, as is its search-result “mini page” when it is returned for a search query.

Multilingualism is now fully supported by the portal, although not all of the templates have been translated yet. Expect a fully translated French version of the NOW portal in the future.

Creating a Network of Portals

One of the most interesting features that comes with this upgrade is that the NOW portal is now in a position to participate in a network of OSF instances. What does that mean? Well, it means that the NOW portal could create partnerships with other local (regional, national or international) organizations to share datasets (and their maintenance costs).

Are there other organizations that use this kind of system? Well, there is at least one other right in Winnipeg: MyPeg.ca, also developed by Structured Dynamics. MyPeg uses RDF to model its information and uses OSF to manage it. MyPeg is a non-profit organization that uses census (and other indicator) data to study the well-being of Winnipeggers. The team behind MyPeg.ca are research experts in indicator data. Their indicator datasets (which include census data) are top notch.

Let’s hypothesize that the two groups were interested in starting to collaborate. Say the NOW portal would like to use MyPeg’s census datasets instead of its own, since they are more complete, more accurate and include a larger number of important indicators. What they basically want is to outsource the creation and maintenance of the census/indicator data to a local, dedicated and highly professional organization. The only things they would need to do are:

  1. Formalize their relationship by signing a usage agreement
  2. The NOW portal would need to configure the MyPeg.ca OSF network into their OSF for Drupal instance
  3. The NOW portal would need to register the datasets it wants to use from MyPeg.ca.

Once these three steps are done, which takes no more than a couple of minutes, the system administrators of the NOW portal could start using the MyPeg.ca indicator datasets as if they existed on their own network. (The reverse could also be true for MyPeg.) Everything would be transparent to them. From then on, all the fixes and updates performed by MyPeg.ca on their indicator datasets would immediately appear on the NOW portal and be accessible to its users.

This is one possible way to collaborate. Another possibility would simply be to share the serialized datasets on a routine basis (every month, every six months, every year) so that the NOW portal can re-import them from the files shared by MyPeg.ca. This also works because both organizations use the same ontology to describe the indicator data, which means that no modification is required by the City to take the new information into account; they only have to import the files and update their local datasets. This is the beauty of ontologies.
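
As a rough sketch of what such a routine re-import could look like, assuming the portal’s triple store accepts SPARQL Update and the shared dataset is published as an RDF file (both the graph IRI and the file URL below are invented placeholders):

# Replace the locally cached copy of the shared indicator dataset.
DROP SILENT GRAPH <http://example.org/graphs/mypeg-indicators> ;
LOAD <http://example.org/exports/indicators.rdf>
  INTO GRAPH <http://example.org/graphs/mypeg-indicators>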

Conclusion

The new NOW portal is a great service for citizens of Winnipeg City. It is also a really good example of a web portal that leverages fully structured, integrated and linked data. To me, the NOW portal is a really good example of the features that should go along with a municipal data portal.

Posted at 17:33

August 16

Dublin Core Metadata Initiative: DC-2016 final program published

2016-08-16, DCMI is pleased to announce publication of the final program for DC-2016. The program consists of an array of presentations, lightning talks, papers, project reports, posters, special sessions, and workshops. To review the program, visit the Program Page at http://dcevents.dublincore.org/IntConf/dc-2016/schedConf/program where titles link to abstracts. Registration is open at http://dcevents.dublincore.org/IntConf/index/pages/view/reg16 with early rates available through 2 September 2016. Significant registration savings are available for DCMI members or ASIST members wishing to attend the collocated DC-2016 and ASIST conferences. The ASIST program is available at https://www.asist.org/events/annual-meeting/annual-meeting-2016/program/ and with seminars and workshops at https://www.asist.org/events/annual-meeting/annual-meeting-2016/seminars-and-workshops/.

Posted at 23:59

Dublin Core Metadata Initiative: Mike Lauruhn appointed to DCMI Governing Board

2016-08-16, DCMI is pleased to announce that Mike Lauruhn has accepted an appointment to the DCMI Governing Board as an Independent Member. Mike is Technology Research Director at Elsevier Labs and has been a longstanding participant in DCMI, serving as a member of the Education and Outreach Committee's "Linked Data for Professional Education (LD4PE) initiative" and as co-Program Chair for the DCMI annual conference in 2010. Before joining Elsevier Labs in 2010, he was a consultant with Taxonomy Strategies LLC working with numerous private companies, nonprofits, and government agencies to help define and implement taxonomies and metadata schemas. He began his library career cataloging at the California Newspaper Project at the Center for Bibliographic Studies & Research at the University of California, Riverside. Mike's three year term will begin at the close of DC-2016 in Copenhagen. For additional information, visit the Governing Board page at http://dublincore.org/about/oversight/#lauruhn.

Posted at 23:59

Semantic Web Company (Austria): Attend and contribute to the SEMANTiCS 2016 in Leipzig

The 12th edition of SEMANTiCS, a well-known platform for professionals and researchers who make semantic computing work, will be held in the city of Leipzig from September 12th till 15th. We are proud to announce the final program of the SEMANTiCS conference. The program will cover 6 keynotes, 40 industry presentations, 30 scientific paper presentations, 40 poster & demo presentations and a huge number of satellite events. Special talks will be given by Thomas Vavra from IDC and Sören Auer, who will feature the LEDS track. On top of that there will be a fishbowl session ‘Knowledge Graphs – A Status Update’ with lightning talks from Hans Uszkoreit (DFKI) and Andreas Blumenauer (SWC). This week, the set of our distinguished keynote speakers has been fixed and we are quite excited to have them at this year’s edition of SEMANTiCS. Please join us to listen to talks from representatives of IBM, Siemens, Springer Nature, Wikidata, International Data Corporation (IDC), Fraunhofer IAIS, Oxford University Press and the Hasso-Plattner-Institut, who will share their latest insights on applications of semantic technologies with us. To register and be part of SEMANTiCS 2016 in Leipzig, please go to: http://2016.semantics.cc/registration.

Share your ideas, tools and ontologies, last minute submissions
Meetup: Big Data & Linked Data – The Best of Both Worlds  

On the eve of the SEMANTiCS conference we will discuss how Big Data & Linked Data technologies could become a perfect match. This meetup gathers experts on Big and Linked Data to discuss the future agenda for research on, and implementation of, joint technology development.

  • Register (free)

  • If you are interested in presenting your idea, approach or project linking Semantic technologies with Big Data in an ad-hoc lightning talk, please get in touch with Thomas Thurner (t.thurner@semantic-web.at).

WORKSHOPS/TUTORIALS

This year’s SEMANTiCS is starting on September 12th with a full day of exciting and interesting satellite events. In 6 parallel tracks scientific and industrial workshops and tutorials are scheduled to provide a forum for groups of researchers and practitioners to discuss and learn about hot topics in Semantic Web research.

How to find users and feedback for your vocabulary or ontology?

The Vocabulary Carnival is a unique opportunity for vocabulary publishers to showcase and share their work in the form of a poster and a short presentation, and to meet the growing community of vocabulary publishers and users to build useful semantic, technical and social links. You can join the Carnival Minute Madness on the 13th of September.

How to submit to ELDC?

The European Linked Data Contest awards prizes to stories, products, projects or persons presenting novel and innovative projects, products and industry implementations involving linked data. The ELDC is more than yet another competition: we envisage building a directory of the best European projects in the domain of Linked Data and the Semantic Web. This year the ELDC is awarded in the categories Linked Enterprise Data and Linked Open Data, with €1,500 for each of the winners. The submission deadline is August 31, 2016.

7th DBpedia Community Meeting in Leipzig 2016

Co-located with SEMANTiCS, the next DBpedia meeting will be held in Leipzig on September 15th. Experts will speak about topics such as Wikidata: bringing structured data to Wikipedia with 16,000 volunteers. The 7th edition of this event covers a DBpedia showcase session, breakout sessions and a DBpedia Association meeting where we will discuss new strategies and which directions are important for DBpedia. If you would like to become part of the DBpedia community and present your ideas, please submit your proposal or check our meeting website: http://wiki.dbpedia.org/meetings/Leipzig2016

Sponsorship opportunities

We would be delighted to welcome new sponsors for SEMANTiCS 2016. You will find a number of sponsorship packages with an indication of benefits and prices here: http://semantics.cc/sponsorship-packages.

Special offer: You can buy a special SEMANTiCS industry ticket for €400 which includes a poster presentation at our marketplace. Take the opportunity to increase the visibility of your company, organisation or project among an international and high-impact community. If you are interested, please contact us via email at semantics2016@fu-confirm.de.

Posted at 15:58

August 15

Semantic Web Company (Austria): Introducing a Graph-based Semantic Layer in Enterprises

Things, not Strings
Entity-centric views on enterprise information and all kinds of data sources provide means to get a more meaningful picture about all sorts of business objects. This method of information processing is as relevant to customers, citizens, or patients as it is to knowledge workers like lawyers, doctors, or researchers. People actually do not search for documents, but rather for facts and other chunks of information to bundle them up to provide answers to concrete questions.

Strings, or names for things, are not the same as the things they refer to. Still, those two aspects of an entity get mixed up regularly, feeding the Babylonian language confusion. Any search term can refer to different things, which is why Google, too, has rolled out its own knowledge graph to help organize information on the web at large scale.

Semantic graphs can build the backbone of any information architecture, not only on the web. They can enable entity-centric views also on enterprise information and data. Such graphs of things contain information about business objects (such as products, suppliers, employees, locations, research topics, …), their different names, and relations to each other. Information about entities can be found in structured (relational databases), semi-structured (XML), and unstructured (text) data objects. Nevertheless, people are not interested in containers but in entities themselves, so they need to be extracted and organized in a reasonable way.

Machines and algorithms make use of semantic graphs to retrieve not only simply the objects themselves but also the relations that can be found between the business objects, even if they are not explicitly stated. As a result, ‘knowledge lenses’ are delivered that help users to better understand the underlying meaning of business objects when put into a specific context.

Personalization of information
The ability to view entities or business objects in different ways when put into various contexts is key for many knowledge workers. For example, drugs have regulatory aspects, a therapeutic character, and yet another meaning to product managers or sales people. One benefits quickly from being confronted only with those aspects of an entity that are really relevant in a given situation. This rather personalized information processing places heavy demands on a semantic layer on top of the data layer, especially when information is stored in various forms and scattered across different repositories.

Understanding and modelling the meaning of content assets and of interest profiles of users are based on the very same methodology. In both cases, semantic graphs are used, and also the linking of various types of business objects works the same way.

Recommender engines based on semantic graphs can link similar contents or documents that are related to each other in a highly precise manner. The same algorithms help to link users to content assets or products. This approach is the basis for ‘push-services’ that try to ‘understand’ users’ needs in a highly sophisticated way.

‘Not only MetaData’ Architecture
Together with the data and content layer and its corresponding metadata, this approach unfolds into a four-layered information architecture as depicted here.

Following the NoSQL paradigm, which is about ‘Not only SQL’, one could call this content architecture ‘Not only Metadata’, thus ‘NoMeDa’ architecture. It stresses the importance of the semantic layer on top of all kinds of data. Semantics is no longer buried in data silos but rather linked to the metadata of the underlying data assets. Therefore it helps to ‘harmonize’ different metadata schemes and various vocabularies. It makes the semantics of metadata, and of data in general, explicitly available. While metadata most often is stored per data source, and therefore not linked to each other, the semantic layer is no longer embedded in databases. It reflects the common sense of a certain domain and through its graph-like structure it can serve directly to fulfill several complex tasks in information management:

  • Knowledge discovery, search and analytics
  • Information and data linking
  • Recommendation and personalization of information
  • Data visualization

Graph-based Data Modelling
Graph-based semantic models resemble the way human beings tend to build their own models of the world. Any person, not only subject matter experts, organizes information by at least the following six principles (a small graph sketch follows the list):

  1. Draw a distinction between all kinds of things: ‘This thing is not that thing’
  2. Give things names: ‘This thing is my dog Goofy’ (some might call it Dippy Dawg, but it’s still the same thing)
  3. Categorize things: ‘This thing is a dog but not a cat’
  4. Create general facts and relate categories to each other: ‘Dogs don’t like cats’
  5. Create specific facts and relate things to each other: ‘Goofy is a friend of Donald’, ‘Donald is the uncle of Huey, Dewey, and Louie’, etc.
  6. Use various languages for this; e.g. the above-mentioned fact in German is ‘Donald ist der Onkel von Tick, Trick und Track’ (remember: the thing called ‘Huey’ is the same thing as the thing called ‘Tick’ – only the name or label for this thing differs between languages).
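
As a rough illustration of how these six principles translate into a graph, here is a small sketch written as a SPARQL INSERT DATA block (the ex: namespace and every resource in it are invented for the example):

PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT DATA {
  ex:Goofy a ex:Dog ;                          # 1. a distinct thing, 3. categorized
    rdfs:label "Goofy"@en, "Dippy Dawg"@en ;   # 2. names for the thing
    ex:friendOf ex:Donald .                    # 5. a specific fact
  ex:Dog ex:dislikes ex:Cat .                  # 4. a general fact relating categories
  ex:Huey rdfs:label "Huey"@en, "Tick"@de .    # 6. the same thing labelled in several languages
}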

These fundamental principles for the organization of information are well reflected by semantic knowledge graphs. The same information could be stored as XML, or in a relational database, but it’s more efficient to use graph databases instead for the following reasons:

  • The way people think fits well with information that is modelled and stored when using graphs; little or no translation is necessary.
  • Graphs serve as a universal meta-language to link information from structured and unstructured data.
  • Graphs open up doors to a better aligned data management throughout larger organizations.
  • Graph-based semantic models can also be understood by subject matter experts, who are actually the experts in a certain domain.
  • The search capabilities provided by graphs let you find out unknown linkages or even non-obvious patterns to give you new insights into your data.
  • For semantic graph databases, there is a standardized query language called SPARQL that allows you to explore data.
  • In contrast to traditional ways of querying databases, where knowledge about the database schema/content is necessary, SPARQL allows you to ask “tell me what is there” (see the sketch after this list).
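
To make that last point concrete, here is the kind of schema-agnostic query that can be run against any SPARQL endpoint without knowing its schema; it simply asks which classes are in use and how common they are (a generic sketch, not tied to any particular product):

# “Tell me what is there”: list the classes in the store and how often they occur.
SELECT ?class (COUNT(?instance) AS ?count)
WHERE {
  ?instance a ?class .
}
GROUP BY ?class
ORDER BY DESC(?count)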

Standards-based Semantics
Making the semantics of data and metadata explicit is even more powerful when based on standards. A framework for this purpose has evolved over the past 15 years at W3C, the World Wide Web Consortium. Initially designed to be used on the World Wide Web, many enterprises have been adopting this stack of standards for Enterprise Information Management. They now benefit from being able to integrate and link data from internal and external sources with relatively low costs.

At the base of all those standards, the Resource Description Framework (RDF) serves as a ‘lingua franca’ to express all kinds of facts that can involve virtually any kind of category or entity, and also all kinds of relations. RDF can be used to describe the semantics of unstructured text, XML documents, or even relational databases. The Simple Knowledge Organization System (SKOS) is based on RDF. SKOS is widely used to describe taxonomies and other types of controlled vocabularies. SPARQL can be used to traverse and make queries over graphs based on RDF or standard schemes like SKOS.

With SPARQL, far more complex queries can be executed than with most other database query languages. For instance, hierarchies can be traversed and aggregated recursively: a geographical taxonomy can then be used to find all documents containing places in a certain region although the region itself is not mentioned explicitly.
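
A minimal sketch of such a query, assuming documents are tagged with SKOS concepts via dct:subject and the geographical taxonomy is organized with skos:broader (all names below are illustrative):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX ex:   <http://example.org/places/>

# Find documents about any place that sits below “Scandinavia” in the taxonomy,
# even if the region itself is never mentioned in the document.
SELECT DISTINCT ?document
WHERE {
  ?place skos:broader* ex:Scandinavia .
  ?document dct:subject ?place .
}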

Standards-based semantics also helps to make use of already existing knowledge graphs. Many government organisations have made available high-quality taxonomies and semantic graphs by using semantic web standards. These can be picked up easily to extend them with own data and specific knowledge.

Semantic Knowledge Graphs will grow with your needs!
Standards-based semantics provides yet another advantage: it is becoming increasingly easy to hire skilled people who have worked with standards like RDF, SKOS or SPARQL before. Even so, experienced knowledge engineers and data scientists are a comparatively rare species, so it is crucial to grow graphs and modelling skills over time. Starting with SKOS and extending an enterprise knowledge graph by introducing more schemes and by mapping to other vocabularies and datasets over time is a well-established agile procedure model.

A graph-based semantic layer in enterprises can be expanded step by step, just like any other network. Analogous to a street network, start with the main roads, introduce more and more connecting roads, and classify streets, places and intersections with an increasingly refined classification system. It all comes down to an evolving semantic graph that serves more and more as a map of your data, content and knowledge assets.

Semantic Knowledge Graphs and your Content Architecture
It’s a matter of fact that semantics serves as a kind of glue between unstructured and structured information and as a foundation layer for data integration efforts. But even for enterprises dealing mainly with documents and text-based assets, semantic knowledge graphs will do a great job.

Semantic graphs extend the functionality of a traditional search index. They don’t simply annotate documents and store occurrences of terms and phrases; they introduce concept-based indexing, in contrast to term-based approaches. Remember: semantics helps to identify the things behind the strings. The same applies to concept-based search over content repositories: documents get linked to the semantic layer, and therefore the knowledge graph can be used not only for typical retrieval but also to classify, aggregate, filter and traverse the content of documents.
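
As an illustration of “aggregate and filter” through the knowledge graph rather than the search index, one could count documents per concept within one branch of a thesaurus. Again a sketch with invented names:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX ex:   <http://example.org/concepts/>

# How many documents are linked to each concept underneath “Drugs”?
SELECT ?concept (COUNT(DISTINCT ?doc) AS ?docs)
WHERE {
  ?concept skos:broader* ex:Drugs .
  ?doc dct:subject ?concept .
}
GROUP BY ?concept
ORDER BY DESC(?docs)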

PoolParty combines Machine Learning with Human Intelligence

Semantic knowledge graphs have the potential to innovate data and information management in any organisation. Besides questions around integrability, it is crucial to develop strategies to create and sustain the semantic layer efficiently.

Looking at the broad spectrum of semantic technologies that can be used for this endeavour, they range from manual to fully automated approaches. The promise of deriving high-quality semantic graphs from documents fully automatically has not been fulfilled to date. On the other hand, handcrafted semantics is error-prone, incomplete, and too expensive. The best solution often lies in a combination of approaches. PoolParty combines machine learning with human intelligence: extensive corpus analysis and corpus learning support taxonomists, knowledge engineers and subject matter experts with the maintenance and quality assurance of semantic knowledge graphs and controlled vocabularies. As a result, enterprise knowledge graphs are more complete, up to date, and consistently used.

“An Enterprise without a Semantic Layer is like a Country without a Map.”

Posted at 13:34

August 10

AKSW Group - University of Leipzig: AKSW Colloquium, 15th August, 3pm, RDF query relaxation

On the 15th of August at 3 PM, Michael Röder will present the paper “RDF Query Relaxation Strategies Based on Failure Causes” by Fokou et al. in P702.

Abstract

Recent advances in Web-information extraction have led to the creation of several large Knowledge Bases (KBs). Querying these KBs often results in empty answers that do not serve the users’ needs. Relaxation of the failing queries is one of the cooperative techniques used to retrieve alternative results. Most of the previous work on RDF query relaxation compute a set of relaxed queries and execute them in a similarity-based ranking order. Thus, these approaches relax an RDF query without knowing its failure causes (FCs). In this paper, we study the idea of identifying these FCs to speed up the query relaxation process. We propose three relaxation strategies based on various information levels about the FCs of the user query and of its relaxed queries as well. A set of experiments conducted on the LUBM benchmark show the impact of our proposal in comparison with a state-of-the-art algorithm.

The paper is available at researchgate.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 09:03

August 09

schema.org: schema.org update: hotels, datasets, "health-lifesci" and "pending" extensions...

Schema.org 3.1 has been released! Many thanks to everyone in the community who has contributed to this update, which includes substantial new vocabulary for describing hotels and accommodation, some improvements around dataset description, as well as the usual collection of new examples, bugfixes, usability, infrastructural, standards compatibility and conceptual consistency improvements.

This release builds upon the recent 3.0 release. In version 3.0 we created a health-lifesci extension as a new home for the extensive collection of medical/health terms that were introduced back in 2012. Publishers and webmasters do not need to update their markup for this change; it is best considered an improvement to the structure of our documentation. Our extension system allows us to provide deeper coverage of specialist topics without cluttering the core project pages. Version 3.0 also included some improvements from the FIBO project, improving our representation of various financial products.

We have also introduced a special extension called "pending", which provides a place for newly proposed schema.org terms to be documented, tested and revised. We hope that this will help schema proposals get wider visibility and review, supporting greater participation from non-developer collaborators. You should not need to be a computer programmer to be part of our project, and "pending" is one step towards making work-in-progress schema proposals more visible without requiring knowledge of highly technical systems like GitHub. We have linked each term in pending.schema.org to the technical discussions at Github, but also to a simple feedback form. We anticipate updating the "pending" area relatively frequently, in between formal releases.

The site also features a new "how we work" document, oriented towards the Web standards community and toolmakers, explaining the evolving process we have adopted towards creating new and improved schemas. See also commentary on this in the UK government technology blog post about making job adverts more open with schema.org.

Many people were involved in these updates, but particular thanks are due to Martin Hepp for leading the hotels/accommodation design, and to Marc Twagirumukiza for chairing the "schemed" W3C community group that led the creation of our new health-lifesci extension.

Finally, we would like to dedicate this release to Peter Mika, who has served on our steering group since the early days. Peter has stepped down as Yahoo's representative, passing his duties to Nicolas Torzec. Thanks, Peter! Welcome, Nicolas...

For more details on version 3.1 of schema.org, check out the release notes.

Posted at 17:47

August 04

Semantic Web Company (Austria): PoolParty Academy is opening in September 2016

PoolParty Academy offers three E-Learning tracks that enable customers, partners and individual professionals to learn Semantic Web technologies and PoolParty Semantic Suite in particular.

You can pre-register for the PoolParty Academy training tracks at the academy’s website or join our live class-room at the biggest European industrial Semantic Web conference – SEMANTiCS 2016.


Posted at 07:16

August 02

AKSW Group - University of Leipzig: Article accepted in Journal of Web Semantics

We are happy to announce that the article “DL-Learner – A Framework for Inductive Learning on the Semantic Web” by Lorenz Bühmann, Jens Lehmann and Patrick Westphal was accepted for publication in the Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Abstract:

In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.

Posted at 07:54

August 01

Dublin Core Metadata Initiative: DC-2016 Preliminary Program Announced

2016-08-01, DCMI is pleased to announce the publication of the Preliminary Program for DC-2016 at http://dcevents.dublincore.org/IntConf/dc-2016/schedConf/program. The program includes 28 Full Papers, Project Reports, and Presentations on Metadata as well as 14 Posters. Six Special Sessions on significant metadata topics as well as 6 half- and full-day Workshops round out the program. The keynote will be delivered by Bradley P. Allen, Chief Architect at Elsevier, the world's leading scientific publisher. The Final Program including the authors and abstracts of Papers, Project Reports, Posters, and Presentations on Metadata will be available on 15 August. Registration is now open at http://dcevents.dublincore.org/IntConf/index/pages/view/reg16 with an early registration rate available through 2 September 2016.

Posted at 23:59

July 31

Bob DuCharme: SPARQL in a Jupyter (a.k.a. IPython) notebook

With just a bit of Python to frame it all.

Posted at 15:15

July 23

Leigh Dodds: Reputation data portability

Yesterday I went to the

Posted at 11:24

July 18

AKSW Group - University of Leipzig: AKSW Colloquium, 18.07.2016, AEGLE and node2vec

On Monday 18.07.2016, Kleanthi Georgala will give her Colloquium presentation for her paper “An Efficient Approach for the Generation of Allen Relations”, that was accepted at the European Conference on Artificial Intelligence (ECAI) 2016.

Abstract

Event data is increasingly being represented according to the Linked Data principles. The need for large-scale machine learning on data represented in this format has thus led to the need for efficient approaches to compute RDF links between resources based on their temporal properties. Time-efficient approaches for computing links between RDF resources have been developed over the last years. However, dedicated approaches for linking resources based on temporal relations have been paid little attention to. In this paper, we address this research gap by presenting AEGLE, a novel approach for the efficient computation of links between events according to Allen’s interval algebra. We study Allen’s relations and show that we can reduce all thirteen relations to eight simpler relations. We then present an efficient algorithm with a complexity of O(n log n) for computing these eight relations. Our evaluation of the runtime of our algorithms shows that we outperform the state of the art by up to 4 orders of magnitude while maintaining a precision and a recall of 1.

Afterwards, Tommaso Soru will present a paper considered to be the latest chapter of the Everything-2-Vec saga, which encompasses outstanding works such as Word2Vec and Doc2Vec. The paper is “node2vec: Scalable Feature Learning for Networks” [PDF] by Aditya Grover and Jure Leskovec, accepted for publication at the International Conference on Knowledge Discovery and Data Mining (KDD), 2016 edition.

Posted at 12:56

July 11

Dublin Core Metadata Initiative: FINAL call for DC-2016 Presentations and Best Practice Posters/Demos

2016-07-11, The submission deadline of 15 July is rapidly approaching for the Presentations and Best Practice Posters and Demos tracks at DC-2016. Both presentations and posters/demos provide the opportunity to practitioners and researchers specializing in metadata design, implementation, and use to present their work at the International Conference on Dublin Core and Metadata Applications in Copenhagen. No paper is required for presentations or posters/demos. Accepted submissions in the Presentations track will have approximately 20-25 minutes to present and 5-10 minutes for questions and discussion. Proposal abstracts will be reviewed for selection by the Program Committee. The presentation slide decks and the poster images will be openly available as part of the permanent record of the DC-2016 conference. If you are interested in presenting at DC-2016, please submit a proposal abstract through the DC-2016 submission system before the 15 July deadline at http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/cfp. For a fuller description of the Presentations track, see http://dcevents.dublincore.org/IntConf/index/pages/view/pre16.

Posted at 23:59

Dublin Core Metadata Initiative: Dublin Core at 21

2016-07-11, Announcing an IFLA Satellite event: Friday, August 19 at OCLC in Dublin, OH. The Dublin Core originated in 1995 at a meeting at OCLC (in the very room where this IFLA Satellite event will take place). This special event will offer a historical view of DCMI's history through people who were there when the Web was young and Dublin Core was new and evolving rapidly. Presenters will include metadata experts with long ties to Dublin Core, including several who were at the original invitational meeting in 1995. A panel discussion will permit speakers to reflect on activities and trends past and present, and to project what the future will look like. Attendees are invited to attend a complimentary reception and special unveiling following the presentation portion of the day. The event is being sponsored by the IFLA Information Technology Section and the Dublin Core Metadata Initiative with support from OCLC. Additional information and registration is available at https://www.eventbrite.com/e/dublin-core-at-21-tickets-25525829443.

Posted at 23:59

July 10

Egon Willighagen: Setting up a local SPARQL endpoint

... has never been easier, and I have to say, with Virtuoso it already was easy.

Step 1: download the jar and fire up the server
OK, you do need Java installed, and for many this is still the case, despite Oracle doing their very best to totally ruin it for everyone. But seriously, visit the Blazegraph website (@blazegraph) and download the jar and type:

$ java -jar blazegraph.jar

It will give some output on the console, including the address of a webpage with a SPARQL endpoint, an upload form, etc.


That it tracks past queries is a nice extra.

Step 2: there is no step two

Step 3: OK, OK, you also want to try a SPARQL from the command line
Now, I have to say, the webpage does not have a "Download CSV" button on the SPARQL endpoint. That would be great, but doing so from the command line is not too hard either.

$ curl -i -H "Accept: text/csv" --data-urlencode \
  query@list.rq http://192.168.0.233:9999/blazegraph/sparql
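
The query file list.rq is not shown here; any SPARQL query will do, for example something minimal like this (purely illustrative):

# list.rq: a trivial query to sanity-check the endpoint.
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10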

But it would be nice if you did not have to copy/paste the query into a file, or go to the command line in the first place. Also, I had some trouble finding the correct SPARQL endpoint URL, as it seems to have changed at least twice in recent history, given the (outdated) documentation I found online (a common problem; no complaint!).

HT to Andra who first mentioned Blazegraph to me, and the Blazegraph team.

Posted at 18:57

July 02

Libby Miller: Working from home

A colleague asked me about my experiences working from home so I’ve made a few notes here.

I’m unusual in my department in that I work from home three or four days a week, and one or two in London, or very occasionally Salford. I started off in this job on an EU-funded project where everyone was remote, and so it made little difference where I was physically as long as we synced up regularly. Since then I’ve worked on multiple other projects where the other participants are mostly in one place and I’m elsewhere. That’s made it more difficult, but also, sometimes, better.

A buddy

Where everyone else is in one place, the main thing I need to function well is one or more buddies who are physically there, who remember to call me in for meetings and let me know anything significant that’s happening that I’m missing because I’m not physically there. The first of these is the most important. Being remote you are easily forgettable. Without Andrew, Dan, Joanne, Tristan, and now Henry and Tim, I’d sometimes be left out.

IRC or slack

I’ve used IRC for years for various remote things (we used to do “scheduled topic chats” 15 years ago on freenode for various Semantic Web topics), and the various bots that keep you informed and help you share information easily – loggers and @Edd’s “chump” in particular, but also #swhack bots of many interesting kinds. I learned a huge amount from friends in W3C who are mostly remote from each other and have made lots of tools and bots for helping them manage conference calls for many years.

Recently our team have started using slack as well as irc, so now I’m on both: Slack means that a much more diverse set of people are happy to participate, which is great. It can be very boring working on your own, and these channels make for a sense of community, as well as being useful for specific timely exchanges of information.

Lots of time on organisation

I spend a lot of time figuring out where I need to be and making decisions about what’s most important, and what needs to be face to face and what can be a call. Also: trying to figure out how annoying I’m going to be to the other people in a meeting, and whether I’m going to be able to contribute successfully, or whether it’s best to skip it. I’ve had to learn to ignore the fomo.

I have a text based todo list, which can get a little out of control, but in general has high level goals for this week and next, goals for the day, as well as specific tasks that need to be done on any particular day or a particular time. I spend a little time each morning figuring these out, and making sure I have a good sense of my calendar (Dan Connolly taught me to do this!). In general, juggling urgent and project-managery and less-urgent exploratory work is difficult and I probably don’t do enough of the latter (and I probably don’t look far enough ahead, either). I sometimes schedule my day quite concretely with tasks at specific times to make sure I devote thinking time for specific problems, or when I have a ton to do, or a lot of task switching.

Making an effort not to work

Working at home means I could work any time, and having an interesting job means that I’d probably quite enjoy it, too. There’s a temptation to do the boring admin stuff in work and leave the fun stuff until things are quieter in the evenings or at the weekend. But I make an effort not to do this, and it helps that the team I work in don’t work late or at weekends. This is a good thing. We need downtime or we’ll get depleted (I did in my last job, a startup, where I also worked at home most of the time, and where we were across multiple timezones).

Weekends are fairly easy to not work in, evenings are harder, so I schedule other things to do where possible (Bristol Hackspace, cinema, watching something specific on TV, other technical personal projects).

Sometimes you just have to be there

I’m pretty good at doing meetings remotely, but we do a lot of workshops which involve getting up and doing things, writing things down on whiteboards, etc. I also chair a regular meeting that I feel works better if I’m there. When I need to be there a few days, I’m lucky enough to be able to stay with some lovely friends, which means it’s a pleasure rather than annoying and boring to not be at home.

What I miss and downsides

What I miss is the unscheduled time working or just hanging out with people. When I’m in London my time is usually completely scheduled, which is pretty knackering. Socialising gets crammed into short trips to the pub. The commute means I lose my evening at least once a week and sometimes arrive at work filled with train-rage (I guess the latter is normal for anyone who commutes by rail).

Not being in the same place as everyone else day to day means that I miss some of the up- and down-sides of being physically there, which are mostly about spontaneity: I never get included in ad-hoc meetings, so I have more time to concentrate but also miss some interesting things; I don’t get distracted by (fun or not-fun) things, including bad moods in the organisation and gossip, but also impromptu games, fun trips out, etc.

And finally…

For me, working from home in various capacities has given me opportunities I’d never have had, and I’m very lucky to be able to do it in my current role.


Posted at 17:11

Egon Willighagen: Two Apache Jena SPARQL query performance observations

Searching RDF stores is commonly done with SPARQL queries. I have been using this with the semantic web translation of WikiPathways by Andra to find common content issues, sometimes combined with some additional Java code. For example, finding PubMed identifiers that are not numbers.
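
A query for that last example might look roughly like the sketch below; the class and property names are assumptions for illustration, not necessarily the exact WikiPathways vocabulary:

PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>

# Find publication references whose identifier is not purely numeric.
SELECT ?reference ?pubmedId
WHERE {
  ?reference a wp:PublicationReference ;
    dcterms:identifier ?pubmedId .
  FILTER (!regex(str(?pubmedId), "^[0-9]+$"))
}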

Based on Ryan's work on interactions, a more complex curation query I recently wrote, in reply to issues that Alex ran into when converting pathways to BioPax, finds interactions that convert one gene into another gene. These occur in WikiPathways because graphically you do not see the difference. I originally had this query:

SELECT (str(?organismName) as ?organism) ?page
       ?gene1 ?gene2 ?interaction
WHERE {
  ?gene1 a wp:GeneProduct .
  ?gene2 a wp:GeneProduct .
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf ?pathway .
  ?pathway foaf:page ?page ;
    wp:organismName ?organismName .
} ORDER BY ASC(?organism)

This query properly found all gene-gene conversions to be fixed. However, it was also horribly slow with my JUnit/Apache Jena setup. The query runs very efficiently on the Virtuoso-based SPARQL endpoint. I had been trying to speed it up in the past, but without much success. Instead, I ended up batching the testing on our Jenkins instance. But this got a bit silly, at some point with subsets of fewer than 100 pathways.

Observation #1
So, I turned to Twitter, and quite soon got three useful leads. The first two suggestions did not solve the problem, but helped me rule things out. Of course, there is literature about optimizing, like this recent paper by Antonis (doi:10.1016/j.websem.2014.11.003), but I haven't been able to convert this knowledge into practical steps either. After ruling out these options (though I kept the sameTerm() suggestion), I realized it had to be the first two triple patterns with the variables ?gene1 and ?gene2. So I tried using FILTER there too, resulting in this query:

SELECT (str(?organismName) as ?organism) ?page
       ?gene1 ?gene2 ?interaction
WHERE {
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf ?pathway .
  ?pathway foaf:page ?page ;
    wp:organismName ?organismName .
  FILTER (!sameTerm(?gene1, ?gene2))
  FILTER EXISTS {?gene1 a wp:GeneProduct}
  FILTER EXISTS {?gene2 a wp:GeneProduct}
} ORDER BY ASC(?organism)

That did it! The time to run the query halved. Not so surprising in retrospect, but it all depends on the SPARQL engine and which parts it runs first. Apparently, Jena's SPARQL engine starts at the top. This seems to be confirmed by the third comment I got. However, I always understood that engines can also start at the bottom.

Observation #2
But that's not all. This speed-up made me wonder something else. The problem clearly seems to be the engine's approach to deciding which parts of the query to run first. So, what if I removed further choices about what to run first? That leads me to a second observation: it helps significantly if you reduce the number of subgraphs the engine has to "merge" later. Instead, if possible, use property paths. That again roughly halved the runtime of the query. I ended up with the query below, which obviously no longer gives me access to the pathway resources, but I can live with that:

SELECT (str(?organismName) as ?organism)
       ?gene1 ?gene2 ?interaction
WHERE {
  ?interaction wp:source ?gene1 ;
    wp:target ?gene2 ;
    a wp:Conversion ;
    dcterms:isPartOf/foaf:page ?pathway ;
    dcterms:isPartOf/wp:organismName ?organismName .
  FILTER (!sameTerm(?gene1, ?gene2))
  FILTER EXISTS {?gene1 a wp:GeneProduct}
  FILTER EXISTS {?gene2 a wp:GeneProduct}
} ORDER BY ASC(?organism)

I'm hoping these two observations may help others using Apache Jena for unit and integration testing of RDF generation too.

Loizou, A., Angles, R., Groth, P., Mar. 2015. On the formulation of performant SPARQL queries. Web Semantics: Science, Services and Agents on the World Wide Web 31, 1-26. http://dx.doi.org/10.1016/j.websem.2014.11.003

Posted at 13:07

July 01

W3C Read Write Web Community Group: Read Write Web — Q2 Summary — 2016

Summary

Decentralization is becoming more and more of a theme on the web, and this quarter witnessed the Decentralized Web Summit in San Francisco. The keynotes from Tim Berners-Lee and Vint Cerf are definitely worth checking out.

Some interesting work is also coming up, as a Verified Claims working group has been proposed. The editor's draft is available for review.

In the Community Group there has been some discussion, but the main focus is on apps; a specification for Linked Data Notifications has also been started.

Communications and Outreach

Aside from the Decentralized Web Summit, some folks attended the ID2020 summit, which aims to help provide legal identity to everyone by 2030. There was much interest there in the idea of decentralized identifiers.

There was also a session at WWW 2016 entitled Building Decentralized Applications on the Social Web.

 

Community Group

A new spec has been created called “Linked Data Notifications“. The system combines the concept of an inbox with the idea of sending notifications to users. Please feel free to take a look at this work, provide feedback or raise issues. There has been some light discussion on the mailing list, and a few apps and demos have been released. More below!
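
To make the idea a bit more concrete, here is a rough sketch of what sending a notification could look like, assuming the receiver advertises an inbox (for example via the LDP ldp:inbox property) that accepts JSON-LD over HTTP POST. The inbox URL and the payload below are made up for illustration; see the specification draft for the actual discovery and content rules:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SendNotification {
    public static void main(String[] args) throws Exception {
        // Hypothetical inbox, discovered beforehand from the target resource.
        URL inbox = new URL("https://example.org/inbox/");

        // A minimal, purely illustrative JSON-LD notification body.
        String notification =
            "{ \"@id\": \"\", " +
            "  \"http://purl.org/dc/terms/title\": \"You were mentioned in a post\" }";

        // POST the notification to the inbox as JSON-LD.
        HttpURLConnection conn = (HttpURLConnection) inbox.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/ld+json");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(notification.getBytes(StandardCharsets.UTF_8));
        out.close();

        // A successful receiver would typically answer 201 Created, with a
        // Location header pointing at the stored notification.
        System.out.println("Response: " + conn.getResponseCode());
    }
}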


Applications

Lots of application work is going on in the solid and linkeddata GitHub repositories. Solid.js has been renamed to solid client, and provides lots of useful features for dealing with solid servers.

The Solid connections UI is an app to help manage your social graph.  Work has begun on an improved solid profile manager. A new signup widget is also being worked on.

The tabulator project has been modularized and split up into various components and apps (aka panes) so that anyone can build and skin a “data browser” capable of working together with web 3.0.  Lots more interesting work in the individual repos.

I have done some work on the economic side, releasing version 0.1 of webcredits, which provides a linked data based ledger for apps. A demo of the functionality built on top can be seen at testcoin.org. I'm also happy to report that I have got solid node server running on my phone, and it performs really well!


Last but not Least…

Nicola Greco has collected a cool set of papers, described as a “reading list” of articles and papers relevant to the work of Solid. You can drop in your own paper by adding a new issue. Happy reading!

 

Posted at 16:41

Semantic Web Company (Austria): PoolParty at Connected Data London

Connected Data London “brings together the Linked and Graph Data communities”.

More and more Linked Data applications are emerging in the business world, and software companies are making it part of their business plans to integrate graph data into their data stories and features.

MarkLogic is opening a new wave in how enterprise databases should be used, pushing past the limits of closed, rigid structures to integrate more data. Neo4j explains how you can enrich existing data and follow new connections and leads in investigations such as the Panama Papers.

No wonder that communities in different locations gather to share, exchange and network around topics like Linked Data. In London, a new conference is emerging for exactly this purpose: Connected Data London. The conference sets the stage for industry leaders and early adopters, as well as researchers, to present their use cases and stories. You can hear talks from multiple domains about how they put Linked Data to good use: space exploration, financial crime, bioinformatics, publishing and more.

The conference will close with an interesting panel discussion about “How to build a Connected Data capability in your organization.” You can hear from the specialists how this task is approached. And immediately after acquiring the know-how, you will need easy-to-use and easy-to-integrate software to help with your Knowledge Model creation and maintenance, as well as Text Mining and Concept Annotation.

Semantic Web Company has an experienced team of professional consultants who can help you with every step of putting the acquired know-how into practice together with PoolParty.

In our dedicated slot we present how a Connected Data application is born from a Knowledge Model, and the steps to get there.

Register today!

Connected Data London – London, 12th July, Holiday Inn Mayfair

Posted at 13:33

June 29

AKSW Group - University of Leipzig: AKSW Colloquium, 04.07.2016. Big Data, Code Quality.

On the upcoming Monday (04.07.2016), the AKSW group will discuss topics related to the Semantic Web and Big Data, as well as programming languages and code quality. In particular, the following papers will be presented:

S2RDF: RDF Querying with SPARQL on Spark

by Alexander Schätzle et al.
Presented by: Ivan Ermilov

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it more and more infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge as these platforms are not designed for RDF processing from ground. Thus, existing Hadoop-based approaches often favor certain query pattern shape while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses its relational interface to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state of the art SPARQL-on-Hadoop approaches using the recent WatDiv test suite. S2RDF achieves sub-second runtimes for majority of queries on a billion triples RDF graph.

A Large Scale Study of Programming Languages and Code Quality in Github

by Baishakhi Ray et al.
Presented by: Tim Ermilov

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static v.s. dynamic typing, strong v.s. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages.

Each paper will be presented in 20 minutes, followed by 10 minutes of discussion. After the talks, there is more time for discussion in smaller groups, as well as coffee and cake. The colloquium starts at 3 p.m. and is located on the 7th floor (Leipzig, Augustusplatz 10, Paulinum).

Posted at 10:34

June 27

AKSW Group - University of Leipzig: Accepted Papers of AKSW Members @ Semantics 2016

This year’s SEMANTiCS conference, which is taking place September 12 – 15, 2016 in Leipzig, recently invited the submission of research papers on semantic technologies. Several AKSW members seized the opportunity and got their submitted papers accepted for presentation at the conference.

These are listed below:

  • Executing SPARQL queries over Mapped Document Stores with SparqlMap-M (Jörg Unbehauen, Michael Martin )
  • Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store (Natanael Arndt, Norman Radtke and Michael Martin)
  • Towards Versioning of Arbitrary RDF Data (Marvin Frommhold, Ruben Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin)
  • DBtrends: Exploring query logs for ranking RDF data (Edgard Marx, Amrapali Zaveri, Diego Moussallem and Sandro Rautenberg)
  • MEX Framework: Automating Machine Learning Metadata Generation (Diego Esteves, Pablo N. Mendes, Diego Moussallem, Julio Cesar Duarte, Maria Claudia Cavalcanti, Jens Lehmann, Ciro Baron Neto and Igor Costa)

Another AKSW-driven event at SEMANTiCS 2016 will be the Linked Enterprise Data Services (LEDS) Track, taking place September 13 – 14, 2016. This track is specifically organized by the BMBF-funded LEDS project, which is part of the Entrepreneurial Regions program – a BMBF Innovation Initiative for the New German Länder. The focus is on discussing, with academic and industrial partners, new approaches to discovering and integrating background knowledge into business and governmental environments.

SEMANTiCS 2016 will also host the 7th edition of the DBpedia Community Meeting on the last day of the conference (September 15 – ‘DBpedia Day‘). DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and link the different data sets on the Web to Wikipedia data.

So come and join SEMANTiCS 2016, talk and discuss with us!

More information on the program can be found here.

LEDS is funded by the BMBF as part of the Wachstumskern Region programme.

Posted at 10:50

June 26

AKSW Group - University of Leipzig: AKSW Colloquium, 27.06.2016, When owl:sameAs isn’t the Same + Towards Versioning for Arbitrary RDF Data

In the next Colloquium, June the 27th at 3 PM, two papers will be presented:

When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data

André Valdestilhas will present the paper “When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data” by Halpin et al. [PDF]:

Abstract:  In Linked Data, the use of owl:sameAs is ubiquitous in interlinking data-sets. There is however, ongoing discussion about its use, and potential misuse, particularly with regards to interactions with inference. In fact, owl:sameAs can be viewed as encoding only one point on a scale of similarity, one that is often too strong for many of its current uses. We describe how referentially opaque contexts that do not allow inference exist, and then outline some varieties of referentially-opaque alternatives to owl:sameAs. Finally, we report on an empirical experiment over randomly selected owl:sameAs statements from the Web of data. This theoretical apparatus and experiment shed light upon how owl:sameAs is being used (and misused) on the Web of data.

Towards Versioning for Arbitrary RDF Data

Afterwards, Marvin Frommhold will practice the presentation of his paper “Towards Versioning for Arbitrary RDF Data” (Marvin Frommhold, Rubén Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen, and Michael Martin) [PDF], which has been accepted at the main conference of SEMANTiCS 2016 in Leipzig.

Abstract: Coherent and consistent tracking of provenance data and in particular update history information is a crucial building block for any serious information system architecture. Version Control Systems can be a part of such an architecture enabling users to query and manipulate versioning information as well as content revisions. In this paper, we introduce an RDF versioning approach as a foundation for a full featured RDF Version Control System. We argue that such a system needs support for all concepts of the RDF specification including support for RDF datasets and blank nodes. Furthermore, we placed special emphasis on the protection against unperceived history manipulation by hashing the resulting patches. In addition to the conceptual analysis and an RDF vocabulary for representing versioning information, we present a mature implementation which captures versioning information for changes to arbitrary RDF datasets.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 13:46

June 25

Egon Willighagen: New Paper: "Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources"


Andra Waagmeester published a paper on his work on a semantic web version of WikiPathways (doi:10.1371/journal.pcbi.1004989). The paper outlines the design decisions, shows the SPARQL endpoint, and gives several example SPARQL queries. These include federated queries, like a mashup with DisGeNET (doi:10.1093/database/bav028) and EMBL-EBI's Expression Atlas. That results in nice visualisations like this:


If you have the relevant information in the pathway, these pathways can help a lot in understanding what is going on biologically. And, of course, they are used for exactly that a lot.

Press release
Because press releases have become an interesting tool in knowledge dissemination, I wanted to learn what it involves to get one out. This involved the people at PLOS Computational Biology and the press offices of the Gladstone Institutes and our Maastricht University (press release 1, press release 2 EN/NL). There is already one thing I learned in retrospect, and I am pissed with myself that I did not think of it: you should always have a graphic supporting your story. I have been doing this for a long time on my blog now (sometimes I still forget), but did not think of it for the press release. The press release was picked up by three outlets, though all basically ran it as we presented it to them (thanks to Altmetric.com):


SPARQL
But what makes me appreciate this piece of work, and WikiPathways itself, is how it creates a central hub of biological knowledge. Pathway databases capture knowledge not easily embedded in generally structured (relational) databases. As such, expressing this in the RDF format seems simple enough. The thing I really love about this approach is that your queries become machine-readable stories, particularly when you start using human-readable variants of SPARQL for this. And you can share these queries with the online scientific community with, for example, myExperiment.

There are two ways in which I have used SPARQL on WikiPathways data for metabolomics: 1. curation; 2. statistics. Data analysis is harder, because in the RDF world scientific lenses are needed to accommodate the chemical structural-temporal complexity of metabolites. For curation, we have long used SPARQL in unit tests to support the curation of WikiPathways. Moreover, I have manually used the SPARQL endpoint to find curation tasks. But now that the paper is out, I can blog about this more. For now, many example SPARQL queries can be found in the WikiPathways wiki. It features several queries showing statistics, but also some for curation. This is an example query I use to improve the interoperability of WikiPathways with Wikidata (also for BridgeDb):

# The wp: prefix as commonly used for the WikiPathways RDF (assumed here):
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

SELECT DISTINCT ?metabolite WHERE {
  ?metabolite a wp:Metabolite .
  OPTIONAL { ?metabolite wp:bdbWikidata ?wikidata . }
  FILTER (!BOUND(?wikidata))
}

Feel free to give this query a go at sparql.wikipathways.org!
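
If you prefer to run this from code rather than the web form, the same query can be sent to the endpoint programmatically. Here is a minimal Jena sketch; the exact endpoint URL is my assumption, so adjust it to whatever sparql.wikipathways.org advertises:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class MetabolitesWithoutWikidata {
    public static void main(String[] args) {
        // Assumed address of the Virtuoso SPARQL endpoint behind sparql.wikipathways.org.
        String endpoint = "http://sparql.wikipathways.org/sparql";

        String query =
            "PREFIX wp: <http://vocabularies.wikipathways.org/wp#> " +
            "SELECT DISTINCT ?metabolite WHERE { " +
            "  ?metabolite a wp:Metabolite . " +
            "  OPTIONAL { ?metabolite wp:bdbWikidata ?wikidata . } " +
            "  FILTER (!BOUND(?wikidata)) " +
            "}";

        // Send the query to the remote endpoint and list the metabolites
        // that still lack a Wikidata mapping.
        QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.getResource("metabolite").getURI());
            }
        } finally {
            qe.close();
        }
    }
}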

Triptych
This paper completes a nice triptych of three papers about WikiPathways in the past 6 months. Thanks to the whole community and the very many contributors! All three papers are linked below.

Waagmeester, A., Kutmon, M., Riutta, A., Miller, R., Willighagen, E. L., Evelo, C. T., Pico, A. R., Jun. 2016. Using the semantic web for rapid integration of WikiPathways with other biological online data resources. PLoS Comput Biol 12 (6), e1004989+. http://dx.doi.org/10.1371/journal.pcbi.1004989
Bohler, A., Wu, G., Kutmon, M., Pradhana, L. A., Coort, S. L., Hanspers, K., Haw, R., Pico, A. R., Evelo, C. T., May 2016. Reactome from a WikiPathways perspective. PLoS Comput Biol 12 (5), e1004941+. http://dx.doi.org/10.1371/journal.pcbi.1004941
Kutmon, M., Riutta, A., Nunes, N., Hanspers, K., Willighagen, E. L., Bohler, A., Mélius, J., Waagmeester, A., Sinha, S. R., Miller, R., Coort, S. L., Cirillo, E., Smeets, B., Evelo, C. T., Pico, A. R., Jan. 2016. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Research 44 (D1), D488-D494. http://dx.doi.org/10.1093/nar/gkv1024

Posted at 09:44

June 23

Leigh Dodds: The state of open licensing

I spend a lot of time reading through licences and terms & conditions. Much more so than I thought I would when I first started getting involved with open data. After all, I largely just like making things with data.

But there’s still so much data that is

Posted at 19:03

June 22

AKSW Group - University of Leipzig: Should I publish my dataset under an open license?

Undecided? Stand back, we know flowcharts:

Did you ever try to apply the halting problem to a malformed flowchart?

 

Taken from my slides for my keynote at TKE:

Linguistic Linked Open Data, Challenges, Approaches, Future Work from Sebastian Hellmann

Posted at 09:41

June 21

Leigh Dodds: From services to products

Over the course of my career I’ve done a variety of consulting projects, as both an employee and a freelancer. I’ve helped found and run a small consulting team. And, through my experience leading engineering teams, I’ve gained some experience of designing products and platforms. I’ve been involved in a few discussions, particularly over the last 12 months or so, around how to generate repeatable products off the back of consulting engagements.

I wanted to jot down a few thoughts here based on my own experience and a bit of background reading. I don’t claim to have any special insight or expertise, but the topic is one that I’ve encountered time and again. And as I’m trying to write things down more frequently, I thought I’d share my perspective in the hope that it may be useful to someone wrestling with the same issues.

Please comment if you disagree with anything. I’m learning too.

What are Products and Services?

Lets start with some definitions.

A service is a bespoke offering that typically involves a high level of expertise. In a consulting business you’re usually selling people or a team who have a particular set of skills that are useful to another organisation. While the expertise and skills being offered are common across projects, the delivery is usually highly bespoke and tailored to the needs of the specific client.

The outcomes of an engagement are also likely to be highly bespoke as you’re delivering to a custom specification. Custom software development, specially designed training packages, and research projects are all examples of services.

A product is a packaged solution to a known problem. A product will be designed to meet a particular need and will usually be designed for a specific audience. Products are often, but not always, software. I’m ignoring manufacturing here.

Products can typically be delivered rapidly, as they can be installed or delivered via a well-defined process. While products may be tailored for a specific client, they’re usually very well defined. Product customisation is usually a service in its own right. As is product support.

The Service-Product Spectrum

I think it’s useful to think about services and products as being at opposite ends of a spectrum.

At the service end of the spectrum your offerings are:

  • highly manual, because you’re reliant on expert delivery
  • difficult to scale, because you need to find people with the skills and expertise that are otherwise in short supply
  • low in repeatability, because you’re inevitably dealing with bespoke engagements

At the product end of the spectrum your offerings are:

  • highly automated, because you’re delivering a software product or following a well defined delivery process
  • scalable, because you need fewer (or at least different) skills to deliver the product
  • highly repeatable, because each engagement is well defined, has clear life-cycle, etc.

Products are a distillation of expertise and skills.

Actually, there’s arguably a stage before service. Let’s call those “capabilities” to

Posted at 17:15

June 18

Leigh Dodds: “The Wizard of the Wash”, an open data parable

The fourth

Posted at 15:31

June 15

Dublin Core Metadata Initiative: Deadline of 15 July for DC-2016 Presentations and Best Practice Poster tracks

2016-06-15, The deadline of 15 July is approaching for abstract submissions for the Presentations on Metadata track and the Best Practice Poster track for DC-2016 in Copenhagen. Both tracks provide metadata practitioners and researchers the opportunity to present their work in Copenhagen. Neither of the tracks requires a paper submission. Submit your proposal abstract for either track at http://dcevents.dublincore.org/index.php/IntConf/dc-2016/schedConf/cfp. Selections for presentation in Copenhagen will be made by the DC-2016 Organizing Team.

Posted at 23:59

Copyright of the postings is owned by the original blog authors. Contact us.