Planet RDF

It's triples all the way down

June 25

Bob DuCharme: Creating Wide CSV files with SPARQL

Lots of columns and commas, but all in the right place.
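
Only the teaser is syndicated here, so as a generic, hedged sketch of the technique the title points at (not necessarily the query from the post itself): a wide CSV usually comes from a SELECT with one variable per column, using OPTIONAL so that rows missing a property still appear, with empty cells:

    # Generic sketch, not necessarily the post's own query: one SELECT
    # variable per CSV column; OPTIONAL keeps rows whose properties are
    # absent, and unbound values serialize as empty cells in CSV output.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox ?homepage ?phone
    WHERE {
      ?person foaf:name ?name .
      OPTIONAL { ?person foaf:mbox ?mbox }
      OPTIONAL { ?person foaf:homepage ?homepage }
      OPTIONAL { ?person foaf:phone ?phone }
    }

Most SPARQL processors can serialize such a result table directly, since the SPARQL 1.1 Query Results CSV and TSV Formats are a W3C Recommendation.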

Posted at 14:47

June 23

Leigh Dodds: Lunchtime Lecture: “How you (yes, you) can contribute to open data”

The following is a written version of the lunchtime lecture I gave today at the Open Data Institute. I’ll put in a link to the video when it comes online. It’s not a transcript, I’m just writing down what I had planned to say.

Posted at 18:40

June 18

Sandro Hawke: Ridesharing 3.0: Forget About Uber

Uber, whatever its faults, provides value to its users, both the drivers and the riders. People appreciate or even enjoy the service, even if they don’t like the corporate behavior or economic disruption. Solutions proposed so far mostly include boycotts (in favor of taxis or competitors like Lyft) and legal action. But most of those solutions are pushing water uphill, because people actually like the service.

I have another solution: let’s rebuild the service without any major company involved. Let’s help software eat the world on behalf of the users, not the stockholders. In this post, I’ll explain a way to do it.  It’s certainly not trivial, and has some risks, but I think it’s possible and would be a good thing.

The basic idea is similar to this: riders post to social media describing the rides they want, and drivers post about the rides they are available to give.  They each look around their extended social graph for posts that line up with what they want, and also check for reasons to trust or distrust each other. That’s about it. You could do this today on Twitter, but it would take some decent software to make it pleasant and reliable, to make the user experience as good as with Uber.  To be clear: I’m aiming for a user experience similar to the Uber app; I’m proposing using social media as an underlying layer, not a UI.
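
To make this concrete, here is a hedged sketch of the matching step, assuming the posts carried structured data in an invented ride: vocabulary (an illustration only, not something the post specifies):

    # Illustration only: a hypothetical ride: vocabulary for posts by
    # riders (RideRequest) and drivers (RideOffer). Any client could
    # match requests to offers over the aggregated posts with a query
    # of this shape.
    PREFIX ride: <http://example.org/ridesharing#>
    SELECT ?request ?rider ?offer ?driver
    WHERE {
      ?request a ride:RideRequest ;
               ride:origin      ?from ;
               ride:destination ?to ;
               ride:postedBy    ?rider .
      ?offer   a ride:RideOffer ;
               ride:origin      ?from ;    # same origin...
               ride:destination ?to ;      # ...and same destination
               ride:postedBy    ?driver .
    }

The trust checks discussed below would just be further patterns over the same data: social graph distance, reviews, certifications.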

What’s deeply different in this model is that the provider of the software does not control the market. If I build this today and get millions of users, someone else can come along tomorrow with a slightly better interface or nicer ads, and the customers can move easily, even in the middle of a ride.  In particular, the upstart doesn’t need to convince all the riders and drivers to switch in order to bootstrap their system!  The market, with all its relationships and data, lives outside the ridesharing system. As a user, you wouldn’t even know what software the other riders and drivers are using, unless they choose to tell you.

With this approach, open source solutions would also be viable.  Then the competition could arise quite literally tomorrow, as someone just forks a product and makes a few small changes.

This is no fun for big money investors looking for their unicorn exit, but it’s great for end users.  They get non-stop innovation, and serious competition for their business.

There are many details, below, including some open issues.  The details span areas of expertise, so I’m sure I’ve gotten parts incomplete or wrong. If this vision appeals to you, please help fill it in, in comments or posts of your own.

Critical Mass

Perhaps the hardest problem with establishing any kind of multi-sided market is getting a critical mass of buyers and sellers.  Why would a rider use the system, if there are not yet any drivers?  Why would drivers bother using the software when there are no riders? Each time someone tries the system, they find no one else there and they go away unhappy.

In this case, however, I think we have some options.   For example, existing drivers, including taxi operators, could start to use the software while they’re doing the driving they already do, with minimum additional effort.  Reasons to do it, in addition to optimistically wanting to help bootstrap this: it could help them keep track of their work, and it could establish a track record for when riders start to show up.

Similarly, riders could start to use it in areas without drivers if they understand they’re priming the pump, helping establish demand, especially if there were some fun or useful self-tracking features.

Various communities and businesses, not in the ridesharing business, might benefit from promoting this system: companies who have a lot of employees commuting, large events where parking is a bottleneck, towns with traffic issues, etc.  In these cases, in a niche, it’s much easier to get critical mass, which can then spread outward.

Finally, there are existing ridesharing systems that might choose to play in this open ecosystem, either because their motivation is noncommercial (eg eco carpooling) or because they see a way to share the market and still make their cut (eg taxi companies).

Privacy

In the model as I’ve described it so far, there’s no privacy. If I want a ride from San Francisco to Sebastopol, the whole world could see that. My friends might ask, uncomfortably, what I was doing in Sebastopol that day. This is a tricky problem, and there might not be a perfect solution.

In the worst case, the system ends up viable only for the kinds of trips you’re fine making public, perhaps your commute to work, or your trip to an event you’re going to post about anyway. But we can probably do better than that.  I currently see two imperfect classes of solution:

  1. Trust some third party organizations, perhaps to act as information brokers, seeing all the posts from both sides and informing each when there is a match, possibly masking some details. Or perhaps they certify drivers, which gives them access to your data, with an enforceable contract they’ll use it for these purposes only.
  2. Trust people to act appropriately when given the right social cues and pressure: basically, use advisory access control, where anyone can see the data, but only after they clearly agree that they are acting as part of the ridesharing system and that they will only use the data for that purpose. There might be social or legal penalties for violating this agreement.

There might also be cryptographic solutions, perhaps as an application of homomorphic encryption, but I’m not yet aware of any results that would fully address this issue.

Personal Safety

When I was much younger, hitchhiking was common. If you wanted to go somewhere without having a car, you could stand on the side of the road and stick out your thumb. But there was some notion this might be dangerous for either party, and in some places it became illegal. (Plus there was that terrifying Rutger Hauer and C. Thomas Howell movie.) There have been a few stories of assaults by Uber drivers, and the company claims to carefully vet drivers. So how could this work without a company like Uber standing behind the drivers?

There are several approaches here that can all work together:

  1. Remote trust assessment. Each party should be able to see data on the other before agreeing to the ride. This might include social graph connections to the other person, reviews posted about the other person (and by whom), and official certifications about the other person (including even: the ride is from a licensed taxicab). When legally permissible this should, I think, even include information that might be viewed as discriminatory, like …

Posted at 14:30

June 16

Sandro Hawke: Back to Blogging

As I remember it, about ten years ago, I started this blog for one main reason. I had just watched a talk from the CTO of Second Life (remember them, when they were hot?) about his vision for how to expand by opening up the system, making it decentralized.  I thought to myself: that’s going to be really hard, but I’ve thought about it a lot; I should blog my ideas about it.

As I dug into the problem, however, I realized how many sub-problems I couldn’t really solve, yet. So I never posted.   (Soon thereafter, Cory Ondrejka left the Second Life project, moving on to run engineering at Facebook.  Not sure if that’s ironic.)

Now, what feels like several lifetimes later, I’m ready to try again.

This time the “industry darling” I want to tackle first is Uber.  Okay, it’s already become widely hated, but the valuation is still, shall we say, … considerable.

So, coming soon: how to decentralize Uber.


Posted at 18:32

June 13

AKSW Group - University of Leipzig: SANSA 0.2 (Semantic Analytics Stack) Released

The AKSW and Smart Data Analytics groups are happy to announce SANSA 0.2 – the second release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing for semantic technologies in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples format
  • Reading OWL files in various standard formats
  • Querying and partitioning based on Sparqlify
  • RDFS/RDFS Simple/OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs

Deployment and getting started:

  • There are template projects for SBT and Maven, for Apache Spark as well as for Apache Flink, available to get you started.
  • The SANSA jar files are on Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • There is example code for various tasks available.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE and Big Data Ocean.

SANSA Development Team

Posted at 16:18

June 12

AKSW Group - University of Leipzig: AKSW at ESWC 2017

Hello Community! ESWC 2017 just ended, and here is a short report on how the conference went, especially regarding the AKSW group.

Our members Dr. Muhammad Saleem, Dr. Mohamed Ahmed Sherif, Claus Stadler, Michael Röder, Prof. Dr. Jens Lehmann and Edgard Marx participated at the conference. They held a number of presentations, workshops and tutorials:

Michael Röder

Mohamed Ahmed Sherif

Muhammad Saleem

Edgard Marx

  • Presented a Workshop paper „Exploring the Evolution and Provenance of Git Versioned RDF Data“ by Natanael Arndt, Patrick Naumann and Edgard Marx
  • Presented a demo paper „Kbox – Distributing ready-to-query RDF Knowledge Graphs“ by Edgard Marx, Tommaso Soru, Ciro Baron Neto and Sandro Coelho

Claus Stadler

  • Presented a Workshop paper in QuWeDa „JPA Criteria Queries over RDF Data“ by Claus Stadler, Jens Lehmann

The final versions of the papers from Edgard and Claus will be made available soon.

As every year, ESWC also awarded the best papers and studies in several categories. The award for Best Challenge Paper went to “End-to-end Representation Learning for Question Answering with Weak Supervision” by Daniil Sorokin and Iryna Gurevych. The paper is part of the HOBBIT project by AKSW. Congrats to the winners!

Have a look at all the winners at ESWC 2017: http://2017.eswc-conferences.org/awards.

Posted at 08:53

June 11

Libby Miller: Outline a bitmap in Inkscape

I keep doing this for lasercuts but getting a double outline instead of a single outline, and so a double cut. This is because (…

Posted at 18:02

June 10

AKSW Group - University of Leipzig: Four papers accepted at WI 2017

Hello Community! We proudly announce that the International Conference on Web Intelligence (WI) accepted four papers by our group. WI takes place in Leipzig from the 23rd to the 26th of August. The accepted papers are:

“An Evaluation of Models for Runtime Approximation in Link Discovery” by Kleanthi Georgala, Michael Hoffmann, and Axel-Cyrille Ngonga Ngomo.

Abstract: Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different runtime models on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.

“CEDAL: Time-Efficient Detection of Erroneous Links in Large-Scale Link Repositories” by Andre Valdestilhas, Tommaso Soru and Axel-Cyrille Ngonga Ngomo.

Abstract: More than 500 million facts on the Linked Data Web are statements that link resources across knowledge bases. These links are of crucial importance for the Linked Data Web as they make a large number of tasks possible, including cross-ontology question answering and federated queries. However, a large number of these links are erroneous and can thus lead to these applications producing absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.
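
The paper’s approach is a dedicated graph partitioning algorithm; purely to illustrate the underlying heuristic (our sketch, not the authors’ code), the “semantics of URIs” idea can be approximated in SPARQL, under the common assumption that one knowledge base never mints two URIs for the same thing:

    # Sketch of the intuition only, not CEDAL itself: if an owl:sameAs
    # path connects two distinct URIs from the same namespace, some link
    # on that path is presumably erroneous.
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?x ?y
    WHERE {
      ?x (owl:sameAs|^owl:sameAs)+ ?y .   # follow links in both directions
      FILTER (?x != ?y)
      # same namespace: ?y starts with ?x's URI minus its local name
      FILTER (STRSTARTS(STR(?y), REPLACE(STR(?x), "[^/#]+$", "")))
    }

A plain engine evaluates this transitive closure slowly, which is exactly why the paper contributes an efficient, parallelizable partitioning algorithm instead.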

“LOG4MEX: A Library to Export Machine Learning Experiments” by Diego Esteves, Diego Moussallem, Tommaso Soru, Ciro Baron Neto, Jens Lehmann, Axel-Cyrille Ngonga Ngomo and Julio Cesar Duarte.

Abstract: A choice of the best computational solution for a particular task is increasingly reliant on experimentation. Even though experiments are often described through text, tables, and figures, their descriptions are often incomplete or confusing. Thus, researchers often have to perform lengthy web searches to reproduce and understand the results. In order to minimize this gap, vocabularies and ontologies have been proposed for representing data mining and machine learning (ML) experiments. However, we still lack proper tools to export this metadata correctly. To this end, we present an open-source library dubbed LOG4MEX, which aims to support the scientific community in filling this gap.

“GENESIS – A Generic RDF Data Access Interface” by Tim Ermilov, Diego Moussallem, Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo

Abstract: The availability of billions of facts represented in RDF on the Web provides novel opportunities for data discovery and access. In particular, keyword search and question answering approaches enable even lay people to access this data. However, the interpretation of the results of these systems, as well as the navigation through these results, remains challenging. In this paper, we present GENESIS, a generic RDF data access interface. GENESIS can be deployed on top of any knowledge base and search engine with minimal effort and allows for the representation of RDF data in a layperson-friendly way. This is facilitated by the modular architecture for reusable components underlying our framework. Currently, these include a generic search back-end, together with corresponding interactive user interface components based on a service for similar and related entities as well as verbalization services to bridge between RDF and natural language.

The final versions of the papers will be made available soon.

Come over to WI 2017 and enjoy the talks. More information on the program can be found here.

Posted at 13:01

June 07

Ebiquity research group UMBC: UMBC Seeks Professor of the Practice to Head new Data Science Program

The University of Maryland, Baltimore County is looking to hire a Professor of the Practice to head a new graduate program in Data Science. See the job announcement for more information and apply online at Interfolio.

In addition to developing and teaching graduate data science courses, the new faculty member will serve as the Graduate Program Director of UMBC’s program leading to a master’s degree in Data Science. This cross-disciplinary program is offered to professional students through a partnership between the College of Engineering and Information Technology; the College of Arts, Humanities and Social Sciences; the College of Natural and Mathematical Sciences; the Department of Computer Science and Electrical Engineering; and UMBC’s Division of Professional Studies.

Posted at 16:23

May 29

Bob DuCharme: Instead of writing SPARQL queries for Wikipedia--query for them!

Queries as data to help you get at more data.

Posted at 15:11

May 26

AKSW Group - University of Leipzig: AKSW Colloquium, 29.05.2017, Addressing open Machine Translation problems with Linked Data.

At the AKSW Colloquium, on Monday 29th of May 2017, 3 PM, Diego Moussallem will present two papers related to his topic: “Using BabelNet to Improve OOV Coverage in SMT” by Du et al., presented at LREC 2016, and “How to Configure Statistical Machine Translation with Linked Open Data Resources” by Srivastava et al., presented at AsLing 2016.

Posted at 11:51

May 25

Leigh Dodds: Where can you contribute to open data? Yes, you!

This is just a quick post to gather together some pointers and links that were shared in answer to a question I asked on twitter yesterday:

Posted at 17:42

May 19

Dublin Core Metadata Initiative: How to Design and Build Semantic Applications with Linked Data

2017-06-16, Registration for DC-2017 is now open at http://dcevents.dublincore.org/IntConf/index/pages/view/reg17. The International Conference takes place on Thursday through Saturday, 26-28 October and the DCMI Annual Meeting occurs on Sunday, 29 October. The full conference rate includes two days of presentations, papers, project reports, posters and special sessions and the third day of workshops. A day rate is available for all three conference days. DC-2017 in Washington DC is collocated in the same venue with the ASIS&T Annual Meeting that takes place from 27 October through 1 November. Special rates for the ASIS&T meeting are available to DCMI members. For more information and to register, visit the DC-2017 conference website at http://dcevents.dublincore.org/index.php/IntConf/dc-2017/schedConf/.

Posted at 23:59

Leigh Dodds: Can you publish tweets as open data?

Can you publish data from twitter as open data? The short answer is: No. Read on for some notes, pointers and comments.

Twitter’s developer policy places a number of restrictions on your use of their API and the data you get from it. Some of the key ones are:

  • In the …

Posted at 08:15

May 18

Leigh Dodds: Enabling data forensics

I’m interested in how people share information, particularly data, on social networks. I think it’s something to which it’s worth paying attention, so we can ensure that it’s easy for people to share insights and engage in online debates.

There’s lots of discussion at the moment around fact checking and similar ways that we can improve the ability to identify reliable and unreliable information online. But there may be other ways that we can make some small improvements in order to help people identify and find sources of data.

Data forensics is a term that usually …

Posted at 18:54

May 15

Ebiquity research group UMBC: Modeling and Extracting information about Cybersecurity Events from Text

Ph.D. Dissertation Proposal

Modeling and Extracting information about Cybersecurity Events from Text

Taneeya Satyapanich

Tuesday, 16 May 2017, ITE 325, UMBC

People rely on the Internet to carry out many of their daily activities, such as banking, ordering food and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models to store cybersecurity information and information extraction systems that can collect information to populate the models with data from text.

This dissertation will make two major contributions. The first is to extend our current cybersecurity ontologies with better models for relevant events, from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world targeted at U.S. financial institutions. The second is the design and implementation of an event extraction system that can extract information about cybersecurity events from text and populate a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction that detected human activity events from news and discussion forums. A new set of features and learning algorithms will be introduced to improve the performance and adapt the system to the cybersecurity domain. We believe that this dissertation will be useful for cybersecurity management in the future. It will quickly extract cybersecurity events from text and fill in the event ontology.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi

Posted at 17:09

Ebiquity research group UMBC: new paper: Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps, AAAI Spring Symposium on AI for Social Good, AAAI Press, March, 2017.

Climate change is an important social issue and the subject of much research, both to understand the history of the Earth’s changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990 the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. Presented in this paper are results of dynamic topic modeling to model the evolution of these climate change reports and their supporting research citations over a 30 year time period. Using this technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure such as disaster recovery.

Posted at 13:30

May 14

Ebiquity research group UMBC: Fact checking the fact checkers fact check metadata

TL;DR: Some popular fact checking sites are saying that false is true and true is false in their embedded metadata 

I’m a fan of the schema.org claimReview tags for rendering fact checking results as metadata markup embedded in the html that can be easily understood by machines. Google gave a plug for this last Fall and more recently announced that it has broadened its use of the fact checking metadata tags.  It’s a great idea and could help limit the spread of false information on the Web.  But its adoption still has some problems.

Last week I checked to see if the Washington Post is using schema.org’s ClaimReview in their Fact Checker pieces. They are (that’s great!) but WaPo seems to have misunderstood the semantics of the markup by reversing the reviewRating scale, with the result that it asserts the opposite of its findings. For an example, look at this Fact Checker article reviewing claims made by HHS Secretary Tom Price on the AHCA, which WaPo rates as being very false but gives a high reviewRating of 5 on their scale from 1 to 6. According to the schema.org specification, this means it’s mostly true, rather than false. ??

WaPo’s Fact Checker article ratings assign a checkmark for a claim they find true and from one to four ‘pinocchios’ for claims they find to be partially (one) or totally (four) false. They also give no rating for claims they find unclear and a ‘flip-flop’ rating for claims on which a person has been inconsistent. Their reviewRating metadata specifies a worstRating of 1 and a bestRating of 6. They apparently map a checkmark to 1 and ‘four pinocchios’ to 5. That is, their mapping is {-1: ‘unclear’, 1: ‘check mark’, 2: ‘1 pinocchio’, …, 5: ‘4 pinocchios’, 6: ‘flip-flop’}. It’s clear from the schema.org ClaimReview examples that a higher rating number is better, and it’s implicit that it is better for a claim to be true. So I assume that the WaPo Fact Checker should reverse its scale, with ‘flip-flop’ getting a 1, ‘four pinocchios’ mapped to a 2 and a checkmark assigned a 6.

WaPo is not the only fact checking site that has got this reversed. Aaron Bradley pointed out early in April that Politifact had its scale reversed as well. I checked last week and confirmed that this was still the case, as this example shows. I sampled a number of Snopes’ ClaimReview ratings and found that all of them were -1 on a scale of -1..+1, as in this example.
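
Since ClaimReview is ordinary schema.org markup, this kind of audit can be automated. As a hedged sketch (my query for illustration, not one from any of these sites): after extracting the embedded metadata into RDF, normalize each rating to a 0–1 scale and compare the extremes against the textual verdict:

    # Sketch of an automated audit: normalize every reviewRating to 0..1
    # so scales of different widths are comparable. A claim whose textual
    # verdict (alternateName) says "False" but whose normalized score is
    # near 1 has probably reversed its scale.
    PREFIX schema: <http://schema.org/>
    PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
    SELECT ?review ?claim ?verdict ?normalized
    WHERE {
      ?review a schema:ClaimReview ;
              schema:claimReviewed ?claim ;
              schema:reviewRating  ?rating .
      ?rating schema:ratingValue ?value ;
              schema:bestRating  ?best ;
              schema:worstRating ?worst .
      OPTIONAL { ?rating schema:alternateName ?verdict }
      BIND ((xsd:decimal(?value) - xsd:decimal(?worst)) /
            (xsd:decimal(?best) - xsd:decimal(?worst)) AS ?normalized)
    }
    ORDER BY DESC(?normalized)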

It’s clear how this mistake can happen. Many fact checking sites are motivated by identifying false facts, so they have native scales that run from the mundane (a true statement) to the brazen and outrageous (completely false). So the mistake of directly mapping this linear scale onto a numeric one from low to high is not completely surprising.

While the fact checking sites that have made this mistake are run by dedicated and careful investigators, the same care has not yet been applied to implementing the semantic metadata embedded in the pages of their sites.

Posted at 03:26

May 13

Ebiquity research group UMBC: New paper: A Question and Answering System for Management of Cloud Service Level Agreements

Sudip Mittal, Aditi Gupta, Karuna Pande Joshi, Claudia Pearce and Anupam Joshi, A Question and Answering System for Management of Cloud Service Level Agreements,  IEEE International Conference on Cloud Computing, June 2017.

One of the key challenges faced by consumers is to efficiently manage and monitor the quality of cloud services. To manage service performance, consumers have to validate rules embedded in cloud legal contracts, such as Service Level Agreements (SLA) and Privacy Policies, that are available as text documents. Currently this analysis requires significant time and manual labor and is thus inefficient. We propose a cognitive assistant that can be used to manage cloud legal documents by automatically extracting knowledge (terms, rules, constraints) from them and reasoning over it to validate service performance. In this paper, we present this Question and Answering (Q&A) system that can be used to analyze and obtain information from the SLA documents. We have created a knowledgebase of Cloud SLAs from various providers which forms the underlying repository of our Q&A system. We utilized techniques from natural language processing and semantic web (RDF, SPARQL and Fuseki server) to build our framework. We also present sample queries on how a consumer can compute metrics such as service credit.
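
The paper’s ontology isn’t reproduced in this summary, so as a purely hypothetical illustration (the sla: vocabulary below is invented): a consumer question like “what service credit applies if monthly uptime drops below the guaranteed threshold?” might compile to a SPARQL query of this shape against the SLA knowledge base:

    # Hypothetical sketch with an invented sla: vocabulary, not the
    # paper's actual ontology: find the credit rule attached to the
    # monthly-uptime guarantee in each provider's agreement.
    PREFIX sla: <http://example.org/cloud-sla#>
    SELECT ?provider ?threshold ?credit
    WHERE {
      ?agreement a sla:ServiceLevelAgreement ;
                 sla:provider ?provider ;
                 sla:hasRule  ?rule .
      ?rule sla:metric        sla:MonthlyUptimePercentage ;
            sla:threshold     ?threshold ;
            sla:serviceCredit ?credit .
    }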

Posted at 17:56

May 11

AKSW Group - University of Leipzig: SML-Bench 0.2 Released

Dear all,

we are happy to announce the 0.2 release of SML-Bench, our Structured Machine Learning benchmark framework. SML-Bench provides full benchmarking scenarios for inductive supervised machine learning covering different knowledge representation languages like OWL and Prolog. It already comes with adapters for prominent inductive learning systems like the DL-Learner, the General Inductive Logic Programming System (GILPS), and Aleph, as well as Inductive Logic Programming ‘classics’ like Golem and Progol. The framework is easily extensible, be it in terms of new benchmarking scenarios, or support for new learning systems. SML-Bench lets users define, run and report on benchmarks combining different scenarios and learning systems, giving insight into the performance characteristics of the respective inductive learning algorithms on a wide range of learning problems.

Website: http://sml-bench.aksw.org/
GitHub page: https://github.com/AKSW/SML-Bench/
Change log: https://github.com/AKSW/SML-Bench/releases/tag/0.2

In the current release we extended the options to configure learning systems in the overall benchmarking configuration, and added support for running multiple instances of a learning system, as well as the nesting of instance-specific settings and settings that apply to all instances of a learning system. Besides internal refactoring to increase the overall software quality, we also extended the reporting capabilities of the benchmark results. We added a new benchmark scenario and experimental support for the Statistical Relational Learning system TreeLiker.

We want to thank everyone who helped to create this release and appreciate any feedback.

Best regards,

Patrick Westphal, Simon Bin, Lorenz Bühmann and Jens Lehmann

Posted at 11:01

May 08

Leigh Dodds: Adventures in geodata

I spend a lot of my professional life giving people advice. Mostly around how to publish and use open data. In order to make sure I give people the best advice I can, I try and spend a lot of time actually publishing and using open data. A mixture of research and practical work is the best way I’ve found of improving my own …

Posted at 20:28

AKSW Group - University of Leipzig: AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

At the AKSW Colloquium, on Monday 8th of May 2017, 3 PM, Lorenz Bühmann will discuss a paper titled “Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching” of Kim et al. Presented at WWW 2017, this work proposes a scalable query processing approach on RDF data that relies on early and aggressive determination and pruning of query-irrelevant data. The paper describes ongoing work as part of the RAPID+ platform project.

Abstract

Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides the traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on the types can be used to rewrite queries into more efficient ones. However, such optimizations are only applicable in strongly-typed data and query models which make it a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight to improving the scalability of RDF processing systems.

In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.
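
A toy illustration of the general idea (ours, not the paper’s formalism): when the ontology constrains which types can occur in a pattern’s positions, the engine can discard all other data before matching.

    # Toy example of type-based pruning, not the paper's notation: if the
    # ontology declares the rdfs:domain of ex:hasAuthor to be ex:Article,
    # the engine only needs to consider subjects inferable as ex:Article
    # while matching this pattern, and can prune the rest of the dataset
    # early rather than during the join.
    PREFIX ex: <http://example.org/>
    SELECT ?doc ?author ?title
    WHERE {
      ?doc ex:hasAuthor ?author ;
           ex:title     ?title .
    }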

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 07:42

April 23

Bob DuCharme: The Wikidata data model and your SPARQL queries

Reference works to get you taking advantage of the fancy parts quickly.

Posted at 14:43

April 21

Dublin Core Metadata Initiative: Webinar: Me4MAP - A method for the development of metadata application profiles

2017-04-21, A metadata application profile (MAP) is a construct that provides a semantic model for enhancing interoperability when publishing data to the Web of Data. When a community of practice agrees to follow a MAP's set of rules for publishing data as Linked Open Data, it makes it possible for such data to be processed automatically by software agents. Therefore, the existence of a method for MAP development is essential to providing developers with a common ground on which to work. The absence of such a method leads to a non-systematic set of MAP development activities that frequently results in MAPs of lesser quality. This Webinar with Mariana Curado Malta, Polytechnic of Oporto, Portugal, will present Me4MAP, a method for the development of metadata application profiles. The webinar will be presented twice, once in English and once in Portuguese. For more information about the webinar and to register, visit http://dublincore.org/resources/training/#2017Malta.

Posted at 23:59

Dublin Core Metadata Initiative: NKOS Workshop at DC-2017 in Washington, DC

2017-04-21, The 11th U.S. Networked Knowledge Organization Systems (NKOS) Workshop will take place on Saturday, October 28 as part of DC-2017 in Crystal City, VA (Washington, D.C.). The Call for Participation including presentations and demos is available at http://dcevents.dublincore.org/IntConf/index/pages/view/nkosCall.

Posted at 23:59

April 19

AKSW Group - University of Leipzig: ESWC 2017 accepted two Demo Papers by AKSW members

Hello Community! The 14th ESWC, which takes place from May 28th to June 1st 2017 in Portoroz, Slovenia, accepted two demos to be presented at the conference. Read more about them in the following:

1. “KBox – Distributing Ready-to-query RDF Knowledge Graphs” by Edgard Marx, Ciro Baron, Tommaso Soru and Sandro Athaide Coelho

Abstract: The Semantic Web community has successfully contributed to a remarkable number of RDF datasets published on the Web. However, to use and build applications on top of Linked Data is still a cumbersome and time-demanding task. We present KBox, an open-source platform that facilitates the distribution and consumption of RDF data. We show the different APIs implemented by KBox, as well as the processing steps from a SPARQL query to its corresponding result. Additionally, we demonstrate how KBox can be used to share RDF knowledge graphs and to instantiate SPARQL endpoints.

Please see: https://www.researchgate.net/publication/315838619_KBox_Distributing_Ready-to-query_RDF_Knowledge_Graphs

and

https://www.researchgate.net/publication/305410480_KBox_–_Transparently_Shifting_Query_Execution_on_Knowledge_Graphs_to_the_Edge

2. “EAGLET – All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

Abstract: The desideratum to bridge the unstructured and structured data on the web has led to the advancement of a considerable number of annotation tools, and the evaluation of these Named Entity Recognition and Entity Linking systems is incontrovertibly one of the primary tasks. However, these evaluations are mostly based on manually created gold standards. As much as these gold standards have the upper hand of being created by a human, they also leave room for a major proportion of oversights. We will demonstrate EAGLET, a tool that supports the semi-automatic checking of a gold standard based on a set of uniform annotation rules.

Please also see: https://svn.aksw.org/papers/2017/ESWC_EAGLET_2017/public.pdf

Posted at 08:19

April 08

Ebiquity research group UMBC: Google search now includes schema.org fact check data

Google claims on their search blog that “Fact Check now available in Google Search and News”.  We’ve sampled searches on Google and found that some results did indeed include Fact Check data from schema.org’s ClaimReview markup.  So we are including the following markup on this page.

    
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ClaimReview",
      "datePublished": "2016-04-08",
      "url": "http://ebiquity.umbc.edu/blogger/2017/04/08/google-search-now-
              including-schema-org-fact-check-data",
      "itemReviewed":
      {
        "@type": "CreativeWork",
        "author":
        {
          "@type": "Organization",
          "name": "Google"
        },
        "datePublished": "2016-04-07"
      },
      "claimReviewed": "Fact Check now available in Google search and news",
      "author":
      {
        "@type": "Organization",
        "Name": "UMBC Ebiquity Research Group",
        "url": "http://ebiquity.umbc.edu/"
      },
      "reviewRating":
      {
        "@type": "Rating",
        "ratingValue": "5",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName" : "True"
      }
    }</script>

Google notes that

“Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion. Finally, the content must adhere to the general policies that apply to all structured data markup, the Google News Publisher criteria for fact checks, and the standards for accountability and transparency, readability or proper site representation as articulated in our Google News General Guidelines. If a publisher or fact check claim does not meet these standards or honor these policies, we may, at our discretion, ignore that site’s markup.”

and we hope that the algorithms will find us to be an authoritative source of information.

You can see the actual markup by viewing this page’s source or looking at the markup that Google’s structured data testing tool finds on it here by clicking on ClaimReview in the column on the right.

Update: We’ve been algorithmically determined to be an authoritative source of information!

Posted at 14:39

April 07

AKSW Group - University of Leipzig: AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

At the AKSW Colloquium, on Monday 10th of April 2017, 3 PM, Matthias Wauer will discuss a paper titled “Ontop of Geospatial Databases“. Presented at ISWC 2016, this work extends an ontology based data access (OBDA) system with support for GeoSPARQL for querying geospatial relational databases. In the evaluation section, they compare their approach to Strabon. The work is partially supported by the Optique and Melodies EU projects.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 08:43

April 02

W3C Read Write Web Community Group: Read Write Web — Q1 Summary — 2017

Summary

A quiet start to 2017 as people prepare for WWW 2017 and ESWC. An active political quarter saw the inauguration of a new US president, and numerous concerns raised about new laws regarding privacy at the ISP level.

The Linked Open Data cloud continues to grow and has a neat update here.  There has also been a release of the SHACL playground which allows data to be validated according to various “shapes“.

Linked Data Notifications has become a Proposed Recommendation, and will allow users of the web to have a data inbox, and enable a whole host of use cases.

Communications and Outreach

Collaboration has begun with two cloud providers, nextcloud and cozy cloud. Hopefully this will bring read and write web standards to a wider audience over time.

Community Group

Some ideas for extending the way PATCH works have been described by TimBL. I found the way data can be transmitted over protocols other than the web interesting:

– When clients listening to the same resource are in fact physically close, they could exchange patches through another medium like wifi or bluetooth.

– The system can evolve (under stress) to work entirely with distributed patches, making the original HTTP server unnecessary

– The patches could be combined with hashes of versions of folders to form the basis for a git-like version control system, or connect to git itself (a minimal sketch of such a patch follows)
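
A minimal sketch of such a patch (my example of one possible concrete form, not TimBL’s actual proposal), expressed as a SPARQL UPDATE that a server could apply or peers could relay to each other:

    # One possible concrete form of a patch: a SPARQL UPDATE replacing a
    # single value. A sequence of these, each tied to a hash of the
    # resource version it applies to, behaves like commits in git.
    PREFIX ex: <http://example.org/>
    DELETE DATA { <https://alice.example/profile#me> ex:status "away" } ;
    INSERT DATA { <https://alice.example/profile#me> ex:status "online" }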

solid

Applications

There is a new test website for the OpenID authentication branch of node solid server, and solid client has been updated to work with it. There have been various fixes to rdf and solid libraries, and two new repositories for solid notifications and solid permissions.

Good work has continued on rabel, a program for reading and writing linked data in various formats. In addition, the browser-shimmed apps built on solid-ui and solid-app-set continue to improve. Finally, *shameless plug*, I am writing a gitbook on a skinned version of node solid server, bitmark storage, which hopes to integrate solid with crypto currencies, creating self-funding storage.

Last but not Least…

On the topic of crypto currencies, I’m very excited about a draft paper released on semantic block chains. There was some buzz generated around this topic, and it will hopefully feature in a workshop next quarter.

Posted at 11:04

Copyright of the postings is owned by the original blog authors. Contact us.