Planet RDF

It's triples all the way down

August 14

Tetherless World Constellation group RPI: WebSci ’17

The Web Science Conference was hosted by Rensselaer Polytechnic Institute this year. The Tetherless World Constellation was heavily involved in organizing the event and ensuring the conference ran smoothly. The venue for the conference was the Franklin Plaza in downtown Troy. It was a great venue, with a beautiful rooftop.

On 25th June, a set of workshops was organized for the attendees. I was a student volunteer at the “Algorithm Mediated Online Information Access (AMOIA)” workshop. We started the day off with a set of talks, whose common theme was reducing bias in the services we use online. We then spent the next few hours in a discussion on the “Role of recommendation algorithms in online hoaxes and fake news.”

Prof. Peter Fox and Prof. Deborah McGuinness, who were the main conference chairs, kicked off the conference on 26th June. Steffen Staab gave his keynote talk on “The Web We Want”. After the keynote talk, we jumped right into a series of talks. A few topics caught my attention during each session. Venkata Rama Kiran Garimella’s talk on “The Effect of Collective Attention on Controversial Debates on Social Media” was very interesting, as was the talk on “Recommendations for groups in location-based social networks” by Fred Ayala. We ended the talks with a panel discussion on “The ethics of doing Web Science”. After the panel discussion, we headed to the roof for some dinner and the Web Science Poster Session. There were plenty of posters at the session. Congrui Li and Spencer Norris from TWC presented their work at the poster session.


27th of June was the day of the conference I was most looking forward to, since it had a session on “Networks: Structure, Identifiers, Search”. I found all the talks presented here fascinating and useful, particularly “Hierarchical Change Point Detection” by Yu Wang and “Adaptive Edge Probing” by Sucheta Soundarajan. I plan to use the work they presented in one of my current research projects. At the end of the day on 27th June, the awards for the papers and posters were presented. Helena Webb won the best paper award. She presented her work on “The ethical challenges of publishing Twitter data for research dissemination”. Venkata Garimella won the best student paper award. Tetherless’ own Spencer Norris won the best poster award.

On 28th June, we started the day off with a set of talks on the topic chosen for the Hackathon, “Network Analysis for Non-Social Data”. Here I presented my work on how network analysis techniques can be leveraged and applied in the field of Earth Science. After these talks, the hackathon presentations were made by the participants. At lunch, Ahmed Eliesh from TWC won first place in the Hackathon. After lunch, we had the last 2 sessions at WebSci ’17. Among these, Shawn Jones’ talk presenting Yasmin Alnomany’s work on “Generating Stories from Archived Collections” and Helena Webb’s best-paper-winning talk on “The ethical challenges of publishing Twitter data for research dissemination” piqued my interest.

Overall, attending the Web Science Conference was a very valuable experience for me. There was plenty to learn, lots of networking opportunities and a generally jovial atmosphere around the conference. Here’s looking forward to next year’s conference in Amsterdam.


Posted at 21:01

August 07

Leigh Dodds: Bath Playbills 1812-1851

This weekend I published scans of over 2000 historical playbills for the Theatre Royal in Bath. Here are some notes on where they come from and how they might be useful.

The scans are

Posted at 06:39

August 01

Leigh Dodds: We can strengthen data infrastructure by analysing open data

Posted at 10:13

July 31

Leigh Dodds: Experiences with the Freestyle Libre

Posted at 20:01

Leigh Dodds: Thank you for the data

Here are three anecdotes that show ways in which I’ve shared data with different types of organisation, and how they’ve shared data with me.

Last year we donated some old children’s toys and books to Julian House. When we dropped them off, I signed a

Posted at 16:11

Libby Miller: Libbybot eleven – webrtc / pi3 / presence robot

The libbybot posable presence robot’s latest instructions are 

Posted at 09:32

July 30

Bob DuCharme: The W3C standard constraint language for RDF: SHACL

A brief history of the new standard and some toys to play with it.
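(As a taste of the standard, here is a minimal shape, not taken from Bob’s post; the ex: namespace is invented. It says that every ex:Person must have exactly one string-valued ex:name.)

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <http://example.org/> .

    # A SHACL node shape targeting all instances of ex:Person.
    ex:PersonShape
        a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:name ;
            sh:datatype xsd:string ;
            sh:minCount 1 ;
            sh:maxCount 1
        ] .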

Posted at 15:46

AKSW Group - University of Leipzig: AKSW at ISWC2017

We are very pleased to announce that AKSW will be presenting 2 papers at ISWC 2017, which will be held on 21-24 October in Vienna, Austria. The accepted demo and workshop papers are still to be announced.
The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of public institutions.

Here is the list of the accepted papers with their abstracts:

“Distributed Semantic Analytics using the SANSA Stack” by Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Ivan Ermilov, Simon Bin, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo and Hajira Jabeen.

Abstract: A major research challenge is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and reasoning. Analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases, and most analytics approaches which do scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input. This software framework paper describes the ongoing Semantic Analytics Stack (SANSA) project, which supports expressive and scalable semantic analytics by providing functionality for distributed computing on RDF data.

“Iguana: A Generic Framework for Benchmarking the Read-Write Performance of Triple Stores” by Felix Conrads, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, and Mohamed Morsey.

Abstract: The performance of triple stores is crucial for applications which rely on RDF data. Several benchmarks have been proposed that assess the performance of triple stores. However, no integrated benchmark-independent execution framework for these benchmarks has been provided so far. We propose a novel SPARQL benchmark execution framework called IGUANA. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading, data updates as well as under different loads. Moreover, it allows a uniform comparison of results on different benchmarks. We execute the FEASIBLE and DBPSB benchmarks using the IGUANA framework and measure the performance of popular triple stores under updates and parallel user requests. We compare our results with state-of-the-art benchmarking results and show that our benchmark execution framework can unveil new insights pertaining to the performance of triple stores.

Thank you, and we look forward to seeing you at ISWC 2017.

Acknowledgments
This work was supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227), the European Union’s H2020 research and innovation program BigDataEurope (GA no. 644564), the German Ministry BMWI under the SAKE project (Grant No. 01MD15006E), the WDAqua Marie Skłodowska-Curie Innovative Training Network, and the Industrial Data Space.

Posted at 03:57

July 26

Leigh Dodds: “The Rock Thane”, an open data parable

In a time long past, in a land far away, there was once a troubled kingdom. Despite the efforts of the King to offer justice freely to all, many of his subjects were troubled by unscrupulous merchants and greedy landowners. Time and again, the King heard claims of goods not being delivered, or disputes over land.

While the merchants and landowners were able to produce documents and affidavits in their defence, the King grew increasingly troubled. He felt that his subjects were being wronged, and he grew distrustful of the scribes that thronged the hallways of his courts and marketplaces.

One day, three wizards visited the kingdom. The wizards had travelled from the Far East, where as Masters of the Satoshi School, they had developed many curious spells. The three wizards were brothers. Na was the youngest, and was made to work hardest by his elder brothers, Ka and Mo. Mo, the eldest, was versed in many arts still unknown to his brothers.

Their offer to the King was simple: through the use of their magic they would remove all corruption from his lands. In return they would expect to be well paid for their efforts. Keen to be a just and respected ruler, the King agreed to the wizards’ plan. But while their offer was simple, the plan itself was complex.

The wizards explained that, through an obscure art, they could cause words and images to appear within a certain type of rock, or crystal which could be found commonly throughout the land. Once imbued with words, a crystal could no longer be changed even by a powerful wizard. In a masterful show of power, Ka and Mo embedded the King’s favourite poem and then a painting of his mother in a pair of crystals of the highest quality.

The wizards explained that rather than relying on parchment which could be faked or changed through the cunning application of pumice stones, they could use inscribed crystals to create indelible records of trading bills, property sales and other important documents.

The wizards also demonstrated to the King how, by channelling the power of their masters, groups of their acolytes could simultaneously record the same words in crystals all across the land. This meant that not only would there be an indisputable record of a given trade, but that there would immediately be dozens of copies available across the land, for anyone to check. Readily available and verifiable copies of any bill of trade would mean that no merchant could ever falsify a transaction.

In payment, the wizards would receive a gold piece for every crystal inscribed by their acolytes. Each crystal providing a clear proof of their works.

Impressed, the King decreed that henceforth, all across his lands, trading would now be carried out in trading posts staffed by teams of the wizards’ acolytes.

And, for a time, everything was fine.

But the King began to again receive troubling reports about trading disputes. Trust was failing once again. Speaking to his advisers and visiting some of the new trading posts, the King learned the source of the concerns.

When trading bills had been written on parchment, they could be read by anyone. This made them accessible to all. But only the wizards and their acolytes could read the words inscribed in the crystals. And the King’s subjects didn’t trust them.

Demanding an explanation, the King learnt that Na, the youngest wizard, had been tasked with providing the power necessary to inscribe the crystals. Not as versed in the art as his elder brothers, he was only able to inscribe the crystals with a limited number of words and only the haziest of images. Rather than inscribing easily readable bills of trade, Na and the acolytes were making inscriptions in a cryptic language known only to wizards.

Anyone wanting to read a bill had to request an acolyte to interpret it for them. Rumours had been spreading that the acolytes could be paid to interpret the runes in ways that were advantageous to those with sufficient coin.

The middle brother, Ka, attempted to placate the enraged King by proposing an alternative arrangement. He would oversee the inscribing of the crystals in the place of his brother. Skilled in additional spells, Ka proposed that the crystals would no longer be inscribed with runes describing the bills of sale. Instead each crystal would simply hold the number of a page in a magical book. Each Book of Bills would hold an infinite number of pages. And, when a sale was made, one acolyte would write the bill into a fresh page of a Book, whilst another would inscribe the page number into a crystal. As before, across the land, other acolytes would simultaneously inscribe copies of the bills into other crystals and other copies of the Book.

In this way, anyone wanting to read a bill of sale could simply ask a Book of Bills to turn to the page they needed. Anyone could then read from the book. But the crystals themselves would remain the ultimate proof of the trade. While someone might have been able to fake a copy of a Book, no-one could fake one of the crystals.

Grudgingly accepting this even more complex arrangement, the King was briefly satisfied. Until the accident.

One day, the wizard Ka visited the Craggy Valley, to forage for the rare Ipoh herb, which was known to grow in that part of the Kingdom. However, in a sudden fog, the wizard slipped and fell to his doom. And at the moment of his death, all of the wizard’s spells were undone. In a blink of an eye, all of the magical Books of Bills disappeared. Along with every proof of trade.

Enraged once more, the King gave the eldest wizard one more opportunity to deliver. Mo reassured the King that his power was far greater and that he was uniquely able to deliver on his late brother’s promise. Mo explained that through various dark arts he was able to resist death. He demonstrated his skill to the King, recklessly drinking terrible poisons, and throwing himself from a high tower only to land unharmed. Stunned at this show of power, the King agreed that Mo could take up his brother’s task.

For a few months, the turmoil was resolved, until fresh reports of corruption began to spread.

A dismayed King granted an audience to a retinue of merchants who had travelled from all across his kingdom. The merchants claimed to have evidence that discrepancies had begun to appear in the Books of Bills. In different towns and cities the Books showed slightly different numbers. There was also talk of a strange, shadowy figure who had been present at many of the trading posts in which discrepancies had been found.

Troubled, the King sent out soldiers to set watch on the trading posts, giving orders that they should attempt to capture and bring this stranger to the court.

Many weeks of waiting and watching passed. More evidence of corrupted Books of Bills continued to appear. Challenged to explain the allegations, Mo scoffed at the evidence. The wizard suggested that the problem was illiterate merchants, asserting that his acolytes were above suspicion.

But finally the king’s soldiers captured the shadowy stranger, and his identity was revealed.

While Mo was the oldest of the three wizards, he was not the eldest brother. There was a fourth, named To. Much older than his brothers, To had been stripped of his riches and banished for studying certain forbidden arts. It was from their brother that Na, Ka and Mo had learned many of their spells, including the arts of inscribing crystals and books, and the means of channelling their powers through acolytes.

Except To had not taught them everything. He had kept many secrets for himself and was able to corrupt the spells used to inscribe the crystals and Books. He was able to change page numbers to refer to other pages which he had inscribed with different words. He had been selling his skills to unscrupulous merchants in an attempt to grow rich once again.

Sickened of wizards and their complicated schemes, the King banished them from his kingdom, never to return.

The King then turned to the task of once more building trust in commerce across his land. He did this not by trusting in magics and complex schemes, but by addressing the problems with which he was originally faced. He decreed the founding of a guild, to create a cadre of trusted, reliable scribes. He appointed new ombudsmen and magistrates across the land, to help oversee and administer all forms of trade. He founded libraries and reading rooms to increase literacy amongst his subjects, so that more of them could read and write their own bills of trade. And he offered free use of the courts to all, so that none were denied an opportunity to seek justice.

Many years passed before the King and his kingdom worked through their troubles. But in the history books, the King was forever known as “The Rock Thane”.


Read the previous open data parables:

Posted at 20:20

July 17

Ebiquity research group UMBC: PhD defense: Deep Representation of Lyrical Style and Semantics for Music Recommendation

Dissertation Defense

Deep Representation of Lyrical Style and Semantics for Music Recommendation

Abhay L. Kashyap

11:00-1:00 Thursday, 20 July 2017, ITE 346

In the age of music streaming, the need for effective recommendations is important for music discovery and a personalized user experience. Collaborative filtering based recommenders suffer from popularity bias and cold-start, which are commonly mitigated by content features. For music, research on content-based methods has mainly focused on the acoustic domain, while lyrical content has received little attention. Lyrics contain information about a song’s topic and sentiment that cannot be easily extracted from the audio. This is especially important for lyrics-centric genres like Rap, which was the most streamed genre in 2016. The goal of this dissertation is to explore and evaluate different lyrical content features that could be useful for content, context and emotion based models for music recommendation systems.

With Rap as the primary use case, this dissertation focuses on featurizing two main aspects of lyrics: the artistic style of composition and the semantic content. For lyrical style, a suite of high-level rhyme density features is extracted, in addition to literary features like the use of figurative language, profanity and vocabulary strength. In contrast to these engineered features, Convolutional Neural Networks (CNN) are used to automatically learn rhyme patterns and other relevant features. For semantics, lyrics are represented using both traditional IR techniques and the more recent neural embedding methods.

These lyrical features are evaluated for artist identification and compared with artist and song similarity measures from a real-world collaborative filtering based recommendation system from Last.fm. It is shown that both rhyme and literary features serve as strong indicators to characterize artists, with feature learning methods like CNNs achieving comparable results. For artist and song similarity, a strong relationship was observed between these features and the way users consume music, while neural embedding methods significantly outperformed LSA. Finally, this work is accompanied by a web application, Rapalytics.com, that is dedicated to visualizing all these lyrical features and has been featured on a number of media outlets, most notably Vox, attn: and Metro.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Cynthia Matuszek and Pranam Kolari (Walmart Labs)

Posted at 01:38

July 13

Ebiquity research group UMBC: PhD Proposal: Analysis of Irregular Event Sequences using Deep Learning, Reinforcement Learning, and Visualization

Analysis of Irregular Event Sequences using Deep Learning, Reinforcement Learning, and Visualization

Filip Dabek

11:00-1:00 Thursday 13 July 2017, ITE 346, UMBC

History is nothing but a catalogued series of events organized into data. Amazon, the largest online retailer in the world, processes over 2,000 orders per minute. Orders come from customers on a recurring basis through subscriptions or as one-off spontaneous purchases, resulting in each customer exhibiting their own behavioral pattern when it comes to the way in which they place orders throughout the year. For a company such as Amazon, which generates over $130 billion of revenue each year, understanding and uncovering the hidden patterns and trends within this data is paramount in improving the efficiency of their infrastructure, ranging from the management of the inventory within their warehouses to the distribution of their labor force and the preparation of their online systems for the load of users. With the ever-increasing availability of big data, problems such as these are no longer limited to large corporations but are experienced across a wide range of domains and faced by analysts and researchers each and every day.

While many event analysis and time series tools have been developed for the purpose of analyzing such datasets, most approaches tend to target clean and evenly spaced data. When faced with noisy or irregular data, it has been recommended to undergo a pre-processing step of converting and transforming the data into being regular. This transformation arguably interferes at a fundamental level with how the data is represented, and may irrevocably bias the way in which results are obtained. Therefore, operating on raw data, in its noisy natural form, is necessary to ensure that the insights gathered through analysis are accurate and valid.

In this dissertation, novel approaches are presented for analyzing irregular event sequences using a variety of techniques ranging from deep learning and reinforcement learning to visualization. We show how common tasks in event analysis can be performed directly on an irregular event dataset without requiring a transformation that alters the natural representation of the process that the data was captured from. The three tasks that we showcase include: (i) summarization of large event datasets, (ii) modeling the processes that create events, and (iii) predicting future events that will occur.

Committee: Drs. Tim Oates (Chair), Jesus Caban, Penny Rheingans, Jian Chen, Tim Finin


Posted at 02:55

July 12

Leigh Dodds: Data is infrastructure, so it needs a design manual

Posted at 12:07

July 07

Gregory Williams: Features for SPARQL 1.2

Jindřich Mynarz recently posted a good list of “What I would like to see in SPARQL 1.2” and I thought I’d add a few comments as well as some of my own wished-for features.

Explicit ordering in GROUP_CONCAT, and quads support for both the HTTP Graph Store Protocol and CONSTRUCT queries (items 2, 5, and 8 in Jindřich’s list) seem like obvious improvements to SPARQL with a clear path forward for semantics and implementation.
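To make the GROUP_CONCAT point concrete, here is a sketch. SPARQL 1.1 leaves the concatenation order implementation-defined; the ORDER BY inside the aggregate below is hypothetical 1.2-style syntax for the wished-for feature, not something current engines accept:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # Collapse each person's names into one comma-separated string.
    # The ORDER BY clause inside the aggregate is a hypothetical
    # extension; everything else is standard SPARQL 1.1.
    SELECT ?person (GROUP_CONCAT(?name; SEPARATOR=", " ORDER BY ?name) AS ?names)
    WHERE { ?person foaf:name ?name }
    GROUP BY ?person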

Here are some of the other wished-for features:

  • Explicitly specify the REDUCED modifier (#1)

    As an implementor, I quite like the fact that REDUCED is “underspecified.” It allows optimization opportunities that are much cheaper than a full DISTINCT would be, while still reducing result cardinality. I think it’s unfortunate that REDUCED hasn’t seen much use over the years, but I’m not sure what a better-specified REDUCED operator would do differently from DISTINCT (see the sketch after this list).

  • Property path quantifiers (#3)

    The challenge of supporting path quantifiers like elt{n,m} is figuring out what the result cardinality should be. The syntax for this was standardized during the development of SPARQL 1.1, but we couldn’t find consensus on whether elt{n,m} should act like a translation to an equivalent BGP/UNION pattern or like the arbitrary length paths (which do not introduce duplicate results). For small values of n and m, the translation approach seems natural, but as they grow, it’s not obvious that use cases would only want the translation semantics and not the non-duplicating semantics.

    Perhaps a new syntax could be developed which would allow the query author to indicate the desired cardinality semantics (see the sketch after this list).

  • Date time/duration arithmetic functions (#6)

    This seems like a good idea, and very useful to some users, though it would substantially increase the size and number of the built-in functions and operators.

  • Support for non-scalar-producing aggregates (#9)

    I’m interested to see how this plays out as a SPARQL extension in systems like Stardog. It likely has a lot of interesting uses, but I worry that it would greatly complicate the query and data models, leading to calls to extend the semantics of RDF, and add new query forms, operators, and functions.

  • Structured serialization format for SPARQL queries (#10)

    I’m indifferent to this. I suspect some people would benefit from such a format, but I don’t think I’ve ever had need for one (where I couldn’t just parse a query myself and use the resulting AST) and it would be another format to support for implementors.
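As promised above, a brief sketch of the REDUCED and path-quantifier items. The queries use a made-up ex: vocabulary, and the {n,m} quantifier syntax is the draft form that was dropped from SPARQL 1.1, so no current engine is obliged to accept it:

    PREFIX ex: <http://example.org/>

    # REDUCED permits (but does not require) dropping duplicate rows,
    # which is cheaper than the full duplicate elimination of DISTINCT.
    SELECT REDUCED ?city
    WHERE { ?person ex:livesIn ?city }

    # Path quantifiers: friends between 2 and 4 hops away. The open
    # question is whether this should behave like an expanded
    # BGP/UNION pattern (with duplicates) or like the arbitrary
    # length paths (without duplicates).
    SELECT ?friend
    WHERE { ex:me ex:knows{2,4} ?friend }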

Beyond that, here are some other things I’d like to see worked on (either standardization, or cross-implementation support):

  • Support for window functions

  • Explicit support for named graphs in SERVICE blocks

    This can be partially accomplished right now for hard-coded graphs by using an endpoint URL with the default-graph-uri query parameter, but I’d like more general support that could work dynamically with the active graph when the SERVICE block is evaluated (see the sketch after this list).

  • Structured errors for use in the SPARQL Protocol

    My preference for this would be using the RFC7807 “Problem Details” JSON format, with a curated list of IRIs and associated metadata representing common error types (syntax errors, query-too-complex or too-many-requests refusals, etc.). There’s a lot of potential for smarter clients if errors contain structured data (e.g. SPARQL editors can highlight/fix syntax issues; clients could choose alternate data sources such as triple pattern fragments when the endpoint is overwhelmed). A sketch of both items follows this list.
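To make those two concrete: first, the hard-coded-graph workaround as it works today (the endpoint is invented), then what an RFC7807 problem document for an overloaded endpoint might look like (the error type IRI is hypothetical; the field names are RFC7807's own):

    PREFIX ex: <http://example.org/>

    # Workaround available today: pin the remote default graph by
    # percent-encoding it into the SERVICE endpoint URL.
    SELECT ?s ?o
    WHERE {
      SERVICE <http://remote.example.org/sparql?default-graph-uri=http%3A%2F%2Fexample.org%2Fg1> {
        ?s ex:p ?o
      }
    }

And a possible problem document:

    {
      "type": "http://example.org/sparql-errors/too-many-requests",
      "title": "Too many requests",
      "status": 429,
      "detail": "The endpoint is overloaded; retry later or use triple pattern fragments."
    }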

Posted at 02:15

July 06

AKSW Group - University of Leipzig: AKSW Colloquium, 07.07.2017, Two paper presentations concerning Link Discovery and Knowledge Base Reasoning

At the AKSW Colloquium on Friday 7th of July, at 10:40 AM there will be two paper presentations concerning genetic algorithms to learn linkage rules, and differentiable learning of logical rules for knowledge base reasoning.

Tommaso Soru will present the paper Differentiable Learning of Logical Rules for Knowledge Base Reasoning, currently a pre-print, by Fan Yang, Zhilin Yang, and William W. Cohen.

Abstract

“We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method obtains state-of-the-art results on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.”

Daniel Obraczka will present the paper Learning Expressive Linkage Rules using Genetic Programming by Isele and Bizer, accepted at VLDB 2012. This work presents an algorithm to learn record linkage rules utilizing genetic programming.

Abstract

“A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm is capable of generating linkage rules which select discriminative properties for comparison, apply chains of data transformations to normalize property values, choose appropriate distance measures and thresholds and combine the results of multiple comparisons using non-linear aggregation functions. Our experiments show that the GenLink algorithm outperforms the state-of-the-art genetic programming approach to learning linkage rules recently presented by Carvalho et al. and is capable of learning linkage rules which achieve a similar accuracy as human written rules for the same problem.”

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 19:24

July 01

W3C Read Write Web Community Group: Read Write Web — Q2 Summary — 2017

Summary

This quarter kicked off with a full program at WWW 2017 in Perth and continued with ESWC, which included an award-winning paper on Linked Data Notifications.

Solid gained some traction this quarter, going from 500 to 1500 stars on GitHub after a few articles publicized the protocol. Linked Data continues to evolve in search, with Google now including fact check data powered by schema.org.
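(For a flavour of that markup: fact checks are published as schema.org ClaimReview descriptions, usually as JSON-LD in the reviewing article’s page. A minimal illustrative sketch in Turtle, with invented resources:)

    @prefix schema: <http://schema.org/> .

    # A hypothetical fact-check annotation of the kind search engines consume.
    <http://news.example.org/fact-check/42>
        a schema:ClaimReview ;
        schema:claimReviewed "The moon is made of cheese" ;
        schema:reviewRating [
            a schema:Rating ;
            schema:ratingValue 1 ;
            schema:alternateName "False"
        ] .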

A new proposal was also introduced called Linked Data Templates, which aims “to provide means to define read-write Linked Data APIs declaratively using SPARQL and specify a uniform interaction protocol for them”.

Communications and Outreach

Following the announcement of Linked Data Templates, there was an invitation to join the declarative apps community group, which some of us have.  Please feel free to join this group and get involved with the evolution of read-write standards.


Community Group

This quarter saw the announcement and call for participation of Linked Data Templates, which, interestingly, is a protocol neutral to the transport method (HTTP, IPFS, etc.); however, the specification shall provide bindings for the HTTP protocol.

There was also an excellent blog post from Kingsley showcasing a number of the technologies we’ve been looking at over the years, and demonstrating the OpenLink smart data bot.


Applications

A relatively quiet quarter for apps, with much work being done on authentication. There was also the release of twinql, “A graph query language for the semantic web”. Libraries and servers have been kept up to date: rdflib and node solid server have been bumped to the latest versions. There was also work done on the solid connections UI, an application to manage the solid social graph.

And also the release of the OpenLink Smart Data Bot, a “Smart Agent that provides a single RESTful interface for interacting with a variety of Actions (operations) provided by APIs”.

Last but not Least…

Congrats to timbl on winning the Turing award, often referred to as the Nobel Prize for Computing, “for inventing the World Wide Web, the first web browser, and the fundamental protocols and algorithms allowing the Web to scale”.  An invention that continues to change all our lives, and hopefully one we can all help to bring to its full potential!

Posted at 17:04

June 30

Dublin Core Metadata Initiative: DCMI opens registration for DC-2017 in Washington, DC

2017-06-30, Registration for DC-2017 is now open at http://dcevents.dublincore.org/IntConf/index/pages/view/reg17. The International Conference takes place on Thursday through Saturday, 26-28 October and the DCMI Annual Meeting occurs on Sunday, 29 October. The full conference rate includes two days of presentations, papers, project reports, posters and special sessions and the third day of workshops. A day rate is available for all three conference days. DC-2017 in Washington DC is collocated in the same venue with the ASIST Annual Meeting that takes place from 27 October through 1 November. Special rates for the ASIST meeting are available to DCMI members. For more information and to register, visit the DC-2017 conference website at http://dcevents.dublincore.org/index.php/IntConf/dc-2017/schedConf/.

Posted at 23:59

June 27

Ebiquity research group UMBC: Jennifer Sleeman dissertation defense: Dynamic Data Assimilation for Topic Modeling

Ph.D. Dissertation Defense

Dynamic Data Assimilation for Topic Modeling

Jennifer Sleeman
9:00am Thursday, 29 June 2017, ITE 325b, UMBC

Understanding how a particular discipline such as climate science evolves over time has received renewed interest. By understanding this evolution, predicting the future direction of that discipline becomes more achievable. Dynamic Topic Modeling (DTM) has been applied to a number of disciplines to model topic evolution as a means to learn how a particular scientific discipline and its underlying concepts are changing. Understanding how a discipline evolves, and its internal and external influences, can be complicated by how the information retrieved over time is integrated. There are different techniques used to integrate sources of information; however, less research has been dedicated to understanding how to integrate these sources over time. The method of data assimilation is commonly used in a number of scientific disciplines to both understand and make predictions of various phenomena, using numerical models and assimilated observational data over time.

In this dissertation, I introduce a novel algorithm for scientific data assimilation, called Dynamic Data Assimilation for Topic Modeling (DDATM), which uses a new cross-domain divergence method (CDDM) and DTM. By using DDATM, observational data in the form of full-text research papers can be assimilated over time starting from an initial model. DDATM can be used as a way to integrate data from multiple sources and, due to its robustness, can exploit the assimilating observational information to better tolerate missing model information. When compared with a DTM model, the assimilated model is shown to have better performance using standard topic modeling measures, including perplexity and topic coherence. The DDATM method is suitable for prediction and results in higher likelihood for subsequent documents. DDATM is able to overcome missing information during the assimilation process when compared with a DTM model. CDDM generalizes as a method that can also bring together multiple disciplines into one cohesive model enabling the identification of related concepts and documents across disciplines and time periods. Finally, grounding the topic modeling process with an ontology improves the quality of the topics and enables a more granular understanding of concept relatedness and cross-domain influence.

The results of this dissertation are demonstrated and evaluated by applying DDATM to 30 years of reports from the Intergovernmental Panel on Climate Change (IPCC) along with more than 150,000 documents that they cite to show the evolution of the physical basis of climate change.

Committee Members: Drs. Tim Finin (co-advisor), Milton Halem (co-advisor), Anupam Joshi, Tim Oates, Cynthia Matuszek, Mark Cane, Rafael Alonso

Posted at 19:54

June 25

Bob DuCharme: Creating Wide CSV files with SPARQL

Lots of columns and commas, but all in the right place.
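(As a taste of the technique, not necessarily Bob’s exact approach: in the SPARQL 1.1 CSV results format each selected variable becomes a column, so a wide file is mostly a matter of selecting many variables and guarding the sparse ones with OPTIONAL.)

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # Request results as text/csv and each variable below becomes a
    # CSV column; OPTIONAL keeps rows whose contact details are
    # missing, leaving those cells empty rather than dropping the row.
    SELECT ?name ?mbox ?phone ?homepage
    WHERE {
      ?person foaf:name ?name .
      OPTIONAL { ?person foaf:mbox ?mbox }
      OPTIONAL { ?person foaf:phone ?phone }
      OPTIONAL { ?person foaf:homepage ?homepage }
    }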

Posted at 14:47

June 23

Leigh Dodds: Lunchtime Lecture: “How you (yes, you) can contribute to open data”

The following is a written version of the lunchtime lecture I gave today at the Open Data Institute. I’ll put in a link to the video when it comes online. It’s not a transcript; I’m just writing down what I had planned to say.

Posted at 18:40

June 19

Dublin Core Metadata Initiative: How to Design and Build Semantic Applications with Linked Data

2017-06-19, This webinar, presented by Dave Clarke, co-founder and CEO of the Synaptica® group of companies, will demonstrate how to design and build rich end-user search and discovery applications using Linked Data. The Linked Open Data cloud is a rapidly growing collection of publicly accessible resources, which can be adopted and reused to enrich both internal enterprise projects and public-facing information systems. The webinar will use the Linked Canvas application as its primary use-case. Linked Canvas is an application designed by Synaptica for the cultural heritage community. It enables high-resolution images of artworks and artifacts to be catalogued and subject indexed using Linked Data. The talk will demonstrate how property fields and relational predicates can be adopted from open data ontologies and metadata schemes, such as DCMI, SKOS, IIIF and the Web Annotation Model. Selections of properties and predicates can then be recombined to create Knowledge Organization Systems (KOS) customized for business applications. The demonstration will also illustrate how very-large-scale subject taxonomies and name authority files, such as the Library of Congress Name Authority File, DBpedia, and the Getty Linked Open Data Vocabularies collection, can be used for content enrichment and indexing.
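(To illustrate what adopting properties from open ontologies can look like, here is a small invented Turtle fragment, not taken from the webinar, combining SKOS structure with a DCMI provenance property and a link into a public Linked Open Data resource:)

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/kos/> .

    # A concept in a custom KOS: SKOS supplies the structure, DCMI the
    # provenance, and an exactMatch links it to a public LOD resource.
    ex:impressionism
        a skos:Concept ;
        skos:prefLabel "Impressionism"@en ;
        skos:broader ex:modernArt ;
        skos:exactMatch <http://dbpedia.org/resource/Impressionism> ;
        dct:source "Adapted from a public art vocabulary"@en .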

To register and for more information about the webinar and presenter, visit http://dublincore.org/resources/training/#2017clarke.

Posted at 23:59

June 18

Sandro Hawke: Ridesharing 3.0: Forget About Uber

Uber, whatever its faults, provides value to its users, both the drivers and the riders. People appreciate or even enjoy the service, even if they don’t like the corporate behavior or economic disruption. Solutions seem to mostly include boycotts (in favor of taxis or competitors like Lyft) and legal action. But most of those solutions are pushing water uphill, because people actually like the service.

I have another solution: let’s rebuild the service without any major company involved. Let’s help software eat the world on behalf of the users, not the stockholders. In this post, I’ll explain a way to do it.  It’s certainly not trivial, and has some risks, but I think it’s possible and would be a good thing.

The basic idea is similar to this: riders post to social media describing the rides they want, and drivers post about the rides they are available to give.  They each look around their extended social graph for posts that line up with what they want, and also check for reasons to trust or distrust each other. That’s about it. You could do this today on Twitter, but it would take some decent software to make it pleasant and reliable, to make the user experience as good as with Uber.  To be clear: I’m aiming for a user experience similar to the Uber app; I’m proposing using social media as an underlying layer, not a UI.
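(To make that concrete: for software to match such posts, they would need to be machine-readable. Here is one possible shape for a rider’s post as Linked Data; the ride: vocabulary is entirely invented for illustration, since no agreed vocabulary exists for this today.)

    @prefix ride: <http://example.org/ridesharing#> .
    @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    # A rider's post: "I want a ride from San Francisco to Sebastopol."
    # Matching software would crawl posts like this across the rider's
    # extended social graph and line them up with drivers' offers.
    <http://alice.example.org/posts/ride-request-1>
        a ride:RideRequest ;
        ride:origin      [ geo:lat 37.77 ; geo:long -122.42 ] ;  # San Francisco
        ride:destination [ geo:lat 38.40 ; geo:long -122.82 ] ;  # Sebastopol
        ride:departureTime "2017-07-01T09:00:00-07:00"^^xsd:dateTime .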

What’s deeply different in this model is that the provider of the software does not control the market. If I build this today and get millions of users, someone else can come along tomorrow with a slightly better interface or nicer ads, and the customers can move easily, even in the middle of a ride. In particular, the upstart doesn’t need to convince all the riders and drivers to switch in order to bootstrap their system! The market, with all its relationships and data, sits outside the ridesharing system. As a user, you wouldn’t even know what software the other riders and drivers are using, unless they choose to tell you.

With this approach, open source solutions would also be viable.  Then the competition could arise quite literally tomorrow, as someone just forks a product and makes a few small changes.

This is no fun for big money investors looking for their unicorn exit, but it’s great for end users.  They get non-stop innovation, and serious competition for their business.

There are many details, below, including some open issues.  The details span areas of expertise, so I’m sure I’ve gotten parts incomplete or wrong. If this vision appeals to you, please help fill it in, in comments or posts of your own.

Critical Mass

Perhaps the hardest problem with establishing any kind of multi-sided market is getting a critical mass of buyers and sellers.  Why would a rider use the system, if there are not yet any drivers?  Why would drivers bother using the software when there are no riders? Each time someone tries the system, they find no one else there and they go away unhappy.

In this case, however, I think we have some options.   For example, existing drivers, including taxi operators, could start to use the software while they’re doing the driving they already do, with minimum additional effort.  Reasons to do it, in addition to optimistically wanting to help bootstrap this: it could help them keep track of their work, and it could establish a track record for when riders start to show up.

Similarly, riders could start to use it in areas without drivers if they understand they’re priming the pump, helping establish demand, and perhaps there were some fun or useful self-tracking features.

Various communities and businesses, not in the ridesharing business, might benefit from promoting this system: companies who have a lot of employees commuting, large events where parking is a bottleneck, towns with traffic issues, etc.  In these cases, in a niche, it’s much easier to get critical mass, which can then spread outward.

Finally, there are existing ridesharing systems that might choose to play in this open ecosystem, either because their motivation is noncommercial (eg eco carpooling) or because they see a way to share the market and still make their cut (eg taxi companies).

Privacy

In the model as I’ve described it so far, there’s no privacy. If I want a ride from San Francisco to Sebastopol, the whole world could see that. My friends might ask, uncomfortably, what I was doing in Sebastopol that day. This is a tricky problem, and there might not be a perfect solution.

In the worst case, the system ends up as only viable for the kind of trips you’re fine being public, perhaps your commute to work, or your trip to an event you’re going to post about anyway. But we can probably do better than that.  I currently see two imperfect classes of solution:

  1. Trust some third party organizations, perhaps to act as information brokers, seeing all the posts from both sides and informing each when there is a match, possibly masking some details. Or perhaps they certify drivers, which gives them access to your data, with an enforceable contract they’ll use it for these purposes only.
  2. Trust people to act appropriately when given the right social cues and pressure: basically, use advisory access control, where anyone can see the data, but only after they clearly agree that they are acting as part of the ridesharing system and that they will only use the data for that purpose. There might be social or legal penalties for violating this agreement.

There might also be cryptographic solutions, perhaps as an application of homomorphic encryption, but I’m not yet aware of any results that would fully address this issue.

Personal Safety

When I was much younger, hitchhiking was common. If you wanted to go somewhere without having a car, you could stand on the side of the road and stick out your thumb. But there was some notion this might be dangerous for either party, and in some places it became illegal. (Plus there was that terrifying Rutger Hauer and C Thomas Howell movie.) There have been a few stories of assaults by Uber drivers, and the company claims to carefully vet drivers. So how could this work without a company like Uber, standing behind the drivers?

There are several approaches here, that can all work together:

  1. Remote trust assessment. Each party should be able to see data on the other before agreeing to the ride.  This might include social graph connections to the other person, reviews posted about the other person (and by whom), and official certifications about the other person  (including even: the ride is from a licensed taxicab).  When legally permissible this should, I think, even include information that might be viewed as discriminatory, like

Posted at 14:30

June 16

Sandro Hawke: Back to Blogging

As I remember it, about ten years ago, I started this blog for one main reason. I had just watched a talk from the CTO of Second Life (remember them, when they were hot?) about his vision for how to expand by opening up the system, making it decentralized.  I thought to myself: that’s going to be really hard, but I’ve thought about it a lot; I should blog my ideas about it.

As I dug into the problem, however, I realized how many sub-problems I couldn’t really solve, yet. So I never posted.   (Soon thereafter, Cory Ondrejka left the Second Life project, moving on to run engineering at Facebook.  Not sure if that’s ironic.)

Now, what feels like several lifetimes later, I’m ready to try again.

This time the “industry darling” I want to tackle first is Uber.  Okay, it’s already become widely hated, but the valuation is still, shall we say, … considerable.

So, coming soon: how to decentralize Uber.


Posted at 18:32

June 13

AKSW Group - University of Leipzig: SANSA 0.2 (Semantic Analytics Stack) Released

The AKSW and Smart Data Analytics groups are happy to announce SANSA 0.2 – the second release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing for semantic technologies in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples format
  • Reading OWL files in various standard formats
  • Querying and partitioning based on Sparqlify
  • RDFS/RDFS Simple/OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • There is example code for various tasks available.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE and Big Data Ocean.

SANSA Development Team

Posted at 16:18

June 12

AKSW Group - University of Leipzig: AKSW at ESWC 2017

Hello Community! ESWC 2017 just ended, and here is a short report on the conference, especially regarding the AKSW group.

Our members Dr. Muhammad Saleem, Dr. Mohamed Ahmed Sherif, Claus Stadler, Michael Röder, Prof. Dr. Jens Lehmann and Edgard Marx participated at the conference. They held a number of presentations, workshops and tutorials:

Michael Röder

Mohamed Ahmed Sherif

Muhammad Saleem

Edgard Marx

  • Presented a Workshop paper „Exploring the Evolution and Provenance of Git Versioned RDF Data“ by Natanael Arndt, Patrick Naumann and Edgard Marx
  • Presented a demo paper „Kbox – Distributing ready-to-query RDF Knowledge Graphs“ by Edgard Marx, Tommaso Soru, Ciro Baron Neto and Sandro Coelho

Claus Stadler

  • Presented a Workshop paper in QuWeDa „JPA Criteria Queries over RDF Data“ by Claus Stadler, Jens Lehmann

The final versions of the papers from Edgard and Claus will be made available soon.

As every year the ESWC also awarded the best papers and studies in several categories. The award for Best Challenge Paper went to: “End-to-end Representation Learning for Question Answering with Weak Supervision” by Daniil Sorokin and Iryna Gurevych. The paper is part of the HOBBIT project by AKSW. Congrats to the winners!

Have a look at all the winners at ESWC 2017: http://2017.eswc-conferences.org/awards.

Posted at 08:53

June 11

Libby Miller: Outline a bitmap in Inkscape

I keep doing this for lasercuts but getting a double outline instead of a single outline, and so a double cut. This is because (

Posted at 18:02

June 10

AKSW Group - University of Leipzig: Four papers accepted at WI 2017

Hello Community! We proudly announce that the International Conference on Web Intelligence (WI) accepted four papers by our group. WI takes place in Leipzig between the 23rd and 26th of August. The accepted papers are:

“An Evaluation of Models for Runtime Approximation in Link Discovery” by Kleanthi Georgala, Michael Hoffmann, and Axel-Cyrille Ngonga Ngomo.

Abstract: Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the most rapid Link Discovery approaches rely internally on planning to execute link specifications. In newer works, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for the estimation of the runtimes of planners. To this end, we evaluate three different models for runtime on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits when trained but are only to be preferred in some cases. Our evaluation also shows that the use of better runtime approximation models has a positive impact on the overall execution of link specifications.

“CEDAL: Time-Efficient Detection of Erroneous Links in Large-Scale Link Repositories” by Andre Valdestilhas, Tommaso Soru and Axel-Cyrille Ngonga Ngomo.

Abstract: More than 500 million facts on the Linked Data Web are statements across knowledge bases. These links are of crucial importance for the Linked Data Web, as they make a large number of tasks possible, including cross-ontology question answering and federated queries. However, a large number of these links are erroneous and can thus lead to these applications producing absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.

“LOG4MEX: A Library to Export Machine Learning Experiments” by Diego Esteves, Diego Moussallem, Tommaso Soru, Ciro Baron Neto, Jens Lehmann, Axel-Cyrille Ngonga Ngomo and Julio Cesar Duarte.

Abstract: A choice of the best computational solution for a particular task is increasingly reliant on experimentation. Even though experiments are often described through text, tables, and figures, their descriptions are often incomplete or confusing. Thus, researchers often have to perform lengthy web searches for reproducing and understanding the results. In order to minimize this gap, vocabularies and ontologies have been proposed for representing data mining and machine learning (ML) experiments. However, we still lack tools to properly export these metadata. To this end, we present an open-source library dubbed LOG4MEX which aims at supporting the scientific community to fulfill this gap.

“GENESIS – A Generic RDF Data Access Interface” by Tim Ermilov, Diego Moussallem, Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo

Abstract: The availability of billions of facts represented in RDF on the Web provides novel opportunities for data discovery and access. In particular, keyword search and question answering approaches enable even lay people to access this data. However, the interpretation of the results of these systems, as well as the navigation through these results, remains challenging. In this paper, we present GENESIS, a generic RDF data access interface. GENESIS can be deployed on top of any knowledge base and search engine with minimal effort and allows for the representation of RDF data in a layperson-friendly way. This is facilitated by the modular architecture for reusable components underlying our framework. Currently, these include a generic search back-end, together with corresponding interactive user interface components based on a service for similar and related entities as well as verbalization services to bridge between RDF and natural language.

The final versions of the papers will be made available soon.

Come over to WI 2017 and enjoy the talks. More information on the program can be found here.

Posted at 13:01

June 07

Ebiquity research group UMBC: UMBC Seeks Professor of the Practice to Head new Data Science Program

The University of Maryland, Baltimore County is looking to hire a Professor of the Practice to head a new graduate program in Data Science. See the job announcement for more information and apply online at Interfolio.

In addition to developing and teaching graduate data science courses, the new faculty member will serve as the Graduate Program Director of UMBC’s program leading to a master’s degree in Data Science. This cross-disciplinary program is offered to professional students through a partnership between the College of Engineering and Information Technology; the College of Arts, Humanities and Social Sciences; the College of Natural and Mathematical Sciences; the Department of Computer Science and Electrical Engineering; and UMBC’s Division of Professional Studies.

Posted at 16:23

May 29

Bob DuCharme: Instead of writing SPARQL queries for Wikipedia--query for them!

Queries as data to help you get at more data.

Posted at 15:11

May 26

AKSW Group - University of Leipzig: AKSW Colloquium, 29.05.2017, Addressing open Machine Translation problems with Linked Data.

At the AKSW Colloquium on Monday, 29th of May 2017, at 3 PM, Diego Moussallem will present two papers related to his topic: “Using BabelNet to Improve OOV Coverage in SMT” by Du et al., presented at LREC 2016, and “How to Configure Statistical Machine Translation with Linked Open Data Resources” by Srivastava et al., presented at AsLing 2016.

Posted at 11:51

May 25

Leigh Dodds: Where can you contribute to open data? Yes, you!

This is just a quick post to gather together some pointers and links that were shared in answer to a question I asked on twitter yesterday:

Posted at 17:42

Copyright of the postings is owned by the original blog authors. Contact us.