Planet RDF

It's triples all the way down

May 26

AKSW Group - University of Leipzig: AKSW Colloquium, 29.05.2017, Addressing open Machine Translation problems with Linked Data.

At the AKSW Colloquium on Monday, 29th of May 2017, 3 PM, Diego Moussallem will present two papers related to his topic. The first, “Using BabelNet to Improve OOV Coverage in SMT” by Du et al., was presented at LREC 2016; the second, “How to Configure Statistical Machine Translation with Linked Open Data Resources” by Srivastava et al., was presented at AsLing 2016.

Posted at 11:51

May 25

Leigh Dodds: Where can you contribute to open data? Yes, you!

This is just a quick post to gather together some pointers and links that were shared in answer to a question I asked on twitter yesterday:

Posted at 17:42

May 19

Dublin Core Metadata Initiative: How to Design and Build Semantic Applications with Linked Data

2017-05-19, This webinar, presented by Dave Clarke, co-founder and CEO of the Synaptica® group of companies, will demonstrate how to design and build rich end-user search and discovery applications using Linked Data. The Linked Open Data cloud is a rapidly growing collection of publicly accessible resources, which can be adopted and reused to enrich both internal enterprise projects and public-facing information systems. The webinar will use the Linked Canvas application as its primary use-case. Linked Canvas is an application designed by Synaptica for the cultural heritage community. It enables high-resolution images of artworks and artifacts to be catalogued and subject indexed using Linked Data. The talk will demonstrate how property fields and relational predicates can be adopted from open data ontologies and metadata schemes, such as DCMI, SKOS, IIIF and the Web Annotation Model. Selections of properties and predicates can then be recombined to create Knowledge Organization Systems (KOS) customized for business applications. The demonstration will also illustrate how very-large-scale subject taxonomies and name authority files, such as the Library of Congress Name Authority File, DBpedia, and the Getty Linked Open Data Vocabularies collection, can be used for content enrichment and indexing.

To register and for more information about the webinar and presenter, visit http://dublincore.org/resources/training/#2017clarke.

Posted at 23:59

Leigh Dodds: Can you publish tweets as open data?

Can you publish data from twitter as open data? The short answer is: No. Read on for some notes, pointers and comments.

Twitter’s developer policy places a number of restrictions on your use of their API and the data you get from it. Some of the key ones are:

  • In the

Posted at 08:15

May 18

Leigh Dodds: Enabling data forensics

I’m interested in how people share information, particularly data, on social networks. I think it’s something to which it’s worth paying attention, so we can ensure that it’s easy for people to share insights and engage in online debates.

There’s lots of discussion at the moment around fact checking and similar ways that we can improve the ability to identify reliable and unreliable information online. But there may be other ways that we can make some small improvements in order to help people identify and find sources of data.

Data forensics is a term that usually 

Posted at 18:54

May 15

Ebiquity research group UMBC: Modeling and Extracting information about Cybersecurity Events from Text

Ph.D. Dissertation Proposal

Modeling and Extracting information about Cybersecurity Events from Text

Taneeya Satyapanich

Tuesday, 16 May 2017, ITE 325, UMBC

People rely on the Internet to carry out much of their daily activities such as banking, ordering food and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to be able to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models to store the cybersecurity information and information extraction systems that can collect information to populate the models with data from text.

This dissertation will make two major contributions. The first is to extend our current cybersecurity ontologies with better models for relevant events, from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world targeted at U.S. financial institutions. The second is the design and implementation of an event extraction system that can extract information about cybersecurity events from text and populate a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction that detected human activity events from news and discussion forums. A new set of features and learning algorithms will be introduced to improve the performance and adapt the system to the cybersecurity domain. We believe that this dissertation will be useful for cybersecurity management in the future. It will quickly extract cybersecurity events from text and fill in the event ontology.
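As a rough illustration of the kind of question such a populated knowledge graph could answer, here is a minimal SPARQL sketch over a purely hypothetical event vocabulary (the ev: names below are illustrative placeholders, not the proposal's actual ontology):

PREFIX ev: <http://example.org/cyberevent#>

# Count denial-of-service events per month that originate in a given region and
# target U.S. financial institutions (the generalized event mentioned above).
SELECT ?month (COUNT(?e) AS ?attacks) WHERE {
  ?e a ev:DenialOfServiceAttack ;
     ev:originRegion ev:EasternEurope ;
     ev:targetSector ev:USFinancialInstitutions ;
     ev:observedAt ?t .
  BIND(SUBSTR(STR(?t), 1, 7) AS ?month)
}
GROUP BY ?month
ORDER BY ?month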

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi

Posted at 17:09

Ebiquity research group UMBC: new paper: Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps, AAAI Spring Symposium on AI for Social Good, AAAI Press, March, 2017.

Climate change is an important social issue and the subject of much research, both to understand the history of the Earth’s changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990 the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. Presented in this paper are results of dynamic topic modeling to model the evolution of these climate change reports and their supporting research citations over a 30 year time period. Using this technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure such as disaster recovery.

Posted at 13:30

May 14

Ebiquity research group UMBC: Fact checking the fact checkers fact check metadata

TL;DR: Some popular fact checking sites are saying that false is true and true is false in their embedded metadata 

I’m a fan of the schema.org claimReview tags for rendering fact checking results as metadata markup embedded in the html that can be easily understood by machines. Google gave a plug for this last Fall and more recently announced that it has broadened its use of the fact checking metadata tags.  It’s a great idea and could help limit the spread of false information on the Web.  But its adoption still has some problems.

Last week I checked to see if the Washington Post is using schema.org’s ClaimReview in their Fact Checker pieces. They are (that’s great!) but WaPo seems to have misunderstood the semantics of the markup by reversing the reviewRating scale, with the result that it asserts the opposite of its findings. For an example, look at this Fact Checker article reviewing claims made by HHS Secretary Tom Price on the AHCA, which WaPo rates as very false but gives a high reviewRating of 5 on its scale from 1 to 6. According to the schema.org specification, this means it’s mostly true, rather than false. ??

WaPo’s Fact Check article ratings assign a checkmark for a claim they find true and from one to four ‘pinocchios’ for claims they find to be partially (one) or totally (four) false. They also give no rating for claims they find unclear and a ‘flip-flop’ rating for claims on which a person has been inconsistent. Their reviewRating metadata specifies a worstRating of 1 and a bestRating of 6. They apparently map a checkmark to 1 and ‘four pinocchios’ to 5. That is, their mapping is {-1: ‘unclear’, 1: ‘check mark’, 2: ‘1 pinocchio’, …, 5: ‘4 pinocchios’, 6: ‘flip flop’}. It’s clear from the schema.org ClaimReview examples that a higher rating number is better, and it’s implicit that it is better for a claim to be true. So I assume that the WaPo Fact Checker should reverse its scale, with ‘flip-flop’ getting a 1, ‘four pinocchios’ mapped to a 2 and a checkmark assigned a 6.

WaPo is not the only fact checking site that has got this reversed. Aaron Bradley pointed out early in April that Politifact had its scale reversed as well. I checked last week and confirmed that this was still the case, as this example shows. I sampled a number of Snopes’ ClaimReview ratings and found that all of them were -1 on a scale of -1..+1, as in this example.
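One way to spot these inversions at scale is to lift each page’s embedded JSON-LD into RDF and list the human-readable verdict next to the numeric rating. A minimal SPARQL sketch, assuming the ClaimReview markup has already been parsed into a dataset:

PREFIX schema: <http://schema.org/>

# List each reviewed claim with its verdict label and numeric rating, so that
# reversed scales (a "False" verdict with a near-best ratingValue) stand out.
SELECT ?claim ?verdict ?value ?best ?worst WHERE {
  ?review a schema:ClaimReview ;
          schema:claimReviewed ?claim ;
          schema:reviewRating ?r .
  ?r schema:ratingValue ?value ;
     schema:bestRating ?best ;
     schema:worstRating ?worst .
  OPTIONAL { ?r schema:alternateName ?verdict }
}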

It’s clear how this mistake can happen. Many fact checking sites are motivated by identifying false claims, so they have native scales that run from the mundane true statement to the brazen and outrageous completely false. Directly mapping this linear scale onto a low-to-high numeric one is thus not a completely surprising mistake.

While the fact checking sites that have made this mistake are run by dedicated and careful investigators, the same care has not yet been applied in implementing the semantic metadata embedded in their pages.

Posted at 03:26

May 13

Ebiquity research group UMBC: New paper: A Question and Answering System for Management of Cloud Service Level Agreements

Sudip Mittal, Aditi Gupta, Karuna Pande Joshi, Claudia Pearce and Anupam Joshi, A Question and Answering System for Management of Cloud Service Level Agreements,  IEEE International Conference on Cloud Computing, June 2017.

One of the key challenges faced by consumers is to efficiently manage and monitor the quality of cloud services. To manage service performance, consumers have to validate rules embedded in cloud legal contracts, such as Service Level Agreements (SLA) and Privacy Policies, that are available as text documents. Currently this analysis requires significant time and manual labor and is thus inefficient. We propose a cognitive assistant that can be used to manage cloud legal documents by automatically extracting knowledge (terms, rules, constraints) from them and reasoning over it to validate service performance. In this paper, we present this Question and Answering (Q&A) system that can be used to analyze and obtain information from the SLA documents. We have created a knowledgebase of Cloud SLAs from various providers, which forms the underlying repository of our Q&A system. We utilized techniques from natural language processing and the semantic web (RDF, SPARQL and the Fuseki server) to build our framework. We also present sample queries showing how a consumer can compute metrics such as service credit.
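As a flavour of what such a knowledgebase makes possible, here is a small SPARQL sketch against an entirely hypothetical SLA vocabulary (the sla: terms are placeholders, not the paper's actual schema):

PREFIX sla: <http://example.org/sla#>

# Which providers guarantee at least 99.9% monthly uptime, and what service
# credit do they owe if the guarantee is missed?
SELECT ?provider ?uptime ?credit WHERE {
  ?agreement a sla:ServiceLevelAgreement ;
             sla:provider ?provider ;
             sla:monthlyUptimeGuarantee ?uptime ;
             sla:serviceCreditOnBreach ?credit .
  FILTER(?uptime >= 99.9)
}
ORDER BY DESC(?uptime)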

Posted at 17:56

May 11

AKSW Group - University of Leipzig: SML-Bench 0.2 Released

Dear all,

we are happy to announce the 0.2 release of SML-Bench, our Structured Machine Learning benchmark framework. SML-Bench provides full benchmarking scenarios for inductive supervised machine learning covering different knowledge representation languages like OWL and Prolog. It already comes with adapters for prominent inductive learning systems like the DL-Learner, the General Inductive Logic Programming System (GILPS), and Aleph, as well as Inductive Logic Programming ‘classics’ like Golem and Progol. The framework is easily extensible, be it in terms of new benchmarking scenarios or support for new learning systems. SML-Bench lets you define, run and report on benchmarks combining different scenarios and learning systems, giving insight into the performance characteristics of the respective inductive learning algorithms on a wide range of learning problems.

Website: http://sml-bench.aksw.org/
GitHub page: https://github.com/AKSW/SML-Bench/
Change log: https://github.com/AKSW/SML-Bench/releases/tag/0.2

In the current release we extended the options to configure learning systems in the overall benchmarking configuration, and added support for running multiple instances of a learning system, as well as the nesting of instance-specific settings and settings that apply to all instances of a learning system. Besides internal refactoring to increase the overall software quality, we also extended the reporting capabilities of the benchmark results. We added a new benchmark scenario and experimental support for the Statistical Relational Learning system TreeLiker.

We want to thank everyone who helped to create this release and appreciate any feedback.

Best regards,

Patrick Westphal, Simon Bin, Lorenz Bühmann and Jens Lehmann

Posted at 11:01

May 08

Leigh Dodds: Adventures in geodata

I spend a lot of my professional life giving people advice. Mostly around how to publish and use open data. In order to make sure I give people the best advice I can, I try and spend a lot of time actually publishing and using open data. A mixture of research and practical work is the best way I’ve found of improving my own

Posted at 20:28

AKSW Group - University of Leipzig: AKSW Colloquium, 08.05.2017, Scalable RDF Graph Pattern Matching

At the AKSW Colloquium, on Monday 8th of May 2017, 3 PM, Lorenz Bühmann will discuss a paper titled “Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching” by Kim et al. Presented at WWW 2017, this work proposes a scalable query processing approach on RDF data that relies on early and aggressive determination and pruning of query-irrelevant data. The paper describes ongoing work as part of the RAPID+ platform project.

Abstract

Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides the traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on the types can be used to rewrite queries into more efficient ones. However, such optimizations are only applicable in strongly-typed data and query models, which makes them a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight to improving the scalability of RDF processing systems.

In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.
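To make the idea concrete, here is a minimal sketch (hypothetical vocabulary, not taken from the paper) of the kind of rewrite that type information licenses: if the ontology declares :advisor with domain :Student and range :Professor, and the data respects those constraints, an engine can add type restrictions without changing the results and confine evaluation to the matching data partitions.

PREFIX : <http://example.org/>

# Original graph pattern
SELECT ?s ?p WHERE { ?s :advisor ?p }

# Type-annotated rewrite: results are unchanged under the declared constraints,
# but triples about resources that are neither :Student nor :Professor can be
# pruned early.
SELECT ?s ?p WHERE {
  ?s a :Student ;
     :advisor ?p .
  ?p a :Professor .
}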

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 07:42

April 23

Bob DuCharme: The Wikidata data model and your SPARQL queries

Reference works to get you taking advantage of the fancy parts quickly.

Posted at 14:43

April 21

Dublin Core Metadata Initiative: Webinar: Me4MAP - A method for the development of metadata application profiles

2017-04-21, A metadata application profile (MAP) is a construct that provides a semantic model for enhancing interoperability when publishing data to the Web of Data. When a community of practice agrees to follow a MAP's set of rules for publishing data as Linked Open Data, it makes it possible for such data to be processed automatically by software agents. Therefore, the existence of a method for MAP development is essential to providing developers with a common ground on which to work. The absence of such a method leads to a non-systematic set of MAP development activities that frequently results in MAPs of lesser quality. This Webinar with Mariana Curado Malta, Polytechnic of Oporto, Portugal, will present Me4MAP, a method for the development of metadata application profiles. The webinar will be presented twice, once in English and once in Portuguese. For more information about the webinar and to register, visit http://dublincore.org/resources/training/#2017Malta.

Posted at 23:59

Dublin Core Metadata Initiative: NKOS Workshop at DC-2017 in Washington, DC

2017-04-21, The 11th U.S. Networked Knowledge Organization Systems (NKOS) Workshop will take place on Saturday, October 28 as part of DC-2017 in Crystal City, VA (Washington, D.C.). The Call for Participation including presentations and demos is available at http://dcevents.dublincore.org/IntConf/index/pages/view/nkosCall.

Posted at 23:59

April 19

AKSW Group - University of Leipzig: ESWC 2017 accepted two Demo Papers by AKSW members

Hello Community! The 14th ESWC, which takes place from May 28th to June 1st 2017 in Portoroz, Slovenia, accepted two demos to be presented at the conference. Read more about them in the following:                                                                        

1. “KBox: Distributing Ready-to-query RDF Knowledge Graphs” by Edgard Marx, Ciro Baron, Tommaso Soru and Sandro Athaide Coelho

Abstract: The Semantic Web community has successfully contributed to a remarkable number of RDF datasets published on the Web. However, using and building applications on top of Linked Data is still a cumbersome and time-demanding task. We present KBox, an open-source platform that facilitates the distribution and consumption of RDF data. We show the different APIs implemented by KBox, as well as the processing steps from a SPARQL query to its corresponding result. Additionally, we demonstrate how KBox can be used to share RDF knowledge graphs and to instantiate SPARQL endpoints.

Please see: https://www.researchgate.net/publication/315838619_KBox_Distributing_Ready-to-query_RDF_Knowledge_Graphs

and

https://www.researchgate.net/publication/305410480_KBox_–_Transparently_Shifting_Query_Execution_on_Knowledge_Graphs_to_the_Edge

2. “EAGLET – All That Glitters is not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking“ by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo

The desideratum to bridge the unstructured and structured data on the web has led to the advancement of a considerable number of annotation tools, and the evaluation of these Named Entity Recognition and Entity Linking systems is incontrovertibly one of the primary tasks. However, these evaluations are mostly based on manually created gold standards. As much as these gold standards have the advantage of being created by a human, they also leave room for a significant proportion of oversights. We will demonstrate EAGLET, a tool that supports the semi-automatic checking of a gold standard based on a set of uniform annotation rules.

Please also see: https://svn.aksw.org/papers/2017/ESWC_EAGLET_2017/public.pdf

Posted at 08:19

April 08

Ebiquity research group UMBC: Google search now includes schema.org fact check data

Google claims on their search blog that “Fact Check now available in Google Search and News”.  We’ve sampled searches on Google and found that some results did indeed include Fact Check data from schema.org’s ClaimReview markup.  So we are including the following markup on this page.

    
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ClaimReview",
      "datePublished": "2016-04-08",
      "url": "http://ebiquity.umbc.edu/blogger/2017/04/08/google-search-now-
              including-schema-org-fact-check-data",
      "itemReviewed":
      {
        "@type": "CreativeWork",
        "author":
        {
          "@type": "Organization",
          "name": "Google"
        },
        "datePublished": "2016-04-07"
      },
      "claimReviewed": "Fact Check now available in Google search and news",
      "author":
      {
        "@type": "Organization",
        "Name": "UMBC Ebiquity Research Group",
        "url": "http://ebiquity.umbc.edu/"
      },
      "reviewRating":
      {
        "@type": "Rating",
        "ratingValue": "5",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName" : "True"
      }
    }</script>

Google notes that

“Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion. Finally, the content must adhere to the general policies that apply to all structured data markup, the Google News Publisher criteria for fact checks, and the standards for accountability and transparency, readability or proper site representation as articulated in our Google News General Guidelines. If a publisher or fact check claim does not meet these standards or honor these policies, we may, at our discretion, ignore that site’s markup.”

and we hope that the algorithms will find us to be an authoritative source of information.

You can see the actual markup by viewing this page’s source or looking at the markup that Google’s structured data testing tool finds on it here by clicking on ClaimReview in the column on the right.

Update: We’ve been algorithmically determined to be an authoritative source of information!

Posted at 14:39

April 07

AKSW Group - University of Leipzig: AKSW Colloquium, 10.04.2017, GeoSPARQL on geospatial databases

At the AKSW Colloquium, on Monday 10th of April 2017, 3 PM, Matthias Wauer will discuss a paper titled “Ontop of Geospatial Databases“. Presented at ISWC 2016, this work extends an ontology-based data access (OBDA) system with GeoSPARQL support for querying geospatial relational databases. In the evaluation section, they compare their approach to Strabon. The work is partially supported by the Optique and Melodies EU projects.
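For readers unfamiliar with GeoSPARQL, here is a minimal query sketch (with a hypothetical :Park class) of the kind of request such an OBDA system rewrites into spatial SQL over the underlying database:

PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX :     <http://example.org/>

# Find parks whose geometry lies within a given bounding polygon.
SELECT ?park WHERE {
  ?park a :Park ;
        geo:hasGeometry ?g .
  ?g geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((23.6 37.9, 23.8 37.9, 23.8 38.1, 23.6 38.1, 23.6 37.9))"^^geo:wktLiteral))
}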

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/public/colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 08:43

April 02

W3C Read Write Web Community Group: Read Write Web — Q1 Summary — 2017

Summary

A quiet start to 2017 as people prepare for WWW 2017 and ESWC. An active political quarter saw the inauguration of a new US president, and numerous concerns were raised about new laws regarding privacy at the ISP level.

The Linked Open Data cloud continues to grow and has a neat update here. There has also been a release of the SHACL playground, which allows data to be validated according to various “shapes“.

Linked Data Notifications has become a Proposed Recommendation, and will allow users of the web to have a data inbox, and enable a whole host of use cases.

Communications and Outreach

Collaboration has begun with two cloud providers, nextcloud and cozy cloud. Hopefully this will bring read and write web standards to a wider audience over time.

 

Community Group

Some ideas for extending the way PATCH works have been described by TimBL. I found it interesting how data can be transmitted over protocols other than the web:

– When clients listening to the same resource are in fact located physically close, they could exchange patches through other media like wifi or bluetooth.

– The system can evolve (under stress) to work entirely with distributed patches, making the original HTTP server unnecessary

– The patches could be combined with hashes of versions of folders to be the basis for a git-like version control system, or connect to git itself

Solid

Applications

There is a new test website for the openid authentication branch of node solid server, and solid client has been updated to work with it. There have been various fixes to RDF and Solid libraries, and two new repositories for solid notifications and solid permissions.

Good work has continued on rabel, a program for reading and writing linked data in various formats. In addition, the browser-shimmed apps built on solid-ui and solid-app-set continue to improve. Finally, *shameless plug*, I am writing a gitbook on a skinned version of node solid server, bitmark storage, which hopes to integrate Solid with cryptocurrencies, creating self-funding storage.

Last but not Least…

On the topic of cryptocurrencies, I’m very excited about a draft paper released on semantic blockchains. There was some buzz generated around this topic, and hopefully it will feature in a workshop next quarter.

Posted at 11:04

March 31

AKSW Group - University of Leipzig: AKSW Colloquium, 03.04.2017, RDF Rule Mining

At the AKSW Colloquium, on Monday 3rd of April 2017, 3 PM, Tommaso Soru will present the state of his ongoing research titled “Efficient Rule Mining on RDF Data”, where he will introduce Horn Concerto, a novel scalable SPARQL-based approach for the discovery of Horn clauses in large RDF datasets. The presentation slides will be available at this URL.
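As background, and not the paper's actual algorithm, here is a minimal SPARQL sketch of how the confidence of a single candidate Horn clause can be estimated directly against an endpoint (the :spouse predicate is a hypothetical example):

PREFIX : <http://example.org/>

# Candidate rule:  :spouse(?x, ?y)  =>  :spouse(?y, ?x)
# confidence = (instantiations satisfying body and head) / (instantiations satisfying body)
SELECT ((?headAndBody / ?body) AS ?confidence) WHERE {
  { SELECT (COUNT(*) AS ?body)        WHERE { ?x :spouse ?y } }
  { SELECT (COUNT(*) AS ?headAndBody) WHERE { ?x :spouse ?y . ?y :spouse ?x } }
}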

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 11:39

March 30

Leigh Dodds: The limitations of the open banking licence

The

Posted at 18:45

March 27

AKSW Group - University of Leipzig: AKSW Colloquium, 27.03.2017, PPO & PPM 2.0: Extending the privacy preference framework to provide finer-grained access control for the Web of Data

In the upcoming Colloquium, on March 27th at 3 PM, Marvin Frommhold will discuss the paper “PPO & PPM 2.0: Extending the Privacy Preference Framework to provide finer-grained access control for the Web of Data” by Owen Sacco and John G. Breslin, published in the I-SEMANTICS ’12 Proceedings of the 8th International Conference on Semantic Systems.

Abstract:  Web of Data applications provide users with the means to easily publish their personal information on the Web. However, this information is publicly accessible and users cannot control how to disclose their personal information. Protecting personal information is deemed important in use cases such as controlling access to sensitive personal information on the Social Semantic Web or even in Linked Open Government Data. The Privacy Preference Ontology (PPO) can be used to define fine-grained privacy preferences to control access to personal information and the Privacy Preference Manager (PPM) can be used to enforce such preferences to determine which specific parts of information can be granted access. However, PPO and PPM require further extensions to create more control when granting access to sensitive data; such as more flexible granularity for defining privacy preferences. In this paper, we (1) extend PPO with new classes and properties to define further fine-grained privacy preferences; (2) provide a new light-weight vocabulary, called the Privacy Preference Manager Ontology (PPMO), to define characteristics about privacy preference managers; and (3) present an extension to PPM to enable further control when publishing and sharing personal information based on the extended PPO and the new vocabulary PPMO. Moreover, the PPM is extended to provide filtering data over SPARQL endpoints.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 08:13

March 26

Bob DuCharme: Wikidata's excellent sample SPARQL queries

Learning about the data, its structure, and more.

Posted at 17:40

March 24

Dublin Core Metadata Initiative: Sayeed Choudhury to deliver Keynote at DC-2017

2017-03-24, The Governing Board and the Chairs of the DC-2017 Program Committee are pleased to announce that Sayeed Choudhury, Associate Dean for Research Data Management and Hodson Director of the Digital Research and Curation Center at the Sheridan Libraries of Johns Hopkins University, will deliver the keynote address at DC-2017 in Washington, D.C. Choudhury has oversight for data curation research and development and data archive implementation at the Sheridan Libraries at Johns Hopkins University. Choudhury is a President Obama appointee to the National Museum and Library Services Board. He is a member of the Executive Committee for the Institute of Data Intensive Engineering and Science (IDIES) based at Johns Hopkins. He is also a member of the Board of the National Information Standards Organization (NISO) and a member of the Advisory Board for OpenAIRE2020. He has been a member of the National Academies Board on Research Data and Information, the ICPSR Council, the DuraSpace Board, Digital Library Federation advisory committee, Library of Congress' National Digital Stewardship Alliance Coordinating Committee, Federation of Earth Scientists Information Partnership (ESIP) Executive Committee and the Project MUSE Advisory Board. He is the recipient of the 2012 OCLC/LITA Kilgour Award. Choudhury has testified for the U.S. Research Subcommittee of the Congressional Committee on Science, Space and Technology. For additional information, see http://dcevents.dublincore.org/IntConf/index/pages/view/keynote17.

Posted at 23:59

Dublin Core Metadata Initiative: ZBW German National Library of Economics joins DCMI as Institutional Member

2017-03-24, The DCMI Governing Board is pleased to announce that ZBW German National Library of Economics has joined DCMI as an Institutional Member. ZBW Director Klaus Tochtermann will serve as the Library's representative to the Board. ZBW German National Library of Economics - Leibniz Information Centre for Economics is the world's largest research infrastructure for economic literature, online as well as offline. Its disciplinary repository EconStor provides a large collection of more than 127,000 articles and working papers in Open Access. EconBiz, the portal for international economic information, allows students and researchers to search among nine million datasets. The ZBW edits two journals in economic policy, Wirtschaftsdienst and Intereconomics, and in cooperation with the Kiel Institute for the World Economy produces the peer-reviewed journal Economics based on the principle of Open Access. For information on becoming a DCMI Institutional Member, visit the DCMI membership page at http://dublincore.org/support/.

Posted at 23:59

Dublin Core Metadata Initiative: Webinar: Nailing Jello to a Wall: Metrics, Frameworks, & Existing Work for Metadata Assessment

2017-03-24, With the increasing number of repositories, standards and resources we manage for digital libraries, there is a growing need to assess, validate and analyze our metadata - beyond our traditional approaches such as writing XSD or generating CSVs for manual review. Being able to further analyze and determine measures of metadata quality helps us better manage our data and data-driven development, particularly with the shift to Linked Open Data leading many institutions to large-scale migrations. Yet, the semantically-rich metadata desired by many Cultural Heritage Institutions, and the granular expectations of some of our data models, makes performing assessment, much less going on to determine quality or performing validation, that much trickier. How do we handle analysis of the rich understandings we have built into our Cultural Heritage Institutions’ metadata and enable ourselves to perform this analysis with the systems and resources we have? This webinar with Christina Harlow, Cornell University Library, sets up this question and proposes some guidelines, best practices, tools and workflows around the evaluation of metadata used by and for digital libraries and Cultural Heritage Institution repositories. The goal is for webinar participants to walk away prepared to handle their own metadata assessment needs by using existing works and being better aware of the open questions in this domain. For additional information and to register, go to http://dublincore.org/resources/training/#2017harlow.

Posted at 23:59

Leigh Dodds: What is data asymmetry?

You’ve just parked your car. Google Maps offers to

Posted at 18:01

March 23

schema.org: Schema.org 3.2 release: courses, fact-checking, digital publishing accessibility, menus and more...

Schema.org 3.2 is released! This update brings many improvements including new vocabulary for describing courses, fact-check reviews, digital publishing accessibility, as well as a more thorough treatment of menus and a large number of pending proposals which are offered for early-access use, evaluation and improvement. We also introduce a new "hosted extension" area, iot.schema.org which provides an entry point for schema collaborations relating to the Internet of Things field. As always, our releases page has full details.

These efforts depend entirely on a growing network of collaborations, within our own W3C Community Group and beyond. Many thanks are due to the Schema Course Extension Community Group, the IDPF's Epub Accessibility Working Group, members of the international fact-checking network including the Duke Reporters Lab and Full Fact, the W3C Web of Things and Spatial Web initiatives, the Bioschemas project, and to Wikipedia's Wikidata project.

This release also provides the opportunity to thank two of our longest-serving steering group members, whose careers have moved on from the world of structured data markup. Peter Mika and Martin Hepp have both played leading roles in Schema.org since its earliest days, and the project has benefited greatly from their insight, commitment and attention to detail.

As we look towards future developments, it is worth taking a brief recap on how we have organized things recently. Schema.org's primary discussion forum is a W3C group, although its most detailed collaborations are typically in Github, organized around specific issues and proposed changes. These discussions are open to all interested parties. Schema designs frequently draw upon related groups that have a more specific topical focus. For example, the Courses group became a hub for education/learning metadata experts from LRMI and others. This need to engage with relevant experts also motivated the creation of the "pending" area introduced in our previous release. Github is a site oriented towards computer programmers. By surfacing proposed, experimental and other early access designs at pending.schema.org we hope we can reach a wider audience who may have insight to share. With today's release, we add 14 new "pending" designs, with courses, accessibility and fact-checking markup graduating from pending into the core section of schema.org. Future releases will follow this pipeline approach, encouraging greater consistency, quality and clarity as our vocabulary continues to evolve.



Posted at 15:29

March 21

Leigh Dodds: Fearful about personal data, a personal example

I was recently at a workshop on making better use of (personal) data for the benefit of specific communities. The discussion, perhaps inevitably, ended up focusing on many of the attendees’ concerns around how data about them was being used.

The group was asked to share what made them afraid or fearful about how personal data might be misused. The examples were mainly about use of the data by Facebook, by advertisers, as surveillance, etc. There was a view that being in control of that data would remove the fear and put the individual back in control. This same argument pervades a lot of the discussion around personal data. The narrative is that if I own my data then I can decide how and where it is used.

But this overlooks the fact that data ownership is not a clear-cut thing. Multiple people might reasonably claim to have ownership over some data. For example, bank transactions between individuals.

Posted at 19:10

Gregory Williams: SPARQL Limit by Resource

As part of work on the Attean Semantic Web toolkit, I found some time to work through limit-by-resource, an oft-requested SPARQL feature and one that my friend Kjetil lobbied for during the SPARQL 1.1 design phase. As I recall, the biggest obstacle to pursuing limit-by-resource in SPARQL 1.1 was that nobody had a clear idea of how to fit it nicely into the existing SPARQL syntax and semantics. With hindsight, and some time spent working on a prototype, I now suspect that this was because we first needed to nail down the design of aggregates and let aggregation become a first-class feature of the language.

Now, with a standardized syntax and semantics for aggregation in SPARQL, limit-by-resource seems like just a small enhancement to the existing language and implementations by the addition of window functions. I implemented a RANK operator in Attean, used in conjunction with the already-existing GROUP BY. RANK works on groups just like aggregates, but instead of producing a single row for each group, the rows of the group are sorted, and given an integer rank which is bound to a new variable. The groups are then “un-grouped,” yielding a single result set. Limit-by-resource, then, is a specific use-case for ranking, where groups are established by the resource in question, ranking is either arbitrary or user-defined, and a filter is added to only keep rows with a rank less than a given threshold.

I think the algebraic form of these operations should be relatively intuitive and straightforward. New Window and Ungroup algebra expressions are introduced akin to Aggregation and AggregateJoin, respectively. Window(G, var, WindowFunc, args, order comparators) operates over a set of grouped results (either the output of Group or another Window), and Ungroup(G) flattens out a set of grouped results into a multiset.

If we wanted to use limit-by-resource to select the two eldest students per school, we might end up with something like this:

Project(
    Filter(
        ?rank <= 2,
        Ungroup(
            Window(
                Group((?school), BGP(?p :name ?name . ?p :school ?school . ?p :age ?age .)),
                ?rank,
                Rank,
                (),
                (DESC(?age))
            )
        )
    ),
    {?age, ?name, ?school}
)

Students with their ages and schools are matched with a BGP. Grouping is applied based on the school. Rank with ordering by age is applied so that, for example, the result for the eldest student in each school is given ?rank=1, the second eldest ?rank=2, and so on. Finally, we apply a filter so that we keep only results where ?rank is 1 or 2.

The syntax I prototyped in Attean allows a single window function application applied after a GROUP BY clause:

PREFIX : <http://example.org/>
SELECT ?age ?name ?school WHERE {
    ?p :name ?name ;
       :school ?school ;
       :age ?age .
}
GROUP BY ?school
RANK(DESC(?age)) AS ?rank
HAVING (?rank <= 2)

However, a more complete solution might align more closely with existing SQL window function syntaxes, allowing multiple functions to be used at the same time (appearing syntactically in the same place as aggregation functions).

PREFIX : <http://example.org/>
SELECT ?age ?name ?school WHERE {
    ?p :name ?name ;
       :school ?school ;
       :age ?age .
}
GROUP BY ?school
HAVING (RANK(ORDER BY DESC(?age)) <= 2)

or:

PREFIX : <http://example.org/>
SELECT ?age ?name ?school WHERE {
    {
        SELECT ?age ?name ?school (RANK(GROUP BY ?school ORDER BY DESC(?age)) AS ?rank) WHERE {
            ?p :name ?name ;
               :school ?school ;
               :age ?age .
        }
    }
    FILTER(?rank <= 2)
}

Posted at 03:17

Copyright of the postings is owned by the original blog authors. Contact us.