Planet RDF

It's triples all the way down

February 17

Leigh Dodds: Creating better checklists, a short review of the Checklist Manifesto

I’ve just finished reading The Checklist Manifesto by Atul Gawande (Cancer Research UK affiliate link). It’s been on my reading list for a while. In my work I’ve written quite a few checklists to help capture best practice or to provide advice. So I was curious about whether I could learn something about creating better checklists.

I wanted to write down a few reflections whilst they’re still fresh in my mind.

The book explores the use of checklists in medicine, aviation and to a lesser extent in business. Checklists aren’t to-do lists. They are a tool to help reduce risk, uncertainty and failure. Gawande uses ample anecdotes, supported by evidence from real-world studies, to illustrate how effective a simple checklist can be. They routinely save people’s lives during surgeries and are a key contributor to the safety of modern aviation.

Gawande explains how checklists allow teams to perform better in complex situations. They protect against individual fallibility, and can help to transfer best practice and research into operational use. He explains that checklists aren’t a teaching tool. They are a means of imposing discipline on a team. Their goal is to improve outcomes.

He also explains why he thinks they’re not being used more widely. In particular, he highlights the tendency of professionals to feel like they’re being undermined or challenged when asked to use simple checklists. A “hero culture” contributes to this problem, something that is very evident in surgery. It’s also something that the tech industry struggles with as well.

Gawande explains that checklists help to address this by re-balancing power within teams. For example, by giving nurses the permission to halt a surgical procedure if a checklist hasn’t been completed to their satisfaction. A hero culture might otherwise silence the raising of concerns, or deter team members from pointing out problems as they see them.

The book highlights a common pattern that occurs in many successful checklists: specific steps to encourage and make time for team communication. These range from simple introductions and a review of responsibilities, through to a walk-through of expected and possible outcomes. These all contribute towards making the team a more effective unit.

Throughout the book I was wondering how to transfer the insight into other areas. Gawande suggests that checklists are useful anywhere that we have multi-disciplinary teams working together on complex tasks. And specifically where those tasks might have complex outcomes that might have serious impacts.

I think there’s probably a lot of examples in the data and digital world where they might be useful.

What if teams working on data science and machine learning had “preflight checklists” that were used not just at the start of a project, but also at the time of launch and beyond? Would they help highlight problems, increase discipline and allow times for missteps or other concerns to be highlighted?

The ODI data ethics canvas, developed by Amanda Smith, Ellen Broad and Peter Wells, is not quite a checklist. But it’s a similar type of tool, aiming to address some similar problems. Privacy impact assessments are another example. But perhaps there are other useful aids?

The book also raises wider questions about the approach we take in our societies towards ensuring safe outcomes of our work, research, etc. There is often too much focus on the use and application of exciting new research and technologies, and not enough on the discipline required to use them safely and effectively.

In short, are we taking care of one another in the best ways we know how?

Creating better checklists

There are some great insights into creating checklists scattered throughout the Manifesto. But ironically, they’re not gathered together into a single list of suggestions.

So, for my own benefit I’ve jotted down some points to reflect on:

  • Checklists need to be focused. An exhaustive list of steps is not useful. Trust that people know how to do their job; just ask them to confirm the most critical or important steps
  • Think about how the checklist will be used. There are READ-DO checklists (where you read each step and then perform it) and DO-CONFIRM checklists (where you carry out the activity from memory, then pause to confirm what you have done)
  • Checklists can be used to help with both routine situations (pre-flight) and emergencies (engine failure)
  • Make the checklist easy to use. A good user experience can help embed them into routine practice
  • Consider who is leading use of the checklist. A checklist can help to balance power across a team
  • Include team communications. Teams perform better when they know each other and understand their roles. Ask them to explain what will happen and what the expected outcomes might be. This helps teams deal with items not on the checklist
  • Test and iterate the list
  • Let people customise it, so they can adapt to local use
  • Measuring success and impact (e.g. by measuring outcomes, or even just identifying where they have helped) can help encourage others to adopt it

 

Posted at 12:51

Egon Willighagen: FAIR-er Compound Interest Christmas Advent 2017: learnability and citability

Compound Interest infographics of yesterday.
I love Compound Interest! I love what it does for popularization of the chemistry in our daily life. I love that the infographics have a pretty liberal license.

But I also wish they were more usable. That is, the usability is greatly diminished, first, by the lack of learnability; of course, there is not a lot of room to give pointers. Second, they do not have DOIs and are hard to cite as a source. That said, the lack of sourcing information may not make them the best source, but let's consider these aspects separately. I would also love to see the ND clause dropped, as it makes it harder to translate these infographics (you do not have the legal permission to do so), and fixing small glitches has to involve Andy Brunning personally.

The latter I cannot change, but the license allows me to reshare the graphics. I contacted Andy and proposed something I wanted to try. This post details some of the outcomes of that.

Improving the citability
This turns out to be the easy part, thanks to the great integration of GitHub and Zenodo. So, I just started a GitHub repository, added the proper license, and copied in the graphics. I wrapped it with some Markdown, taking advantage of another cool GitHub feature, and got this simple webpage:


By making the result a release, it got automatically archived on Zenodo. Now Andy's Compound Interest Christmas Advent 2017 has a DOI: 10.5281/zenodo.1164788:


So, this archive can be cited as:
Andy Brunning, & Egon Willighagen. (2018, February 2). egonw/ci-advent-2017: Compound Interest Christmas Advent 2017 - Version 1 (Version v1). Zenodo. http://doi.org/10.5281/zenodo.1164789
Clearly, my contribution is just the archiving and, well, what I did as explained in the next section. The real work is done by Andy Brunning, of course!
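Because the archive is an ordinary Zenodo record, its metadata can also be fetched programmatically. Here is a minimal sketch using the public Zenodo records API, with the record id from the citation above; the exact JSON field names are my assumption rather than something documented in this post:

    # Minimal sketch: fetch a Zenodo record's metadata over its public REST API.
    # The JSON field names ("metadata", "title", "doi", "creators") are assumptions.
    import requests

    record = requests.get("https://zenodo.org/api/records/1164789", timeout=30).json()
    meta = record.get("metadata", {})
    print(meta.get("title"))
    print(meta.get("doi"))
    print([creator.get("name") for creator in meta.get("creators", [])])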

Improving the learnability
One of the reasons I love the graphics is that they show the chemicals around us. Just look outside your window and you'll see the chemicals that make flowers colorful, berries tasty, and several plants poisonous. Really, just look outside! You see them now? (BTW, forget about those nanopores and MinIONs, I want my portable metabolomics platform :)

But if I want to learn more about those chemicals (what are their properties, how do I extract them from the plants, how will I recognize them, what toxins am I (deliberately, but in very low doses) eating during lunch, who discovered them, etc, etc?), those infographics don't help me.

Scholia to the rescue (see doi:10.1007/978-3-319-70407-4_36): using Wikidata (and SPARQL queries), it can tell me a lot about chemicals, and there is a good community that cares about the above questions too and adds information to Wikidata. Make sure to check out WikiProject Chemistry. All it needed was a Scholia extension for chemicals, something we've been working on. For example, check out bornyl acetate (from Day 2: Christmas tree aroma):


This list of identifiers is perhaps not the most interesting, and we're still working out how we can make it properly link out with the current code. Also, this compound is not so interesting for properties, but if there is enough information, it can look like this (for acetic acid):


I can recommend exploring the information it provides, and note the links to literature (which may include primary literature, though not in this screenshot).
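If you would rather pull those identifiers yourself than squint at my screenshots, a minimal sketch of the kind of Wikidata query involved is below. It assumes the compound can be found by its English label and uses the CAS Registry Number (P231), PubChem CID (P662) and InChIKey (P235) properties; Scholia itself runs considerably richer queries than this.

    # Minimal sketch: look up a few external identifiers for a compound in Wikidata.
    # Assumes the SPARQLWrapper package and the property ids noted in the comments.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql", agent="ci-advent-demo/0.1")
    endpoint.setQuery("""
    SELECT ?compound ?cas ?pubchem ?inchikey WHERE {
      ?compound rdfs:label "bornyl acetate"@en .
      OPTIONAL { ?compound wdt:P231 ?cas . }      # CAS Registry Number
      OPTIONAL { ?compound wdt:P662 ?pubchem . }  # PubChem CID
      OPTIONAL { ?compound wdt:P235 ?inchikey . } # InChIKey
    }
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print({name: value["value"] for name, value in row.items()})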

But I underestimated the situation, as Compound Interest actually includes quite a few compound classes, and I had yet to develop a Scholia aspect for that. Fortunately, I got that finished too (and I love it), and it has distinct features and is properly integrated. To give you some idea, here is what phoratoxin (see Day 9: Poisonous Mistletoe) looks like:


Well, I'm sure it will look quite different a year from now, but I hope you can see where this is going. It is essential that we improve the FAIR-ness (see doi:10.1038/sdata.2016.18) of resources, small and large. If projects like Compound Interest set an example, this will show the next generation of scientists how to do science better.

Posted at 08:52

February 15

Dydra: Inferring Quicklisp System Descriptions

We presented, at the recent ELS 2016[0], the speculative results from using SPARQL to analyse implicit dependency relations in the LISP ecosystem, as embodied in a recent Quicklisp[1] release.

Posted at 15:08

Dublin Core Metadata Initiative: DCMI 2018 Conference - Call for Papers

The Call for Papers for the DCMI Conference, 2018, has been issued! Full details, including instructions on how to submit proposals, deadlines etc. can be found on the DCMI Conference Submission website. More general information about the conference, venue, location etc. can be found on the DCMI website.

Posted at 10:56

February 06

Dublin Core Metadata Initiative: DC-2018 Announced

We are pleased to announce that DCMI's annual international conference, DC-2018, will be hosted by the University of Porto in Portugal, from September 10th to 13th. We are also pleased to announce that the conference will be co-located with the Theory and Practice of Digital Libraries (TPDL) annual international conference. The conferences will deliver two parallel programmes of peer-reviewed papers, tutorials and workshops. Delegates will register once, and will then be free to choose between the two programmes, with some plenary sessions (keynotes) and social events bringing all delegates together.

Posted at 10:56

January 31

: W3C/ERCIM at Boost 4.0 kick off meeting

W3C/ERCIM is one of fifty organizations participating in the Boost 4.0 European project on big data in Industry 4.0 which kicked off with an initial face to face meeting at the Automotive Intelligence Center in Bilbao on 30-31 January 2018. … Continue reading

Posted at 18:31

January 29

Leigh Dodds: When are open (geospatial) identifiers useful?

In a meeting today, I was discussing how and when open geospatial identifiers are useful. I thought this might make a good topic for a blog post in my continuing series of questions about data. So here goes.

An identifier provides an unambiguous reference for something about which we want to collect and publish data. That thing might be a road, a school, a parcel of land or a bus stop.

If we publish a dataset that contains some data about “Westminster” then, without some additional documentation, a user of that dataset won’t know whether the data is about a tube station, the Parliamentary Constituency, a company based in Hayes or a school.

If we have identifiers for all of those different things, then we can use the identifiers in our data. This lets us be confident that we are talking about the same things. Publishing data about “940GZZLUWSM” makes it pretty clear that we’re referring to a specific tube station.

If data publishers use the same sets of identifiers, then we can start to easily combine your dataset on the wheelchair accessibility of tube stations, with my dataset of tube station locations and Transport for London’s transit data. So we can build an application that will help people in wheelchairs make better decisions about how to move around London.
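To make that concrete, here is a minimal sketch of that kind of join using pandas. The file names and column layouts are invented for illustration; the only real point is that both datasets carry the same stop point identifier, such as 940GZZLUWSM.

    # Minimal sketch: combine two datasets that share a common identifier column.
    # The CSV files and their columns are hypothetical.
    import pandas as pd

    accessibility = pd.read_csv("station_accessibility.csv")  # columns: stop_id, step_free
    locations = pd.read_csv("station_locations.csv")          # columns: stop_id, lat, lon

    stations = accessibility.merge(locations, on="stop_id", how="inner")
    step_free_stations = stations[stations["step_free"]]
    print(step_free_stations[["stop_id", "lat", "lon"]].head())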

Helpful services

To help us publish datasets that use the same identifiers, there are a few things that we repeatedly need to do.

For example, it’s common to have to look up an identifier based on the name of the thing we’re describing. E.g. what’s the code for Westminster tube station? We often need to find information about an identifier we’ve found in a dataset. E.g. what’s the name of the tube station identified by 940GZZLUWSM? And where is it?

When we’re working with geospatial data we often need to find identifiers based on a physical location. For example, based on a latitude and longitude:

  • Where is the nearest tube station?
  • Or, what polling district am I in, so I can find out where I should go to vote?
  • Or, what is the identifier for the parcel of land that contains these co-ordinates?
  • …etc

It can be helpful if these repeated tasks are turned into specialised services (APIs) that make it easier to perform them on-demand. The alternative is that we all have to download and index the necessary datasets ourselves.
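As a sketch of what using such services might feel like, here is a small client for an entirely hypothetical lookup API; the base URL, paths and parameters are invented for illustration, not a real service:

    # Minimal sketch of a client for a *hypothetical* identifier lookup service.
    # None of these endpoints exist; they mirror the three common tasks described above.
    import requests

    BASE = "https://lookup.example.org"  # hypothetical service

    def id_for_name(name):
        """Name -> identifier, e.g. 'Westminster tube station' -> '940GZZLUWSM'."""
        return requests.get(f"{BASE}/search", params={"q": name}, timeout=30).json()

    def details_for_id(identifier):
        """Identifier -> label, location and other basic metadata."""
        return requests.get(f"{BASE}/id/{identifier}", timeout=30).json()

    def ids_near(lat, lon):
        """Latitude/longitude -> identifiers of nearby features."""
        return requests.get(f"{BASE}/near", params={"lat": lat, "lon": lon}, timeout=30).json()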

Network effects

Choosing which identifiers to use in a dataset is an important part of creating agreements around how we publish data. We call those agreements data standards.

The more datasets that use the same set of identifiers, the easier it becomes to combine those datasets together, in various combinations that will help to solve a range of problems. To put it another way, using common identifiers helps to generate network effects that make it easier for everyone to publish and use data.

I think it’s true to say that almost every problem that we might try and solve with better use of data requires the combination of several different datasets. Some of those datasets might come from the private sector. Some of them might come from the public sector. No single organisation always holds all of the data.

This makes it important to be able to share and reuse identifiers across different organisations. And that is why it is important that those identifiers are published under an open licence.

Open licensing

Open licences allow anyone to access, use and share data. Openly licensed identifiers can be used in both open datasets and those that are shared under more restrictive licences. They give data publishers the freedom to choose the correct licence for their dataset, so that it sits at the right point on the data spectrum.

Identifiers that are not published under an open licence remove that choice. Restricted licensing limits the ability of publishers to share their data in the way that makes sense for their business model or application. Restrictive licences cause friction that gets in the way of making data as open as possible.

Open identifiers create open ecosystems. They create opportunities for a variety of business models, products and services. For example intermediaries can create platforms that aggregate and distribute data that has been published by a variety of different organisations.

So, the best identifiers are those that are

  • published under an open licence that allows anyone to access, use and share them
  • published alongside some basic metadata (a label, a location or other geospatial data, a type)
  • and, are accessible via services that allow them to be easily used

Who provides that infrastructure?

Whenever there is friction around the use of data, application developers are left with a difficult choice. They either have to invest time and effort in working around that friction, or compromise their plans in some way. The need to quickly bring products to market may lead to choices which are not ideal.

For example, developers may choose to build applications against Google’s mapping services. These services are easily and immediately available to any developer wanting to display a map or recommend a route to a user. But these platforms usually have restricted licensing, which means it is the platform provider that reaps the most benefits. In the absence of open licences, network effects can lead to data monopolies.

So who should provide these open identifiers, and the metadata and services that support them?

This is the role of national mapping agencies. These agencies will already have identifiers for important geospatial features. The Ordnance Survey has an identifier called a TOID which is assigned to every feature in Great Britain. But there are other identifiers in use too. Some are designed to support publication of specific types of data, e.g. UPRNs.

These identifiers are national assets. They should be managed as data infrastructure and not be tied up in commercial data products.

Publishing these identifiers under an open licence, in the ways that have been outlined here, will provide a framework to support the collection and curation of geospatial data by many  different organisations, across the public and private sector. That infrastructure will allow value to be created from that geospatial data in a variety of new ways.

Provision of this type of infrastructure is also in-line with what we can see happening across other parts of government. For example the work of the GDS team to develop registers of important data. Identifiers, registers and standards are important building blocks of our local, national and global data infrastructure.

If you’re interested in reading more about the benefits of open identifiers, then you might be interested in this white paper that I wrote with colleagues from the Open Data Institute and Thomson Reuters: “Creating value from identifiers in an open data world”.

Posted at 20:47

January 28

Bob DuCharme: JavaScript SPARQL

With rdfstore-js.

Posted at 14:35

January 22

: Cell Phones for Seniors: Stay Independent, Stay Safe

Cell Phones for Seniors

One of the biggest arguments against allowing elderly family members and senior citizens to live alone is the concern that no one would be there to help them if they needed medical attention, got lost or needed any other type of assistance. This fear led to all sorts of devices for helping the elderly get in touch with emergency services (the iconic “I’ve fallen and I can’t get up” commercials come to mind). But now, cell phones for seniors deftly fill this niche. Plus, mobile phones for the elderly have more features than the usual lifeline devices. Here are a few handy features you should help your family member or friend set up on their cell phone:

In Case of Emergency (ICE) Contact

The first thing you should do when you get a cell phone for senior citizens is to program in a few ICE contacts. You can do this in the normal phonebook or in the settings of the phone. Plug in a couple different phone numbers for several individuals, including your work number, your cell number, your home number and the doctor’s office number. EMTs and other emergency assistance workers will know to check the cell phone for ICE contacts and will be able to get in touch with you if your loved one is incapacitated or hospitalized.

#TAXI

This feature is more of a convenience and it’s something that all of us should consider using from time to time (for instance, when you’ve had a little too much to drink). For a small fee, #TAXI connects you to a service that helps you find an affordable taxi near your location. This is the best way to get picked up quickly, without having to stand in the street hailing a cab or wait in the cold for hours.

Roadside Assistance

AAA remains one of the best motor clubs available, but the roadside assistance offered by your cell phone company has its advantages. For example, Verizon’s roadside assistance can cost you as little as $3 and will bail you out if you need to jump your battery, change a tire, run out of gas (3 gallons free), unlock your car when you’re locked out, perform small mechanical fixes to get it running long enough to get to a service station, up to 10 miles free towing as well as help getting your car out of ditches, snowdrifts and mud. The number for Verizon roadside assistance is easy to remember – just dial #ROAD. Other carriers have similar numbers, such as #AUTO.

GPS

GPS is a must in any cell phone for seniors. It’ll help them get their bearings when they’re lost via GPS navigation and maps and it’ll help you (or emergency services) find them, even if they don’t answer the phone. For instance, AT&T offers the FamilyMap service, which lets you know where up to 5 family members are (whether it’s grandpa or junior) and Verizon has the similar Family Locator.


Posted at 08:31

January 07

W3C Read Write Web Community Group: Read Write Web — Q4 Summary — 2017

Summary

TPAC 2017 kicked off in California, achieving its highest attendance to date, with some saying it may have been the best TPAC ever.  Extensive Strategic Highlights were also published.

HTML 5.2 is now a recommendation, with HTML 5.3 coming.  Work on payments made progress with some demos presented at money 20/20.  There was also a nice look ahead to 2018 and semantic web trends from Dataversity.

There was a slight pickup in activity this quarter in the community group, with around 75 messages, almost double the previous quarter. More details below.

Communications and Outreach

Aside from Fedora, there was some outreach this quarter with CERN, where it all began, with a view to possibly reusing Web Access Control.

 

Community Group

In the CG there were calls for team members, with Sebastian Samagura starting a project. There was an announcement of the Fedora API Spec CR, and a fantastic post by Ruben entitled “Paradigm Shifts for the Decentralized Web”.


Applications

By convention it’s been decided to try to add the ‘solid-app’ tag to new and existing apps for the solid platform, which will hopefully allow apps to become searchable.  One new app, twee-fi, was written from scratch quite quickly and seems to work with all our servers.  There is a general move to patch existing apps to follow this pattern so that both OIDC and TLS auth can be leveraged.

There have been updates to rdflib.js, solid-ui and solid-app-set.  I also made a small console based playground, solid-libraries, that shows how these libraries fit together, and allows a few commands as examples.  Additionally I have started trying to patch apps to use the new auth system starting with the pastebin tutorial.  Hopefully more apps will be patched this quarter.

Last but not Least…

The OWL Time ontology is now a W3C REC. The ontology provides a vocabulary for expressing facts about topological (ordering) relations among instants and intervals, together with information about durations, and about temporal position including date-time information.
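As a flavour of the vocabulary, here is a minimal sketch using rdflib to describe an interval with a starting instant; the choice of terms (time:Interval, time:hasBeginning, time:inXSDDateTime) is my reading of the ontology, so check the REC for the full picture:

    # Minimal sketch: an interval with a starting instant in the OWL Time vocabulary.
    # The term choice is an assumption based on my reading of the REC.
    from rdflib import Graph, Namespace, Literal, BNode
    from rdflib.namespace import RDF, XSD

    TIME = Namespace("http://www.w3.org/2006/time#")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("time", TIME)

    meeting = EX["someMeeting"]
    start = BNode()

    g.add((meeting, RDF.type, TIME.Interval))
    g.add((meeting, TIME.hasBeginning, start))
    g.add((start, RDF.type, TIME.Instant))
    g.add((start, TIME.inXSDDateTime,
           Literal("2017-11-06T09:00:00Z", datatype=XSD.dateTime)))

    print(g.serialize(format="turtle"))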

Posted at 08:15

January 03

W3C Blog Semantic Web News: W3C study on Web data standardization

The Web has had a huge impact on how we exchange and access information. The Web of data is growing rapidly, and interoperability depends upon the availability of open standards, whether intended for interchange within small communities, or for use on a global scale. W3C is pleased to release a W3C study on practices and tooling for Web data standardization, and gratefully acknowledges support from the Open Data Institute and Innovate UK.

A lengthy questionnaire was used to solicit input from a wide range of stakeholders. The feedback will be used as a starting point for making W3C a more effective, more welcoming and sustainable venue for communities seeking to develop Web data standards and exploit them to create value added services.

Posted at 17:44

: W3C study on Web data standardization

The Web has had a huge impact on how we exchange and access information. The Web of data is growing rapidly, and interoperability depends upon the availability of open standards, whether intended for interchange within small communities, or for use … Continue reading

Posted at 16:13

December 31

Bob DuCharme: SPARQL and Amazon Web Service's Neptune database

Promising news for large-scale RDF development.

Posted at 14:53

December 21

John Goodwin: Using Machine Learning to write the Queen’s Christmas Message

In their excellent book “The Indisputable Existence of Santa Claus”, Hannah Fry (aka @FryRsquared) and Thomas Oléron Evans (aka @Mathistopheles) talked about using Markov Chains to generate the Queen’s Christmas message. You can read a bit about that here. After reading that chapter I asked Hannah and Thomas if they had considered repeating this using recurrent neural networks. A couple of years ago Andrej Karpathy wrote a blog post that he summarised as follows:

We’ll train RNNs to generate text character by character and ponder the question “how is that even possible?”

In his blog he posed the question:

It looks like we can learn to spell English words. But how about if there is more structure and style in the data?

and went on to train an RNN to write text in the style of Shakespeare. I recommend you read his blog post to see the examples and get an idea of what is going on.

You might also want to watch this video from the excellent Computerphile on a similar subject.

In a previous blog post I decided to have a go at using RNNs to generate lyrics for my favourite band, New Model Army. Thomas was kind enough to send me the text of the Queen’s previous speeches that he and Hannah had used in their work, and I thought I’d see how good RNNs were at generating a Royal Christmas message. I didn’t do anything particularly fancy here… just followed the code and examples in Karpathy’s original blog post on a fairly standard home desktop machine.

Here are some of the results… (I take no responsibility for any text generated by the neural network :)).

One parameter that can be adjusted to generate the output is ‘temperature’. This takes a value between 0 and 1. A lower temperature will cause the model to make more likely, but also more boring and conservative, predictions. Higher temperatures cause the model to take more chances and increase the diversity of results, but at the cost of more mistakes. Running with a temperature of 0.5 we get:

i want to send a special message of the human spirit are much to be for everyone this is a happy and part of the christmas story so in the same streed. we are all truly new zealand in an example of the commonwealth has an equal place in the lives of men and women and what they have to show that they are all life in their own peoples of the commonwealth which we have come to the sacrifice of all the fortunes of hope and there are many heart of our grandchildren the presents the prince of wales and will have a preacher group of the commonwealth. on a strong and spreading the responsibility to the commonwealth who are much on the courage of the modern world and the commonwealth games and there are many others at home and the rest of the world have been inspired by the courage and enjoyment and the present peace of a commonwealth which is how to promote the games and from all walks of life and to become a growing peace on earth which we should me to see the strength to try to do so in the world today. it was a great conflict remain inspired by the courage and prince philip and i were able to see buckingham palace which the most special year for the commonwealth is to be movement and in a different furmestand and they are all source of spirit all over the world have shared a moment in the year and the christmas message is to be consed rather a quicter and the contribution of the family of nations and i have meen and wonderful in both these ordarisasions or a common sense and encouragement. this year i am speaking to you from the same family make the world in the world. the christmas message is so many of them and i have difference to the best of ourselves. and this year there are something to look at the world they are may be a life of all the world is the commonwealth is a sense of belonging to the world i was well as they are all sort of moment the more depression

I’m not sure this makes a lot of sense, and there are a few oddities/errors in the text.

I tried lowering the temperature to 0.3:

i hope that the christmas message is to be there and it is not only a very happy christmas to you all. i am speaking to you from the commonwealth and i have been able to see that the commonwealth and i have been deeply touched this year i am speaking to you from the same family friends and communities as the future of the commonwealth and as i have always given to many of you will have heard of the more conscious of the commonwealth and around us we should be able to see that the commonwealth and i have been deeply touched the strength to try to do the same family gathering in the commonwealth and i have all been about the future as well as a great comfort to us all the problems of the commonwealth and i have seen in their own way the problems of the commonwealth and i have been deeply touched the world the progress of the commonwealth and around us we are common throughout the commonwealth who are struck by the state of the commonwealth and i have depended to see in the world today that we should remember those who have seen in their own way a celebration of the child who was born at christmas time for families and friends will never be very different problems but it is not only a time for reflection and confidence to the commonwealth and i have all been about the world that we should not be our lives and to remind us of the future. i am speaking to you from the commonwealth and i have been deeply touched the world that we can all try to make a splendid birthday and the commonwealth and i have been floods and sadness and the best of the world have been able to discuss the best of ourselves. i believe that this christmas day i want to send a special message of hope in the face of hardship is nothing new that of the commonwealth who have seen in their lives in the commonwealth and in the commonwealth and i have been able to discuss the best of ourselves. we are all live together as a great daily and its war the commonwealth

As suggested, the result here is a bit more predictable/boring.

It is also possible to prime the model with some starting text. This starts out the RNN with some hardcoded characters to warm it up with some context before it starts generating text.
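For the curious, here is a minimal sketch (not Karpathy’s code) of how temperature enters the sampling step; the trained network itself is omitted and char_probs stands in for its per-character output at one step:

    # Minimal sketch of temperature-based sampling from a character-level model.
    # `char_probs` stands in for the softmax output of the trained RNN at one step.
    import numpy as np

    def sample_with_temperature(char_probs, temperature=0.5):
        """Rescale the distribution and draw one character index.
        Lower temperature -> safer, more repetitive text; higher -> more diverse."""
        logits = np.log(np.asarray(char_probs) + 1e-12) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)

    # Priming simply means feeding the characters of, say, "each christmas" through
    # the network first, so its hidden state has some context before sampling begins.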

Using a temperature of 0.5 and prime text of ‘Each Christmas’ we get:

each christmas is a time for reflection and grandchildren the more depends of a dedication of the commonwealth and around us of the future of the commonwealth games in the rich and proud of life in a great religions and its members. but i am also speaking by instince in the right through the members of my family and i have been great and personal responsibility to our lives and they say to me that there are many happy or so many countries of the future with the prince of peace and happiness and self-respect for the commonwealth and is a happy and prosperous future of the world we will be achieved with the opportunity to help them are a contribution to the powerful ways of spirit and learning to see the problems of the commonwealth and i have seen in their inestivation the peoples of the commonwealth and i have been for the better but that is possible to the people and the christians it’s all those who are great practical form we have all been for them to be a chance to get the gift of a carren of different lives the service of their people throughout the world. we are all live in the rolin to our children and grandchildren the present generation to see this country and arrived the same for one another which is most popularity and the rest of the world and all the great commonwealth. pronomed news the responsibilities for the duty of events which we have all been the holidic of science. it is they all who are so there and shared heritage and sometimes in saint paul’s and much considerate and to communication the human spirit. and they can be a contralle commonwealth and a firm landmark in the course of the many servicemen and women who are broken for many that can be a precious given us to be a witness this continuing that we can all be the contribution of the commonwealth and as we all have all the features of the love of hope and goodwill. this year has constantly will be overcome. i believe that this year i am speaking to you from the hope of a determination and continues to all

Using a temperature of 0.5 and prime text of ‘This  past year’ we get:

this past year has been one to see that what i have been for the commonwealth and around us we are thinking of the problems show in the face of the fact that he wanted the strongly that we should show many of them service is the birth of the commonwealth. it is a contribution of a lives of family and friends for parents and religious difficulties at the commonwealth and as we all share the future as the future of the future. it was a moment and that the christmas message of the things that have been god shown in the earliest care and for the commonwealth can give the united kingdom and across the commonwealth. it is a helping one of the march of people who are so easy to go but the rest of the world have shaped for their determination and courage of what is right the life of life is word if we can do the state of the same time last month i was welcomed as you and that the opportunity to honour the most important of the thread which have provided a strong and family. even the commonwealth is a common bond that the old games and in the courage which the generations of the commonwealth and i have those to succeed without problems which have their course all that the world has been complete strangers which could have our response. i believe that the triditings of reconciliation but there is nothing in and happy or acts of the commonwealth and around us by science the right of all the members of the world have been difficult and the benefits of dreads and happiness and service to the commonwealth of the future. i wish you all a very happy christmas to you all.

So there you have it. Not sure we’ll be replacing the Queen with an AI anytime soon. Merry Christmas!

Posted at 11:00

December 18

AKSW Group - University of Leipzig: SANSA 0.3 (Semantic Analytics Stack) Released

Dear all,

We are happy to announce SANSA 0.3 – the third release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML, N-Quad format
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify (with some known limitations until the next Spark 2.3.* release)
  • SPARQL querying via conversion to Gremlin path traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst (all in beta status), EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs based on AMIE+
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Distributed knowledge graph embedding approaches: TransE (beta), DistMult (beta), several further algorithms planned
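To give a flavour of the distributed execution model that SANSA builds on, here is a minimal, generic PySpark sketch that counts predicate usage in an N-Triples file. This is deliberately not SANSA’s own API (SANSA provides proper RDF loaders, SPARQL via Sparqlify, inference and ML on top of Spark and Flink); it is just an illustration, assuming a plain, well-formed N-Triples file.

    # Minimal, generic PySpark sketch (NOT the SANSA API): count predicate usage
    # in an N-Triples file by naively splitting each triple line on spaces.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ntriples-predicate-count").getOrCreate()

    lines = spark.sparkContext.textFile("data.nt")
    predicate_counts = (
        lines.filter(lambda line: line.strip() and not line.startswith("#"))
             .map(lambda line: line.split(" ", 2)[1])  # crude subject/predicate split
             .map(lambda predicate: (predicate, 1))
             .reduceByKey(lambda a, b: a + b)
    )

    for predicate, count in predicate_counts.take(10):
        print(predicate, count)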

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • There is example code for various tasks available.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD and BETTER.

Greetings from the SANSA Development Team

Posted at 10:15

December 17

Egon Willighagen: Winter solstice challenge #2: first submission

Citation data for articles colored by availability of a full text, as indicated in Wikidata. Mind the artifacts. Reproduce with this query and the new GraphBuilder functionality.
Bianca Kramer (of 101 Innovations fame) is the first to submit results for the Winter solstice challenge and it's impressive! She has an overall score of 54%, based on her own publications and their first-level citations!

So, the current rankings in the challenge are as follows.

Highest Score

  1. Bianca Kramer

Best Tool

  1. Bianca Kramer

I'm sure she is more than happy if you use her tool to calculate your score. If you're patient, you may even wish to take it one level deeper.

What are you talking about??
Well, the original post sheds some light on this, but basically scientific writing has become so dense that a single paper does not provide enough information. If you cannot read the cited papers, you may not be able to precisely reproduce what they did. Now that many countries are steadily heading to 100% #OpenAccess, it is time to start thinking about the next step. So, is the knowledge you built on also readily available, or is that still locked away?

For example, take the figure on the right-hand side: it shows when the articles that I cited in my work were published (a subset, because it is based on data in Wikidata, using the increasing amount of I4OC data). We immediately see some indication of the availability of the cited papers. The more yellow, the more available. However, keep in mind that this is based on "full text availability" information in Wikidata, which is very sparse. That is what makes Bianca's approach so powerful: it uses (the equally wonderful) oadoi.org.
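The actual query is linked in the figure caption above; purely to illustrate the idea, here is a minimal sketch of this kind of Wikidata query. It assumes the 'author' (P50), 'cites work' (P2860) and 'full work available at URL' (P953) properties, and the author item is a placeholder (Q42, Douglas Adams) that you would swap for your own:

    # Minimal sketch: for one author's articles, list cited works and whether
    # Wikidata records a full-text URL for them. The author item is a placeholder.
    from SPARQLWrapper import SPARQLWrapper, JSON

    AUTHOR_ITEM = "Q42"  # placeholder: replace with the Wikidata item of the author

    query = """
    SELECT ?cited ?fullTextUrl WHERE {
      ?article wdt:P50 wd:%s ;
               wdt:P2860 ?cited .
      OPTIONAL { ?cited wdt:P953 ?fullTextUrl . }
    }
    """ % AUTHOR_ITEM

    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql", agent="solstice-demo/0.1")
    endpoint.setQuery(query)
    endpoint.setReturnFormat(JSON)
    rows = endpoint.query().convert()["results"]["bindings"]

    with_full_text = sum(1 for row in rows if "fullTextUrl" in row)
    print(f"{with_full_text} of {len(rows)} cited works have a full-text URL recorded")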

You also note the immediate quality issues. Apparently, this data tells me I am citing articles from the future :) You also see that I am citing some really old articles.

Posted at 06:56

December 16

Ebiquity research group UMBC: Videos of ISWC 2017 talks

Videos of almost all of the talks from the 16th International Semantic Web Conference (ISWC) held in Vienna in 2017 are online at videolectures.net. They include 89 research presentations, two keynote talks, the one-minute madness event and the opening and closing ceremonies.

Posted at 23:15

December 15

Dublin Core Metadata Initiative: CEDIA Joins as new Regional Member

I am delighted to report that the Corporación Ecuatoriana para el Desarrollo de Investigación y la Academia (CEDIA) has agreed to join DCMI as a regional member. CEDIA is a national membership organisation which represents higher-education and research institutions in Ecuador. The organisation has a keen interest in metadata standardisation - especially in the domain of data repositories - and has been active in this area since 2009. CEDIA is a strong proponent of Open Access, and encourages and supports the adoption of interoperable metadata and specialised metadata vocabularies among its member organisations.

Posted at 10:56

December 14

: W3C Workshop on Linked Data and Privacy

W3C is inviting position papers for a workshop on data controls and linked data vocabularies to be held in Vienna, Austria on 7-8 March 2018. This is motivated by the challenges for addressing privacy across an ecosystem of services involving … Continue reading

Posted at 11:26

: Dataset Exchange WG publishes use cases and requirements

The Dataset Exchange Working Group (DXWG) is pleased to announce the publication of the First Public Working Draft of the Dataset Exchange Use Cases and Requirements. The working group will produce a second version of the Data Catalog (DCAT) Vocabulary, guidance for … Continue reading

Posted at 11:14

December 01

Libby Miller: #Makevember

@chickengrylls’ #makevember manifesto / hashtag has been an excellent experience. I’ve made maybe five nice things, a lot of nonsense, and a lot of useless junk, but that’s fine – I’ve learned a lot, mostly about servos and other motors. There’s been tons of inspiration too (check out these beautiful automata, some characterful paper sculptures, Richard’s unsuitable materials, my initial inspiration’s set of themes on a tape, and loads more). A lovely aspect was all the nice people and beautiful and silly things emerging out of the swamp of Twitter.


Of my own makes, my favourites were this walking creature, with feet made of crocodile clips (I was amazed it worked); a saw-toothed vertical traveller, such a simple little thing; this fast robot (I was delighted when it actually worked); some silly stilts; and (from October) this blimp / submarine pair.

I did lots of fails too – e.g. a stencil, a raspberry blower. Also lots of partial fails that got scaled back – AutoBez 1, 2, and 3; Earth-moon; a poor-quality under-water camera. And some days I just ran out of inspiration and made something crap.

Why’s it so fun? Well there’s the part about being more observant, looking at materials around you constantly to think about what to make, though that’s faded a little. As I’ve got better I’ve had more successes and when you actually make something that works, that’s amazing. I’ve loved seeing what everyone else is making, however good or less-good, whether they spent ages or five minutes on it. It feels very purposeful too, having something you have to do every day.

Downsides: I’ve spent far too long on some of these. I was very pleased with both Croc Nest, and Morse, but both of them took ages. The house is covered in bits of electronics and things I “might need” despite spending some effort tidying, but clearly not enough (and I need to have things to hand and to eye for inspiration). Oh, and I’m addicted to Twitter again. That’s it really. Small price to pay.

Posted at 16:38

November 29

Ebiquity research group UMBC: paper: Automated Knowledge Extraction from the Federal Acquisition Regulations System

Automated Knowledge Extraction from the Federal Acquisition Regulations System (FARS)

Srishty Saha and Karuna Pande Joshi, Automated Knowledge Extraction from the Federal Acquisition Regulations System (FARS), 2nd International Workshop on Enterprise Big Data Semantic and Analytics Modeling, IEEE Big Data Conference, December 2017.

With increasing regulation of Big Data, it is becoming essential for organizations to ensure compliance with various data protection standards. The Federal Acquisition Regulations System (FARS) within the Code of Federal Regulations (CFR) includes facts and rules for individuals and organizations seeking to do business with the US Federal government. Parsing and gathering knowledge from such lengthy regulation documents is currently done manually and is time- and human-intensive. Hence, developing a cognitive assistant for automated analysis of such legal documents has become a necessity. We have developed a semantically rich approach to automate the analysis of legal documents and have implemented a system to capture various facts and rules contributing towards building an efficient legal knowledge base that contains details of the relationships between various legal elements, semantically similar terminologies, deontic expressions and cross-referenced legal facts and rules. In this paper, we describe our framework along with the results of automating knowledge extraction from the FARS document (Title 48, CFR). Our approach can be used by Big Data users to automate knowledge extraction from large legal documents.

Posted at 01:56

November 25

Leigh Dodds: Data assets and data products

A lot of the work that we’ve done at the ODI over the last few years has involved helping organisations to recognise their data assets.

Many organisations will have their IT equipment and maybe even their desks and chairs asset tagged. They know who is using them, where they are, and have some kind of plan to make sure that they only invest in maintaining the assets they really need. But few will be treating data in the same way.

That’s a change that is only just beginning. Part of the shift is in understanding how those assets can be used to solve problems. Or help them, their partners and customers to make more informed decisions.

Often that means sharing or opening that data so that others can use it. Making sure that data is at the right point of the data spectrum helps to unlock its value.

A sticking point for many organisations is that they begin to question why they should share or open those data assets, and whether others should contribute to their maintenance. There are many common questions around the value of sharing, respecting privacy, logistics, etc.

I think a useful framing for this type of discussion might be to distinguish between data assets and data products.

A data asset is what an organisation is managing internally. It may be shared with a limited audience.

A data product is what you share with or open to a wider audience. It’s created from one or more data assets. A data product may not contain all of the same data as the data assets it’s based on. Personal data might need to be removed or anonymised, for example. This means a data product might sit at a different point in the data spectrum. It can be more open. I’m using data product here to refer to specific types of datasets, not “applications that have been made using data”.

An asset is something you manage and invest in. A product is intended to address some specific needs. It may need some support or documentation to make sure it’s useful. It may also need to evolve based on changing needs.

In some cases a data asset could also be a data product. The complete dataset might be published in its entirety. In my experience this is rarely the case though. There’s usually additional information, e.g. governance and version history, that might not be useful to reusers.

In other cases data assets are collaboratively maintained, often in the open. Wikidata and OpenStreetMap are global data assets that are maintained in this way. There are many organisations that are using those assets to create more tailored data products that help to meet specific needs. Over time I expect more data assets will be managed in collaborative ways.

Obviously not every open data release needs to be a fully supported “product”. To meet transparency goals we often just need to get data published as soon as possible, with a minimum of friction for both publishers and users.

But when we are using data as tool to create other types of impact, more work is sometimes needed. There are often a number of social, legal and technical issues to consider in making data accessible in a sustainable way.

Injecting some product thinking into how we share and open data might help to address the types of problems that can contribute to data releases not having the desired impact: Why are we opening this data? Who will use it? How can we help them be more effective? Does releasing the data provide ways in which the data asset might be more collaboratively maintained?

When governments are publishing data that should be part of a national data infrastructure, more value will be unlocked if more of the underlying data assets are available for anyone to access, use and share. Releasing a “data product” that is too closely targeted might limit its utility. So I also think this “data asset” vs “data product” distinction can help us to challenge the types of data that are being released. Are we getting access to the most valuable data assets, or useful subsets of them? Or are we just being given a data product that has much more limited applications, regardless of how well it is being published?

Posted at 11:56

November 24

Leigh Dodds: We CAN get there from here

On Wednesday, as part of the Autumn Budget, the Chancellor announced that the government will be creating a Geospatial Commission “to establish how to open up freely the OS MasterMap data to UK-based small businesses”. It will be supported by new funding of £80 million over two years. The Commission will be looking at a range of things including:

  • improving the access to, links between, and quality of their data
  • looking at making more geospatial data available for free and without restriction
  • setting regulation and policy in relation to geospatial data created by the public sector
  • holding individual bodies to account for delivery against the geospatial strategy
  • providing strategic oversight and direction across Whitehall and public bodies who operate in this area

That’s a big pot of money to get something done and a remit that ticks all of the right boxes. As the ODI blog post notes, it creates “the opportunity for national mapping agencies to adapt to a future where they become stewards for national mapping data infrastructure, making sure that data is available to meet the needs of everyone in the country”.

So, I’m really surprised that many of the reactions from the open data community have been fairly negative. I understand the concerns that the end result might not be a completely open MasterMap. There are many, many ways in which this could end up with little or no change to the status quo. That’s certainly true if we ignore the opportunity to embed some change.

From my perspective, this is the biggest step towards a more open future for UK geospatial data since the first OS Open Data release in 2010. (I remember excitedly hitting the publish button to make their first Linked Data release publicly accessible)

Anyone who has been involved with open data in the UK will have encountered the Ordnance Survey licensing issues that are massively inhibiting both the release and use of open data in the UK. It’s a frustration of mine that these issues aren’t manifest in the various open data indexes.

In my opinion, anything that moves us forward from the current licensing position is to be welcomed. Yes, we all want a completely open MasterMap. That’s our shared goal. But how do we get there?

We’ve just seen the government task and resource itself to do something that can help us achieve that goal. It’s taken concerted effort by a number of people to get to this point. We should be focusing on what we all can do, right now, to help this process stay on track. Dismissing it as an already failed attempt isn’t helpful.

I think there’s a great deal that the community could do to engage with and support this process.

Here’s a few ideas of things of ways that we could inject some useful thinking into the process:

  • Can we pull together examples of where existing licensing restrictions are causing friction for UK businesses? Those of us who have been involved with open data have internalised many of these issues already, but we need to make sure they’re clearly understood by a wider audience
  • Can we do the same for local government data and services? There are loads of these too. Particularly compelling examples will be those that highlight where more open licensing can help improve local service delivery
  • Where could greater clarity around existing licensing arrangements help UK businesses, public sector and civil society organisations achieve greater impact? It often seems like some projects and local areas are able to achieve releases where others can’t.
  • Even if all of MasterMap were open tomorrow, it might still be difficult to access. No-one likes the current shopping cart model for accessing OS open data. What services would we expect from the OS and others that would make this data useful? I suspect this would go beyond “let me download some shapefiles”. We built some of these ideas into the OS Linked Data site. It still baffles me that you can’t find much OS data on the OS website.
  • If all of MasterMap isn’t made open, then which elements of it would unlock the most value? Are there specific layers or data types that could reduce friction in important application areas?
  • Similarly, how could the existing OS open data be improved to make it more useful? Hint: currently all of the data is generalised and doesn’t have any stable identifiers at all.
  • What could the OS and others do to support the rest of us in annotating and improving their data assets? The OS switched off its TOID lookup service because no-one was using it. It wasn’t very good. So what would we expect that type of identifier service to do?
  • If there is more openly licensed data available, then how could it be usefully added to OpenStreetMap and used by the ecosystem of open geospatial tools that it is supporting?
  • We all want access to MasterMap because it’s a rich resource. What are the options available to ensure that the Ordnance Survey stays resourced to a level where we can retain it as a national asset? Are there reasonable compromises to be made between opening all the data and them offering some commercial services around it?
  • …etc, etc, etc.

Personally, I’m choosing to be optimistic. Let’s get to work to create the result we want to see.

Posted at 20:39

November 23

Dublin Core Metadata Initiative: Webinar: Save the Children Resource Libraries

Update: Branka Kosovac will join Joseph Busch in presenting this webinar. DCMI is pleased to announce a new webinar: Save the Children Resource Libraries: Aligning Internal Technical Resource Libraries with a Public Distribution Website. Presented by Joseph Busch, Founder of Taxonomy Strategies, and Branka Kosovac, Taxonomy Strategies associate and Principal of dotWit Consulting, the webinar will discuss a recent project which has established an internal library of technical resources at the international Save the Children charity.

Posted at 10:56

November 19

Bob DuCharme: SPARQL queries of Beatles recording sessions

Who played what when?

Posted at 15:40

October 29

Bob DuCharme: An HTML form trick to add some convenience to life

With a little JavaScript as needed.

Posted at 15:07

October 28

Leigh Dodds: The state of open licensing, 2017 edition

Let’s talk about open data licensing. Again.

Last year I wrote a post, The State of Open Licensing, in which I gave a summary of the landscape as I saw it. A few recent developments mean that I think it’s worth posting an update.

But Leigh, I hear you cry, do people really care about licensing? Are you just fretting over needless details? We’re living in a post-open source world after all!

To which I would respond, if licensing doesn’t have real impacts, then why did the open source community recently go into meltdown about Facebook’s open source licences? And why have they recanted? There’s a difference between throwaway, unmaintained code and data, and resources that could and should be infrastructure.

The key points I make in my original post still stand: I think there is still a need to encourage convergence around licensing in order to reduce friction. But I’m concerned that we’re not moving in the right direction. Open Knowledge are doing some research around licensing and have also highlighted their concerns around current trends.

So what follows are a few observations from me on trends in a few different areas of open data practice.

Licensing of open government data

I don’t think much has changed with regards to open licenses for government data. The UK Open Government Licence (UK-OGL) still seems to be the starting point for creating bespoke national licences.

Looking through the Open Definition forum archives, the last government licence that was formally approved as Open Definition compliant was the Taiwan licence. Like the UK-OGL Version 3, the licence clearly indicates that it is compatible with the Creative Commons Attribution (CC-BY) 4.0 licence. The open data licence for Mexico makes a similar statement.

In short, you can take any data from the UK, Taiwan and Mexico and re-distribute it under a CC-BY 4.0 licence. Minimal friction.

I’d hoped that we could discourage governments from creating new licences. After all, if they’re compatible with CC-BY, then why go to the trouble?

But, chatting briefly about this with Ania Calderon this week, I’ve come to realise that the process of developing these licences is valuable, even if the end products end up being very similar. It encourages useful reflection on the relevant national laws and regulations, whilst also ensuring there is sufficient support and momentum behind adoption of the Open Data Charter. They are as much a statement of shared intent as a legal document.

The important thing is that national licences should always state compatibility with an existing licence. Ideally CC-BY 4.0. This removes all doubt when combining data collected from different national sources. This will be increasingly important as we strengthen our global data infrastructure.

Licensing of data from commercial publishers

Looking at how data is being published by commercial organisations, things are very mixed.

Within the OpenActive project we now have more than 20 commercial organisations publishing open data under a CC-BY 4.0 licence. Thomson Reuters are using CC-BY 4.0 as the core licence for their PermID product. And Syngenta are publishing their open data under a CC-BY-SA 4.0 licence. This is excellent. 10/10 would reuse again.

But in contrast, the UK Open Banking initiative has adopted a custom licence with a number of limitations, which I’ve written about extensively. Despite that feedback, they’ve chosen to ignore the concerns raised by the community.

Elsewhere the default is for publishers and platforms to use custom terms and conditions that create complexity for reusers. Or for lists of “open data” to have no clear licensing.

Licensing in the open data commons

It’s a similar situation in the broader open data commons.

In the research community the CC0 licence has been recommended for some time and is the default on a number of research data archives. Promisingly, the FigShare State of Open Data 2017 report (PDF) shows a growing awareness of open data amongst researchers, and a reduction in uncertainty around licensing. But there’s still lots of work to do. Julie McMurry of the (Re)usable Data Project notes that less than half of the databases they’ve indexed have a clear, findable licence.

While the CC-BY and CC-BY-SA 4.0 licences are seen as the best-practice default, a number of databases still rely on the Open Database Licence (ODbL), with OpenStreetMap being the obvious example.

The OSM Licence Working Group has recently concluded that, pending a more detailed analysis, the Creative Commons licences are incompatible with the ODbL. They now recommend asking for specific permission and the completion of a waiver form before importing CC-licensed open data into OSM. This is, of course, exactly the situation that open licensing is intended to avoid.

Obtaining 1:1 agreements is the opposite of friction-less data sharing.

And it’s not clear whose job it is to sort it out. I’m concerned that there’s no clear custodian for the ODbL or investment in its maintenance. Resolving issues of compatibility with the CC licences is clearly becoming more urgent. I think it needs an organisation or a consortium of interested parties to take this forward. It will need some legal advice and investment to resolve any issues. Taking no action doesn’t seem like a viable option to me.

Based on what I’ve seen summarised from previous discussions, there seem to be some basic disagreements around the approaches taken to data licensing that have held up progress. Creative Commons could take a lead on this, but so far they’ve not certified any third-party licences as compatible with their suite. All compatibility statements have been made in the other direction.

Despite its use by big projects like OSM, it’s really unclear to me what role the ODbL has longer term. Getting to a clear definition of compatibility would provide a potential way for existing users of the licence to transition at a future date.

Just to add to the fun, the Linux Foundation have thrown two new licences into the mix. There has been some discussion about this in the community and some feedback in these two articles in the Register. The second has some legal analysis: “I wouldn’t want to sign it“.

Adding more licences isn’t helpful. What would have been helpful would have been exploring compatibility issues amongst existing licences and investing in resolving them. But as their FAQ highlights, the Foundation explicitly chose to just create new licences rather than evaluate the current landscape.

I hope that the Linux Foundation can work with Creative Commons to develop a statement of compatibility, otherwise we’re in an even worse situation.

Some steps to encourage convergence

So how do we move forward?

My suggestions are:

  • No new licences! If you’re a government, you get a pass to create a national licence so long as you include a statement of compatibility with a Creative Commons licence
  • If your organisation has issues with the Creative Commons licences, then document and share them with the community. Then engage with the Creative Commons to explore creating revisions. Spend what you would have given your lawyers on helping the Creative Commons improve their licences. It’s a good test of how much you really do want to work in the open
  • If you’re developing a platform, require people to choose a licence or set a default. Choosing a licence can include “All Rights Reserved”. Let’s get some clarity
  • We need to invest further in developing guidance around data licensing.
  • Let’s sort out compatibility between the CC and ODbL licence suites
  • Let’s encourage the Linux Foundation to do the same, and also ask them to submit their licences to the licence approval process. This should be an obvious step for them, as they’ve repeatedly highlighted the lessons to be learned from open source licences, which go through a similar process.

I think these are all useful steps forward. What would you add to the list? What organisations can help drive this forward?

Note that I’m glossing over a set of more nuanced issues which are worthy of further, future discussion. For example, whether licensing is always the right protection, or when “situated openness” may be the best approach towards building trust with communities. Or whether the two completely different licensing schemes for Wikidata and OSM will be a source of friction longer term, or are simply necessary to ensure their sustainability.

For now though, I think I’ll stick with the following as my licensing recommendations:

 

Posted at 12:42

Copyright of the postings is owned by the original blog authors. Contact us.