Planet RDF

It's triples all the way down

May 19

Libby Miller: Shonkbot

It was the journey home from MakerFaire UK, and

Posted at 21:18

May 17

Ebiquity research group UMBC: talk: Amit Sheth on Transforming Big data into Smart Data, 11a Tue 5/26

Transforming big data into smart data:
deriving value via harnessing volume, variety
and velocity using semantics and semantic web

Professor Amit Sheth
Wright State University

11:00am Tuesday, 26 May 2015, ITE 325, UMBC

Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, "How is her current health, and what is the risk of having an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize multimodal data and their metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on Machine Learning and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in cognitive models. I will present a couple of Smart Data applications in development at Kno.e.sis from the domains of personalized health, health informatics, social data for social good, energy, disaster response, and smart city.

Amit Sheth is an Educator, Researcher and Entrepreneur. He is the LexisNexis Ohio Eminent Scholar, an IEEE Fellow, and the executive director of Kno.e.sis – the Ohio Center of Excellence in Knowledge-enabled Computing at Wright State University. In the World Wide Web (WWW) research area, Kno.e.sis is placed among the top ten universities in the world based on 10-year impact. Prof. Sheth is a well-cited computer scientist (h-index = 87, >30,000 citations), and appears among the top 1-3 authors in World Wide Web (Microsoft Academic Search). He has founded two companies, and several commercial products and deployed systems have resulted from his research. His students are exceptionally successful; ten out of 18 past PhD students have 1,000+ citations each.

Host: Yelena Yesha, yeyesha@umbc.edu

Posted at 19:43

AKSW Group - University of Leipzig: AKSW Colloquium, 18-05-2015, Multilingual Morpheme Ontology, Personalised Access and Enrichment of Linked Data Resources

MMoOn – A Multilingual Morpheme Ontology by Bettina Klimek

In recent years, lexical resources have emerged rapidly in the Semantic Web. Whereas most of this linguistic information is already machine-readable, we found that morphological information is either absent or only contained in semi-structured strings. While a plethora of linguistic resources for the lexical domain already exist and are highly reused, there is still a great gap for equivalent morphological datasets and ontologies. In order to enable capturing the semantics of expressions beneath the word level, I will present a Multilingual Morpheme Ontology called MMoOn. It is designed for the creation of machine-processable and interoperable morpheme inventories of a given natural language. As such, any MMoOn dataset will contain not only semantic information on whole words and word-forms but also information on the meaningful parts of which they consist, including inflectional and derivational affixes, stems and bases, as well as a wide range of their underlying meanings.

Personalised Access and Enrichment of Linked Data Resources by Milan Dojchinovski

Recent efforts in the Semantic Web community have primarily focused on developing technical infrastructure and methods for efficient Linked Data acquisition, interlinking and publishing. Nevertheless, the actual access to a piece of information in the LOD cloud still demands a significant amount of effort. In recent years, we have conducted two lines of research to address this problem. The first line of research aims at developing graph-based methods for “personalised access to Linked Data”. A key contribution of this research is the “Linked Web APIs” dataset, the largest Web services dataset with over 11K service descriptions, which has been used as a validation dataset. The second line of research has aimed at enrichment of Linked Data text resources and development of “entity recognition and linking” methods. In the talk, I will present the developed methods and the results from the evaluation on different datasets and evaluation challenges, and the lessons learned in these activities. I will discuss the adaptability and performance of the developed methods and present the future directions.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 10:24

May 15

Dublin Core Metadata Initiative: DC-2015 registration is now open

2015-05-15, Online registration for DC-2015 is now open at http://dcevents.dublincore.org/IntConf/index/pages/view/reg15. The conference and DCMI Annual Meeting are scheduled for 1-4 September in São Paulo, Brazil. This year's theme is "Metadata and Ubiquitous Access to Culture, Science and Digital Humanities". The need for structured metadata to support ubiquitous access across the Web to the treasure troves of resources spanning cultures, in science, and in the digital humanities is now common knowledge among information systems designers and implementers. Structured metadata expressed through languages of description makes it possible for us to 'speak' about the contents of our treasure troves. But, like all human languages, our languages of description both enable and isolate. The push to break out of the isolation of the metadata silos in which professionals inevitably design, implement and manage metadata in order to discover the intersections of our treasure troves drives much of today's discourse and emerging practice in metadata. The meeting in São Paulo is intended to advance the metadata discourse and practice behind this push to intersect. More information about the conference can be found at http://purl.org/dcevents/dc-2015.

Posted at 23:59

May 14

Orri Erling: SNB Interactive, Part 2 - Modeling Choices

SNB Interactive is the wild frontier, with very few rules. This is necessary, among other reasons, because there is no standard property graph data model, and because the contestants support a broad mix of programming models, ranging from in-process APIs to declarative query.

In the case of Virtuoso, we have played with SQL and SPARQL implementations. For a fixed schema and well known workload, SQL will always win. The reason is that SQL allows materialization of multi-part indices and data orderings that make sense for the application. In other words, there is transparency into physical design. An RDF/SPARQL-based application may also have physical design by means of structure-aware storage, but this is more complex and here we are just concerned with speed and having things work precisely as we intend.

Schema Design

SNB has a regular schema described by a UML diagram. This has a number of relationships, of which some have attributes. There are no heterogeneous sets, i.e., no need for run-time typed attributes or graph edges with the same label but heterogeneous end-points. Translation into SQL or SPARQL is straightforward. Edges with attributes (e.g., the foaf:knows relation between people) would, in RDF, end up represented as a subject with the end points and the effective date as properties. The relational implementation has a two-part primary key and the effective date as a dependent column. A native property graph database would use an edge with an extra property for this, as such are typically supported.
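
As a rough illustration of the relational side, the knows edge could look like the following sketch. The table and column names here are hypothetical, not the actual Virtuoso SNB schema; the point is the two-part primary key over the endpoints with the edge attribute as a dependent column.

    -- Hypothetical sketch: "knows" edge with an attribute.
    CREATE TABLE knows (
      k_person1id     BIGINT NOT NULL,   -- one endpoint of the edge
      k_person2id     BIGINT NOT NULL,   -- the other endpoint
      k_creationdate  BIGINT NOT NULL,   -- edge attribute (effective date)
      PRIMARY KEY (k_person1id, k_person2id)
    );

    -- Traversals start from either endpoint, so the reverse ordering
    -- is indexed as well.
    CREATE INDEX knows_reverse ON knows (k_person2id, k_person1id);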

The only table-level choice has to do with whether posts and comments are kept in the same or different data structures. The Virtuoso schema uses a single table for both, with nullable columns for the properties that occur only in one. This makes the queries more concise. There are cases where only non-reply posts of a given author are accessed. This is supported by having two author foreign key columns each with its own index. There is a single nullable foreign key from the reply to the post/comment being replied to.
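
A minimal sketch of that single-table choice follows; the ps_* column names echo those mentioned in the next paragraph, the rest are made up for illustration.

    -- Hypothetical sketch: posts and comments in one table, with
    -- nullable columns for properties that occur in only one of them.
    CREATE TABLE post (
      ps_postid        BIGINT PRIMARY KEY,
      ps_creatorid     BIGINT NOT NULL,  -- author of the post or comment
      ps_p_creatorid   BIGINT,           -- author, filled only for non-reply posts
      ps_creationdate  BIGINT NOT NULL,
      ps_replyof       BIGINT,           -- NULL for original posts, otherwise the
                                         -- post/comment being replied to
      ps_content       VARCHAR,          -- text posts and comments
      ps_imagefile     VARCHAR           -- image posts only
    );

    -- Each author column gets its own index, so "all content by X" and
    -- "non-reply posts by X" are both index-supported. The composite
    -- form anticipates the access path discussed next.
    CREATE INDEX post_author      ON post (ps_creatorid, ps_creationdate, ps_postid);
    CREATE INDEX post_orig_author ON post (ps_p_creatorid, ps_creationdate, ps_postid);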

The workload has some frequent access paths that need to be supported by index. Some queries reward placing extra columns in indices. For example, a common pattern is accessing the most recent posts of an author or a group of authors. There, having a composite key of ps_creatorid, ps_creationdate, ps_postid pays off since the top-k on creationdate can be pushed down into the index without needing a reference to the table.
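
With that composite key, the "newest posts by an author" pattern becomes a pure index read. A hedged sketch (the LIMIT syntax and the person id are illustrative):

    -- Top-k newest posts by one author. With the composite index
    -- (ps_creatorid, ps_creationdate, ps_postid), the 20 qualifying
    -- entries can be read off the index, newest first, without
    -- touching the base table.
    SELECT ps_postid, ps_creationdate
      FROM post
     WHERE ps_creatorid = 1234
     ORDER BY ps_creationdate DESC
     LIMIT 20;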

The implementation is free to choose data types for attributes, particularly datetimes. The Virtuoso implementation adopts the practice of the Sparksee and Neo4j implementations and represents these as a count of milliseconds since the epoch. This is less confusing, faster to compare, and more compact than a native datetime datatype that may or may not have timezones, etc. Using a built-in datetime seems to be nearly always a bad idea. A dimension table or a number for a time dimension avoids the ambiguities of a calendar or at least makes these explicit.

The benchmark allows procedurally maintained materializations of intermediate results for use by queries as long as these are maintained transaction-by-transaction. For example, each person could have the 20 newest posts by their immediate contacts precomputed. This would reduce Q2 "top of the wall" to a single lookup. This does not however appear to be worthwhile. The Virtuoso implementation does do one such materialization for Q14: A connection weight is calculated for every pair of persons that know each other. This is related to the count of replies by either person to content generated by the other. If there is not a single reply in either direction, the weight is taken to be 0. This weight is precomputed after bulk load and subsequently maintained each time a reply is added. The table for this is the only row-wise structure in the schema and represents a half-matrix of connected people, i.e., person1, person2 -> weight. Person1 is by convention the one with the smaller p_personid. Note that comparing IDs in this way is useful but not normally supported by SPARQL/RDF systems. SPARQL would end up comparing strings of URIs with disastrous performance implications unless an implementation-specific trick were used.
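
A hedged sketch of that materialization, with made-up names and a simple +1 per reply standing in for whatever weight formula the benchmark actually prescribes:

    -- Half-matrix of connection weights: one row per pair of people who
    -- know each other, stored with the convention person1 < person2.
    CREATE TABLE knows_weight (
      kw_person1id  BIGINT NOT NULL,
      kw_person2id  BIGINT NOT NULL,
      kw_weight     DOUBLE PRECISION NOT NULL DEFAULT 0,
      PRIMARY KEY (kw_person1id, kw_person2id)
    );

    -- Maintenance when person 4 replies to content by person 9 (ids are
    -- illustrative): normalize the pair order, then bump the weight.
    -- In a real system this runs in the insert path of the reply.
    UPDATE knows_weight
       SET kw_weight = kw_weight + 1
     WHERE kw_person1id = LEAST(4, 9)
       AND kw_person2id = GREATEST(4, 9);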

In the next installment, we will analyze an actual run.

Posted at 15:37

Orri Erling: SNB Interactive, Part 1 - What is SNB Interactive Really About?

This is the first in a series of blog posts analyzing the Interactive workload of the LDBC Social Network Benchmark. This is written from the dual perspective of participating in the benchmark design, and of building the OpenLink Virtuoso implementation of same.

With two implementations of SNB Interactive at four different scales, we can take a first look at what the benchmark is really about. The hallmark of a benchmark implementation is that its performance characteristics are understood; even if these do not represent the maximum of the attainable, there are no glaring mistakes; and the implementation represents a reasonable best effort by those who ought to know such, namely the system vendors.

The essence of a benchmark is a set of trick questions or "choke points," as LDBC calls them. A number of these were planned from the start. It is then the role of experience to tell whether addressing these is really the key to winning the race. Unforeseen ones will also surface.

So far, we see that SNB confronts the implementor with choices in the following areas:

  • Data model — Tabular relational (commonly known as SQL), graph relational (including RDF), property graph, etc.

  • Physical storage model — Row-wise vs. column-wise, for instance.

  • Ordering of materialized data — Sorted projections, composite keys, replicating columns in auxiliary data structures, etc.

  • Persistence of intermediate results —  Materialized views, triggers, precomputed temporary tables, etc.

  • Query optimization — join order/type, interesting physical data orderings, late projection, top k, etc.

  • Parameters vs. literals — Sometimes different parameter values result in different optimal query plans.

  • Predictable, uniform latency — Measurement rules stipulate that the SUT (system under test) must not fall behind the simulated workload.

  • Durability — How to make data durable while maintaining steady throughput, e.g., logging, checkpointing, etc.

In the process of making a benchmark implementation, one naturally encounters questions about the validity, reasonability, and rationale of the benchmark definition itself. Additionally, even though the benchmark might not directly measure certain aspects of a system, making an implementation will take a system past its usual envelope and highlight some operational aspects.

  • Data generation — Generating a mid-size dataset takes time, e.g., 8 hours for 300G. In a cloud situation, keeping the dataset in S3 or similar is necessary; re-generating every time is not an option.

  • Query mix — Are the relative frequencies of the operations reasonable? What bias does this introduce?

  • Uniformity of parameters — Due to non-uniform data distributions in the dataset, there is easily a 100x difference between "fast" and "slow" cases of a single query template. How long does one need to run to balance these fluctuations?

  • Working set — Experience shows that there is a large difference between almost-warm and steady-state of working set. This can be a factor of 1.5 in throughput.

  • Reasonability of latency constraints — In the present case, a qualifying run must have no more than 5% of all query executions starting over 1 second late. Each execution is scheduled beforehand and done at the intended time. If the SUT does not keep up, it will have all available threads busy and must finish some work before accepting new work, so some queries will start late. Is this a good criterion for measuring consistency of response time? There are some obvious possibilities for abuse.

  • Ease of benchmark implementation/execution — Perfection is open-ended and optimization possibilities infinite, albeit with diminishing returns. Still, getting started should not be too hard. Since systems will be highly diverse, testing that these in fact do the same thing is important. The SNB validation suite is good for this and, given publicly available reference implementations, the effort of getting started is not unreasonable.

  • Ease of adjustment — Since a qualifying run must meet latency constraints while going as fast as possible, setting the performance target involves trial and error. Does the tooling make this easy?

  • Reasonability of durability rule — Right now, one is not required to do checkpoints but must report the time to roll forward from the last checkpoint or initial state. Inspiring vendors to build faster recovery is certainly good, but we are not through with all the implications. What about redundant clusters?

The following posts will look at the above in light of actual experience.

Posted at 15:37

May 13

schema.org: Schema.org 2.0

We are pleased to announce the public release of Schema.org 2.0 which brings several significant changes and additions, not just to the vocabulary, but also to how we grow and manage it, from both technical and governance perspectives.


As schema.org adoption has grown, a number of groups with more specialized vocabularies have expressed interest in extending schema.org with their terms. Examples of this include real estate, product, finance, medical and bibliographic information. Even in something as common as human names, there are groups interested in creating the vocabulary for representing all the intricacies of names. Groups that have a special interest in one of these topics often need a level of specificity in the vocabulary and operational independence. We are introducing a new extension mechanism which we hope will enable these and many other groups to extend schema.org.

Over the years, Schema.org has taken steps towards becoming more open. Today, there is more community participation than ever before. The newly formed W3C Schema.org Community Group is now the main forum for schema collaboration, and provides the public-schemaorg@w3.org mailing list for discussions. Schema.org issues are tracked on GitHub. The day-to-day operations of Schema.org, including decisions regarding the schema, are handled by a newly formed steering group, which includes representatives of the sponsor companies, the W3C and some individuals who have contributed substantially to Schema.org. Discussions of the steering group are public.

Schema.org is a ‘living’ spec that is constantly evolving. Sometimes this evolution can be an issue, such as when other standards groups want to refer to it. So, from this release on, we will be providing snapshots of the entire vocabulary.


And of course, we cannot have a major release without new vocabulary. In this version, we introduce vocabulary for Autos. This represents considerable work by Martin Hepp, Mirek Sopek, Karol Szczepanski and others in the automotive-ontology.org community. In addition, this version also includes a lot of cleanup. A special thanks to Vicki Holland and Dan Brickley for driving this effort.


Over the last four years Schema.org has gotten adoption beyond our wildest expectations. We are deeply grateful to the webmaster and developer communities for this. We will continue working hard to earn your trust.

Guha

Posted at 21:09

May 12

Cambridge Semantics: The Perfect Storm for Data

Mike Atkin of the EDM Council speaks eloquently about the "perfect storm" for data in Financial Services. Two converging forces, regulatory reporting requirements and the need for customer insight, are placing unprecedented demands on the data infrastructure in most financial institutions.

Posted at 18:56

Ebiquity research group UMBC: Clare Grasso: Information Extraction from Dirty Notes for Clinical Decision Support

Information Extraction from Dirty Notes
for Clinical Decision Support

Clare Grasso

10:00am Tuesday, 12 May 2015, ITE346

The term clinical decision support refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet, clinical notes are often times filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research focuses on investigating scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts in the text in order to apply them in CDS on commodity hardware and software systems. Additionally, by packaging the extracted data within a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help to inform clinicians in their decision making.

Posted at 01:08

May 10

AKSW Group - University of Leipzig: AKSW Colloquium, 11-05-2015, DBpedia distributed extraction framework

Scaling up the DBpedia extraction framework by Nilesh Chakraborty

The DBpedia extraction framework extracts different kinds of structured information from Wikipedia to generate various datasets. Performing a full extraction of Wikipedia dumps of all languages (or even just the mapping-based languages) takes a significant amount of time. The distributed extraction framework runs the extraction on top of Apache Spark so that users can leverage multi-core machines or a distributed cluster of commodity machines to perform faster extraction. For example, performing extraction of the 30-40 mapping-based languages on a machine with a quad-core CPU and 16G RAM takes about 36 hours. Running the distributed framework in the same setting using three such worker nodes takes around 10 hours. It’s easy to achieve faster running times by adding more cores or more machines. Apart from the Spark-based extraction framework, we have also implemented a distributed wiki-dump downloader to download Wikipedia dumps for multiple languages, from multiple mirrors, on a cluster in parallel. This is still a work in progress, and in this talk I will discuss the methods and challenges involved in this project, and our immediate goals and timeline.

Posted at 16:36

May 08

AKSW Group - University of Leipzig: Invited talk @AIMS webinar series

On the 5th of May, Ivan Ermilov, on behalf of AKSW, presented the CKAN data catalog as part of the AIMS (Agricultural Information Management Standards) webinar series. The recording and the slides of the webinar “CKAN as an open-source data management solution for open data” are available on the AIMS web portal: http://aims.fao.org/capacity-development/webinars/ckan-open-source-data-management-solution-open-data

AIMS organizes webinars on various topics that are free and open to everyone. You can find more recordings and material on the AIMS webpage, YouTube channel and Slideshare:

Main page of Webinars@AIMS: http://aims.fao.org/capacity-development/webinars

YouTube: http://www.youtube.com/user/FAOAIMSVideos

Slideshare: http://www.slideshare.net/faoaims/ckan-as-an-opensource-data-management-solution-for-open-data

Posted at 10:47

May 07

Cambridge Semantics: Understanding Smart Data Integration in just 2 minutes

Data integration projects can be time-consuming, expensive and difficult to manage. Traditional data integration methods require point-to-point mapping of source and target systems. This effort typically requires a team of both business SMEs and technology professionals. These mappings are time-consuming to create and code, and errors in the ETL (Extract, Transform, and Load) process require iterative cycles through the process.

Posted at 20:11

May 06

Dydra: Collation Sequences in SPARQL

The SPARQL query language is relatively silent about how to order strings. When the question was posed to us a while back, namely what to expect as the order of a solution sequence that contained string literals with language tags, we had only the conservative answer that the ordering relation among simple or string literals and plain literals was undefined. This was not a nice situation.

Even though RDF 1.1 ratifies the type rdf:langString, it defines no relation beyond equality, which leaves plfn:compare to apply but requires some context where it is possible to determine the collation sequence. This situation is not quite as unpleasant, but still not satisfactory. Fortunately ‘undefined’ leaves latitude for improvement, by definition.

Posted at 12:06

May 05

Semantic Web Company (Austria): Thoughts on KOS (Part 3): Trends in knowledge organization

The accelerating pace of change in the economic, legal and social environment, combined with tendencies towards increased decentralization of organizational structures, has had a profound impact on the way we organize and utilize knowledge. The internet as we know it today, and especially the World Wide Web as the multimodal interface for the presentation and consumption of multimedia information, are the most prominent examples of these developments. To illustrate the impact of new communication technologies on information practices, Saumure & Shiri (2008) conducted a survey on knowledge organization trends in the Library and Information Sciences before and after the emergence of the World Wide Web. Table 1 shows their results.

Table 1: Knowledge organization trends in library and information studies before and after the emergence of the Web (Saumure & Shiri, 2008). [Image not reproduced.]

The survey illustrates three major trends: 1) the spectrum of research areas has broadened significantly, from originally complex and expert-driven methodologies and systems to more light-weight, application-oriented approaches; 2) while certain research areas have kept their status over the years (i.e. Cataloguing & Classification or Machine Assisted Knowledge Organization), new areas of research have gained importance (i.e. Metadata Applications & Uses, Classifying Web Information, Interoperability Issues), while formerly prevalent topics like Cognitive Models or Indexing have declined in importance or dissolved into other areas; and 3) the quantity of papers that explicitly and implicitly deal with metadata issues has significantly increased.

These insights coincide with a survey conducted by The Economist (2010) that comes to the conclusion that metadata has become a key enabler in the creation of controllable and exploitable information ecosystems under highly networked circumstances. Metadata provides information about data, objects and concepts. This information can be descriptive, structural or administrative. Metadata adds value to data sets by providing structure (i.e. schemas) and increasing the expressivity (i.e. controlled vocabularies) of a dataset.

According to Weibel & Lagoze (1997, p. 177):

“[the] association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself.”

These trends influence the functional requirements of the next generation’s Knowledge Organization Systems (KOSs) as a support infrastructure for knowledge sharing and knowledge creation under conditions of distributed intelligence and competence.

Go to previous posts in this series:
Thoughts on KOS (Part1): Getting to grips with “semantic” interoperability or
Thoughts on KOS (Part 2): Classifying Knowledge Organisation Systems

 

References

Saumure, Kristie; Shiri, Ali (2008). Knowledge organization trends in library and information studies: a preliminary comparison of pre- and post-web eras. In: Journal of Information Science, 34/5, 2008, pp. 651–666

The Economist (2010). Data, data everywhere. A special report on managing information. http://www.emc.com/collateral/analyst-reports/ar-the-economist-data-data-everywhere.pdf, accessed 2013-03-10

Weibel, S. L., & Lagoze, C. (1997). An element set to support resource discovery. In: International Journal on Digital Libraries, 1/2, pp. 176-187

Posted at 15:13

May 04

Dublin Core Metadata Initiative: DCMI Webinar: Digital Preservation Metadata and Improvements to PREMIS in Version 3.0

2015-05-04, This webinar with Angela Dappert on 27 May gives a brief overview of why digital preservation metadata is needed, shows examples of digital preservation metadata, shows how PREMIS can be used to capture this metadata, and illustrates some of the changes that will be available in version 3.0. The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. Developed by an international team of experts, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open-source digital preservation tools and systems. The PREMIS Editorial Committee coordinates revisions and implementation of the standard, which consists of the Data Dictionary, an XML schema, and supporting documentation. The PREMIS Data Dictionary is currently in version 2.2. A new major release 3.0 is due out this summer. More information and registration is available at http://dublincore.org/resources/training/#2015dappert.

Posted at 23:59

May 03

Bob DuCharme: SPARQL: the video

Well, a video, but a lot of important SPARQL basics in a short period of time.

Posted at 21:15

AKSW Group - University of Leipzig: AKSW Colloquium, 04-05-2015, Automating RDF Dataset Transformation and Enrichment, Structured Machine Learning in Life Science

Automating RDF Dataset Transformation and Enrichment by Mohamed Sherif

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this talk, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.

Structured Machine Learning in Life Science (PhD progress report) by Patrick Westphal

The utilization of machine learning techniques to solve life science tasks has become widespread within the last years. Since these techniques mainly work on unstructured data, one question is whether they could benefit from the provision of structured background knowledge. One prevalent way to express background knowledge in the life sciences is the Web Ontology Language (OWL). Accordingly, there is a great variety of domain ontologies covering anatomy, genetics, biological processes or chemistry that can be used to form structured machine learning approaches in the life science domain. The talk will give a brief overview of tasks and problems of structured machine learning in life science. Besides the special characteristics observed when applying state-of-the-art concept learning approaches to life science tasks, a short description of the actual differences to concept learning setups in other domains is given. Further, some directions for machine learning based techniques are shown that could support concept learning in life science tasks.

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

Posted at 19:50

April 30

Tetherless World Constellation group RPI: DCO-DS participation at Research Data Alliance Plenary 5 meeting

In early March I attended the Research Data Alliance Fifth Plenary and “Adoption Day” event to present our plans for adopting DataTypes and Persistent Identifier Types in the DCO Data Portal. This was the first plenary following the publishing of the data type and persistent identifier type outputs, and the RDA community was interested in seeing how early adopters were faring.

At the Adoption Day event I gave a short presentation on our plan for representing DataTypes in the DCO Data Portal knowledge base. Most of the other adopter presentations were limited to organizational requirements or high-level architecture around data types or persistent identifiers – our presentation stood out because we presented details on ‘how’ we intended to implement RDA outputs rather than just ‘why’. I think our attention on technical details was appreciated; from listening to the presentations it did not sound like many other groups were very far into their adoption process.

My main takeaways from the conference were the following:
– we are ahead of the curve on adopting the RDA data type and persistent identifier outputs
– we are viewed as leaders on how to implement data types; people are paying attention to what we are doing
– the chair of the DataType WG was very happy that we were thinking of how data types made sense within the context of our existing infrastructure rather than looking to the WG's reference implementation as the sole way to implement the output
– the DataType WG reference repository is more a proof-of-concept than a production system
– The data type community is interested in the topic of federating repositories but is not ready to do much on that yet

Overall I think we are well positioned to be a leader on data types. Our work to date was very well received and many members involved in the DataType WG will be very interested in what more we have to show next September at the Sixth Plenary.

Good work team and let’s keep up the good work!

Posted at 15:20

April 28

Dydra: Extended Temporal Datatype Support

Dydra recently extended its native support to cover the full XSD temporal data type complement. This means not only that our platform continues to provide compact native representations for the core data types:

  • xsd:date
  • xsd:dateTime
  • xsd:time

where each term requires one cell (that is, eight bytes), but also that SPOCQ, Dydra’s SPARQL processor, implements internal data classes for the remaining types:

  • xsd:dayTimeDuration
  • xsd:yearMonthDuration
  • xsd:gDay
  • xsd:gMonth
  • xsd:gMonthDay
  • xsd:gYear
  • xsd:gYearMonth

Posted at 16:05

Ebiquity research group UMBC: Ankur Padia on Ontology Learning, 10am ITE346

In this week’s ebiquity lab meeting, Ankur Padia will talk about ontology learning and the work he did for his MS thesis at 10:00am in ITE 346 at UMBC.

Ontology Learning

Ankur Padia

10:00am Tuesday, Apr. 28, 2015, ITE 346

Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as light-weight Ontology Learning), not much attempt has been made at axiomatic Ontology Learning (called Formal Ontology Learning) from natural language text documents. The presentation will focus on the relationship between Description Logic and Natural Language (limited to IS-A) for Formal Ontology Learning.

Posted at 04:21

April 27

Semantic Web Company (Austria): SWC’s Semantic Event Recommendations

Semantic business conference

Just a couple of years ago critics argued that the semantic approach in IT wouldn’t make the transformation from an inspiring academic discipline to a relevant business application. They were wrong! With the digitalization of business, the power of semantic solutions to handle Big Data became obvious.

Thanks to a dedicated global community of semantic technology experts, we can observe a rapid development of software solutions in this field. The progress is coupled to a fast growing number of corporations that are implementing semantic solutions to win insights from existing but unused data.

Knowledge transfer is extremely important in semantics. Let’s have a look at the community calendar for the upcoming months. We are looking forward to sharing our experiences and learning. Join us!

>> Semantics technology event calendar

Posted at 13:51

Libby Miller: Guest blog: David Miller – A solar-powered glitter ball rotator

My Dad’s always had an interest in tinkering with electronics and the like. Recently he made an interesting thing so I asked him to write it up, and here it is:

A few weeks ago my wife and I were invited to join four friends for Sunday lunch. Our host has a south-facing dining room with a glitter ball sitting in the window. We were all entertained by the light beams gradually moving around the walls and ceiling. I suggested that a small solar cell and motor attached to the glitterball might improve the entertainment.

I bought these parts from Maplin soon after at a cost of £3.60. I glued the motor and cell to a spare ruler to which I also glued a strong copper wire to attach all to the ceiling. I have this running in our garden room.

Posted at 08:48

April 25

Ebiquity research group UMBC: PhD defense: Semantic Resolution Framework for Integrating Manufacturing Service Capability Data

Ph.D. Dissertation Defense

A Semantic Resolution Framework for Integrating
Manufacturing Service Capability Data

Yan Kang

10:00am Monday 27 April 2015, ITE 217b

Building flexible manufacturing supply chains requires availability of interoperable and accurate manufacturing service capability (MSC) information of all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of interoperability and accuracy requirements. The issue of interoperability can be addressed by annotating the MSC information using shared ontologies. However, this ontology-based approach faces three main challenges: (1) lack of an effective way to automatically extract a large volume of MSC instance data hidden in the web sites of manufacturers that need to be annotated; (2) difficulties in accurately identifying semantics of these extracted data and resolving semantic heterogeneities among individual sources of these data while integrating them under shared formal ontologies; (3) difficulties in the adoption of ontology-based approaches by the supply chain managers and users because of their unfamiliarity with the syntax and semantics of formal ontology languages such as the web ontology language (OWL).

The objective of our research is to address the main challenges of ontology-based approaches by developing an innovative approach that is able to extract MSC instances from a broad range of manufacturing web sites that may present MSC instances in various ways, accurately annotate MSC instances with formally defined semantics on a large scale, and integrate these annotated MSC instances into formal manufacturing domain ontologies to facilitate the formation of supply chains of manufacturers. To achieve this objective, we propose a semantic resolution framework (SRF) that consists of three main components: an MSC instance extractor, an MSC instance annotator and a semantic resolution knowledge base. The instance extractor builds a local semantic model that we call the instance description model (IDM) for each target manufacturer web site. The innovative aspect of the IDM is that it captures the intended structure of the target web site and associates each extracted MSC instance with a context that describes possible semantics of that instance. The instance annotator starts the semantic resolution by identifying the most appropriate class from a (or a set of) manufacturing domain ontology (or ontologies) (MDO) to annotate each instance, based on the mappings established between the context of that instance and the vocabularies (i.e., classes and properties) defined in the MDO. The primary goal of the semantic resolution knowledge base (SR-KB) is to resolve semantic heterogeneity that may occur in the instance annotation process and thus improve the accuracy of the annotated MSC instances. The experimental results demonstrate that the instance extractor and the instance annotator can effectively discover and annotate MSC instances, while the SR-KB is able to improve both precision and recall of annotated instances and reduce human involvement as the knowledge base evolves.

Committee: Drs. Yun Peng (Chair), Tim Finin, Yaacov Yesha, Matthew Schmill and Boonserm Kulvatunyou

Posted at 17:57

April 23

AKSW Group - University of Leipzig: AKSW Colloquium, 27-04-2015, Ontotext’s RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform & Knowledge-Based Trust

This colloquium features two talks. First, the Self-Service Semantic Suite (S4) platform is presented by Marin Dimitrov (Ontotext), followed by Jörg Unbehauen's report on Google's effort to use factual correctness as a ranking factor.

RDF database-as-a-service (DBaaS) via Self-Service Semantic Suite (S4) platform

In this talk Marin Dimitrov (Ontotext) will introduce the RDF database-as-a-service (DBaaS) options for managing RDF data in the Cloud via the Self-Service Semantic Suite (S4) platform. With S4, developers and researchers can instantly get access to a fully managed RDF DBaaS, without the need for hardware provisioning, maintenance and operations. Additionally, the S4 platform provides on-demand access to text analytics services for news, social media and life sciences, as well as access to knowledge graphs (DBpedia, Freebase and GeoNames).

The goal of the S4 platform is to make it easy for developers and researchers to develop smart/semantic applications, without the need to spend time and effort on infrastructure provisioning and maintenance. Marin will also provide examples of EC-funded research projects (DaPaaS, ProDataMarket and KConnect) that plan to utilise the S4 platform for semantic data management.

More information on S4 will be available in [1], [2] and [3].

[1] Marin Dimitrov, Alex Simov and Yavor Petkov.  On-demand Text Analytics and Metadata Management with S4. In: proceedings of Workshop on Emerging Software as a Service and Analytics (ESaaSA 2015) at the 5th International Conference on Cloud Computing and Services Science (CLOSER 2015), Lisbon, Portugal.

[2] Marin Dimitrov, Alex Simov and Yavor Petkov. Text Analytics and Linked Data Management As-a-Service with S4. In: proceedings of 3rd International Workshop on Semantic Web Enterprise Adoption and Best Practice (WaSABi 2015) part of the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia

[3] Marin Dimitrov, Alex Simov and Yavor Petkov. Low-cost Open Data As-a-Service in the Cloud. In: proceedings of 2nd Semantic Web Enterprise Developers Workshop (SemDev 2015) part of the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia

Report on: “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources”

by Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, Wei Zhang

Link to the paper

Presentation by Jörg Unbehauen

Abstract:

“The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method. “

Posted at 15:21

Copyright of the postings is owned by the original blog authors. Contact us.