Planet RDF

It's triples all the way down

July 02

Frederick Giasson: Release of structWSF, conStruct and the Community Web Site

The last few months have been challenging in terms of the amount of work to get done, in focusing on deliverables, and in getting ready for the release of the conStruct and structWSF source code, documentation, tutorials, web sites, and demos.

I am now really happy to finally announce the release of the source code for both projects, along with a new development community website where users and developers can exchange ideas about these two new projects.

The biggest milestone of the last months is now behind us. However, this is just the beginning of everything!

I think that many things have been written about these two projects already. I don't want to write any tutorial at this point. So the only thing I will do right now is point you to the most relevant documentation, web sites, blog posts and demos about each project. The next step will be to write about specific use cases, features, etc.

 

Community Web Site

The community Web site is a place where developers and users of structWSF and conStruct can meet to talk about both projects, to report bugs and issues, to submit new enhancements, to find tips and tricks, etc.

I suggest creating a user profile on the community Web site if you are interested in communicating with other members.

structWSF

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies).

The structWSF middleware framework is fully RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services covering CRUD, browse, search, export, and import. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and, optionally, a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML.

 

conStruct

conStruct is a distro of the Drupal framework that aims to set a new standard in data integration and as a structured content system (SCS). With conStruct, you can let your data and its structure drive your applications. You can easily interoperate your diverse internal information with public content on the Web. And you can leverage a platform designed from the ground up for knowledge management and collaboration.

Posted at 19:59

Ebiquity research group UMBC: NOSQL: distributed key-value data stores

ComputerWorld has an article, “No to SQL? Anti-database movement gains steam,” on the “nosql” movement and a recent nosql meetup held in San Francisco. Nosql systems are distributed, non-relational data stores that typically use a simple key-value approach to indexing and retrieving data and use a simple procedural query API rather than a sophisticated declarative query language.
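To make the contrast between a procedural key-value API and a declarative query language concrete, here is a minimal Python sketch. The `TinyKVStore` class and its `put`/`get` methods are hypothetical names invented for illustration, not the API of any system mentioned in this post; the relational side uses Python's standard `sqlite3` module, and the sample record is made up.

```python
import sqlite3


class TinyKVStore:
    """Hypothetical in-memory key-value store with a procedural API."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


# Key-value style: the application already knows the key and fetches the value.
kv = TinyKVStore()
kv.put("user:42", {"name": "Ada", "city": "London"})
kv_result = kv.get("user:42")

# Relational style: a declarative query describes the rows wanted,
# and the database engine decides how to find them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (42, 'Ada', 'London')")
sql_result = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("London",)
).fetchall()
conn.close()
```

The key-value lookup cannot answer "who lives in London?" without scanning every value or maintaining a second key scheme, which is the trade-off the simpler API accepts.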

“The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive. Like the Patriots, who rebelled against Britain’s heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.

“Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system],” said Jon Travis, principal engineer at Java toolmaker SpringSource, one of the 10 presenters at the NoSQL confab (PDF). NoSQL-based alternatives “just give you what you need,” Travis said.”

There were presentations on nine different ‘nosql’ databases: Voldemort, Cassandra, Dynomite, HBase, Hypertable, CouchDB, VPork, MongoDb as well as general presentations by Google’s Jonas Karlsson, and Cloudera’s Todd Lipcon.

Johan Oskarsson of Last.fm wrote a debriefing post on his blog.

“The relatively young but rapidly growing “nosql” community met last Thursday in San Francisco. The idea was to give attendees a solid introduction to how distributed, non relational databases work as well as an overview of the various projects out there.”

and provides links to the presentation slides and videos. You can also search for NOSQL on Vimeo to get the videos.

I learned of this meeting on Hacker News, where you can find some interesting comments.

Of course there are many popular key-value stores that are not designed to support the highly scalable, distributed needs of many Web applications. I found, for example, that as a persistent RDF store for rdflib, Sleepycat outperformed MySQL.

Posted at 14:17

July 01

Alexandre Passant: "Technologies du Web Sémantique pour l'Entreprise 2.0" ("Semantic Web Technologies for Enterprise 2.0"): Thesis and slides online

Now that I am back from vacation, my PhD thesis (PDF, CC BY-NC-ND license) and the slides from the defense (which went very well ;-) are finally online.

Thanks again to everyone who came, encouraged me, asked questions, and worked with me over the past few years on this thesis ... the adventure is only beginning!

Posted at 23:50

W3C QA Blog Semantic Web News: Data in the City

On Monday of this week I attended a hearing in New York City organized by the Technology and Government Committee of the New York City Council. On the agenda was a proposal (Int. No. 991) regarding the use of open standards for publishing New York City government data. I picked up a printed copy of the proposal and a summary when I walked into the hearing. To my surprise the handout referred to W3C by name (the online proposal does not) and included a reference to the eGovernment Interest Group's recent publication, Improving Access to Government through Better Use of the Web.

So I filled out a form requesting to speak. To my surprise, the Chair invited me to testify early in the hearing.

Before I spoke, however, a representative from the Mayor's Office voiced opposition to some specifics of the proposal. Earlier that day, at the Personal Democracy Forum elsewhere in the city, the Mayor himself announced several initiatives regarding publishing government data. This had generated some excitement, and a number of people who had been attending the conference (I had not) were present at the hearing.

The Mayor's Office cited 5 or 6 reasons why it opposed the particular proposal (which I trust will appear in the public record that I've not yet located) but the main ones I recall were cost and burden. I would paraphrase some of the exchange between the city council committee and the Mayor's office as follows:

  • City Council: Please put raw data on the Web.
  • Mayor's Office: We prefer publishing information that is less raw and more citizen-friendly.
  • City Council: Citizens won't know what they are missing unless you put it up there.
  • Mayor's Office: That will cost too much (e.g., scanning old documents). We have lots and lots of documents.
  • City Council: By choosing what to provide and massaging the data, you are not letting people make better use of it.
  • Mayor's Office: See the initiatives we just announced. We think that we are meeting customer needs (which we hear through surveys, complaints, etc.)
  • City Council: You shouldn't decide what people want. Let them decide.

W3C's eGovernment Interest Group has been working with a growing number of agencies to gather information that will help address these sorts of concerns. Now they will develop best practices and guidelines for publishing government data. This is not an area I know well, so I look forward to being able to refer to the eGov IG's findings. However, I'm sure New York City is not the first government to wrestle with the technology, the cultural issues ("why should I publish my data?"), and how to use taxpayer money to do this.

When my turn came to speak, I said something like this:

  • Thanks for using open standards.
  • Use W3C Semantic Web Standards to publish data. As a starting point, I referred to Tim Berners-Lee's recent draft of Putting Government Data online.
  • Don't try to do everything at once. Start with what is already available electronically, for example.
  • Don't require agencies to coordinate through a single portal. Let them publish data at their own speed. Then aggregate (through a single portal if you wish and if people find that easy to use).
  • Participate in the eGovernment Interest Group.

I hope my summary here is backed up by the public record.

Posted at 22:51

Talis: Interesting semantic web stuff

By Tom Scott
| This guest post originally appeared on Tom Scott’s blog; republished under CreativeCommons License, and with kind permission of the author.

It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing, and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa, but a whole bunch of other things are going on, all rather exciting. Below is a roundup of some of the best. But if you don’t know what I’m talking about, you might like to start off with TimBL’s talk at TED.

TimBL is working with the UK Cabinet Office (as an advisor) to make our information more open and accessible on the web [cabinetoffice.gov.uk]
The blog states that he’s working on:

  • overseeing the creation of a single online point of access and work with departments to make this part of their routine operations.
  • helping to select and implement common standards for the release of public data
  • developing Crown Copyright and ‘Crown Commons’ licenses and extending these to the wider public sector
  • driving the use of the internet to improve consultation processes.
  • working with the Government to engage with the leading experts internationally working on public data and standards

The Guardian has an article on the appointment.

Closer to home there have been a few interesting developments

Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections [pdf]
Our paper at this year's European Semantic Web Conference (ESWC2009) looks at how the BBC has adopted semantic web technologies, including DBpedia, to help provide a better, more coherent user experience. It won best paper of the in-use track; congratulations to Silver and Georgie.

The BBC has announced a couple of SPARQL endpoints, hosted by Talis and OpenLink [welcomebackstage.com]
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL — the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.

A social semantic BBC?
Nice presentation from Simon and Ben on how social discovery of content could work… “show me the radio programmes my friends have listened to, show me the stuff my friends like that I’ve not seen” all built on people’s existing social graph. People meet content via activity.

PricewaterhouseCoopers’ spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”
Including an interview with me!

You should also check out…

sameas.org a service to help link up equivalent URIs
It helps you to find co-references between different data sets. Interestingly it’s also licenced under CC0 which means all copyright and related or neighboring rights are waived.

Image: “Semantic Web Rubik’s Cube” by dullhunk, CC License, via flickr

Posted at 13:45

Web Semántica Hoy: SEMANTIC SEARCH ENGINES: UNDERSTANDING IN ORDER TO FIND (Part 1)

This article defines concepts such as semantic search and semantic search engine, and gives examples of the advantages of semantic search engines over conventional ones, which rely on keywords to find information and present it to the user. The industrial and commercial interest in semantic search shows both in the appearance of numerous semantic search engines and in the use of semantic techniques to complement conventional searches (in Google, for example).

In my previous article I discussed Microsoft's new semantic search engine. In this article I will define precisely concepts such as "semantic search" and "semantic search engine" and give examples of their advantages over conventional search engines, as well as their current limitations.

Many current search engines are keyword-based. That is, the user enters the relevant words for the search ("Albert Einstein" and "Nobel", for example), and the application returns all documents containing those words. Section 3.2 of El futuro de la Web (http://www.javahispano.org/tutorials.item.action?id=55) discusses the drawbacks of such search engines. The two most important are:

  1. Low precision or relevance in the results (many documents of little relevance to the search are returned: the presence of a keyword in a document does not necessarily mean the document is relevant).
  2. Excessive sensitivity to the vocabulary used in the search (and therefore the impossibility of obtaining all the available relevant results at the first attempt: many documents of interest may not include the keywords, but rather synonyms, hyponyms, or hypernyms of them).

A study by David Hawking and several other researchers evaluated 20 conventional (keyword-based) search engines using 54 queries. The percentage of relevant results after inspecting the first 20 web pages returned was 0.5% for the best search engine (Northern Light), and Google was the second most precise. So the popularity of keyword-based search engines has little to do with their precision and a lot to do with the ox-like patience of their users.

A semantic search is a query that takes into account the context, and therefore the meaning, of what is being asked (and not only the words of the query), with the aim of avoiding ambiguity both in queries and in the text of the documents being searched. For example, a semantic search with the words "discoverer" and "penicillin" would return documents about Alexander Fleming, even if those two terms did not appear in them, because it would identify the concepts that structure the search (penicillin is a product whose discoverer we want to find or, more formally, Medicine(Penicillin) hasInventor Person(Alexander Fleming)). The ultimate goal of semantic search is to let users formulate more precise and expressive queries that yield relevant results with minimal intervention on the user's part.
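The formal statement in the penicillin example can be read as a subject-predicate-object triple. The following Python sketch is only a toy illustration of that idea: the `TRIPLES` data, the `LEXICON` mapping, and the `semantic_lookup` function are all hypothetical names, and the `hasInventor` identifier stands for the inventor relation written in the paragraph above.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# All identifiers are illustrative assumptions, not a real ontology.
TRIPLES = [
    ("Penicillin", "type", "Medicine"),
    ("Alexander Fleming", "type", "Person"),
    ("Penicillin", "hasInventor", "Alexander Fleming"),
]

# Hypothetical lexicon mapping query words to ontology terms and their role.
LEXICON = {
    "discoverer": ("predicate", "hasInventor"),
    "penicillin": ("subject", "Penicillin"),
}


def semantic_lookup(query_words):
    """Resolve query words to concepts, then return the matching objects."""
    slots = {}
    for word in query_words:
        entry = LEXICON.get(word.lower())
        if entry is not None:
            role, term = entry
            slots[role] = term
    return [
        obj
        for subj, pred, obj in TRIPLES
        if subj == slots.get("subject") and pred == slots.get("predicate")
    ]


answer = semantic_lookup(["discoverer", "penicillin"])
print(answer)
```

The answer comes from the relation between concepts, so a document about Fleming can be matched even if it never contains the words typed into the query.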

It is generally accepted that semantic searches rely on techniques for extracting information by means of ontologies (see http://www.wshoy.sidar.org/index.php?2005/12/09/30-ontologias-que-son-y-para-que-sirven) or metadata. Ontologies make it possible to define domains of interest (scientific theories, for example) formally and with enough expressive richness for users to specify their searches in considerable detail, either before running the query or while it runs.

From a technical standpoint, a semantic search engine is an application that understands users' queries and the text of web documents by means of algorithms that simulate comprehension or understanding, and that on that basis provides correct results without the user having to open each document and inspect it. A search engine of this kind recognizes the correct context for the search words or sentences. Google and Yahoo are not semantic search engines, since they rely mainly on algorithms that generate statistics from words and links, not on cognitive algorithms that capture the knowledge implicit in words and their context. For example, a query such as "Who was Uranus?" in either of those search engines will return results related to the seventh planet of the Solar System, when the purpose of the query is clearly to find information about the primordial sky god of Greek mythology.

Semantic search engines cannot always guess the meaning of a polysemous word at the first attempt. They must therefore have means of disambiguation to determine the exact sense the word has in the query. For example, a semantic search engine that internally uses ontologies covering computing concepts and means of transport must have tools for determining what the user means by a query containing the word bus, which can mean a passenger vehicle or a "digital system that transfers data between the components of a computer or between computers". To do so, it can pick the most likely meaning, ask the user to choose among several options (as the Hakia search engine does, presenting options drawn from its ontology), or use the other words in the query to infer the exact meaning of bus in that context (for example, in a query such as "What time does the bus for Soria leave Madrid this Friday?").
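The third strategy, using the other words of the query, can be sketched with a simple overlap count. This is a deliberately naive illustration, not how Hakia or any real engine works; the `SENSES` table, its context words, and the `disambiguate` function are assumptions made up for the example.

```python
# Hypothetical sense inventory: each sense lists words that tend to
# appear near it. A real engine would derive this from an ontology.
SENSES = {
    "bus": {
        "vehicle": {"leave", "friday", "madrid", "soria", "ticket", "station"},
        "computer_bus": {"data", "cpu", "memory", "components", "transfer"},
    }
}


def disambiguate(word, query):
    """Pick the sense whose context words overlap most with the query.

    Returns None when there is no unique best sense, which signals that
    the engine should fall back to asking the user.
    """
    tokens = {token.strip("?.,!").lower() for token in query.split()}
    scores = {
        sense: len(context & tokens)
        for sense, context in SENSES[word].items()
    }
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


sense = disambiguate(
    "bus", "What time does the bus to Soria leave Madrid this Friday?"
)
print(sense)
```

Returning `None` on a tie or on zero overlap keeps the second strategy (asking the user) available as a fallback instead of guessing.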

Because a semantic search engine relies on algorithms that simulate the understanding of words and hence establish relationships among them, it can run searches of interest to the user even when the search words or expressions do not appear in the documents returned. For example, a semantic search engine given the word "marsupial" would show documents containing terms such as these: kangaroo, koala, New Guinean quoll, monito del monte, rat-kangaroo, opossum, tlacuache, Tasmanian devil. As this example shows, semantic searches are far superior to keyword-based ones: one can find documents of interest that a keyword search would never turn up. Moreover, someone looking for information about different marsupial species would not need to phrase the query in several ways, with the name of each species, to get the desired information.
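One simple way to picture the marsupial example is query expansion over a hyponym table. The sketch below is an assumption-laden toy: the `HYPONYMS` table, the three sample documents, and the `expand` and `search` functions are invented for illustration and use plain substring matching rather than any real semantic analysis.

```python
# Hypothetical hyponym table: a general term and some of its kinds.
HYPONYMS = {
    "marsupial": ["kangaroo", "koala", "opossum", "Tasmanian devil"],
}

# Made-up documents; none of them contains the word "marsupial".
DOCUMENTS = {
    "doc1": "The koala sleeps up to twenty hours a day.",
    "doc2": "A Tasmanian devil has a powerful bite.",
    "doc3": "Penguins live in the Southern Hemisphere.",
}


def expand(term):
    """Return the term itself plus any known hyponyms."""
    return [term] + HYPONYMS.get(term, [])


def search(term):
    """Return ids of documents mentioning the term or one of its hyponyms."""
    needles = [t.lower() for t in expand(term)]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(needle in text.lower() for needle in needles)
    )


keyword_hits = sorted(
    doc_id for doc_id, text in DOCUMENTS.items() if "marsupial" in text.lower()
)
semantic_hits = search("marsupial")
print(keyword_hits, semantic_hits)
```

The plain keyword match finds nothing, while the expanded search returns the koala and Tasmanian devil documents from a single query.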

The lack of structure and semantic annotations in web resources (Word documents, PDFs, HTML pages, etc.) forces semantic search engines to analyze resources with cognitive algorithms, word by word and sentence by sentence, in order to map words and sentences to ontological concepts. These algorithms are slow and require human supervision. Hence semantic search engines do not yet cover as many web resources as conventional ones, which use statistical algorithms that are much faster and fully automated. This limitation will disappear as cognitive algorithms improve or once the "semantic islets" join to form the Semantic Web or, at least, "semantic continents".

"The Semantic Web will never exist," I hear in the distance. "It is as unlikely to work as Leonardo da Vinci's flying machines." I have two objections to that opinion. One: pessimism has no future. Two: there was a time, not long ago, when syntactic interoperability was thought impossible without enormous investment, and almost everyone bet that there would be no single winning horse in the race among data-interchange languages. They were wrong. And some lost their shirts.

In the absence of the Semantic Web, some have already set to work. There are semantic search engines that already work by structuring the information that is later accessed through searches. For example, Freebase (http://www.freebase.com/), a social search engine, uses a graph database to define its data structure as a set of nodes and a set of links that establish relationships between the nodes. According to Freebase's official documentation, what sets Freebase apart from other databases is that any topic can be accompanied by many different kinds of information. The example they give is very clear: "For example, Arnold Schwarzenegger could appear as an actor in a movie database, as a governor in a politics database, and as Mr. Universe in a bodybuilding database. In Freebase, there is only one topic for Arnold Schwarzenegger, which contains information about all three facets of his public life. The unified topic acts as an information hub, so it is easy to find and contribute information about him, regardless of what kind of information it is."

Freebase: a social database and a search engine.
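The nodes-and-links structure described for Freebase can be pictured with a very small labelled graph. This sketch does not reproduce Freebase's actual data model or API; the `TopicGraph` class, its methods, and the relation names `type` and `title` are hypothetical choices made for illustration.

```python
from collections import defaultdict


class TopicGraph:
    """Minimal labelled graph: nodes are topics, links are (relation, target)."""

    def __init__(self):
        self.links = defaultdict(list)  # node -> list of (relation, target)

    def add_link(self, source, relation, target):
        self.links[source].append((relation, target))

    def facets(self, node, relation):
        """Return every target linked from `node` by `relation`."""
        return [t for r, t in self.links.get(node, []) if r == relation]


graph = TopicGraph()
topic = "Arnold Schwarzenegger"

# One topic node carries all three facets instead of three separate records.
graph.add_link(topic, "type", "Actor")
graph.add_link(topic, "type", "Governor")
graph.add_link(topic, "type", "Bodybuilder")
graph.add_link(topic, "title", "Mr. Universe")

types = graph.facets(topic, "type")
print(types)
```

Because every facet hangs off the same node, adding a new kind of information is just another link rather than a new table or a new database.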


In principle, semantic search engines could avoid junk pages, which proliferate on the web like weeds in an abandoned field. Because they take into account the context of the words or phrases in documents, they could discard such pages right away. For example, a web page that includes the phrase "semantic web" surrounded by sentences about how to boost sexual potency, erotic toys, and easy sex in some faraway country with relaxed customs would be eliminated from any search about the Semantic Web or given very low relevance, because the context of those sentences (sex) bears no relation to the Semantic Web.

That a search engine accepts questions in natural language ("What is the weather like in Vienna right now?") and answers them correctly does not necessarily mean it is a semantic search engine: it may simply translate natural-language questions into database queries.

For now, almost all semantic search engines support searches only in English, although they are being extended to support other languages. Apart from the predominance of English, this is also due to the inherent difficulty of capturing the knowledge of natural languages in data structures that allow fast, scalable searches (arrays, lists, stacks, queues, trees, graphs, etc.). For example, the Hakia search engine uses a vocabulary in the form of an ontology that includes some 100,000 senses of English words, and that number will keep growing as the application is refined. Building any vocabulary of that size is a slow, tedious, and very expensive undertaking that must be carried out by a well-coordinated team of linguistics specialists.

Anyone who thinks that, given an ontology of English word senses, converting it into an ontology for another language is simple would be mistaken: converting linguistic ontologies from one language to another is a very complex process that requires constant supervision by a team of translators. To give an example, if we want to go from a linguistic ontology in Spanish to one in German, we must consider all the possible German translations of each Spanish word; otherwise, search results in German will be more limited than those in Spanish. A simple, unambiguous Spanish word such as "automóvil" can be translated into German as "Auto", "Wagen", "Kraftwagen", "Kraftfahrzeug", "Automobil", "Motorfahrzeug", or "KFZ" (there are surely more translations, but that is as far as my basic German goes).

In a semantic Spanish-German cross-language search, all of these words would have to be taken into account to find all the relevant documents when someone types "automóvil" into the search engine. (Cross-language searches are those in which a query in one language is translated into another language and the results are translated back into the first. Google is working to add this kind of search to its search engine, which will allow, among many other things, a Spanish speaker to book tickets for museums and cinemas in Tokyo even if the timetable and ticketing information is not available in Spanish.)

Semantic search engines are likely to change the way information is searched for and displayed, and to mean a big change for occasional users. Consider, for example, the interfaces shown in the following screenshots, from Mnemo (http://www.mnemo.org/), KartOO (http://www.kartoo.com/), and KoolTorch (http://www.kooltorch.com/).

Mnemo interface. Perhaps the semantic browsers of the future will have interfaces similar to this one.

KartOO interface. Perhaps the semantic browsers of the future will have interfaces similar to this one.

KoolTorch interface. Perhaps the semantic browsers of the future will have interfaces similar to this one.


Posted at 07:33

June 30

Clark and Parsia: PelletDb Whitepaper

At Semantic Technology 2009 we formally announced PelletDb, our new product that integrates Pellet with Oracle’s Semantic Database system, including the Oracle RDF query engine and OWL reasoner. We’re excited about PelletDb since it makes Pellet available to Oracle users, including its sound and correct OWL 2 reasoning, unique reasoning services like SPARQL-DL and explanations, etc. But we’re also excited because it makes Oracle’s enterprise-class information management facilities available to Pellet users and apps.

Today we’re releasing an extensive PelletDb whitepaper (PDF) that explains in detail what PelletDb is, how it works, who should use it, etc. It includes customer benefits, sample code, and a basic roadmap for future development. If you’re curious about how we’re fusing Pellet and Oracle, check out the whitepaper.

The PelletDb limited beta is on track to begin 15 July, so please get in touch if you want to participate.

Posted at 20:28

W3C QA Blog Semantic Web News: Reflections on SemTech 2009

SemTech 2009, along with W3C's significant participation in it, is now behind us. Besides catching up on emails, I have spent the past week reflecting on the enthusiasm, presentations, and flurry of activities that constituted this year's event in San Jose, 14 to 18 June.

One strong feeling I had while in San Jose was a sense of déjà vu in the Web world. Stepping back, I realize that 2009 feels a lot like 1999, when I was consulting with Allaire (remember CFML and ColdFusion?) and attended their user group meetings, teeming with enthusiastic Web developers with war stories about their successes and failures bringing Web development servers into organizations of all types and sizes.

Ten years ago, many enterprises were just getting onto the "e-commerce bus," having been either eclipsed or inspired by the likes of innovative Web-centric companies such as Amazon.com and eBay who launched in 1995, or early-adopter retailers like JCPenney whose understanding of the catalogue business put them online faster than many other retailers, or businesses for that matter. Many mainline companies were in various phases of their Web evolution in 1999 -- from brochureware to intranets to pilot customer-facing interactive sites. And keep in mind that ten years ago, Google was barely two.

In 1999 there was also a wide cross-section of skill sets and diversity of understanding about what the Web was, how it worked, and what people and tools to trust to bring one's vision onto the Web. I remember sitting in focus groups with a number of HTML Web designers who were impatient with their more senior corporate IT colleagues who insisted on clear roadmaps, risk assessments and cost-benefit analyses for the Web-based tools and technology solutions their companies were considering.

The Java developers, engineers and system architects in other discussion groups also weren't too keen on the irreverent attitudes and huge amounts of money being thrown at these young people, who just a few years earlier were teenagers playing video games at the arcades. But understanding and trust continued to build, innovation accelerated, communities with technical skills increased, and revenues skyrocketed as a direct result of vendors developing and companies embracing new Web technologies.

We fast forward to 2009 and see similar dynamics with Semantic Web technologies. There are the early adopters and evangelists who have already climbed aboard the "RDF-bus," understand what's possible with W3C's Semantic Web technology standards, and can point to impressive results in new tools, pilot projects and even robust deployments within organizations, governments, and enterprises.

Yet skeptics remain both in terms of understanding the paradigm shift that the Semantic Web brings, just as the early Web challenged the status quo, and in the legitimate need for better tools and long-term architectural considerations for how to successfully deploy Semantic Web technologies in large enterprises.

Like the early Web and the W3C standards and subsequent commercial tools, products and services that enabled its rapid growth, the W3C Semantic Web stack is highly stable today. The accelerating uptake of W3C Semantic Web standards, new tools and applications were part of the buzz at this year's Semantic Technologies Conference.

In addition to hearing and seeing many new use cases and case studies, the call for commercialization was clear, as was the amount of enthusiasm among the technologists doing good and exciting work. The community's call to publish and link data in RDF or RDFa is clearly being heard, with The New York Times joining the ranks of large data holders eager and willing to publish to the Linked Open Data Cloud.

Finally, the number of Semantic Web communities flourishing in cities coast to coast across North America and in Europe is another healthy sign that the growth and adoption of Semantic Web technologies has not only "crossed the chasm" (in keeping with Geoffrey Moore's model), but has spawned strong beachheads of support among highly skilled technology professionals across business, industry, and government sectors.

It is my hope that at next year's Semantic Technologies Conference -- which is changing venues to San Francisco -- we will point to an even higher coordinate on the adoption curve and see amazing new results and impact from the use of W3C Semantic Web technologies. If I were Jean-Luc Picard, I would say, "Make it so." But for now, I'll continue in my role of education and outreach for W3C.... I look forward to seeing many of you throughout the year and at next year's conference!

Posted at 13:38

June 29

Clark and Parsia: Pellet 2 Tutorial Available

Two weeks ago at the Semantic Technology 2009 conference, Evren and Mike presented a four-hour tutorial about building OWL-based applications with Pellet 2. About 50 people attended, which was a surprising turnout given that it was at the rump end of the conference, a notoriously difficult time slot.

After some polishing based on feedback, we’re making the tutorial materials, including sample code, slides, and a bundled download of Pellet, available for use in learning (or teaching others) how to use Pellet, both interactively and programmatically.

Enjoy!

Posted at 19:39

Orri Erling: Virtuoso loads 110,500 triples-per-second on LUBM 8000

LUBM load speed still seems to be a metric that is quoted in comparisons of RDF stores. Consequently, we too measured the load time of LUBM 8000, 1,068-million triples, on the newest Virtuoso.

The real time for the load was 161m 3s. The rate was 110,532 triples-per-second. The hardware was one machine with 2 x Xeon 5410 (quad core, 2.33 GHz) and 16G 667 MHz RAM. The software was Virtuoso 6 Cluster, configured into 8 partitions (processes), one partition per CPU core. Each partition had its database striped over 6 disks total; the 6 disks on the system were shared between the 8 database processes.
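As a rough sanity check on the quoted rate, the arithmetic can be sketched as below. The triple count used here is the rounded "1,068 million" figure from the post, so the result lands near, not exactly on, the reported 110,532; the function name is purely illustrative.

```python
# Rough check of the reported load rate, using the rounded figures quoted above.
TRIPLES = 1_068_000_000        # "1,068-million triples" (rounded in the post)
LOAD_SECONDS = 161 * 60 + 3    # real time of 161m 3s


def triples_per_second(triples: int, seconds: int) -> float:
    """Average load rate over the whole run."""
    return triples / seconds


rate = triples_per_second(TRIPLES, LOAD_SECONDS)
print(f"{rate:,.0f} triples per second")
```

The small gap to the published number is what one would expect if the exact triple count is slightly above 1,068 million.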

The load was done on 8 streams, one per server process. At the beginning of the load, the CPU usage was 740% with no disk; at the end, it was around 700% with 25% disk wait. 100% counts here for one CPU core or one disk being constantly busy.

The RDF store was configured with the default two indices over quads, these being GSPO and OGPS. Text indexing of literals was not enabled. No materialization of entailed triples was made.

In comparison, Bigdata reported 200K triples-per-second for the first 8000 LUBM universities on a 15 blade box. We expect to do about that much on one new dual Xeon board; we’ll publish this when this is done.

We think that LUBM loading is not a realistic benchmark for the world, but since other people publish such numbers, so do we.

Posted at 16:12

Alexandre Passant: PhD fellowship position in Social Software and Semantic Web at DERI, NUI Galway

The Unit Social Software (USS) in DERI is currently looking for Ph.D. candidates. Applications must be sent by the end of the week to hr.ie@deri.org and positions will start in September.
More details in the ad below:


The Unit Social Software (USS) at the Digital Enterprise Research Institute - DERI: http://www.deri.ie/ - of the National University of Ireland, Galway invites applications for a four-year, fully funded PhD fellowship position.

DERI is a leading research institute in semantic technologies that offers a stimulating, dynamic and multi-cultural research environment, excellent ties to research-groups worldwide and standardization bodies, close collaboration with industrial partners and up-to-date infrastructure and resources.

The DERI Unit Social Software focuses on the convergence of Social Software and the Semantic Web by developing models and tools that support and take advantage of these two trends. Achievements of DERI USS include SIOC - Semantically-Interlinked Online Communities - and a large number of publications and tutorials on the topic in international venues and journals. USS Research is performed in collaboration with other DERI units and industrial partners. The PhD position is funded by Science Foundation Ireland (http://sfi.ie) within the Lion2 project and offers for the successful candidate an annual stipend, course fees and conference travel when presenting.

Applicants should have a strong interest in Social Software, Semantic Web and Web Science in general and hold an excellent primary degree or Masters qualification in a relevant discipline (e.g. computer science, information science, knowledge representation), with an emphasis on practical aspects of research (e.g. industrial project experience, ontology development and open-source software development being distinct advantages). Selected candidates are expected to have the willingness to combine formal scientific work with application-oriented research and development in projects funded by national and international (EU) funding agencies, as well as participating in open-source projects and standardization activities.

Please submit your application (including cover letter, relevant publications or software implementation, full CV and contact details for two referees) to hr.ie@deri.org by 5pm on Friday, July 3rd with the subject line 'PhD Position - DERI USS'. Candidates will be contacted in the first week of July and interviews will be then conducted for successful applications. For further information please contact Alexandre Passant (alexandre.passant@deri.org) and John Breslin (john.breslin@deri.org).

Posted at 15:24

Semantic Web Company (Austria): Semantic Web Meetup Vienna is alive!

Over the past few months we have seen an impressive increase in Semantic Web Meetups all over the world. More and more aficionados enjoy this informal and decentralized way of networking with the local community, gaining new inputs and impressions for projects and business ideas. On July 16, 2009 the first Semantic Web Meetup in Vienna takes place at the headquarters of the Austrian Press Agency.

Join the community! It’s fun and free of charge!


Click here to check out
The Vienna Semantic Web Meetup!

Posted at 14:16

June 28

Libby Miller: Displaying Guardian book reviews for quick buying on Amazon


I read the

Posted at 22:22

June 27

Ebiquity research group UMBC: CFP: JWS special issue on Semantic Web and Social Media

important dates
abstracts 21 Sept 09
submissions 01 Oct 09
notification 15 Dec 09
final copy 15 Jan 10
publication April 10

The Journal of Web Semantics will publish a special issue on Data Mining and Social Network Analysis for integrating Semantic Web and Web 2.0 in the spring of 2010. The special issue will be edited by Bettina Berendt, Andreas Hotho and Gerd Stumme and initial abstracts for papers must be submitted via the Elsevier EES system by September 21, 2009.

The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between those communities. Techniques can be - but are not limited to - social network analysis, graph analysis, machine learning and data mining methods.

Relevant topics include

  • ontology learning from Web 2.0 data
  • instance extraction from Web 2.0 systems
  • analysis of Blogs
  • discovering social structures and communities
  • predicting trends and user behaviour
  • analysis of dynamic networks
  • using content of the Web for modelling
  • discovering misuse and fraud
  • network analysis of social resource sharing systems
  • analysis of folksonomies and other Web 2.0 data structures
  • analysis of Web 2.0 applications and their data
  • deriving profiles from usage
  • personalized delivery of news and journals
  • Semantic Web personalization
  • Semantic Web technologies for recommender systems
  • ubiquitous data mining in Web (2.0) environment
  • applications

Posted at 14:16

June 26

Kingsley Idehen: Linked Data Rules Simplified

As a complement to the most recent Linked Data Design Issues note by TimBL, I would like to add this subtle tweak to the enumerated rules:

  1. Identify or Name things using HTTP URIs
  2. Describe things using the RDF metadata model
  3. Increase link data mesh density on the Web by linking (referring) to things in other data spaces using their HTTP URIs.

If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.

Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:

  1. RDFa within (X)HTML documents
  2. N3, Turtle, TriX, RDF/XML etc. based documents
  3. Programmatically generated variants of 1&2.
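The three rules above can be illustrated with a minimal sketch that emits N-Triples by plain string formatting. All `example.org` and `example.net` URIs are placeholders invented for illustration, and the choice of `foaf:Person` and `owl:sameAs` is an assumption, not something the post prescribes.

```python
# Hypothetical URIs: example.org / example.net are placeholders, not real data spaces.
def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement as an N-Triples line; every term is an HTTP URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


ALICE = "http://example.org/people/alice#this"        # rule 1: name the thing
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
OTHER_SPACE = "http://other.example.net/id/alice"     # a thing in another data space

description = [
    triple(ALICE, RDF_TYPE, FOAF_PERSON),     # rule 2: describe it with RDF
    triple(ALICE, OWL_SAME_AS, OTHER_SPACE),  # rule 3: link into another data space
]
print("\n".join(description))
```

Publishing such a document at a URL from which `ALICE` can be resolved is what binds the name to a negotiable description.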

Posted at 14:49

W3C QA Blog Semantic Web News: W3C team at SemTech

Some of us on the team had a pretty busy last week: indeed, Karen Myers, Sandro Hawke, Dave Raggett, Eric Prud'hommeaux, Ralph Swick, and I were at the Semantic Technologies 2009 conference in San Jose. Dave (together with Dianne Mueller from JustSystems) gave a presentation on XBRL and the Semantic Web, Eric gave a tutorial (together with Lee Feigenbaum, from Cambridge Semantics) on SPARQL, and I also gave an introductory SW tutorial and a presentation. And, of course, we all had hallway discussions, meetings, interviews… more than I even remember right now. A number of W3C members were also represented either as presenters or at their booth at the exhibition (or both). More than 1200 people in San Jose in spite of the economic malaise... This is pretty good!

I published a blog entry right before my journey back to Europe (and an addendum because I forgot something in the original blog entry…) with much more detail. If you are interested in more detailed impressions of the conference, you can read it there. Suffice it to say: it was a great week!

Posted at 09:16

June 25

Tetherless World Constellation group RPI: What’s in data.gov

A recent article by Tim Berners-Lee, “Putting Government Data online”, has attracted significant interest in the datasets published at the US data.gov website. As Berners-Lee discusses the Semantic Web techniques that can be used to get those data into RDF space (something we are now working on), we would like to share our initial investigation of the contents of these government datasets.

I. Translate dataset into RDF

The catalog of the datasets in data.gov, http://www.data.gov/details/92, is published in CSV format as part of data.gov. We converted it into RDF using simple CSV parsing. We kept the translation minimal: (i) the properties are created directly from the column names; (ii) each table row is mapped to an instance of pmlp:Dataset; (iii) all non-header cells are mapped to literals; we don’t create new URIs at this point. The output of our work is published on the TW website at:

http://data-gov.tw.rpi.edu/raw/92/catalog.rdf
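A minimal sketch of the three translation rules described above might look like the following. It is not the authors' converter: the `PMLP` namespace is my best guess at what the `pmlp:` prefix expands to, the `PROP` namespace and the sample CSV are invented, and blank nodes stand in for rows because no new URIs are minted.

```python
import csv
import io
import re

# Assumed namespaces: PMLP is a guess at the pmlp: prefix; PROP is a made-up
# placeholder for properties minted from column names.
PMLP = "http://inference-web.org/2.0/pml-provenance.owl#"
PROP = "http://example.org/data-gov/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Quote a cell as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str) -> list[str]:
    """Translate CSV text into N-Triples lines, one blank node per row."""
    lines = []
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        node = f"_:row{number}"                          # (ii) one instance per row
        lines.append(f"{node} <{RDF_TYPE}> <{PMLP}Dataset> .")
        for column, cell in row.items():
            if column is None or not cell:
                continue                                 # skip empty or surplus cells
            name = re.sub(r"\W+", "_", column.strip())   # (i) property from column name
            lines.append(f"{node} <{PROP}{name}> {literal(cell)} .")  # (iii) literal
    return lines


sample = "Title,Agency\nToxics Release Inventory,EPA\n"
print("\n".join(csv_to_ntriples(sample)))
```

Keeping every cell as a literal makes the conversion trivially repeatable; linking into other datasets is deferred to a later pass.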

(We are now starting to do more integration work, extracting multiple objects from single tables, linking into the linked open data cloud, etc., and will publish a new version when that is done. The purpose of this first work was simply to make the catalog more available to the RDF community.)

II. Browse and query the RDF graph

As an example, we can browse the dataset in Tabulator, and then use a SPARQL web service to query the dataset. For example, we use a SPARQL query to list datasets published in CSV format:

http://onto.rpi.edu/sw4j/sparql?queryURL=http://data-gov.tw.rpi.edu/sparql/select-csv-dataset.sparql

III. Observations on the RDF graph

Using this service we can answer some basic questions about the data.gov datasets:

1. How many datasets are published, and how many among them can be easily converted into RDF?

There are 332 datasets, which can be partitioned by type: raw data catalog (301); tool catalog (31).

Not all of the datasets have a link to downloadable data, because some offer only browseable data via their own websites. Others publish datasets in multiple formats. As of today, the online static files associated with the datasets are distributed as follows: 204 datasets offer a CSV format dump, 10 datasets offer an XML format dump, and 21 datasets offer an XLS format dump.

2. How are the datasets categorized?

Category                                      Number of datasets
Geography and Environment                     227
Labor Force, Employment, and Earnings         30
Social Insurance and Human Services           30
Health and Nutrition                          11
Law Enforcement, Courts, and Prisons          7
Population                                    4
Other                                         3
Prices                                        3
Business Enterprise                           2
Education                                     2
Energy and Utilities                          2
Federal Government Finances and Employment    2
Income, Expenditures, Poverty, and Wealth     2
Science and Technology                        2
Transportation                                2
Construction and Housing                      1
International Statistics                      1
National Security and Veterans Affairs        1

3. What are some of the key items in the dataset?

4. What are the sources of the datasets?

The majority of the datasets are published by the EPA, and they contain environmental data partitioned by the states of the US in three individual years. Others come from other government agencies - the distribution is as follows:

IV. Getting Datasets linked

Although the datasets are not explicitly linked, we see a number of opportunities for connecting these datasets to others (and into the Linked Open Data datasets):

  • A large percentage of files have some sort of geo-tagging, thus they can be linked to DBpedia or GeoNames (and then presented via Map services).
  • Some datasets are subsets of other datasets, e.g. EPA data “2005 Toxics Release Inventory data for the state of Georgia” is a subset of “2005 Toxics Release Inventory National data file of all US States and Territories”, making for easier “internal” linking of the datasets.
  • A number of the datasets contain temporal information, e.g. IRS’s “Tax Year 1992 Private Foundations Study”, … “Tax Year 2005 Private Foundations Study”, which provides an opportunity for mashups using timelines and such.

V. Conclusions

We are committed to getting more of the data.gov data online soon (in RDF), and then investigating data integration and knowledge discovery. In order to get our datasets linked to the linked data cloud, we will use SPARQL for extracting entities and our Semantic MediaWiki as a platform to capture the owl:sameAs mappings. Scalable dataset publishing is also challenging, as some of these are very large datasets, e.g. “2005-2007 American Community Survey Three-Year PUMS Population File” has a 1.1 GB zipped CSV file. Moreover, some datasets are not directly available in one file but via a web service. Our current plan is to produce RDF documents available for download soon, and to work on bringing more of these datasets into live, SPARQLable forms as we can.

Li Ding, Dominic DiFranzo and Jim Hendler

Posted at 14:05

Semantic Web Company (Austria): Some Semantic Apps for the iPhone

Some new releases around Apple's iPhone family, like the new OS 3.0 or the new 3G S, have stimulated another big hype around this “little darling”. I took a look at another facet, namely: Has the Semantic Web entered the iPhone realm yet (or vice versa)? Experts have been talking about the need for semantically enhanced mobile applications for years, so let's see if they are in place already.

Searching for “semantic web” in the AppStore delivers six results, one of them called “SemanticWb” is obviously an interesting match. The application “extracts current life sciences and health care knowledge and place them conveniently at your fingertips on your iPhone”. The application offers search suggestions and moderated search and retrieves articles from PubMed or genetic disorders which are related to the search term. Good start, this is a neat iPhone application which should be interesting for medical doctors and related professions.

Another application on the iPhone which is related to the semantic web is the “English wordnet dictionary” based on WordNet from Princeton University.

So, not much semantic web on the iPhone so far, I thought, until Evriverse was released some weeks ago. The iPhone version of evri.com offers a new way to find connections between all kinds of things. Similar to OpenCalais, Evri can extract people, places, organisations, products etc. from unstructured information like news or blogs. The innovation around Evriverse is the way complex search queries around “anything” can be formulated by just touching the screen. For example, if you are looking for information about “Tim Berners-Lee”, the application not only offers auto-complete but also suggests related people, organisations etc. to refine any search query. Such relations are updated constantly and are based on the semantic analysis of news and blogs.

Evriverse offers the most comfortable way to do news research on the iPhone today. It shows how semantic technologies can enhance user experience on a mobile device and it will pave the way to more semantic (web) apps on the iPhone.

Posted at 09:31

June 24

John Breslin: Open government and Linked Data; now it’s time to draft…

For the past few months, there have been a variety of calls for feedback and suggestions on how the US Government can move towards becoming more open and transparent, especially in terms of their dealings with citizens and also for disseminating information about their recent financial stimulus package.

As part of this, the National Dialogue forum was set up to solicit solutions for ways of monitoring the “expenditure and use of recovery funds”. Tim Berners-Lee wrote a proposal on how linked open data could provide semantically-rich, linkable and reusable data from Recovery.gov. I also blogged about this recently, detailing some ideas for how discussions by citizens on the various uses of expenditure (represented using SIOC and FOAF) could be linked together with financial grant information (in custom vocabularies).

More recently, the Open Government Initiative solicited ideas for a government that is “more transparent, participatory, and collaborative”, and the brainstorming and discussion phases have just ended. This process is now in its third phase, where the ideas proposed to solve various challenges are to be more formally drafted in a collaborative manner.

What is surprising about this is how few submissions and contributions have been put into this third and final phase (see graph below), especially considering that there is only one week for this to be completed. Some topics have zero submissions, e.g. “Data Transparency via Data.gov: Putting More Data Online”.

[Image: 20090624b]

This doesn’t mean that people aren’t still thinking about this. On Monday, Tim Berners-Lee published a personal draft document entitled “Putting Government Data Online“. But we need more contributions from the Linked Data community to the drafts during phase three of the Open Government Directive if we truly believe that this solution can make a difference.

For those who want to learn more about Linked Data, click on the image below to go to Tim Berners-Lee’s TED talk on Linked Data.

(I watched it again today, and added a little speech bubble to the image below to express my delight at seeing SIOC profiles on the Linked Open Data cloud slide.)

We also have a recently-established Linked Data Research Centre at DERI in NUI Galway.

[Image: 20090624a]

Posted at 15:25

W3C Semantic Web News: SPARQL Language Specification Translated to Russian

Сергей Щербак (Sergey Shcherbak) has published a Russian translation of the SPARQL Query Language, under the title “Язык запросов SPARQL для RDF”.

Posted at 08:23

June 23

Norm Walsh: Not exactly XProc

One advantage of being an implementor is that I can play with languages that the Working Group didn't approve.

Posted at 22:27

W3C Semantic Web News: XSPARQL published as a W3C Submission

The “XSPARQL” specification has been published as a W3C member submission, co-authored by experts of Asemantics S.R.L., DERI Galway, Fundación CTIC, INRIA, Ontotext, OpenLink Software Inc., Profium, Talis Information Ltd., and the University of Innsbruck. This specification defines a merge of SPARQL and XQuery, and has the potential to bring XML and RDF closer together. XSPARQL provides concise and intuitive solutions for mapping between XML and RDF in either direction, addressing both the use cases of GRDDL and SAWSDL.

Posted at 16:38

June 22

Benjamin Nowack: Code.semsol.org - A central home for semsol code

The code bundles on the ARC website are generated in an inefficient manual process, and each patch has to wait for the next to-be-generated zip file. The developer community is growing (there are now 600 ARC downloads each month), I'm increasingly receiving patches and requests for a proper repository, and the Trice framework is about to get online as well. So I spent last week on building a dedicated source code site for all semsol projects at code.semsol.org.

So far, it's not much more than a directory browser with source preview and a little method navigator. But it will simplify code sharing and frequent updates for me, and hopefully also for ARC and Trice developers. You can check out various Bazaar code branches and generate a bundle from any directory. The app can't display repository messages yet (the server doesn't have bzr installed, I'm just deploying branches using the handy FTP option), but I'll try to come up with a work-around or an alternative when time permits.

Code Browser

Posted at 14:00

Lee Feigenbaum: SPARQLing at SemTech

SemTech 2009 has come and gone, and it was great. I was concerned—as were others—that the state of the economy would depress the turnout and enthusiasm for the show, but it seems that any such effects were at least counterbalanced by a growing interest in semantic technologies. Early reports are that attendance was up about 20% from last year, and at sessions, coffee breaks, and the exhibit hall there seemed to always be more people than I expected. Good stuff.

Eric P. and I gave our SPARQL By Example tutorial to a crowd of about 50 people on Monday. From the feedback I’ve received, it seems that people found the session beneficial, and at least a couple of people remarked on the fact that Eric and I seemed to be having fun. If this whole semantic thing doesn’t work out, at least we can fall back on our ad-hoc comedy routines.

Anyways, I wanted to share a couple of links with everyone. I think they work nicely to supplement other SPARQL tutorials in helping teach SPARQL to newcomers and infrequent practitioners.

  1. SPARQL By Example slides. I’ve probably posted this link before, but the slides have now been updated with some new examples and with a series of exercises that help reinforce each piece of SPARQL that the reader encounters. Thanks to Eric P. for putting together all of the exercises and to Leigh Dodds for the excellent space exploration data set.
  2. SPARQL Cheat Sheet slides. This is a short set of about 10 slides intended to be a concise reference for people learning to write SPARQL queries. It includes things like common prefixes, the structure of queries, how to encode SPARQL into an HTTP URL, and more.
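As an illustration of the last item, encoding a SPARQL query into an HTTP URL, here is a minimal sketch. It is not taken from the cheat sheet: the endpoint address and the query are placeholders, and the only thing relied on is that the SPARQL Protocol carries the query text in a `query` parameter for GET requests.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the SPARQL service you actually query.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the percent-encoded query in `query`."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Letting `urlencode` do the escaping avoids hand-encoding characters such as `?`, `{`, `<` and newlines, which is where hand-built URLs usually go wrong.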

Enjoy, and, as always, I’d welcome any feedback, suggestions for improvements, or pointers to how/where you’re able to make use of these materials.

Posted at 04:39

June 21

Shelley Powers: Bb's Semantic Feed: RDF: A Major Site Redesign

I've finished the re-organization of my web site, though I have odds and ends to finish up. I still have two major changes featuring SVG and RDFa that I need to incorporate, but the structure and web site designs are finished.

Thanks to Drupal's non-aggressive use of .htaccess, I've been able to create a top-level Drupal installation to act as "feeder" to all of the sub-sites. I tried this once before with Wordpress, but the .htaccess entries necessary for that CMS made it impossible to have the sub-sites, much less static pages in sub-directories.

Rather than use Planet or Venus software to aggregate feed entries for all of my sites, I'm manually creating an excerpt describing a new entry, and posting it at Burningbird, with a link back to the full article. I also keep a listing of the last few months' stories for each sub-site in the sidebar, in addition to a random display of images.

There is no longer any commenting directly on a story. One of the drawbacks with XHTML and an unforgiving browser such as Firefox is that a small error is enough to render the page useless. I incorporate Drupal modules to protect comments, but I also allow people to enter in some markup. This combination handles most of the accidentally bad markup, but not all. And it doesn't protect against those determined to inject invalid markup. The only way to eliminate all problems is to disallow markup entirely, which I find to be too restrictive.

Comments are, however, supported at the Burningbird main site. To allow for discussion on a story, I've embedded a link in every story that leads back to the topmost Burningbird entry, where people can comment. Now, in those infrequent times when a comment causes a problem with a page, the story is still accessible. And there is a single Comment RSS feed that now encompasses all site comments.

The approach may not be ideal, but commentary is now splintered across weblog, twitter, and what not anyway—what's another link among friends?

I call my web site design "Silhouette" and will release it as a Drupal theme as soon as it's fully tested. It's a very simple two column design, with the sidebar column either to the right (standard) or easily adjusted to fall to the left. It's an accessible design, with only the top navigation bar coming between the top of the page and the first story. It is valid markup, as is, with the XHTML+RDFa Doctype, because I've embedded RDFa into the design. It is not valid, however, when you also add SVG silhouettes, as I do with all but the topmost site.

The design is also valid XHTML 5.0, except for a hard coded meta element that was added to Drupal because of security issues. I don't serve the pages up as HTML 5, though, because the RDFa Doctype triggers certain behaviors in RDFa tools. I'm also not using any of the new HTML 5 structural elements.

The site design is plain, but it suits me and that's what matters. The content is legible and easy to locate and navigate, and that's my second criterion. I will be adding some accessibility improvements in the next few months, but they won't impact the overall design.

What differs between all of the sites is the header graphic, and the SVG silhouettes, which I changed to suit the topic or mood of the site. The silhouettes were a lot of fun, but they aren't essential, and you won't be able to see them if you use a browser that doesn't support SVG inline. Which means you IE users will need to use another browser to see the images.

I also incorporate some new CSS features, including some subtle use of text-shadows with headers (to add richness to the stark use of black text on pastel graphics) and background-color: rgba functionality for semi-transparent backgrounds. The effects are not viewable by browsers that don't yet support these newer CSS styles, but loss of functionality does not impact access to the material.

Now, for some implementation basics:

  • *I manually reviewed all my old stories (from the last 8 years), and added 410 status codes for those I decided to permanently remove.
  • For the older stories I kept, I fixed up the markup and links, and added them as new Drupal entries in the appropriate sub-site. I changed the dates to match the older entries, and then added a redirect between the old URL and the new.
  • By using one design for all of the sites, when I make a change for one, it's a snap to make the change for all. The only thing that differs is the inline SVG in the page.tpl.php page, and the background.png image used for the header bar.
  • I use the same set of Drupal modules at all sub-sites, which again makes it very easy to make updates. I can update all of my 7 Drupal sites (including my restricted access book site), with a new Drupal release in less than ten minutes.
  • I use the Drupal Aggregator module to aggregate site entries in the Burningbird sidebar.
  • I manually created menu entries for the sub-site major topic entries in Burningbird. I also created views to display terms and stories by vocabulary, which I use in all of my sub-sites.
  • The site design incorporates a footer that expands the Primary navigation menu to show the secondary topic entries. I've also added back in a monthly archive, as well as recent writings links, to enable easier access of site contents.

The expanded primary menu footer was simple, using Drupal's API:

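<?php
// Load the complete menu tree for the Primary links menu.
$tree = menu_tree_all_data('primary-links');
// Render the tree, including the nested secondary entries, as HTML.
print menu_tree_output($tree);
?>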

To implement the "Comment on this story" link for each story, I installed the Content Construction Kit (CCK), with the additional link module, and expanded the story content type to add the new "comment on this story" field. When I add the entry, I type in the URL for the comment post at Burningbird, which automatically gets linked in with the text "Comment on this story" as the title.

I manually manage the link from the Burningbird site to the sub-site writing, both because the text and circumstance of the link differs, and the CCK field isn't included as part of the feed. I may play around with automating this process, but I don't plan on writing entries so frequently that I find this workflow to be a burden.

The images were tricky. I have implemented both the piclens and mediaRSS Drupal Modules, and if you access any of my image galleries with an application such as Cooliris, you'll get that wonderful image management capability. (I wish more people would use this functionality for their image libraries.)

I also display sub-site specific random images within the sub-site sidebars, but I wanted the additional capability to display random images from across all of the sites in the topmost Burningbird sidebar.

To get this cross-site functionality, I installed Gallery2 at http://burningbird.net/gallery2, and synced it with the images from all of my sub-sites. I then installed the Gallery2 Drupal module at Burningbird (which you can view directly) and used Gallery2 plug-ins to provide random images within the Drupal sidebar blocks.

Drupal prevented direct access from Gallery2 to the image directories, but it was a simple matter to just copy the images and do a bulk upload. When I add a new image, I'll just pull the image directly from the Drupal Gallery page using Gallery2's image extraction functionality. Again, I don't add so many images that I find this workflow to be onerous, but if others have implemented a different approach, I'd enjoy hearing of alternatives.

One problem that arose is that none of the Gallery2 themes is XHTML compliant because of HTML entity use. All I can say is: folks, please stop using &nbsp;. Use &#160; instead, if you're really, really generating XHTML, not just HTML pretending to be XHTML.
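The difference between the two forms is easy to demonstrate with any strict XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for a browser's XHTML parser, which is an assumption made for illustration; the sample markup is invented.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


named = "<p>Gallery&nbsp;image</p>"     # named entity, only defined via the (X)HTML DTD
numeric = "<p>Gallery&#160;image</p>"   # numeric character reference, always defined

print(parses_as_xml(named))    # False: the parser reports an undefined entity
print(parses_as_xml(numeric))  # True
```

Both forms denote the same no-break space character; only the numeric reference survives without the DTD that declares `nbsp`.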

To fix the non-compliant XHTML problem, I copied a version of my site to a separate theme, and just removed the PHP that serves the page up as XHTML for XHTML-capable browsers from this "Silhouette for HTML" theme. The Gallery2 Drupal modules allow you to specify a different theme for the Gallery2 pages, and I use the new HTMLated theme for the Gallery2 pages. I use my XHTML compliant theme for the rest of the site. Over time, I can probably add conditional tests to my main theme to test for the presence of Gallery blocks, but what I have is simple and works for now.

Lastly, I redirected the old Planet/Venus based feed locations to the Burningbird feed. You can still access full feeds from all of my sub-sites, and get full entries for all but the larger stories and books, but the entries at Burningbird will be excerpts, except for Burningbird-only posts. Speaking of which, all of my smaller status updates, and general chit-chat will be made directly at Burningbird—I'm leaving the sub-sites for longer, more in-depth, and "stand alone" writings.

As I mentioned earlier, I still have some work with SVG and RDFa to finish before I'm completely done with the redesign. I also have some additional tweaks to make with the existing infrastructure. For instance, I have custom 404, 403, and 410 error pages, but Drupal overrides the 403 and 404 pages. You can redirect the error handling to specific pages, but not to static pages, only to pages within the Drupal system. However, I'm not too worried about this issue, as I'm finding that there's typically a Drupal module for any problem, just waiting to be discovered.

I know I must come across as a Drupal fangirl in this writing, but after using the application for over a year, and especially after this site redesign, I have found that no other piece of software matches my needs so well as Drupal. It's not perfect software—there is no such thing as perfect software—but it works for me.

* This process convinced me to switch fully from using Firefox to using Safari. It was so much more simple to fix pages with XHTML errors using Safari than with Firefox's overly aggressive XHTML error handling.

Posted at 19:51

W3C Semantic Web News: Two Polish Translations of OWL Documents

Sylwia Tesarska has published a Polish translation of the OWL Guide, under the title “OWL Język Ontologii Sieciowej Przewodnik”. Also, Dorota Szwarc has published a Polish translation of the OWL Web Ontology Language Reference, under the title “OWL Język Ontologii Sieciowej Referencja”.

Posted at 16:36

W3C Semantic Web News: RDFa Primer Translated to (Simplified) Chinese

程龚 (Gong Chen) has published a Simplified Chinese translation of the RDFa primer, under the title “RDFa入门”.

Posted at 16:30

June 20

Ivan Herman: SemTech2009 impressions (addendum)


I wrote a

Posted at 14:09

June 19

Ivan Herman: SemTech2009 impressions


The first and possibly most important aspect of

Posted at 21:53

Tetherless World Constellation group RPI: I will pay delicious $100 for hierarchical tagging

Just saw Jim’s post on “What is the Semantic Web really all about?”

I have been wondering about this problem too. What is the Semantic Web? Yesterday I asked a question, “Why few (or none?) Web 2.0 sites provide hierarchical tagging?”, on LinkedIn and got some pretty good answers:

http://www.linkedin.com/answers?viewQuestion=&questionID=496785&askerID=14212719

For your convenience, I attached my LinkedIn post at the end of this blog.

Two things in the answers drew my attention:
* Many do _not_ believe tags, or even hierarchical tags, are semantic; to them, “semantics” means RDF, or triples at least;
* Some believe that implementing even a hierarchical tagging system is not easy, for either engineering or social reasons.

I think these two beliefs, among many other reasons, may explain in part why the “Semantic Web” is still far from a reality. The first overestimates what “semantics” requires: triples are one way to express semantics, but whether they are _the_ way is an open question. The second underestimates “Web” scale: realizing a knowledge system on the Web, even one that is conceptually “simple”, can lead to serious scalability problems, both for machines (can you answer every query in under one second?) and for people (who have to change their way of thinking).

Here is what I believe about the “semantic web” (note the lack of capitalization). First, it is not necessarily “the Semantic Web” (just as there is no “the Mobile Web”) as defined by W3C standards or the layer cake model. Semantics is a way of organizing things; RDF and OWL are some ways to express it, but other ways should be encouraged too and sometimes work better. Second, tools and services should be “web-ish”, something like a semanticized version of youtube or gmail; after all, “web users” are rarely bioinformaticians, and few can master a Java-based ontology editor. Third, start deployment with very, very basic semantics like trees (yeah, I know some will protest) and sameAs, but do it in a very, very efficient way. If we can’t even come up with a Web-efficient tree reasoner, how realistic is it that we can come up with a Web-efficient RDF or OWL reasoner?

Now I’m prepared to dodge tomatoes :D

by Jie Bao

===============

My original post on LinkedIn (reorganized a bit)

Why few (or none?) Web 2.0 sites provide hierarchical tagging?

Gmail labels and delicious tags are flat, which is a constant source of trouble for me. I have to add many tags unnecessarily, even when they could easily be inferred. I didn’t find an alternative that allows me to organize my tags in a tree or network. Is there any technical or marketing reason?

People have been talking about the semantic web for a while and are looking for a killer app. It’s apparent that hierarchical tagging is semantic, is in high demand, and is relatively easy to do. Why is there none on popular sites?

PS 1: Let me clarify some situations in which hierarchical tagging would save me a lot of time: recently I have been reading a book by Qian Mu, a historian, and tagging my notes on delicious with the tag “qianmu”. I also want all those notes to be tagged with “history”, but I always have to add both “qianmu” and “history” by hand.

Sometimes I want more than one tag to be inferred. For example, when I add “wuxu” (the year 1898), I want the tags “qing”, “china” and “reform” to be added. You will find how troublesome it is to add all 4 tags together when you have about 10 notes on “wuxu”.

In another example, I want to share my tags in both Chinese and English. If I can define two subclass relations between two tags, each in a different language, I will not always have to add both tags.
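The inference described in these examples can be sketched in a few lines of Python. This is only an illustration, not anything delicious offers: the `BROADER` table and the `expand_tags` function are hypothetical names, and the tag relations are simply the ones from the examples above (the Chinese tag “历史” for “history” is an assumed example of a bilingual pair).

```python
from collections import deque

# Hypothetical "broader tag" relations, taken from the examples in the post.
# Each tag maps to the tags that should be added automatically with it.
BROADER = {
    "qianmu": {"history"},
    "wuxu": {"qing", "reform"},
    "qing": {"china"},
    # A bilingual pair: two subclass relations, one in each direction.
    "history": {"历史"},
    "历史": {"history"},
}


def expand_tags(tags, broader=BROADER):
    """Return the given tags plus every tag reachable through `broader`."""
    seen = set(tags)
    queue = deque(tags)
    while queue:
        tag = queue.popleft()
        for parent in broader.get(tag, ()):
            if parent not in seen:  # the visited set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(expand_tags({"wuxu"})))
```

With such a table, typing “wuxu” alone would yield all four tags, and typing a tag in either language would yield its counterpart. Tags that have no entry in the table are returned unchanged, so free tagging still works.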

I now have about 1000 tags on delicious, and I am in desperate need of a hierarchy. I’m willing to pay delicious $100 for such a service.

PS 2: Further clarification: I don’t believe I need a tagging system that always requires me to pick terms from a tree, DAG, or network. I can still add tags freely. But I need some way to clean up my tags from time to time and organize them. It is just like how I clean up my “download” folder: put the files into different folders, and if a folder gets too big, make some subfolders.

Posted at 20:26

Copyright of the postings is owned by the original blog authors. Contact us.