rel="source" might not be so awesome! We already have alternative options.

Today I learned from The Changelog news blog about Jeremy Keith’s proposal to add a relation between a project/document and its source code by using rel="source" somewhere in the HTML markup. Sounds good, but it would only be a vague relation between a document, project or whatever and some unspecified kind of source code container :) I really miss improved semantics here. And we already have options to describe it.

Keith wrote:

I got chatting to Aral about a markup pattern that’s become fairly prevalent since the rise of Github: linking to the source code for a website or project. You know, like when you see “fork me on Github” links. […] We were talking about how it would be nice to have some machine-readable way of explicitly marking up those kind of links, whether they’re in the head of the document, or visible in the body. Sounds like a job for the rel attribute, I thought. […] I’ve proposed rel="source".

The “fork me on Github” link has at least 3 different meanings:

  1. the website of a software project contains a relation to the public source code repository of the software
  2. the link points to a repository where the content of the website is managed; this link is the same on all pages of the website
  3. the link points to the related source of the currently shown page deep in the repository

A human can see the difference; a machine cannot. If we add markup to make the relation machine-readable, a plain rel="source" won’t do the job. A machine needs a very clear statement of the form “X has a relation to Y”: it needs to know X, Y and the relation. In RDF terms, a machine needs a subject (X), a predicate (the relation) and an object (Y) to understand the fact.

Lack of the subject

The brainstorming page about rel="source" at the Microformats wiki describes the use case as:

When an author links to a project’s (or document’s) source code (e.g. on GitHub, Google Code, etc.) a rel value of “source” could be used to explicitly define that relationship.

This already shows the problem: is it the source code of the (software) project, or a link to the source code of the document (the content of the web page)? X (the subject) is not defined here. A plain rel="source" implies no entailment; nothing can be logically inferred from it. Machines would have to guess what the subject is.

Lack of the object and its type

While the label is being discussed on the brainstorming page, a machine only cares about the semantic meaning: “sfuzcfzcsz” could just as well be the name of the property that relates a software project to its download archive. rel="source" says nothing about the object or its type; it only states that the linked resource has something to do with source code.

  • Is it the source code in a zip/tar/xyz archive?
  • Is it a file containing uncompressed source code?
  • Is it a repository address you can checkout or clone from? Is it Git, Mercurial, SVN, …?
  • Is it a relation to the homepage of a code repository (e.g. most Github links)?
  • Is the link about a specific version, branch or tag of the source code?
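To make the missing information concrete, here is a minimal sketch that writes the facts as plain (subject, predicate, object) tuples in Python, without any RDF library. It assumes the DOAP namespace http://usefulinc.com/ns/doap# and its Project, repository, GitRepository, location and browse terms; the project and all URLs are hypothetical. The bare rel="source" statement can only say "this page relates to something about source code", while the DOAP version names the project, its repository and the repository type.

```python
# Facts as plain (subject, predicate, object) tuples; no RDF library needed.
# All example.org / github.com/example URLs are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOAP = "http://usefulinc.com/ns/doap#"

PAGE = "https://example.org/fractals/"             # the page that shows the link
PROJECT = "https://example.org/fractals/#project"  # the software project itself
REPO = "https://example.org/fractals/#repo"        # the repository, as a node of its own

# All a bare rel="source" on PAGE can express: the page relates to "some source".
rel_source = [
    (PAGE, "source", "https://github.com/example/fractals"),
]

# The same link described with DOAP terms: explicit subject, object and object type.
doap = [
    (PROJECT, RDF_TYPE, DOAP + "Project"),
    (PROJECT, DOAP + "homepage", PAGE),
    (PROJECT, DOAP + "repository", REPO),
    (REPO, RDF_TYPE, DOAP + "GitRepository"),                              # Git, not SVN or Mercurial
    (REPO, DOAP + "location", "https://github.com/example/fractals.git"),  # what you clone
    (REPO, DOAP + "browse", "https://github.com/example/fractals"),        # the repository's web view
]

def objects(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

repo = objects(doap, PROJECT, DOAP + "repository")[0]
print(objects(doap, repo, RDF_TYPE))           # ['http://usefulinc.com/ns/doap#GitRepository']
print(objects(doap, repo, DOAP + "location"))  # ['https://github.com/example/fractals.git']
print(objects(rel_source, PAGE, "source"))     # a URL, but of what, and for which subject?
```

The separate REPO node is a design choice: it lets the clone URL and the browsable web view be two distinct, typed statements instead of one ambiguous link.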

Where to put the relation?

You may be able to use rel="source" on link and a (anchor) elements in HTML. Now try to use it on a PNG image (e.g. a computer-generated fractal) to describe a relation to the source code of the program that generated it. Good luck! There is no Microformat vocabulary to wrap it in. This leads to the next problem:

What about listings?

If you have a listing of software projects on one web page, how can you set the correct relations? By using rel="source" multiple times on various links? A machine would only see multiple source relations on one HTML document, because currently there is no proper vocabulary in MF1 or MF2 that you can use to describe the software project or web document. Maybe h-item, but according to the spec “h-item is almost never used on its own.”
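A small sketch with Python's standard html.parser shows what a consumer that only understands rel values would extract from such a listing. Without a wrapping vocabulary, the only context available for each link is the page itself, so both projects collapse into two statements with the same subject. The listing markup and the base URL are hypothetical examples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class RelCollector(HTMLParser):
    """Collect (context URL, rel token, target URL) for every link with a rel attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # the page being parsed is the only link context
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link", "area"):  # a, link and area elements can carry rel
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        target = urljoin(self.base_url, href)
        # rel holds a space-separated list of tokens, compared case-insensitively
        for token in (attributes.get("rel") or "").lower().split():
            self.links.append((self.base_url, token, target))

# A hypothetical listing page with two projects.
listing = """
<ul>
  <li>Fractals <a rel="source" href="https://github.com/example/fractals">fork me</a></li>
  <li>Mandelbrot <a rel="source" href="https://github.com/example/mandelbrot">fork me</a></li>
</ul>
"""

parser = RelCollector("https://example.org/projects/")
parser.feed(listing)
for statement in parser.links:
    print(statement)
```

Both printed tuples start with https://example.org/projects/: nothing in the markup tells the machine that the first link belongs to "Fractals" and the second to "Mandelbrot".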

Already existing and working alternatives

On the idea of a “rel usecase repository”, Keith writes:

The benefit of having one centralised for this is that you can see if someone else has had the same idea as you. Then you can come to agreement on which value to use, so that everyone’s using the same vocabulary instead of just making stuff up.

Good advice :) There are alternative options, developed months or years ago by the RDF and Microdata communities:

  • The DOAP project aims to “create an XML/RDF vocabulary to describe software projects, and in particular open source projects.” The DOAP vocabulary provides properties like repository and more to describe relations to repositories, downloads, wikis, issue trackers and a lot more. You can add DOAP to your HTML via RDFa or Microdata.

  • The schema.org vocabulary provides concepts like CreativeWork, SoftwareApplication and Code, and properties like codeRepository and downloadUrl, which are also usable in your HTML markup via RDFa and Microdata syntax.

  • Update (Feb 21): there is already a rel-vcs Microformat that is used and supported by various tools. But it is not listed in “the official registry for rel values.” Of course, it shares most of the semantic problems of rel-source.
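As a sketch of how a vocabulary such as schema.org handles the listing problem, the snippet below builds a description as JSON-LD (the same vocabulary can be written as Microdata or RDFa markup instead). Each project becomes its own node with its own identifier and codeRepository, so two projects on one page no longer collapse into one subject. I'm assuming the type SoftwareSourceCode, which carries codeRepository in the current schema.org vocabulary; the project names, URLs and the #project-N fragment identifiers are hypothetical.

```python
import json

# Hypothetical projects shown on one listing page.
projects = [
    {"name": "Fractals", "repo": "https://github.com/example/fractals"},
    {"name": "Mandelbrot", "repo": "https://github.com/example/mandelbrot"},
]

def to_jsonld(page_url, items):
    """Describe each listed project as its own schema.org node in one JSON-LD graph."""
    graph = []
    for i, item in enumerate(items, start=1):
        graph.append({
            "@id": f"{page_url}#project-{i}",  # explicit, distinct subject per project
            "@type": "SoftwareSourceCode",
            "name": item["name"],
            "codeRepository": item["repo"],
        })
    return {"@context": "https://schema.org", "@graph": graph}

doc = to_jsonld("https://example.org/projects/", projects)
print(json.dumps(doc, indent=2))
```

The resulting JSON could be embedded in the page inside a script element of type application/ld+json.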

We don’t need to live in a religious Microformats-centric world with the Microformats wiki as our bible :) And rel="source" would probably not be so awesome because:

  • it does not add enough semantics for machines to understand the correct fact
  • there are already alternatives that work

SEO improvements with Google Data Highlighter?

Semantic enhancements and structured data on websites help to improve your SEO. Usually you use Microformats, Microdata or RDFa (Lite) for it, but you (or your developers) have to learn the markup syntax and vocabularies first. Now Google has added the Data Highlighter to its Webmaster Tools: a point-and-click tool that helps you to “show Google patterns of structured data on your website without modifying your pages.”

Google loves structured data because it gives them more information about your content, and a better understanding of your content could improve your ranking in search engines. Google pushed a lot of initiatives to invite website owners to create more structured data:

All these initiatives depend on webmasters and developers who are able to add special markup to their pages. The Data Highlighter now “improves” that process: you use a point-and-click tool to describe your actual content, and no markup changes are necessary.

This makes things easier but could harm the web ecosystem: if you use the Data Highlighter to describe your content, only Google knows this description. It may help your site rank better in Google, but it makes you strongly dependent on Google. No other search engine provider like Bing, Yahoo or Yandex can use that structured information, and no other independent tool can see those semantic enhancements. Second, if you change your site structure and markup, you need to run the Data Highlighter on your website again, even if it still contains the same data.

Test and use the Data Highlighter (be aware: it currently only works with events! Do not define your shop items as events :) ), and use it to learn what structured data is and how it can help your business, but please don’t rely on this tool alone. Learn how to add structured data and semantic enhancements directly to your website (or just pay someone who knows how to do it); in the long term this will perform better.

(Source: googlewebmastercentral.blogspot.de)

The Semantic Web of Data: Tim Berners-Lee

This video is 4 years old, so it probably does not cover the latest developments; TBL “only” explains the idea of the Semantic Web.

It is always nice to watch him speak. Sometimes he is hard to understand and follow if you are not a native English speaker, but at least he finishes almost all of his sentences this time :)

Google added Structured Data Dashboard to Webmaster Tools

How can you tell that the Semantic Web is not just a nerdy fantasy? Google is using and promoting it. The company has now added a Structured Data dashboard to its Webmaster Tools, under “Optimization”!

What is it?

Google calls semantically annotated data “Structured Data”, and it has now added a dashboard for it to Webmaster Tools, helping site administrators check how their annotated pages perform. Read: Google wants you to annotate your stuff! They need your help to understand your content better. You will find the Structured Data dashboard under the Optimization menu.

Screenshot: Structured Data dashboard, site-level view, image by Webmaster Central Blog

Dashboard Features:

  • Site-level view: shows root item type and vocabulary schema of the annotated data from your site indexed by Google.
  • Itemtype-level view: shows pages and specific attributes for indexed item types.
  • Page-level view: details page showing all attributes of every item type on the given page, it contains a link to test the given page using the Rich Snippet testing tool.

The dashboard provides a very good overview how your annotations work and perform, seen through Google’s glasses.

Limitations

The dashboard shows how the annotations perform; it doesn’t tell you why they perform that way.

Example: I use the dashboard on a site with only two pages, both annotated with the schema.org vocabulary and Microdata syntax. Until July 27th Google had both item types indexed; since July 31st one item type is missing. I didn’t change anything in the annotations. The dashboard gives no indication of the reason. Is Google just re-indexing the content, did the algorithm change, or is there now a conflict with other Microformats markup on the page? I can only guess, adjust the markup and wait a few days to see the result in the dashboard (the Rich Snippets Testing Tool parses the item type correctly).

Why did they add the dashboard?

Classic information retrieval on text bodies, using vector space models on text snippets, is not enough anymore, because there is too much data out there. We need semantic annotations on that data to help machines understand it, use it and derive real information from it. Google, Yahoo and Bing know that: last year they created schema.org, which provides a simple vocabulary for annotating your web documents and the things they are about using Microdata syntax.

Top 15 from the 28th+29th week: Typography, Web Development, Tools, Semantic Web & Data

Last two weeks were full of great links and resources, here you are:

Typography

  • Linux Libertine is a community-driven open font project that provides alternative font families and styles to fonts like Times New Roman. Linux Libertine and its tools are released under the GPL/OFL licenses. The fonts cover the code pages of Western Latin, Greek, Cyrillic, Hebrew, IPA and many more. Furthermore, typographical features such as ligatures, small capitals, different number styles, scientific symbols, etc. are implemented in this font. Linux Libertine thus contains more than 2000 characters.

Semantic Web & Data

  • The ESWC presents the latest scientific results and technology innovations around semantic technologies. Some of the ESWC 2012 video lectures and keynotes are available, covering Social Web, Linked Data, Machine Learning, Semantic Web in Use, Natural Language Processing and Information Retrieval.
  • The Data Hub is a community-run catalogue of useful sets of data on the Internet. You can collect links here to data from around the web for yourself and others to use, or search for data that others have collected. Depending on the type of data (and its conditions of use), the Data Hub may also be able to store a copy of the data or host it in a database, and provide some basic visualisation tools.

Social Web

Web Development

  • Jam is a package manager for JavaScript. It manages library dependencies and it supports automatically optimized custom builds of popular libraries.
  • Yeoman is a robust and opinionated client-side stack, comprising tools and frameworks that help you create web applications. It helps with scaffolding, compiling CoffeeScript & Compass, linting your scripts, image optimization, build processing, JS package management and unit testing. It hasn’t been released yet, but you can enter your email to be notified when it becomes available. Paul Irish spoke about it at Google I/O 2012.
  • On YouTube you can find all videos of the talks done at the EuroPython 2012 conference in Italy.
  • Motorola published Montage, a framework for building modern HTML5 web apps by providing modular components, real-time two-way data binding, CommonJS dependency management, and more.

Other Tools

  • If you need to convert files from one markup format into another, pandoc is your Swiss Army knife. It can convert documents in Markdown, reStructuredText, Textile, HTML, DocBook or LaTeX to HTML formats, word processor formats (docx, odt), EPUB, DocBook, TeX formats, PDF and lightweight markup formats.
  • The goal of pdf.js is to create a general-purpose, web standards-based platform for parsing and rendering Portable Document Format (PDF) without native code assistance. It is community-driven and supported by Mozilla Labs.
  • Some of you have read about Svbtle. There is an unofficial WordPress theme that recreates the Svbtle feeling for writing in the admin area and for viewing your blog. The theme lacks the responsiveness of the original Svbtle layout, but as it is open source, you can add that yourself and contribute it back to the project owner.
  • Adobe published Brackets, an open source code editor for the web, written in JavaScript, HTML and CSS.

(Source: delicious.com)

JSON-LD: RDF serialization for JavaScript

How do you work with Linked Data and RDF in JavaScript? Use JSON, the well-known object notation. The W3C JSON for Linking Data Community Group published a batch of new JSON-LD drafts and issued calls for final specification commitments.

  • JSON-LD Syntax 1.0: JSON-LD is designed as a lightweight syntax that can be used to express Linked Data. It is primarily intended to be a way to use Linked Data in Javascript and other Web-based programming environments. The syntax does not necessarily require applications to change their JSON, but allows one to easily add meaning by simply adding or referencing a context. (Call for Final Specification Commitments)
  • JSON-LD API 1.0: JSON-LD may be used to express Linked Data in JSON. Often, it is useful to be able to transform JSON-LD documents so that they may be easily processed in a programming environment like JavaScript, Python or Ruby. Compaction, expansion and RDF conversion are discussed in this document. (http://www.w3.org/community/json-ld/2012/06/27/call-for-final-specification-commitments-for-json-ld-api-1-0/)
  • JSON-LD Framing API 1.0: a JSON-LD document is a representation of a directed graph. A Frame can be used by a developer on a JSON-LD document to specify a deterministic layout for a graph. This document is a detailed specification for a serialization of Linked Data in JSON.
  • RDF Graph Normalization: this document outlines an algorithm for generating a normalized RDF graph given an input RDF graph. Besides software developers, the document is primarily intended for masochists :)
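The core idea of the syntax draft, adding meaning to existing JSON through a context, can be illustrated with a heavily simplified expansion step. This sketch only handles a flat object whose context maps plain terms directly to IRI strings; it is not the JSON-LD API's expansion algorithm, which also handles nesting, type coercion, @language, compact IRIs, @vocab and much more. The person data and its identifier are hypothetical; the FOAF property IRIs are real.

```python
import json

def expand_terms(doc):
    """Map the terms of a flat JSON-LD object to full IRIs (heavily simplified)."""
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key.startswith("@"):              # keywords such as @id are kept as they are
            expanded[key] = value
        elif key in context:                 # term defined in the context -> full IRI
            expanded[context[key]] = [{"@value": value}]
        # keys without a mapping are dropped, as JSON-LD expansion does
    return expanded

# Existing JSON stays as it is; meaning is added by attaching a context.
person = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "nick": "http://xmlns.com/foaf/0.1/nick",
    },
    "@id": "https://example.org/people/jane",
    "name": "Jane Doe",
    "nick": "jd",
    "internalFlag": True,  # not in the context, so it carries no Linked Data meaning
}

print(json.dumps(expand_terms(person), indent=2))
```

Compaction is roughly the inverse step: it takes full IRIs and a context and produces short, developer-friendly terms again.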

(Source: twitter.com)

Top 7 from the 23rd week: RDFa, Microdata, PDF obfuscation & skateboarding

Last week’s link roundup was planned for Sunday, but this post got lost in the space-time continuum. Now it’s here:

Semantic Web

  • RDF 1.1 Concepts and Abstract Syntax was published as a W3C Working Draft. It defines the RDF data model, introduces new datatypes for HTML fragments and language-tagged strings, and reworks the XML datatype.
  • Schema.org was launched one year ago. Dan Brickley wrote a nice roundup, “SemTech, RDFa, Microdata and more”, of what has happened since then, and gives an outlook on future developments like schema.org 1.0.
  • RDFa.info added an RDFa playground editor, a helpful tool if you wanna test your RDFa markup, or to learn RDFa. Especially the graphical view could help a lot.

Developers zone

  • In “OMG-WTF-PDF”, Julia Wolf talked about PDF obfuscation and critical backdoors. It’s from 2010 but still interesting and important for your security.
  • gmaps.js allows you to use the potential of Google Maps in a simple way. No more extensive documentation or large amount of code.
  • Anchor CMS is “built for art-directed posts.” Basically it is a very simple blog system that handles posts and pages, using a WordPress-like API but without all the ballast. Funny that it is licensed under the WTFPL.

Now, if you wanna know where skateboarding, innovation, hacking and FLOSS meet then check out Rodney Mullen at TEDxUSC on “How Context Shapes Content”:

(Source: delicious.com)

Infographic: Linked Open Data Lifecycle

Linked Open Data Lifecycle

I know, it is not really an infographic, but the term ‘research project poster’ is not that en vogue on the web :) I made it last week as a last-minute job, illustrating the LOD2 lifecycle. Sebastian Tramp is currently at ESWC 2012 in Crete, attending the EU Project Networking track to represent the LOD2 project.

The lifecycle is taken from the slides “The Semantic Data Web” by Sören Auer from AKSW/Uni Leipzig, who also wrote the text. As the explanations on the poster are rather small, I’m adding the texts here:

Linked Open Data Lifecycle

The lifecycle is supported by tools of the Debian-based LOD2 Stack.

Extraction

RDF is the lingua franca for data integration on the Web. Other data structures, semi-structured and even unstructured information, however, are and always will be there as well. In LOD2 we develop techniques for mapping and accessing such information efficiently and effectively.

Tools: Triplify, D2R Server, DBpedia Extraction
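Tools like Triplify and D2R Server map relational data to RDF using their own configurable mappings. The sketch below only illustrates the general idea with Python's built-in sqlite3: a table becomes a class, each row a resource with its own URI, and each non-NULL column a property. The table, its rows, the base URI and the column-to-FOAF mapping are all hypothetical choices for this example.

```python
import sqlite3

BASE = "https://example.org/db/"  # hypothetical base URI for generated resources
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical relational source data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, homepage TEXT)")
conn.executemany(
    "INSERT INTO person (id, name, homepage) VALUES (?, ?, ?)",
    [(1, "Alice", "https://alice.example.org/"), (2, "Bob", None)],
)

def literal(text):
    """Quote a string as an N-Triples literal (escapes backslashes and quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def row_to_ntriples(pid, name, homepage):
    """Map one row to N-Triples lines: table -> class, columns -> FOAF properties."""
    subject = f"<{BASE}person/{pid}>"
    lines = [f"{subject} <{RDF_TYPE}> <{FOAF}Person> ."]
    if name is not None:  # NULL columns produce no triple
        lines.append(f"{subject} <{FOAF}name> {literal(name)} .")
    if homepage is not None:
        lines.append(f"{subject} <{FOAF}homepage> <{homepage}> .")
    return lines

triples = []
for row in conn.execute("SELECT id, name, homepage FROM person ORDER BY id"):
    triples.extend(row_to_ntriples(*row))
print("\n".join(triples))
```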

Storage

RDF data management is still more challenging than relational data management. We aim to close this performance gap by employing column-store technology, dynamic query optimization, adaptive caching of joins, optimized graph processing and cluster/cloud scalability.

Tools: Openlink Virtuoso

Authoring

LOD2 facilitates the authoring of rich semantic knowledge bases, by leveraging Semantic Wiki technology, the WYSIWIM paradigm (What You See Is What You Mean) and distributed social, semantic collaboration and networking techniques.

Tools: OntoWiki, RDFaCE Text Annotation, Poolparty Taxonomy Editor

Interlinking

Creating and maintaining links in a (semi-)automated fashion is still a major challenge and crucial for establishing coherence and facilitating data integration. We aim at linking approaches yielding high precision and recall, which configure themselves automatically or based on end-user feedback.

Tools: Silk, LIMES, SemFM
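Silk and LIMES come with their own configurable similarity metrics and link specifications. The sketch below only illustrates the general idea with a swapped-in metric (the ratio from Python's difflib), a hypothetical threshold of 0.8 and made-up resources: link each source resource to its most similar target label if it is similar enough, then measure precision and recall against a known set of correct links.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link(source, target, threshold):
    """Link each source resource to its most similar target if the score passes the threshold."""
    links = set()
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.add((s_uri, best_uri))  # read as: s_uri owl:sameAs best_uri
    return links

def precision_recall(found, gold):
    """Precision: share of found links that are correct; recall: share of correct links found."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical datasets; "Dresdn" is a typo variant, "Köln" the German name of Cologne.
source = {"a:Leipzig": "Leipzig", "a:Dresden": "Dresden", "a:Cologne": "Cologne", "a:Bonn": "Bonn"}
target = {"b:leipzig": "Leipzig", "b:dresden": "Dresdn", "b:koeln": "Köln", "b:born": "Born"}
gold = {("a:Leipzig", "b:leipzig"), ("a:Dresden", "b:dresden"), ("a:Cologne", "b:koeln")}

found = link(source, target, threshold=0.8)
p, r = precision_recall(found, gold)
print(sorted(found))
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

Plain string similarity cannot match "Cologne" to "Köln", which costs recall, while the threshold keeps "Bonn" from being linked to "Born" (score 0.75). Lowering the threshold trades precision for recall; this is the kind of tuning that self-configuring approaches aim to automate.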

Enrichment

Linked Data on the Web is mainly raw instance data. For data integration, fusion, search and many other applications, however, we need this raw instance data to be linked and integrated with upper level ontologies.

Tools: DL-Learner

Quality

The quality of data on the Data Web varies, just as the quality of documents on the Web does. LOD2 develops techniques that help to assess quality based on characteristics such as provenance, context, coverage or structure.

Tools: WIQA, LODStats, LDIF Data Integration
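LODStats computes statistical indicators over RDF datasets. Without claiming to reproduce its actual set of statistics, here is a tiny sketch of the kind of indicators that can feed a quality assessment: dataset size, duplicate triples, property usage and a simple completeness check (which subjects lack an rdfs:label). The triples and the ex: prefix are hypothetical, and compact names stand in for full IRIs to keep it short.

```python
from collections import Counter

RDFS_LABEL = "rdfs:label"  # compact names instead of full IRIs, for brevity

def dataset_stats(triples):
    """A few simple indicators over a list of (subject, predicate, object) triples."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return {
        "triples": len(triples),
        "duplicates": len(triples) - len(set(triples)),
        "subjects": len(subjects),
        "property_usage": Counter(p for _, p, _ in triples),
        "unlabelled_subjects": sorted(subjects - labelled),
        "label_coverage": len(labelled) / len(subjects) if subjects else 0.0,
    }

# Hypothetical dataset.
triples = [
    ("ex:leipzig", "rdf:type", "ex:City"),
    ("ex:leipzig", "rdfs:label", "Leipzig"),
    ("ex:leipzig", "ex:partOf", "ex:saxony"),
    ("ex:dresden", "rdf:type", "ex:City"),
    ("ex:dresden", "ex:partOf", "ex:saxony"),
    ("ex:saxony", "rdfs:label", "Saxony"),
]

for key, value in dataset_stats(triples).items():
    print(key, value)
```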

Evolution

Data on the Web is dynamic. We need to facilitate the evolution of data while keeping things stable. Changes and modifications to knowledge bases, vocabularies and ontologies should be transparent and observable. LOD2 also develops methods to spot problems in knowledge bases and to automatically suggest repair strategies.

Tools: ORE, OntoWiki EvoPat
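One basic building block for making changes transparent and observable is computing which triples were added and removed between two versions of a knowledge base. This is only an illustration, not how OntoWiki or EvoPat record changes: it treats each version as a plain set of triples (hypothetical data) and ignores complications such as blank nodes.

```python
def graph_diff(old, new):
    """Return (added, removed) triples between two versions of a knowledge base."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)

# Two hypothetical versions of a small knowledge base.
v1 = [
    ("ex:leipzig", "rdfs:label", "Leipzig"),
    ("ex:leipzig", "ex:partOf", "ex:saxony"),
    ("ex:dresden", "rdf:type", "ex:Village"),
]
v2 = [
    ("ex:leipzig", "rdfs:label", "Leipzig"),
    ("ex:leipzig", "ex:partOf", "ex:saxony"),
    ("ex:dresden", "rdf:type", "ex:City"),
]

added, removed = graph_diff(v1, v2)
for triple in removed:
    print("-", *triple)
for triple in added:
    print("+", *triple)
```

Storing such changesets alongside each version is what makes an evolution history reviewable, and it is also the input that pattern-based repair strategies can work on.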

Exploration

For many users the Data Web is still invisible below the surface. LOD2 develops search, browsing, exploration and visualization techniques for different kinds of Linked Data (e.g. spatial, temporal, statistical), which make the Data Web tangible for real users.

Tools: CubeViz, Sig.ma EE, Spatial Semantic Browser

LOD2 Project

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. Started in 2010 and planned for four years, it comprises leading Linked Open Data technology researchers, companies and service providers from 11 European countries plus one associated partner from Korea, and it is coordinated by the AKSW research group at the University of Leipzig.

Top 2 from the 19th week: Debian administration & RDFa

In terms of links, it was a poor week. I collected some, but only two of them feel like they deserve a place on my weekly top-notch link digest.

  • The Debian Administrator’s Handbook is written by Debian developers Raphaël Hertzog and Roland Mas. It started as a translation of their French book “Cahier de l’admin Debian” (Eyrolles). Traditional publishers did not want to take the risk of publishing this translation, so the authors did it themselves, backed by a successful crowdfunding campaign. Now “The Debian Administrator’s Handbook” is finished, accessible online for free, and available as paperback and ebook through Lulu.
  • RDFa.info is a new starting point if you want to enrich your website content easily with semantic annotations about the things on your website. RDFa is an extension to HTML5 and HTML-like languages (e.g. XML, HTML4, SVG) that helps you markup things like People, Places, Events, Recipes and Reviews. Search Engines and Web Services use this markup to generate better search listings and give you better visibility on the Web, so that people can find your website more easily.

(Source: delicious.com)

6 from the 16th week: Decoupled CMS & The <blink> History

Decoupled CMS

In a coupled (monolithic) Content Management System (CMS), one software system (e.g. Drupal) technically manages everything: content creation and editing, data storage (backend), data synchronization and imports/exports, and delivering content to the user/reader. In a Decoupled CMS, these tasks can be handled by different and independent systems: one system supports authors in creating the content, another delivers it to the users, and a third handles data storage, decoupled as well from the system responsible for importing/exporting additional data. You have probably heard of a simple Decoupled CMS variant: static site generators. If you wanna dig deeper: Deane Barker wrote “Decoupled Content Management 101”, Henri Bergius authored a WWW2012 paper called “Decoupling Content Management” (PDF, approx. 600 KB), and he provides a short overview of Decoupled CMS in his blog. Some projects to build your own Decoupled CMS tech stack are:

  • Hallo.js: Hallo is the simplest web editor imaginable. Instead of cluttered forms or toolbars, you edit your web content as it is. Just you, your web design, and your content.
  • Create.js: a comprehensive web editing interface for Content Management Systems. It is designed to provide a modern, fully browser-based HTML5 environment for managing content. All content that you are allowed to change (just by annotating it via RDFa) becomes editable, right there on the page you’re reading. Any modifications you make are retained in your browser and can be sent back to the CMS with a push of a button. Create can be adapted to work on almost any content management backend.
  • Content Repository for PHP: PHPCR combines the best of document-oriented databases (weakly structured data) and XML databases (hierarchical trees). On top of that, it adds useful features like searching, versioning, access control and locking. The API defines how to handle hierarchical semi-structured data in a consistent way. It is an adaptation of the Java Content Repository (JCR) standard, an open API specification defined in JSR-283.
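To illustrate the separation of roles, here is a minimal Python sketch with three independent parts: a storage interface, an authoring function that only writes to it, and a tiny static site generator that only reads from it. All names (Page, ContentStore, InMemoryStore, edit, generate_site) are hypothetical; none of them is the API of Hallo.js, Create.js or PHPCR.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, Iterable, Protocol

@dataclass
class Page:
    slug: str
    title: str
    body: str

class ContentStore(Protocol):
    """Storage role: the only contract the other components share."""
    def save(self, page: Page) -> None: ...
    def all_pages(self) -> Iterable[Page]: ...

class InMemoryStore:
    """Hypothetical stand-in for a real content repository or database."""
    def __init__(self) -> None:
        self._pages: Dict[str, Page] = {}

    def save(self, page: Page) -> None:
        self._pages[page.slug] = page

    def all_pages(self) -> Iterable[Page]:
        return list(self._pages.values())

def edit(store: ContentStore, slug: str, title: str, body: str) -> None:
    """Authoring role: writes content and knows nothing about delivery."""
    store.save(Page(slug, title, body))

def generate_site(store: ContentStore) -> Dict[str, str]:
    """Delivery role: a tiny static site generator returning {path: html}."""
    return {
        f"/{p.slug}.html": f"<h1>{escape(p.title)}</h1>\n<p>{escape(p.body)}</p>"
        for p in store.all_pages()
    }

store = InMemoryStore()
edit(store, "hello", "Hello <World>", "First post.")
site = generate_site(store)
print(site["/hello.html"])
```

Because the editor and the generator depend only on the ContentStore interface, either one (or the storage itself) could be replaced independently, which is the point of decoupling.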

The sweet extra for this week: The history of the <blink> markup tag

This really made my week: a very good anecdote showing that the <blink> tag was really just a crackpot idea :)

The bar was the St. James Infirmary and it had a 30 foot wonder woman statue inside among other interesting things. At some point in the evening I mentioned that it was sad that Lynx was not going to be able to display many of the HTML extensions that we were proposing, I also pointed out that the only text style that Lynx could exploit given its environment was blinking text. We had a pretty good laugh at the thought of blinking text, and talked about blinking this and that and how absurd the whole thing would be. […] Saturday morning rolled around and I headed into the office only to find what else but, blinking text. It was on the screen blinking in all its glory, and in the browser. How could this be, you might ask? (Louis J. Montulli II, “The Origins of the <Blink> Tag”)

(Source: delicious.com)