Today I learned from The Changelog news blog about Jeremy Keith’s proposal to add a relation between a project/document and its source code by using
rel="source" somewhere in the HTML markup. Sounds good, but it would only be an unspecific relation between a document or project or whatever and some unspecified type of source code container :) I really miss improved semantics here. And we already have options to describe it.
I got chatting to Aral about a markup pattern that’s become fairly prevalent since the rise of Github: linking to the source code for a website or project. You know, like when you see “fork me on Github” links. […] We were talking about how it would be nice to have some machine-readable way of explicitly marking up those kind of links, whether they’re in the head of the document, or visible in the body. Sounds like a job for the rel attribute, I thought. […] I’ve proposed rel=”source”.
The “fork me on Github” link has at least 3 different meanings:
A human can see the difference; a machine cannot. If we add code to the markup to make the relation machine-readable, a
rel="source" wouldn’t do the job. A machine really needs a very clear description of “X has a relation to Y”: it needs to know about X, Y and the relation. A machine needs a subject, a predicate and an object to understand the fact.
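To make this concrete, here is a minimal sketch contrasting a bare rel link with RDFa markup in which subject, predicate and object are all explicit. The project name and all URLs are invented for illustration; dcterms:source is a real Dublin Core property ("a related resource from which the described resource is derived"):

```html
<!-- A bare rel link: the subject ("X") stays implicit -->
<a rel="source" href="https://github.com/example/my-tool">Fork me on GitHub</a>

<!-- RDFa sketch: subject, predicate and object are explicit.
     Here the subject is the project page itself; the URLs are invented. -->
<div resource="https://example.org/projects/my-tool"
     prefix="dct: http://purl.org/dc/terms/">
  <a property="dct:source"
     href="https://github.com/example/my-tool">Fork me on GitHub</a>
</div>
```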
The brainstorming page about
rel="source" at the Microformats wiki describes the use case as:
When an author links to a project’s (or document’s) source code (e.g. on GitHub, Google Code, etc.) a rel value of “source” could be used to explicitly define that relationship.
It already describes the problem: is it the source code of the (software) project or is it a link to the source code of the document (the content of the webpage)? X (the subject) is not defined here; a simple
rel="source" does not imply entailment, there is no logical consequence. Machines would have to guess what the subject is.
While there is a discussion about the label on the brainstorming page, a machine only cares about the semantic meaning: “sfuzcfzcsz” could just as well be the name of the property that relates a software project to its download archive.
rel="source" doesn’t say anything about the subject or its type; it only describes that the related resource has something to do with source code.
You may be able to use
rel="source" on meta and anchor elements in HTML. But please try to use it on a PNG image, e.g. a computer-generated fractal, to describe a relation to the source code of the generating program. Good luck! There is no Microformats vocabulary to wrap it in. Which leads to the next problem:
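For illustration, a hedged RDFa sketch of how the subject could be made the image itself rather than the page. All file names and URLs are invented, and dcterms:source (strictly "derived from") only loosely fits this relation:

```html
<!-- RDFa sketch: the subject is the PNG itself, not the embedding page. -->
<div resource="/images/fractal.png" prefix="dct: http://purl.org/dc/terms/">
  <img src="/images/fractal.png" alt="A computer-generated fractal" />
  <p>Generated by a program whose
     <a property="dct:source"
        href="https://github.com/example/fractal-generator">source code
        is on GitHub</a>.</p>
</div>
```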
If you have a listing of software projects on one webpage, how can we set the correct relations? Using
rel="source" multiple times on various links? A machine would only see multiple source relations on one HTML document, because currently there is no proper vocabulary in MF1 and MF2 that you could use to describe the software project or web document; maybe h-item, but according to the spec “h-item is almost never used on its own.”
On the idea of a central “
rel use case repository”, Keith writes:
The benefit of having one centralised for this is that you can see if someone else has had the same idea as you. Then you can come to agreement on which value to use, so that everyone’s using the same vocabulary instead of just making stuff up.
Good advice :) There are alternative options from the RDF and Microdata communities, developed months or years ago:
The DOAP project aims to “create an XML/RDF vocabulary to describe software projects, and in particular open source projects.” The DOAP vocabulary provides properties like repository to describe relations to repositories, downloads, wikis, issue lists and a lot more. You can add DOAP to your HTML via RDFa and Microdata.
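A rough sketch of what that could look like with DOAP in RDFa; the project name and URLs are placeholders, while repository, download-page, wiki and bug-database are actual DOAP properties:

```html
<div vocab="http://usefulinc.com/ns/doap#" typeof="Project"
     resource="#example-tool">
  <h2 property="name">ExampleTool</h2>
  <ul>
    <li><a property="repository"
           href="https://github.com/example/example-tool">Source repository</a></li>
    <li><a property="download-page"
           href="https://example.org/download/">Downloads</a></li>
    <li><a property="wiki"
           href="https://example.org/wiki/">Wiki</a></li>
    <li><a property="bug-database"
           href="https://github.com/example/example-tool/issues">Issues</a></li>
  </ul>
</div>
```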
The schema.org vocabulary provides concepts like CreativeWork, SoftwareApplication and Code and properties like codeRepository and downloadUrl, also usable in your HTML markup via RDFa and Microdata syntax.
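And a comparable sketch using schema.org with Microdata syntax. Names and URLs are invented; note that codeRepository is a property of the Code type, while downloadUrl belongs to SoftwareApplication:

```html
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <span itemprop="name">ExampleTool</span>
  <a itemprop="downloadUrl"
     href="https://example.org/example-tool.zip">Download</a>
</div>
<div itemscope itemtype="http://schema.org/Code">
  <a itemprop="codeRepository"
     href="https://github.com/example/example-tool">Source code</a>
</div>
```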
Update (Feb 21): there is already a rel-vcs Microformat that is used and supported by various tools. But it is not listed in “the official registry for rel values.” Of course it shares most of the semantic problems of rel="source".
We don’t need to live in a religious Microformats-centric world with the Microformats wiki as our bible :) And
rel="source" would probably not be so awesome because:
Semantic enhancements and structured data on websites help to improve your SEO. Usually you use Microformats, Microdata and RDFa/Lite for that, but you (or your developers) have to learn the markup syntax and vocabularies first. Now Google has added the Data Highlighter to its Webmaster Tools: a point-and-click tool that helps you to “show Google patterns of structured data on your website without modifying your pages.”
Google loves structured data because it uses it to get more information about your content, and a better understanding of your content could improve your ranking in search engines. Google has pushed a lot of initiatives to invite website owners to create more structured data:
All these initiatives depend on webmasters and developers who are able to add special markup to their pages. The Data Highlighter now “improves” that process: you can use a point-and-click tool to describe your actual content, and no markup additions are necessary.
This makes things easier but could harm the web ecosystem: if you use the Data Highlighter to describe your content, only Google knows this description. It may help your site rank better in Google, but it puts you in a situation of strong dependency on Google. No other search engine provider like Bing, Yahoo or Yandex can use that structured information, and no independent tool can see those semantic enhancements. Second, if you change your site structure and markup, you need to run the Data Highlighter again on your website, even if it still contains the same data.
Test and use the Data Highlighter (be aware: it currently only works with events! Do not define your shop items as events :) ), use it to learn what Structured Data is and how it can help to improve your business, but please don’t rely only on this tool. Learn how to add structured data and semantic enhancements directly to your website (or just pay someone who knows how to do it); in the long term this will perform better.
This video is 4 years old, so it probably does not include the latest developments; TBL “only” explains the idea of the Semantic Web.
Somehow it is always nice to watch him speak. Sometimes it is hard to understand and follow him if you are not a native English speaker, but at least he finishes almost all his sentences this time :)
How can you see that the Semantic Web is not only a nerdy fantasy? Google is using and promoting it. The company has now added a Structured Data dashboard to its Webmaster Tools, under “Optimization”!
Google calls semantically annotated data Structured Data, and it has now added a Structured Data dashboard to its Webmaster Tools, helping site administrators see how annotated pages perform. Read: Google wants you to annotate your stuff! They need your help to understand your content better. You will find the Structured Data dashboard under the Optimization menu.
The dashboard provides a very good overview of how your annotations work and perform, seen through Google’s glasses.
The dashboard shows how the annotations perform; it doesn’t tell you why they perform that way.
Example: I use the dashboard on a site with only two pages, both annotated using the schema.org vocabulary and Microdata syntax. Until July 27th Google had both item types indexed; since July 31st Google has been missing one item type. I didn’t change anything in the annotations, and the dashboard gives no indication of the reason. Did Google just re-index the content, did the algorithm change, is there now a conflict with other Microformats markup on the page? I can only guess, adjust the markup and wait some days to see the result in the dashboard (the rich snippets testing tool parses the item type correctly).
Common information retrieval on text bodies and vector space models on text snippets are not enough anymore, because there is too much data out there. We need semantic annotation of that data to help machines understand it and create real information from it. Google, Yahoo and Bing know that; last year they created schema.org, providing a simple vocabulary to annotate your web documents, and the things they are about, using Microdata syntax.
The last two weeks were full of great links and resources; here you are:
Last week’s link roundup digest was planned for Sunday, but the post was lost in a space-time continuum. Now it’s here:
Now, if you wanna know where skateboarding, innovation, hacking and FLOSS meet, then check out Rodney Mullen at TEDxUSC on “How Context Shapes Content”:
I know, it is not really an infographic, but the term ‘research project poster’ is not that ‘en vogue’ on the web :) I made it last week as a last-minute job for the LOD2 lifecycle. Sebastian Tramp is currently at ESWC 2012 in Crete, attending the EU Project networking track to represent the LOD2 project.
The lifecycle is taken from the slides “The Semantic Data Web” by Sören Auer from AKSW/Uni Leipzig. The text was also written by Sören. As the explanations on the poster are quite small, I’m adding the texts here:
The lifecycle is supported by tools of the Debian-based LOD2 Stack.
RDF is the lingua franca for data integration on the Web. Other data structures, semi-structured and even unstructured information, however, are and always will be there as well. In LOD2 we develop techniques for mapping and accessing such information efficiently and effectively.
Tools: Triplify, D2R Server, DBpedia Extraction
RDF Data Management is still more challenging than relational Data Management. We aim to close this performance gap by employing column-store technology, dynamic query optimization, adaptive caching of joins, optimized graph processing, and cluster/cloud scalability.
Tools: Openlink Virtuoso
LOD2 facilitates the authoring of rich semantic knowledge bases, by leveraging Semantic Wiki technology, the WYSIWIM paradigm (What You See Is What You Mean) and distributed social, semantic collaboration and networking techniques.
Creating and maintaining links in a (semi-)automated fashion is still a major challenge and crucial for establishing coherence and facilitating data integration. We aim at linking approaches yielding high precision and recall, which configure themselves automatically or based on end-user feedback.
Tools: Silk, LIMES, SemFM
Linked Data on the Web is mainly raw instance data. For data integration, fusion, search and many other applications, however, we need this raw instance data to be linked and integrated with upper level ontologies.
The quality of data on the Data Web varies, just as quality varies on the document Web. LOD2 develops techniques which help to assess quality based on characteristics such as provenance, context, coverage or structure.
Tools: WIQA, LODStats, LDIF Data Integration
Data on the Web is dynamic. We need to facilitate the evolution of data while keeping things stable. Changes and modifications to knowledge bases, vocabularies and ontologies should be transparent and observable. LOD2 also develops methods to spot problems in knowledge bases and to automatically suggest repair strategies.
Tools: ORE, OntoWiki EvoPat
For many users the Data Web is still invisible below the surface. LOD2 develops search, browsing, exploration and visualization techniques for different kinds of Linked Data (i.e. spatial, temporal, statistical), which make the Data Web tangible for real users.
Tools: CubeViz, Sig.ma EE, Spatial Semantic Browser
LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. Started in 2010 and planned for 4 years, it comprises leading Linked Open Data technology researchers, companies, and service providers from across 11 European countries and one associated partner from Korea, and it is coordinated by the AKSW research group at the University of Leipzig.
In terms of links, it was a poor week. I’ve collected some, but it feels like only 2 of them deserve to be on my weekly top-notch link digest list.
In a coupled (monolithic) Content Management System (CMS), one software system (e.g. Drupal) technically manages everything: content creation and editing, data storage (backend), data synchronization and imports/exports, and delivering content to the user/reader. In a Decoupled CMS, different tasks can be done by different, independent systems: while one system supports authors in creating the content, another delivers it to the users; the data storage is done by a third system and is also decoupled from the system that is responsible for importing/exporting additional data. You have probably heard of a simple Decoupled CMS variant: static site generators. If you wanna dig deeper: Deane Barker wrote his “Decoupled Content Management 101”, Henri Bergius authored a WWW2012 paper called “Decoupling Content Management” (PDF, appr. 600kb) and he provides a short overview of Decoupled CMS in his blog. Some projects to create your Decoupled CMS tech stack are:
This really made my week, and it is a very good anecdote showing that the
<blink> tag really was just a crackpot idea :)
The bar was the St. James Infirmary and it had a 30 foot wonder woman statue inside among other interesting things. At some point in the evening I mentioned that it was sad that Lynx was not going to be able to display many of the HTML extensions that we were proposing, I also pointed out that the only text style that Lynx could exploit given its environment was blinking text. We had a pretty good laugh at the thought of blinking text, and talked about blinking this and that and how absurd the whole thing would be. […] Saturday morning rolled around and I headed into the office only to find what else but, blinking text. It was on the screen blinking in all its glory, and in the browser. How could this be, you might ask? (Louis J. Montulli II, “The Origins of the <blink> Tag”)