Semantic enhancements and structured data on websites help to improve your SEO. Usually you would use Microformats, Microdata or RDFa (Lite) for this, but you (or your developers) have to learn the markup syntax and the vocabularies first. Now Google has added the Data Highlighter to its Webmaster Tools: a point-and-click tool that helps you to “show Google patterns of structured data on your website without modifying your pages.”
Google loves structured data because it uses it to get more information about your content, and a better understanding of your content could improve your ranking in search engines. Google has pushed a lot of initiatives to encourage website owners to create more structured data:
All these initiatives depend on webmasters and developers who are able to add special markup to their pages. The Data Highlighter now “improves” that process: you use a point-and-click tool to describe your existing content, and no markup additions are necessary.
This makes things easier but could harm the web ecosystem, because if you use the Data Highlighter to describe your content, then only Google knows this description. It may help your site rank better in Google, but it makes you strongly dependent on Google. No other search engine provider like Bing, Yahoo or Yandex can use that structured information, and no other independent tool can see those semantic enhancements. Second, if you change your site structure and markup, you need to run the Data Highlighter on your website again, even if it still contains the same data.
Test and use the Data Highlighter (be aware: it currently only works with events! Do not define your shop items as events :) ), and use it to learn what Structured Data is and how it can help to improve your business, but please don’t rely on this tool alone. Learn how to add structured data and semantic enhancements directly to your website (or just pay someone who knows how to do it); in the long term this will pay off.
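If you want to try adding the markup yourself, here is a minimal Python sketch of what a schema.org Event annotated with Microdata can look like. The helper name `event_microdata`, the example values and the example.org URL are made up for illustration. The markup uses the schema.org Event type with its `name`, `url`, `startDate` and `location` properties, but check the current schema.org documentation before relying on it.

```python
from html import escape


def event_microdata(name: str, start_date: str, place: str, url: str) -> str:
    """Hypothetical helper: render a schema.org Event as Microdata-annotated HTML.

    start_date should be an ISO 8601 date/time string (e.g. "2012-12-24T19:00"),
    the format schema.org expects for startDate.
    """
    # escape() makes the supplied values safe inside text and attribute values.
    return "\n".join([
        '<div itemscope itemtype="http://schema.org/Event">',
        f'  <a itemprop="url" href="{escape(url)}">'
        f'<span itemprop="name">{escape(name)}</span></a>',
        f'  <time itemprop="startDate" datetime="{escape(start_date)}">'
        f'{escape(start_date)}</time>',
        # location expects a Place (or PostalAddress), so it is nested as its own item.
        '  <div itemprop="location" itemscope itemtype="http://schema.org/Place">',
        f'    <span itemprop="name">{escape(place)}</span>',
        "  </div>",
        "</div>",
    ])


if __name__ == "__main__":
    print(event_microdata("Linked Data Meetup", "2012-12-24T19:00",
                          "Leipzig", "http://example.org/meetup"))
```

Unlike a Data Highlighter configuration, this markup lives in the page itself, so any parser can read it, not only Google's.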
This video is 4 years old, so it probably does not cover the latest developments; TBL “only” explains the idea of the Semantic Web.
It is always nice to watch him speak. Sometimes it is hard to understand and follow him if you are not a native English speaker, but at least this time he finishes almost all of his sentences :)
How can you tell that the Semantic Web is not just a nerdy fantasy? Google is using and promoting it. The company has now added a Structured Data dashboard to its Webmaster Tools, under “Optimization”!
Google calls semantically annotated data Structured Data, and it has now added a Structured Data dashboard to its Webmaster Tools that helps site administrators monitor how their annotated pages perform. Read: Google wants you to annotate your stuff! They need your help to understand your content better. You will find the Structured Data dashboard under the Optimization menu.
The dashboard provides a very good overview of how your annotations work and perform, seen through Google’s glasses.
The dashboard shows how the annotations perform, but it doesn’t tell you why they perform that way.
Example: I use the dashboard on a site with only two pages, both annotated using the schema.org vocabulary and Microdata syntax. Until July 27th Google had both item types indexed; since July 31st Google has been missing one of them. I didn’t change anything in the annotations, and the dashboard gives no indication of the reason. Is Google just re-indexing the content? Did the algorithm change? Is there now a conflict with other Microformats markup on the page? I can only guess, adjust the markup and wait a few days to see the result in the dashboard (the Rich Snippets Testing Tool parses the item type correctly).
Classic information retrieval on text bodies, using vector spaces over text snippets, is no longer enough, because there is too much data out there. We need semantic annotations on that data to help machines understand it, use it and derive real information from it. Google, Yahoo and Bing know that: last year they created schema.org, which provides a simple vocabulary for annotating your web documents and the things they are about using Microdata syntax.
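To make the idea of machine-readable annotations a bit more concrete, here is a rough Python sketch (standard library only) that finds Microdata items in an HTML page and lists their item types. This is similar to the per-type counts the Structured Data dashboard shows. It is a deliberate simplification, not how Google or any other search engine parses pages: it ignores `itemprop` values, `itemref` and how items nest.

```python
from html.parser import HTMLParser


class ItemTypeCollector(HTMLParser):
    """Collects the itemtype of every element that starts a Microdata item."""

    def __init__(self) -> None:
        super().__init__()
        self.item_types: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # A Microdata item is an element carrying the (boolean) itemscope attribute;
        # its optional itemtype names the vocabulary class, e.g. http://schema.org/Event.
        if "itemscope" in attributes:
            self.item_types.append(attributes.get("itemtype") or "(untyped)")


def item_types(html: str) -> list[str]:
    """Return the item types found in an HTML string, in document order."""
    collector = ItemTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.item_types


if __name__ == "__main__":
    page = ('<div itemscope itemtype="http://schema.org/Person">'
            '<span itemprop="name">Tim</span></div>')
    print(item_types(page))
```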
The last two weeks were full of great links and resources, here you are:
Last week’s link roundup was planned for Sunday, but this post got lost in the space-time continuum. Now it’s here:
Now, if you wanna know where skateboarding, innovation, hacking and FLOSS meet then check out Rodney Mullen at TEDxUSC on “How Context Shapes Content”:
I know, it is not really an infographic, but the term ‘research project poster’ is not that ‘en vogue’ on the web :) I made it last week as a last-minute job: a poster about the LOD2 lifecycle. Sebastian Tramp is currently at ESWC 2012 in Crete, attending the EU Project Networking track to represent the LOD2 project.
The lifecycle is taken from the slides “The Semantic Data Web” by Sören Auer from AKSW/Uni Leipzig, and the text was also written by Sören. As the explanations on the poster are set quite small, I’ve added the texts here:
The lifecycle is supported by tools of the Debian-based LOD2 Stack.
RDF is the lingua franca for data integration on the Web. Other data structures, semi-structured and even unstructured information, however, are and always will be there as well. In LOD2 we develop techniques for mapping and accessing such information efficiently and effectively.
Tools: Triplify, D2R Server, DBpedia Extraction
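As a rough illustration of what mapping relational data to RDF means, here is a Python sketch that turns one table row into RDF 1.1 N-Triples. It follows the general idea behind tools like Triplify or D2R Server only loosely and does not reproduce their mapping languages. The IRI scheme under `http://example.org/` and the function name are hypothetical, and every value becomes a plain string literal (no datatypes).

```python
from urllib.parse import quote


def row_to_ntriples(table: str, key: str, row: dict,
                    base: str = "http://example.org/") -> list[str]:
    """Hypothetical mapping: one triple per non-key, non-NULL column of a row.

    Subjects look like <base/table/keyvalue>, predicates like
    <base/vocab/table#column>; both are illustrative choices.
    """
    subject = f"<{base}{table}/{quote(str(row[key]), safe='')}>"
    triples = []
    for column, value in row.items():
        if column == key or value is None:  # the key becomes the subject; NULLs yield no triple
            continue
        predicate = f"<{base}vocab/{table}#{quote(column, safe='')}>"
        # Escape backslashes first, then quotes and line breaks, as N-Triples literals require.
        literal = (str(value).replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    for triple in row_to_ntriples("person", "id", {"id": 7, "name": "Ada", "city": None}):
        print(triple)
```

Real mapping tools add typed literals, links between tables (foreign keys become IRIs rather than literals) and configurable vocabularies; the sketch only shows the basic row-to-triples step.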
RDF Data Management is still more challenging than relational Data Management. We aim to close this performance gap by employing column-store technology, dynamic query optimization, adaptive caching of joins, optimized graph processing, cluster/cloud scalability.
Tools: OpenLink Virtuoso
LOD2 facilitates the authoring of rich semantic knowledge bases by leveraging Semantic Wiki technology, the WYSIWIM paradigm (What You See Is What You Mean) and distributed social, semantic collaboration and networking techniques.
Creating and maintaining links in a (semi-)automated fashion is still a major challenge and crucial for establishing coherence and facilitating data integration. We aim for linking approaches that yield high precision and recall and that configure themselves automatically or based on end-user feedback.
Tools: Silk, LIMES, SemFM
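Link discovery boils down to comparing resources from two datasets and keeping the pairs that look alike. The following Python sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure together with a single threshold. It is a toy illustration of the idea, not how Silk or LIMES work internally: those tools use richer, configurable link specifications and techniques that avoid comparing every pair. The function name and the example IRIs are hypothetical.

```python
from difflib import SequenceMatcher


def discover_links(source: dict, target: dict, threshold: float = 0.85) -> list:
    """Hypothetical link finder: propose owl:sameAs candidates by label similarity.

    source and target map resource IRIs to labels. Every pair whose
    case-folded labels reach the threshold is returned with its score.
    """
    links = []
    # Brute force: compares every source resource with every target resource.
    for s_iri, s_label in source.items():
        for t_iri, t_label in target.items():
            score = SequenceMatcher(None, s_label.casefold(), t_label.casefold()).ratio()
            if score >= threshold:
                links.append((s_iri, t_iri, round(score, 3)))
    return links


if __name__ == "__main__":
    print(discover_links({"a:Leipzig": "Leipzig"},
                         {"b:Leipzig": "leipzig", "b:Berlin": "Berlin"}))
```

Raising the threshold tends to trade recall for precision, which is exactly the balance the text above says good linking approaches should get right.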
Linked Data on the Web is mainly raw instance data. For data integration, fusion, search and many other applications, however, we need this raw instance data to be linked and integrated with upper level ontologies.
The quality of data on the Data Web varies, just as the quality of documents on the document web does. LOD2 develops techniques that help to assess quality based on characteristics such as provenance, context, coverage or structure.
Tools: WIQA, LODStats, LDIF Data Integration
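Quality assessment often starts with simple dataset statistics. As a rough illustration of the kind of numbers a statistics tool like LODStats collects (its actual set of statistics is not reproduced here), this Python sketch counts triples, distinct subjects, property usage and the share of literal objects in an N-Triples string. The function name is hypothetical. The naive whitespace split works because, in valid N-Triples, subjects and predicates are IRIs or blank node labels without whitespace.

```python
from collections import Counter


def dataset_stats(ntriples: str) -> dict:
    """Hypothetical helper: a few basic statistics over an N-Triples document."""
    triples = 0
    literals = 0
    subjects = set()
    predicates = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        subject, predicate, obj = line.split(None, 2)  # obj still carries the final " ."
        triples += 1
        subjects.add(subject)
        predicates[predicate] += 1
        if obj.startswith('"'):  # literal objects start with a quote
            literals += 1
    return {
        "triples": triples,
        "distinct_subjects": len(subjects),
        "property_usage": dict(predicates),
        "literal_ratio": literals / triples if triples else 0.0,
    }


if __name__ == "__main__":
    print(dataset_stats('<s1> <p> "a" .\n<s1> <q> <o> .\n<s2> <p> "b" .'))
```

Numbers like these are indicators for coverage and structure, not a quality score; judging provenance or context needs additional information.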
Data on the Web is dynamic. We need to facilitate the evolution of data while keeping things stable. Changes and modifications to knowledge bases, vocabularies and ontologies should be transparent and observable. LOD2 also develops methods to spot problems in knowledge bases and to automatically suggest repair strategies.
Tools: ORE, OntoWiki EvoPat
For many users the Data Web is still invisible below the surface. LOD2 develops search, browsing, exploration and visualization techniques for different kinds of Linked Data (e.g. spatial, temporal, statistical), which make the Data Web perceptible to real users.
Tools: CubeViz, Sig.ma EE, Spatial Semantic Browser
LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. It started in 2010 and is planned to run for 4 years. It comprises leading Linked Open Data technology researchers, companies and service providers from 11 European countries plus one associated partner from Korea, and it is coordinated by the AKSW research group at the University of Leipzig.
In terms of links, it was a poor week. I’ve collected some links, but it feels like only 2 of them deserve a place on my weekly top-notch link digest.
In a coupled (monolithic) Content Management System (CMS), one software system (e.g. Drupal) technically manages everything: content creation and editing, data storage (backend), data synchronization and imports/exports, and delivering content to the user/reader. In a Decoupled CMS, different tasks can be done by different and independent systems: while one system helps authors create the content, another system delivers it to the users, a third system handles data storage, and that one is also decoupled from the system responsible for importing/exporting additional data. You have probably heard of a simple Decoupled CMS variant: static site generators. If you wanna dig deeper: Deane Barker wrote “Decoupled Content Management 101”, Henri Bergius authored a WWW2012 paper called “Decoupling Content Management” (PDF, approx. 600 KB) and he provides a short overview of Decoupled CMS on his blog. Some projects to create your Decoupled CMS tech stack are:
This really made my week, and it is a very good anecdote showing that the <blink> markup was really just a crackpot idea :)
The bar was the St. James Infirmary and it had a 30 foot wonder woman statue inside among other interesting things. At some point in the evening I mentioned that it was sad that Lynx was not going to be able to display many of the HTML extensions that we were proposing, I also pointed out that the only text style that Lynx could exploit given its environment was blinking text. We had a pretty good laugh at the thought of blinking text, and talked about blinking this and that and how absurd the whole thing would be. […] Saturday morning rolled around and I headed into the office only to find what else but, blinking text. It was on the screen blinking in all its glory, and in the browser. How could this be, you might ask? (Louis J. Montulli II, “The Origins of the <blink> Tag”)
I joined Geekli.st last week, just for fun. “Geekli.st is an achievement-based social portfolio builder where all bad-ass code monkeys around the globe can communicate, brag, build their street cred and get found. It was time for a united front, exclusively for developers, to build tangible credibility in the workplace.” I have an invite link to the beta program for you, just in case. Join (and maybe follow me) at http://geekli.st/haschek