Since my company employs a large share of knowledge workers, it regularly holds internal conference days where in-house specialists give talks on their areas of expertise. The previous one was held in the middle of December, and several of the talks that day were about using tags to organize information. While this probably isn't a particularly cutting-edge topic any more, one talk by Filip Van Laenen stood out: it argued that one should leave hierarchical code repositories behind and instead use various forms of tagging to organize source-code files in a so-called tagarchy. While this is a novel and quite interesting topic in itself, what really caught my attention was a mention of how a combination of distinct 'hard' and 'soft' tags can be used to good effect in logically organizing files of program code. In the example, a set of 'hard' tags would describe generally unchanging technical aspects of the code in the file, such as the pattern types used or the services provided. A separate set of 'soft' tags would then describe how the code is used, for example whether it is needed by or contains login functionality, or whether it supports one or more particular areas of the business logic.
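To make the hard/soft distinction concrete, here is a minimal sketch of how I picture it, not how the talk implemented it. The `SourceFile` class, the `find_files` helper, the file names and the tags are all my own inventions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceFile:
    """A source file described by tags instead of a folder path."""
    name: str
    hard_tags: set[str] = field(default_factory=set)  # stable technical facts
    soft_tags: set[str] = field(default_factory=set)  # how the code is used


def find_files(files, hard=(), soft=()):
    """Return names of files carrying every requested hard and soft tag."""
    return [
        f.name
        for f in files
        if set(hard) <= f.hard_tags and set(soft) <= f.soft_tags
    ]


# Hypothetical repository contents, invented for this example.
FILES = [
    SourceFile("SessionFactory.java", {"factory", "service"}, {"login"}),
    SourceFile("InvoiceService.java", {"service"}, {"billing"}),
    SourceFile("LoginForm.java", {"view"}, {"login"}),
]

print(find_files(FILES, hard=["service"]))                  # all services
print(find_files(FILES, soft=["login"]))                    # everything login-related
print(find_files(FILES, hard=["service"], soft=["login"]))  # services used for login
```

The point is that the same file shows up under several "views" without having to live in any one folder, and that the two kinds of tags can be queried separately or together.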
The presentation rapidly convinced me of the potential usefulness of distinguishing between 'hard' and 'soft' tags for semi-structured data like program code, but I sensed that the concept could be put to even better use elsewhere. A rather obvious application would be to improve the currently popular approach of single-level folksonomy, or social tagging, as used on YouTube, Flickr and Del.icio.us amongst others. By separating the tags used to describe items on such services into multiple logical groups, one immediately gets an extra level of semantics for searching or filtering the otherwise unstructured data. This should make the tagging systems of such services a lot more powerful and useful than they currently are, especially by making items easier to find and search results more descriptive.
It is however apparent that the potential of tag-typing hinges on the strategy used to decide which logical tag-groups to include. A first impulse could be to continue with the successful crowdsourcing used in the original folksonomy tagging, and simply let the users themselves define the tag-groups. While tempting, I believe that this would neither alleviate the current trend of non-semantic tags nor provide any particular advantages, so in this case going towards the other extreme of semantic taxonomies appears to be more suitable. But while semantic taxonomies are generally considered to have clear advantages over folksonomy tagging, a major downside is that they are often overly complex and thus can be very demanding to work with, especially for amateurs. To alleviate this I instead propose using a professionally selected, limited set of tag-types, and combining these with folksonomy tagging within each type. This way one can get the best of both worlds by obtaining a modicum of semantic meaning from the tag-types, while at the same time keeping the freedom of independent crowd-sourced tagging as we already know it.
As for which tag-types to expect, I would suggest that images, for instance, should have separate tag-types to describe their actual contents, their context, any persons depicted, and perhaps their intended usage and any special techniques used to create them. With the addition of such tag-types the accuracy of an advanced search on Flickr or iStockPhoto would most certainly improve greatly, since a search term could then be matched against one specific kind of tag rather than against every tag on the image.
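As a rough sketch of what typed tagging for images could look like, assume a small fixed set of tag-types chosen by the service, with free-form user tags inside each type. Nothing here reflects how Flickr or iStockPhoto actually work; `TAG_TYPES`, `TypedTagIndex` and the image ids are hypothetical names made up for this example.

```python
from collections import defaultdict

# Hypothetical fixed set of tag-types, chosen by the service rather than by users.
TAG_TYPES = ("content", "context", "person", "usage", "technique")


class TypedTagIndex:
    """Free-form tags, each filed under one of a fixed set of tag-types."""

    def __init__(self):
        # Maps (tag_type, normalized tag) to the set of item ids carrying it.
        self._index = defaultdict(set)

    def tag(self, item_id, tag_type, tag):
        """Attach a user-chosen tag of the given type to an item."""
        if tag_type not in TAG_TYPES:
            raise ValueError(f"unknown tag-type: {tag_type!r}")
        self._index[(tag_type, tag.strip().lower())].add(item_id)

    def search(self, **criteria):
        """Return the sorted ids of items matching every tag_type=tag pair."""
        results = None
        for tag_type, tag in criteria.items():
            if tag_type not in TAG_TYPES:
                raise ValueError(f"unknown tag-type: {tag_type!r}")
            matches = self._index.get((tag_type, tag.strip().lower()), set())
            results = set(matches) if results is None else results & matches
        return sorted(results or [])


index = TypedTagIndex()
index.tag("img1", "context", "Paris")            # photo taken in Paris
index.tag("img1", "content", "tower")
index.tag("img2", "person", "Paris")             # portrait of someone named Paris
index.tag("img2", "technique", "long exposure")
index.tag("img3", "context", "paris")
index.tag("img3", "technique", "long exposure")

print(index.search(context="paris"))
print(index.search(person="paris"))
print(index.search(context="paris", technique="long exposure"))
```

With a flat folksonomy all three images would simply be tagged "paris". Here the tag-type tells the place apart from the person, while users are still free to pick whatever words they like within each type.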
The big open question, then, is whether this is actually a feasible technique, or whether there are reasons why it wouldn't work as I have proposed here. Please enlighten me if you have any thoughts or experiences about this, as I feel that a system such as this could be a suitable next step towards a more semantic web.