Folksonomies

In General on 3/6/2005 at 11:10 pm

Ok, granted, tags are great when we’re describing audio, images and more but I can tell you what the problem with folksonomies is when it comes to text. We’ve already folksonomied it by writing about it. Is that understood? Good. Now can we move on?

  1. Hold on! Are you using categories for your blog posts? Folksonomy is just letting users make their own categories for your posts as they store them. If you are putting your bookmarks in folders, you are engaging in the exact same practice (perhaps just not sharing).

    I want full disclosure. Give us the link to your jots bookmarks. I hope I don’t see any tags :).

  2. Actually I don’t use categories here :o) and am tag free here: http://jots.com/users/jamesfarmer

    Get your point though, I need to see beyond tags!

  3. Well, on another note, if you are not using categories how are you getting your posts to route into your major rubrics at the top? Maybe those are separate blogs, and you are using multiblogging to get them to appear on your home page.

    As I incorporate aspects of your design into my site, one of my main activities has been to come up with categories. I’m then going to create my nav around those (planned to happen this weekend).

    The advantage of the categories approach is that it is easier to do post-hoc. I don’t really have to manually reorganize beyond labeling and setting up a few templates.

    The separate blog approach to me is just a more hardcoded form of category.

  4. Whoops, wrong blog… you’re quite right that on Blogsavvy I do use categories (guilty as charged :o) but on this one, incorporated subversion, I just dump it all in a big pile.

    Definitely try to do categories!

    What I particularly like about them with Blogsavvy is that the majority of the posts don’t go into cats, but the ones which I think people will like as browsers do.

    Still no tags though ;D

  5. I agree with you. The content is the metadata. With even a half-brained extractor, you can extract dynamic categorization from printed text (video, audio require more work). There’s no need for tags. I’ve been demonstrating this for years at Edu_RSS Topics - do you think anyone tags all these entries? Nope - just some intelligently written regex and the work is done.
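The regex approach mentioned in comment 5 might look roughly like the sketch below. This is only an illustration: the thread does not show the actual Edu_RSS rules, so the topic names, the patterns, and the `topics_for` function are all assumptions.

```python
import re

# Hypothetical topic patterns; the real Edu_RSS rules are not shown in the thread.
TOPIC_PATTERNS = {
    "folksonomy": re.compile(r"\b(folksonom(?:y|ies)|tagging|tags?)\b", re.IGNORECASE),
    "rss": re.compile(r"\b(rss|atom feeds?|aggregators?)\b", re.IGNORECASE),
    "learning objects": re.compile(r"\blearning objects?\b", re.IGNORECASE),
}


def topics_for(text):
    """Return the sorted names of all topics whose pattern matches the text."""
    return sorted(
        name for name, pattern in TOPIC_PATTERNS.items() if pattern.search(text)
    )
```

Calling `topics_for` on each incoming post assigns it to zero or more topics without anyone tagging it by hand, which is the claim being made; how well this works depends entirely on how carefully the patterns are written.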

  6. “I agree with you. The content is the metadata.”

    Metadata is data about data. What you are suggesting is that you can (perhaps really “should be able to”) derive all meaning from the visible part of the post itself. Well, I agree with that. Make the tags visible. Don’t tag if you think it would be purely superfluous.

    If I were to follow your lines of reasoning to their logically absurd conclusion, I would not write titles or abstracts either. At least to me, titles and abstracts seem like visible metadata.

  7. While I wouldn’t call Stephen’s logic absurd… I think it’s more of a comment than an argument to be tracked through to completion… I would say that personally this has been a bit enlightening thinking of folksonomies as derivatives of category, title etc.

    I guess one of the main things that strikes me about this, and Ulises’ DTD stuff too, is that we seem to be constantly trying to metadata-ise very small chunks that, by their contextual and content-based essence, have already ‘folksonomied’ themselves.

    We did this with learning objects too.

    But almost inevitably we seem to end up at the same conclusions… that only large chunks (categories, titles etc.) count.

    Interesting ideas.

  8. “If I were to follow your lines of reasoning to their logically absurd conclusion, I would not write titles or abstracts either. At least to me, titles and abstracts seem like visible metadata.”

    Well, let’s tackle this head on. Typically I do not write abstracts for my papers (much to the displeasure of journals, who really insist on it).

    This is from my old habit of writing in the classical ‘pyramid style’ in which I assert what I am trying to claim very early in my article. So, typically, capturing the first 600 characters of something I write is sufficient to create an abstract.

    In Edu_RSS, I employ that methodology. Edu_RSS does not capture entire blog posts - it creates the description out of the first 600 characters. This turns out to be a remarkably effective means of generating an abstract.

    The title, of course, can be the first sentence (though if I wanted to be technical about it, the subject of the first sentence, i.e., verb removed). Most blog posts come with titles. But some don’t, in which case I can just capture the first 6 words or so.

    Crude, but effective.

    There are more sophisticated techniques. Software called Copernic, for example, scans the entire content of an article and creates a summary. Another technique involves the scanning of an article for ‘statistically improbable combinations of words’ to identify keywords and categories (and would also make for a good title). And as I mentioned, the use of regex works very well for me.

    It’s one thing to call an article absurd. It’s quite another to call it absurd when there are working examples of the point the author is trying to make.
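The "crude, but effective" method in comment 8 can be sketched in a few lines. The 600-character and 6-word figures come from the comment; the function names, the whitespace cleanup, and the fallback logic are assumptions made for illustration, not Edu_RSS's actual code.

```python
import re

ABSTRACT_CHARS = 600  # "the first 600 characters", as stated in the comment
TITLE_WORDS = 6       # "the first 6 words or so"


def make_abstract(body, limit=ABSTRACT_CHARS):
    """Collapse runs of whitespace, then keep the first `limit` characters."""
    text = re.sub(r"\s+", " ", body).strip()
    return text[:limit]


def make_title(body, title=None, words=TITLE_WORDS):
    """Use the post's own title if it has one, else the first few words."""
    if title and title.strip():
        return title.strip()
    return " ".join(body.split()[:words])
```

This only works as an abstract when the writer front-loads the main claim, which is exactly the pyramid-style assumption the comment relies on.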

  9. “It’s one thing to call an article absurd. It’s quite another to call it absurd when there are working examples of the point the author is trying to make.”

    I think I was flying in a bit of high rhetoric here and apologize for using the phrase “logically absurd”. I did not mean to slight you, Stephen. Rather, I was making a “reductio ad absurdum” case.

    BTW, I think my case still stands after reading your response. Writing in the pyramid style in fact indicates to me that you have simply conformed your writing style to enclosing metadata at the top. I would say use of titles is more of the same.

    My own take is that electronic means have given us the opportunity to play all sorts of new metadata tricks for spreading our message. There are likely to be failed attempts like what James reports for Learning Objects.

    I just don’t think tagging writ large is one of them, although I think there are bound to be some failed experiments in tagging.

  10. [...] scurrilous search engine optimisation 2005-06-14

    Bud Gibson follows up on the tagging / folksonomy discussion that Stephen and him kicked off over here (one of the benefits of subscribe to [...]

  11. I like tagging because it is fun & easy. Sometimes I think people put waaaaay too much energy into overanalyzing these things.

    I just moved my blog from categories to tags because I can basically create a category as a new one is needed instead of writing a post, realizing I hadn’t created an appropriate category, creating one, then categorizing it, then posting.

  12. Hey, we’re (well, some of us are) academics, we like analysing :o)

    Categories vs. tags is an interesting perspective, thanks for that.

  13. Hm… almost as if to prove my point… TagCloud http://www.tagcloud.com/ offers an ‘automatic folksonomy’.

  14. The tagcloud thing looks interesting. Stephen, I’ve become convinced that the real difference in our points of view comes down to tactics. You would like some sort of fair (or at least not so socially regulated) automated discovery system while I live in a world of seeing the system and wondering how I can use it to get my message across.

    In that regard, tags are a big deal. I’m not so sure your ideas about pyramid writing are really that far removed from tagging posts in the end.

    I will grant that the tactics issue is a big point of departure between our perspectives.

  15. Just came across this thread via a link to it on a folksonomy site. :)

    To respond to the “content is the metadata” remarks above - um, no, I disagree. For example, one property of a piece of text might be that it makes people laugh - that is, it’s “funny”. People searching for humorous text would consider “this text is funny” to be relevant metadata about it. But quite likely the text doesn’t *say* “I’m funny” in a way that automated text analysis can extract. AI technology isn’t powerful enough to detect humor reliably. Deciding whether something is funny or not is still a task for human beings.

    The value of attaching tags to data - including text - is that it allows people to associate properties with data, and then share those associations with others, in ways that we do not know how to automate. As a practical matter, I know that I’ve gotten a lot of good laughs by subscribing to the feed http://del.icio.us/rss/tag/funny in my RSS aggregator.

  16. Must be the largest thread to post ratio I’ve ever had!

    Thanks for the comment tho John!

    I’m going to be hugely overly simplistic here and ask a. Is what’s funny to one person necessarily funny to another?, b. Who is going to bother telling me that this stuff is funny? & c. Wouldn’t I be most likely to get what is funny from a human filter (i.e. blog about funny stuff) than a tag-generated folksonomy?

    Yes, tags have some use but IMO it’s v. limited and of little significance outside of small groups & media other than text. This is, of course, not to say that there is no value in small groups and other media (quite the opposite) but rather to suggest that if we moved on to looking at the *natural* tagging that we do (through categorisation, title, syntax, vocabulary etc.) then we would be doing ourselves a favour.

  17. To answer your questions:

    > a. Is what’s funny to one person necessarily funny to another?

    No, of course not. But the things that are funny to me are a subset of the things that are funny to somebody. Getting a list of things that are funny to somebody is a first pass at finding things that are funny to me. When I wander into the humor section of a bookstore, I don’t expect to find that every book there is funny to me. But that doesn’t mean that the bookstore’s categorization system is flawed. Rather, they’ve given me a convenient place to go to make a start at finding books that *are* funny to me.

    > b. Who is going to bother telling me that this stuff is funny?

    Well, lots of folks. That’s what social bookmarking sites are about. Every day on del.icio.us, thousands of people, one of whom is me, upload and tag links that they’ve found. And del.icio.us makes it easy to get lists of items with a particular tag: http://del.icio.us/tag/funny for a list of everything that anybody tagged as “funny”, or http://del.icio.us/popular/funny for a list of items that LOTS of people tagged as “funny”. The first list has hundreds of items posted in the last month alone, so clearly people are bothering to do this. A key point is that for the most part they are tagging material that they found, not material that they themselves created.

    Now, a lot of these links, probably most, aren’t in the least bit funny to me. But like I say, it’s a first pass, to which I will apply further filtering. Thanks to RSS, it’s a manageable first pass. In my news aggregator, I can subscribe to RSS feeds for tags. This breaks the lists up into daily chunks of reasonable size, chunks that I can scan through quickly, discarding the ones that are irrelevant to me, enjoying and saving the others. (The URLs for the “funny” RSS feeds on del.icio.us would be http://del.icio.us/rss/tag/funny and http://del.icio.us/rss/popular/funny.)

    > c. Wouldn’t I be most likely to get what is funny from a human filter (i.e. blog about funny stuff) than a tag-generated folksonomy?

    A tag-generated folksonomy *IS* a human filter - it’s just lots of people doing the filtering instead of just one. And the value of having lots of people doing it is that it’s less likely that relevant stuff will be missed. Of course, it means that a lot of the stuff isn’t of interest to me, but I’ve already described how I handle that.

    So is tagging text a perfect way of classifying material for subsequent indexing and retrieval? Certainly not. But is it useful? In the context of social bookmarking, and with convenient mechanisms for posting and retrieval, and with enough people participating that some sort of critical mass is achieved, I’d have to say definitely YES, based on my own experience. By subscribing to RSS feeds on del.icio.us for selected tags (e.g. php, folksonomy, ajax, etc.) I continually find items of interest that otherwise I’d probably have missed. For example, that’s how I found this discussion thread. Tagging and folksonomy have provided me with a new and useful way of exploring the web.
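The "first pass" filtering described in comment 17 can be pictured with a small sketch. The sample bookmarks, the function names, and the `min_users` threshold are hypothetical; only the feed URL pattern is taken from the comment, and nothing here reflects how del.icio.us is actually implemented.

```python
from collections import defaultdict

# Hypothetical sample data: (user, url, tag) triples such as a social
# bookmarking site might record.
BOOKMARKS = [
    ("john", "http://example.com/a", "funny"),
    ("mary", "http://example.com/a", "funny"),
    ("mary", "http://example.com/b", "funny"),
    ("john", "http://example.com/c", "php"),
]


def items_tagged(bookmarks, tag):
    """Map each URL carrying `tag` to the set of users who applied it."""
    taggers = defaultdict(set)
    for user, url, t in bookmarks:
        if t == tag:
            taggers[url].add(user)
    return dict(taggers)


def popular(bookmarks, tag, min_users=2):
    """URLs that at least `min_users` different people gave the same tag."""
    return sorted(
        url
        for url, users in items_tagged(bookmarks, tag).items()
        if len(users) >= min_users
    )


def tag_feed_url(tag):
    """Build a per-tag feed URL following the pattern quoted in the comment."""
    return "http://del.icio.us/rss/tag/" + tag
```

`items_tagged` corresponds to the "anybody tagged it" list and `popular` to the "lots of people tagged it" list; the reader still applies personal judgment to whatever comes back.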

  18. Thanks John, good responses to some rather (admittedly) facetious questions by me.

    OK, so it is a good first pass and there are a whole bunch of ‘socially responsible’ people out there tagging (now THAT would be an interesting study, the social motivations of tagging (or not))

    I still have trouble with the tagging of text though… though perhaps that’s dwindling a bit… and while I agree that a folksonomy is a human generated filter… I’m not sure that it’s a particularly *good* one…

    Hmmmmmmmmm… thanks again for commenting.

  19. Let me throw out another view.

    Much of the discussion about folksonomy is occurring amongst a group that is fully RSS-enabled and does a lot of machine processing of information. Therefore, folksonomy can appear superfluous. In fact, as raised by Downes, folksonomy can be construed as an inferior classification system relative to machine summarization. People are less systematic than machines at the very least and will therefore produce less consistent classification systems, which is a part of what folksonomy is all about. Question for Downes and Farmer: this line of reasoning suggests that Google News beats the New York Times on some dimensions, no?

    Now, consider the following scenario. An industry’s news sources are not RSS enabled, and its topics are not popular enough to appeal at the scale required for Google News. A small group of entrepreneurs has decided to try to transform the industry. They start to scan web pages and want to build a set of bookmarks for those pages. Now, following the methods advocated by Downes and Farmer, these people would be best off using a service like TagCloud, an automated tag generator. Will tagcloud add tags like “competitor”? Will tagcloud add that? More generally, will tagcloud add tags that take into account the user’s context as well as what is written in the document? The answer to all of these questions is: “No”.

    Therefore, folksonomy has a role, even in classification tasks where you might think it is less useful.
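The distinction drawn in comment 19 can be made concrete with a toy example. The extractor below is a deliberately simple stand-in (plain vocabulary matching), not how TagCloud works, and the function names and sample text are invented for illustration.

```python
import re


def auto_tags(text, vocabulary):
    """Tags an extractor can find: vocabulary words that occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & set(vocabulary)


def bookmark_tags(text, vocabulary, user_tags):
    """Union of extracted tags and tags the reader supplies from context."""
    return auto_tags(text, vocabulary) | set(user_tags)
```

For a page reading "Acme launches a new pricing tool", the extractor can yield `pricing` and `tool`, but `competitor` only appears if the person bookmarking the page adds it, because that fact lives in the reader's context rather than in the text.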

  20. Not having delved much into the machine vs. human debate (or the Google news one for that matter) I’m not very well placed to answer to be honest. Good general point though… will appropriate (with attribution of course ;)

    I think if you were aiming to transform an industry’s news sources then using tags would not be the way to go about it though. I’m not sure if tags aren’t a bit like Wikis… good for some very specific things but not so great for a lot of others.

  21. “I think if you were aiming to transform an industry’s news sources then using tags would not be the way to go about it though. I’m not sure if tags aren’t a bit like Wikis… good for some very specific things but not so great for a lot of others.”

    I’m going to be releasing some data later this week. I think tagging to create news streams could very well work in niches.

  22. Cool, thanks for the heads up, looking forward to being proved wrong :O)

  23. Well, as much as I would like to say I will prove you wrong, I think it is better stated that I present evidence that may support my point of view.

    This conversation with various people on your blog has been a real eye-opener for me. I would have somewhat dismissed the whole algorithmic point of view had Stephen (and maybe you) not supported it.

    I’m not sure I think folksonomies are wiki-like in the sense you state which I take to mean “lead toward social convergence, not necessarily the individual voice.” I wonder if this is just not an issue of perspective and ultimately the tactics that derive.

  24. [...] and more has funnily enough led me to feel even less excited about social bookmarks / tags than I was before. Folksonomies made out of categorisation, titles, meta-tags and more are useful and v [...]