Tag Clouds Coming to Your Desktop

 



Tag clouds are those colorful collections of words that show how frequently each word is used on a website. For example, if we were to compare the tag cloud for this post to one from a food blog, the food blog's cloud would show words like "kale" and "chocolate" but also "marzipan" and "Mandarin". The size of each word in the cloud is proportional to its frequency. Compare the illustration below to this screenshot of my own blog's tag cloud:

The photo above shows what a real-life version of a tag cloud looks like in progress. The red string stretches from my hand to the center of a whiteboard on which I have drawn a rectangle. The bottom of the rectangle is a circle representing all of the words currently in the tag cloud, while the top (where you see me pointing) is an elliptical shape representing all of the words I want to include in my cloud. The words are drawn to scale, with larger words higher up and smaller ones closer to my hand. At the moment, the current set of words is about one fifth of what I plan to include.

My blog is a small node-based site with a minor amount of user-generated content. I want to build a tag cloud that will show the frequency (rank) of all of the words on my site in a visually appealing way. Google Trends shows us that there are 3,500,000 blogs written, and I would like to see which words are most frequently used on them. But when I ran my own Google search for "tag cloud", I was surprised to find results that left me slightly bored. I decided that the problem was that most blogs don't tag their content, so a simple Google search wasn't really going to help me. So, I am going to build a tag cloud from my own website and then use it on other sites.

Websites like Wikipedia use a word-count metric to divide words into those that are "common" and those that are "uncommon". Wikipedia counts a word as uncommon if it doesn't appear in at least 2% of the pages tagged with the same prefix... but I don't want to do that for my content. Instead, I will use the "frequency" metric: the number of times a word appears divided by the total number of words on the site.
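To make the frequency metric concrete, here is a minimal sketch in Python. It assumes the denominator is the total number of words in the text being counted, and the function name `word_frequencies` and the simple letters-and-apostrophes tokenizer are my own choices for illustration, not part of any existing library.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a dict mapping each word to (its count / total word count).

    Assumption: a "word" is a run of letters or apostrophes, compared
    case-insensitively. A real site would probably also strip markup
    and ignore very common words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        # No words at all: nothing to rank.
        return {}
    counts = Counter(words)
    return {word: n / total for word, n in counts.items()}
```

With this definition the frequencies of all words add up to 1, so a word's value can be read as its share of everything written on the site.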

I am thinking about a tag cloud algorithm that simply counts up all of the words in each category and then draws them as ellipses of different sizes. For example, if I saw some words with ranks "10", "20", and "30" with 10,000, 4,000 and 2,200 appearances respectively, my algorithm would produce three ellipses, one per word, ranging in size from 1 to 3.
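One way to turn appearance counts into ellipse sizes is a straight linear scaling between the smallest and largest count. The sketch below assumes that approach. The function name `scale_sizes` and the placeholder words are hypothetical, and a logarithmic or rank-based scaling would be an equally reasonable choice.

```python
def scale_sizes(counts, min_size=1.0, max_size=3.0):
    """Map each word's appearance count onto the range [min_size, max_size].

    Assumption: sizes grow linearly with the count, so the least used
    word gets min_size and the most used word gets max_size.
    """
    if not counts:
        return {}
    lo = min(counts.values())
    hi = max(counts.values())
    if hi == lo:
        # Every word appears equally often, so draw them all the same size.
        return {word: max_size for word in counts}
    span = hi - lo
    return {
        word: min_size + (max_size - min_size) * (n - lo) / span
        for word, n in counts.items()
    }


# The three counts from the example; the keys are placeholders for the
# words ranked 10, 20 and 30.
example = {"rank_10": 10_000, "rank_20": 4_000, "rank_30": 2_200}
sizes = scale_sizes(example)
```

Under this scaling the 10,000-appearance word gets size 3, the 2,200-appearance word gets size 1, and the 4,000-appearance word lands at roughly 1.46. Because the middle word is much closer to the bottom count than to the top, a linear scale keeps it small. A log scale would spread the three sizes more evenly.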

Using this formula and my own data set... ...

Conclusion

The final tag cloud that I will post as part of this project looks like this:

I have also added a "Tag Cloud" tag to my blog, which can be found via the "tags" dropdown on the right side of the main page. Feel free to add your own words there! For more information on how you can get started with my word cloud library, see TagCloud.net.

An automated word cloud generator is also available. Or you can use one of the WYSIWYG word cloud generators. My blog is marked up using Microformats, and I have implemented some HTML5 elements using jQuery. I'm building and running my own website that generates custom word clouds from incomplete content tags.
