
persistent.coffee #

I've been trying my hand at latte art. Though I have a very long way to go, I've been documenting my efforts in the hope of learning from my mistakes. Blogger's mobile support makes it pretty easy to collect pictures, and I've finally gotten around to making a decent template for the "blog."

coffee.persistent.info is the result. Technically, this isn't a Blogger template, since the page itself is just static HTML. Instead, it uses the JSON output that Blogger's GData API supports. Rendering the page in JavaScript allows for more flexibility. I wanted pictures that I liked to take up 4 slots (a layout inspired by TwitterPoster). This imposed additional constraints, since sequential large pictures must not overlap. The display is generally reverse-chronological starting from the top left, but images are occasionally shuffled around to prevent such overlaps. There is also a bit of interactivity: the pictures are clickable to display larger versions. To help with all this, I've been experimenting with jQuery (also on Mail Trends), and am liking it quite a bit.
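The overlap-avoiding layout described above could work along these lines. This is a minimal greedy sketch, not the site's actual code: it assumes a grid with a fixed number of columns, photos given as hypothetical {id, large} objects in reverse-chronological order, and large photos occupying a 2x2 block. A large photo that would overlap another or run off the right edge is deferred to the next slot where it fits, which produces the occasional shuffling.

```javascript
// Greedy grid layout sketch: `photos` are {id, large} in reverse-chronological
// order; large photos take a 2x2 block (4 slots), others a single slot.
function layoutPhotos(photos, columns) {
  if (columns < 2) throw new RangeError("need at least 2 columns for 2x2 photos");
  const occupied = new Set();                 // "row,col" keys of filled slots
  const key = (r, c) => r + "," + c;
  const sizeOf = photo => (photo.large ? 2 : 1);
  const fits = (r, c, size) => {
    if (c + size > columns) return false;     // would run off the right edge
    for (let dr = 0; dr < size; dr++)
      for (let dc = 0; dc < size; dc++)
        if (occupied.has(key(r + dr, c + dc))) return false;
    return true;
  };

  const queue = photos.slice();               // photos still waiting for a slot
  const placements = [];
  for (let cell = 0; queue.length > 0; cell++) {
    const r = Math.floor(cell / columns), c = cell % columns;
    if (occupied.has(key(r, c))) continue;
    // Earliest waiting photo that fits here; a large photo that would overlap
    // or overflow is skipped for now and retried at later slots.
    const i = queue.findIndex(p => fits(r, c, sizeOf(p)));
    if (i === -1) continue;                   // only large photos left; leave a gap
    const [photo] = queue.splice(i, 1);
    const size = sizeOf(photo);
    for (let dr = 0; dr < size; dr++)
      for (let dc = 0; dc < size; dc++) occupied.add(key(r + dr, c + dc));
    placements.push({ id: photo.id, row: r, col: c, size });
  }
  return placements;
}
```

With four columns and photos a, b (large), c, d, e (large), f, the large "e" doesn't fit in the last slot of the second row, so "f" takes that slot and "e" moves down a row.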

Bad Headlines #

Headlines seen over the past few weeks on the start page that Verizon forces on its users, and what they refer to in parentheses:

  • "Daily Life is a Struggle" (life in Baghdad)
  • "All Sides Deny Deals" (Iran/UK sailors hostage situation)
  • "Panel Seeks Documents" (the firing of federal prosecutors under Alberto Gonzales)
  • "Cheruiyot Wins Marathon" (Robert Kipkoech Cheruiyot won the Boston Marathon)
  • "Gates: Clock is ticking" (Defense Secretary Robert Gates on the situation in Iraq)
  • "Donkey Becomes Witness" (Dallas man is sued for noise disturbance caused by a donkey)
  • "Dems Predict a Win" (House Democratic leaders predict that they can win a vote on a bill calling for a withdrawal from Iraq)
  • "Candidates to Debate" (The Republican presidential candidate debate)

Someone forgot to read up on microcontent headlines.

Communicating through screenshots #

My posts to the Reader blog often have screenshots, and I'm pretty picky about what goes in them. It's nice to see that others have noticed this too. My most recent post has a link to Dinosaur Comics, which the author seems to appreciate. Similarly, I intentionally left a reference to Valleywag in my trends post, which they noticed.

In some fashion, Apple does this too. A patent of theirs had a BBEdit reference, which amused the author of the software.

Determining Twitter's Growth #

Opinion on Twitter is divided. What seems to be undisputed is that right now it's growing very quickly. I was curious just how "quickly" quickly was, preferably going beyond just anecdotal "my network doubled in size in the last 5 seconds" kind of observations. It seems like Twitter assigns globally unique, incrementing IDs to all messages it receives. By looking at the values of these IDs over time, it's possible to see how many status messages Twitter is keeping track of. I've generated a logarithmic graph of this.
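Turning ID samples into a growth rate is simple arithmetic. A sketch, assuming IDs really are assigned sequentially with no gaps (technical changes on Twitter's side could break that), and using made-up sample values rather than the data behind the graph:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Samples are {time, id}: a timestamp (ms) and the newest status ID seen then.
// If IDs are handed out sequentially, the ID delta approximates the number of
// messages posted in between.
function dailyRates(samples) {
  const rates = [];
  for (let i = 1; i < samples.length; i++) {
    const a = samples[i - 1], b = samples[i];
    const days = (b.time - a.time) / DAY_MS;
    rates.push({ midTime: (a.time + b.time) / 2, perDay: (b.id - a.id) / days });
  }
  return rates;
}

// Doubling time (in days) of the posting rate between two measurements,
// assuming growth is exponential over that span.
function doublingDays(r1, r2) {
  const days = (r2.midTime - r1.midTime) / DAY_MS;
  return (days * Math.LN2) / Math.log(r2.perDay / r1.perDay);
}
```

For example, IDs of 1000, 11000 and 31000 sampled ten days apart give rates of 1000 and 2000 messages per day, i.e. a doubling time of about ten days.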

I'm not sure why there was an inflection point in early November. It's also possible that this is affected by technical changes on Twitter's side. Still possibly interesting. Also, Joshua's post on autoincrement considered harmful is related and an interesting read.

Update: As was pointed out to me in the comments, Andy Baio had the same idea, except he executed it more thoroughly.

Twitter Message IDs

Intern on the Google Reader team #

The Reader team is hoping to have a student intern or two this coming summer. We're fast moving and always have more ideas than manpower, so an internship can be quite rewarding as far as the "working on real stuff" factor goes. For example, our intern last year, Brad, worked on the subscribe and feed search functionalities of the new Reader that launched last September. You can intern in Google's New York or Mountain View offices, working on either Reader's frontend/UI or backend.

If you or anyone you know is interested in this internship, contact me at mihai at persistent dot info. This page also has more general information about interning at Google.

Blogger Migration Part II: Getting Data Into Blogger #

Blogger doesn't have any built-in entry importing facilities. My plan for dealing with this was to use their API to re-post all of the entries I had exported from Movable Type. A quick test showed that such entries could be back-dated, which was my main concern.

Using some sample code that I found as a starting point, it was pretty easy to write a simple importing script for entries. Since I needed to parse the Atom responses from the API (e.g. to get at entry IDs), the Universal Feed Parser came in handy. Because I had used the Convert Line Breaks option on quite a few posts, I had to HTML-ify some post bodies before sending them to Blogger. (I've turned off Blogger's similar setting: it only exists at the blog level rather than per post, and I've decided that the closer my posts are to regular HTML, the easier it will be to migrate again in the future.)
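The HTML-ifying step can be approximated as follows. This assumes Convert Line Breaks behaves like this in the common case (blank lines separate paragraphs, single newlines become line breaks); Movable Type's actual filter has additional rules, for example around block-level HTML, that this sketch ignores.

```javascript
// Rough approximation of Movable Type's "Convert Line Breaks" filter:
// blank lines separate paragraphs, single newlines become <br />.
// (The real filter has extra rules, e.g. for block-level HTML; this does not.)
function convertLineBreaks(text) {
  const normalized = text.replace(/\r\n?/g, "\n").trim();
  if (normalized === "") return "";
  return normalized
    .split(/\n\s*\n/)                                   // paragraph boundaries
    .map(para => "<p>" + para.replace(/\n/g, "<br />\n") + "</p>")
    .join("\n\n");
}
```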

The Blogger API allows for posting of comments too, but unfortunately without the ability to impersonate others (even when anonymous comments are enabled). The solution was to impersonate the regular (non-API) comment posting form, which does allow for authors to be specified (but no backdating, which is why all imported comments are dated February 10th). This was made slightly trickier since a security token is required (to avoid XSRF attacks), so the posting page had to be scraped first to extract it.
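Extracting such a token usually means finding a hidden form field in the scraped page. A rough, regex-based sketch: the field name is passed in as a parameter because the post doesn't say what Blogger calls it, and a real HTML parser would be more robust than regular expressions.

```javascript
// Pull the value of a named <input> out of a scraped HTML form.
// Handles double- or single-quoted attributes; returns null if not found.
function extractHiddenField(html, fieldName) {
  const inputs = html.match(/<input\b[^>]*>/gi) || [];
  for (const tag of inputs) {
    // Read one attribute from this tag (attributes follow whitespace).
    const attr = name => {
      const m = tag.match(new RegExp("\\s" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", "i"));
      return m ? (m[1] !== undefined ? m[1] : m[2]) : null;
    };
    if (attr("name") === fieldName) return attr("value");
  }
  return null;
}
```

The extracted value would then be submitted along with the comment fields, as the form itself would do.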

Finally, since Blogger proved to be occasionally flaky when doing many API operations in a row, I had to add some simple checkpointing so that if the process failed I could restart it and it would continue from where it left off. Once I did all that, importing 350+ entries and 600+ comments took around an hour, but worked flawlessly.
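The checkpointing can be as simple as recording how many items have been processed. A minimal Node.js sketch (not the code from the archive mentioned below), assuming each item is handled by a synchronous task function that throws on failure:

```javascript
const fs = require("fs");

// Runs task(item) for each item, saving the count of completed items to
// checkpointPath after each success. If a task throws (e.g. a flaky API call),
// rerunning the same command resumes at the first item that didn't finish.
function runWithCheckpoint(items, checkpointPath, task) {
  let done = 0;
  if (fs.existsSync(checkpointPath)) {
    done = JSON.parse(fs.readFileSync(checkpointPath, "utf8")).done;
  }
  for (let i = done; i < items.length; i++) {
    task(items[i]);                                   // may throw; checkpoint stays at i
    fs.writeFileSync(checkpointPath, JSON.stringify({ done: i + 1 }));
  }
  return items.length - done;                         // items handled in this run
}
```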

I've uploaded an archive of the code that I used for the importing. It's not as clean as it could be, but others may find it useful too. Additionally, since I began my importing, a script seems to have appeared that does the same kind of import from WordPress, which may be worth a look too.

Understanding feed reader marketshare numbers #

Update on 2/25: FeedBurner has published a post discussing this same issue but providing numbers for their whole userbase, which makes it even more interesting.

Ever since the Reader team announced that we were making public subscriber counts (thanks Justin), bloggers have been excitedly posting about the bumps they're seeing in their subscriber stats. I'm obviously very happy that Reader is getting all this attention, and that we turn out to be quite popular when compared to other feed readers. However, these statistics need a bit of interpretation. Most people post charts of their subscriber counts, like this one for this blog:

FeedBurner subscribers

For web-based readers where feeds are fetched on behalf of multiple users, the subscriber number is based on what the site reports. To the best of my knowledge, with the exception of My Yahoo!, these numbers are total subscribers, even if an account is inactive. Unless the site is aggressive about cleaning up inactive accounts, these numbers are only upper bounds on the number of actual readers that you have.
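As for how a site "reports" its number: a widely used convention (background, not something described in this post) is for web-based aggregators to include a count such as "12 subscribers" in the User-Agent header of their fetcher. A sketch of tallying such reports, keeping only the latest one per aggregator; the user-agent strings you'd feed it are whatever shows up in your logs.

```javascript
// Extract "N subscriber(s)" from an aggregator's User-Agent string, if present.
function reportedSubscribers(userAgent) {
  const m = /(\d+)\s+subscribers?\b/i.exec(userAgent);
  return m ? parseInt(m[1], 10) : null;
}

// Sum the most recent report per aggregator, keyed by the product token
// (the text before the first "/", ";", "(" or space).
function totalReported(userAgents) {
  const latest = new Map();
  for (const ua of userAgents) {
    const n = reportedSubscribers(ua);
    if (n === null) continue;                 // desktop readers report no count
    latest.set(ua.split(/[\/;( ]/)[0], n);    // later reports overwrite earlier ones
  }
  let total = 0;
  for (const n of latest.values()) total += n;
  return total;
}
```

Because each aggregator counts every account subscribed to the feed, the total is still only the upper bound described above.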

A more interesting number to look at is how many viewers each item gets from each feed reader. FeedBurner provides this as part of their TotalStats package. By embedding a small tracking image in your burned posts and looking at referrers, it's possible to see these item-specific views. Here is how many views and clicks my post from yesterday got in various feed readers:

FeedBurner item use

From this it would appear that Reader has an even bigger lead over Bloglines (though given the biases in this blog's readership, I'm not reading too much into this). There are other factors involved here too. The user bases for feed readers are not identical; if an item appeals more to one population than another, that may skew things. Additionally, some readers (especially homepage-style ones like My Yahoo!, Google Personalized Homepage and Netvibes) don't have to display the item body and allow users to jump straight to the post page. These would show up in the "Clicks" column but not in the "Views" one.

What becomes apparent is that none of these statistics provide a complete picture of your readership, but that when used together they can still give you broad trends and help you tailor your content to your audience.

Blogger Migration Part I: Getting Data Out of Movable Type #

The first step in migrating away from Movable Type was to get all of my entries and comments in a structured format that could be parsed and uploaded to Blogger*. MT doesn't hold data hostage; there is a documented import/export format. Six Apart considers the format "lossy", in that it doesn't save a complete snapshot of your blog. I decided that what it did contain was good enough, though it turned out that what it lacked (entry IDs and permalinks) did make things slightly more difficult. A search on code.google.com for Python code to parse the format turned up Transfusion, which does just that (searching for one of the magic strings in the format, specifically CONVERT BREAKS, was the easiest way to track this down).
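For reference, the export format is simple enough to parse by hand. This simplified sketch is not Transfusion's code. It assumes entries are separated by a line of eight dashes and sections by a line of five dashes, with "KEY: value" metadata (TITLE, DATE, CONVERT BREAKS, ...) first and multi-line sections such as BODY and COMMENT after. Comment sections are kept as raw text, including their own AUTHOR/DATE header lines.

```javascript
// Simplified parser for Movable Type's export format.
function parseMtExport(text) {
  const entries = [];
  for (const rawEntry of text.replace(/\r\n?/g, "\n").split(/^-{8}$/m)) {
    if (!rawEntry.trim()) continue;                   // trailing separator, etc.
    const sections = rawEntry.split(/^-{5}$/m);
    const entry = { meta: {}, body: "", extendedBody: "", comments: [] };
    // First section: one "KEY: value" pair per line.
    for (const line of sections[0].split("\n")) {
      const m = /^([A-Z ]+):\s?(.*)$/.exec(line);
      if (m) entry.meta[m[1]] = m[2];
    }
    // Remaining sections: a "KEY:" line followed by multi-line content.
    for (const section of sections.slice(1)) {
      const m = /^\s*([A-Z ]+):\n([\s\S]*)$/.exec(section);
      if (!m) continue;
      const content = m[2].trim();
      if (m[1] === "BODY") entry.body = content;
      else if (m[1] === "EXTENDED BODY") entry.extendedBody = content;
      else if (m[1] === "COMMENT") entry.comments.push(content);
    }
    entries.push(entry);
  }
  return entries;
}
```

Checking meta["CONVERT BREAKS"] on each entry tells you which bodies still need to be HTML-ified.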

As I was skimming through the exported entries, I saw that they weren't quite HTML. I had used MTMacro to create various shorthand tags for linking to entries, referencing images, etc. Similarly, I used MTCodeBeautifier to pretty-print code samples. None of these were getting evaluated when exporting, and even if they had been, I probably would have wanted to tweak their output anyway (e.g. to change URLs). Generally, it seemed like the time I had spent customizing my Movable Type installation with cruft-free URLs, plug-ins, etc. would be directly proportional to the time I would have to spend migrating away from it.

One of the more prevalent macros I had used was one of the form <entryLink id="NNN">foo</entryLink>, which let me link to my past entries. Unfortunately, since the exported information contained neither entry IDs nor URLs, there was no easy way to turn these into actual links. In the end, I converted them by hand.
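Had a mapping from old entry IDs to new URLs been available, the conversion could have been partly automated. In this sketch the urlsById map is hypothetical (the export didn't provide one); macros whose ID isn't in the map are left alone and reported, so they can still be fixed by hand.

```javascript
// Rewrite <entryLink id="NNN">text</entryLink> macros into <a> tags using a
// hypothetical ID-to-URL map; unknown IDs are left as-is and reported.
function expandEntryLinks(html, urlsById) {
  const unresolved = [];
  const out = html.replace(/<entryLink\s+id="(\d+)">([\s\S]*?)<\/entryLink>/g,
    (whole, id, text) => {
      if (!Object.prototype.hasOwnProperty.call(urlsById, id)) {
        unresolved.push(id);
        return whole;
      }
      return '<a href="' + urlsById[id] + '">' + text + "</a>";
    });
  return { html: out, unresolved };
}
```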

That's it for the exporting part. Part II will cover the Blogger import process and part III the template/design reasoning.

* The other migration option that I was considering was WordPress. However, the idea of having to do SQL queries to serve traffic didn't seem that appealing given my current provider's slow SQL performance. WordPress.com would have been a hosted option, but if I was going to relinquish control of the installation, it might as well be to a Google product.

Switched to Blogger #

Partly because I was fed up with Movable Type's rebuild times, but also for dog food reasons, I've moved my blog over to Blogger (support for custom domains was the final missing piece). Redirects should be in place and no links should break, but feed readers will most likely see a bunch of new entries (I didn't see an option in FeedBurner to suppress duplicates). Please leave a comment if you see anything amiss.

I'll be writing up more details this coming week about the work that was necessary to migrate.

More Efficient FeedBurner Visitor Tracking #

I was quite happy to see that FeedBurner had integrated site visitor tracking in addition to their already great feed usage statistics. Actually using it with my blog was just a matter of cutting and pasting some code into my template. Here's the snippet that I was offered for Movable Type:

<script src="http://feeds.feedburner.com/~s/PersistentInfo?i=<$MTEntryPermalink$>" type="text/javascript" charset="utf-8"></script>

This seems reasonable, but it bothered me that the tracking URL varied for each entry. This meant that for my front page, 7 different (but nearly identical) pieces of JavaScript would have to be fetched, slowing down the rendering of my site. Others have complained too. I think a better way of implementing this would have been to have something like:

<script src="http://tracking.feedburner.com/tracking.js" type="text/javascript" charset="utf-8"></script>
<script type="text/javascript">FBTrack("PersistentInfo", "<$MTEntryPermalink$>");</script>

That would have meant that the same tracking code could be used across all sites, so most users would already have it cached (I think that's what Google Analytics relies on). Unfortunately, that's not something that I can change on my own.

However, when I ran into a related problem (a FeedFlare that was stuck on), it was pointed out to me that the i=... part of the script URL can be removed if all that's desired is the visitor tracking and not per-entry features like FeedFlares. This means that, at least across my entire site, there is only one tracking script URL that visitors have to load. It turns out FeedBurner documents this, but only in the "Others..." category (the bottom "If you want to use StandardStats only" section).

Google Reader Redux #

The new version of Reader has been out there long enough (and is now stable enough) that I have some time to catch my breath and make this post (my post-launch post last year came only a couple of days after the big announcement). I've jotted down some of my thoughts from the past few weeks; continuity will not be high.

There were some hints that something big was coming. Chris's Twitter updates were sounding rather intense. Someone in the discussion group inferred from my lack of posts that a major update was imminent (or that I stopped caring - never!). We even invited some bloggers for a sneak peek at the new Reader* but they were nice and respected their embargo.

Reader is in Google Labs, and that puts it in the "throw it against the wall and see what sticks" product family. I'm glad that people seem to have realized that this "throwing" and "seeing" are less passive than they sound. To stretch this metaphor further, if the spaghetti starts to slide off, engineers (and UI designers, and product managers, and others) will study the problem and figure out how to increase its coefficient of friction. Usually the changes are more subtle (witness the myriad of tweaks that have been done to the Google Video homepage) which is perhaps why there is this perception that no post-launch changes are made.

Gmail and Google Reader integration

A lot of people have remarked on the similarities between the new Reader interface and Gmail's. With this in mind, I've created a simple Greasemonkey script that adds a "Feeds" link to Gmail. When it is clicked, Reader's list view is loaded on the right. To install the script (and Greasemonkey if you have never used it before):

  1. Install Greasemonkey from http://greasemonkey.mozdev.org/
  2. Restart Firefox.
  3. Click on the script link above
  4. Click on the "Install" button that's displayed in the upper-right corner of the page.
  5. Visit/reload Gmail

You may wonder why I felt the need to write a Greasemonkey script for my own product. The answer is that integrations are hard and generally require a lot of effort before you can even determine if they are worthwhile. Greasemonkey lets you experiment with UI concepts with minimal effort necessary from either team (I had to make exactly one change to Reader to better support this script, and that was the ability to force list view to be used, even if expanded view is normally selected). I can't really say what, if any, our integration plans are, but enough users have asked for something like this that I thought writing the script was the most expedient way to provide this (unofficial) feature.

I am still subscribed to the "google reader" Blog Search feed, so that is one way to reach the team with feedback. The discussion group is also being monitored, though with the increased volume we now find it hard to respond to a lot of posts. But please keep the feedback coming; it's been great to get direct, concrete indicators for what we should work on next.

* It is rather frustrating to have to call it "the new Reader" or more formally "the new version of Google Reader." It's unfortunate that version numbers are passé; "2.0.1" is a more accurate and concise representation of where Reader is right now.

Facebook meets Dodgeball #

Dodgeball check-ins in my Facebook profile

My initial reaction to the Facebook Mini-Feed/News Feed feature was pretty positive. I still don't think it's such a big deal, but the privacy settings they've added in response to the backlash are pretty well done. Because I felt like this wasn't transparent enough, I thought it would be cool to syndicate my Dodgeball check-ins into my Facebook profile, via Facebook's feature for importing a blog (i.e. an RSS feed) into Notes. It worked pretty well (see picture on the left), and it's sort of neat that these two social networking sites are open even slightly, allowing such co-mingling of data.

Facebook currently allows only one feed to be imported, which means you can't have (say) your check-ins, regular blog and link blog all in there. However, it's possible to create a spliced feed in Google Reader and import that instead.
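Splicing itself amounts to merging items from several feeds. A generic sketch with hypothetical {id, published} item objects, not how Reader implements it: duplicates are dropped by ID and the result is sorted newest first, so a single spliced feed can stand in for several.

```javascript
// Merge several feeds into one: dedupe items by id, remember which feed each
// came from, and keep the newest `limit` items.
function spliceFeeds(feeds, limit = 20) {
  const seen = new Set();
  const items = [];
  for (const feed of feeds) {
    for (const item of feed.items) {
      if (seen.has(item.id)) continue;              // first occurrence wins
      seen.add(item.id);
      items.push({ ...item, sourceTitle: feed.title });
    }
  }
  items.sort((a, b) => b.published - a.published);  // newest first
  return items.slice(0, limit);
}
```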

On a somewhat related note, I've set up a scraped feed for the Facebook Blog, since it doesn't have one for some bizarre reason.

Coincidence? #

Flickr's new geotagging feature

Overplot

Probably.

410 Annoying #

Russell has stopped blogging, and seems to want people to unsubscribe from his feed. Not content to only return a 410 Gone status code*, he has also inserted a seizure-inducing image into an item with an ever-changing GUID. I'm not sure if this falls under "clever social hack" (since not all aggregators support 410 - including Reader to some degree) or "ugly perversion of HTTP."
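For comparison, here is one way a feed fetcher could map status codes to actions. It's a sketch under simple assumptions, not a description of how Reader or any other aggregator behaves; note that a feed answering with 200 (as Russell's initially did) would simply keep being parsed.

```javascript
// One possible policy for reacting to a feed fetch's HTTP status.
function feedFetchAction(status, location) {
  if (status === 410) return { action: "unsubscribe" };                 // gone for good
  if (status === 301 && location) return { action: "update-url", url: location };
  if ((status === 302 || status === 307) && location) {
    return { action: "follow", url: location };                         // temporary move
  }
  if (status === 304) return { action: "unchanged" };                   // conditional GET hit
  if (status >= 200 && status < 300) return { action: "parse" };
  return { action: "retry-later" };                                     // 404, 5xx, ...
}
```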

* The Reader crawl logs suggest that initially his "unsubscribe now" feed was returning a 200 status code, which made this less interesting.

Google Reader Tidbits #

Google Reader recently launched sharing, a feature that I had a hand in. I've used it to power my link blog, available in the sidebar (only on the front page). Although these sharing "clips" are easiest to use when pre-styled with one of our color schemes, you can choose the "None" option and then use your own CSS to make them blend in with the rest of your site, as I've done. And since it's all JSON underneath, you can really use a public label any way you want on your site.

What may not be obvious about this sharing feature is that it can be used to splice feeds together. Furthermore, you can chain shared labels. For example, the "Stuff written (or recommended) by the Reader team" section in the Google Reader blog was put together like this (arrows indicate subscriptions/labels being applied):

Sharing flow

This way, everyone gets to control their own "me" label, without having to modify the team account when wanting to add/remove feeds.

While developing this sharing feature, it became clear that the ultimate origin of an item in a feed is very important (i.e. I may see it because I'm subscribed to your "web-dev" label, but really it's from QuirksBlog). We joked about the need for a "Molecule" format that would specify the aggregation of multiple Atom feeds. We even began coding a (namespaced) origin element that would contain the title, id, homepage URL, etc. of the originating site for this item. Then, while re-reading RFC 4287 for another reason, we came across the source element in Atom, which does exactly what we had set out to (re)implement. This tells you two things:

  1. The Atom people were pretty clever to have foreseen this use case.
  2. No matter how well your spec/documentation is written, people will still miss things (a.k.a. everyone is a bozo at some point).
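As a concrete illustration, an aggregated entry can record where it came from using atom:source, which per RFC 4287 may carry the original feed's id, title, links and updated time. A string-building sketch with hypothetical field names; a real entry also needs an author (directly, inside source, or from the enclosing feed), omitted here for brevity.

```javascript
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Build an Atom <entry> (to be placed inside an Atom feed) whose <source>
// element describes the feed the item originally came from.
function atomEntryWithSource(item, origin) {
  return [
    "<entry>",
    "  <id>" + escapeXml(item.id) + "</id>",
    "  <title>" + escapeXml(item.title) + "</title>",
    "  <updated>" + item.updated + "</updated>",
    "  <source>",
    "    <id>" + escapeXml(origin.id) + "</id>",
    "    <title>" + escapeXml(origin.title) + "</title>",
    '    <link rel="alternate" href="' + escapeXml(origin.homepage) + '"/>',
    "    <updated>" + origin.updated + "</updated>",
    "  </source>",
    "</entry>"
  ].join("\n");
}
```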

Now that we produce feeds for others to consume, it's nice to use Atom to its fullest and to make sure our feeds validate (there are some scary-looking warnings that we'd like to fix, but as of now that feed is in fact valid Atom 1.0).

Rendering Text Inside the Canvas Object #

I recently had an idea for a cool hack involving the <canvas> tag/object that is supported by Safari 1.3/2.0, Firefox 1.5 and Opera 9.0. However, I quickly realized that the object does not support text rendering, which made it seemingly useless. The WHATWG spec had this to say:

// drawing text is not supported in this version of the API
// (there is no way to predict what metrics the fonts will have,
// which makes fonts very hard to use for painting)

I'm not entirely sure why predictable font metrics are necessary (for pixel-perfect compliance testing, I suppose), but the situation doesn't seem too hopeful. Mozilla's solution to this was the drawWindow function, which could be used with an iframe to render text. However, creating arbitrary windows for text rendering seemed like a lot of overhead, and I wanted something that worked in all browsers.

I remembered from my OpenGL hacking days that a similar shortcoming existed in that environment, and that there were a few workarounds available. One was to render fonts yourself, using the basic line/arc primitives to rasterize TrueType/PostScript fonts. However, this meant finding a font and mapping its drawing operations to canvas ones, which seemed like more work than I was willing to put in for a quick hack. Additionally, having to draw many complex strokes per letter seemed like it would impact performance.

The alternative was to use a font texture. This is usually an image composed of all the necessary letters and symbols for a font. By drawing pieces of it one after another, words can be composed. Since the font has already been rasterized into the texture image, it shouldn't matter how complex each letter is. This also mapped well onto the drawImage call supported by the canvas 2D context. Some quick Googling turned up a ready-made font texture and (more importantly) a table of character positions within it. If doing this from scratch, Bitmap Font Builder looks handy.
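The font-texture approach boils down to computing a source rectangle per character and handing it to the nine-argument form of drawImage(). A sketch assuming a hypothetical texture laid out as a 16-column grid of fixed-size cells indexed by character code, with a fixed-width advance; a real texture needs its own coordinate table, like the one mentioned above.

```javascript
// Source (texture) and destination positions for each character of `text`,
// assuming cell number charCode holds that glyph in a `columns`-wide grid.
function glyphQuads(text, x, y, cellW, cellH, columns = 16) {
  const quads = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    quads.push({
      sx: (code % columns) * cellW,
      sy: Math.floor(code / columns) * cellH,
      dx: x + i * cellW,            // fixed-width advance
      dy: y
    });
  }
  return quads;
}

// drawImage(image, sx, sy, sw, sh, dx, dy, dw, dh) copies one glyph per call.
function drawText(ctx, texture, text, x, y, cellW, cellH) {
  for (const q of glyphQuads(text, x, y, cellW, cellH)) {
    ctx.drawImage(texture, q.sx, q.sy, cellW, cellH, q.dx, q.dy, cellW, cellH);
  }
}
```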

This is the result of putting all of that together. It has decent performance in Safari and Opera 9.0, but Firefox can only manage about ten frames per second. It was even slower when I used drawImage() with an image object. I can only assume that Gecko will decompress the image for each function call instead of keeping the raw pixels around. Thankfully there is an overloaded version of the function that accepts a canvas object. By rendering the desired image into a canvas first and then passing that, performance improved significantly. However, Safari does not seem to support canvas objects as arguments for drawImage(), so a bit of browser detection is necessary.

Update on 2/27/2006: I was curious enough about the drawImage() performance in Firefox with images vs. canvases that I decided to do a more thorough investigation. Using a simple test bed, I measured the speed of various image rendering calls:

  • drawImage() with an image argument
  • drawImage() with a canvas argument
  • Creating a pattern with an image and then using fillRect() with it
  • Creating a pattern with a canvas and then using fillRect() with it

I then ran 50 iterations of each in Firefox 1.5, Safari 2.0 and Opera 9.0 preview 2, all on a dual 2.3 GHz G5, with these results:

Method                            GIF8 op GIF8 tr* PNG8 op* PNG8 tr* PNG24 op PNG24 tr     JPEG
Firefox 1.5
  drawImage w/ image                   74      138      593      574      242     1959      227
  drawImage w/ canvas                 446     4378     4433     4444      443      495      454
  fillRect w/ pattern w/ image         10       22       75       33       32      118       27
  fillRect w/ pattern w/ canvas       err      err      err      err      err      err      err
Safari 2.0
  drawImage w/ image                   15       27       97       34       47      123       62
  drawImage w/ canvas                  NV       NV       NV       NV       NV       NV       NV
  fillRect w/ pattern w/ image         NV       NV       NV       NV       NV       NV       NV
  fillRect w/ pattern w/ canvas       err      err      err      err      err      err      err
Opera 9.0 preview 2
  drawImage w/ image                  521      273     3313      880     1651     4007      err
  drawImage w/ canvas                3817    37612    37186    38024     3753     3862      err
  fillRect w/ pattern w/ image       3773    36019    36735    37303     3709     3571      err
  fillRect w/ pattern w/ canvas        NV       NV       NV       NV       NV       NV      err

GIF8/PNG8 = 8-bit, PNG24 = 24-bit; op = opaque, tr = transparent
* 500 iterations
NV: No visible output (but no exceptions thrown either)

As can be seen, the Firefox performance boost that I saw with drawImage() and a canvas argument only occurs with 24-bit PNGs with an alpha channel. In general, using a pattern is the fastest way to draw an image in Firefox. The one trade-off is that you don't get to use scaling (by playing with the source/destination rectangles), but that can be accomplished with the global matrix transform anyway. Since patterns are always drawn beginning at the top/left corner of the target rectangle, some use of clipping is necessary if only a portion of the image is needed. However, even with clipping, the use of patterns brings Firefox speed in the text rendering test to ~36 fps instead of ~10 fps.
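The pattern-plus-clipping technique might look like this. A sketch, assuming the pattern was created once with ctx.createPattern(textureImage, "no-repeat") (or "repeat"): the context is translated so that the texture cell at (sx, sy) lands at (dx, dy), clipped to that cell, and filled starting from the translated origin. Because the filled rectangle begins at the origin, the pattern lines up whether an implementation anchors it at the coordinate-space origin (as the spec describes) or at the rectangle's top-left corner.

```javascript
// Draw the (sx, sy, w, h) piece of a pattern-filled texture at (dx, dy)
// using clip() + fillRect() instead of drawImage().
function drawPatternPiece(ctx, pattern, sx, sy, w, h, dx, dy) {
  ctx.save();
  ctx.translate(dx - sx, dy - sy);    // texture (sx, sy) now maps to canvas (dx, dy)
  ctx.beginPath();
  ctx.rect(sx, sy, w, h);             // only this cell of the texture may show
  ctx.clip();
  ctx.fillStyle = pattern;
  ctx.fillRect(0, 0, sx + w, sy + h); // starts at the origin, covers the cell
  ctx.restore();                      // drop the clip and translation
}
```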

The Opera numbers are much worse than the others because Opera seems to do some event handling and extra screen refreshes during the benchmark. In general, the fastest approach in Opera is to use drawImage() with a canvas object. Safari seems to have the most trouble supporting alternate approaches, presumably because it had the earliest implementation of canvas and the spec didn't actually exist at that point.

Google Search History as RSS #

Update on 4/24/2006: The feeds have now been officially announced (i.e. they are exposed via auto-discovery on the page). HTTP Basic Authentication (over SSL) is now also supported (in addition to the cookie).

Google recently released some Dashboard Widgets, among them one for accessing your search history. Until now, it had only been accessible at its homepage, so I wondered how the widget got that data out. Thankfully the widget code was not obfuscated, and I was able to see snippets like the following:

  Widget.feed = "http://www.google.com/searchhistory/find?zx=" + 
              randomString() + "&num=50&output=rss&client=google-mcsmhwidget";
  ...
  var url = Widget.feed;
  url += "&start=" + Widget.resultsStart;
  url += "&q=" + encodeURIComponent(query);

Sure enough, URLs such as this one bring up a search through my search history as an RSS feed. The query part of the URL can be left blank to show all items. I'm guessing that by judicious use of the start and num parameters, one could even get at one's entire search history. Presumably the attention fanboys will like this.
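Paging through the feed could be scripted along these lines. A sketch that reuses the parameter names from the widget code (num, output, start, q) but assumes start is a 0-based offset; it drops the zx cache-buster and client parameters, and fetching the URLs would still require being authenticated to your Google Account.

```javascript
// Build one search-history feed URL per page of `num` results.
function historyFeedUrls(query, pages, num = 50) {
  const urls = [];
  for (let p = 0; p < pages; p++) {
    const params = new URLSearchParams({
      num: String(num),
      output: "rss",
      start: String(p * num),   // assumed: offset of the first result
      q: query                  // empty string = all items
    });
    urls.push("http://www.google.com/searchhistory/find?" + params.toString());
  }
  return urls;
}
```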

The key in the above URL seems to be the output=rss parameter. Since bookmarks are in the same UI as search history, perhaps they can be viewed as RSS too? Yes, they can (initially with some XML errors that the team was aware of; as of 3/31/2006 the output is well-formed XML). The Trends page, however, doesn't work as RSS.

Note that these feeds are not really useful for most aggregators, since they require you to be logged in to your Google Account and authenticated by a cookie. The one exception might be Firefox Live Bookmarks. By creating one pointed to your searches or bookmarks feed and putting it in your toolbar, you have one-click access to your search history or bookmarks. However, the real use of the feeds is as a pseudo-API, as they are used in the Dashboard Widget.

This might seem like a convenient "leak," but it's something I decided to blog about for myself, without any prompting from the search history team (though they did get a heads-up about this post). The feed URLs and format may change at any time, though they probably won't deviate too much unless there's a good reason.

Interviewing at Google #

Someone recently asked me if I had any tips about interviewing at Google. Surprisingly enough, even though our jobs page mentions all the great perks and things you could be doing at Google, it doesn't say much about the process. However, I did come across this "What to expect from your Google interview" page created by our Zurich office. Although it's nominally aimed at European candidates, nearly all the information contained within is applicable to all our offices. In addition to emailing that person a link, I'm blogging about it, so that others may benefit as well.

Birthday #

My girlfriend knows me so well:

Cake Pictures

Greasemonkey Hacks #

I finally received my complimentary copy of Greasemonkey Hacks. I'm also glad to see that my saved searches user script was selected as one of the sample ones. Reading the contributors section was somewhat amusing due to the homogeneity of the entries (e.g. more than half mentioned blogs). The book is also already out of date, with Greasemonkey and Firefox both having undergone major releases. I also doubt the wisdom of printing the full code behind each hack, especially some of the longer ones that were already online. Commentary attached to interesting snippets might have worked better. However, as a whole the book seems very well put together, and it brought to my attention some scripts I hadn't heard of before.

As a side note, I realize that with the recent release of Firefox 1.5 and Greasemonkey 0.6.4 most of my user scripts have suffered some code rot. I hope to bring everything up to date in the next few days.
