An algorithm for driving newsletter traffic

Last week I made a few updates to The Sample. First of all, there’s a spiffy new landing page:

The Sample already does automatic keyword extraction and stuff so that it can figure out what topics each newsletter is about. But to make sure the new landing page checkboxes work well, I went through all the newsletters I’ve imported (297 so far) and added tags manually. (“Eclectic” means roughly “I have no idea which topics to file this under.”) I also added a secret “good” tag to about 40 newsletters and made the algorithm forward those more often. (Not that the other newsletters aren’t good! They’re all good, I guess, but some are just gooder than others. No hard feelings.)

And finally, I added one more thing: referral links.

Now, if you submit your own newsletter, I can give you access to a stats page like the one above, including your own referral link. You don’t get swag for referrals; instead, the algorithm forwards your newsletter to more people.

The way it works is pretty simple. At its core, the recommendation algorithm is plain collaborative filtering (“people who liked X also liked Y”). First you make a CSV with three columns: newsletter URL, user ID, rating. For example, one row might indicate “Sally gave Giraffe Weekly a rating of 4 stars.” You run that CSV through some Python code, and then you get the ability to predict ratings. You can ask “What rating will Bob give to the Future of Discovery newsletter?”, and you’ll get an answer of “2.391928 stars, I think.”
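To make the “CSV in, predicted ratings out” idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not The Sample’s actual code: the example URLs, the extra users and ratings, the function names, and the choice of similarity measure (cosine similarity over mean-centered ratings) are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from math import sqrt

# Hypothetical ratings in the three-column layout: newsletter URL, user ID, rating.
RATINGS_CSV = """\
https://example.com/giraffe-weekly,sally,4
https://example.com/tea-times,sally,5
https://example.com/compost-quarterly,sally,1
https://example.com/future-of-discovery,sally,2
https://example.com/giraffe-weekly,bob,5
https://example.com/tea-times,bob,4
https://example.com/compost-quarterly,bob,1
https://example.com/giraffe-weekly,alice,2
https://example.com/tea-times,alice,1
https://example.com/compost-quarterly,alice,5
https://example.com/future-of-discovery,alice,5
"""


def load_ratings(text):
    """Parse CSV text into {user: {newsletter_url: rating}}."""
    ratings = defaultdict(dict)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        url, user, rating = row
        ratings[user][url] = float(rating)
    return dict(ratings)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def similarity(a, b):
    """Cosine similarity of two users' mean-centered ratings on shared newsletters."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_a, mean_b = mean(a.values()), mean(b.values())
    dot = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    norm_a = sqrt(sum((a[i] - mean_a) ** 2 for i in shared))
    norm_b = sqrt(sum((b[i] - mean_b) ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, url):
    """Predict a 1-5 star rating: the user's own average, nudged by what
    similar (or dissimilar) users thought of this newsletter."""
    mine = ratings[user]
    baseline = mean(mine.values())
    weighted_sum = total_weight = 0.0
    for other, theirs in ratings.items():
        if other == user or url not in theirs:
            continue
        sim = similarity(mine, theirs)
        weighted_sum += sim * (theirs[url] - mean(theirs.values()))
        total_weight += abs(sim)
    if total_weight == 0:
        return baseline  # nobody comparable has rated it; fall back to the average
    return min(5.0, max(1.0, baseline + weighted_sum / total_weight))


ratings = load_ratings(RATINGS_CSV)
# Bob's tastes track Sally's, and Sally gave this one 2 stars,
# so the prediction for Bob comes out low (under 3 stars).
print(predict(ratings, "bob", "https://example.com/future-of-discovery"))
```

With only a handful of ratings the numbers are mostly noise, which is exactly the caveat that follows.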

(Caveat: you need a lot of ratings for this to be any good. Since we don’t have that many subscribers yet, I use some tricks, aka content-based filtering, to generate some fake ratings to augment the CSV.)

Anyway. Every morning, shortly after 7:00 AM PDT, The Sample goes through all the subscribers and all the newsletters and asks “What rating will this user give to this newsletter?” For each user, we pick whichever newsletter we think they’ll rate the highest and forward it to them.

(Actually, there’s more to it than that, but that’s the gist of it.)
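The gist of the daily job can be sketched in a few lines of Python. This assumes the predictions have already been computed and stored in a dict; the `predicted` table and the `pick_for_each_user` name are hypothetical, and the real job does more than this.

```python
# Hypothetical predicted ratings: {user: {newsletter: predicted_stars}}.
predicted = {
    "sally": {"Giraffe Weekly": 4.2, "Future of Discovery": 2.4, "Tea Times": 3.9},
    "bob": {"Giraffe Weekly": 3.1, "Future of Discovery": 3.5, "Tea Times": 2.0},
}


def pick_for_each_user(predicted):
    """For each user, choose the newsletter with the highest predicted rating."""
    return {
        user: max(scores, key=scores.get)
        for user, scores in predicted.items()
        if scores  # skip users we have no predictions for
    }


print(pick_for_each_user(predicted))
```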

Back to referral links. We keep track of how much traffic each newsletter sends to us and how much we send to them. If we owe a newsletter some traffic, we add a slight twist to the algorithm: instead of asking “which newsletter would this user rate the highest,” we ask “which user would rate this newsletter the highest,” and then we forward the newsletter to that user.
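Here is a rough Python sketch of that twist. The `traffic` bookkeeping, the function names, and the details (one extra forward per owed newsletter per day, biggest debt first) are assumptions for illustration; the post doesn’t spell out how the real system settles the balance.

```python
# Hypothetical predicted ratings: {user: {newsletter: predicted_stars}}.
predicted = {
    "sally": {"Giraffe Weekly": 4.2, "Future of Discovery": 2.4},
    "bob": {"Giraffe Weekly": 3.1, "Future of Discovery": 3.5},
    "alice": {"Giraffe Weekly": 4.8, "Future of Discovery": 4.0},
}

# Hypothetical traffic ledger: visitors each newsletter sent us vs. what we sent them.
traffic = {
    "Giraffe Weekly": {"received": 3, "sent": 5},
    "Future of Discovery": {"received": 10, "sent": 4},
}


def owed_newsletters(traffic):
    """Newsletters that have sent us more traffic than we've sent back, biggest debt first."""
    debts = {name: t["received"] - t["sent"] for name, t in traffic.items()}
    return sorted((n for n, d in debts.items() if d > 0), key=debts.get, reverse=True)


def assign(predicted, traffic):
    """Return {user: newsletter} for today's forwards."""
    picks = {}
    # The twist: for each owed newsletter, find the user who'd rate it highest.
    for newsletter in owed_newsletters(traffic):
        candidates = {
            user: scores[newsletter]
            for user, scores in predicted.items()
            if user not in picks and newsletter in scores
        }
        if candidates:
            picks[max(candidates, key=candidates.get)] = newsletter
    # Everyone else gets the usual treatment: their own top-rated newsletter.
    for user, scores in predicted.items():
        if user not in picks and scores:
            picks[user] = max(scores, key=scores.get)
    return picks


print(assign(predicted, traffic))
```

In this made-up data, Alice’s top pick would normally be Giraffe Weekly, but since Future of Discovery is owed traffic and she is the user most likely to enjoy it, she gets that one instead.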

Ta-da, that’s it. This is also how we recommend ads, though in that case instead of 5-star ratings we use binary “clicked on the ad/didn’t click on the ad” ratings.