We need user-centric data

This is an idea I’ve had bumping around in my head for almost four years now, in various forms. I’ve been thinking about it again lately, so here’s the current iteration.

In ye olden days, we predominantly used local software which operated on data stored on your machine. All your data was in one place (the diagram on the right). In the shift to web software, users’ data is now fragmented across many different services (the diagram on the left). Obviously, moving to the web has been an improvement overall. But we can have our cake and eat it too; the key thing is that there needs to be one place where all your data resides. Instead of that place being the hard drive in your laptop, it could be a location somewhere on the web.

Why? There are several potential benefits of making data user-centric rather than app-centric, and focusing on different benefits will yield different solutions. I’m mainly interested in extensibility and open data.

For extensibility, think about notifications. Imagine if all your email, your social media interactions, other people’s interactions with your content, etc. were aggregated into a single feed. Then, either through code or some graphical interface, you define a filter function. For each item in the feed, that function signals whether you should receive it as a notification right now or not. You could easily decide which things you actually want notifications for, and you could batch them: any notifications below a certain priority would be hidden until the next “batch time” (perhaps once every four hours).
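As a rough sketch of what such a filter function might look like, here is one in Python. Everything in it is an assumption made up for illustration: the `FeedItem` fields, the `priority` rules, the `VIP_SENDERS` set and the threshold values don't come from any real service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical settings, chosen only for illustration.
BATCH_INTERVAL = timedelta(hours=4)  # the "batch time" from the text
URGENT = 2                           # priority at which items skip batching
VIP_SENDERS = {"alice@example.com"}


@dataclass
class FeedItem:
    source: str  # e.g. "email" or "social"
    kind: str    # e.g. "message", "reply" or "like"
    sender: str


def priority(item: FeedItem) -> int:
    """Score an item: 2 = urgent, 1 = normal, 0 = low."""
    if item.source == "email" and item.sender in VIP_SENDERS:
        return 2
    if item.kind == "reply":
        return 1
    return 0


def notify_now(item: FeedItem, now: datetime, last_batch: datetime) -> bool:
    """Return True if the item should be shown as a notification right now.

    Urgent items always go through. Everything else stays hidden until
    BATCH_INTERVAL has passed since the last batch was delivered.
    """
    if priority(item) >= URGENT:
        return True
    return now - last_batch >= BATCH_INTERVAL
```

With these made-up rules, an email from a VIP sender would show up immediately, while a “like” would wait for the next batch. The point is only that the rules live in one place that you control, not in each app’s settings page.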

To explain what I mean by open data, imagine everyone had an RSS feed that contained all their interactions with content (except those they want to remain private). The feed items would be things like “published a blog post”, “liked this tweet”, “thinks the author of this comment is a troll”, etc. And also, “follows this person” (with a link to that person’s RSS feed).

Now, say you want to compete with Google, Facebook, Twitter, etc. The barrier to entry just got a whole lot lower. You can easily write a crawler that indexes someone’s RSS feed, indexes the feeds for everyone they follow, and so on. That data would be rich for building your own search engine, social network, or (as I’m doing with Findka) recommender system. In my opinion, something like this needs to become the foundation of information discovery.
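To make the crawler idea concrete, here is a minimal sketch in Python. It is not a real implementation: the feed item shape (`type` and `target` keys), the example URLs and the in-memory `FEEDS` dict standing in for the web are all assumptions for illustration. A real crawler would fetch and parse actual feeds instead.

```python
from collections import deque

# Hypothetical stand-in for the web: feed URL -> list of feed items.
FEEDS = {
    "https://example.com/alice.xml": [
        {"type": "published", "target": "https://example.com/alice/post-1"},
        {"type": "follows", "target": "https://example.com/bob.xml"},
    ],
    "https://example.com/bob.xml": [
        {"type": "liked", "target": "https://example.com/some-tweet"},
        {"type": "follows", "target": "https://example.com/alice.xml"},
        {"type": "follows", "target": "https://example.com/carol.xml"},
    ],
    "https://example.com/carol.xml": [
        {"type": "published", "target": "https://example.com/carol/post-9"},
    ],
}


def fetch_feed(url):
    """Return the items in a feed (here, just a dict lookup)."""
    return FEEDS.get(url, [])


def crawl(start_url, fetch=fetch_feed, max_depth=2):
    """Breadth-first crawl of a feed and the feeds it follows.

    Returns a dict mapping each visited feed URL to its non-"follows"
    items. Feeds more than max_depth hops from the start are not visited.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    index = {}
    while queue:
        url, depth = queue.popleft()
        items = fetch(url)
        index[url] = [item for item in items if item["type"] != "follows"]
        if depth >= max_depth:
            continue
        for item in items:
            if item["type"] == "follows" and item["target"] not in seen:
                seen.add(item["target"])  # avoid loops like alice <-> bob
                queue.append((item["target"], depth + 1))
    return index
```

Calling `crawl("https://example.com/alice.xml")` would walk from Alice to Bob to Carol and collect what each of them published or liked. The resulting index is the raw material a search engine or recommender could build on.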

The vision here is compelling; making it a reality is hard. I’ve written a few times in the past about possible implementations. My thoughts have continued to evolve, and I now have different ideas about the best approach. Sadly I don’t have a grant that allows me to write open-source software all day—but typically, I eventually reach a point where I’ve thought about an idea so much that I have no choice but to shirk my responsibilities and start hacking on it (that’s how Biff was born). That might happen again soon.

In the meantime, I may write up some technical design thoughts next week.

Other stuff:

  • I signed up for a newsletter community on IntroSend after a short email exchange with the author. Serendipity is one of the big challenges of interacting with people on the web. I’ve never been to Silicon Valley, but based on what PG says, that is (was) one of the main benefits: you make valuable connections by chance. On the internet, bumping into people takes more work, and I think anything that reduces friction is worth exploring (especially if it’s introvert-friendly).

  • I recently finished implementing The Sample’s algorithm, so now the issues you get aren’t purely random anymore. It’s a gluing-together of Surprise, fastText, and the tf-idf portions of this tutorial. It’s similar to Findka Essay’s algorithm, but with a few improvements (and adapted for The Sample’s data). “Glue” is a pretty good description of my data science skills—I’m looking forward to April when my brother graduates and takes over the algorithm.

  • You can also submit your own newsletter to The Sample.

  • I really liked these two posts from Scott Alexander this week.