How do you rank trendiness?
That’s the question we asked ourselves when we started thinking about an algorithm to sort playgrounds. At first, we hand-picked the playgrounds that looked best to us and showed them on the “featured” playgrounds page. That was only meant as a short-term solution.
We could certainly assess the quality of playgrounds in the technologies we know, but what about the others? Besides, we don’t want to dictate what should be learned based on what seems trendy to us. We believe that a judgment made by the community itself carries far more authority.
We did some research, looking in particular at how the Reddit and Hacker News ranking algorithms are built, and eventually came up with a formula. It’s not ideal, but it’s acceptable for a first version. We also wanted to release this feature together with the updated design of the explore page.
The Formula for Trendiness
T(l,v) = Q(l,v) * W(l) * F(v)
For every playground, the trendiness T depends on its likes l and its views v. We split it into the following three factors:
- Q – the quality factor
- W – the waning factor
- F – the freshness factor
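Because the three factors are simply multiplied, a factor equal to 1 leaves the score untouched, while a factor above or below 1 boosts or penalizes it. Here is a minimal sketch of that composition in Python. The `Factors` container, the function names and the sample values are all hypothetical and only there for illustration; they are not taken from our production code.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Hypothetical container for one playground's three factor values."""
    name: str
    quality: float    # Q(l, v)
    waning: float     # W(l)
    freshness: float  # F(v)


def trendiness(p: Factors) -> float:
    # T(l, v) = Q(l, v) * W(l) * F(v)
    return p.quality * p.waning * p.freshness


def rank(playgrounds):
    # Highest trendiness first.
    return sorted(playgrounds, key=trendiness, reverse=True)


# Made-up factor values, for illustration only.
playgrounds = [
    Factors("old-but-loved", quality=0.08, waning=0.5, freshness=1.0),
    Factors("brand-new", quality=0.05, waning=1.0, freshness=2.0),
]
ranked = [p.name for p in rank(playgrounds)]
```

With these made-up numbers, the newer playground ends up first even though its quality factor is lower, because its likes are recent and it still benefits from the freshness boost.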
The Quality Factor
Bad content usually doesn’t get trendy, so playgrounds at the top of the list should be of exceptional quality. We used to rate playgrounds from 1 to 5 stars. We could have used the average rating as the quality factor, but we wanted to try a different, much simpler ranking method.
Now we have likes. We believe that a like shows the reader’s appreciation for the quality of a playground: you wouldn’t like a bad one.
Thus, the quality factor of a playground is proportional to the number of its likes and inversely proportional to the number of its views.
It is a fairly intuitive factor: the larger the share of readers who like a playground, the better it is. Keep it simple.
The result can feel a bit paradoxical, though: a trendy topic reaches a broad audience without necessarily interesting all of it, which greatly increases the number of views but not the number of likes. For example, if a playground reaches the front page of a large subreddit such as r/programming, it will actually drop in the ranking.
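To make the ratio and its side effect concrete, here is a small Python sketch. The proportionality constant `k`, the handling of zero views and the sample numbers are assumptions made for the example, not the values we use.

```python
def quality(likes: int, views: int, k: float = 1.0) -> float:
    """Quality factor: proportional to likes, inversely proportional to views.

    k is a hypothetical proportionality constant.
    """
    if views <= 0:
        # Assumption: a playground nobody has seen has no quality signal yet.
        return 0.0
    return k * likes / views


# Made-up numbers: the same playground before and after a front-page spike.
before = quality(likes=40, views=500)
after = quality(likes=55, views=20_000)
```

In this example the playground gained likes during the spike, yet `after` is far smaller than `before`, because views grew much faster than likes.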
Options to improve it
Instead of taking the number of views into account, we could consider the number of reads: the number of people who actually stay on the page, read it and try the code examples in the playground. It would also counter artificial likes from people wanting to push a playground to the top without reading it.
The Waning Factor
We value likes, but if we want to rank trendiness, then we have to take time into account, too. A topic that was trendy a month ago should not get the same attention as a topic that is trendy today. To give you an idea of the waning factor’s effect, it makes one 1-day-old like weigh the same as ten 30-day-old likes.
The waning factor reduces the value of a like in inverse proportion to its age.
An old playground can still resurface if it still matters to people and keeps getting likes.
On the other hand, the ranking evolves in steps rather than continuously, since we update the age of likes only once a day. A single update can shift the ranking significantly.
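The exact decay curve is not spelled out here, so the following Python sketch is only one possible reading. It assumes a power-law decay whose exponent is calibrated on the example above (one 1-day-old like weighing the same as ten 30-day-old likes), and it assumes ages are counted in whole days with a minimum of one day. The names `P`, `like_weight` and `weighted_likes` are hypothetical.

```python
import math

# Assumption: weight = age_days ** -P, with P chosen so that a 30-day-old
# like weighs one tenth of a 1-day-old like.
P = math.log(10) / math.log(30)


def like_weight(age_days: int) -> float:
    # Assumption: ages are whole days (updated once a day), minimum 1.
    age = max(age_days, 1)
    return age ** -P


def weighted_likes(like_ages_days) -> float:
    # Each like contributes according to its age instead of counting as 1.
    return sum(like_weight(age) for age in like_ages_days)


one_fresh_like = weighted_likes([1])
ten_old_likes = weighted_likes([30] * 10)
```

Under these assumptions `one_fresh_like` and `ten_old_likes` come out equal up to floating-point rounding. Since ages only change once a day, every weight changes at the same moment, which is why the ranking moves in steps.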
Options to improve it
To make this aging factor more accurate, we could also take into account the age of views. If an old playground suddenly gets a lot of reads, then it means it’s trendy again and should rank higher.
The Freshness Factor
Not all authors will promote their playgrounds. For this reason, the algorithm should allow fresh content to appear at the top for a while, until the quality factor weighs in.
The freshness factor decreases as the number of views increases, until a certain number of views is reached; beyond that point it equals 1 and no longer affects the score. Because presenting a playground with just a handful of views as trendy could look weird, we also added a penalty for playgrounds with very few views.
This factor values any kind of content and gives every topic the opportunity to be seen by a reasonable number of people.
The freshness factor peaks at a specific number of views. That number is arbitrary, and it’s quite difficult to determine what the right value should be.
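The shape described above (a penalty for very few views, a peak, then a decrease down to 1) can be sketched as a piecewise function. The Python example below assumes linear segments and uses made-up constants; `PEAK_VIEWS`, `CUTOFF_VIEWS`, `MAX_BOOST` and `MIN_FACTOR` are hypothetical names and values, not the ones we run in production.

```python
PEAK_VIEWS = 50      # hypothetical: number of views where the boost is highest
CUTOFF_VIEWS = 500   # hypothetical: from here on, the factor equals 1
MAX_BOOST = 3.0      # hypothetical: factor value at PEAK_VIEWS
MIN_FACTOR = 0.5     # hypothetical: penalty applied at zero views


def freshness(views: int) -> float:
    views = max(views, 0)
    if views >= CUTOFF_VIEWS:
        # Enough views: the factor no longer affects the score.
        return 1.0
    if views <= PEAK_VIEWS:
        # Very few views: ramp up from the penalty to the maximum boost.
        return MIN_FACTOR + (MAX_BOOST - MIN_FACTOR) * views / PEAK_VIEWS
    # Between the peak and the cutoff: decrease linearly from the boost to 1.
    progress = (views - PEAK_VIEWS) / (CUTOFF_VIEWS - PEAK_VIEWS)
    return MAX_BOOST - (MAX_BOOST - 1.0) * progress


samples = {v: freshness(v) for v in (0, 50, 275, 500, 10_000)}
```

With these constants, a playground with no views is penalized (0.5), the boost peaks at 50 views (3.0), and the factor settles at 1 from 500 views onward. Moving `PEAK_VIEWS` changes which playgrounds get the spotlight, which is exactly why picking that number is hard.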
Options to improve it
Once again, we could take reads into account instead of views. Having a “NEW” label could help, too, so people understand why fresh playgrounds get to the top.
The current version of the algorithm is not ideal, and we’ll continue to look for ways to improve it. But we should keep in mind that the perfect algorithm doesn’t exist: there is always an edge case to take into account. Adding a factor to correct it just complicates the whole algorithm, and possibly makes other factors less relevant.
We’ll analyze how the current version works in production, then start working on a second version. Meanwhile, don’t hesitate to tell us how we could improve it.