
Scores: understanding Reddit's algorithms

It is important to understand that Reddit's score-calculation algorithms differ depending on whether they apply to a discussion thread (submission) or to the comments inside that thread.

The explanations below are taken from https://www.datadial.net/blog/how-the-reddit-algorithm-works/. The emphasis is ours.

Submission Algorithm

« As Reddit is an open source website, its code is freely available. The site’s algorithms are written in Python and the sorting algorithms are executed in Pyrex. Reddit has a story algorithm that it always uses, which is called the Reddit hot ranking. With the Reddit story algorithm, the number of votes and the submission time of a link have the largest effect on where a story will rank.

This is because Reddit implements a logarithm function in its algorithm. With this type of algorithm, the first votes on a link are more valuable than later votes on a link. For example, the first 10 up-votes will have the same value as the next 100 and so on. This means that as a link gets older, its ranking will slowly degrade, as the impact of the up-votes it gets becomes less significant. Conversely, it is also important to get some initial traction on a submission in order to give it early visibility.

Reddit ranks an item by calculating the number of votes a link has and then subtracting points based on how old that link is. This means that newer links generally rank higher than older links. This keeps the front page fresh, and ensures that links with thousands of up-votes aren’t stuck on the front page for weeks or months at a time. Stories that get a more equal range of up-votes and down-votes will generally be ranked lower than stories that have a larger percentage of up-votes. »
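The behaviour described in the quote can be illustrated with a short Python sketch. It is modeled on the commonly circulated form of Reddit's open-sourced hot function, not on code taken from the article quoted above. The constants 45000 (seconds) and 1134028003 (a reference epoch) come from that circulated version and should be treated as assumptions here, as should the names `hot` and `EPOCH_OFFSET`. Note that in this form age is not literally subtracted: newer submissions simply receive a larger time term, which has the same relative effect.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Assumed reference epoch (seconds), as found in the circulated version.
EPOCH_OFFSET = 1134028003


def hot(ups: int, downs: int, date: datetime) -> float:
    """Hot score sketch: log of the net votes plus a time term."""
    s = ups - downs
    # Logarithmic scaling: going from 10 to 100 net votes adds as much
    # as going from 1 to 10.
    order = log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    # Seconds elapsed since the assumed reference epoch.
    seconds = date.timestamp() - EPOCH_OFFSET
    # Assumed constant: 45000 seconds of age offsets one log10 unit of votes.
    return round(sign * order + seconds / 45000, 7)


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
older_popular = hot(100, 0, t0)
newer_modest = hot(10, 0, t0 + timedelta(seconds=45000))
print(older_popular, newer_modest)
```

Under these assumed constants, a tenfold increase in net votes is worth the same as being 45000 seconds (12.5 hours) newer, so the two scores printed above are equal up to rounding.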

Submission and comment: two different algorithms

« For comments, Reddit uses a different algorithm, as using the hot ranking algorithm wouldn’t be practical. For comments, it is most logical to list the best rated comments prominently, rather than giving precedence to the older comments. Instead of using the hot ranking algorithm, Reddit uses a **confidence sort algorithm based on the Wilson score interval for its comments**.

With a confidence sort algorithm, the best rated comments that the system has the most data for will be ranked the highest. For example, a comment with ten up-votes and 1 down vote will rank higher than a comment with only 1 up-vote and no down-votes, even though the latter comment has a 100% up-vote rate. The comments are ranked by data sampling and the date the comments are submitted isn’t an active factor.

Understanding the basics of the Reddit algorithm can help you to better understand the way that the platform works, and be able to use it more effectively. »

On the comment ranking algorithm

Some background can be found here, for example.

This algorithm makes it possible to approximate, with good confidence, the weighted average of comment scores from only a small number of comments (which are treated as samples of the full set of comments, a set still unknown at time t). It probably also lets Reddit avoid large jumps in score as the number of votes in the discussion thread grows.
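To make the idea concrete, here is a minimal sketch of the lower bound of the Wilson score interval, computed from the up-votes and down-votes on a single comment (each vote being treated as one sample). The function name `confidence` and the value z = 1.96 (roughly a 95% interval) are assumptions chosen for illustration; the confidence level Reddit actually uses may differ.

```python
from math import sqrt


def confidence(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the up-vote ratio.

    z is the normal quantile for the chosen confidence level
    (1.96 is an assumed value, not Reddit's documented setting).
    """
    n = ups + downs
    if n == 0:
        # No votes yet: no evidence, so the lower bound is 0.
        return 0.0
    p = ups / n  # observed fraction of up-votes
    centre = p + z * z / (2 * n)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# The example from the quoted text: 10 up / 1 down versus 1 up / 0 down.
well_sampled = confidence(10, 1)
barely_sampled = confidence(1, 0)
print(well_sampled > barely_sampled)
```

With few votes the interval is wide, so the lower bound stays low even at a 100% up-vote rate; as votes accumulate the bound tightens towards the observed ratio, which is consistent with scores moving gradually rather than jumping.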

Scores: distribution and relationship to depth

We do indeed observe that scores tighten as depth in the discussion threads increases (the figure keeps only scores from -25 to +25):

[Figure: score-nb-comments]

It is essentially the first 10 levels of depth that influence the score.