How do reddit’s popularity algorithms promote information about COVID-19?

Since 2016, the Citizens and Technology Lab (CAT Lab) has taken ongoing snapshots of reddit’s algorithms every 2-3 minutes. We created this dashboard to inform the design of our collaborations with reddit communities on COVID-19 public health research. We have made the dashboard, data, and code public for researchers and practitioners who study the role of algorithms in society.

During a pandemic, people need evolving information to guide their health decisions and what they share with others. These needs continue across the months-long pandemic cycle of prevention, resilience, and recovery.

Reddit is already a major hub of COVID-19 information for hundreds of millions worldwide. News and norms spread rapidly across the site, promoted by people and algorithms. Reddit memes, jokes, and ideas are also shared widely on other platforms. While the reddit ecosystem could be a powerful force for good, public health experts have also argued that cascades of human and algorithm sharing are the biggest pandemic risk. And if you wonder whether this social news site matters, reddit reportedly has more active users than Twitter.

One reason to monitor algorithms is that information interventions can have unexpected side-effects on ranking algorithms. In a large-scale experiment with r/worldnews on reddit, we found that encouraging fact-checking changed human behavior, which in turn changed what reddit’s popularity algorithms promoted.

How we collect data

This dashboard is generated every six hours using the latest data within CAT Lab’s research software. The output of this dashboard is based on the last 3 days (we are limiting how often we update the dashboard for ethics reasons listed below). It uses these sources:

  • snapshots every 2 minutes of two key reddit algorithms:
    • reddit’s HOT algorithm, the default popularity ranking on the site
    • reddit’s TOP algorithm
  • data about posts, currently queried from the reddit API at the time of report generation

With every report, this software also publishes research-quality datasets to covid-algotracker on github, along with information about the settings, keywords, and software behind the data.

This report is based on 3 days of data up to the following snapshots:

  • Last HOT snapshot: 2020-06-02 19:59:21 UTC
  • Last TOP snapshot: 2020-06-02 19:55:37 UTC

How we identify COVID-19 posts

This report identifies a post as related to COVID-19 if the lower-case title or the text of the submission matches any of the following English-language terms (code here). Terms will be updated as the pandemic evolves.

##  [1] "coronavirus"       "covid"             "corona"           
##  [4] "wuhan"             "pandemic"          "outbreak"         
##  [7] "epidemic"          "cdc"               "social distanc"   
## [10] "self-isolate"      "isolation"         "self isolate"     
## [13] "quarantine"        "sanitizer"         "toilet paper"     
## [16] "tp"                "wipes"             "world health"     
## [19] "community spread"  "herd immunity"     "respirators"      
## [22] "shelter in place"  "lockdown"          "n95"              
## [25] "n-95"              "work from home"    "virus"            
## [28] "ventilator"        "fauci"             "stimulus"         
## [31] "flatten the curve" "chloroquine"       "hcq"              
## [34] "serological"       "immunity test"     "containment"      
## [37] "antibody test"     "stay-at-home"      "stay at home"     
## [40] "reopen"            "re-open"
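
As a rough illustration of the matching step described above, here is a minimal Python sketch. It assumes plain substring matching on the lower-cased title and submission text; the function name `is_covid_related` and the choice to lower-case both fields are assumptions made for this example, and the lab's own matching code (linked above) may work differently.

```python
# Hypothetical sketch, not the lab's code: flag a post as COVID-19 related
# when any term appears as a substring of the lower-cased title or text.
COVID_TERMS = [
    "coronavirus", "covid", "corona", "wuhan", "pandemic", "outbreak",
    "epidemic", "cdc", "social distanc", "self-isolate", "isolation",
    "self isolate", "quarantine", "sanitizer", "toilet paper", "tp",
    "wipes", "world health", "community spread", "herd immunity",
    "respirators", "shelter in place", "lockdown", "n95", "n-95",
    "work from home", "virus", "ventilator", "fauci", "stimulus",
    "flatten the curve", "chloroquine", "hcq", "serological",
    "immunity test", "containment", "antibody test", "stay-at-home",
    "stay at home", "reopen", "re-open",
]


def is_covid_related(title: str, selftext: str = "") -> bool:
    """Return True if any term occurs in the lower-cased title or text."""
    title_lc = title.lower()
    text_lc = selftext.lower()  # assumption: the text is lower-cased too
    return any(term in title_lc or term in text_lc for term in COVID_TERMS)
```

Under plain substring matching, short terms such as "tp" also match inside longer words (for example "output"), so a filter like this should be read as approximate.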

How algorithms promote reddit posts over time

Reddit’s public rankings are a classic recommender system. Using information from upvotes, downvotes, post age, and maybe other data, reddit’s algorithms create a ranked list of suggestions for readers. According to Christian Sandvig, feeds like this outsource human knowledge and attention to machines. Hybrid human-machine curation can be valuable and difficult during crises, as Alex Leavitt has described in extensive research on reddit.

This dashboard reviews posts that appeared anywhere in the top 100 recommendations for TOP and HOT. On average over the last 3 days, COVID-19 articles that appeared anywhere in the rankings stayed on HOT for 4.8 hours and on TOP for 13.6 hours.
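
One plausible way to estimate time on a ranking from periodic snapshots is sketched below. It is an assumption, not the dashboard's documented method: every snapshot that contains a post is counted as one interval of exposure, and the snapshot structure (a timestamp paired with a list of post ids), the two-minute default, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_on_ranking(snapshots, interval=timedelta(minutes=2)):
    """Estimate how long each post stayed in a ranking.

    `snapshots` is a list of (timestamp, post_ids) pairs (assumed structure).
    Each snapshot that includes a post adds one `interval` to its total.
    """
    counts = defaultdict(int)
    for _timestamp, post_ids in snapshots:
        for post_id in set(post_ids):  # count a post once per snapshot
            counts[post_id] += 1
    return {post_id: n * interval for post_id, n in counts.items()}


def mean_hours(durations):
    """Average the per-post durations and express the result in hours."""
    if not durations:
        return 0.0
    total = sum(durations.values(), timedelta())
    return total.total_seconds() / 3600 / len(durations)


# Small usage example with made-up post ids.
t0 = datetime(2020, 6, 2, 19, 0, 0)
example = [
    (t0, ["a", "b"]),
    (t0 + timedelta(minutes=2), ["a"]),
    (t0 + timedelta(minutes=4), ["a"]),
]
durations = time_on_ranking(example)
```

An alternative would be to take the span between a post's first and last appearance, which treats gaps in the ranking as time on the ranking; the two choices can give different averages.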

HOT and TOP aren’t the only algorithms that influence what people see. For several years now, reddit has provided a personalized newsfeed algorithm to logged-in users, based on the communities that people subscribe to. The company has also developed BEST and POPULAR algorithms, which we can add if there’s interest.

Here’s what that trajectory looks like for one randomly-sampled post that rose to prominence and then fell. In the chart, a rank position of 0 means that the article is at the very top. People typically have to scroll down to see more than 3-4 posts. A rank position of -99 means that you would have to scroll past 99 other posts to see that one. In this example, notice that the HOT rankings are more volatile than TOP.
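
The rank convention described here (0 at the very top, -99 for the hundredth slot) can be sketched in Python as follows. The snapshot structure, the function names, and the volatility measure (mean absolute change in rank between consecutive observations) are assumptions chosen for illustration; the dashboard does not say how it quantifies volatility.

```python
def rank_position(index: int) -> int:
    """Map a 0-based list index to the chart convention: 0 is top, -99 is slot 100."""
    return -index


def trajectory(snapshots, post_id):
    """Return (timestamp, rank position) pairs for snapshots containing the post.

    `snapshots` is a list of (timestamp, post_ids) pairs in rank order
    (assumed structure).
    """
    points = []
    for timestamp, post_ids in snapshots:
        if post_id in post_ids:
            points.append((timestamp, rank_position(post_ids.index(post_id))))
    return points


def volatility(points):
    """Hypothetical measure: mean absolute rank change between observations."""
    if len(points) < 2:
        return 0.0
    ranks = [rank for _timestamp, rank in points]
    changes = [abs(later - earlier) for earlier, later in zip(ranks, ranks[1:])]
    return sum(changes) / len(changes)
```

Computing `volatility(trajectory(hot_snapshots, post_id))` and the same for TOP would give one rough way to compare how much a post moves in each ranking.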

Currently-promoted COVID-19 posts on reddit

Posts currently promoted by the HOT algorithm

In the latest snapshot, 4.0% of posts appearing in the HOT algorithm on reddit are COVID-19 related.

  • (rank 0 to -99) (upvotes) (title)
  • (-15)(🔺 6221) Black Lives Matter
    • r/askscience. (💬 571) Domain: self.askscience. Highest Rank: -15. Time on hot: 38 minutes
  • (-59)(🔺17715) America seems to have successfully prevented a second wave of corona
    • r/Jokes. (💬 391) Domain: self.Jokes. Highest Rank: -51. Time on hot: 107 minutes
  • (-66)(🔺 3022) IAmA Supports Black Lives Matter
    • r/IAmA. (💬 0) Domain: self.IAmA. Highest Rank: -66. Time on hot: 8 minutes
  • (-67)(🔺28091) Wuhan tested the entire city of 9,899,829 people in 10 days, found no active case, 300 asymptomatic carriers.
    • r/Coronavirus. (💬1751) Domain: Highest Rank: -20. Time on hot: 252 minutes