How do reddit’s popularity algorithms promote information about COVID-19?

Since 2016, the Citizens and Technology Lab (CAT Lab) has taken ongoing snapshots of reddit’s algorithms every 2-3 minutes. We created this dashboard to inform the design of our collaborations with reddit communities on COVID-19 public health research. We have made the dashboard, data, and code public for researchers and practitioners who study the role of algorithms in society.

During a pandemic, we need evolving information to guide our health decisions and what we share with others. These needs persist across the months-long pandemic cycle of prevention, resilience, and recovery.

Reddit is already a major hub of COVID-19 information for hundreds of millions worldwide. News and norms spread rapidly across the site, promoted by people and algorithms. Reddit memes, jokes, and ideas are also shared widely on other platforms. While the reddit ecosystem could be a powerful force for good, public health experts have also argued that cascades of human and algorithm sharing are the biggest pandemic risk. And if you wonder whether this social news site matters, reddit reportedly has more active users than Twitter.

One reason to monitor algorithms is that information interventions can have unexpected side-effects on ranking algorithms. In a large-scale experiment with r/worldnews on reddit, we found that encouraging fact-checking changed what reddit’s popularity algorithms promoted, by changing how people behaved.

How we collect data

This dashboard is generated every six hours using the latest data in CAT Lab’s research software. Its output is based on the last 3 days of data (we limit how often we update the dashboard for ethical reasons described below). It uses these sources:

  • snapshots every 2 minutes of two key reddit algorithms:
    • reddit’s HOT algorithm, the default popularity ranking on the site
    • reddit’s TOP algorithm
  • data about posts, currently queried from the reddit API at the time of report generation

With every report, this software also publishes research-quality datasets to covid-algotracker on github, along with information about the settings, keywords, and software behind the data.

This report is based on 3 days of data up to the following snapshots:

  • Last HOT snapshot: 2020-06-02 19:59:21 UTC
  • Last TOP snapshot: 2020-06-02 19:55:37 UTC

How we identify COVID-19 posts

This report identified a post as related to COVID-19 if its lower-cased title or text matched any of the following English-language terms (the matching code is public). Terms will be updated as the pandemic evolves.

"coronavirus", "covid", "corona", "wuhan", "pandemic", "outbreak", "epidemic", "cdc", "social distanc", "self-isolate", "isolation", "self isolate", "quarantine", "sanitizer", "toilet paper", "tp", "wipes", "world health", "community spread", "herd immunity", "respirators", "shelter in place", "lockdown", "n95", "n-95", "work from home", "virus", "ventilator", "fauci", "stimulus", "flatten the curve", "chloroquine", "hcq", "serological", "immunity test", "containment", "antibody test", "stay-at-home", "stay at home", "reopen", "re-open"
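To illustrate how a keyword list like this can flag posts, here is a minimal Python sketch. It assumes plain, case-insensitive substring matching on a submission’s title and text (the truncated term "social distanc" suggests substring matching). The names `COVID_TERMS` and `is_covid_post` are hypothetical, and this is not the dashboard’s published code.

```python
# Hypothetical sketch: flag a post as COVID-19 related if any term appears
# as a substring of its lower-cased title or text (matching rule assumed).
COVID_TERMS = [
    "coronavirus", "covid", "corona", "wuhan", "pandemic", "outbreak",
    "epidemic", "cdc", "social distanc", "self-isolate", "isolation",
    "self isolate", "quarantine", "sanitizer", "toilet paper", "tp",
    "wipes", "world health", "community spread", "herd immunity",
    "respirators", "shelter in place", "lockdown", "n95", "n-95",
    "work from home", "virus", "ventilator", "fauci", "stimulus",
    "flatten the curve", "chloroquine", "hcq", "serological",
    "immunity test", "containment", "antibody test", "stay-at-home",
    "stay at home", "reopen", "re-open",
]


def is_covid_post(title: str, selftext: str = "") -> bool:
    """Return True if any term occurs in the lower-cased title or text."""
    # A newline separator keeps a match from spanning title and text.
    text = f"{title}\n{selftext}".lower()
    return any(term in text for term in COVID_TERMS)


print(is_covid_post("Wuhan tested the entire city"))       # True
print(is_covid_post("My cat learned to open doors"))       # False
print(is_covid_post("Title", "We are social distancing"))  # True
```

Substring matching favors recall over precision. For example, the short term "tp" also matches inside "http", so a post whose text contains a link would be flagged.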

How algorithms promote reddit posts over time

Reddit’s public rankings are a classic recommender system. Using upvotes, downvotes, post age, and possibly other data, reddit’s algorithms create a ranked list of suggestions for readers. According to Christian Sandvig, feeds like this outsource human knowledge and attention to machines. Hybrid human-machine curation can be both valuable and difficult during crises, as Alex Leavitt has described in extensive research on reddit.
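Reddit does not publish its current HOT algorithm. To give some intuition for how votes and age can trade off, the Python sketch below reimplements the "hot" formula from reddit’s archived open-source code (the `_sorts.pyx` module). The live ranking in 2020 may well differ, and the name `hot_score` is hypothetical. Under this formula, the score grows with the base-10 logarithm of net votes and linearly with posting time.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Offset (Unix seconds, December 2005) used in reddit's archived "hot" code.
EPOCH_OFFSET = 1134028003


def hot_score(ups: int, downs: int, posted: datetime) -> float:
    """Hot score following reddit's archived open-source formula.

    `posted` must be timezone-aware. The live HOT ranking may differ.
    """
    net = ups - downs
    order = log10(max(abs(net), 1))              # +1 for every 10x net votes
    sign = 1 if net > 0 else -1 if net < 0 else 0
    seconds = posted.timestamp() - EPOCH_OFFSET
    return round(sign * order + seconds / 45000, 7)  # 45,000 s = 12.5 hours


t0 = datetime(2020, 6, 1, tzinfo=timezone.utc)
older = hot_score(1000, 0, t0)                       # 1,000 net votes
newer = hot_score(10, 0, t0 + timedelta(hours=25))   # 10 net votes, 25 h later
print(abs(older - newer) < 1e-6)                     # True: the scores tie
```

In this formula, 12.5 hours of recency is worth as much as a tenfold increase in net votes. That is why the example post, with 100 times fewer votes, catches up after 25 hours.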

This dashboard reviews posts that appeared anywhere in the top 100 recommendations for TOP and HOT. On average over the last 3 days, COVID-19 articles that appeared anywhere in the rankings stayed on HOT for 4.8 hours and on TOP for 13.6 hours.
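Averages like these can be estimated from the snapshots. This Python sketch assumes a simplified, hypothetical data layout: a list of pairs, each holding a timestamp and a ranked list of post ids. It measures each post’s time as the span between the first and last snapshot in which the post appears in the top 100. The dashboard’s actual method may differ, for example in how it handles posts that drop out of the rankings and later return.

```python
from datetime import datetime, timedelta
from statistics import mean


def hours_on_list(snapshots, depth=100):
    """Map post id -> hours between its first and last appearance.

    `snapshots` is a list of (timestamp, [post ids ranked top to bottom]).
    A post that leaves and later returns is treated as present throughout.
    """
    first, last = {}, {}
    for ts, ranked_ids in sorted(snapshots, key=lambda snap: snap[0]):
        for post_id in ranked_ids[:depth]:   # only the top `depth` positions count
            first.setdefault(post_id, ts)
            last[post_id] = ts
    return {pid: (last[pid] - first[pid]).total_seconds() / 3600 for pid in first}


t0 = datetime(2020, 6, 2, 12, 0)
step = timedelta(minutes=2)                  # snapshot interval
snaps = [(t0, ["a", "b"]), (t0 + step, ["b", "a"]), (t0 + 2 * step, ["b"])]
hours = hours_on_list(snaps)
print(round(mean(hours.values()), 3))        # 0.05  (a: 2 min, b: 4 min)
```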

HOT and TOP aren’t the only algorithms that influence what people see. For several years, reddit has offered logged-in users a personalized newsfeed algorithm based on the communities they subscribe to. The company has also developed BEST and POPULAR algorithms, which we can add if there’s interest.

Here’s what that trajectory looks like for one randomly sampled post that rose and fell from prominence. In the chart, a rank position of 0 means that the post is at the very top. People typically have to scroll down to see more than 3-4 posts. A rank position of -99 means that you would have to scroll past 99 other posts to see that one. In this example, notice that the HOT rankings are more volatile than TOP.
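The rank-position convention can be expressed in a few lines of Python. In this sketch, a post’s position is the negative of its zero-based index in a snapshot, so the top post is 0 and the 100th post is -99. To compare volatility, the sketch uses the mean absolute change in position between consecutive sightings. This measure is chosen for illustration and is not necessarily the one behind the chart. The function names are hypothetical.

```python
from statistics import mean


def rank_positions(snapshots, post_id, depth=100):
    """Positions of one post over time: 0 = top, -99 = 100th place.

    `snapshots` is a time-ordered list of ranked post-id lists; snapshots
    where the post is outside the top `depth` are skipped.
    """
    return [-ranked[:depth].index(post_id)
            for ranked in snapshots if post_id in ranked[:depth]]


def volatility(positions):
    """Mean absolute change in position between consecutive sightings."""
    if len(positions) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(positions, positions[1:]))


hot = [["x", "a", "b"], ["a", "b", "x"], ["b", "x", "a"]]
top = [["x", "a", "b"], ["x", "a", "b"], ["a", "x", "b"]]
print(rank_positions(hot, "x"), volatility(rank_positions(hot, "x")))  # [0, -2, -1] 1.5
print(rank_positions(top, "x"), volatility(rank_positions(top, "x")))  # [0, 0, -1] 0.5
```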

Currently-promoted COVID-19 posts on reddit

Posts currently promoted by the HOT algorithm

In the latest snapshot, 4.0% of posts appearing in the HOT algorithm on reddit are COVID-19 related.

  • (rank 0 to -99) (upvotes) (title)
  • (-15)(🔺 6221) Black Lives Matter
    • r/askscience. (💬 571) Domain: self.askscience. Highest Rank: -15. Time on hot: 38 minutes
  • (-59)(🔺17715) America seems to have successfully prevented a second wave of corona
    • r/Jokes. (💬 391) Domain: self.Jokes. Highest Rank: -51. Time on hot: 107 minutes
  • (-66)(🔺 3022) IAmA Supports Black Lives Matter
    • r/IAmA. (💬 0) Domain: self.IAmA. Highest Rank: -66. Time on hot: 8 minutes
  • (-67)(🔺28091) Wuhan tested the entire city of 9,899,829 people in 10 days, found no active case, 300 asymptomatic carriers.
    • r/Coronavirus. (💬1751) Domain: aa.com.tr. Highest Rank: -20. Time on hot: 252 minutes
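The percentage above is a simple share of a single snapshot. If a snapshot holds the top 100 posts, four matching posts give 4.0%, which agrees with the four posts listed. Here is a minimal Python sketch in which a hypothetical predicate stands in for the keyword matcher:

```python
def covid_share(snapshot_posts, is_covid):
    """Percentage of posts in one snapshot that `is_covid` flags."""
    if not snapshot_posts:
        return 0.0
    flagged = sum(1 for post in snapshot_posts if is_covid(post))
    return 100.0 * flagged / len(snapshot_posts)


# Hypothetical snapshot: 4 matching titles out of 100.
posts = [{"title": "COVID update"}] * 4 + [{"title": "Cat picture"}] * 96
print(covid_share(posts, lambda p: "covid" in p["title"].lower()))  # 4.0
```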

Posts currently promoted by the TOP algorithm

In the latest snapshot, 3.0% of posts appearing in the TOP algorithm on reddit are COVID-19 related.

  • (rank 0 to -99) (upvotes) (title)
  • (-14)(🔺94722) Gotta worry about covid-19, racism, and now ninjas. Ran over a shuriken today on my way home from the store.
    • r/Wellthatsucks. (💬 1304) Domain: i.redd.it. Highest Rank: -14. Time on top: 758 minutes
  • (-36)(🔺74462) It seems to hard to remember we’re STILL in the middle of a global pandemic
    • r/MurderedByWords. (💬 1319) Domain: i.redd.it. Highest Rank: -36. Time on top: 157 minutes
  • (-67)(🔺56244) Megathread: President Donald Trump Mobilizes Military Amid National Unrest
    • r/politics. (💬17276) Domain: self.politics. Highest Rank: -62. Time on top: 701 minutes

Top communities promoted by reddit’s algorithms

Reddit’s algorithms recommended these subreddits at least 3 times over the last 3 days.

Top subreddit communities with COVID-19 content promoted by reddit’s algorithms in the past 3 days.

subreddit      frequency
Coronavirus    10
pics           5
politics       4
tifu           4
worldnews      3
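Frequency tables like this one can be produced with a counter and a threshold. The report does not say whether it counts distinct posts or every snapshot appearance, so this Python sketch assumes each distinct post counts once. The field names (`id`, `subreddit`, `domain`) and the function name are hypothetical. Passing `field="domain"` would build a domain table in the same way.

```python
from collections import Counter


def frequent_values(posts, field, min_count=3):
    """Return (value, count) pairs for values of `field` seen >= min_count times.

    `posts` may repeat a post across snapshots; each post id counts once.
    """
    value_by_post = {post["id"]: post[field] for post in posts}
    counts = Counter(value_by_post.values())
    return [(value, n) for value, n in counts.most_common() if n >= min_count]


posts = [
    {"id": "p1", "subreddit": "Coronavirus", "domain": "i.redd.it"},
    {"id": "p1", "subreddit": "Coronavirus", "domain": "i.redd.it"},  # repeat sighting
    {"id": "p2", "subreddit": "Coronavirus", "domain": "aa.com.tr"},
    {"id": "p3", "subreddit": "Coronavirus", "domain": "i.redd.it"},
    {"id": "p4", "subreddit": "pics", "domain": "i.imgur.com"},
]
print(frequent_values(posts, "subreddit"))            # [('Coronavirus', 3)]
print(frequent_values(posts, "domain", min_count=2))  # [('i.redd.it', 2)]
```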

Top domains promoted by reddit’s algorithms

These domains were recommended at least 3 times by HOT or TOP over the last 3 days.

Top web domains with COVID-19 content promoted by reddit’s algorithms in the past 3 days.

domain         frequency
i.redd.it      16
i.imgur.com    5
self.tifu      4
v.redd.it      3

Ethics

This dashboard summarizes and archives information that is currently public on reddit’s front page. To reduce risks to communities, this dashboard does not link directly to discussions. All author information is removed from the archival datasets.

You can read more about CAT Lab’s ethics values and processes in our post on strategies for ethical, accountable online behavior research.

Don’t use this dashboard for interventions (yet)

At CAT Lab, we believe that powerful digital interventions ought to be tested, both to discover whether they help rather than harm and to enable public accountability. Interventions that form feedback loops with algorithms can have unpredictable side-effects, and we advise against proceeding without evaluation. To add friction to such interventions, we update the dashboard only every six hours. The research datasets still provide fine-grained, minute-by-minute information for people who study and model the behavior of reddit’s algorithms.

We are currently talking to funders about a project we developed with reddit communities and public health experts to test public health information interventions with people and algorithms. If you have ideas for funding opportunities, please contact J. Nathan Matias at

Questions and bugs

If you find a bug or have a question, please post it to the issues page for this project on github. Thanks!

About CAT Lab

The Citizens and Technology Lab at Cornell University works with communities to study the social impacts of digital technologies and discover effective ideas for change. CAT Lab is led by Dr. J. Nathan Matias, an assistant professor in the Department of Communication.

Working alongside communities and volunteers, we discover practical knowledge that also contributes to science, holds companies accountable, and is guided by the people most affected. Communities bring their problems, deep knowledge, and desire for change. We bring expertise in scientific research and a software platform for coordinating citizen behavioral science.

Over a dozen communities with tens of millions of people have worked with CAT Lab since 2016 on reddit, Wikipedia, and Twitter. We have tested practical ways to prevent harassment, fight misinformation, broaden inclusion, manage civic discourse, and protect freedom of expression. Our discoveries have directly influenced community practice, corporate policies, and government discussions worldwide. These industry-independent findings are regularly published by the world’s top scientific journals in the social sciences and computer science.


License

The Citizens and Technology Lab makes this report and associated code available under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.