Where in the world is my tweet: Detecting irregular removal patterns on Twitter

PLoS One. 2018 Sep 20;13(9):e0203104. doi: 10.1371/journal.pone.0203104. eCollection 2018.

Abstract

Twitter data are becoming an important part of modern political science research, but key aspects of the inner workings of Twitter streams as well as self-censorship on the platform require further research. A particularly important research agenda is to understand removal rates of politically charged tweets. In this article, I provide a strategy to understand removal rates on Twitter, particularly on politically charged topics. First, the technical properties of Twitter's API that may distort the analyses of removal rates are tested. Results show that the forward stream does not capture every possible tweet: between 2 and 5 percent of tweets are lost on average, even when the volume of tweets is low and the firehose is not needed. Second, data from Twitter's streams are collected on contentious topics such as terrorism or political leaders and non-contentious topics such as types of food. The statistical technique used to detect uncommon removal rate patterns is multilevel analysis. Results show significant differences in the removal of tweets between different topic groups. This article provides the first systematic comparison of information loss and removal on Twitter as well as a strategy to collect valid removal samples of tweets.
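
As an illustration only, the comparison of removal rates between topic groups might begin with a descriptive tabulation like the sketch below. This is not the article's code: the record layout (group, topic, removed flag), the function names, and the sample data are all hypothetical, and the sketch only computes raw rates. It does not fit the multilevel model named in the abstract, which would additionally account for tweets being nested within topics.

```python
from collections import defaultdict


def removal_rates(records):
    """Return {group: {topic: share of tweets removed}}.

    `records` is an iterable of (group, topic, removed) tuples, where
    `removed` is True if the tweet was no longer available on a later check.
    This layout is an assumption made for the example.
    """
    # counts[group][topic] = [number removed, number collected]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, topic, removed in records:
        cell = counts[group][topic]
        cell[0] += int(removed)
        cell[1] += 1
    return {
        group: {topic: r / n for topic, (r, n) in topics.items()}
        for group, topics in counts.items()
    }


def group_means(rates):
    """Average the per-topic removal rates within each topic group."""
    return {g: sum(t.values()) / len(t) for g, t in rates.items()}


# Made-up sample, not data from the study.
sample = [
    ("contentious", "terrorism", True),
    ("contentious", "terrorism", False),
    ("contentious", "leaders", True),
    ("contentious", "leaders", True),
    ("non-contentious", "pizza", False),
    ("non-contentious", "pizza", False),
    ("non-contentious", "pizza", True),
    ("non-contentious", "pizza", False),
]

rates = removal_rates(sample)
means = group_means(rates)
```

Averaging within groups treats each topic equally regardless of how many tweets it contains; a multilevel model would instead weight topics by the information they carry and estimate between-topic variance explicitly.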

MeSH terms

  • Data Interpretation, Statistical
  • Humans
  • Politics
  • Probability
  • Social Media*
  • Social Sciences / methods
  • Time Factors

Grants and funding

The author received no specific funding for this work.