Overview of Data Sets Provided in This Repository

Teaching Text Analysis with Twitter Data (2021–2022)

In 2021 and 2022, the coordinators of the Machines of Knowledge course at Maastricht University taught computational text analysis with Twitter data. At that time, the Twitter API still allowed relatively straightforward access to large amounts of social media data on current events. Among the topics explored in class in winter 2022 were:

Working with these corpora while full academic API access was still available revealed major limitations. We observed large volumes of spam, bot-generated posts, and cryptocurrency or NFT advertisements, which distorted the data. These experiences led to critical discussions about data quality and the epistemological limits of social media research.

Exploring Alternative Data Sources After 2023

Following Elon Musk’s takeover of Twitter, major changes to platform access policies restricted academic data use. Since 2023/24, the course has integrated alternative sources. We currently focus on podcast reviews and YouTube comments. In 2023 and 2024, students worked on the following topics:

Data Feminism Lecture

The Data Feminism lecture uses two data sets as examples for critical and intersectional readings of digital culture:

This data set is used to analyse how gendered and racial stereotypes shape online discourse around celebrity performance, and how audiences critique certain gendered and ethnic representations on stage.

  • Menstruation data set (student-scraped during the first digital skills session)

This smaller data set of podcast reviews serves as a practice corpus for discussing embodied experiences, stigma, and female visibility in digital media.

Postcolonialism Lecture

The Postcolonialism lecture currently uses a data set collected around the death and funeral of Queen Elizabeth II, which enables students to explore global media responses and post-imperial narratives in online discussions:

Case Studies for Student Assessments

Since 2024, Monika Barget has expanded the Machines of Knowledge teaching materials into case studies that students can use for their classroom presentations and final essays. These are regularly updated on GitHub.

Case Studies including Ready-Made Data Sets for Team Presentations

Case Studies for Data Scraping and Final Essays

Working with Voyant Tools

To analyse the provided data sets in Voyant Tools:

  1. Open the Voyant Tools website.
  2. Paste the raw URL of the data file (not the URL of the GitHub file page) into the “Add Text” field.
  3. To obtain the raw URL on GitHub:
    • Open the desired file in the repository.
    • Click the Raw button in the upper right corner of the file view.
    • Copy the URL from your browser’s address bar.
  4. Click Reveal in Voyant to load and explore the corpus.
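
If you prefer to derive the raw URL without clicking through GitHub, the step can be sketched in Python. This is a minimal illustration, not part of the course materials: the function name `github_raw_url` and the example repository path are hypothetical, and the sketch assumes the usual GitHub file-page pattern `github.com/<owner>/<repo>/blob/<branch>/<path>` with a branch name that contains no slash.

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url: str) -> str:
    """Convert a GitHub file-page URL into the matching raw-content URL.

    Hypothetical helper for illustration; assumes the branch name has no "/".
    """
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub file-page URL: {blob_url}")
    owner, repo, _, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


if __name__ == "__main__":
    # Placeholder repository and file, not a real data set from this course.
    print(github_raw_url(
        "https://github.com/example-user/example-repo/blob/main/data/reviews.txt"
    ))
```

The printed URL is what goes into Voyant's text field in step 2. If the result does not open as plain text in your browser, fall back to the Raw button described above.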