Overview of Data Sets Provided in This Repository
Teaching Text Analysis with Twitter Data (2021–2022)
In 2021 and 2022, the coordinators of the Machines of Knowledge course at Maastricht University taught computational text analysis with Twitter data. At that time, the Twitter API still allowed relatively straightforward access to large amounts of social media data on current events. Among the topics explored in class in winter 2022 were:
- Twitter reactions to the 2022 attack on a gay bar in Oslo (ca. 50,000 tweets)
- Twitter reactions to the 2022 FIFA World Cup in Qatar (ca. 100,000 tweets)
- Twitter reactions to Elon Musk’s takeover of the platform, October 28–31, 2022
Even while full academic API access was still available, working with these corpora revealed major limitations: large volumes of spam, bot-generated posts, and cryptocurrency or NFT advertisements distorted the data. These experiences led to critical discussions about data quality and the epistemological limits of social media research.
Exploring Alternative Data Sources After 2023
Following Elon Musk’s takeover of Twitter, major changes to platform access policies restricted academic data use. Since 2023/24, the course has integrated alternative sources. We currently focus on podcast reviews and YouTube comments. In 2023 and 2024, students worked on the following topics:
- App Store comments about successful true crime podcasts
- YouTube reactions to Judith Butler’s lecture on intersectionality (ca. 2,000 comments)
- YouTube comments on Amanda Gorman’s inaugural poem (ca. 8,000 comments)
Data Feminism Lecture
The Data Feminism lecture uses two data sets as examples for critical and intersectional readings of digital culture:
This data set is used to analyse how gendered and racial stereotypes shape online discourse around celebrity performance, and how audiences critique certain gendered and ethnic representations on stage.
- Menstruation data set (student-scraped during the first digital skills session)
This smaller data set of podcast reviews serves as a practice corpus for discussing embodied experiences, stigma, and female visibility in digital media.
Postcolonialism Lecture
The Postcolonialism lecture currently uses a data set collected around the death and funeral of Queen Elizabeth II, which enables students to explore global media responses and post-imperial narratives in online discussions:
Case Studies for Student Assessments
Since 2024, Monika Barget has expanded the Machines of Knowledge teaching materials into case studies that students can use for their classroom presentations and final essays. These are regularly updated on GitHub.
Case Studies including Ready-Made Data Sets for Team Presentations
Case Studies for Data Scraping and Final Essays
Working with Voyant Tools
To analyse the provided data sets in Voyant Tools:
- Open the Voyant Tools website.
- Paste the raw URL (!) of the data file into the “Add Text” field.
- To obtain the raw URL on GitHub:
  - Open the desired file in the repository.
  - Click on Raw in the upper right corner.
  - Copy the URL from your browser’s address bar.
- Click Reveal in Voyant to load and explore the corpus.
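If you prefer not to click through GitHub for every file, the raw URL can also be derived from the normal file-page URL. The following Python snippet is a minimal sketch. It assumes the usual GitHub layout, in which a file page at github.com/<owner>/<repo>/blob/<branch>/<path> has its raw content at raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>. The function name to_raw_url and the example repository URL are hypothetical and only serve as illustration.

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url: str) -> str:
    """Convert a GitHub file-page URL into the matching raw-content URL.

    Hypothetical helper for illustration; it assumes the standard
    /<owner>/<repo>/blob/<branch>/<path> layout of GitHub file pages.
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    # Split the path into its segments: owner, repo, "blob", branch, file path
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected a URL of the form /owner/repo/blob/branch/path")

    # Drop the "blob" segment and switch to the raw-content host
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


# Placeholder URL, not a real data file in this repository
example = "https://github.com/example-user/example-repo/blob/main/data/comments.txt"
print(to_raw_url(example))
```

The printed URL is the one to paste into the “Add Text” field in Voyant Tools. Clicking Raw on GitHub remains the safest route, because it always shows the address that GitHub itself serves.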