Working on a distant reading case study

Available data sets in this repository

In the data section of this repository, you will find different types of sample data linked to the Machines of Knowledge course. Older data sets that were used for teaching in the past can still be helpful for testing and practice, but you may not use them for your graded group presentations or your final essays.

The data sets we will use in Machines of Knowledge this year are the following:

First tests with Voyant tools in class: Menstruation Awareness, based on podcast reviews scraped during the first skills session
Voyant experiments in the data feminism lecture: Miley Cyrus and the VMA2013 scandal data set
Data sets you may use for your group presentations:

To read brief descriptions of the data and find ideas for analysis, please go to the relevant tasksheet pages in the page menu.

Case studies you can use (as an inspiration) for your final essays:

For these case studies, Monika Barget has created sample YouTube playlists whose reviews you can scrape and use as a starting point for your essays. You can also use the suggested podcasts and scrape their reviews. If you want to compare these data with different data sets or apply a different theoretical approach, this is also possible.

Ingest your data into Voyant Tools

Go to the Voyant Tools website and simply paste the URL of the data (as indicated in the link above) into the "add text" field. Press the blue "reveal" button and start exploring the dataset!

Tasks to perform in Voyant

Students often find it challenging to develop a logical workflow in Voyant because the interface offers so many analytical options and shows several visualisations at once. For beginners, we recommend focusing on a limited number of tools, starting with a simple frequency analysis. The steps below can easily be applied to different data sets and offer insights that ideally inspire critical close reading.

High-level analysis with word cloud and frequencies table

Look at the word cloud and the corresponding frequencies table. What words are the most prominent? What different topics or themes can you identify in the data?
Who are the "protagonists" mentioned in the data? Are any people mentioned by name, and who are they? Show hosts, politicians, police officers?
What places (e.g. cities) are mentioned in the data? Why do you think that could be the case?
What surprises you? What information is hard to contextualise?
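If you would like to see what happens behind the word cloud and the frequencies table, the following minimal Python sketch counts words in a short text. It is only an illustration and not how Voyant is implemented: the function name `word_frequencies`, the tiny stop-word list and the sample reviews are all assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); Voyant's own list is much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "this", "in", "it", "are", "i"}

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words, ignoring case and stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

# Hypothetical sample reviews, invented for this example
reviews = (
    "The hosts are great and the show is great. "
    "I love the hosts. The police story in Chicago is boring."
)

print(word_frequencies(reviews, top_n=3))
```

Removing stop words matters here: without that step, "the" would dominate the list and hide the words that point to topics, people and places.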

Comparative analysis with the "trends" tool

Write down words that express emotions/people's feelings towards the show and its hosts. Put them into the trends tool. How did these emotions develop over time?
Try to find groups of words for comparative analysis in "trends". Each group should be homogeneous in terms of word type or theme. You can, for example, look at different place names or trace the distribution of different adjectives (e.g. "good", "bad", "exciting", "boring"). It does not make sense to randomly combine terms from different word groups.
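The idea behind a trends view can be sketched in a few lines of Python: split the text into segments of roughly equal size and calculate how often each term occurs in every segment, relative to the segment length. This is a simplified illustration under stated assumptions, not Voyant's actual code; the function name `trend`, the number of segments and the sample text are hypothetical.

```python
import re

def trend(text, terms, segments=2):
    """Relative frequency of each term in consecutive segments of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Segment size, rounded up so that no tokens are left over
    size = max(1, -(-len(tokens) // segments))
    result = {term: [] for term in terms}
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        for term in terms:
            result[term].append(chunk.count(term) / len(chunk))
    return result

# Hypothetical sample: evaluative adjectives in the order they appear
sample = "good good bad boring bad bad good boring"

for term, values in trend(sample, ["good", "bad"], segments=2).items():
    print(term, values)
```

Comparing words of the same type, such as the two adjectives here, keeps the resulting lines meaningful in relation to each other.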

Analysing co-occurrences/correlations with the "links" tool or the "correlations" tool

What words are most prominently associated with the words "police" and "cop"? What does this say about the listeners' opinions?
Try other words such as "criminals", "journalism" or "investigators". What are the results? What do they tell us about the podcast and perhaps also about more general debates in the United States?
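To get a feeling for what a co-occurrence count is, the sketch below collects the words that appear within a small window around a keyword. It is a minimal illustration and does not reproduce how the "links" or "correlations" tools calculate their results; the function name `collocates`, the window size, the stop words and the sample sentences are assumptions.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2, stopwords=frozenset()):
    """Count words occurring within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            # Words before and after the keyword, excluding the keyword itself
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(n for n in neighbours if n not in stopwords)
    return counts

# Hypothetical sample sentences, invented for this example
sample_text = (
    "The police were helpful. Listeners say the police acted fast. "
    "A helpful police officer spoke."
)

print(collocates(sample_text, "police", window=2, stopwords={"the", "a", "were"}).most_common(3))
```

A larger window finds more associated words but also more noise, which is one reason why co-occurrence results should always be checked against the full sentences.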

Reading keywords in context

Put the words which you have used to find co-occurrences in the "context" tool to see the full sentences in which they are used. Does this give you any additional insights?
Alternatively, you can also experiment with the "word tree".
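A keyword-in-context (KWIC) view can also be approximated with a short Python sketch that returns the words to the left and right of every occurrence of a keyword. This is only an illustration of the principle; the function name `kwic`, the context width and the sample sentences are hypothetical and not taken from Voyant.

```python
import re

def kwic(text, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical sample sentences, invented for this example
context_sample = "Listeners say the police acted fast. A helpful police officer spoke."

for left, word, right in kwic(context_sample, "police"):
    print(f"{left:>15} | {word} | {right}")
```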

Drawing general conclusions

What conclusions can you draw from all the individual distant reading results?
What is (un)expected?
What would have been difficult to find via close reading?
What research questions would be interesting to explore based on this dataset?

Further Readings