
Data cleaning

Up to about 50,000 social media posts, data can still be cleaned semi-manually in Excel or a browser-based cleaning tool. Larger data sets usually cause severe performance issues in these tools, so cleaning via script is recommended. The following script cleans all kinds of social media data collected in .txt format, with a special focus on removing @ signs, hashtags, URLs and emojis:
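
As a rough illustration of the kind of cleaning described above, a minimal Python sketch could look like the following. It is not the script from this repository. The function names `clean_post` and `clean_file`, the regular expressions, and the assumption that the .txt file holds one post per line are all hypothetical choices made for this example. The sketch removes whole @mentions and hashtags (not just the sign), and the emoji ranges cover the most common Unicode blocks only, so some symbols may slip through.

```python
import re

# URLs starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# @mentions and #hashtags; the lookbehind skips e-mail addresses like a@b.de
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
# Common emoji blocks only (an assumption, not a complete emoji definition)
EMOJI_RE = re.compile(
    "["
    "\U0001F000-\U0001FAFF"  # pictographs, emoticons, transport, flags
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Remove URLs, @mentions, hashtags and emojis from one post."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removals
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Clean a .txt file line by line (assumes one post per line)."""
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            cleaned = clean_post(line)
            if cleaned:  # drop posts that are empty after cleaning
                dst.write(cleaned + "\n")
```

A call such as `clean_file("posts.txt", "posts_clean.txt")` (both file names are placeholders) processes the input line by line, so memory use stays low even for data sets well beyond 50,000 posts.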

Tool recommended for cleaning plain text in your browser: TextCleanR

Tool recommended for cleaning structured data (e.g. in CSV and Excel format): OpenRefine

Overview of collected data sets and task sheets

The overview of the available data sets will be regularly updated. For some data sets, task sheets for student group work can also be found in this repository: