Research Notebook

The 'Mixology' open research project aims to probe opinions in times of crisis using a corpus collected via the Twitter API. Its second objective is to develop an original research tool that can also be reused to analyze headlines or media content (with computational linguistics and machine learning methods), in line with media studies and journalism studies.

Blog 2: Collecting the corpus

December 11, 2021


Data are gathered from the social network Twitter with the rtweet package (#Rstats). The first requests used the following code:

The geographical perimeter is defined by a radius of 1,367 kilometers, from Julianadorp in the Netherlands (the center) to Rozan in Poland. Despite initial obstacles in completing the requests (retrieval stalled at 10%, 1%, and 5% of the content), 90 variables could be identified.
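As an illustration of how a radius-based perimeter works, the sketch below checks whether a point lies within a given distance of a center using the haversine (great-circle) formula. It is written in Python for illustration and is not the project's own R code; the function names `haversine_km` and `within_radius` are hypothetical, and the coordinates of the center and the radius have to be supplied by the reader.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_km):
    """Return True if `point` (lat, lon) is at most `radius_km` from `center` (lat, lon)."""
    return haversine_km(*center, *point) <= radius_km
```

A geocoded request of this kind keeps only the tweets whose location falls inside the circle, so the same check can be used afterwards to verify that a located tweet really belongs to the perimeter.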


Not all of these variables are used in building the corpus, partly for ethical reasons:

  1. The users are not informed of this research.
  2. Users cannot exercise their right to withdraw.
  3. Respect for the privacy of users is fundamental.
  4. This research is less interested in who is speaking (although that is also scientifically interesting) than in what is said.

The variables retained are therefore: text (the content of the tweet), lang (language), location, and country (the location of the user, which is not always filled in), while user_id serves only to count the number of distinct users.
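The selection described above can be sketched as follows, assuming each collected tweet is available as a dictionary keyed by variable name. This is an illustrative Python sketch, not the project's R code; `RETAINED_FIELDS`, `reduce_tweet`, and `build_corpus` are hypothetical names.

```python
# Variables kept in the corpus, as listed in the text above.
RETAINED_FIELDS = ("text", "lang", "location", "country")


def reduce_tweet(record):
    """Keep only the retained variables; a missing variable becomes None."""
    return {field: record.get(field) for field in RETAINED_FIELDS}


def build_corpus(records):
    """Return the reduced corpus and the number of distinct users.

    user_id is read only to count distinct users and is not copied
    into the corpus, so the stored records carry no user identifier.
    """
    corpus = [reduce_tweet(r) for r in records]
    user_ids = {r["user_id"] for r in records if r.get("user_id") is not None}
    return corpus, len(user_ids)
```

Dropping the identifier at this stage, rather than later in the analysis, keeps the stored corpus focused on what is said rather than on who says it.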
