Research Notebook
The 'Mixology' open research project aims to probe opinions in times of crisis using a corpus collected via the Twitter API. Its second objective is to develop an original research tool that can also be reused to analyse headlines or media content (using computational linguistics and machine learning methods), in line with media studies and journalism studies.
Blog 4: Refining the queries
14 December 2021
Several tests were needed to calibrate the queries, which seem to perform poorly when a hashtag (#) is used. Monitoring the trends posted on Twitter also led to the addition of the keywords ARN and mRNA, since the corpus analysis will be carried out in French and English.
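To make the calibration concrete, here is a minimal sketch of how keyword queries without hashtags could be assembled for the two languages. The project's actual queries are not reproduced in this post. The helper `build_query` is hypothetical, and the `OR` and `lang:` operators assume the search syntax of the Twitter API v2.

```python
def build_query(keywords, lang):
    """Build a keyword query string (hypothetical helper, not the project's code).

    Assumes Twitter API v2 search operators: OR for alternatives,
    parentheses for grouping, and lang: to restrict the language.
    """
    # Drop any leading '#' so plain keywords are searched instead of hashtags.
    terms = [kw.lstrip("#") for kw in keywords]
    return "(" + " OR ".join(terms) + ") lang:" + lang


# One query per analysis language, using the two keywords named in this post.
queries = {lang: build_query(["ARN", "mRNA"], lang) for lang in ("fr", "en")}

for lang, query in queries.items():
    print(lang, "->", query)
```

Searching plain keywords also matches tweets in which the word appears without a hashtag, which may be one reason such queries return more results.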
Each retrieved dataset is first cleaned with OpenRefine: columns sometimes have to be merged because the “text” column is split across several columns (the files are saved with a comma separator, so a comma inside a tweet can break the field apart). The three corpora with a defined geographical area present fewer quality problems than the general corpus, which is not geographically restricted: big data does not necessarily mean good data.
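The column merge described above is done in OpenRefine for this project. As an illustration only, the same repair can be sketched in Python. The sketch assumes that the “text” column is the only one containing stray commas, so any surplus fields in a row belong to it. The function `merge_split_text`, the other column names and the sample rows are all hypothetical.

```python
import csv
import io


def merge_split_text(rows, header, text_col="text"):
    """Rejoin a text field that unquoted commas split into several fields.

    Assumption: only the text column can contain stray commas, so every
    surplus field in a row is a fragment of that column.
    """
    n = len(header)
    k = header.index(text_col)
    fixed = []
    for row in rows:
        extra = len(row) - n  # number of surplus fields in this row
        if extra > 0:
            merged = ",".join(row[k:k + extra + 1])
            row = row[:k] + [merged] + row[k + extra + 1:]
        fixed.append(row)
    return fixed


# Hypothetical sample: the first tweet contains two unquoted commas.
raw = (
    "id,text,lang\n"
    "1,Vaccin ARN, rappel, enfin,fr\n"
    "2,mRNA booster today,en\n"
)
reader = csv.reader(io.StringIO(raw))
header = next(reader)
repaired = merge_split_text(list(reader), header)

for row in repaired:
    print(row)
```

If more than one column can contain commas, this heuristic no longer applies and the export would need to be redone with proper quoting.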
- Deng, S., Sinha, A. P., & Zhao, H. (2017). Adapting sentiment lexicons to domain-specific social media texts. Decision Support Systems, 94, 65-76.
- Mowlaei, M. E., Abadeh, M. S., & Keshavarz, H. (2020). Aspect-based sentiment analysis using adaptive aspect-based lexicons. Expert Systems with Applications, 148, 113234.