Reliability of crowdsourcing as a method for collecting emotions labels on pictures

Olga Korovina, Marcos Baez, Fabio Casati

Research output: Contribution to journal › Article

Abstract

OBJECTIVE: In this paper we study whether, and under what conditions, crowdsourcing can be used as a reliable method for collecting high-quality emotion labels on pictures. To this end, we ran a set of crowdsourcing experiments on the widely used IAPS dataset, using the Self-Assessment Manikin (SAM) instrument to rate pictures on valence, arousal and dominance, and explored the consistency of crowdsourced results across multiple runs (reliability) and their level of agreement with the gold labels (quality). In doing so, we examined the impact of targeting worker populations of different levels of reputation (and cost) and of collecting varying numbers of ratings per picture. RESULTS: The results tell us that crowdsourcing can be a reliable method, reaching excellent levels of reliability and agreement with only 3 ratings per picture for valence and 8 for arousal, with only marginal differences between target populations. Results for dominance were very poor, echoing previous studies on the data collection instrument used. We also observed that specific types of content generate diverging opinions among participants (leading to higher variability or multimodal distributions), and that these divergences remain consistent across pictures of the same theme. These findings can inform the collection and exploitation of crowdsourced emotion datasets.
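The core procedure the abstract describes (aggregate k crowd ratings per picture, then measure agreement with the IAPS gold labels as k grows) can be illustrated with a short sketch. This is not the authors' analysis code: the data below is synthetic, and the 9-point SAM scale and Pearson r are assumptions, since the abstract does not state the exact agreement metric used.

```python
import numpy as np

rng = np.random.default_rng(42)

n_pictures = 100
# Hypothetical gold labels on the 9-point SAM valence scale used by IAPS.
gold = rng.uniform(1, 9, size=n_pictures)

# Hypothetical crowd ratings: gold value plus per-rater noise, clipped
# back onto the 1..9 scale; 20 ratings collected per picture.
max_ratings, noise_sd = 20, 1.5
ratings = np.clip(
    gold[:, None] + rng.normal(0, noise_sd, size=(n_pictures, max_ratings)),
    1, 9,
)

# Agreement with gold after averaging the first k ratings per picture,
# mirroring the paper's question of how many ratings suffice.
for k in (1, 3, 5, 8, 15, 20):
    crowd_mean = ratings[:, :k].mean(axis=1)
    r = np.corrcoef(crowd_mean, gold)[0, 1]
    print(f"k={k:2d} ratings/picture: Pearson r with gold = {r:.3f}")
```

Even on synthetic data, agreement rises quickly and then plateaus as k grows, which is the shape of result the abstract reports (3 ratings sufficing for valence, 8 for arousal).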

Original language: English
Number of pages: 1
Journal: BMC Research Notes
Volume: 12
Issue number: 1
ISSN: 1756-0500
Publisher: BioMed Central
DOI: 10.1186/s13104-019-4764-4
PubMed ID: 31666124
Publication status: Published - 30 Oct 2019

Keywords

  • Crowdsourcing emotions
  • Empirical study
  • Rating behavior
  • Reliability

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Korovina, Olga; Baez, Marcos; Casati, Fabio. Reliability of crowdsourcing as a method for collecting emotions labels on pictures. In: BMC Research Notes, Vol. 12, No. 1, 30.10.2019. DOI: 10.1186/s13104-019-4764-4.
