Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?

Gaël Le Mens, Balázs Kovács, Michael T. Hannan, Guillem Pros

Sociological Science March 3, 2023
10.15195/v10.a3


Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
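The core idea of the measure can be illustrated with a minimal sketch. It assumes that a trained classifier outputs one raw score (logit) per genre for a book description, and it takes the softmax probability of each genre as the typicality of the description in that genre. The genre names, the logit values, and the function names below are hypothetical placeholders rather than the authors' code or data, and the choice to use the probability directly (rather than some transformation of it) is an assumption; the authors' shared code at the OSF link is the authoritative implementation. A plain Pearson correlation is included to show how such scores could be compared with average human ratings.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into categorization probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def typicality(logits, genres):
    """Typicality of one text in each genre, taken here as the
    categorization probability (an assumption of this sketch)."""
    return dict(zip(genres, softmax(logits)))

def pearson(xs, ys):
    """Pearson correlation, e.g. between model typicalities and
    average human typicality ratings for the same texts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical genres and made-up logits standing in for the output
# of a fine-tuned BERT classification head on one book description.
GENRES = ["mystery", "romance", "horror"]
toy_logits = [2.0, 0.5, -1.0]

scores = typicality(toy_logits, GENRES)
for genre, score in scores.items():
    print(f"{genre}: {score:.3f}")
```

In practice the logits would come from a text classifier fine-tuned on labeled descriptions, and `pearson` would be applied across many descriptions for a given genre rather than to toy values.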
This work is licensed under a Creative Commons Attribution 4.0 International License.

Gaël Le Mens: Department of Economics and Business, Universitat Pompeu Fabra (UPF), Barcelona School of Economics, and UPF Barcelona School of Management, Barcelona, Spain
E-mail: gael.le-mens@upf.edu

Balázs Kovács: School of Management, Yale University, New Haven, CT, USA
E-mail: balazs.kovacs@yale.edu

Michael T. Hannan: Graduate School of Business, Stanford University, Stanford, CA, USA
E-mail: hannan@stanford.edu

Guillem Pros: Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain
E-mail: guillem.pros@upf.edu

Acknowledgments: We are grateful to Jerker Denrell, Amir Goldberg, Greta Hsu, Thorbjørn Knudsen, Cecilia Nunes, and Phanish Puranam for discussing the ideas developed in this article and for their detailed feedback on earlier versions. We thank conference participants at the 2021 and 2022 Nagymaros Conferences for valuable feedback and discussion. G. Le Mens and G. Pros received financial support from ERC Consolidator Grant #772268 from the European Commission. G. Le Mens also received financial support from grant PID2019-105249GBI00/AEI/10.13039/501100011033 from the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI) and from the BBVA Foundation Grant G999088Q. B. Kovács was supported by Yale School of Management. M. Hannan was supported by the Stanford Graduate School of Business. Data, material, and analysis code for all analyses are available online at https://osf.io/ta273/. We encourage readers to download the shared folder and use the code to compute BERT typicality on their own data sets.

  • Citation: Le Mens, Gaël, Balázs Kovács, Michael T. Hannan, and Guillem Pros. 2023. “Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?” Sociological Science 10: 82-117.
  • Received: September 28, 2022
  • Accepted: November 9, 2022
  • Editors: Ari Adut, Filiz Garip
  • DOI: 10.15195/v10.a3

