A study from the University of St.Gallen suggests that language use in instant messages allows for the prediction of users’ age and gender.
What instant messages can reveal about users’ demographics
With the ubiquitous use of smartphones globally, text messaging has become one of the most prevalent types of communication and, therefore, one of the most common types of digital data. Research by Prof. Dr. Clemens Stachl of the Institute of Behavioral Science and Technology (IBT) at the University of St.Gallen (HSG), Timo Koch at Ludwig-Maximilians-Universität München, and Peter Romero at Keio University shows that distinct language differences between gender and age groups allow user demographics to be predicted with machine learning algorithms.
The study analyzed more than 300,000 WhatsApp messages from 226 German volunteers. Key findings on age and gender differences include:
- Younger participants more frequently used first-person singular pronouns (e.g., “I” or “me”), informal language (for example, the German “geil”, which translates to “hot/great”), and emoticons (e.g., “:)”) in their messages.
- Female users tended to use emoji more frequently and employed a broader range of emoji than men. Further, women incorporated more function words, particularly first-person singular personal pronouns, in their messages.
- Men’s language, on the other hand, was found to be indicative of a more analytic thinking style.
This study indicates that private instant messages could be more predictive of user characteristics than public social media posts because users engage in more self-disclosure in their messages. As a consequence, digital language footprints in instant messages would allow tech firms to profile users and could threaten individual privacy rights beyond user demographics. Given the overall trend away from public posting and towards private communication, these findings open up many questions about how instant messaging data should be protected.
The corresponding paper has been published in Computers in Human Behavior and is openly accessible via: https://doi.org/10.1016/j.chb.2021.106990
Timo Koch, Department of Psychology, Ludwig-Maximilians-Universität München