THE CREATION OF WORD CLOUDS

The World Well-Being Project is a multi-disciplinary research group in the Positive Psychology Center at the University of Pennsylvania. Much of our work is part of an effort to develop an unobtrusive measurement of the psychological and physical well-being of large populations by analyzing their written expressions in social media such as Facebook and Twitter.

Our collaborators at myPersonality collected Facebook posts from over 75,000 volunteers who also took the standard International Personality Item Pool (IPIP) personality test to measure the "Big Five" personality traits. A fraction of these volunteers also answered questions on their subjective well-being.

We analyze users who wrote at least 1,000 words and whose primary language is English. Extracting occurrences of each linguistic feature yields 700 million words, phrases, and topic instances.

To identify distinguishing language features, we normalize each subject's word and phrase frequencies by that subject's total word usage and apply the Anscombe transformation for variance stabilization. Ordinary least squares regression, adjusting for covariates such as gender and age, is then used to determine the unique effect of each language feature on each psychosocial variable.
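
The steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not our production pipeline: the function names (relative_frequencies, anscombe, feature_effect) are hypothetical, the Anscombe transform is taken in its standard form 2 * sqrt(x + 3/8), and we assume here that the language feature and the covariates act as predictors of the psychosocial variable, with all variables standardized so that coefficients are comparable across features.

```python
import numpy as np


def relative_frequencies(counts, total_words):
    """Divide each subject's feature counts by that subject's total word count.

    counts: array of shape (n_subjects, n_features)
    total_words: array of shape (n_subjects,)
    """
    counts = np.asarray(counts, dtype=float)
    total_words = np.asarray(total_words, dtype=float)
    return counts / total_words[:, None]


def anscombe(x):
    """Standard Anscombe variance-stabilizing transform: 2 * sqrt(x + 3/8)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)


def standardize(v):
    """Rescale to zero mean and unit variance (assumes v is not constant)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def feature_effect(feature, outcome, covariates):
    """Standardized OLS coefficient of one language feature on one outcome.

    Assumption for this sketch: the outcome is regressed on the feature
    plus the covariates (e.g. age, gender), and the feature's coefficient
    is returned as its effect after adjusting for those covariates.
    """
    columns = [np.ones(len(feature)), standardize(feature)]
    columns += [standardize(c) for c in covariates]
    design = np.column_stack(columns)
    coefs, *_ = np.linalg.lstsq(design, standardize(outcome), rcond=None)
    return coefs[1]  # index 0 is the intercept, index 1 is the feature
```

In use, one would compute anscombe(relative_frequencies(counts, total_words)) once, then call feature_effect on each column of the result for each psychosocial variable, passing age and gender as the covariates.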

We present the results using word clouds. Unlike most word clouds, which scale words by their frequency, ours scale each word or phrase according to the strength of its relationship with the variable tested. Words are colored to represent their frequency over all users.
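
The mapping from statistics to appearance can be illustrated as follows. This is a hedged sketch rather than our actual rendering code: the function names, the font-size range (min_size, max_size), the linear scaling, and the use of the absolute value of the effect as "strength" are all assumptions made for illustration. The color is returned as a number between 0 (least frequent) and 1 (most frequent) that a renderer could map onto any palette.

```python
def scale_linear(value, lo, hi, out_lo, out_hi):
    """Map value from the range [lo, hi] linearly onto [out_lo, out_hi]."""
    if hi == lo:
        # All inputs are identical, so place them in the middle of the range.
        return (out_lo + out_hi) / 2.0
    return out_lo + ((value - lo) / (hi - lo)) * (out_hi - out_lo)


def cloud_layout(effects, frequencies, min_size=10.0, max_size=40.0):
    """Return {word: (font_size, shade)} for a word cloud.

    effects: {word: regression effect on the variable tested}
    frequencies: {word: usage frequency over all users}
    Font size follows the strength of the effect (assumed here to be its
    absolute value); shade follows the overall frequency of the word.
    """
    strengths = {word: abs(effect) for word, effect in effects.items()}
    s_lo, s_hi = min(strengths.values()), max(strengths.values())
    f_lo, f_hi = min(frequencies.values()), max(frequencies.values())

    layout = {}
    for word, strength in strengths.items():
        size = scale_linear(strength, s_lo, s_hi, min_size, max_size)
        shade = scale_linear(frequencies[word], f_lo, f_hi, 0.0, 1.0)
        layout[word] = (size, shade)
    return layout
```

With this mapping, a rare word that is strongly related to the variable is drawn large but in the low-frequency shade, while a very common word with a weak relationship is drawn small in the high-frequency shade.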