PUBLICATIONS



Flekova, L., Ruppert, E., & Preotiuc-Pietro, D. (2015). Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words. Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.

@inproceedings{lmi2015wassa, author={Flekova, Lucie and Ruppert, Eugen and Preo\c{t}iuc-Pietro, Daniel}, title={{Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words}}, booktitle={Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis}, series={EMNLP}, year={2015} }


Abstract...
Sentiment (or valence) prediction from Twitter is of great interest to research and commercial organizations. Systems usually rely on lexicons in which each word is marked positive or negative. However, word lexicons suffer from ambiguities at the contextual level: the word cold is positive in cold beer but negative in cold coffee, as is dark in dark chocolate (+) versus dark soul (-). We introduce a method that identifies frequent contexts in which a word switches polarity, and reveals which words often appear in both positive and negative contexts. We show that our method matches human perception of polarity and demonstrate improvements in automated sentiment classification. Our method also helps to assess the suitability of an existing lexicon for a new platform (e.g., Twitter).
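The context-polarity idea can be sketched in a few lines: score each word's immediate contexts with a seed lexicon and flag words whose contexts split between positive and negative. The lexicon, bigrams, and threshold below are invented for illustration; they are not the paper's actual data or scoring method.

```python
from collections import defaultdict

# Hypothetical seed lexicon over context words (right-hand bigram members).
LEXICON = {"beer": 1, "coffee": -1, "chocolate": 1, "soul": -1}

def bipolar_words(bigrams, min_share=0.3):
    """Flag words that appear in both clearly positive and clearly
    negative contexts (e.g. 'cold beer' vs 'cold coffee')."""
    counts = defaultdict(lambda: [0, 0])  # word -> [pos, neg] context counts
    for left, right in bigrams:
        score = LEXICON.get(right, 0)
        if score > 0:
            counts[left][0] += 1
        elif score < 0:
            counts[left][1] += 1
    flagged = []
    for word, (pos, neg) in counts.items():
        total = pos + neg
        # Bipolar if the minority polarity still accounts for a
        # non-trivial share of the word's scored contexts.
        if total and min(pos, neg) / total >= min_share:
            flagged.append(word)
    return flagged

bigrams = [("cold", "beer"), ("cold", "coffee"),
           ("dark", "chocolate"), ("dark", "soul"),
           ("warm", "beer")]
print(bipolar_words(bigrams))  # ['cold', 'dark']
```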

Weeg, C., Schwartz, H. A., Hill, S., Merchant, R. M., Arango, C., & Ungar, L. (2015). Using Twitter to measure public discussion of diseases: A case study. JMIR Public Health Surveillance, 1(1), e6. DOI: 10.2196/publichealth.3953

@article{weeg2015using, author={Weeg, Christopher and Schwartz, H Andrew and Hill, Shawndra and Merchant, Raina M and Arango, Catalina and Ungar, Lyle}, title={{Using Twitter to measure public discussion of diseases: A case study}}, journal={JMIR Public Health Surveillance}, year={2015}, volume={1}, issue={1} }


Abstract...
Word-use patterns in Twitter, Facebook, newsgroups, and Google queries have been used to investigate a wide array of health concerns. Twitter is perhaps the most popular online data source for such studies, due in part to its relative accessibility. It has been used to monitor health issues including influenza [1,2], cholera [3], H1N1 [4-6], postpartum depression [7], concussion [8], epilepsy [9], migraine [10], cancer screening [11], antibiotic use [12], medical practitioner errors [13], dental pain [14], and attitudes about vaccination [15]. Such research has demonstrated the utility of mining social media for public health applications despite potential methodological challenges, including the following: (1) Twitter users form a biased sample of the population [16-18], and (2) their word usage within tweets can be highly ambiguous. For example, focusing just on the medical domain, “stroke” has many nonmedical uses (“stroke of genius” or “back stroke”); most mentions of “heart attack” are metaphorical, not literal (“just had a heart attack and died the power went out while I was in the shower”); and although doctors associate “MI” with myocardial infarction, on Twitter it refers more often to the state of Michigan.

Schwartz, H. Andrew, Park, G., Sap, M., Weingarten, E., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., Berger, J., Seligman, M., & Ungar, L. (2015). Extracting Human Temporal Orientation from Facebook Language. NAACL-2015: Conference of the North American Chapter of the Association for Computational Linguistics.

@inproceedings{schwartz2015extracting, author={Schwartz, H Andrew and Park, Gregory and Sap, Maarten and Weingarten, Evan and Eichstaedt, Johannes and Kern, Margaret and Stillwell, David and Kosinski, Michal and Berger, Jonah and Seligman, Martin and Ungar, Lyle}, title={{Extracting human temporal orientation from Facebook language}}, booktitle={Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, year={2015}, series={NAACL} }


Abstract...
People vary widely in their temporal orientation—how often they emphasize the past, present, and future—and this affects their finances, health, and happiness. Traditionally, temporal orientation has been assessed by self-report questionnaires. In this paper, we develop a novel behavior-based assessment using human language on Facebook. We first create a past, present, and future message classifier, engineering features and evaluating a variety of classification techniques. Our message classifier achieves an accuracy of 71.8%, compared with 52.8% from the most frequent class and 58.6% from a model based entirely on time expression features. We quantify a user's overall temporal orientation based on their distribution of messages and validate it against known human correlates: conscientiousness, age, and gender. We then explore social scientific questions, finding novel associations with the factors openness to experience, satisfaction with life, depression, IQ, and one's number of friends. Further, demonstrating how one can track orientation over time, we find differences in future orientation around birthdays.
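The user-level aggregation step described above can be sketched as follows, assuming per-message past/present/future labels have already been produced by a classifier (the paper's trained model itself is not reproduced here; the labels are invented for illustration).

```python
from collections import Counter

def temporal_orientation(message_labels):
    """A user's temporal orientation is the distribution of their
    messages over the past/present/future classes."""
    counts = Counter(message_labels)
    total = len(message_labels)
    return {t: counts.get(t, 0) / total for t in ("past", "present", "future")}

# Hypothetical per-message labels for one user.
labels = ["present", "future", "present", "past", "present"]
print(temporal_orientation(labels))
# {'past': 0.2, 'present': 0.6, 'future': 0.2}
```

The resulting per-user distribution is what gets correlated with traits such as conscientiousness, age, and gender.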

Preotiuc-Pietro, D., Lampos, V., & Aletras, N. (2015). An analysis of the user occupational class through Twitter content. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL).

@inproceedings{jobs2015acl, author={Preo\c{t}iuc-Pietro, Daniel and Lampos, Vasileios and Aletras, Nikolaos}, title={{An analysis of the user occupational class through Twitter content}}, booktitle={Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics}, series={ACL}, year={2015} }


Abstract...
We explore the dynamics of social media information in the task of inferring the occupational class of users. Our analysis is based on the Standard Occupational Classification from the Office of National Statistics in the UK, which comprises nine major categories of occupations.

The investigated methods take advantage of the user's textual input as well as platform-oriented characteristics (interaction, impact, usage). The best-performing methodology uses a neural clustering technique (spectral clustering on neural word embeddings) together with a Gaussian Process model for classification. It achieves 52.7% accuracy in predicting the user's occupational class, a strong result for a 9-way classification task.

Our qualitative analysis confirms the general hypothesis that occupational classes are separated by language use across job categories. This can be due to a different topical focus (e.g., artists talk about art), but also to more generic behaviours: lower-ranked occupational classes tend to use more elongated words, whereas higher-ranked occupations discuss topics such as politics and higher education more often.

Schwartz, H. A., & Ungar, L. H. (2015). Data-Driven Content Analysis of Social Media: A Systematic Overview of Automated Methods. The ANNALS of the American Academy of Political and Social Science, 659, 78-94.

@article{schwartz2015datadriven, author={Schwartz, H Andrew and Ungar, Lyle H}, title={{Data-driven content analysis of social media: A systematic overview of automated methods}}, year={2015}, journal={The ANNALS of the American Academy of Political and Social Science}, volume={659}, pages={78-94} }


Abstract...
Researchers have long measured people’s thoughts, feelings, and personalities using carefully designed survey questions, which are often given to a relatively small number of volunteers. The proliferation of social media, such as Twitter and Facebook, offers alternative measurement approaches: automatic content coding at unprecedented scales and the statistical power to do open-vocabulary exploratory analysis. We describe a range of automatic and partially automatic content analysis techniques and illustrate how their use on social media generates insights into subjective well-being, health, gender differences, and personality.

Preotiuc-Pietro, D., Eichstaedt, J., Park, G., Sap, M., Smith, L., Tobolsky, V., Schwartz, H. A., & Ungar, L. H. (2015). The Role of Personality, Age and Gender in Tweeting about Mental Illnesses. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, NAACL.

@inproceedings{pers2015clpsych, author={Preotiuc-Pietro, Daniel and Eichstaedt, Johannes and Park, Gregory and Sap, Maarten and Smith, Laura and Tobolsky, Victoria and Schwartz, H Andrew and Ungar, Lyle H}, title={{The role of personality, age and gender in tweeting about mental illnesses}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, series={NAACL}, year={2015}, }


Abstract...
Populations sharing similar demographics and personality traits are known to be at greater risk of certain mental illnesses. Our study focuses on the influence of personality and demographics in users tweeting about their mental illness: depression or post-traumatic stress disorder (PTSD). We find that age is a major predictor of PTSD, with users having this illness being older than the rest. For personality, both sets of users are more neurotic and introverted and less conscientious; however, PTSD users show more openness. Age together with the text-derived Big Five personality scores and gender achieves strong predictive accuracy of around .8 AUC (area under the receiver operating characteristic curve), although lower than using all unigrams. We also study language use between populations. This analysis shows that we can recover many symptoms associated with these mental illnesses in the clinical literature. For example, depressed users disclose the presence of the two sets of core symptoms: sustained periods of low mood (dysphoria) and low interest (anhedonia).

Preotiuc-Pietro, D., Sap, M., Schwartz, H. A., & Ungar, L. H. (2015). Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, NAACL.

@inproceedings{wwbpst2015clpsych, author={Preo\c{t}iuc-Pietro, Daniel and Sap, Maarten and Schwartz, H Andrew and Ungar, Lyle H}, title={{Mental illness detection at the World Well-Being Project for the CLPsych 2015 Shared Task}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, series={NAACL}, year={2015} }


Abstract...
Mental illnesses are globally widespread but still underdiagnosed, due to costly treatment, the social stigma associated with them, and imperfect screening methods. Screening for these illnesses through social media behaviour can represent a viable large-scale method. The CLPsych 2015 (Computational Linguistics and Clinical Psychology) Shared Task represents the first attempt to provide an apples-to-apples comparison of automatic methods for classifying users having a mental illness from their social media language use. The dataset consists of Twitter users who disclose having either depression or post-traumatic stress disorder (PTSD). Our system represented a user as probability distributions over topic usage, where a topic is a group of words sharing similar functions, either semantic or syntactic. We combined different topic representations in a linear learning algorithm. Notably, our method is fully automatic and does not depend on hand-crafted feature lists. The approach ranked second in all tasks on average precision and showed the best results at .1 false positive rates.

Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., & Seligman, M. E. (2015). Psychological Language on Twitter Predicts County-Level Heart Disease Mortality. Psychological Science 26(2), 159-169.

@article{eichstaedt2015psychological, author={Eichstaedt, Johannes C and Schwartz, H Andrew and Kern, Margaret L and Park, Gregory and Labarthe, Darwin R and Merchant, Raina M and Jha, Sneha and Agrawal, Megha and Dziurzynski, Lukasz A and Sap, Maarten and Weeg, Christopher and Larson, Emily E and Ungar, Lyle H and Seligman, Martin EP}, title={{Psychological language on Twitter predicts county-level heart disease mortality}}, journal={Psychological Science}, year={2015}, volume={26}, issue={2}, pages={159-169} }



Abstract...
Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions—especially anger—emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking and hypertension. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.

Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Stillwell, D. J., Kosinski, M., Ungar, L. H., & Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6), 934-952.

@article{park2014automatic, author={Park, Greg and Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and Seligman, Martin EP}, title={{Automatic personality assessment through social media language}}, year={2014}, journal={Journal of Personality and Social Psychology}, volume={108}, issue={6}, pages={934-952} }


Abstract...
We describe a personality assessment derived from an automated analysis of social media language. First, using the language and questionnaire-based personality traits of more than 66,000 Facebook users, we build a predictive model of personality. To test our model, we generate personality predictions for a new sample of 4,800 users. We compare predictions to (a) questionnaire assessments, (b) personality ratings from friends, and (c) outcomes related to personality (e.g., number of friends, political attitudes). We also assess the stability of predictions by making multiple predictions for single users at different time points and comparing predictions over time. We find that language-based assessments can constitute valid personality measures: they agree with questionnaires and friend ratings, they can be combined with friend ratings to improve accuracy, they have expected correlations to relevant outcomes, and they are stable over six-month intervals. This method can complement traditional assessments, and can quickly and cheaply assess many people with minimal burden.

Sap, M., Park, G., Eichstaedt, J. C., Kern, M. L., Stillwell, D. J., Kosinski, M., Ungar, L. H., & Schwartz, H. A. (2014). Developing Age and Gender Predictive Lexica over Social Media. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1146–1151.

@inproceedings{developing2014emnlp, author={Sap, Maarten and Park, Greg and Eichstaedt, Johannes C and Kern, Margaret L and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and Schwartz, H Andrew}, title={{Developing age and gender predictive lexica over social media}}, booktitle={Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing}, series={EMNLP}, year={2014}, }


Abstract...
Demographic lexica have potential for widespread use in social science, economic, and business applications. We derive predictive lexica (words and weights) for age and gender using regression and classification models from word usage in Facebook, blog, and Twitter data with associated demographic labels. The lexica, made publicly available, achieved state-of-the-art accuracy in language-based age and gender prediction over Facebook and Twitter, and were evaluated for generalization across social media genres as well as in limited-message situations.
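Applying such a predictive lexicon amounts to a weighted word count: an intercept plus the dot product of a user's relative word frequencies with the lexicon weights. The words and weights below are invented for illustration; they are not values from the published lexica.

```python
import re
from collections import Counter

# Hypothetical age lexicon: an intercept plus per-word regression weights.
AGE_LEXICON = {"_intercept": 23.0, "school": -2.0, "work": 1.5, "grandson": 8.0}

def predict_age(message):
    """Estimate age as intercept + sum of weight * relative frequency."""
    tokens = re.findall(r"\w+", message.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    score = AGE_LEXICON["_intercept"]
    for word, freq in counts.items():
        score += AGE_LEXICON.get(word, 0.0) * (freq / total)
    return score

print(round(predict_age("off to work then visiting my grandson"), 2))  # 24.36
```

In practice a user's estimate would be computed over all of their messages rather than a single one.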

Schwartz, H. A., Eichstaedt, J., Kern, M. L., Park, G., Sap, M., Stillwell, D., Kosinski, M., & Ungar, L. (2014). Towards Assessing Changes in Degree of Depression through Facebook. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Association for Computational Linguistics, 118-125.

@inproceedings{schwartz2014towards, author={Schwartz, H Andrew and Eichstaedt, Johannes and Kern, Margaret L and Park, Gregory and Sap, Maarten and Stillwell, David and Kosinski, Michal and Ungar, Lyle}, title={{Towards assessing changes in degree of depression through Facebook}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, year={2014}, pages={118-125}, series={ACL} }


Abstract...
Depression is typically diagnosed as being present or absent. However, depression severity is believed to be continuously distributed rather than dichotomous. Severity may vary for a given patient daily and seasonally as a function of many variables ranging from life events to environmental factors. Repeated population-scale assessment of depression through questionnaires is expensive. In this paper we use survey responses and status updates from 28,749 Facebook users to develop a regression model that predicts users’ degree of depression based on their Facebook status updates.

Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Dziurzynski, L., Ungar, L. H., Stillwell, D. J., Kosinski, M., Ramones, S. M., & Seligman, M. E. (2014). The Online Social Self: An Open Vocabulary Approach to Personality. Assessment, 21(2), 158-169.

@article{kern2014socialself, author={Kern, Margaret L and Eichstaedt, Johannes C and Schwartz, H Andrew and Dziurzynski, Lukasz and Ungar, Lyle H and Stillwell, David J and Kosinski, Michal and Ramones, Stephanie M and Seligman, Martin EP}, title={{The online social self: An open vocabulary approach to personality}}, journal={Assessment}, year={2014}, pages={158-169}, volume={21}, issue={2} }


Abstract...
Objective: We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguish each Big Five personality trait. Method: Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. Results: The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Conclusion: Open-ended, data-driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.

Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Park, G., Ungar, L. H., Stillwell, D. J., Kosinski, M., Dziurzynski, L., & Seligman, M. E. (2014). From "sooo excited!!!" to "so proud": Using language to study development. Developmental Psychology, 50(1), 178-188.

@article{kern2014from, author={Kern, Margaret L and Eichstaedt, Johannes C and Schwartz, H Andrew and Park, Greg and Ungar, Lyle H and Stillwell, David J and Kosinski, Michal and Dziurzynski, Lukasz and Seligman, Martin EP}, title={{From "sooo excited!!!" to "so proud": Using language to study development}}, journal={Developmental Psychology}, year={2014}, volume={50}, issue={1}, pages={178-188} }


Abstract...
We introduce a new method, differential language analysis (DLA), for studying human development that uses computational linguistics to analyze the big data available through online social media in light of psychological theory. Our open vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on one or more characteristics. Using a dataset of over 70,000 Facebook users, we identify how word and topic use vary as a function of age, and compile cohort-specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While this study focuses primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our website for deeper exploration by the research community.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., & Ungar, L. H. (2013). Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLOS ONE, 8(9), e73791.

@article{schwartz2013personality, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Ramones, Stephanie M and Agrawal, Megha and Shah, Achal and Kosinski, Michal and Stillwell, David and Seligman, Martin EP and Ungar, Lyle H}, title={{Personality, gender, and age in the language of social media: The Open-Vocabulary approach}}, journal={PLoS ONE}, year={2013}, volume={8}, issue={9}, pages={e73791} }


Abstract...
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes, yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase "sick of" and the word "depressed"), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive "my" when mentioning their "wife" or "girlfriend" more often than females use "my" with "husband" or "boyfriend"). To date, this represents the largest study, by an order of magnitude, of language and personality.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Lucas, R. E., Agrawal, M., Park, G. J., Lakshmikanth, S. K., Jha, S., Seligman, M. E. P., & Ungar, L. H. (2013). Characterizing Geographic Variation in Well-Being using Tweets. Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), Boston, MA.

@inproceedings{schwartz2013characterizing, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Lucas, Richard E and Agrawal, Megha and Park, Gregory J and Lakshmikanth, Shrinidhi K and Jha, Sneha and Seligman, Martin E P and Ungar, Lyle H}, title={{Characterizing geographic variation in well-being using tweets}}, booktitle={Proceedings of the 7th International AAAI Conference on Weblogs and Social Media}, year={2013}, series={ICWSM} }


Abstract...
The language used in tweets from 1,300 different US counties was found to be predictive of the subjective well-being of people living in those counties as measured by representative surveys. Topics, sets of co-occurring words derived from the tweets using LDA, improved accuracy in predicting life satisfaction over and above standard demographic and socio-economic controls (age, gender, ethnicity, income, and education). The LDA topics provide a greater behavioural and conceptual resolution into life satisfaction than the broad socio-economic and demographic variables. For example, tied in with the psychological literature, words relating to outdoor activities, spiritual meaning, exercise, and good jobs correlate with increased life satisfaction, while words signifying disengagement like 'bored' and 'tired' show a negative association.

Schwartz, H. A., Eichstaedt, J. C., Dziurzynski, L., Kern, M. L., Blanco, E., Kosinski, M., Stillwell, D., Seligman, M. E. P., & Ungar, L. H. (2013). Toward Personality Insights from Language Exploration in Social Media. Proceedings of the AAAI Spring Symposium Series: Analyzing Microtext, Stanford, California, USA.

@inproceedings{schwartz2013toward, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Dziurzynski, Lukasz and Kern, Margaret L and Blanco, Eduardo and Kosinski, Michal and Stillwell, David and Seligman, Martin E P and Ungar, Lyle H}, title={{Toward personality insights from language exploration in social media}}, booktitle={Proceedings of the AAAI Spring Symposium Series: Analyzing Microtext}, year={2013}, }


Abstract...
Language in social media reveals a lot about people’s personality and mood as they discuss the activities and relationships that constitute their everyday lives. Although social media are widely studied, researchers in computational linguistics have mostly focused on prediction tasks such as sentiment analysis and authorship attribution. In this paper, we show how social media can also be used to gain psychological insights. We demonstrate an exploration of language use as a function of age, gender, and personality from a dataset of Facebook posts from 75,000 people who have also taken personality tests, and we suggest how more sophisticated tools could be brought to bear on such data.

Schwartz, H. A., Eichstaedt, J. C., Dziurzynski, L., Kern, M. L., Blanco, E., Ramones, S., Seligman, M. E. P., & Ungar, L. H. (2013). Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach. Proceedings of *SEM-2013: Second Joint Conference on Lexical and Computational Semantics, Atlanta, Georgia, USA. 296-305.

@inproceedings{schwartz2013choosing, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Dziurzynski, Lukasz and Kern, Margaret L and Blanco, Eduardo and Ramones, Stephanie and Seligman, Martin E P and Ungar, Lyle H}, title={{Choosing the right words: Characterizing and reducing error of the word count approach}}, booktitle={Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics}, year={2013}, pages={296-305}, series={*SEM} }


Abstract...
Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
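The refinement idea above can be sketched as a simple filter: given per-word precision estimates derived from human judgments of word occurrences, drop the most ambiguous entries from the lexicon. The words, precision values, and threshold below are invented for illustration, not the paper's measured data.

```python
# Hypothetical precision estimates: the fraction of a word's occurrences
# that human judges rated as genuinely indicating happiness.
PRECISION = {"happy": 0.95, "like": 0.40, "love": 0.85, "play": 0.55}

def refine(lexicon_precision, min_precision=0.6):
    """Keep only words whose occurrences are unambiguous often enough."""
    return {w for w, p in lexicon_precision.items() if p >= min_precision}

print(sorted(refine(PRECISION)))  # ['happy', 'love']
```

Highly polysemous words like "like" are exactly the ones such a filter removes, trading a little recall for better precision in the word counts.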