the language and psychological features of dreams


Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology, pages 13–25, Vancouver, Canada, August 3, 2017. © 2017 Association for Computational Linguistics

In your wildest dreams: the language and psychological features of dreams

Kate G. Niederhoffer (Circadia Labs), Jonathan Schler (Circadia Labs), Patrick Crutchley (Qntfy), Kate Loveys (Qntfy), Glen Coppersmith (Qntfy)

Abstract

In this paper, we provide the first quantified exploration of the structure of the language of dreams, their linguistic style and emotional content. We present a collection of digital dream logs as a viable corpus for the growing study of mental health through the lens of language, complementary to the work done examining more traditional social media. This paper is largely exploratory in nature to lay the groundwork for subsequent research in mental health, rather than optimizing a particular text classification task.

1 Introduction

Despite a prominent role in the origin of psychology (Freud, 2013; Jung, 2002), scientific research about the meaning and value of dreams has waned in the 21st century. Cartwright (2008), for one, has argued that dreams lost their prominence in the latter half of the 20th century as psychology attempted to become a more empirical science focused on observable behavior and mental activity and less reliant on memory. In the last decade, the distinctive brain patterns of dreaming have become more identifiable (Siclari et al., 2017) and research has amassed on the impact of dreams on waking life with links to mood (Cartwright, 2013), relationship health (Selterman et al., 2012) and decision-making (Morewedge and Norton, 2009). While scientists debate the purpose of dreams (Barrett, 2007; Cartwright et al., 2006), dreams continue to be a universal and time-intensive experience across humanity.
Until recently, dreams remained an offline phenomenon, qualitatively separate from other forms of social interaction via social media. Online platforms such as Facebook and Twitter are fertile grounds for research in social science (Wilson et al., 2012; boyd and Ellison, 2007) and, more recently, in mental health via computational approaches in text analysis (Pennebaker et al., 2015; De Choudhury et al., 2013; Coppersmith et al., 2014) and network structure (Christakis and Fowler, 2014). However, dreams have remained a private, albeit important, conversational currency (Wax, 2004). When dreams are studied, they are gathered from sleep labs, psychotherapeutic and inpatient settings, personal dream journals and occasionally classroom settings where “most recent dreams” and “most vivid dreams” are collected (Domhoff, 2000). The recent development of a social network dedicated to dreams offers scientists unprecedented access to the language of dreams at scale, collected with consistent methodology. Understanding the structure of this large corpus of dreams gives us access to previously unobservable mental activity and enables future research to identify abnormal patterns in themes, emotional tone, and styles associated with mental health diagnoses and therapeutic outcomes.

We begin with a brief overview of the impetus for this work and a discussion of related work in the intersection of dreams and text analysis. We then provide details on the corpus of dreams and discuss our results organized around three research questions. The paper concludes with implications for subsequent research on dreams, both to better understand nuances in the medium, and for mental health purposes.

1.1 Previous research on dream content and text analysis

Dreams are challenging to understand.
Dreams are a diverse medium that varies from being perceptual or cognitive, from involving simple settings to complicated narratives, which may be similar or dissimilar to waking life (Siclari et al., 2017). Analyzing them is similarly complex; researchers have put extensive effort into the development of systems to score their global content, specific themes, psychological intensity, and theoretical underpinnings (Schredl, 2010). Different researchers, research goals, collection vehicles and analytic techniques present issues in replication, reliability and the validity of standardized methods for the content analysis of dreams.

The Hall-Van de Castle coding system is the most comprehensive protocol for content analysis of dreams, with eight main categories and over 300 subscales in the dream manual (Hall and Castle, 1966). Categories include: Physical surroundings (e.g. indoor, outdoor), Characters (e.g. persons, animals), Social interactions (e.g. friendly vs. aggressive), Activities (e.g. communication, thinking), Achievement outcomes (e.g. success, failure), Environmental press (e.g. fortune, misfortune), Emotions (e.g. anger, happiness), Descriptive elements (e.g. size, age, color), and Theoretical scales (e.g. castration anxiety, regression).

A handful of studies have used automated text analysis to explore dreams, specifically to discern differences from waking narratives and identify the relationship between dream language and personality (Hawkins and Boyd, in press), for automated sentiment detection (Nadeau et al., 2006) and to distinguish linguistic features from personal narratives (Hendrickx et al., 2016). To our knowledge, no study has examined as large a sample of dreams from a naturalistic setting (neurotypical research participants, online social context) across methodologies for psychological purposes (i.e. non-classification, non-hypothesis-driven).
Hawkins and Boyd (in press) analyze dreams across three samples of recent dream reports, two undergraduate and one sample from Amazon’s Mechanical Turk (whose users do short human intelligence tasks for small payments; see http://www.mturk.com). Using Linguistic Inquiry and Word Count (Pennebaker et al., 2007), they find a distinctive pattern for recent dreams that differs from the base rate norms for waking narratives, specifically characterized by more function words, common words, pronouns, personal pronouns, first person pronouns, past tense verbs, and more use of words describing leisure activities; less use of present tense and future tense verbs, causation words, second person pronouns, numbers, swear words, and assent words. They did not find consistent relationships between dream language features and personality. Hawkins and Boyd’s research paves the way for understanding how and why a dream narrative differs from a waking narrative and what these differences mean from a psychological perspective. For example, what does it mean for a dream to have more function words than a waking narrative? What is the relationship between the content of dreams and the more “invisible” word differences (pronouns, prepositions, articles)?

Nadeau et al. (2006) also used LIWC on dreams to gauge the efficacy of automated sentiment analysis to bypass human judges or dreamer estimates of emotion. Comparing the performance of LIWC, the General Inquirer (GI), a weighted lexicon (HM) and a standard bag-of-words approach, they find machine learning outperforms human judgments, and specifically demonstrate that LIWC and the GI have the best features for sentiment classification. While a step in a promising direction, Nadeau et al.’s sample was small (100 dreams from 29 individuals) and sentiment was classified on a limited negative scale (4-class, from neutral to highly negative), omitting nuance in the purported emotional content of dreams, cf.
Cartwright (2013).

Hendrickx et al. (2016) looked at the distinguishing features of dreams as compared to personal narratives (diary entries from Reddit and personal stories from Prosebox) via text classification, topic modeling and text coherence. The authors find dreams can be classified with near perfect precision based on the presence of uncertainty markers (somebody, remember, somewhere, recall) and descriptions of scenes (setting, riding, building, swimming, table, room), with lower discourse coherence. Personal narrative markers (non-dream) include time (2014, today, tonight, yesterday, day, months) and conversational expressions (please, :), ?, thanks). Hendrickx et al. also applied LDA topic modeling to explore the main themes in dreams as compared to personal narratives, validating the classification results. Dream topics span everyday activities, setting descriptions, and uncertainty expressions. The Hendrickx et al. research is notable in its exploration of male vs. female topic distributions within dreams in addition to comparisons across corpus type (dream vs. personal narrative), though it does not explore the relationship between topic and emotion and excludes the analysis of function words, which we believe is a critical piece in understanding the psychological value of dreams and dreamers, given previous findings (Chung and Pennebaker, 2007).

1.2 Relevant research on mental health and text analysis

Computational text analysis allows for assessment of larger samples and proactive identification of mental illness. Language in social media can indicate the likelihood a user self-reports a particular mental disorder (Coppersmith et al., 2015), or has received a mental health diagnosis (De Choudhury et al., 2013). The language of online dreams has yet to be analyzed relative to mental health conditions; however, prior laboratory research suggests that dream content may differ between clinical conditions.
We refer the reader to Skancke et al.’s comprehensive review of dream content grouped by clinical disorder (Skancke et al., 2014). In brief, patterns in emotional tone, themes, and actor focus have been associated with diagnoses of mood and anxiety disorders, schizophrenia, personality, and eating disorders. However, it remains unclear whether dream content can distinguish between clinical disorders.

Nightmares are especially relevant to mental health, featuring as a diagnostic symptom for post-traumatic stress disorder (Campbell and Germain, 2016), and a common correlate with schizophrenia (Okorome Mume, 2009), depression and anxiety (Swart et al., 2013), and personality disorders (Schredl et al., 2012). Nightmare frequency and intensity have been positively correlated with incidence of suicidal thoughts and behaviors (Bernert et al., 2005), suggesting nightmares could be a near-term risk factor to assess during crisis. In sum, analysis of dream topics and emotional tone may provide some insight into the mental health of the dreamer.

2 Data

Dreams were collected from DreamsCloud, a social network for sharing dreams. DreamsCloud is available to the public; those who register for the site are informed that their data can be used for research purposes. DreamsCloud is moderated by professional dream reflectors who comment on dreams, in addition to the broader community of registered users who can also “like” and comment on dreams. DreamsCloud has the largest available digital collection of dreams with over 119k dreams from 73k users and an overall community of over 300k registered users. Visitors to the site come from 234 countries (according to Google Analytics) and have shared dreams in 8 languages. DreamsCloud differs from online dream banks in that dreams are voluntarily shared for social purposes rather than collections from research studies.
A random sample of 10k English dreams over 100 words from September 1, 2013 through December 31, 2016 was used in this study. Data cleansing removed 322 dreams due to incorrectly classified language (Spanish), lyrics or news content copied from the Internet by the user, and duplicated data. The remaining sample included 9,678 dreams. No additional data about the gender, age, name, or ethnicity of the participants are included in our study. Only the original dream texts are analyzed. While DreamsCloud has comments and conversations around many of these dreams, we put off analysis of commentary for subsequent research and focus directly on the first-person accounts of dreams. The average length of dreams in the sample is 208 words (SD = 116.7). Data is organized by an encrypted alphanumeric Dreamer ID and a unique, encrypted alphanumeric Dream ID for each dream logged.

2.1 Ethical considerations

While community members agree to Terms of Service that explicitly state their content is owned by the company and will be used for research purposes, the nature of the content is very intimate. Because of the unknowns about the science behind why we dream, what our dreams mean, and how dreams are related to life events, there is less of a stigma about sharing otherwise private or bizarre information. The site refers to dream-sharing as an “anonymous-as-you-want” activity. Although the analyses in this paper are structural and aggregate in nature, deeper analysis of this data could raise privacy concerns as well as questions about appropriate intervention. Our hope is that additional research in this area will shed light on the relationship between dreaming and waking life to help address these questions.

3 Results

Three approaches are used to examine the dream narratives: content analysis using an LDA topic model (Blei et al., 2003), analysis of linguistic style via function words using LIWC (Pennebaker et al., 2015), and categorization of emotions using an emotion classification model (Coppersmith et al., 2016).

3.1 The topical structure of dreams

Topic models are statistical models which discover topics in a corpus. Topic modeling is especially useful in large data, where it is too cumbersome to extract the topics manually. Due to the large volume of dreams in our corpus and the lack of prior knowledge about their subjects, we follow other content-based studies in employing topic modeling to understand the content of the dreams (Kireyev et al., 2009; Yin et al., 2011; Chae et al., 2012; Mitchell et al., 2015; Hendrickx et al., 2016). We analyzed the topical structure of the dream corpus using a popular topic modeling algorithm, latent Dirichlet allocation (LDA) (Blei et al., 2003). LDA is an algorithm for the automated discovery of topics. LDA treats documents as a mixture of topics, and topics as a mixture of words. Each topic discovered by LDA is represented by a probability distribution which conveys the affinity for a given word to that particular topic.

We used the LDA implementation available in the Mallet package (McCallum, 2002). We converted the text to lower case and, because the topic analysis is focused on content of dream narratives, excluded all function words and punctuation marks. (Function and style will be considered in the following section.) No reduction in inflection (i.e. stemming, lemmatization) was performed to satisfy the goals of exploring the nuance of dream narratives as a medium and subsequently make inferences about the psychological orientation of the authors (see section 3.2).
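The preprocessing just described (lower-casing, dropping punctuation and function words, no stemming or lemmatization) can be sketched roughly as follows. This is only an illustration, not the authors' Mallet pipeline: the `FUNCTION_WORDS` set and the `content_tokens` helper are hypothetical names, and the word list is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only;
# a real analysis would rely on a much fuller stop list.
FUNCTION_WORDS = {
    "a", "an", "the", "i", "me", "my", "we", "you", "he", "she", "it", "they",
    "and", "or", "but", "of", "in", "on", "at", "to", "with", "was", "were",
    "is", "are", "not", "that", "this",
}

def content_tokens(text):
    """Lower-case the text, drop punctuation and function words.

    Inflected forms are kept as they are (no stemming or lemmatization),
    so "tree" and "trees" remain distinct tokens.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]

print(content_tokens("I was walking in the forest, and the trees were small."))
# ['walking', 'forest', 'trees', 'small']
```

The resulting token lists would then be handed to whichever topic-modeling tool is used; keeping inflections means singular and plural forms can land in different topics, which is the behavior the text asks for.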
Further, in order to make more valid comparisons to the existing literature based on human coding, it is important to understand how singular vs. plural nouns and present vs. past tense verbs, for example, are distributed topically. We selected 25 topics for LDA to infer and used 2000 iterations of Gibbs sampling to fit the model. The number of topics was informed by maximizing the computed information gain of the resulting feature sets, while maintaining a reasonable training time.

LDA provides insightful information about the topics in the corpus. However, interpreting the ‘aboutness’ of a topic based on a list of words requires human judgment based on term frequency, exclusivity, meaning, and subjective inference. Interestingly, we found 23 of 25 topics to be interpretable based on semantic meaning and 2 (Topics 17 and 22) which appeared more syntactically related. Most heavily weighted topic words are quoted in results tables, and the full 25-topic distribution with manual labeling is included in Appendix A. Note that the topic number is randomly assigned by LDA and does not indicate anything meaningful like rank, weight, or importance.

Although we utilize a 25-topic solution as compared to Hendrickx et al.’s 50-topic solution, we see some consistency in the topics identified as characteristic of dream narratives. Specifically, we see similar support for the continuity hypothesis of dreams (that dreams are a continuation of waking life activities) in topics such as Topic 19 about school, Topic 12 about food and eating, and Topic 15 about driving and cars. Similar to their research, we also see clustering of present tense verbs in Topic 0, a water topic (11), and a home settings topic (5). We see an almost exact replication of their “dreaming in general” in our Topic 18. Comprehensive comparisons in distributions or characteristic words are not possible with the data their published research makes available.
In inspecting the topical distribution and noting the support for the continuity hypothesis, what also stands out is the lack of support for the ‘dreams-as-psychotic-state’ hypothesis. Beginning with Freud and Jung, researchers have drawn similarities between dreaming and psychosis. These similarities range from phenomenological to neurobiological, qualitatively manifested as a loosening of associations, incongruity and bizarreness of personal experience, and distortion of time and space parameters (Scarone et al., 2008). Reviewing the content of our 25-topic solution, we see no reason to interpret the clustering of words within any given topic as incongruous nor do we detect support for the content to be evaluated as “bizarre” (Hobson et al., 1987). The topics instead appear closely aligned with reality, reflective of overt (actions) and covert (thoughts) behaviors, and demonstrate semantic congruity within topic. However, an automated approach to coding as subjective a construct as bizarreness demands inspection beyond content words alone.

LDA is an effective means to understand the distribution of content words in a given corpus. Importantly, it was developed for the purpose of dimensionality reduction: document summarization and information retrieval (Blei et al., 2003). Some of the assumptions that enable the algorithms behind topic models, such as the exclusion of words that have no content relevance (e.g. function words), leave room for additional methods to explore the psychological meaning of a given document, the author’s mindset, and emotions.

3.2 The linguistic style of dreams

Recent research on language from a psychological perspective demonstrates that function word use reflects and is a reliable marker of personality and a range of social and psychological processes, cognitive thinking styles and psychological states (Pennebaker, 2011).
Pennebaker proposes that function words are the infrastructure for thought and perspective: they connect (e.g. conjunctions, auxiliary verbs), shape (e.g. pronouns) and organize (e.g. articles, prepositions) content. Content is important in dreams, and often metaphorical (Lakoff, 1993). The style in which we remember and share our dreams can give important clues to how we make sense of our dreams, and in turn, ourselves. Said another way, our goals in this paper are not just to explore the stuff that dreams are made of but the style of dreams as a reflection of the dreamers’ psychological states. With multiple lenses on the data, we can obtain an enhanced picture of the psychological value of the corpus.

LIWC categorizes the words in a given text into approximately 80 variables. Variables represent the proportion of words in a given document (i.e. dream) that correspond to a lexicon composed of different categories of words, including function words (pronouns, prepositions), affect words (positive emotion, anxiety), and content words (money, religion, leisure activities). We reduced the window of interest in LIWC categories to function words, affect, and cognitive processes, as justified by what remains from the LDA analysis (e.g. function words) and comparisons to results from the empirical literature described thus far (Hawkins and Boyd, in press; Nadeau et al., 2006). Table 1 shows the means and SDs for all LIWC categories within the Linguistic Processes dictionaries with Cognitive, Social and Affective Processes added. Unweighted means from the aggregated sample of expressive writing in Pennebaker et al. (2015) are provided for context.
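The proportion-based scoring described above, where each variable is the percentage of a document's words that fall in a category's lexicon, can be sketched as follows. The LIWC dictionaries themselves are not reproduced here; `LEXICON` and `category_percentages` are hypothetical stand-ins with a toy vocabulary, so the numbers they produce are not LIWC scores.

```python
import re

# Hypothetical mini-lexicon for illustration; the real category
# dictionaries are far larger and are not reproduced here.
LEXICON = {
    "article": {"a", "an", "the"},
    "negation": {"no", "not", "never"},
}

def category_percentages(text, lexicon=LEXICON):
    """Return, per category, the percentage of words found in its word list."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in lexicon}
    return {
        name: 100.0 * sum(w in vocab for w in words) / total
        for name, vocab in lexicon.items()
    }

scores = category_percentages("The dog did not see a cat")
print({name: round(value, 2) for name, value in scores.items()})
# {'article': 28.57, 'negation': 14.29}
```

Because every score is a percentage of the document's own word count, documents of different lengths can be compared directly, which is what makes per-category means and SDs meaningful across a corpus.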
As compared to the base rates from expressive writing (Pennebaker et al., 2015), a dream narrative comes across as a first person (1st person pronouns) account of a past event (past tense) with particular attention to people (family, friends, women, and men), objects (articles), locations (prepositions) and what is seen, heard, and felt (perceptual processes) more than known or understood (cognitive processes).

Low cognitive processes (M = 9.29; SD = 3.48) would suggest dreamers are not on a search for meaning in sharing their dreams; however, it is unclear if this is a case of displaced cognitive processing due to the more dominant perceptual experience of dreams. Previous research indicates that narrative coherence has an inverse relationship with cognitive processing words (Klein and Boals, 2010; Boals et al., 2011). Boals et al. (2011) show that cognitive process words are related to sense making as a process which occurs prior to the development of a narrative (sense making as an outcome). This might suggest that dreamers do not tend to be caught up in why they had a given dream as much as explaining what happened. In other words, dreams are shared as complete stories.

A dream narrative’s low proportion of emotion words (Mean Affect = 3.42, SD = 1.90) is unexpected given recent research on the emotion regulatory function of dreams and calls for additional investigation, which we address below. One possibility is the sensitivity of a lexicon-based instrument to the way in which emotions are expressed in dream narratives. In general, our findings are consistent with Hawkins and Boyd (in press), despite differences in the collection vehicle (recall: Hawkins and Boyd use the ‘most recent dream’ and ‘most vivid dream’ paradigm) and previous version of LIWC (2007 vs. 2015).

3.3 How is language style related to the content of dreams?
To explore the relationship between dream topic and language style, we focus on function words only: pronouns, prepositions, articles, auxiliary verbs, and negations. In particular, we use an index composed of the proportions of these classes of words called the Categorical Dynamic Index (CDI; Pennebaker et al. 2014) that measures the extent to which thinking is Categorical (high prepositions, articles) versus Dynamic (pronouns, auxiliary verbs).

                        Dreams (n=9,678)     Expressive Writing (n=6,179)
Category                Mean      SD         Mean      SD
Word Count              208.85    116.61     408.94    248.23
Words per sentence       30.34     40.49      18.42     14.89
Words > 6 letters        11.66      3.41      13.62      4.12
Dictionary words         91.87      4.06      91.93      5.03
Total Function Words     60.04      4.32      58.27      6.26
Total Pronouns           19.72      4.31      18.03      5.36
Personal Pronoun         14.87      4.17      12.74      4.28
1st person sing.          9.54      3.36       8.66      4.25
1st person plur.          1.24      1.54       0.81      1.22
2nd person                0.27      0.65       0.68      2.14
3rd person sing.          3.06      2.71       2.01      2.95
3rd person plur.          0.77      1.05       0.57      0.82
Impersonal Pronoun        4.82      2.13       5.28      2.36
Articles                  6.99      2.62       5.70      2.56
Prepositions             13.99      2.67      14.27      2.82
Auxiliary verbs           8.08      2.38       9.25      3.06
Adverbs                   5.03      2.00       6.02      2.30
Conjunctions              8.52      2.62       7.46      2.06
Negations                 1.40      1.00       1.69      1.25
Cognitive Processes       9.29      3.48      12.52      5.11
Social Processes         11.18      5.07       8.69      5.46
Affective Processes       3.42      1.90       4.77      2.59
Positive emotion          1.64      1.40       2.57      1.63
Negative emotion          1.75      1.37       2.12      1.74

Table 1: Linguistic Processes Categories in LIWC2015

The CDI is a simple unit-weighted computation which adds the proportions of articles and prepositions and subtracts personal pronouns, impersonal pronouns, auxiliary verbs, conjunctions, adverbs and negations. It has been shown to be a reliable marker of cognitive style which we use to understand differences in the experience of various topics in dreams. Being categorical versus dynamic are different ways of sense-making.
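As a worked illustration of the unit-weighted computation described above, the sketch below applies it to the Table 1 category means for both corpora. It follows only the verbal description in the text (articles and prepositions added, the other six classes subtracted); the dictionary keys and the `cdi` function name are hypothetical labels, and any scaling or constant used in other published formulations of the index is not included.

```python
# Category means (percent of words) copied from Table 1.
# The short key names are hypothetical labels chosen for this sketch.
DREAMS = {
    "article": 6.99, "prep": 13.99, "ppron": 14.87, "ipron": 4.82,
    "auxverb": 8.08, "conj": 8.52, "adverb": 5.03, "negate": 1.40,
}
EXPRESSIVE = {
    "article": 5.70, "prep": 14.27, "ppron": 12.74, "ipron": 5.28,
    "auxverb": 9.25, "conj": 7.46, "adverb": 6.02, "negate": 1.69,
}

def cdi(p):
    """Unit-weighted index: categorical classes added, dynamic ones subtracted."""
    return (p["article"] + p["prep"]
            - p["ppron"] - p["ipron"] - p["auxverb"]
            - p["conj"] - p["adverb"] - p["negate"])

print(f"{cdi(DREAMS):.2f}")      # -21.74
print(f"{cdi(EXPRESSIVE):.2f}")  # -22.47
```

Since the index is a plain sum and difference of proportions, applying it to category means gives the same value as averaging per-document scores, so these two figures can be read as corpus-level means. Higher values sit toward the categorical end and lower values toward the dynamic end.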
One of the goals of our research is to understand how people use “the dream” as a medium on the path to self insight and social connection. In the most basic sense, do people share dreams about certain topics as narratives of personal experience indicating changes over time? Do certain topics lend themselves to a more distant style: stories of what happened to whom with precise descriptions of events and goals?

The top five Categorical dream topics and top five Dynamic topics are depicted in Table 2. Topics that are the most categorical are primarily marked by physical environments: trees, sky, house, beach, road. Dynamic dream narratives are characterized by intimate relationships (baby, mom, boyfriend, sister) and experiences (remember, time). The CDI acts as a shortcut to identify those dreams that are experienced as a narrative, potentially offering cues to the role of the dreamer as the main character, a distinguishing factor in dreams of healthy controls as compared to psychiatric patient samples (Skancke et al., 2014). Additionally, this shortcut points to a style of dream that would be difficult to discern with a topical lens only; that is, interpersonal situations with multiple characters and complex relationships. Interestingly, Cartwright et al. (1984) find that complex dreams containing multiple characters and shifts of scenes were one marker of depression remission in their five month longitudinal REM tracking study. Appendix B includes two samples of dreams with high and low CDI scores.
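The topic-by-style relationship reported in Table 2 is a Pearson correlation between how strongly each dream loads on a topic and that dream's CDI score. A minimal sketch of that computation is given below; the `pearson_r` helper is written out by hand for transparency, and the per-dream topic weights and CDI values are invented purely for illustration, not taken from the corpus.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values: one topic's weight in four dreams, and each
# dream's CDI score. These are NOT data from the paper.
topic_weight = [0.05, 0.10, 0.20, 0.40]
dream_cdi = [-30.0, -25.0, -20.0, -10.0]

r = pearson_r(topic_weight, dream_cdi)
print("positive" if r > 0 else "non-positive")
```

A positive r for a topic means dreams that load more heavily on it tend to be written in a more categorical style; a negative r points toward the dynamic end.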
LDA Topic   Words Characterizing Topic             Correlation with CDI (Pearson’s r)
Categorical
13          walking tree trees small forest          0.25
8           see sky plave flying building            0.21
5           room door house floor stairs             0.20
11          water pool beach boat swimming           0.17
15          car driving road bus drive truck         0.11
Dynamic
21          baby hospital boy pregnant girl         -0.12
4           mom dad house brother sister            -0.13
18          remember know time think                -0.16
17          guy phone told boyfriend                -0.22
9           friend guy boyfriend friends            -0.34

Table 2: Top and Bottom Five dream Topics on CDI continuum

3.4 The emotional landscape of dreams

One of the goals of this paper is to investigate how emotions are revealed in dreams, which emotions, and how they vary with the topics that emerge. One prominent hypothesis in dream research posits that the function of dreams is to help regulate negative emotion by “intervening” between waking emotional concerns and post sleep mood (Cartwright, 2008). Much of the literature points to a central role for emotions in dreams, yet there are inconsistencies in the frequencies of the emotional array detected and their valence. The inconsistencies stem from reasons similar to those cited above which make standardized dream content analysis challenging, with the added challenges that make emotions difficult to detect and discern in the broader computer science literature (Sikka et al., 2014; Schredl and Doll, 1998). For example, Merritt et al. (1994) tested a small student population (n=20) and found that there are an average of 3.6 emotions per dream with 95% of dreams having at least one emotion, with fear being the most pervasive. This is directionally consistent with Hall and Castle (1966), who find negative emotions to be more prominent; however, the frequencies vary. Sikka et al. (2014) find consistent differences in the external judgments of emotions in dreams as compared to self ratings.
The predicted labels of each dream narrative should not be taken as a definitive representation of the overall emotion of that narrative (a difficult task for even human annotators to accomplish consistently; see Purver and Battersby 2012). Instead, these results should be viewed as an additional feature of each narrative, able to be evaluated automatically and quickly to gain insight and explore broader trends.

In our exploration of language style with a lexicon-based approach, LIWC detected a low proportion of affect (Mean Affect = 3.42, SD = 1.90). To assess the emotional content of dreams in an unsupervised manner (i.e., without annotating each narrative manually), we turn to a model for classifying emotional content from text. (We briefly summarize here, but for complete details, see Coppersmith et al. 2016.) A series of character language models (one for each of anger, fear, joy, sadness, surprise, and no emotion) are trained on a large corpus of Twitter data with an included emotional hashtag, e.g., “#anger”. Tweets containing indications of sarcasm were removed. Tweets were labeled by the emotional hashtag contained, and then that hashtag was removed for training the model, thus learning what words might contribute to something being tagged “#anger”. A two-step semi-supervised process is used to produce the no-emotion model, since most tweets with emotional content are not labeled with #[emotion]. (We also scored each narrative using the Mohammad and Turney 2013 NRC Emotional Lexicon and opted for the character language models for greater vocabulary coverage and possible explicit “no emotion” label.)

We apply each of the emotion character language models (CLM) to each of the dream narratives, producing a probability that each narrative’s content results from each emotion’s CLM. We then label that narrative with the maximum-probability emotion.
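The labeling step described above (score a narrative under one character language model per emotion, then keep the highest-scoring emotion) can be sketched with a toy model. This is not the model of Coppersmith et al. (2016): the `CharNgramLM` class, its add-one smoothing, the trigram order, and the two tiny training sets are all assumptions made for illustration.

```python
import math
from collections import Counter

class CharNgramLM:
    """Toy character n-gram language model with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the (n-1)-character prefixes
        self.vocab = set()         # characters seen during training

    def _padded(self, text):
        # Left-pad with spaces so the first characters also have a context.
        return " " * (self.n - 1) + text.lower()

    def train(self, texts):
        for text in texts:
            padded = self._padded(text)
            self.vocab.update(padded)
            for i in range(len(padded) - self.n + 1):
                gram = padded[i:i + self.n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def log_prob(self, text):
        padded = self._padded(text)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            total += math.log((self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v))
        return total

def label_emotion(text, models):
    """Return the emotion whose model gives the text the highest log-probability."""
    scores = {emotion: lm.log_prob(text) for emotion, lm in models.items()}
    return max(scores, key=scores.get)

# Hypothetical miniature training sets, standing in for hashtag-labeled tweets.
training = {
    "fear": ["i was scared and afraid", "terrified of the dark"],
    "joy": ["i was happy and glad", "what a joyful sunny day"],
}
models = {}
for emotion, texts in training.items():
    lm = CharNgramLM(n=3)
    lm.train(texts)
    models[emotion] = lm

print(label_emotion("so scared and terrified", models))
```

Taking the argmax discards the runner-up scores, so a narrative with mixed emotions still receives a single label; keeping the full score dictionary would preserve that information for later correlation with topics.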
Concretely, we expect dreams to have a mixture of emotions, and this technique is likely to surface the dominant emotion in the dream (as measured by the number of words used that indicate that emotion). The percent breakdown of predicted emotion labels was as follows: sadness, 31.6%; fear, 21.0%; surprise, 19.9%; joy, 18.7%; anger, 8.7%; no emotion, 0.0%. Only two narratives out of almost 10,000 were labeled no-emotion, and only 6 had the no-emotion label above 10% of the estimated emotional content within a dream; see caveats of this approach below.

To continue to deepen our understanding of the psychological value of the corpus and gain insight on the relationship between dream content and emotion, we correlate each emotion’s CLM probability with each of the 25 LDA topics. Table 3 shows the most positively-correlated topic and most negatively-correlated topic for each emotion.

Consistent with previous research (Merritt et al., 1994; Hall and Castle, 1966), we demonstrate emotions present in all dreams, with more negative than positive emotion: 61.3% negative emotions (sadness, fear, anger), and sadness as the dominant emotion. Drawbacks of this approach of relying on self-stated emotional content tags are outlined in Coppersmith et al. (2016). In short, even given the two-step semi-supervised method of obtaining the most emotionally neutral tweets possible to use as no-emotion exemplars, it is likely that some nontrivial percentage of the tweets contain significant emotional content. In addition, even in a single tweet, emotional content is often mixed, and the training method employed allows for only one label that may not be sufficiently descriptive.

Perhaps the largest caveat of these results comes from the mismatch between the Twitter data the model was trained on and the dream data it is applied to here.
The featurization and parameters of the model are optimized for Twitter messages that are constrained to 140 characters, while the dream narratives are 1,047 characters on average (SD 716). Content varies as well; the dream narratives, at least in theory, have a consistent purpose and theme: recounting the content of a dream. Content of tweets is incredibly varied, from a segment of a story, meant to be read in the context of additional tweets; to a single hyperlink, perhaps with a few words of commentary; to a single emoji repeated 140 times. Future research directions include training a semi-supervised emotion classifier that includes the dream narratives to generalize better across domains.

Emotion  Topic number  Correlation with topic (Spearman ρ)  Words characterizing topic
Anger    16             0.187                               people kill man trying guy gun shot killed
Anger    9             -0.08                                friend guy boyfriend friends love girl
Fear     18             0.17                                remember know time think felt life feeling
Fear     19            -0.139                               school class teacher high game friend friends
Joy      0              0.151                               see says look know comes walk run looks
Joy      9             -0.13                                friend guy boyfriend friends love girl
Sadness  9              0.237                               friend guy boyfriend friends love girl
Sadness  0             -0.101                               see says look know comes walk run looks

Table 3: Most positively and negatively-correlated topics for each emotion

4 Conclusion

Our paper presents three types of analyses on an innovative corpus. First we explored the content of dreams with LDA topic modeling. The results demonstrate topics easily interpreted by a human including everyday activity, dreaming itself, and themes common in the dream literature (teeth, animals, flying). These results are consistent with the limited amount of existing research in this area. Our second lens on the data using LIWC portrays dreams, in general, as first person accounts of past events with disproportionate social references and abstract descriptions of settings. Dreams tend to focus on perceptual processes more than cognitive processes.
However, there are qualitative distinctions in the content of dreams such that certain topics are experienced as dynamic and others, more categorical. Lastly, we further explored the emotional content in dreams with an unsupervised approach. Our results indicate that emotion is present in dreams and is disproportionately negative, with the most common emotion being sadness. With a sensitive tool, emotion can help disambiguate content in dreams that would otherwise be lumped together, for example dreams about friends, romance, and love which show a complex configuration of emotion.

One major question that underlies this paper is whether we are investigating how we dream or how we story and share our dreams. In future research, we hope to compare dream data to other corpora to better understand how this way of knowing a person, through their dreams, is related to other forms of self expression. Identifying a reasonable comparative dataset for dreams collected from a social network is challenging. This data set is unique in its length (e.g., 140-character tweets vs. 210-word dreams), content (intimate and quotidian content), and purpose (these dreams are shared for social connection and interaction), making most social media, which would otherwise present the appropriate scale and date range, a poor fit.

Interpreting topics in dreams is especially challenging because there is no ground truth. Language style and emotional classification enhance our understanding of topics and the mindset of a given dreamer, but it is as yet unclear whether there are individual differences in the way dreams are experienced, or whether dreams are ‘victims’ of our memories and are yet another corpus to explore the same individual differences we might see in conscious thought. Continued research on dreams over time, dreamers across media and a variety of facets within dream data as compared to different outcome measures (personality, etc.) will help address this concern.

Another limitation in our research is the lack of information about potential skew in the data. For example, there may be biases in who shares dreams and why, and in who knows about and has access to the social network. We also did not have access to ground truth of user mental health information, so we did not analyze dream content relative to clinical disorders. At this time, site behavior is too unreliable at the level of dream reporting to tell us whether there is any systematic bias in who provides dreams. Future studies will certainly explore demographic variables including age, sex, race, socioeconomic status, and education level, in addition to variables related to belief in dreams, dream frequency and other psychological attributes which would make people more or less likely to share their dreams. Additionally, future research could investigate associations between mental disorder diagnoses and the content of dreams. This is a preliminary investigation into a vast data set with many additional variables to explore.

Much as this field has used social media data as a lens to study the conscious waking perceptions, emotions, and thought processes of individuals with mental health conditions, we see this as a complementary set of quantifiable signals related to the person’s unconscious processes. While more traditional social media data is a convolution of the person’s internal state and the world they inhabit, we see this dream data as a convolution of their dreaming self, as recalled and recorded by their waking self. Considered in context of the Fluid Vulnerability Theory, dream content could serve as one of many dynamic, near-term risk factors for detecting transitions into psychological crisis (Rudd, 2006). Given the richness of social media data for uncovering unknown signals related to mental health, we strongly suspect this data may hold similar and complementary power.
In sum, our paper offers preliminary evidence that the language of dreams can be an insightful contribution to human-centric big data, as a means for an enhanced understanding of human behavior and cognition alongside standard psychological means and modern neuroimaging. Paired with large scale analysis of social media language, Internet behavior, and wearable sensor information that predict mental health, the language of dreams could serve as an additional data source from which to evaluate mental health by digital life traces.

References

Deirdre Barrett. 2007. An evolutionary theory of dreams and problem-solving. In The New Science of Dreaming: Content, Recall, and Personality Correlates, Praeger Publishers, volume 2, pages 133–154.
Rebecca A. Bernert, Thomas E. Joiner, Kelly C. Cukrowicz, Norman B. Schmidt, and Barry Krakow. 2005. Suicidality and sleep disturbances. Sleep 28(9):1135–1141.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
Adriel Boals, Jonathan B. Banks, Lisa M. Hathaway, and Darnell Schuettler. 2011. Coping with Stressful Events: Use of Cognitive Words in Stressful Narratives and the Meaning-Making Process. Journal of Social and Clinical Psychology 30(4):378–403.
danah m. boyd and Nicole B. Ellison. 2007. Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication 13(1):210–230.
Rebecca L. Campbell and Anne Germain. 2016. Nightmares and Posttraumatic Stress Disorder (PTSD). Current Sleep Medicine Reports 2(2):74–80.
R. Cartwright. 2013. History of the Study of Dreams. In Clete A. Kushida, editor, Encyclopedia of Sleep, Academic Press, Waltham, pages 124–128. DOI: 10.1016/B978-0-12-378610-4.00028-0.
Rosalind Cartwright. 2008. The Contribution of the Psychology of Sleep and Dreaming to Understanding Sleep-Disordered Patients. Sleep Medicine Clinics 3(2):157–166.
Rosalind Cartwright, Mehmet Y. Agargun, Jennifer Kirkby, and Julie Kabat Friedman. 2006. Relation of dreams to waking concerns. Psychiatry Research 141(3):261–270.
Rosalind D Cartwright, Stephen Lloyd, Sara Knight, and Irene Trenholme. 1984. Broken dreams: A study of the effects of divorce and depression on dream content. Psychiatry 47(3):251–259.
J. Chae, D. Thom, H. Bosch, Y. Jang, R. Maciejewski, D. S. Ebert, and T. Ertl. 2012. Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition. In 2012 IEEE Conference on Visual Analytics Science and Technology (VAST). pages 143–152.
Nicholas A. Christakis and James H. Fowler. 2014. Friendship and natural selection. Proceedings of the National Academy of Sciences 111(Supplement 3):10796–10801.
Cindy Chung and James Pennebaker. 2007. The psychological functions of function words. Social Communication.
Glen Coppersmith, Mark Dredze, and Craig Harman. 2014. Quantifying mental health signals in Twitter. In Proceedings of the ACL Workshop on Computational Linguistics and Clinical Psychology.
Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015. From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. North American Chapter of the Association for Computational Linguistics, Denver, Colorado, USA.
Glen Coppersmith, Kim Ngo, Ryan Leary, and Tony Wood. 2016. Exploratory data analysis of social media prior to a suicide attempt. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. North American Chapter of the Association for Computational Linguistics, San Diego, California, USA.
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013. Social media as a measurement tool of depression in populations. In Proceedings of the 5th ACM International Conference on Web Science.
G. William Domhoff. 2000. Methods and measures for the study of dream content. In Meir H. Kryger, Thomas Roth, and William C. Dement, editors, Principles and Practice of Sleep Medicine, W. B. Saunders, Philadelphia.
Sigmund Freud. 2013. The Interpretation Of Dreams. Read Books Ltd. Google-Books-ID: U0t8CgAAQBAJ.
Calvin Springer Hall and Robert L. Van de Castle. 1966. The content analysis of dreams. Appleton-Century-Crofts.
R. C. II Hawkins and Ryan L. Boyd. in press. Such stuff as dreams are made on: Dream language, LIWC norms, and personality correlates. Dreaming.
Iris Hendrickx, Louis Onrust, Florian Kunneman, Ali Hürriyetoğlu, Antal van den Bosch, and Wessel Stoop. 2016. Unraveling reported dreams with text analytics. arXiv:1612.03659 [cs].
J Allan Hobson, Steven A Hoffman, Rita Helfand, and Delia Kostner. 1987. Dream bizarreness and the activation-synthesis hypothesis. Human Neurobiology.
Carl Gustav Jung. 2002. Dreams. Routledge. Google-Books-ID: SWvdQyo ZX0C.
Kirill Kireyev, Leysia Palen, and Kenneth M. Anderson. 2009. Applications of topics models to analysis of disaster-related twitter data. In NIPS Workshop on Applications for Topic Models: Text and Beyond. volume 1.
Kitty Klein and Adriel Boals. 2010. Coherence and Narrative Structure in Personal Accounts of Stressful Experiences. Journal of Social and Clinical Psychology 29(3):256–280.
George Lakoff. 1993. How metaphor structures dreams: The theory of conceptual metaphor applied to dream analysis. Dreaming 3(2):77–98.
Andrew Kachites McCallum. 2002. MALLET: A machine learning for language toolkit. [Online; accessed 2015-03-02].
Jane M. Merritt, Robert Stickgold, Edward Pace-Schott, Julie Williams, and J. Allan Hobson. 1994. Emotion Profiles in the Dreams of Men and Women. Consciousness and Cognition 3(1):46–60.
Margaret Mitchell, Kristy Hollingshead, and Glen Coppersmith. 2015. Quantifying the language of schizophrenia in social media. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. North American Chapter of the Association for Computational Linguistics, Denver, Colorado, USA.
Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word-emotion association lexicon 29(3):436–465.
Carey K. Morewedge and Michael I. Norton. 2009. When dreaming is believing: the (motivated) interpretation of dreams. Journal of Personality and Social Psychology 96(2):249–264.
David Nadeau, Catherine Sabourin, Joseph De Koninck, Stan Matwin, and Peter D. Turney. 2006. Automatic dream sentiment analysis. In Proceedings of the workshop on computational aesthetics at the twenty-first national conference on artificial intelligence (AAAI-06). Boston, USA.
Celestine Okorome Mume. 2009. Nightmare in schizophrenic and depressed patients. The European Journal of Psychiatry 23(3):177–183.
James W. Pennebaker. 2011. The secret life of pronouns. New Scientist 211(2828):42–45.
James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical report.
James W. Pennebaker, Cindy K. Chung, Joey Frazee, Gary M. Lavergne, and David I. Beaver. 2014. When Small Words Foretell Academic Success: The Case of College Admissions Essays. PLOS ONE 9(12):e115844.
James W. Pennebaker, Cindy K. Chung, Molly Ireland, Amy Gonzales, and Roger J. Booth. 2007. The development and psychometric properties of LIWC2007. Austin, TX.
Matthew Purver and Stuart Battersby. 2012. Experimenting with Distant Supervision for Emotion Classification. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, EACL ’12, pages 482–491.
M David Rudd. 2006. Fluid vulnerability theory: A cognitive approach to understanding the process of acute and chronic suicide risk.
Silvio Scarone, Maria Laura Manzone, Orsola Gambini, Ilde Kantzas, Ivan Limosani, Armando D’agostino, and J Allan Hobson. 2008. The dream as a model for psychosis: an experimental approach using bizarreness as a cognitive marker. Schizophrenia Bulletin 34(3):515–522.
Michael Schredl. 2010. Dream content analysis: Basic principles. International Journal of Dream Research 3(1):65–73.
Michael Schredl and Evelyn Doll. 1998. Emotions in Diary Dreams. Consciousness and Cognition 7(4):634–646.
Michael Schredl, Franc Paul, Iris Reinhard, Ulrich Walter Ebner-Priemer, Christian Schmahl, and Martin Bohus. 2012. Sleep and dreaming in patients with borderline personality disorder: A polysomnographic study. Psychiatry Research 200(23):430–436.
Dylan Selterman, Deirdre Barrett, and Patrick McNamara. 2012. Attachment, sleep and dreams. In Encyclopedia of Sleep and Dreams, Greenwood Publishers, Santa Barbara, CA.
Francesca Siclari, Benjamin Baird, Lampros Perogamvros, Giulio Bernardi, Joshua J. LaRocque, Brady Riedner, Melanie Boly, Bradley R. Postle, and Giulio Tononi. 2017. The neural correlates of dreaming. Nature Neuroscience advance online publication.
Pilleriin Sikka, Katja Valli, Tiina Virta, and Antti Revonsuo. 2014. I know how you felt last night, or do I? Self- and external ratings of emotions in REM sleep dreams. Consciousness and Cognition 25:51–66.
Joacim Skancke, Ingrid Holsen, and Michael Schredl. 2014. Continuity between waking life and dreams of psychiatric patients: A review and discussion of the implications for dream research. International Journal of Dream Research 7(1):39–53. http://journals.ub.uni-
Marijke L. Swart, Annette M. van Schagen, Jaap Lancee, and Jan van den Bout. 2013. Prevalence of Nightmare Disorder in Psychiatric Outpatients. Psychotherapy and Psychosomatics 82(4):267–268.
Murray L. Wax. 2004. Dream sharing as social practice. Dreaming 14(2-3):83–93.
Robert E. Wilson, Samuel D. Gosling, and Lindsay T. Graham. 2012. A Review of Facebook Research in the Social Sciences. Perspectives on Psychological Science 7(3):203–220.
Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang. 2011. Geographical Topic Discovery and Comparison. In Proceedings of the 20th International Conference on World Wide Web. ACM, New York, NY, USA, WWW ’11, pages 247–256.

Appendix A: Full list of LDA topics

Topic  Label                       Top words
0      Active first person dreams  see says look know comes walk run looks find wake
1      Sex dreams, some explicit   girl guy sex room bathroom wanted girls shower naked talking
2      Animal dreams               dog house cat snake dogs trying black big came bear
3      Metadreaming                room bed woke sleep night wake asleep time felt see
4      Family presence             mom dad house brother sister told came saw home family
5      Strange homes and settings  room door house floor stairs old open window building doors
6      About family members        house family husband mother old son sister home da...