Overview

“I like tacos.” and “Tacos are the best.” These two statements, let’s assume from two different people, express a similar sentiment but differ in the words each person uses. Person A, let’s call her Alice for fun, articulates her taste for tacos through herself, using the one-letter word “I”: “I like tacos.” Person B, on the other hand, Beauregard, a Southern gent, expresses his affection by measuring tacos against the entire kingdom of food and determining that they are in fact the “best”: “Tacos are the best.” Just as we walk differently from one another, we talk differently from one another. These differences, when observed over extended periods of time or across written or spoken text, reveal things about who we are and how we approach the world. By examining lexical tendencies, we can gauge a person’s psychological drives, monitor their moods, assess their level of conviction, and make inferences about their thinking style. To do this, however, we must do something we’ve long been told is impolite: stop listening to what people say and focus instead on how they say it. The how, rather than the what, is where we find a person’s tendencies. Alice and Beauregard are twins in their tastes (their what), but they are very different in how they express that idea.

Our focus on the how was inspired by Dr. James Pennebaker’s illuminating work at the University of Texas at Austin. Over the course of his career, Pennebaker, along with many esteemed colleagues, has found that certain words, beyond their meanings, reveal dimensions of our psychology. “Tacos” tells us little, but the “I,” if habitual, is telling: Alice’s “I,” if used throughout a text, suggests that she may be prone to depression.1 Beauregard, meanwhile, sipping a julep on a Faulkneresque, wisteria-shaded afternoon, uses “best,” a word that reveals he is driven by Achievement.2 These words, in this case “I” and “best,” are what Pennebaker calls function words.3 They are small and there are thousands of them; we often overlook them, but like stars above a light-polluted city, though hard to see, they are there, twinkling as they always have. From words as simple as “an” to ones as complex as “calumnious,” specific words map onto particular psychological dimensions, and some map onto more than one. “Extraordinary,” for example, contributes both to the measurement of a person’s drive for achievement and to their style of thinking.

Our Method

The process we used to measure lexical tendencies involved several steps: we took Pennebaker’s word-counting approach, fine-tuned his Linguistic Inquiry and Word Count (LIWC) libraries, reconfigured them slightly to fit our study, and applied them to people’s Twitter messages about the Common Core State Standards to measure dimensions of their psychology. Each dimension was then assessed using independent libraries of psychologically pertinent terms, some of which, though independent of each other, are shared to aid in other measurements. For example, Honesty and Mood both rely on a person’s use of I-words (I, me, my, mine, etc.), but only the Honesty measure uses a person’s auxiliary verbs (be, let, do, can, was, etc.). Similarly, some scales draw on numerous libraries to measure a dimension (Conviction uses 12), while others, like Power Drive, use only one. The libraries themselves range in size from 23 terms to just over nine hundred. The disparity in breadth reflects careful selection by Pennebaker and his team of expert judges: he, along with a group of trained psychologists, independently assessed every word in the English dictionary and determined whether it could be empirically supported as a source of psychological insight. The words they chose to include are called function words, because these words have functions beyond their primary meaning.4
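To make the counting step concrete, here is a minimal Python sketch of library-based word counting. It is not the LIWC software or our actual pipeline: the library names and the tiny word lists are hypothetical stand-ins assembled from examples in this overview (the I-words and auxiliary verbs above, plus “best” and “extraordinary”), and the lowercase, letters-and-apostrophes tokenizer is an assumption. What it illustrates is that each library yields its own proportion of a text’s words, and that one word can count toward more than one library.

```python
import re
from collections import Counter

# Hypothetical, heavily abridged libraries for illustration only; the real
# LIWC dictionaries are much larger and are not reproduced here.
LIBRARIES = {
    "i_words": {"i", "me", "my", "mine"},
    "aux_verbs": {"be", "let", "do", "can", "was"},
    "achievement": {"best", "extraordinary"},
    "thinking_style": {"extraordinary"},  # a word may sit in several libraries
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def library_proportions(text: str) -> dict[str, float]:
    """Return, for each library, the share of all tokens that belong to it."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in LIBRARIES}
    counts = Counter(tokens)
    return {
        name: sum(counts[w] for w in words) / len(tokens)
        for name, words in LIBRARIES.items()
    }

print(library_proportions("I like tacos."))
print(library_proportions("Tacos are the best."))
```

Run on the two opening sentences, Alice’s “I like tacos.” scores one third on the hypothetical i_words library, while Beauregard’s “Tacos are the best.” scores 0.25 on achievement.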

After customizing Pennebaker’s word libraries, we enlisted the help of the Department of Computer and Information Science at the University of Pennsylvania. Using Python, we sifted through the 500,000 tweets in Time Periods Two and Three, extracted the function words used by each tweeter, and expressed each library’s count as a proportion of the total words that person used. These proportions were then standardized. Standardization was necessary because each library contains a different number of terms, so a proportion from one library is not directly comparable to a proportion from another; recall that the libraries range from 23 terms to just over nine hundred. Standardizing each library put the proportions on a common scale, giving equally weighted measurements across the various dimensions. Pennebaker did something similar when examining the speeches of presidential candidates during the 2016 election season.5 Our work differs from his, however, in that we not only measured individuals on the various psychological scales but also used our social network data to compute average scores in each domain for each faction, as identified by its members’ Twitter behavior. These group averages allow us to compare the psychological profiles of the factions and determine differences in their moods, drives, levels of conviction, and thinking styles.
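The standardization step can be sketched as a z-score computed separately for each library across all tweeters. The text does not specify the exact formula, so treating “standardized” as a z-score with the population standard deviation is an assumption, as are the function name and the toy numbers, which are illustrative rather than study data.

```python
from statistics import mean, pstdev

def standardize(scores: dict[str, float]) -> dict[str, float]:
    """Convert one library's raw proportions (keyed by user) into z-scores.

    Assumes z-scoring with the population standard deviation, so each
    library ends up with mean 0 and standard deviation 1 across users,
    whatever its size.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All users share the same proportion, so there is no spread to scale.
        return {user: 0.0 for user in scores}
    return {user: (x - mu) / sigma for user, x in scores.items()}

# Toy raw proportions for three hypothetical users (not study data).
# A large library tends to produce larger raw proportions than a small one.
large_library = {"alice": 0.120, "bo": 0.040, "cy": 0.050}
small_library = {"alice": 0.001, "bo": 0.009, "cy": 0.002}

print(standardize(large_library))
print(standardize(small_library))
```

After this step a score of +1 means one standard deviation above the average tweeter on that library, whether the library holds 23 terms or several hundred, which is what lets the dimensions be compared side by side.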


Before moving forward, it is important to note the limitations inherent in measuring lexical tendencies. Our system, like Pennebaker’s, looks only at function words. Without access to a speaker’s tone, we cannot account for sarcasm, irony, or the multiple meanings of words like bark, nails, jam, pool, or mine. These constraints, however, do not prevent the process from helping us explore a person’s or a group’s psychology. It may not be a perfect process, but neither is psychoanalysis; the analyst, like all people, is limited by biases, perception, acumen, and a relative ability to deduce or infer. That does not make an analyst’s work a useless means of understanding; it simply means that it is limited, much like our work here. Certainly, lexical tendencies cannot produce an unequivocal picture of a person. Our habits, however, whether sleeping, waking, writing, or speaking, do reveal important aspects of who we are when properly measured. Furthermore, this process does things the analyst cannot. Noting every word a person uses, cross-checking each word against massive word libraries, calculating the proportion of function words to total words, and arriving at a final count, all while trying to listen to what a patient says, would be exhausting if not impossible. Used together, one after the other, the two processes can provide a more thorough picture than either can alone. A doctor uses a stethoscope for the same reason: it is a tool that can do what the doctor cannot, limited for sure, but still a helpful addition for any medical professional.
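A tiny Python sketch shows why word sense is a blind spot. A counter that matches tokens against a list without looking at context, which is an assumption about how such a counter works rather than a description of any particular tool’s internals, counts the possessive “mine” and the coal “mine” identically. The I-word list and the function name here are hypothetical.

```python
import re

# Hypothetical first-person library, taken from the I-word examples above.
I_WORDS = {"i", "me", "my", "mine"}

def i_word_count(text: str) -> int:
    """Count tokens found in I_WORDS, with no regard for context or sense."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for tok in tokens if tok in I_WORDS)

print(i_word_count("That taco is mine."))          # 1: possessive pronoun
print(i_word_count("They closed the coal mine."))  # 1: a noun, counted anyway
```

Both sentences add one I-word to the count, even though only the first says anything about the speaker. Sarcasm and irony slip through the same way, because nothing in the lookup sees tone.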

Another thing to consider is that word counting is a psychological analysis meant to reveal things about individuals. The individual has habits, and those habits reveal aspects of who that person is. Here, however, we have taken a psychological tool and applied it at the group level, aggregating individual habits into measures for the groups to which our social network analysis assigned each individual. In moving up a level from the individual to the group, we inevitably lose aspects of those people, and nuances are sloughed to the floor. A group measure represents the group average and may not describe any particular member of that group. Thus, for example, if the blue faction uses significantly more sad words than either the yellow or the green faction, this does not mean every blue faction member is sad. It means only that, on average, members of the blue faction used more sad words.
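Group averaging, and the caveat that comes with it, can be sketched in a few lines of Python. The faction labels echo the blue, yellow, and green example above, but the users, scores, and membership mapping are made up for illustration, and averaging standardized scores with a plain arithmetic mean is an assumption about how the faction scores were formed.

```python
from collections import defaultdict
from statistics import mean

def faction_means(scores: dict[str, float], membership: dict[str, str]) -> dict[str, float]:
    """Average each faction's per-user scores; users with no faction are skipped."""
    groups: dict[str, list[float]] = defaultdict(list)
    for user, score in scores.items():
        faction = membership.get(user)
        if faction is not None:
            groups[faction].append(score)
    return {faction: mean(vals) for faction, vals in groups.items()}

# Made-up standardized sad-word scores and faction labels (not study data).
sad_z = {"u1": 1.5, "u2": 1.2, "u3": -0.6, "u4": -0.2, "u5": 0.1, "u6": -0.5}
membership = {"u1": "blue", "u2": "blue", "u3": "blue",
              "u4": "yellow", "u5": "yellow", "u6": "green"}

averages = faction_means(sad_z, membership)
print(averages)
# Blue has the highest average, yet member u3 scores below every other faction's average.
print(sad_z["u3"] < min(averages["yellow"], averages["green"]))  # True
```

Averaging is what makes the factions comparable, and it is also what hides a member like u3.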

The final consideration is the relatively loose nature of Twitter conversations, and whether it has any bearing on the kinds of words people use. The answer to that question is no: in repeated studies, people have used similar function words, without great variance, when writing essays, letters, emails, Twitter messages, and even diary entries.6 In the same way that I walk with a similar gait regardless of street or circumstance, I use a similar set of words whenever I write, type, or speak. At times I may walk faster or slower, with greater caution or urgency, but the rhythm and placement of my feet do not fundamentally change. We talk, then, much as we walk. Despite the cliché that sets talking the talk apart from walking the walk, the two are alike: both are habits that, if examined, reveal who we are.

In the following sections you will find links to detailed explanations of the various dimensions, more nuanced discussion of the words and processes involved, and visual comparisons of each group’s placement on the psychological dimensions.  


  1. Rude, S., Gortner, E., & Pennebaker, J. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8), 1121-1133.
  2. Winter, D. G. (1987). Leader appeal, leader performance, and the motive profiles of leaders and followers: A study of American presidents and elections. Journal of Personality and Social Psychology, 52(1), 196-202.
  3. Chung, C., & Pennebaker, J. W. (2007). The psychological functions of function words. In Social communication (pp. 343-359).
  4. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.
  5. Wordwatchers. (n.d.). Retrieved November 21, 2016, from https://wordwatchers.wordpress.com/  
  6. Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York: Bloomsbury Press. 