Natural Language Processing Analysis of Hillary and Trump Speeches

A Big Data Digest production. See the end of the article for reprint requirements.

Original Author | Maixent Chenebaux

Selection & Proofreading | Aileen

Translation | Jiang Fanbo

Editor’s Note | On October 9, local time, Donald Trump and his opponent Hillary Clinton will hold the second public debate of the U.S. presidential election at Washington University in St. Louis. The debate will last 90 minutes.

Whether in speeches or debates, both candidates have their unique “speaking style”. Using semantic analysis and natural language processing to analyze their speaking styles is an interesting task. This article analyzes the speeches of the two presidential candidates through natural language processing, revealing their distinct characteristics in word choice and speech rhythm.

Data science applies to many fields, from image processing to artificial intelligence. Within it, semantic analysis is particularly useful for social media monitoring. This article, however, applies it to politics rather than to comments on Twitter or Facebook.

On July 21 this year, Donald Trump accepted the Republican presidential nomination on the last day of the Republican National Convention held in Cleveland, Ohio. A week later, on the 28th, Hillary Clinton accepted the Democratic presidential nomination in Philadelphia.

With the support of family and thousands of supporters, they delivered their respective nomination speeches. This article analyzes these speeches to better understand what lies behind this political communication. It focuses on three characteristics: vocabulary, style, and rhythm.

In-depth Vocabulary Analysis

One way to evaluate who used the richer vocabulary is to count how many unique words each speaker used. First, English words that carry little meaning on their own (such as “the”, “a”, “of”) are removed. These are called stop words; a specific list is available at http://www.ranks.nl/stopwords. Second, repeated words are counted only once. Singular and plural nouns, as well as verbs in different persons and tenses, are merged using the Snowball Stemmer algorithm: for example, “Leaders” and “Leader” count as one word, and “Am” and “Are” also count as one word.

Note:

To learn more about the Snowball Stemmer algorithm, see http://snowball.tartarus.org/texts/introduction.html
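The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: `STOP_WORDS` is a tiny hypothetical sample rather than the full list linked above, and `naive_stem` is a crude placeholder that only strips a plural “s”. A real run would swap in an actual Snowball implementation.

```python
import re

# Assumption: a tiny illustrative stop-word set, not the full list linked above.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "am"}


def tokenize(text):
    """Lowercase the text and return its word tokens (inner apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def naive_stem(word):
    """Hypothetical stand-in for a Snowball stemmer: strips a plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def vocabulary_stats(text):
    """Return (total word count, number of distinct stems outside the stop list)."""
    tokens = tokenize(text)
    stems = {naive_stem(t) for t in tokens if t not in STOP_WORDS}
    return len(tokens), len(stems)


total, unique = vocabulary_stats(
    "The leaders lead. A leader leads the leaders of the nation."
)
# Share of unique stems relative to all words, as a percentage.
print(total, unique, round(100 * unique / total, 1))
```

In this toy sentence, “leaders” and “leader” collapse into one stem, as do “leads” and “lead”. Whether the percentage should be taken over all words or only over non-stop words is not stated in the article; the sketch divides by all words, which matches the 965 out of 7460 figure quoted in the text.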

We found that about 13% of the words in Trump’s speech are unique (965 distinct root words out of 7460 words in total), so each word is repeated 7.7 times on average. Hillary’s speech has 17% unique words, with each word repeated about 6 times on average. The difference is obvious: 480 distinct words are enough to cover 80% of Trump’s speech, while Hillary’s requires 665, roughly 38% more. We are starting to get some results.
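The “80% of the speech” figure can be computed by sorting words from most to least frequent and counting how many are needed before their cumulative count reaches the target share. The sketch below assumes the input is already a list of normalized tokens; the function name and the toy data are illustrative, not taken from the original analysis.

```python
from collections import Counter


def words_needed_for_coverage(tokens, coverage=0.8):
    """Smallest number of distinct words whose counts reach `coverage` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # most_common() yields (word, count) pairs from most to least frequent.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= coverage * total:
            return rank
    return 0  # only reached for an empty token list


# Toy data: 10 tokens, so 80% coverage means at least 8 tokens.
toy_tokens = ["great"] * 6 + ["really"] * 3 + ["nice"]
print(words_needed_for_coverage(toy_tokens))
```

The article’s own numbers can be checked the same way: 7460 / 965 gives about 7.7 repetitions per word for Trump, and 665 / 480 gives about 1.385, the gap between the two candidates.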

Vocabulary Comprising 80% of Candidates’ Speeches

The efficiency of a speech partly depends on the speaker’s style. This article attempts to find the favorite words of the two candidates. Finding “Trump-style” or “Hillary-style” words means identifying the words that one candidate uses most often and the opponent uses least. For example, the word “really” appears 15 times in Trump’s speech and only once in Hillary’s. One way to measure this is to compute the “odds ratio” of each word. The formula is as follows:

$$\text{score}(w) = \log \frac{P(w \mid \text{Trump})}{P(w \mid \text{Hillary})}$$

The numerator is the probability of a word appearing in Trump’s vocabulary, while the denominator is the probability of the same word in Hillary’s text. Taking the logarithm makes the results easy to sort: when the two probabilities are equal, the logarithm is 0; otherwise it is either negative (Hillary-style) or positive (Trump-style). The results are as follows:
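As a rough sketch of this scoring, the function below computes the logarithm of the ratio of word probabilities for two token lists. The additive smoothing is an assumption added here so that words absent from one speech do not cause a division by zero; the article does not say how such words were handled, and the function and parameter names are illustrative.

```python
import math
from collections import Counter


def log_ratio_scores(tokens_a, tokens_b, smoothing=1.0):
    """Return {word: log(P(word | A) / P(word | B))} over the joint vocabulary.

    Assumption: add-`smoothing` (Laplace) smoothing keeps both probabilities
    strictly positive, so the ratio and its logarithm are always defined.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = len(tokens_a) + smoothing * len(vocab)
    total_b = len(tokens_b) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts_a[word] + smoothing) / total_a
        p_b = (counts_b[word] + smoothing) / total_b
        scores[word] = math.log(p_a / p_b)
    return scores


# Toy token lists; the counts are made up apart from "really" (15 vs 1).
speech_a = ["really"] * 15 + ["jobs"] * 5
speech_b = ["really"] * 1 + ["together"] * 14 + ["jobs"] * 5
scores = log_ratio_scores(speech_a, speech_b)
# Sort from most A-style (positive) to most B-style (negative).
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

Sorting by score puts words typical of the first speaker at the top, shared words near zero, and words typical of the second speaker at the bottom.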

Words Almost Exclusively Used by Donald Trump

Words Almost Exclusively Used by Hillary Clinton

The first thing we noticed is that Trump prefers short, common words and repeats them: really, nice, great, problem. We can also sense a certain bias from the Republican candidate: Mexico, China, Iran. Overall, Trump’s focus seems to lean toward international issues, and most of the foreign affairs he mentions serve to incite fear and find scapegoats.

On the Hillary side, the range of vocabulary is broader, and “Hillary-style” words tend to be rarer. Hillary Clinton mentions “America” far more often than Trump: 27 times versus 5. The “Hillary-style” word list indicates that her speech is more focused on domestic matters. Her typical words include: together, campaign, and hard. Donald Trump’s name also appears multiple times in her speech.

Careful readers will notice that the word “Trump” does not appear in the “Hillary-style” word list. This is because Trump mentions his own name multiple times in his speech (10 times), which brings the odds ratio down. In contrast, Hillary’s name is mentioned only twice: once in Hillary’s own speech (referring to her husband, Bill Clinton) and once by Trump. Moreover, the “Hillary-style” word “wants” appears when she criticizes her opponent (“He wants to divide us…”, “He wants us to fear the future, fear each other”), clearly indicating that Hillary talks about Trump, while Trump talks about… himself!

Everyone is Talking About Trump

We can also look at the words both sides are using. They represent the consensus between the two. Unsurprisingly, they are “jobs”, “country”, and “thinking”. They both said “thank you” many times, but in different ways: Hillary specifically thanked some people, while Trump mainly thanked the audience during applause.

Speech Rhythm

Because of their different backgrounds, the two candidates each have their own rhythm. To evaluate the intrinsic rhythm of the language, a good starting point is to break the speech down into sentences, and then break the sentences into words. We found that Trump’s speech is longer: 625 sentences and 7460 words. In contrast, Hillary used only 405 sentences and 6088 words. This means Trump used 54% more sentences than his opponent, and his speech is 23% longer.

Trump’s average sentence length is 12 words, while Hillary’s sentences are slightly longer, averaging 15 words. Most of Trump’s sentences are short: 21% of his speech consists of short sentences of 5-6 words. Hillary’s sentence length is more uniform, with 12 words being the most common.
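The sentence and word counts behind these figures can be approximated with a simple splitter. The sketch below is an assumption about how such a measurement could be done, not the author’s method: it splits on “.”, “!” or “?” followed by whitespace, which is cruder than a proper sentence tokenizer (abbreviations such as “U.S.” would be split incorrectly).

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Split on ., ! or ? followed by whitespace; return the word count of each sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]


def rhythm_summary(text):
    """Return (sentence count, word count, mean sentence length, most common length)."""
    lengths = sentence_lengths(text)
    if not lengths:
        return 0, 0, 0.0, 0
    mean = sum(lengths) / len(lengths)
    most_common_length = Counter(lengths).most_common(1)[0][0]
    return len(lengths), sum(lengths), mean, most_common_length


# Made-up sample text, used only to exercise the functions.
sample = "We will win. We will fight. Believe me. Nobody knows the system better than me."
print(rhythm_summary(sample))
```

Plugging in the article’s totals gives 7460 / 625 ≈ 11.9 words per sentence for Trump and 6088 / 405 ≈ 15.0 for Hillary, consistent with the averages quoted above.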

Obama’s Sentence Length is the Sum of Trump and Hillary’s

We see a clear distinction between Trump and Hillary: Trump’s speech is straightforward and concise, while Hillary’s is more varied and calm. But wait! She is not the outlier here: Obama, in his first nomination speech, averaged 25.7 words per sentence, almost the sum of Hillary’s and Trump’s averages. Obama also repeated words 24% less than Hillary and 42% less than Trump. I think this indicates that although Hillary’s rhythm is a bit slower and her sentence structure more complex, her speaking style remains very close to that of her opponent.

In Conclusion

Natural language processing is not an exact science. It can only provide us with some clues and elements for understanding speeches. The corpus is also short, and more analysis would be needed to extract more precise features. Still, what did we find from this analysis?

  1. Trump describes everything as “really”, “nice”, “great”, while Hillary talks about how to “work together for America”.

  2. Trump talks about himself, while Hillary talks about Trump. Although Hillary used a larger vocabulary and more complex sentence structures, she seems to have adopted Trump’s speaking style to some extent.

  3. Obama’s nomination speeches (both times) used a larger vocabulary and much more complex sentence structures, indicating that Trump has dramatically simplified such national-level speeches.

Regarding Reprinting

When reprinting, please prominently credit the author and source at the beginning of the article (reprinted from: Big Data Digest | bigdatadigest) and place the Big Data Digest QR code prominently at the end of the article. Articles not marked as original may be reprinted directly, provided they are edited according to these requirements; after reprinting, please send us the reprint link. For articles marked as original, please send [Article Name - Waiting for Authorized Public Account Name and ID] to us to apply for whitelist authorization. Unauthorized reprints and adaptations will be legally pursued. Contact email: [email protected].
