One of R’s lesser-known strengths is its array of natural language processing (NLP) tools. These tools range from simple tasks, such as counting word frequencies in Twitter feeds, to complex analyses, such as topic modeling. Many ecologists like to use wordles as an evocative way of visualizing the main themes of their research. Until recently, most wordles were produced using web sites. With the growth of NLP tools in R, wordles can now be made easily and almost beautifully. In this tutorial, I demonstrate how to create a wordle in R that depicts the most common terms I have used in the titles and abstracts of my publications.
Step 1 Create a .csv file with two columns, one for article titles and the other for abstracts. Each row should represent a separate article. Such a file can be created rapidly using a bibliography manager like Mendeley, Zotero, or Endnote and then exported as a .txt file.
Step 2 Import data into R
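The import might look like the following minimal sketch. The column names (title, abstract) and the object name pubs are assumptions, not the author's own, and a small temporary file stands in for the real bibliography export so the snippet runs on its own.

```r
# Hypothetical stand-in for the exported file: two columns, one row per article
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "title,abstract",
  "\"Forest recovery after fire\",\"We measured biodiversity in burned plots.\"",
  "\"Reforestation and birds\",\"Bird diversity increased with reforestation.\""
), csv_path)

# Read the text columns as character strings rather than factors
pubs <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick check of what was imported
str(pubs)
```

With a real export, replace csv_path with the path to your own file.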
Step 3 Install (install.packages()) and load packages for NLP and wordle-ing
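A sketch of this step follows. The text names wordcloud and colorRamps; using tm for the text cleaning is my assumption, since the NLP package is not named. The CRAN mirror URL is also just one possible choice.

```r
# Packages assumed for this tutorial:
#   tm         - text cleaning and term counts (assumption, not named in the text)
#   wordcloud  - drawing the wordle
#   colorRamps - color palettes
pkgs <- c("tm", "wordcloud", "colorRamps")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Load all of them
invisible(lapply(pkgs, library, character.only = TRUE))
```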
Step 4 Now, we need to combine text from the titles and abstracts and do some extra cleaning to make the wordle pretty. Note that I included extra stop words (words that will be excluded) to eliminate words like ‘the’, ‘and’, ‘results’, and ‘data’ from the final wordle. Also, I chose not to perform ‘stemming’, which shortens words to their roots. Stemming is a good idea for NLP analysis, but for this wordle I prefer to show words in their entirety (e.g., ‘biodiversity’ and not ‘divers’).
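One way to do this cleaning is sketched below with the tm package, which is my assumption about the tooling. The toy data frame, its column names, and the extra stop words are illustrative only.

```r
library(tm)

# Toy stand-in for the imported data; column names are assumptions
pubs <- data.frame(
  title = c("Forest recovery after fire", "Reforestation and birds"),
  abstract = c("Our results show biodiversity recovered in 12 plots.",
               "The data suggest reforestation increased bird biodiversity."),
  stringsAsFactors = FALSE
)

# Combine title and abstract into one string per article
txt <- paste(pubs$title, pubs$abstract)

# Build a corpus and clean it step by step
corpus <- VCorpus(VectorSource(txt))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)

# Standard English stop words plus extra, hypothetical ones
extra_stops <- c("results", "data")
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), extra_stops))
corpus <- tm_map(corpus, stripWhitespace)

# stemDocument() is deliberately not applied, so words keep their full form
first_doc <- as.character(corpus[[1]])
print(first_doc)
```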
Step 5 Create a data frame consisting of the most frequent words and their frequencies. To minimize clutter, I restricted the final data set to terms that appeared at least ten times across all my publications.
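A sketch of building the frequency table, again assuming tm. The toy text and the object names (word_freq, min_count) are hypothetical, and the threshold is lowered to 2 so that the toy data leaves something behind; the tutorial itself uses ten.

```r
library(tm)

# Toy cleaned text standing in for the cleaned titles and abstracts
txt <- c("forest biodiversity reforestation",
         "bird biodiversity reforestation forest",
         "biodiversity fire")
corpus <- VCorpus(VectorSource(txt))

# Term-document matrix: rows are terms, columns are articles
tdm <- TermDocumentMatrix(corpus)

# Total count of each term across all articles, most frequent first
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
word_freq <- data.frame(word = names(freq),
                        freq = as.integer(freq),
                        stringsAsFactors = FALSE)

# Keep only frequent terms (10 in the tutorial, 2 for this toy example)
min_count <- 2
word_freq <- word_freq[word_freq$freq >= min_count, ]
print(word_freq)
```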
Extra step You can also adjust how certain words will appear in your wordle. In my case, ‘reforestation’ is too long to fit, so I shortened it to ‘reforest’.
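Renaming a term only requires replacing its label in the frequency table. The sketch below uses base R; the data frame and its counts are made up for illustration.

```r
# Hypothetical frequency table with made-up counts
word_freq <- data.frame(word = c("biodiversity", "reforestation", "forest"),
                        freq = c(40, 25, 12),
                        stringsAsFactors = FALSE)

# Replace the long label with a shorter one; the count is left untouched
word_freq$word[word_freq$word == "reforestation"] <- "reforest"
print(word_freq)
```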
Step 6 The default color option for text in wordcloud is black, which is a little bland for my tastes. Wordcloud scales the size of each term by its frequency, but you can make the differences stand out more by assigning different colors to words with different frequencies. For this, I used ‘colorRamps’ because it allows you to create a large number of colors.
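One possible way to tie color to frequency is to generate one color per distinct frequency value. The sketch below uses matlab.like() from colorRamps; which palette function the original analysis used is not stated, so that choice, the toy counts, and the object names are assumptions.

```r
library(colorRamps)

# Hypothetical frequency table with made-up counts
word_freq <- data.frame(word = c("biodiversity", "reforest", "forest", "birds"),
                        freq = c(40, 25, 25, 12),
                        stringsAsFactors = FALSE)

# One color per distinct frequency, ordered from least to most frequent
freq_levels <- sort(unique(word_freq$freq))
pal <- matlab.like(length(freq_levels))

# Look up each word's color from its frequency
word_freq$color <- pal[match(word_freq$freq, freq_levels)]
print(word_freq)
```

Words that share a frequency get the same color, so color reinforces the size scaling rather than competing with it.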
Last step! Now, we create the wordle!
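A minimal sketch of the final call is below. The data, colors, and parameter values (scale, rot.per, the seed) are illustrative assumptions rather than the settings used for my wordle, and the plot is written to a temporary PDF so the snippet runs without a display.

```r
library(wordcloud)

# Hypothetical frequency table with made-up counts and one color per word
word_freq <- data.frame(word  = c("biodiversity", "reforest", "forest", "birds"),
                        freq  = c(40, 25, 25, 12),
                        color = c("red", "orange", "orange", "blue"),
                        stringsAsFactors = FALSE)

# Fix the random layout so the figure is reproducible
set.seed(42)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
wordcloud(words = word_freq$word,
          freq = word_freq$freq,
          scale = c(4, 0.8),       # largest and smallest text size
          min.freq = 1,            # the table is already filtered
          random.order = FALSE,    # most frequent words in the center
          rot.per = 0.2,           # share of words drawn vertically
          colors = word_freq$color,
          ordered.colors = TRUE)   # colors[i] belongs to words[i]
dev.off()
```

With ordered.colors = TRUE the color vector must have exactly one entry per word; drop that argument to let wordcloud assign a shorter palette by frequency instead.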