Text Analysis

The Federal Writers’ Project contracted more than 300 writers to document the life histories of individuals throughout the Great Depression. The Library of Congress has digitized an astounding number of records from these interviews, almost 3,000 documents in total. Thankfully for the purposes of meta-analysis, a random sample of these documents was converted into machine-readable .xml files, which can be analyzed with tools such as Voyant Corpus.

A number of analysis tools are available on Voyant. One such tool is called the “TermsRadio”; the description given by the website states, “This tool can be used to examine word occurrence over a corpus spanning a period of time.”

By default, only the most frequently used words in the documents are drawn with a colored line, but hovering over other terms makes the lines for their frequencies appear temporarily as well. The documents are laid out along the x-axis of the chart, and the number shown at once can be increased with the sliding control labeled “Visible.”

This is a fascinating way to get a big-picture overview of how topics fluctuate across the roughly 225 documents analyzed. The word “work,” in particular, varies slightly from document to document but stays among the most frequently used terms throughout, which shows how important employment was as a topic of conversation for these individuals.
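To make the idea behind that chart concrete, here is a minimal sketch of how a term's frequency could be traced from one document to the next. It assumes the chart plots something like relative frequency (occurrences divided by total words) per document, which is my guess rather than a documented detail of Voyant. The `documents` list is made-up text standing in for the interviews, and `relative_frequency` is a name I chose for illustration.

```python
import re
from collections import Counter


def relative_frequency(text, term):
    """Return the share of word tokens in `text` equal to `term` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


# Hypothetical stand-ins for the interview documents (not real corpus text).
documents = [
    "I went to work at the mill when work was easy to find.",
    "There was no work that winter, so we lived on the farm.",
    "My father farmed, and my mother kept the house.",
]

# One value per document, in corpus order: the series a line chart would draw.
trend = [relative_frequency(doc, "work") for doc in documents]

for index, value in enumerate(trend, start=1):
    print(f"document {index}: {value:.3f}")
```

Dividing by document length matters here because the interviews vary in size, so raw counts alone would make long documents look more preoccupied with a word than they are.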

However, knowing that this project was largely overseen by the Work Projects Administration makes me question whether “work” was being used in the context I’m assuming. Voyant has helpful tools for that too, called “Context” and “Correlations.” Context provides the surrounding phrases for the words of interest:
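For readers who want to see what such a view involves, the following is a rough keyword-in-context sketch. It is not Voyant's implementation; the function name `keyword_in_context`, the `window` parameter, and the sample sentence are all my own inventions for illustration.

```python
import re


def keyword_in_context(text, keyword, window=3):
    """Return (left, match, right) tuples for each occurrence of `keyword`.

    `window` is the number of words kept on each side of the match.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Made-up sample sentence, not taken from the interviews.
sample = "He found work on a road project. The work paid little."

for left, word, right in keyword_in_context(sample, "work"):
    print(f"{left:>20} | {word} | {right}")
```

This simple tokenizer drops punctuation, so a context window can run across a sentence boundary, which is worth keeping in mind when reading the surrounding phrases.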

The Context view is helpful on a more detailed level, allowing one to scroll through every single usage of the word of interest (in this case “work”) in order to begin to understand some of the ways it is being used. To answer my question of whether “work” was being listed as frequently used because of its association with the Work Projects Administration, it would be more appropriate to use Correlations:

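The correlation between “work” and “project” is approximately 0.62, a fairly strong positive association. The two words tend to rise and fall together across the documents, which supports my suspicion that the title of the administration was skewing the word count, although a correlation on its own cannot confirm it.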
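To show what a figure like that measures, here is a small sketch of a Pearson correlation between two terms' per-document frequencies. I am assuming that Voyant reports a Pearson coefficient computed over frequencies like these. The numbers below are invented purely for illustration and are not the values from the corpus.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical relative frequencies of each word in five documents.
work_freq = [0.010, 0.014, 0.008, 0.012, 0.009]
project_freq = [0.002, 0.004, 0.001, 0.002, 0.003]

r = pearson(work_freq, project_freq)
print(f"correlation between 'work' and 'project': {r:.2f}")
```

A value near 1 means the two words rise and fall together across documents, a value near -1 means one rises as the other falls, and a value near 0 means no linear relationship.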

Since it would be prohibitively time-consuming to read all of these documents, Voyant Corpus is an enormously powerful tool. I can imagine using it extremely effectively in my field as a biologist to analyze how researchers are writing about different topics.
