Voyant (voyant-tools.org) is a web-based text analysis tool that allows you to look at a variety of documents and texts in ways that would be nearly impossible if you were merely reading through each one. And even where it would not be impossible for a given corpus of works or documents, Voyant certainly cuts down significantly on the time needed to accomplish such a project. One of Voyant’s features is that it generates a word “Cirrus,” which is a visual representation of the words in a given corpus, where the size of each word is relative to the frequency of its use.
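To make the idea of "size relative to frequency" concrete, here is a minimal Python sketch. It is not Voyant's own code: the function name `word_sizes`, the linear scaling, and the font-size range are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=72):
    """Map each word to a font size that grows linearly with its count.

    The linear scale and the 10-72 point range are illustrative
    assumptions, not Voyant's actual sizing rule.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }

sizes = word_sizes("London London London Underground Underground railway")
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.1f}")
```

The most frequent word gets the largest size, and every other word is scaled down in proportion to how often it appears.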
For my Pecha Kucha presentation in Ryan Cordell’s “Doing the Digital Humanities” class, I decided to analyze my own writing and see what I could discover using this word Cirrus. I took every written assignment for history that I have handed in, dating back to the second semester of my junior year of undergraduate studies. These twenty-nine documents ranged from my thirty-plus-page senior history thesis to a series of one-page reading responses for my Theory and Methodology I class last semester.
After removing certain Stop Words (the more commonly used words in the English language, such as “the” and “and”) as well as my first and last name, this is the Cirrus that Voyant generated: http://voyant-tools.org/tool/Cirrus/?corpus=1358275371068.8429&stopList=1358277348558rundefined.
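For readers curious what the Stop Word step amounts to, it can be sketched in a few lines of Python. This is only an illustration under assumptions: the stop list below is a tiny hypothetical one (Voyant's real list is much longer), and `firstname` and `lastname` are placeholders standing in for my own name.

```python
import re
from collections import Counter

# Hypothetical stop list: a handful of common English words plus
# placeholder entries for the author's first and last name.
STOP_WORDS = {"the", "and", "of", "a", "in", "to", "firstname", "lastname"}

def count_words(text, stop_words=STOP_WORDS):
    """Count every word in the text that is not on the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)

counts = count_words("The history of London and the London Underground")
print(counts.most_common())
```

Whatever survives the filter is what the Cirrus has to work with, which is why the choice of Stop Words shapes the picture so strongly.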
At first glance, one sees that the words “London” and “Underground” are my two most frequently used words. I used “London” 393 times and “Underground” 148 times. This makes sense, since my historical interests lie in the history of modern London, particularly urbanization and transportation. In my two largest documents, my senior thesis and my Theory and Methodology final proposal, London and the London Underground played a vital role in both analyses.
But just looking at the titles of the documents probably would have told a reader this. What can Voyant tell others and me about my writing that I would not be able to (efficiently) discover by reading each individual document? This Cirrus also allows you to see words that I habitually use that have nothing to do with the topic of my research. I use the words “instead,” “consequently,” “represents,” and “consider” ninety-seven, forty-nine, forty-two, and forty-two times, respectively. These are words that I include in just about every document that I hand in, but that are not among the typical English Stop Words.
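One way to check a hunch like this outside of Voyant is to count how many documents each word appears in, rather than how many times it appears overall. The sketch below is an assumption-laden illustration, not something Voyant produced: the function `habitual_words`, the `min_share` threshold, and the three sample sentences are all made up for the example.

```python
import re

def habitual_words(documents, min_share=0.9, stop_words=frozenset()):
    """Return words that occur in at least `min_share` of the documents."""
    doc_count = {}
    for text in documents:
        # A set keeps each word to one vote per document.
        unique_words = set(re.findall(r"[a-z']+", text.lower())) - stop_words
        for word in unique_words:
            doc_count[word] = doc_count.get(word, 0) + 1
    threshold = min_share * len(documents)
    return sorted(word for word, n in doc_count.items() if n >= threshold)

# Invented sample sentences standing in for real papers.
docs = [
    "Instead, the line represents progress.",
    "Consider the map instead.",
    "Instead of trams, consider the Underground.",
]
everywhere = habitual_words(docs, min_share=1.0, stop_words=frozenset({"the"}))
print(everywhere)
```

A word that shows up in nearly every paper regardless of topic is a good candidate for a writing habit rather than a research subject.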
The last realization about my writing that I drew from this Cirrus has to do with the historical methodologies I use and discuss often in my papers. The words “political,” “class,” “gender,” and variations of the word “culture” appear most often in my works. Given how frequently these terms appear, I can postulate that these are the main historical methodologies I either directly utilize or at least discuss in detail in my writing.
Now, these examples are certainly not the only information one can extract from this Cirrus, but they show how much information one can quickly access from a relatively simple digital text visualization tool. The Cirrus is also only one of the many useful tools Voyant offers someone interested in text analysis. It is a fun and easy tool to use, and I would certainly recommend you try it.