The Boston Marathon is a long-standing tradition and one of the city's signature events. In addition to the thousands of runners and spectators who travel to Boston to participate in and watch the race, Marathon Monday has developed into a city-wide celebration for many different groups of people. Local businesses, medical professionals, police officers, firefighters, volunteer staff, local families and friends, and Boston-area college students all approach and experience the Boston Marathon in different ways. However, one thing is clear: Marathon Monday is not only a celebration of athleticism (the race) but also a celebration of Boston itself. As a newcomer to Boston in Fall 2012, I did not quite understand the hype around the race, but I was curious to find out. It turned out that everyone who had lived in Boston for at least a year seemed to have a “Marathon Monday story.”
And if they did not, the events of April 15-19, 2013 certainly changed that. When two explosions went off near the Boston Marathon finish line, Boston transformed from a city in celebration to a city in panic. The bombings, the investigation, the news coverage, the events in Watertown, and the lockdown have had a profound impact on the collective memory of Marathon Monday and the city of Boston. But this post is not about the particulars of the bombings and lockdown. It does not attempt to explain what happened, who did it, what the timeline was, or even why it happened. Instead, this post (and those that follow) attempts to analyze the stories of individuals affected by the 2013 marathon bombings and their aftermath.
In the week following the marathon bombings, Professors Ryan Cordell and Elizabeth Maddock Dillon (Northeastern University, English Department) met to discuss a potential project. They had realized that all their students wanted to do was talk about and share their experiences during the bombings and the lockdown. Everybody had a story. Everyone knew where they were, what they saw, what they heard, how they found out what happened, and what they did after finding out. Cordell and Dillon wondered if it would be possible to capture these stories, however insignificant their tellers might think them, and preserve them for the historical record. The idea of creating an open digital archive and community memorial devoted to capturing these stories and other artifacts emerged from this meeting, and the project quickly developed. Five English and History graduate students were hired as research assistants for the summer (Alicia Peaker, Jim McGrath, Liz Hopwood, Kristi Girdharry, and myself) and, working with Cordell and Dillon, we began working on Our Marathon: The Boston Bombing Digital Archive. Our mission statement was pretty clear:
“Our Marathon is a crowd-sourced, digital archive of pictures, videos, stories, and social media related to the Boston Marathon bombing. We believe that sharing stories from survivors, families, witnesses, visitors to the city, and everyone around the world touched by the event will speed the healing process. This is the place to share those images, emotions, and experiences to help us understand the bombing and its aftermath.”
And no story is too small. I had the privilege of working on the project as one of those research assistants over the summer and continuing my work on the archive during Fall 2013 as part of my research assignment at Northeastern University’s NULab for Texts, Maps, and Networks. Since I began working on the archive in May, our collection has grown to include over 3,500 items, including over 300 stories. This series of posts is an attempt to explore and analyze these 300+ stories using text analysis and distant reading. Although our archive is still growing, and we are getting more submissions every week, I want to use these stories as a means of extracting some preliminary findings on how people describe and write about their experiences during and after the Boston Marathon bombings. How do people react to a trauma? How do they describe it? How do they recall and write about it after the fact? A text analysis of 300 or so stories will not answer these questions definitively, but I believe it gives us a great place to start. Before jumping straight into hypotheses or conclusions, though, I want to take some time to describe the corpus of stories I am working with and the methods and tools I will use to analyze these texts.
These posts consider 346 stories taken from the Our Marathon archive. Together, these stories comprise 78,604 words (5,982 of them unique) and provide an excellent starting point for a closer look at these texts.
This corpus divides into two main types of contributions. 289 stories were submitted to the archive via GlobeLab (a digital division of the Boston Globe) immediately following the bombings. GlobeLab posted an interactive map on Boston.com where users could click on where they were during the bombings and share their story anonymously. Our Marathon then partnered with GlobeLab and added these stories to the archive on June 25, 2013. Almost all of them were submitted in the week immediately following the bombings, and they range in length from a single sentence (in a couple of cases) to much more robust accounts of individuals’ experiences. The remaining 57 stories were submitted publicly through our Contribution Plugin. These Public Submissions are far fewer in number than the 289 Globe Stories, but they tend to be much longer and more detailed.
Of the 78,604 words in the entire corpus, the 289 Globe Stories account for 45,579 words and the 57 Public Submissions comprise the remaining 33,025. This means the Globe Stories average about 158 words per story, whereas the Public Submissions average around 580. If you look at the above image (generated by uploading text files of all Globe Stories and all Public Submissions separately into Voyant), you will see that the vocabulary density of the Public Submissions (125.5) is much higher than that of the Globe Stories (85.0). Paired with the word counts of each sub-corpus, this shows that the Public Submissions, on average, are much longer and use a more diverse vocabulary than the Globe Stories. These two types of submissions not only provide more varied kinds and lengths of stories but also allow us to analyze the two sub-corpora both collectively and comparatively. Throughout these posts I hope to zoom in and out, from looking at all the stories in the corpus to comparing differences and similarities between the Globe Stories and Public Submissions.
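The per-story averages follow directly from the counts above; a quick sketch of the arithmetic:

```python
# Per-story averages computed from the corpus counts quoted above.
globe_words, globe_count = 45_579, 289
public_words, public_count = 33_025, 57

globe_avg = globe_words / globe_count      # words per Globe Story
public_avg = public_words / public_count   # words per Public Submission

print(round(globe_avg), round(public_avg))  # 158 579
```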
One more note about the corpus before moving on to the tools and tactics I use in these posts. Because this is a digital archive, many of our stories, particularly the Public Submissions, carry much richer metadata: contextual information about where and when the stories took place and were written, titles, and information about the contributor. This series of posts is concerned only with the text of the stories and will not consider the other valuable information that accompanies each one. Eventually I hope to use this metadata to draw other interesting conclusions, but for now I am focusing on the stories alone.
TOOLS AND METHODS
Omeka => Text Files: The Our Marathon site is an Omeka installation, which has a built-in function for exporting individual items as XML files. These XML files embed all the necessary metadata fields in appropriate XML tags; the actual text of each story sits within the text tag. After downloading all the stories as XML, I pulled the text of each story into two different text files, GlobeStories.txt and PublicSubmissions.txt, making sure that the whole text of each individual story occupied a single line. The GlobeStories.txt file thus has 289 lines, each containing the full text of one story; PublicSubmissions.txt is formatted exactly the same way, with 57 lines. I then combined both files into a third, allstories.txt (again with each line being a unique story). Much of what follows uses these three text files as the basis for my analysis. However, I also wanted individual story files, so on the command line I made a copy of allstories.txt and used the split command to create a separate text file for each story:
~ DAVEDECAMP$ cp allstories.txt allstories-copy.txt
~ DAVEDECAMP$ split -l 1 allstories-copy.txt
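The XML-to-text step could be scripted as well. Here is a minimal Python sketch of that extraction, under the simplifying assumption that each exported file wraps the story in a plain text element; the real Omeka export nests its metadata in namespaced tags, so the lookup path would need adjusting:

```python
import glob
import xml.etree.ElementTree as ET

def collect_stories(pattern, outfile):
    """Pull the story text out of each XML file matching `pattern`
    and write one story per line to `outfile` (hypothetical sketch;
    assumes a simple <text> element in each export)."""
    with open(outfile, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            root = ET.parse(path).getroot()
            node = root.find(".//text")   # first <text> tag anywhere
            story = (node.text or "") if node is not None else ""
            # Collapse internal whitespace so each story occupies one line.
            out.write(" ".join(story.split()) + "\n")

# Hypothetical folder names, one per sub-corpus:
collect_stories("globe/*.xml", "GlobeStories.txt")
collect_stories("public/*.xml", "PublicSubmissions.txt")
```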
Paper Machines: Paper Machines is a Zotero plugin that lets you easily create visualizations from your Zotero library; for more information, check out the Paper Machines website. It offers a number of visualization options, but I mostly used the Word Cloud and Phrase Net tools. Word clouds arrange the words of a text into a block in which each word’s size is directly proportional to the number of times it appears in the corpus. Phrase Nets, on the other hand, create webs of word pairings (joined by various connectors such as “and,” “or,” and “at”) that show both the magnitude of a connection (the number of times two words are paired) and its directionality (the order in which the words appear). I will discuss these aspects of Paper Machines in more detail in “Where are the Bombers?: What Can Word Clouds Tell Us?” and “Fireworks or Cannons: Phrase Nets of the Marathon Stories.”
Voyant-Tools.org: Voyant (voyant-tools.org) is a web-based text analysis tool that lets you examine a variety of documents and texts in ways that would be nearly impossible by merely reading through each one. Voyant is part of Hermeneuti.ca, a collaborative project that develops and theorizes text analysis tools and the rhetoric of text analysis. After you upload documents, Voyant generates a set of interactive panels that aid in text analysis, including a Word Cloud, a corpus summary, a corpus reader, and several sections devoted to individual words (or groups of words) in your corpus (Words in Entire Corpus, Word Trends, Keywords in Context, and Words in Documents).
I mostly use these latter sections to zoom in and investigate questions or theories drawn from the visualizations I created with Paper Machines. I also use regular expressions and calculate word frequencies in some of the following posts, particularly in “#BostonStrong”.
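As an illustration, a regex-based word-frequency count of the kind used in later posts takes only a few lines of Python. This is a sketch of the general technique, not the exact pipeline behind my numbers (tokenization choices, such as handling apostrophes or hashtags, will shift the counts slightly):

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text, tokenize on letters and apostrophes,
    and count how often each word occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# For the combined corpus, one might run:
#   freqs = word_frequencies(open("allstories.txt", encoding="utf-8").read())
#   freqs.most_common(20)   # the twenty most frequent words

sample = word_frequencies("Boston Strong. Boston strong!")
print(sample.most_common(2))
```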
TABLE OF CONTENTS