Paper Machines includes a function for creating useful visualizations of a given corpus. Its Phrase Net function is derived from IBM’s Many Eyes visualization tool and allows us to create a web of interconnected words in that corpus.
Building a phrase net from a collection of stories allows us to easily visualize not only word frequencies but also word connections. In a phrase net, the size of a word is relative to its frequency: the bigger the word, the more often it is used. The phrase net itself does not give us the actual counts behind these sizes, but by pairing its word connections with specific word counts (through a tool such as Voyant) we can get a deeper understanding of these connections.
What makes these phrase nets so useful is not the exact sizes of the words or whether we can derive exact word frequencies from them. Rather, phrase nets allow us to examine both the magnitude and directionality of word connections by linking word pairings with arrows of varying magnitudes. From there we can easily identify common word pairings across a given corpus.
Moreover, Paper Machines allows us to vary the word or characters that link terms. It functions on a simple syntax, “x [z] y,” where x is the first word, y is the second word, and z is either a space or a shorter connecting word such as “and,” “or,” “is,” and “at.” These variations give us much more flexibility than just looking at traditional word pairings separated by a space.
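To make the “x [z] y” pattern concrete, here is a minimal Python sketch of how such pairings could be counted. It is not Paper Machines’ own code: the function name phrase_net_edges, the simple tokenizer, and the sample sentences are all assumptions made for illustration. Unlike a real tool, it does no stop-word filtering and ignores sentence boundaries.

```python
import re
from collections import Counter

def phrase_net_edges(text, connector=None):
    """Count directed (x, y) pairs matching the pattern "x [z] y".

    Hypothetical helper for illustration only.
    connector=None means z is a plain space (adjacent words);
    otherwise z is a linking word such as "or" or "and".
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    if connector is None:
        # "x [space] y": every pair of adjacent words.
        for x, y in zip(words, words[1:]):
            edges[(x, y)] += 1
    else:
        # "x [z] y": word triples whose middle word is the connector.
        for x, z, y in zip(words, words[1:], words[2:]):
            if z == connector:
                edges[(x, y)] += 1
    return edges

# Made-up sample text, not taken from the Marathon corpus.
sample = (
    "It sounded like fireworks or cannons. "
    "We thought it was a celebratory cannon. "
    "Fireworks or cannons, someone said."
)
or_edges = phrase_net_edges(sample, connector="or")
space_edges = phrase_net_edges(sample)
print(or_edges.most_common(3))
print(space_edges[("celebratory", "cannon")])
```

Each key is an ordered pair, so (x, y) and (y, x) are counted separately; that ordering corresponds to the direction of an arrow in the phrase net, and the count corresponds to its magnitude.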
For this post I want to look at some interesting connections that arise from the phrase nets for “x [space] y” and “x [or] y” across the entire corpus of Marathon stories, and then zoom back in to compare the Globe Stories to the Public Submissions. First, let’s take a look at the phrase net for “x [space] y” for all the stories:
At first glance, many of the word pairings may seem pretty obvious. One assumes that in a corpus about the Boston Marathon bombings, words such as “Boston” and “Marathon,” “Boylston” and “Street,” “mass” and “ave,” and “Finish” and “line” would be paired together. However, if you look into the center of the phrase net, you might notice a triad connection of “people,” “started,” and “running” that would not have been as easy to recognize without the phrase net. This might not be too surprising either, considering the suddenness of the attack, but I still think it is valuable to notice. I am not going to explore this connection in much detail, but it showed me that these phrase nets can open your eyes to connections you might not initially think of, and that made clear their utility as an investigative tool. This was the first connection that I made, and it made me excited to dig deeper into some of the other phrase nets and connections.
Another, much smaller, connection in the above phrase net is the link between “celebratory” and “cannon.” “Celebratory” is used only twelve times in the entire corpus, but it is strongly connected to the word “cannon” (9 of 12 times) and twice connected to the similar term “firework.”
It is a small connection, but still significant, particularly when we consider the orientation of the phrase net “x [or] y”:
If you look to the center of the phrase net you will see that “firework” is also connected to “celebratory” by an “or.” It is also connected with a misspelling of the word “cannon” (as “canon”).* You can also notice a strong connection in the phrase “fireworks or cannons.” Paired with the previous connection between “celebratory” and “cannon,” the “x [or] y” connections reveal how these storytellers tried to make sense of random but horrifying explosions at a celebratory and exciting event such as the Boston Marathon. In developmental psychology, schema theory (developed through Gestalt theory and Piaget’s constructivism) describes assimilation as the process of using an existing schema to deal with a new object or situation.** While Piaget’s work primarily deals with cognitive development, one can use this model of assimilation as a possible explanation for why these words were frequently paired with each other. Phrasings such as “celebratory cannon” and “fireworks or cannons” reflect people’s attempts to assimilate the suddenness of the Marathon bombings into their typical expectations of Marathon Monday, or of any large-scale sporting event. Again, these connections are certainly not in every story, or even in a majority of the stories. However, this process of trying to identify an unknown and previously unfamiliar event is significant enough to note. These stories help you work through the mindset and thought processes of many affected by the blasts.
TABLE OF CONTENTS
- Share Your Story: Storytelling and the Boston Marathon Bombings
- Where are the Bombers?: What Can Word Clouds Tell Us?
- “Fireworks or Cannons”: Phrase Nets of the Marathon Stories (this post)
- Conclusions and Future Research
*An interesting side note: The word “canon,” which the keywords in context showed to be a misspelling of “cannon,” was used twelve times in the entire corpus. Incidentally, all the misspellings were in the Globe Stories corpus. This might be due to the difference between contributing an item to a public archive and posting an anonymous story to the Boston Globe site. The misspelling is not significant and I do not think I can derive any useful conclusions from it, but it caught my eye when I was looking into this phrase net.
**For a much more detailed look at Piaget’s work and assimilation consult: Beilin, Harry. “Piaget’s enduring contribution to developmental psychology.” Developmental Psychology 28, no. 2 (March 1992): 191-204.