Last February, I posted about English doctoral candidate Amanda Licastro’s use of some basic text mining methods to extract frequent keywords from a set of novel prefaces. In her paper “The Rise of the Preface,” Amanda suggests that these keywords relate to the changing role of the novel and author in the 18th century. The first important thing to note about her data is that she uses it to support traditional literary critic Cynthia Nixon’s argument that prefaces in the 18th century defend against and reflect upon the “shifting place of the novel in literary culture” (Nixon). Second, and perhaps more interestingly, Amanda’s project enabled her to develop methods that could help her analyze literature from other periods of technological change, such as the period of early web writing.
In short, her seemingly simple project not only let her practice using text mining tools with interesting results, but also sketched out a path for asking much larger questions relevant to her research interests. Despite my inherited preference for the kind of reading done with one’s own two eyes, I found myself intrigued by the possibility her work presented but, out of habit I suppose, failed to apply it to any of my current work.
The opportunity to revisit this topic came about in last week’s reading on data visualization in the ITP core course. As we discussed in class, the two major challenges to using data analysis on text are 1) having the data in machine-readable form and 2) having the programming skills to come up with meaningful ways of interpreting that data. To those, I’ll add a third challenge, which has so far kept me from giving much thought to the first two: having the imagination to envision innovative and exciting ways in which data analysis can be used on text. For some of us, text mining sometimes sounds just like turning words into numbers, something from which literature, until now, had always promised to spare us.
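For anyone curious what “turning words into numbers” looks like at its very simplest, here is a minimal sketch in Python of counting frequent keywords across a handful of texts. To be clear, this is not Amanda’s actual method or corpus: the three sample sentences, the tiny stopword list, and the function names `tokenize` and `top_keywords` are all my own assumptions, made up purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins for preface texts (invented for this example,
# not drawn from any real corpus).
PREFACES = [
    "The author presents this history to the reader as a true account.",
    "The reader will find in this novel a history of virtue and of vice.",
    "This novel is offered to the reader with the author's apology.",
]

# A deliberately tiny stopword list; a real project would use a much longer one.
STOPWORDS = {
    "the", "a", "an", "of", "to", "in", "is", "and",
    "as", "this", "with", "will", "s",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(texts, n=5):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(word for word in tokenize(text) if word not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    for word, count in top_keywords(PREFACES, n=4):
        print(word, count)
```

Even this toy version makes the first two challenges visible: the texts have to exist as plain, machine-readable strings before anything can be counted, and choices as small as which words go on the stopword list already shape what counts as a “keyword.”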
However, while reading Ted Underwood’s very user-friendly introduction to text mining, I came across the Stanford Literary Lab’s series of “pamphlets” that show diverse and creative ways in which digital tools can be used to analyze a set of texts or the culture it engages with. I have yet to fully digest the pamphlets presented here, but so far I have been surprised to find that digital humanists are mining texts in much more complex and nuanced ways than I had initially imagined. See, for example, Pamphlet 3, “Becoming Yourself: The Afterlife of Reception” (Finn), which uses data analysis to trace the social life of the books of David Foster Wallace and the online consumer communities that developed around them. Truly impressive, and rather daunting to read, is Pamphlet 4, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method” (Heuser and Khac), whose findings, based on an intricate process of data analysis, seem almost impossible:
Specifically, we present findings on two interrelated transformations in novelistic language that reveal a systemic concretization in language and fundamental change in the social spaces of the novel. We show how these shifts have consequences for setting, characterization, and narration as well as implications for the responsiveness of the novel to the dramatic changes in British society.
Though the methods of the last project are experimental, highly complicated and subject to critique, their attempt to meaningfully trace changes across nearly 3,000 books written in the 19th century opened my eyes to the fact that data analysis may well be capable of answering truly complex questions about large bodies of literature. Of course, the ability to ask these questions requires either that literary critics be equipped with quantitative skills, or that they can convince those with the skills to collaborate on literary projects. Though many have reasonably argued that one should know how to code to practice digital humanities, it seems that the truly exciting projects (ones I don’t think we’ve even begun to fathom yet) will require experts from both sides of the tracks who are well-versed in the concepts, sensitivities and language of the other discipline. Someone point me to the computer science department.