LingLass reviewed UNCHARTED by Erez Aiden
Review of 'UNCHARTED' on 'Goodreads'
3 stars
Uncharted can be thought of as a case study for a piece of software that demonstrates two emerging intellectual trends: big data and digital humanities. These are explored in the book through the creation of the Ngram Viewer interface for examining the scanned Google Books collection. Digital humanities is an interdisciplinary trend that brings computerized tracking and digital curating tools to fields such as History, Literature, Philosophy, Geography, and Language studies. When the data being examined is itself language, digital humanities overlaps quite nicely with a methodology that has been in place for the past five decades, corpus linguistics. But while corpus linguistics relies on different pieces of specialized concordancing software to gather, count, and track word combinations, Google Ngram Viewer, launched in December 2010, is a very accessible way to bring some of these tools to the fingertips of the general public. In this book, Ngram Viewer is deployed as a way to answer quick questions about cultural history.
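For readers unfamiliar with what "gather, count, and track word combinations" means in practice, here is a minimal sketch of the kind of counting that sits behind an ngram chart. It is not Google's implementation: the function names, the whitespace tokenization, and the toy corpus are all assumptions made for illustration. It shows only the basic idea of dividing a phrase's count in a given year by the total number of same-length ngrams from that year.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(corpus, phrase):
    """Map each year to (count of phrase) / (count of all ngrams of the same length).

    `corpus` is a hypothetical dict of year -> list of text strings.
    Tokenization is naive lowercased whitespace splitting (an assumption).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Toy corpus invented for illustration; not real Google Books data.
toy_corpus = {
    1900: ["the theory of evolution is debated", "the weather is fine"],
    1950: ["evolution is taught and evolution is studied"],
}

if __name__ == "__main__":
    for year, freq in relative_frequency_by_year(toy_corpus, "evolution").items():
        print(year, round(freq, 3))
```

Normalizing by the yearly total matters because far more books are published in later years; raw counts alone would make almost every word appear to rise over time.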
The larger field of DH is introduced in Chapter 7 (Utopia, Dystopia, and Dat(a)topia), which looks at the range of historical records that could be digitized, and also some of the pitfalls of ever-wider access to such records. The authors note, for example, the spotty coverage of newspaper digitization (“Most of Poe’s newspaper articles have not been digitized, and no one knows when they will be”, p. 172), and the even spottier digitization of the many unpublished formats of writing: manuscripts, letters, wills, etc. It’s worth noting that the problem is not only one of getting data into a digital form. Even some of the born-digital materials that humans now create will have a limited appearance in the historical record, since blog posts, email, web page ads, and caches of digitized recordings and transcripts are only as accessible as the servers that host them.
In focusing on occurrences found in Google Books, the book provides an entry into diachronic changes in word use. The results the authors show are exciting, but a cautionary note should be sounded: looking at an ngram chart is not enough to have the whole story. What words are used is now clearly knowable, but capturing why they are used and identifying the right contexts in which to interpret them are still the necessary next steps of scholarship. Yet the authors sometimes present these as finished tasks. On seeing the first graph of ngram data for the word “evolution”, they note: “drawing from an ocean of data, the curve had distilled a simple powerful story that anyone could understand” (p. 159).
They do, however, acknowledge that as a data source, book publishing is too slow to trace certain faster-moving ideas and information (p. 148); that is, many ideas are more typically discussed in media other than books, e.g. texting, email, TV news, face-to-face conversation. But this is often overlooked in the book, as in the claim on p. 157 that it’s now possible “to quantify the spirit of the people, the Volksgeist, by empirically measuring aspects of collective consciousness and collective memory.” This enthusiasm leads the authors to coin a name for their approach, “culturomics”: where “the omics denotes big data” and the cultur- evokes the anthropological studies of Franz Boas in being “empirically knowable” (pp. 158-9).
Such big-picture excitement is indicative of their repeated, but unexamined, premise that the number of written occurrences of a word can be equated with the frequency of the thoughts or experiences it represents: “By seeing how often people talk [in print] about a year, we can get a sense of how present the events of that year are in their minds” (p. 144); “Ngrams tell us about the past. Alas, they do not predict the future. Yet.” (p. 157). However, on p. 189, they return to the topic of predictions, with the claim that “Ngrams that are going up [in a 20-year period] tend to keep going up. Ngrams that are going down tend to keep going down,” leading the authors to hint at the possibility of “a predictive science of history.”
Some reader-friendly history of science is presented at several points throughout the book, including an amusing discussion of Ebbinghaus’s original experiments on long- and short-term memory, which make up some of the groundwork of the field of psychology (pp. 138-141), and a useful introduction to Zipf’s law, explaining normal and non-normal distributions (pp. 28-33).
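As a quick illustration of what Zipf’s law claims (this sketch is mine, not taken from the book), a word’s frequency is roughly inversely proportional to its rank, so multiplying rank by count should give a roughly constant number. The function name and the toy word counts below are hypothetical, and the counts are contrived so that the pattern comes out exactly; real text only approximates it.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Invented counts chosen to follow count = 12 / rank exactly.
toy_tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3

if __name__ == "__main__":
    for rank, word, count in rank_frequency(toy_tokens):
        # Under an idealized Zipf distribution, rank * count stays constant.
        print(rank, word, count, rank * count)
```

Unlike a normal distribution, where most values cluster around an average, this kind of distribution has a handful of extremely common words and a very long tail of rare ones.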
Several of the cultural incidents chosen as illustrations, however, verge on the melodramatic: the Hollywood Ten, for whom “the impact on their lives and careers was immediate and devastating” (p. 124); “this heartbreaking chart” showing mentions of Tiananmen Square (p. 127); the despondent painter Charlotte Salomon, who died in Auschwitz (p. 131); the 9-11 destruction of the World Trade Center (p. 142); the digital hounding that ended in the 2013 suicide of Rehtaeh Parsons (p. 181). At the same time, it’s through the discussion of stories of such wide-ranging historical breadth that the authors first mention a very intriguing way to use the diachronic tracking of Ngram Viewer to automate finding gaps in the historical record that could indicate suppressed information.
The final chapter presents a much-needed call for the funding of humanities data collection to equal the level at which science projects are funded, suggesting that we need to “consider the potential impact of a multi-billion-dollar project aimed at recording, preserving, and sharing the most important and fragile tranches of our history to make them widely available for ourselves and our children” (p. 174). Ngram Viewer is put forward as an enticing way of showing what could be found by exploring such data collections. The fun of tracking ngrams is aptly described as “a new and extremely nerdy form of heroin” (p. 162). The book ends with 48 graphs that illustrate this addictiveness, with charts fittingly presented as xkcd-style drawings.
More about these authors:
• Jean-Baptiste Michel’s 2012 TED talk on this topic (called The Mathematics of History).
• Erez Aiden will appear as a keynote speaker at the 1st Inaugural Texas Digital Humanities Conference on Networks in the Humanities on April 10-12, 2014.
This review was written for LibraryThing Early Reviewers.