tl;dr: Voyant Tools is a free, open, web-based tool for textual analysis.
Voyant Tools is an open, web-based tool for textual analysis. Using the tool is easy. Go to the site and link to or upload your text (the system accepts a wide variety of formats, including PDF, XML, TEI, and more). Once you ingest the text or corpus, you are presented with a dashboard of visualizations and tools. Tools built into Voyant include Cirrus, a word cloud generator; Summary, a helpful overview of the corpus; Mandala, a visualization that shows the relationship between terms and documents; and many more (explore Voyant’s helpful documentation for the full list of tools). Another great feature is the ability to generate a URL for the entire corpus dashboard or for specific visualizations, which can then be linked to or embedded into web-based writing.
Voyant Tools creators Stéfan Sinclair (@sgsinclair) and Geoffrey Rockwell (@GeoffRockwell) have also written a book called Hermeneutica: Computer-Assisted Interpretation in the Humanities (2016, MIT Press). Rusty on your Greek and wondering what “hermeneutic” means, anyway? So was I. Hermeneutic means interpretive or explanatory and comes from the Greek “hermenēus,” interpreter. The book is accompanied by an extremely rich and helpful website, Hermeneuti.ca, that uses Voyant to visualize and interpret the book’s content while providing examples of how humanities scholars might integrate textual analysis visualizations into their writing. One interesting example is Now Analyze That!, in which speeches on the topic of race by Barack Obama and Jeremiah Wright are analyzed.
Text analysis has been part of the digital humanities toolkit for some time. Voyant has been in existence since 2013, and several examples of how it has been used in digital pedagogy are available. These include Brian Croxall’s (@briancroxall) discussion of using Voyant Tools to analyze Hemingway; an explanation of how Voyant Tools was used to analyze a corpus of runaway slave advertisements from the antebellum U.S. South as part of a digital history course at Rice University; and a recent write-up on ProfHacker.
I decided to play with Voyant Tools using the corpus of correspondence presented on our Dorr Letters Project site. I zipped up all 61 TEI files, uploaded the zip file to Voyant Tools, and got this dashboard:
How cool!? There is a lot to unpack in this data, but I’ll highlight a couple of the things that struck me most:
- the most frequently used words in the corpus are dorr, letter, constitution, and state (I didn’t remove the TEI Header, introductory text, or follow-up questions included in our TEI, so what shows up in the dashboard is not just representative of the letter content)
- the second 30 letters in the collection were written by “Anti-Dorrites”; isolating that part of the corpus and then comparing it to the letters written by Dorr might be revealing
- it would be interesting to select only those letters written by Dorr and analyze the frequency of certain terms to see if patterns arise over time in relation to Dorr’s political views (of course, this is a small corpus, so broad generalizations are dangerous)
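As noted above, the dashboard reflects everything in the uploaded TEI files, including headers, introductions, and follow-up questions. As a rough illustration of the follow-ups suggested here (restricting counts to the letter text, then comparing subsets such as the Anti-Dorrite letters against Dorr’s own), here is a minimal Python sketch that counts terms only inside each document’s TEI `<body>` and reports relative frequencies per subset. It assumes the files use the TEI P5 namespace and keep the letter text under `<text><body>`; the stop-word list, the helper names, and the sample letters are all hypothetical, and this is not how Voyant itself processes documents.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumes TEI P5 files; adjust if the project's files use another namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "it", "for", "be", "i", "my"}


def body_text(tei_xml: str) -> str:
    """Return only the text inside <text><body>, ignoring <teiHeader>."""
    root = ET.fromstring(tei_xml)
    body = root.find("tei:text/tei:body", TEI_NS)
    if body is None:
        return ""
    return " ".join(body.itertext())


def term_counts(tei_docs: list[str]) -> Counter:
    """Count lower-cased word tokens in the letter bodies, minus stop words."""
    counts: Counter = Counter()
    for doc in tei_docs:
        tokens = re.findall(r"[a-z']+", body_text(doc).lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


def relative_freq(counts: Counter, term: str) -> float:
    """Share of all counted tokens that are `term` (0.0 for an empty subset)."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def make_letter(body: str) -> str:
    """Build a minimal, hypothetical TEI document for demonstration."""
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><fileDesc><titleStmt><title>Dorr letter</title>"
        "</titleStmt></fileDesc></teiHeader>"
        f"<text><body><p>{body}</p></body></text></TEI>"
    )


# Hypothetical sample letters, not taken from the Dorr Letters Project.
dorr_counts = term_counts([make_letter("The people's constitution is the law of the state.")])
anti_counts = term_counts([make_letter("The charter government must restore order to the state.")])

print(relative_freq(dorr_counts, "constitution"))  # 0.25
print(relative_freq(anti_counts, "constitution"))  # 0.0
print(dorr_counts["letter"])  # 0: the header title is not counted
```

Relative frequencies (a term’s count divided by all counted tokens) are used instead of raw counts because the two subsets may differ in length. With a corpus of 61 letters, any differences are better read as prompts for closer reading than as conclusions.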
Voyant Tools is simple to use and extremely interesting. Give it a try yourself!
A couple of weeks ago, I participated in Strong Voices, Indigenous Women, a Wikipedia edit-a-thon at the Schlesinger Library on the History of Women in America at Harvard’s Radcliffe Institute for Advanced Study. It was my first foray into Wikipedia editing, and I was a little intimidated. I knew that Wikipedia uses a special markup language that I wasn’t familiar with, and beyond that I was feeling the import of editing such a hugely popular and public information resource. I’m no expert – who am I to edit this content?
But as I chatted with some of the other participants, it was clear that I wasn’t alone. These feelings are not uncommon among new editors, but overcoming them is key to addressing a big problem with Wikipedia: the lack of diversity in Wikipedia’s scope and content, which has been widely attributed to an overwhelmingly homogeneous editor community. Wikipedia’s gender issue has gained particular attention over the past several years, but the problem goes far beyond that. If you’re not very familiar with these issues, Sara Boboltz provides an incisive overview. As she succinctly puts it, Wikipedia editors are “mainly technically inclined, English-speaking, white-collar men living in majority-Christian, developed countries in the Northern hemisphere.”
There are many theories as to why this is, including the burden that the required technical knowledge and time commitment place on potential editors. For example, women in many communities have less free time to devote to work like this. Also, as on much of the rest of the male-dominated internet, women are not always welcome in these spaces and are much more likely to face harassment there, which discourages their participation.
Another big part of this problem is Wikipedia’s notability guideline, which says that a topic has to have “received significant coverage in reliable sources that are independent of the subject” in order to be included in Wikipedia. It’s one of the ways that Wikipedia tries to maintain the integrity of its content, but it’s not hard to see how this guideline perpetuates the lack of coverage of historically disenfranchised groups in our documented history.
If Wikipedia’s aim is to compile “the sum of all human knowledge”, everyone should be represented in its editor community. And as Wikipedia continues to grow as one of the most popular websites in the world, and its content becomes ever more visible and authoritative, this only becomes more crucial. For example, Google now pulls Wikipedia content into its biographical sidebar, making the information even more prominent.
The good news is that the Wikimedia Foundation is keenly aware of this problem and is dedicating resources to correcting it. For example, in 2012 it released VisualEditor, a more user-friendly editing interface, and it has also allocated funds to initiatives, like Wikipedia edit-a-thons, that are building content on under-represented communities and subjects.
While events like edit-a-thons are very successful at introducing Wikipedia editing and creating a safe space for first-timers to learn, a problem this entrenched and complex will require long-term engagement from this new wave of editors. We all have a right, and, I might also argue, a responsibility, to participate in the documentation of our collective knowledge and history, and for all its shortcomings, Wikipedia provides an amazing space for us to do just that. In the words of co-founder Jimmy Wales, “See that link up there? ‘edit this page’. Go for it, it’s a wiki.”
SPARC Europe (the Scholarly Publishing and Academic Resources Coalition) has launched a new service, Europe’s Open Access Champions, which highlights those who are driving Open Access forward in Europe’s academic communities. These administrators and scholars share their personal views on what still needs to be done to achieve more Open Access.
Several news outlets reported this week that the Beatles Anthology albums have just been released by Apple Records to digital streaming services worldwide. This is a significant development, as the Beatles’ music was long withheld from digital streaming services; it was not until December 2015 that the first of their catalog became available across platforms, a release which included the band’s thirteen U.K. studio albums and four compilation sets.
Anthology, Volumes 1-3, originally released in 1995 and 1996, are compilation albums that include rarities, studio outtakes, and alternative versions of iconic tracks. They have been remastered at Abbey Road Studios by the same engineers who worked on the 2009 reissue of the same set. All three albums are available now on Apple Music, Spotify, Google Play, Tidal, Deezer, and Rhapsody, as well as other platforms. (Sources: 1, 2, 3, 4)
The latest installment of the Faculty Author Series is now available, featuring Fred Drogula, Associate Professor of History. Drogula’s new book, Commanders & Command in the Roman Republic and Early Empire, explores how concepts of authority, control over territory, and military power underwent continual transformation throughout the history of the Roman Republic.
A couple of weeks ago, I attended a fantastic roundtable on Digital Futures of Indigenous Studies in the Digital Scholarship Lab at Brown University’s John D. Rockefeller Library. The event was “part of an ongoing initiative at the JCB to encourage and support a new generation of scholars and community members as they build consciousness about Indigenous issues not only in New England, but also in the United States and internationally”. The discussion centered on “the use of digital media to foster education, research, and outreach within Indigenous communities and studies.” There was a focus on how digital media and tools can help to create connections between people and materials, as well as the importance of relationship-building with Native communities, the ethics surrounding these projects, and project management issues of resource allocation, stewardship, and sustainability.
I was particularly impressed with Tobias Glaza and Paul Grant-Costa’s Yale Indian Papers Project. They focused on the importance of Indigenous communities as stakeholders in the project and collaborating with community members right from the beginning to answer questions like – What’s most important to the community? How do they tell their stories? What information should remain private? How do they want to access and use their digital history? With this approach, they published the New England Indian Papers Series – “a scholarly critical edition of New England Native American primary source materials gathered into one robust virtual collection.” Built on Yale’s Ladybird software and using a Blacklight front-end, the platform is clean and easy-to-use, and includes a document reader, scholarly transcription, and extensive metadata.
An eye-opening takeaway from Alyssa Mt. Pleasant’s presentation on the American Indian Studies (AIS) resources portal she built at Yale was the importance of ensuring a project’s ongoing stewardship to secure its longevity. Unfortunately, the AIS portal, which took three years to build, wasn’t taken on by anyone else when she left Yale and, consequently, is no longer accessible.
Lisa Brooks from Amherst College gave a fantastic talk on the problem of trying to understand the history of Native spaces when the main existing reference points are colonial maps. She’s worked extensively on creating new historical maps of Indigenous spaces to support her research and is also engaged in the idea of maps as storytelling, often combining her maps with present-day photos of the locations to bring them to life. Her work is included in Amherst’s digital map collection, which was created using Esri’s ArcGIS platform, and is definitely worth checking out.
Another standout was Dana Leibsohn’s project, Vistas, which “seeks to bring an understanding of the visual culture of Spanish America to a broad audience.” Vistas was designed as a non-linear platform, in an effort to encourage multiple pathways between content that would support research in a variety of scholarly disciplines, as well as less formal modes of education and learning. Launched in the late 1990s, Vistas has undergone three major evolutions: from a website hosted by Smith College, to a DVD, and now back to an online version hosted by Fordham University. Dr. Leibsohn’s stewardship of the project over the years, including her commitment to tackling the challenges of migrating the platform to keep up with ever-evolving technologies, has clearly been integral to its longevity.
There were also a couple of great discussions about endangered Native languages, including a conversation on the power of digital activism to increase the use of these languages online, particularly on social media, as a way of preserving them.
Obviously all of these projects are contributing to content-collection, digital preservation, and scholarship needs, but it was great to hear that so many are focused on supporting Indigenous communities by facilitating access to their histories, preserving them, and ultimately, helping to amplify the voices of these communities.
Recently, the Zentrum Paul Klee, a museum in Bern, Switzerland, dedicated to the artist Paul Klee, made available online almost all 3,900 pages of Klee’s personal notebooks, which he used as the source for his Bauhaus teaching between 1921 and 1931.
New to the online world is an extensive digital archive of MTV’s late-night show 120 Minutes. The show, which ran without interruption from 1986 through 2000 and later on MTV2 from 2001 to 2003, was the two-hour alternative music block that aired after hours and featured videos, interviews, and performances by alternative, underground, and fringe bands and artists. In May 2003, the show was canceled without formal announcement; the final episode was co-hosted by Jim Shearer, the host at the time, and past hosts Dave Kendall and Matt Pinfield. The show made a brief return to MTV2 in 2011 under the name 120 Minutes with Matt Pinfield, but was canceled for good in 2013.
The 120 Minutes digital archive is the product of a collaboration between its founder, identified only as Tyler, and a team of volunteers. The archive does not present each episode in its original recorded form; rather, it lists the videos contained in each episode (linking out to their YouTube versions) and notes the hosts and guest artists for each episode. Visitors can browse the archival listings by year and episode: the site uses a tiered layout, with years listed at the top of each page that expand into episode listings.
During its tenure, 120 Minutes was hosted by a slew of notable guest artists, including Iggy Pop, Bob Mould, Lou Reed, Robert Smith (the Cure), Tim Armstrong and Matt Freeman (Operation Ivy/Rancid), Superchunk, and Weezer. It featured interviews with the likes of Joe Strummer, the Cramps, John Lydon, Sonic Youth, and Mojo Nixon; spotlights on bands and artists like Bauhaus, the Jesus and Mary Chain, and Sisters of Mercy; and live performances by the Dead Milkmen, the Pixies, and Helmet.
The show aired thousands of videos, featuring artists like the Pogues, the Stone Roses, Hüsker Dü, Billy Bragg, John Doe, Big Audio Dynamite, PiL, the English Beat, X, Anti-Nowhere League, Descendents, the Mighty Lemon Drops, Ministry, the Smithereens, the Ramones, Nick Cave, Dinosaur Jr., Charlatans UK, and TSOL. Nirvana’s “Smells Like Teen Spirit” made its world premiere on 120 Minutes but was quickly moved to daytime rotation because of its popularity. To check out the archive, please visit the site here. (Sources: 1, 2, 3, 4, 5)
Yale University, through a National Endowment for the Humanities (NEH) grant, created a beautiful online database called Photogrammar for searching, organizing, and visualizing over 170,000 photographs from 1935 to 1945. The photographs were created by the United States Farm Security Administration and Office of War Information (FSA-OWI). The web-based platform offers several ways to interact with this content, including a map of the United States that can organize the photographs by photographer and by where they were taken. There are also Photogrammar Labs, which include a Treemap: “a three-tier classification starting with 12 main subject headings (ex. THE LAND), then 1300 sub-headings (ex. Mountains, Deserts, Foothills, Plains) and then sub-sub headings. 88,000 photographs were assigned classifications,” and a Metadata Platform: “an interactive dashboard showing the relationship between date, county, photographer, and subject in photographs from individual states. The dashboard is still in development, but California is now available.” Coming soon is a ColorSpace lab, which explores “the 17,000 color photographs based on hue, saturation and lightness.”
A great resource for educators, researchers, students and the public alike.
As I’ve begun settling into Providence after my move from New York, I finally have some time to catch up on my library news. I had heard about NYPL’s recent release of more than 180,000 public domain items from their digital collections, including the first known photography by a woman and more than 40,000 stereoscopic views of the U.S., but as I delved deeper, I discovered all of the exciting tools and initiatives that they’ve integrated into the collections to encourage discovery, interaction, sharing, research, and reuse. In particular, I’ve been musing on the fantastic visual browsing tool. Data visualization is still often thought of simply as a graphic, sometimes interactive, representation of statistics and other data, but it also has great potential as a tool for discovery, helping users better understand the scope of the information they’re searching or exploring.
Beyond content visualization, NYPL is championing active user/content engagement with the Digital Collections API, a Remix Residency program and other tools from the creative folks at NYPL Labs, like The Green Book trip planner, which uses “locations extracted from mid-20th century motor guides that listed hotels, restaurants, bars, and other destinations where Black travelers would be welcome.”
For those of us who spend most of our days in the weeds of content management, NYPL’s Digital Collections initiatives are a great reminder to think innovatively about how we can better connect and engage users with digital collections.
For some Friday fun, check out their Stereogranimator and create some 3D images!