text analysis with Voyant Tools

Posted on May 26, 2016 | Digital Humanities

tl;dr: Voyant Tools is a free, open, web-based tool for textual analysis.

Voyant Tools is an open, web-based tool for textual analysis. Using the tool is easy: go to the site and link to or upload your text (the system accepts a wide variety of formats, including PDF, XML, and TEI). Once you ingest the text or corpus, you are presented with a dashboard of visualizations and tools. Some of the tools built into Voyant include Cirrus, a word cloud generator; Summary, a helpful overview of the corpus; Mandala, a visualization that shows the relationship between terms and documents; and many more (explore Voyant’s helpful documentation for the full list of tools). Another great feature is the ability to generate a URL for the entire corpus dashboard or for specific visualizations, which can then be linked to or embedded in web-based writing.

Voyant Tools creators Stéfan Sinclair (@sgsinclair) and Geoffrey Rockwell (@GeoffRockwell) have also written a book called Hermeneutica: Computer-Assisted Interpretation in the Humanities (2016, MIT Press). Rusty on your Greek and wondering what “hermeneutic” means, anyway? So was I. Hermeneutic means interpretive or explanatory and comes from the Greek “hermēneus,” interpreter. The book is accompanied by an extremely rich and helpful website, Hermeneuti.ca, that uses Voyant to visualize and interpret the book’s content while providing examples of how humanities scholars might integrate textual analysis visualizations into their writing. One interesting example is Now Analyze That!, which analyzes speeches on the topic of race by Barack Obama and Jeremiah Wright.

Text analysis has been part of the digital humanities toolkit for some time. Voyant has been in existence since 2013, and several examples of how it has been used in digital pedagogy are available. These include Brian Croxall’s (@briancroxall) discussion of using Voyant Tools to analyze Hemingway; an explanation of how Voyant Tools was used to analyze a corpus of runaway slave advertisements from the antebellum U.S. South as part of a digital history course at Rice University; and a recent write-up on ProfHacker.

I decided to play with Voyant Tools using the corpus of correspondence presented on our Dorr Letters Project site.  I zipped up all 61 TEI files, uploaded the zip file to Voyant Tools, and got this dashboard:

Voyant Dashboard

How cool!? There is a lot to unpack in this data, but I’ll highlight a few of the things that most struck me:

  • the most used words in the corpus are: dorr, letter, constitution and state (I didn’t remove the TEI Header, introductory text, or follow-up questions included in our TEI so what shows up in the dashboard is not just representative of the letter content)
  • the second 30 letters in the collection were written by “Anti-Dorrites.” Isolating that part of the corpus and then comparing it to the letters written by Dorr might be revealing
  • it would be interesting to select only those letters written by Dorr and analyze the frequency of certain terms to see if patterns arise over time in relation to Dorr’s political views (of course, this is a small corpus so broad generalizations are dangerous)
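
For anyone who wants to try these ideas before uploading to Voyant, here is a rough Python sketch of two of the steps mentioned above: dropping the TEI header so only the letter text is counted, and measuring how often a term appears in a group of letters. It is only an illustration, not how Voyant works internally. It assumes each file is standard TEI P5 with <teiHeader> and <text> as direct children of the root element (I have not checked this against every Dorr file), and the function names (letter_text, term_counts, relative_frequency) are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI P5 namespace, in ElementTree's "{uri}tag" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def letter_text(tei_xml):
    """Return the text inside <text>, leaving out everything in <teiHeader>.

    Assumption: <text> is a direct child of the root element.
    """
    root = ET.fromstring(tei_xml)
    # Try the namespaced element first, then an un-namespaced fallback.
    text_el = root.find(f"{TEI_NS}text")
    if text_el is None:
        text_el = root.find("text")
    if text_el is None:
        return ""
    # Join text fragments with spaces so adjacent paragraphs do not fuse,
    # then collapse runs of whitespace. (This can split a word that has
    # inline markup in the middle of it.)
    return " ".join(" ".join(text_el.itertext()).split())


def term_counts(text, stopwords=frozenset()):
    """Count lowercase word tokens, skipping any in `stopwords`."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stopwords)


def relative_frequency(term, texts):
    """Occurrences of `term` per 1,000 words across a group of texts."""
    totals = Counter()
    for text in texts:
        totals.update(term_counts(text))
    n_words = sum(totals.values())
    return 1000 * totals[term] / n_words if n_words else 0.0
```

To compare the two halves of the collection, you could call relative_frequency with the same term on each group of letters; to look for change over time, call it on the letters from each year. With only 61 letters, any differences should be read cautiously.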

Voyant Tools is simple to use and extremely interesting. Give it a try yourself!
