Testing out Voyant Tools with a sample from Lettres Portugaises
This post first appeared on the blog for the Centre for Privacy Studies: https://privacy.hypotheses.org/1344
If you are starting to dip your toes into the sea of opportunities that automated text analysis offers but are wondering where to start, take a look at Voyant Tools. This open-source application lets you quickly gather some insights about texts you might be interested in. It's also very convenient to use, because it runs directly in your browser: you simply upload or copy and paste your text into the tool, with no need to download or install software.
I tested it out with a sample from a French text I am currently working with to see how it worked. My text is the first letter from the Lettres Portugaises, an epistolary novel from the second half of the 17th century, published by Claude Barbin, whose book trade is one of the topics of my research.
One quick insight that became clear to me is the importance of properly configuring stop words when doing automated text analysis. (Stop words are common words, like pronouns and prepositions, that are removed when doing certain types of analysis.) Compare the two word clouds below. The first one was made using the option to "auto-detect" stop words in Voyant Tools. The second was made without removing stop words:
Stop words are usually removed for kinds of automated text analysis that operate on a word-by-word basis, because they don't inherently contain much "meaning", or only have meaning in context. Topic modeling and sentiment analysis, for example, are two common tasks that typically rely on stop word removal to get high-quality results; without it, the results would be dominated by the most common glue words in the language, drowning out the "real" results you might be looking for.
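To make the effect concrete, here is a minimal Python sketch of the kind of word counting that sits behind a word cloud, with and without a stop word filter. It is only an illustration under stated assumptions: the stop word list is a tiny hypothetical one (not the list Voyant Tools uses), the sample sentence is invented rather than quoted from the Lettres Portugaises, and the function names are my own.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only;
# it is NOT the list that Voyant Tools applies.
STOP_WORDS = {"le", "la", "les", "de", "et", "que", "je", "vous", "un", "une"}


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, stop_words=frozenset(), n=3):
    """Return the n most frequent words, skipping anything in stop_words."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


# Invented sample sentence (not a quotation from the Lettres Portugaises).
sample = (
    "Je vous aime et je vous attends, et la douleur de la séparation "
    "est la douleur de mon cœur."
)

# Without filtering, a glue word comes out on top.
print(top_words(sample, n=1))              # [('la', 3)]
# With filtering, a content word comes out on top.
print(top_words(sample, STOP_WORDS, n=1))  # [('douleur', 2)]
```

Even in one short sentence, the most frequent word switches from an article to a content word once the filter is applied, which is the same effect the two word clouds show at a larger scale.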
But when we are dealing with early modern texts, like my sample from Lettres Portugaises, the lists of stop words that are readily available usually don't work very well, because early modern languages used different conventions from those we have today. Spelling is a case in point, as you can see from the word clouds above. Playing around with Voyant Tools alerted me to the potential benefits of making my own list of stop words for early modern French using the corpus of texts that I engage with in my research.
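One possible way to start such a list, sketched here in Python, is to look for words that turn up in most or all of the texts in a corpus and treat them as candidates for manual review. This is an assumption about how one might go about it, not a description of what Voyant Tools does; the function name, the `min_doc_share` threshold, and the three-line mini-corpus (invented, with early modern spellings such as "estoit" and "luy") are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidate_stop_words(documents, min_doc_share=0.8):
    """Return, alphabetically, the words found in at least min_doc_share of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    threshold = min_doc_share * len(documents)
    return sorted(word for word, df in doc_freq.items() if df >= threshold)


# Invented mini-corpus using early modern spellings; not real quotations.
corpus = [
    "il estoit triste et luy escrivit une lettre",
    "elle estoit seule et luy parla de son amour",
    "le libraire estoit riche et luy vendit un livre",
]

print(candidate_stop_words(corpus))  # ['estoit', 'et', 'luy']
```

A list built for modern French would likely miss forms like "estoit" and "luy", which is why deriving candidates from the corpus itself can help. The output is only a starting point: each candidate still needs a human decision about whether it really is a glue word for the analysis at hand.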
For me, the biggest benefit of using Voyant Tools came on the meta level: allowing a peek behind the scenes to start understanding how automated text analysis works, including its potential benefits and its potential pitfalls. It also lets us create visuals to use in presentations or blog posts, which is super cool. For more robust analysis tasks, though, tools that allow more fine-tuning might be a better choice.