Testing out Voyant Tools with a sample from Lettres Portugaises

This post first appeared on the blog for the Centre for Privacy Studies: https://privacy.hypotheses.org/1344


If you are starting to dip your toes into the sea of opportunities that automated text analysis offers but are wondering where to start, take a look at Voyant Tools. This open-source application lets you quickly gather some insights about texts you might be interested in. It's also very convenient to use because it runs directly in your browser: you simply upload your text or copy and paste it into the tool, with no need to download or install software.

I tested it out with a sample from a French text I am currently working with to see how it worked. My text is the first letter from the Lettres Portugaises, an epistolary novel from the second half of the 17th century, published by Claude Barbin, whose book trade is one of the topics of my research.

One insight that quickly became clear to me is the importance of properly configuring stop words when doing automated text analysis. (Stop words are common words, like pronouns and prepositions, that are removed when doing certain types of analysis.) Compare the two word clouds below. The first one was made using the option to "auto-detect" stop words in Voyant Tools. The second was made without removing stop words:

Word cloud made using the option to auto-detect stop words
Word cloud made using the option to not remove any stop words

Stop words are usually removed for types of automated text analysis that operate on a word-by-word basis, because on their own they carry little "meaning", or only have meaning in context. Topic modeling and sentiment analysis, for example, are two common tasks where stop words are typically removed to get higher-quality results; without stop word removal, the results would be dominated by the most common glue words in the language, drowning out the "real" results you might be looking for.
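
To make the effect concrete, here is a minimal Python sketch of the word counting that sits behind a word cloud, run once with and once without a stop list. Everything in it is an assumption for illustration: the sample sentence is invented (it is not a quotation from the Lettres Portugaises), the stop list is a tiny hand-picked one, and the function names `tokenize` and `top_words` are hypothetical. It is not how Voyant Tools itself is implemented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list for illustration only;
# this is NOT the list that Voyant Tools uses.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "my", "is", "that", "you"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, stop_words=frozenset(), n=3):
    """Return the n most frequent words, skipping anything in stop_words."""
    counts = Counter(word for word in tokenize(text) if word not in stop_words)
    return counts.most_common(n)


# Invented sample sentence, used only to show the effect.
sample = (
    "The pain of my love is the pain of my heart, "
    "and the love of my heart is lost to you."
)

# Without a stop list, the glue words take the top spots.
print(top_words(sample))

# With the stop list, the content words surface instead.
print(top_words(sample, stop_words=STOP_WORDS))
```

In this toy sentence, the first call is dominated by "the", "of" and "my", while the second brings up "pain", "love" and "heart", which is the same contrast the two word clouds show.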

But when we are dealing with early modern texts, like my sample from Lettres Portugaises, the lists of stop words that are readily available usually don't work very well, because early modern languages used different conventions from those we have today. Spelling is a case in point, as you can see from the word clouds above. Playing around with Voyant Tools alerted me to the potential benefits of making my own list of stop words for early modern French, using the corpus of texts that I engage with in my research.
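
One possible starting point for such a list is to look for words that turn up in almost every text in a corpus. The Python sketch below does this with document frequency. It rests on several assumptions: the three snippets are invented placeholders written with period-style spellings (they are not taken from my corpus), the function name `candidate_stop_words` is hypothetical, and the 80% threshold is arbitrary. The output is only a list of candidates to review by hand, not a finished stop list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidate_stop_words(documents, min_share=0.8):
    """Return, in alphabetical order, the words found in at least
    min_share of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each word is counted once per document.
        doc_freq.update(set(tokenize(doc)))
    threshold = min_share * len(documents)
    return sorted(word for word, df in doc_freq.items() if df >= threshold)


# Invented placeholder snippets standing in for a real corpus.
docs = [
    "je ne sçay pas ce que vous voulez de moy",
    "vous ne sçavez pas ce que je souffre",
    "je ne puis vous oublier",
]

print(candidate_stop_words(docs))
```

With these placeholders, only "je", "ne" and "vous" appear in all three snippets, so they are the candidates returned. On a real corpus the threshold would need tuning, and spelling variants of the same word would still have to be grouped manually.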

For me, the biggest benefit of using Voyant Tools came on the meta level: it allows a peek behind the scenes to start understanding how automated text analysis works, including its potential benefits and its potential pitfalls. It also lets us create visuals to use in presentations or blog posts, which is super cool. For more robust analysis tasks, though, tools that allow more fine-tuning might be a better choice.
