To present a broad picture of the research, our collaboration, and our material, below you’ll find an alphabetical guide to the most crucial aspects of our work.
Authorship Attribution: This is a category of digital text analysis that identifies an author’s characteristic habits from the texts that author has written, and uses them to judge who is likely to have written a given text. Closely tied to stylometry, the statistical study of literary style, this process is the core of our project: we are searching for stylistic similarities and differences between the writing of R.L. Stevenson and his wife, Fanny.
Background of Team: We come from different academic and professional backgrounds, but most of us were new to the field of digital humanities when we began our research. We’ve taken our diverse interests and abilities and geared them towards a more statistical and technical approach to literature, and the results have thus far been fruitful!
Corpora: The bodies of work being analysed (singular: corpus). In our project we are utilising the two collections of New Arabian Nights tales, Treasure Island, Jekyll and Hyde, several chapters of South Sea Tales, and a series of Fanny’s short stories published in various magazines and journals over the years.
Delta-Score (∆-score): A measure of the stylistic distance between two texts. To simplify the process, the ∆-score compares how often each of the most frequent words in the corpus appears in each document, relative to that word’s average use across all the texts, and then averages the absolute differences between the two documents. A lot can be learned from comparing the use of outstanding or uncommon words, but the ∆-score relies mainly on common auxiliary words, from which we can often make inferences as well.
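The calculation behind the ∆-score can be sketched in a few lines. The version below follows John Burrows’s classic formulation: take the most frequent words across all the texts, convert each word’s relative frequency in each text into a z-score, and average the absolute differences between the two texts’ z-scores. It is a minimal illustration, not Stylo’s implementation; the function names, the simple regex tokenizer, and the use of the population standard deviation are illustrative assumptions.

```python
import re
import statistics
from collections import Counter

def tokenize(text):
    # Illustrative tokenizer: lowercase words, keeping apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def relative_frequencies(tokens):
    # Share of the text taken up by each word.
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

def burrows_delta(corpus, a, b, n_words=150):
    """Classic Delta between texts a and b (hypothetical helper).

    corpus maps each text's name to its full text; z-scores are
    computed against every text in the corpus.
    """
    tokens = {name: tokenize(text) for name, text in corpus.items()}
    freqs = {name: relative_frequencies(t) for name, t in tokens.items()}

    # The n most frequent words across the whole corpus.
    overall = Counter()
    for t in tokens.values():
        overall.update(t)
    mfw = [word for word, _ in overall.most_common(n_words)]

    total, used = 0.0, 0
    for word in mfw:
        values = [f.get(word, 0.0) for f in freqs.values()]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a word used at the same rate everywhere cannot discriminate
        z_a = (freqs[a].get(word, 0.0) - mean) / sd
        z_b = (freqs[b].get(word, 0.0) - mean) / sd
        total += abs(z_a - z_b)
        used += 1
    return total / used if used else 0.0
```

A lower score means two texts are stylistically closer. In practice a disputed story would be compared with texts by each candidate author, and the lowest score points to the closest match: a likelihood, not a verdict.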
Expectations: No software can assign authorship to a work; the best we can do is to operate on a scale of likelihoods. One text can be more similar to the work of one writer than to that of another on a linguistic, stylistic level, but our goal is not to unearth the secret of who wrote which sentences. Rather, we hope to find new information about the writing patterns of the Stevensons on a more holistic level.
Fanny Vandegrift Osbourne Stevenson: One of the two writers whose works we are investigating. She wrote short stories like “The Warlock’s Shadow,” “Anne,” “The Nixie,” and parts of the New Arabian Nights volumes. An American by birth, Fanny met Stevenson in Europe and they became lovers and collaborators.
Groundwork: Much of our team’s work involves learning about the lives of the Stevensons and their circle, as well as exploring other digital humanities tools and the history of authorship attribution. From medieval monks compiling concordances of the Bible by hand all the way to Google Books, we’ve been learning as much as we can about literary metadata and stylistic analysis.
Hangouts: How we plan, communicate, and organize our research and work. Our team is constantly in contact in order to make sure we evenly distribute our work and that we haven’t left out any texts or forgotten to post an update. Collaboration is the core of our research.
Island Nights’ Entertainments: One of the short story collections by R.L. Stevenson that we are using in our research. Its stories are also included in the collection South Sea Tales, and the travel writing done by both Mr. and Mrs. Stevenson is an important part of our research into stylistic difference.
Juola, Patrick: A digital humanities scholar and the creator of JGAAP, the Java Graphical Authorship Attribution Program. His work “Authorship Attribution” was a key text in our preliminary reading for the project.
Keywords: Looking for an author’s repeated use of certain outstanding terms may seem the obvious way to distinguish between writers, but it actually returns fairly unreliable results. The words we use almost unconsciously, the ones the brain hardly notices, such as conjunctions and prepositions, are the ones that help us identify authors more effectively.
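To see what this means in practice, the sketch below measures how much of a text is made up of a handful of these unconscious words, often called function words. The short word list and the function name are purely illustrative assumptions; stylometric studies usually let the most frequent words of the corpus define the list instead.

```python
import re
from collections import Counter

# A small, illustrative list of function words; real studies use far more.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "but", "with", "upon"]

def function_word_profile(text, words=FUNCTION_WORDS):
    # Relative frequency of each function word in the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero for an empty text
    return {word: counts[word] / total for word in words}
```

Profiles like this, computed for a story of disputed authorship and for texts known to be by each Stevenson, can then be compared word by word.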
Longman’s: The publishers who worked closely with R.L. Stevenson and first distributed The Dynamiter, along with many of his other works.
More New Arabian Nights (The Dynamiter): The volume our research is primarily concerned with. Because The Dynamiter was written by both Robert Louis and Fanny, we hope to dive headfirst into it and take a closer look at how authorship is distributed across its stories.
N-grams: Sequences of n consecutive words (or characters) in a text. Google popularized them with its Ngram Viewer, which searches a massive corpus of fiction and nonfiction books in various languages. You can track the frequency of one word (a unigram), two words (a bigram), or a whole phrase across several centuries.
Observation: A necessary facet of our research. The computer can’t draw any conclusions for us; we have to do that ourselves. Observation and analysis are the way to approach answers to our questions, just as they are in any other kind of academic enquiry. What we get from attribution software is just the beginning!
PCA (Principal Component Analysis): A technique for finding patterns in a dataset with many variables by combining them into a few new axes, the principal components, which capture as much of the variation between items as possible. This makes the data easier to explore and visualize. The variables used in a literary PCA are words (more precisely, their frequencies); see ‘Word-Variables’ for further explanation.
Questions to Answer: Is there any stylistic coherence among the stories that Fanny reportedly wrote? Is her writing more similar to some of R.L.’s stories than to others?
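The idea of an n-gram is easy to show in code. This minimal sketch splits a passage into words and counts its bigrams; it does not query Google’s corpus, and the helper names are hypothetical.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    # Every run of n consecutive tokens, as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(text, n=2):
    # Lowercase word tokens, then count each n-gram.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(ngrams(tokens, n))

print(count_ngrams("the sea and the sea").most_common(1))  # [(('the', 'sea'), 2)]
```

Setting n=1 gives unigrams (plain word counts); larger values track longer phrases.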
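For readers curious about what happens under the hood, the sketch below runs PCA on a small table of word frequencies (one row per text, one column per word) using NumPy’s singular value decomposition. Stylo performs this step for us; the sketch only centres the data before decomposing it, and the function name and sample numbers are hypothetical.

```python
import numpy as np

def pca_2d(freq_matrix):
    """Project texts onto their first two principal components.

    freq_matrix: rows are texts, columns are word frequencies.
    Returns the 2-D coordinates and the share of variance that each
    of the two components explains.
    """
    X = np.asarray(freq_matrix, dtype=float)
    X = X - X.mean(axis=0)  # centre each word-variable on zero
    # Rows of vt are the principal axes, ordered by variance explained.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T
    explained = s**2 / np.sum(s**2)
    return coords, explained[:2]

# Hypothetical relative frequencies of three words in four texts.
freqs = [
    [0.050, 0.030, 0.010],
    [0.048, 0.031, 0.012],
    [0.010, 0.020, 0.050],
    [0.012, 0.019, 0.047],
]
coords, explained = pca_2d(freqs)
```

Plotting coords as a scatter chart would show the first two texts close together and the last two forming a separate group.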
Robert Louis Stevenson: The beloved author of classics such as Jekyll and Hyde, Treasure Island, and The Master of Ballantrae. He and his wife Fanny worked closely as writers and editors.
Stylo: The main program we are using to run our analyses, an R package that includes all sorts of text-mining tools (see below) as well as authorship attribution options.
Text-Mining: An umbrella term for several kinds of computational text analysis, including collocation, concordances, word frequency distribution, cluster analysis, and keyword analysis.
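Of the techniques listed, a concordance is the simplest to illustrate. The sketch below builds a basic keyword-in-context (KWIC) view, showing a few words on either side of each occurrence of a search term; the function name and window size are illustrative choices, not Stylo’s interface.

```python
import re

def kwic(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

for line in kwic("The dynamiter met the prince, and the dynamiter fled.", "dynamiter", width=2):
    print(line)
# the [dynamiter] met the
# and the [dynamiter] fled
```

Counting which words turn up most often inside these windows is the basis of collocation analysis.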
Understanding the Results: This is our current work-in-progress, and over the next several weeks we should have some exciting updates.
Vailima: The Samoan estate where R.L. Stevenson died, and where many of his South Seas writings were composed. It lies near Apia, the Samoan capital, and Stevenson was buried in a tomb on Mount Vaea, which overlooks Vailima.
Word-Variables: When we graph the differences between texts, the variables are the most frequent words. Rather than one word sitting on the X axis and another on the Y axis, each word is its own axis, so our graphs represent about 150 dimensions. The computer then reduces these to a two-dimensional picture we can read.
XML: The Extensible Markup Language, used for encoding text. XML is readable by both humans and computers, and our project has utilised a lot of it, both in executing our research and in building our site.
Yes!: An answer to the question, “Is this research extremely fascinating?”
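The step from texts to word-variables can be sketched as follows: find the most frequent words across all the texts, then describe each text by the relative frequency of each of those words. Every word becomes one column, or one axis, so choosing 150 words gives each text 150 coordinates. The function name is hypothetical and the tokenizer deliberately simple; in our work Stylo builds this table for us.

```python
import re
from collections import Counter

def word_variable_table(corpus, n_words=150):
    """Build the table behind stylometric plots (hypothetical helper).

    corpus maps text names to full texts. Returns the list of
    word-variables (the most frequent words overall) and, for each
    text, its relative frequencies of those words in the same order.
    """
    tokens = {name: re.findall(r"[a-z']+", text.lower())
              for name, text in corpus.items()}
    overall = Counter()
    for t in tokens.values():
        overall.update(t)
    words = [word for word, _ in overall.most_common(n_words)]

    table = {}
    for name, t in tokens.items():
        counts = Counter(t)
        total = len(t) or 1  # avoid dividing by zero for an empty text
        table[name] = [counts[word] / total for word in words]
    return words, table
```

Each row of the resulting table is a point in a space with one dimension per word; PCA or cluster analysis then reduces that space to something we can plot.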
Zotero: The tool we use for managing our bibliographic information. When we find an article or website that is relevant to our research, we add it to Zotero, so we can access it again later or easily include it in a document. Zotero has allowed us to share information with each other in a streamlined, formatted manner.