If one had to name a writer versatile enough to pose serious difficulties for stylometric attribution, Robert Louis Stevenson would likely come to mind. The Introduction to the Oxford World’s Classics edition of South Sea Tales1 observes: ‘Generic diversity had always been the hallmark of Stevenson’s career as a writer; he had always produced many kinds and styles of writing – too many, indeed, for the comfort of critics who wished to pigeonhole him’.2

Learning Process

Our research problem was to analyse The Dynamiter for authorship attribution by applying “Delta”, a stylometric method pioneered by John Burrows, on a software platform called Stylo. However, once we started our research, we felt encouraged by the results achieved by David Hoover3 and decided to deviate somewhat from the traditional Burrows method used in such studies.

Through testing Burrows’ Delta, Hoover had arrived at parameters that returned persuasively consistent results for works of fiction: a very high most-frequent-word (MFW) count (usually up to 850) and the deletion of pronouns. We were uncertain about culling, since Hoover culled words for which a single text accounted for 70% of occurrences, a function that does not exist in Stylo. We therefore experimented alternately with no culling and with a culling of 20%. The visualisation methods we used were multidimensional scaling (MDS), principal components analysis (PCA) and cluster analysis (CA).
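To make the measure concrete, here is a minimal sketch of Classic Delta as it is usually described: the mean absolute difference between the z-scores of the most frequent words in two texts. It is not Stylo's code. The function names, the short pronoun list and the tokenised input format are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical pronoun list for illustration; Stylo ships its own, longer list.
PRONOUNS = frozenset({"i", "me", "my", "he", "him", "his", "she", "her",
                      "it", "we", "us", "you", "they", "them"})


def relative_freqs(tokens):
    """Relative frequency of every word in one tokenised text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def most_frequent_words(texts, n, exclude=frozenset()):
    """Top-n words over the whole corpus, skipping excluded words."""
    counts = Counter()
    for tokens in texts.values():
        counts.update(t for t in tokens if t not in exclude)
    return [word for word, _ in counts.most_common(n)]


def burrows_delta(texts, n_mfw, exclude=frozenset()):
    """Return {(a, b): delta} for every pair of texts in the corpus."""
    mfw = most_frequent_words(texts, n_mfw, exclude)
    freqs = {name: relative_freqs(tokens) for name, tokens in texts.items()}
    z = {name: [] for name in texts}
    for word in mfw:
        column = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(column), stdev(column)
        for name, value in zip(texts, column):
            # A word used identically everywhere carries no information.
            z[name].append((value - mu) / sigma if sigma else 0.0)
    names = list(texts)
    return {
        (a, b): mean([abs(x - y) for x, y in zip(z[a], z[b])])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Calling `burrows_delta(texts, 850, exclude=PRONOUNS)` on a dictionary of tokenised texts would correspond roughly to the Hoover-style setting described above, without any culling.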

Figure: cluster analysis (CA) with 850 MFWs, no culling, pronouns deleted, Classic Delta

The results, though encouraging at first because of their accuracy, proved too discriminating for our purposes. Our corpus was too small: such a high word count ultimately separated every work from every other, outweighing even authorial groupings. Indeed, such a high most-frequent-word count could amount to over 20% of the total words in a given sample unit, far too great a portion of a text. We therefore adjusted our parameters.

We experimented with running analyses with pronouns included, and the accuracy decreased greatly. Among other things, Robert Louis uses feminine pronouns with an extremely low frequency, so results could easily be skewed by the gender of the characters. Keeping our other parameters unchanged, we then lowered the word count. The results, though now more interesting, were inconclusive.

Function-Word Analysis

After further research, we decided to return to the traditional Burrows method and use only function-words (non-content words, i.e. no adjectives, pronouns or nouns). Following Burrows’ method,4 the real breakthrough in our analyses was in using a single word-list of the most frequent words for both authors, compiled without The Dynamiter. That is, we ran an analysis of all the texts we had by our two authors, except for The Dynamiter. We then manually modified the resulting list so that its first 150 entries were the 150 most frequent function-words, and used it in all of our subsequent analyses.
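The word-list step can be sketched as follows. We edited our list by hand, so this is only an approximation of that procedure: the function name, the name-prefix convention for spotting the disputed text and the `function_words` set are all hypothetical.

```python
from collections import Counter


def build_wordlist(texts, disputed_prefix, function_words, size=150):
    """Rank words over the texts of known authorship, function-words first.

    texts           -- {sample name: list of tokens}
    disputed_prefix -- samples whose name starts with this are left out
    function_words  -- hypothetical hand-made set of function-words
    size            -- how many function-words to place at the head
    """
    counts = Counter()
    for name, tokens in texts.items():
        if name.startswith(disputed_prefix):
            continue  # the disputed work must not shape the list
        counts.update(tokens)
    ranked = [word for word, _ in counts.most_common()]
    head = [word for word in ranked if word in function_words][:size]
    head_set = set(head)
    tail = [word for word in ranked if word not in head_set]
    # Function-words occupy the first `size` positions; the rest follow.
    return head + tail
```

An analysis restricted to the first 150 entries of the returned list would then use function-words only, as long as the hand-made set contains at least 150 words attested in the corpus.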

Our next adjustment concerned the size of the sampling. Though in our experience random sampling with samples of at least 4,000 words yielded the most consistent results, we followed the example of José Binongo, one of the few scholars who has analysed short stories through stylometry: “Unlike Burrows’s studies which deals [sic] with novels, this paper studies short stories, some of which can be literally short, having a text length of a little over a thousand words. Thus, in this paper we cannot afford the divisions Burrows suggests”.5 For that reason, we decided to run our tests with no sampling. Most importantly, in that way we could effectively analyse The Dynamiter in chapters, as the authorship attribution of its chapters was the aim of our project. With regard to the longer novels we used as samples in our corpora, we divided them into samples of roughly 5,000 words, which was the average size of most of the chapters and short stories. For the shorter texts in our corpus, we bore in mind that the results they yielded were unlikely to be as accurate as the rest. Such were the limits that our object of study imposed on us.
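As an illustration of this division, the sketch below cuts a long tokenised text into near-equal samples of roughly 5,000 words and leaves shorter texts whole. How the cuts were actually made is not recorded here, so the equal-split rule and the function name are assumptions.

```python
def split_into_samples(tokens, sample_size=5000):
    """Split a token list into near-equal samples of about sample_size words.

    Texts no longer than sample_size are returned as a single sample, so
    chapters and short stories stay intact.
    """
    if len(tokens) <= sample_size:
        return [tokens]
    # Choose the number of samples that brings each closest to sample_size.
    n_samples = max(1, round(len(tokens) / sample_size))
    base, extra = divmod(len(tokens), n_samples)
    samples, start = [], 0
    for i in range(n_samples):
        # The first `extra` samples take one additional token each.
        end = start + base + (1 if i < extra else 0)
        samples.append(tokens[start:end])
        start = end
    return samples
```

A 13,000-word novel would yield three samples of about 4,333 words each, while a 1,200-word story would remain a single unit.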

In order to be sure that function-words yielded more accurate results than general words, we ran comparative tests measuring their accuracy. For the general word-list, we used the list of the 150 most frequent words as it stood before we took out the content-words. We found that with this general list we needed the top 150 MFW, whereas with function-words 60 MFW sufficed to achieve a similar accuracy.



Indeed, since we are dealing with different genres (short stories and novels), function-words are bound to yield more accurate results, as they depend less on subject matter than content-words do.

In addition to our parameters, we also adjusted the visualisation methods upon which we relied in our analyses. CA and MDS, though more legible than PCA, returned varied and inconsistent results. We believe both methods are especially useful to visualise the distance amongst works of known authorship; however, when analysing a work of disputed authorship, the patterns and groupings that PCA graphs provide become much more useful.

Established Parameters

Hence, we arrived at the parameters that were suited to our object of study given the limits of the corpora we had to work with. These parameters yielded the most consistent results and ultimately led us to our conclusions. Through Stylo, we used a manually modified word-list comprising the 150 most frequent function-words, which was generated through the analysis of Fanny’s short stories and a varied corpus of Robert Louis’ writings. As explained above, pronouns were deleted and no sampling was applied. The option of “English (ALL)” was selected for language in Stylo (this feature has the advantage of including both contractions and their longer forms in the analysis). With regard to visualisation methods, all our graphs were PCA (correlation matrix), since they provided the most consistent patterns throughout our analyses and were the most useful for interpreting resemblances amongst samples.
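For readers unfamiliar with the visualisation, the following is a rough sketch of a correlation-based PCA. It assumes the PCA option in question standardises each word's frequencies before extracting components. It is written in plain Python with power iteration for clarity, not for efficiency, and it is not how Stylo computes its graphs. All names are illustrative.

```python
import math
import random


def standardise(rows):
    """Column-wise z-scores; this is what makes the PCA correlation-based."""
    n = len(rows)
    columns = []
    for col in zip(*rows):
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
        columns.append([(v - mu) / sd if sd else 0.0 for v in col])
    return [list(row) for row in zip(*columns)]


def correlation_matrix(z):
    """Correlation matrix of already standardised data."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(m, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(m)
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_correlation(rows, n_components=2):
    """Return (eigenvalues, scores); scores hold one tuple per sample."""
    z = standardise(rows)
    m = correlation_matrix(z)
    eigenvalues, score_columns = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(m)
        eigenvalues.append(lam)
        score_columns.append([sum(a * b for a, b in zip(row, v)) for row in z])
        # Deflation: remove the component just found before the next pass.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return eigenvalues, list(zip(*score_columns))
```

Each row passed to `pca_correlation` would hold one sample's relative frequencies for the words on the list, and plotting the two scores per sample gives the kind of scatter graph on which groupings are read.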


Stylo’s GUI and our word-list

Each chapter of The Dynamiter was treated as a discrete unit, as was each of Fanny’s stories. Our corpus for Robert Louis was divided into samples of roughly 5,000 words.


  1. Stevenson, Robert Louis. South Sea Tales. Oxford: Oxford University Press, 1996. Print.
  2. Stevenson, South Sea Tales, xxvi.
  3. Hoover, David L. “Testing Burrows’s Delta.” Literary and Linguistic Computing 19.4 (2004): 453-75. Print.
  4. Burrows, John. “Not Unless You Ask Nicely: The Interpretive Nexus Between Analysis and Information.” Literary and Linguistic Computing 7.2 (1992): 91-109. Print.
  5. Binongo, José Nilo G. “Joaquin’s Joaquinesquerie, Joaquinesquerie’s Joaquin: A Statistical Expression of a Filipino Writer’s Style.” Literary and Linguistic Computing 9.4 (1994): 274.