Compiling Corpora and the Quest for Open Source Material

One of the most interesting issues we dealt with while compiling our corpora was deciding whether to use materials that were not open access. Open access means that no special login credentials or subscription are required to access academic journals, archives, and other materials stored in a database. We are lucky to have access to many subscription-based archives and databases through the generosity of the University of Edinburgh Library, but several readings and discussions on the principles and history of the open access movement persuaded us to use as many open access sources as possible in compiling our corpora.

In his New York Review of Books essay “A World Digital Library is Coming True!”, cultural historian and academic librarian Robert Darnton sums up the open access argument well. “All over the country research libraries are canceling subscriptions to academic journals, because they are caught between decreasing budgets and increasing costs,” Darnton writes. “The logic of the bottom line is inescapable, but there is a higher logic that deserves consideration—namely, that the public should have access to knowledge produced with public funds.”

Among the price increases Darnton cites are a 400% increase that a publisher quoted to the University of California in 2010 for access to 67 journals, and a 100% increase at the Université Pierre et Marie Curie. What do some of these tabs run? At Harvard, for one, the annual bill for journal access recently reached $9.9 million.

At Harvard, the faculty first (rather toothlessly) condemned the astronomical price increase as unsustainable, and then (perhaps more fearsomely) opened an open-access repository for all of their research — Digital Access to Scholarship at Harvard. (Three of this blogger’s twelve articles for another literature class this semester came from this single, free repository.)

Darnton points out that academics are far from the only professionals whose work suffers when they cannot afford access to research. As an example he cites the Human Genome Project, which was developed with $3.8 billion of U.S. public funds and has been responsible for $796 billion in subsequent commercial contributions, a return he attributes to the open accessibility of the research. He also highlights that many small businesses, smaller research institutes, and hospitals have had to cancel subscriptions, and that publishers have responded by charging even more for access to journals.

We all agreed that, for these reasons, we wanted to support open access projects by using them. The plethora of open access material from both non-profit enterprises (like the National Library of Scotland) and for-profit organizations (like Google Books and Google Scholar) left us with no reason not to look for the majority of our material in open access sources. Here are some of the most useful ones we found when constructing the corpora on which we ran our analyses:

Unz.org: This wide-ranging archive holds periodicals, articles, books, and films dating from before the 1850s to the present day. It is easy to search by specific topic, or to browse by general or specific interest. We found several of Fanny’s short stories in issues of Scribner’s Magazine from 1888, 1891, and 1899. It should be noted that the PDFs of these issues are meant for research, and that their electronic (re)distribution is prohibited.

HathiTrust Digital Library: HathiTrust offers millions of digital texts, collected from participating libraries and research institutions all over the world. We found many of the biographical editions of RLS’ works (for which Fanny wrote very valuable prefaces) here.

Internet Archive: Digitized texts, video, audio, software, images, concerts and collections are available in this extensive archival repository. We found beautiful scans of several first editions of RLS’ work, and portraits of the author.

Google Books: Although Google’s profit-driven mission complicates research for some scholarly projects (for example, those involving texts from the first few centuries of printing or texts from non-Western cultures, as Anthony Grafton points out in a New Yorker essay, “Future Reading”), our particular project’s Anglophone, late-19th-century focus made it a valuable resource. Simply engineered for easy use by the non-expert, Google’s vast material resources made for an equally vast number of results for searches related to our project.
