The Citation Conundrum

February 14, 2012

There is an unknown – but probably shockingly large – number of public domain texts on the web. Many of these could be of value to students and scholars. Lots of digital texts have page numbers which can be straightforwardly referenced in papers and publications: for example journal articles, scanned monographs, born-digital word processing documents, and so on. But how should we cite large public domain texts without pages or page numbers? Let’s call this the ‘citation conundrum’.

First of all, we might wonder about the long-term prospects of the page. Usually physical books divide texts into pages more or less arbitrarily. Many document formats divide texts into pages, presumably partly so that they can be easily printed. Many digital devices enable dynamic formatting where the page divisions change with the size of the font. Should we accept that page numbers are a thing of the past, a convenient metaphor, but one which will not be with us for much longer?

Perhaps in the future we’ll cite line numbers? Perhaps we’ll just search for the passage we’re after? Perhaps the whole textual estate of humankind will be retrofitted with hyperlinks? Perhaps we’ll have algorithms to help us identify the referents of references which no longer refer, obscure relics from a barely recognisable age when people had to butcher trees to capture their thoughts.

Perhaps to all these perhapses. But what until then? Until then people who work with public domain digital texts need to be able to find and refer to passages within them, in adherence with established stylistic principles, practices and standards. I can think of two options.

Firstly we can eschew page numbers in favour of other referential mechanisms. Technically, providing a URL with a date of access is sufficient. The MLA also provides guidance on citing ‘digital files’, which include PDFs, word processor documents, scanned images and so on (Rule 5.7.18). Presumably anyone who wants to put a passage into context can do a plain text search. Or we can use anchors or line numbers to point to precise parts. In this scenario the page number is replaced with a (hopefully) persistent URL.

Secondly we could introduce new (arbitrary) page numbers, or use the page numbers of some (arbitrary) public domain edition of the work we want to cite. Many of the works available on Wikisource have had their page numbers stripped out, and Project Gutenberg has an explicit policy to remove them. So we can either rather laboriously re-insert page numbers from some printed edition, or generate an arbitrarily paginated digital edition (as a digital file, via URL) which can be cited.
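
To make the second half of that option concrete, here is a rough sketch of what generating an arbitrarily paginated edition might look like. It is only an illustration: the function names `paginate` and `find_page` and the default of 300 words per page are my own assumptions, not part of any existing tool or standard. The point is simply that a fixed, published rule (so many words per page) gives every reader the same page numbers for the same text.

```python
def paginate(text, words_per_page=300):
    """Split text into arbitrary 'pages' of at most words_per_page words.

    The page size is an assumption chosen for illustration; any fixed
    value works as long as it is published alongside the edition.
    """
    if words_per_page < 1:
        raise ValueError("words_per_page must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_page])
        for i in range(0, len(words), words_per_page)
    ]


def find_page(pages, passage):
    """Return the 1-based number of the page containing passage, or None.

    This is a naive substring match on whitespace-normalised text, so a
    passage that straddles a page boundary will not be found.
    """
    needle = " ".join(passage.split())
    for number, page in enumerate(pages, start=1):
        if needle in page:
            return number
    return None
```

Someone citing a passage would then give the page number together with the source text and the page size used, so that the same pagination can be regenerated later.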

I’m very keen to learn more about what others have said, thought or done about this – partly so we can bear this in mind when building TEXTUS and OpenPhilosophy.org. Do you know of an interesting approach, paper, standard, or plugin? If so please do leave a comment below.
