TEXTUS: an open source platform for working with collections of texts and metadata

December 08, 2011

Since finally blogging about OpenPhilosophy.org last month, I’ve been thinking about how one could build a generic open source platform to power it, and other things like it.

Enter ‘TEXTUS’, an open source platform for working with collections of texts and metadata. It enables users to transcribe, translate, and annotate texts, and to manage associated bibliographic data.

Here’s the rationale:

The combination of freely available digital copies of public domain works, open bibliographic data and open source tools has the potential to revolutionise research in the humanities. However, numerous obstacles currently mean that these resources are often under-utilised by scholars and students in teaching and research:

From classic literary and cultural works, to letters, drafts, notes, and other historical documents, there is a huge amount of freely available public domain material that is highly relevant to scholars and students engaged in research in the humanities. But these works can be difficult to find and difficult to work with, and works by a given author may be scattered across a variety of locations. Search results may be confusing or unclear. Automated Optical Character Recognition (OCR) of texts may be inaccurate or incomplete. The metadata for a work may be unclear, and the provenance and rights status of a given digital edition may be unknown. It is not always clear how to cite passages from digital editions of public domain works.

Over the past few years, libraries and other cultural heritage organisations have been releasing open data about works they hold. This has the potential to be a rich resource for scholars interested in building scholarly bibliographies and working with large collections of texts. While there is a growing number of tools and services for working with bibliographic data, many researchers may not know how to use them, and online bibliographies may not link through to digital copies of public domain works that are available online.

There is a growing number of open source tools for transcribing, translating and annotating texts. However, many of these are one-off projects, and it may not be clear how to deploy the tools for a given text or collection of texts.

Here’s what it would do:

The TEXTUS platform will enable users to:

Transcribe texts from images, PDFs or other non-machine readable sources.

View texts and translations side by side – and create new translations of texts for use in teaching or research.

Annotate texts, and share annotations with groups of users, or with the public.

Curate, share and export collections of bibliographic metadata (scholarly references), including metadata associated with texts published on the platform.
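To make the annotation and sharing features above a little more concrete, here is a minimal sketch of how an annotation on a passage might be modelled. It is purely illustrative and not TEXTUS code: the `Annotation` class, its field names, and the idea of anchoring by character offsets are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A note attached to a character range of a text (hypothetical model)."""
    author: str
    start: int   # offset of the first annotated character
    end: int     # offset one past the last annotated character
    note: str
    shared_with: set = field(default_factory=set)  # group names; empty means no groups
    public: bool = False

    def quoted(self, text: str) -> str:
        """Return the passage of `text` that this annotation covers."""
        return text[self.start:self.end]

    def visible_to(self, user: str, groups: set) -> bool:
        """Public notes are visible to all; otherwise only to the author
        and to members of a group the note was shared with."""
        if self.public or user == self.author:
            return True
        return bool(self.shared_with & groups)


text = "To be, or not to be, that is the question."
ann = Annotation(author="alice", start=0, end=19,
                 note="Famous opening.", shared_with={"seminar"})
```

In this sketch, a reader in the "seminar" group can see the note while other readers cannot, which mirrors the private, group and public sharing described in the list above.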

Here’s a peek under the hood:

TEXTUS builds on existing best-of-breed open source components and software packages, such as:

Annotator – an open source JavaScript tool that enables annotations to be added to any webpage

Bibserver – which includes numerous tools, services and standards for working with bibliographic metadata

Open Literature – which powers OpenShakespeare, OpenMilton and other sites

Public Domain Works – a nascent directory of works which have entered the public domain in different countries around the world

Scripto – an open source tool that enables users to contribute transcriptions to online documentary projects

WordPress – due to its popularity, ease of use, and extensive plugin system, TEXTUS will use WordPress as its main CMS

If you’re interested, you can join the discussion on the Open Knowledge Foundation’s open-humanities mailing list.
