TEXTUS: an open source platform for working with collections of texts and metadata

Since finally blogging about OpenPhilosophy.org last month, I’ve been thinking about how one could make a generic open source platform that could be used to power it, and other things like it. Enter ‘TEXTUS’:

TEXTUS is an open source platform for working with collections of texts and metadata. It enables users to transcribe, translate, and annotate texts, and to manage associated bibliographic data.

Here’s the rationale:

The combination of freely available digital copies of public domain works, open bibliographic data and open source tools has the potential to revolutionise research in the humanities. However, there are currently numerous obstacles which mean that these resources are often under-utilised by scholars and students in teaching and research:

  • From classic literary and cultural works, to letters, drafts, notes, and other historical documents, there is a huge amount of freely available public domain material that is highly relevant to scholars and students engaged in research in the humanities. But these works can be difficult to find and difficult to work with, and works by a given author may be scattered across a variety of locations. Search results may be confusing or unclear. Automated Optical Character Recognition of texts may be inaccurate or incomplete. The metadata for the work may be unclear, and the provenance and rights status of a given digital edition may be unknown. It is not always clear how to cite passages from digital editions of public domain works.
  • Over the past few years, libraries and other cultural heritage organisations have been releasing open data about works they hold. This has the potential to be a rich resource for scholars interested in building scholarly bibliographies and working with large collections of texts. While there are a growing number of tools and services for working with bibliographic data, many researchers may not know how to use these, and online bibliographies may not link through to digital copies of public domain works which are available online.
  • There are a growing number of open source tools for transcribing, translating and annotating texts. However, many of these are one-off projects, and it may not be clear how to deploy the tools for a given text or collection of texts.

Here’s what it would do:

The TEXTUS platform will enable users to:

  • Transcribe texts from images, PDFs or other non-machine readable sources.
  • View texts and translations side by side – and create new translations of texts for use in teaching or research.
  • Annotate texts, and share annotations with groups of users, or with the public.
  • Curate, share and export collections of bibliographic metadata (scholarly references), including metadata associated with texts published on the platform.

Here’s a peek under the hood:

TEXTUS builds on existing best-of-breed open source components and software packages such as:

  • Annotator – an open source JavaScript tool that enables annotations to be added to any webpage
  • Bibserver – which includes numerous tools, services and standards for working with bibliographic metadata
  • Open Literature – which powers OpenShakespeare, OpenMilton and other sites
  • Public Domain Works – a nascent directory of works which have entered the public domain in different countries around the world
  • Scripto – an open source tool that enables users to contribute transcriptions to online documentary projects
  • WordPress – due to its popularity, ease of use, and extensive plugin system, TEXTUS will use WordPress as its main CMS

If you’re interested, you can join the discussion on the Open Knowledge Foundation’s open-humanities mailing list.


5 Comments

  1. Posted December 8, 2011 at 11:13 pm | Permalink

    Fabulous stuff Jonathan! Just joined the Open Humanities list though I can’t figure out how to access the list archives. Any help there?

  2. Posted December 9, 2011 at 9:57 pm | Permalink

    This is great stuff. I guess it would be an ideal platform for crowd sourced transcription of digitized works? If so, there are many orgs in the cultural heritage domain that could benefit from a tool like this. Where is the github repo?

  3. Lars Aronsson
    Posted December 10, 2011 at 12:01 am | Permalink

    When you write platform, do you intend to start a new website for people to transcribe text? If so, why start a new one instead of using the existing Wikisource? I’m not saying Wikisource is the perfect one-stop-shop, but I think you should document your reasons for rolling your own. Perhaps Wikisource could learn something from that.

  4. Posted December 13, 2011 at 7:50 am | Permalink

    You also might find this tool useful:

    http://bencrowder.net/blog/category/unbindery/

    My friend developed it for OCR correction and transcription. The source hasn’t been open sourced yet but he plans on doing so soon.

  5. Posted December 22, 2011 at 8:44 pm | Permalink

    Are you familiar with DocumentCloud? It’s an open source document handling platform for journalists, including upload, OCR, storage, search, annotation, embedded viewing, and metadata.

4 Trackbacks

  • […] Fuente: TEXTUS: an open source platform for working with collections of texts and metadata […]

  • By The Citation Conundrum on February 14, 2012 at 6:54 pm

    […] have said, thought or done about this – partly so we can bear this in mind when building TEXTUS and OpenPhilosophy.org. Do you know of an interesting approach, paper, standard, or plugin? If so […]

  • By The Sea of Stories on March 11, 2012 at 10:08 pm

    […] used for his own research and publications. The index could be powered by the open source TEXTUS platform, which would enable users to update bibliographies and upload, transcribe and translate […]

  • By On Machine Readable Reading Lists on March 26, 2012 at 7:29 pm

    […] very keen to make it easy for people to create machine readable reading lists using TEXTUS, an open source platform for working with collections of texts which is currently being funded by […]
