Fabio Carrera

Archive Assistant Project @ E-Scripts

The Archive Assistant is a Java application that forms part of the Emergent Transcriptions System, an attempt to streamline the process of “unearthing” the largely untapped goldmine of historical knowledge contained in handwritten manuscripts stored in archives around the world. The basic idea is to give scholars easy access to the historical information contained in ancient manuscripts by instituting a virtuous cycle of automatically accruing and ever-improving transcriptions of these records.

For more detailed information, please read the paper entitled “Making History: an Emergent System for the Systematic Accrual of Transcriptions of Historic Manuscripts”.

The Archive Assistant v.0.1

The Archive Assistant™ (AA) is an open-source Java application that runs on a Linux™ machine with an Apache™ server and a MySQL™ database backend. All of these supporting components are available free of charge on the internet from their respective open-source providers. The application resembles existing programs in that it is meant to assist archives and libraries in scanning, cataloguing and disseminating their manuscript collections. It differs from similar initiatives in that it enables the institution to gradually receive, collate and accumulate the transcriptions returned by end-users of the Transcription Assistant™ application described below. Full technical details of the AA application will be provided in future technical papers.

An essential premise of our whole initiative is that institutional archives will make the digital images of the manuscript pages in their holdings available to transcribers, so that they can be downloaded via an internet search engine, as is becoming more and more customary at many institutions.

Our system is designed to work with any type of manuscript, in any language, with any alphabet and of any age. It is currently being tested on Venetian manuscripts of the 13th to 17th centuries from the Venice Archive and on early American manuscripts from the American Antiquarian Society.

Our emergent transcription system relies on the diffusion of digital images of manuscripts as the basis for the distributed, asynchronous production of transcriptions. Each scanned image is packaged together with the metadata of the manuscript it depicts to create an XPG (eXtended jPeG) file, a file type that combines metadata and image in a single XML file. The metadata are what make a manuscript usable in a historical context, and they are generally the same ones already used in the manuscript catalogues in operation at libraries and archives.
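The packaging step described above can be sketched roughly as follows. This is only an illustration of the idea of combining metadata and image in one XML file: the element names (`xpg`, `metadata`, `field`, `image`), the use of Base64 to embed the image bytes, and the class and method names are all assumptions made for this example, not the actual XPG format.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class XpgSketch {

    /** Escapes the five XML special characters so values are safe inside the document. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /**
     * Builds one XML document holding the catalogue metadata and the image.
     * The layout is hypothetical; the image bytes are embedded as Base64 text.
     */
    static String packageXpg(Map<String, String> metadata, byte[] jpegBytes) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<xpg>\n  <metadata>\n");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            xml.append("    <field name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue())).append("</field>\n");
        }
        xml.append("  </metadata>\n");
        xml.append("  <image type=\"image/jpeg\" encoding=\"base64\">")
           .append(Base64.getEncoder().encodeToString(jpegBytes))
           .append("</image>\n</xpg>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Example register");  // made-up sample values
        metadata.put("date", "1450");
        // Four placeholder bytes standing in for a real scanned JPEG.
        byte[] placeholderImage = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
        System.out.println(packageXpg(metadata, placeholderImage));
    }
}
```

Embedding the image as text keeps the whole page in one self-describing file, at the cost of a file roughly one third larger than the raw image.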

Our metadata sub-system currently consists of a superset of the MARC and Dublin Core standards, which allows records to be converted from one standard to the other. The Archive Assistant v.0.1 will add functions for the bulk import of existing MARC and Dublin Core databases into our system.
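To illustrate what conversion between the two standards involves, here is a minimal sketch of a field crosswalk. It assumes a drastically simplified record model (one value per field, no MARC subfields or indicators) and covers only four commonly cited correspondences (100 to creator, 245 to title, 520 to description, 650 to subject). The class name, the method names and the `marc:` prefix used to retain unmapped fields are hypothetical and do not describe the actual metadata sub-system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataCrosswalkSketch {

    // MARC tag -> Dublin Core element (a small, simplified subset).
    static final Map<String, String> MARC_TO_DC = new LinkedHashMap<>();
    static {
        MARC_TO_DC.put("100", "creator");
        MARC_TO_DC.put("245", "title");
        MARC_TO_DC.put("520", "description");
        MARC_TO_DC.put("650", "subject");
    }

    // Hypothetical prefix under which MARC fields without a Dublin Core
    // counterpart are kept, so that the conversion loses nothing.
    static final String MARC_PREFIX = "marc:";

    /** Converts a simplified MARC record (tag -> value) to Dublin Core keys. */
    static Map<String, String> marcToDublinCore(Map<String, String> marc) {
        Map<String, String> dc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : marc.entrySet()) {
            String element = MARC_TO_DC.get(field.getKey());
            String key = (element != null) ? element : MARC_PREFIX + field.getKey();
            dc.put(key, field.getValue());
        }
        return dc;
    }

    /** Converts back; Dublin Core elements outside the table are skipped. */
    static Map<String, String> dublinCoreToMarc(Map<String, String> dc) {
        Map<String, String> marc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : dc.entrySet()) {
            String key = field.getKey();
            if (key.startsWith(MARC_PREFIX)) {
                marc.put(key.substring(MARC_PREFIX.length()), field.getValue());
                continue;
            }
            for (Map.Entry<String, String> pair : MARC_TO_DC.entrySet()) {
                if (pair.getValue().equals(key)) {
                    marc.put(pair.getKey(), field.getValue());
                }
            }
        }
        return marc;
    }

    public static void main(String[] args) {
        Map<String, String> marc = new LinkedHashMap<>();
        marc.put("245", "Example register");  // made-up sample values
        marc.put("100", "Anonymous");
        marc.put("999", "local shelf mark");  // no Dublin Core counterpart in the table
        Map<String, String> dc = marcToDublinCore(marc);
        System.out.println(dc);
        System.out.println(dublinCoreToMarc(dc));
    }
}
```

Keeping the unmapped fields instead of dropping them is what makes the internal record a superset: a record imported from MARC can be exported to Dublin Core and back without losing the fields that Dublin Core cannot express.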

More sophisticated components of the AA application will allow advanced searches on metadata and transcription text, with the possibility of later extending searches to image-based algorithms. After a successful search, users will be able to browse the resulting manuscript pages and select them for download to their own machine for use with the Transcription Assistant.

After the end-user has transcribed a manuscript page, the XPG file is augmented with an XML-based transcription section written in the Manuscript Markup Language (MML), which we have developed for this purpose (technical details will be provided in a forthcoming paper). From the first transcription onward, the manuscript page (manuscript metadata + image + transcription metadata + transcription) is packaged as an MML file.
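As a rough sketch of this augmentation step, the following example parses an XML page file with the standard Java DOM API and appends a transcription section to it. Since the MML specification is not given here, the `transcription` element, its `transcriber` attribute and the shape of the input document are placeholders invented for the example, as are the class and method names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MmlSketch {

    /**
     * Appends a transcription section to the root element of an XML page file
     * and returns the augmented document as a string. Element and attribute
     * names are hypothetical, not taken from the MML specification.
     */
    static String addTranscription(String pageXml, String transcriber, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pageXml)));

            Element section = doc.createElement("transcription");
            section.setAttribute("transcriber", transcriber);  // transcription metadata
            section.setTextContent(text);                       // the transcription itself
            doc.getDocumentElement().appendChild(section);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Malformed input or a misconfigured XML library ends up here.
            throw new IllegalStateException("Could not augment the page file", e);
        }
    }

    public static void main(String[] args) {
        String page = "<xpg><metadata/><image>/9j/2Q==</image></xpg>";  // placeholder page
        System.out.println(addTranscription(page, "reader1", "In nomine Domini"));
    }
}
```

Because the transcription is appended rather than replacing anything, the manuscript metadata and the image travel unchanged with every returned file.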

Finally, the Archive Assistant v.0.1 will receive the returned MML files containing the transcriptions produced by end-users and store them in the backend MySQL database, where the relevant transaction information will also be retained, before passing control to the Contribution Accountant for post-processing.
