Archive Assistant Project @ E-Scripts
This is a Java application connected with the Emergent Transcriptions System, an attempt to streamline the process of “unearthing” the largely untapped goldmine of historical knowledge contained in handwritten manuscripts stored in archives all around the world. The basic idea is to give scholars easy access to the historical information contained in ancient manuscripts by instituting a virtuous cycle of automatically accruing and ever-improving transcriptions of these records.
Please read the paper entitled “Making History: an Emergent System for the Systematic Accrual of Transcriptions of Historic Manuscripts” for more detailed information.
The Archive Assistant v.0.1
The Archive Assistant™ (AA) is an open-source Java application that runs on a Linux™ machine with an Apache™ server and a MySQL™ database backend. The entire suite of supporting applications is available for free on the internet from the respective open-source providers. Like existing programs, the AA is meant to assist archives and libraries in scanning, cataloguing and disseminating their manuscript collections. It differs from similar initiatives in that it enables the institution to gradually receive, collate and accumulate the transcriptions returned by the end-users of the Transcription Assistant™ application described below.
Full technical details of the AA application will be provided in future technical papers. An essential premise of our whole initiative is that institutional archives will make the digital images of the manuscript pages in their holdings available to transcribers, so that they can be located and downloaded via an internet search engine, as is becoming increasingly customary at many institutions. Our system is designed to work with any type of manuscript, in any language, with any alphabet and of any age, though it is currently being tested on Venetian manuscripts of the 13th to 17th centuries from the Venice Archive and on early American manuscripts from the American Antiquarian Society.
Our emergent transcription system relies on the diffusion of digital images of manuscripts as the basis for the distributed, asynchronous production of transcriptions. Scanned images are packaged together with the metadata of the manuscript that they depict to create an XPG (eXtended jPeG) file. The metadata that accompany a manuscript make it usable in a historical context, and are generally the same ones already used in the manuscript catalogues in operation at libraries and archives. The XPG file type packages metadata and image into a single XML file.
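To make the packaging idea concrete, here is a minimal sketch of how metadata and a scanned image might be combined into one XML document. The XPG schema is not published in this text, so every element and attribute name below (`xpg`, `metadata`, `field`, `image`), the choice of Base64 for embedding the image bytes, and the class name `XpgSketch` are assumptions made purely for illustration.

```java
import java.io.StringWriter;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XpgSketch {

    // Builds one XML document holding catalogue metadata and the scanned image.
    // All element and attribute names are hypothetical, not the real XPG schema.
    static String buildXpg(Map<String, String> metadata, byte[] jpegBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("xpg");
            doc.appendChild(root);

            // One <field name="..."> element per metadata entry.
            Element meta = doc.createElement("metadata");
            root.appendChild(meta);
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                Element field = doc.createElement("field");
                field.setAttribute("name", entry.getKey());
                field.setTextContent(entry.getValue());
                meta.appendChild(field);
            }

            // Binary image data is Base64-encoded so it can live inside XML text.
            Element image = doc.createElement("image");
            image.setAttribute("encoding", "base64");
            image.setAttribute("mediaType", "image/jpeg");
            image.setTextContent(Base64.getEncoder().encodeToString(jpegBytes));
            root.appendChild(image);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the package", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Sample register, folio 12r");
        metadata.put("language", "Venetian");
        // Four placeholder bytes (JPEG start and end markers), not a real scan.
        byte[] placeholder = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
        System.out.println(buildXpg(metadata, placeholder));
    }
}
```

Embedding the image as Base64 text keeps the whole page in a single self-describing file, at the cost of roughly one third more bytes than the raw JPEG; the actual XPG format may well make a different trade-off.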
Our metadata sub-system currently consists of a superset of the MARC and Dublin Core standards, allowing for conversion from one standard to the other. The Archive Assistant v.0.1 will add functions for the bulk import of existing MARC and Dublin Core databases into our system.
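Conversion between the two standards can be pictured as a lookup table between MARC field tags and Dublin Core elements. The sketch below is only an illustration under stated assumptions: it uses four commonly paired fields rather than a complete crosswalk, it flattens each MARC field to a single string (ignoring subfields, indicators and repeated fields), and the class and method names are hypothetical rather than part of the AA code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataCrosswalkSketch {

    // Illustrative MARC 21 tag -> Dublin Core element pairs (not a complete crosswalk).
    static final Map<String, String> MARC_TO_DC = new LinkedHashMap<>();
    static {
        MARC_TO_DC.put("100", "creator");     // main entry, personal name
        MARC_TO_DC.put("245", "title");       // title statement
        MARC_TO_DC.put("520", "description"); // summary note
        MARC_TO_DC.put("650", "subject");     // topical subject term
    }

    // Converts a flattened MARC record (tag -> value) to Dublin Core (element -> value).
    // Tags without a known counterpart are skipped.
    static Map<String, String> toDublinCore(Map<String, String> marc) {
        Map<String, String> dc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : marc.entrySet()) {
            String element = MARC_TO_DC.get(field.getKey());
            if (element != null) {
                dc.put(element, field.getValue());
            }
        }
        return dc;
    }

    // Converts in the opposite direction by reading the same table in reverse.
    static Map<String, String> toMarc(Map<String, String> dc) {
        Map<String, String> marc = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : MARC_TO_DC.entrySet()) {
            String value = dc.get(pair.getValue());
            if (value != null) {
                marc.put(pair.getKey(), value);
            }
        }
        return marc;
    }

    public static void main(String[] args) {
        Map<String, String> marc = new LinkedHashMap<>();
        marc.put("245", "Notarial register");
        marc.put("100", "Unknown notary");
        System.out.println(toDublinCore(marc));
    }
}
```

Because MARC is far more detailed than Dublin Core, a round trip through a table like this one loses information; storing a superset of both standards, as described above, is one way to avoid that loss.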
More sophisticated components of the AA application will allow advanced searches on metadata and transcription text, with the possibility of extending searches in the future to image-based algorithms. After a successful search, users will be able to browse the resulting manuscript pages and select them for download to their own machines for use with the Transcription Assistant.
After the end-user has transcribed a manuscript page, the XPG file is augmented with an XML-based transcription section, following the Manuscript Markup Language (MML) that we have developed for this purpose (technical details will be provided in a forthcoming paper). Once an initial transcription has been made, the manuscript page (manuscript metadata + image + transcription metadata + transcription) is packaged as an MML file from then on.
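The augmentation step described above can be sketched as parsing the existing package and appending a transcription section to it. Since the MML specification is deferred to a forthcoming paper, the element names (`transcription`, `line`), the attributes (`transcriber`, `n`) and the class name `MmlSketch` are hypothetical placeholders, and the input is assumed to be any well-formed XML package.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MmlSketch {

    // Appends a transcription section (transcription metadata plus text lines)
    // to an existing XML package and returns the augmented document.
    // Element and attribute names are hypothetical, not the real MML vocabulary.
    static String addTranscription(String packageXml, String transcriber, List<String> lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(packageXml)));

            Element section = doc.createElement("transcription");
            section.setAttribute("transcriber", transcriber);

            // Number the transcribed lines so they can be referenced later.
            int n = 1;
            for (String text : lines) {
                Element line = doc.createElement("line");
                line.setAttribute("n", Integer.toString(n++));
                line.setTextContent(text);
                section.appendChild(line);
            }

            // Existing metadata and image are left untouched; the section goes last.
            doc.getDocumentElement().appendChild(section);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not add the transcription", e);
        }
    }

    public static void main(String[] args) {
        String xpg = "<xpg><metadata/><image encoding=\"base64\"/></xpg>";
        System.out.println(addTranscription(xpg, "reader-1", List.of("In nomine Domini amen.")));
    }
}
```

Calling `addTranscription` again on the result appends a second section rather than overwriting the first, which is one simple way to let transcriptions accrue over time; how the real system reconciles competing transcriptions is not specified here.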
Finally, the Archive Assistant v.0.1 will receive returned MML files containing transcriptions produced by end-users and store them in the backend MySQL database, where appropriate transaction activity information will also be retained, before passing control to the Contribution Accountant for post-processing.
FABIO CARRERA