When it comes to digitizing your historic newspaper collection, you are invariably faced with a number of questions. What is the goal of your digitization project? How will you plan for long-term preservation? How will you approach online accessibility? What features and functionality do you need to ensure a positive user experience? How will users interact and engage with the online collection?
Lately, owners of historic newspaper collections in both academic and public libraries have been giving the accessibility question much more attention, and they are looking for solutions beyond a simple PDF display. Beyond meeting patrons' high expectations for remote digital access and an improved user experience, libraries want solutions that offer long-term sustainability and a common, supported standard. For many, that solution is METS/ALTO, the industry-standard file format maintained by the Library of Congress.
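"METS/ALTO" is really two complementary XML standards: METS (Metadata Encoding and Transmission Standard) acts as a wrapper that describes an issue and points to its files, while ALTO (Analyzed Layout and Text Object) holds the OCR text and layout of each page. As a rough illustration of how the two fit together, the Python sketch below reads a heavily simplified METS file and lists, for each page, its image and its ALTO file. The sample document, file names, `USE` values, and the `pages_with_files` helper are all hypothetical; real newspaper METS profiles, such as the one behind Chronicling America, carry far more metadata.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily trimmed METS document for a two-page issue.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="image">
      <mets:file ID="img1" MIMETYPE="image/tiff"><mets:FLocat LOCTYPE="URL" xlink:href="0001.tif"/></mets:file>
      <mets:file ID="img2" MIMETYPE="image/tiff"><mets:FLocat LOCTYPE="URL" xlink:href="0002.tif"/></mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ocr">
      <mets:file ID="alto1" MIMETYPE="text/xml"><mets:FLocat LOCTYPE="URL" xlink:href="0001.xml"/></mets:file>
      <mets:file ID="alto2" MIMETYPE="text/xml"><mets:FLocat LOCTYPE="URL" xlink:href="0002.xml"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="issue">
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="img1"/><mets:fptr FILEID="alto1"/></mets:div>
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="img2"/><mets:fptr FILEID="alto2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def pages_with_files(mets_xml: str) -> list[dict[str, str]]:
    """Return one dict per page, mapping each file group's USE value to a file location."""
    root = ET.fromstring(mets_xml)
    href_attr = f"{{{NS['xlink']}}}href"  # xlink:href in ElementTree's {uri}name form

    # First pass: file ID -> (file group USE, file location).
    files = {}
    for grp in root.iterfind(".//mets:fileGrp", NS):
        for f in grp.iterfind("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            files[f.get("ID")] = (grp.get("USE"), loc.get(href_attr))

    # Second pass: walk the page divs in the structural map and resolve file pointers.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div[@TYPE='page']", NS):
        page = {"order": div.get("ORDER")}
        for fptr in div.iterfind("mets:fptr", NS):
            use, href = files[fptr.get("FILEID")]
            page[use] = href
        pages.append(page)
    return pages


for page in pages_with_files(SAMPLE_METS):
    print(page)
```

Resolving pages through the structural map rather than by file name is what lets different tools and projects read the same package without private conventions.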
There are a number of benefits to using this industry-standard format:
- Because the Library of Congress sponsors METS/ALTO, it is generally accepted that if the format were ever to become obsolete, a suitable migration path would be developed for the many hundreds of millions of digitized pages already stored in it.
- Projects using the same open standard can easily share content when they wish to do so.
- Projects using METS/ALTO benefit from the knowledge and tools created by and for other projects using the same standards.
- METS/ALTO can store and search the full-text content of each page and word, captures structural information such as column, line, and word locations, and can optionally support article segmentation, so that articles, headlines, bylines, and other article-level metadata are recognized.
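To make the word-level capabilities concrete, here is a minimal Python sketch, assuming an ALTO v3 file with the usual `Page` > `PrintSpace` > `TextBlock` > `TextLine` > `String` hierarchy. Each `String` element carries the word in `CONTENT` and its bounding box in `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT`, which is what lets a viewer highlight a search hit on the page image. The sample XML and the `words_with_boxes` and `search` helpers are hypothetical; production systems typically index words in a search engine rather than scanning XML on every query.

```python
import xml.etree.ElementTree as ET

ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hypothetical one-block ALTO page with two lines of text.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="980" HEIGHT="140">
          <TextLine HPOS="100" VPOS="200" WIDTH="980" HEIGHT="60">
            <String CONTENT="Springfield" HPOS="100" VPOS="200" WIDTH="560" HEIGHT="60"/>
            <SP/>
            <String CONTENT="council" HPOS="700" VPOS="200" WIDTH="380" HEIGHT="60"/>
          </TextLine>
          <TextLine HPOS="100" VPOS="280" WIDTH="620" HEIGHT="60">
            <String CONTENT="meets" HPOS="100" VPOS="280" WIDTH="300" HEIGHT="60"/>
            <SP/>
            <String CONTENT="today" HPOS="440" VPOS="280" WIDTH="280" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def words_with_boxes(alto_xml: str) -> list[dict]:
    """Flatten an ALTO page into words with their block ID, line number, and bounding box."""
    root = ET.fromstring(alto_xml)
    words = []
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        for line_no, line in enumerate(block.iterfind("alto:TextLine", ALTO_NS), start=1):
            for s in line.iterfind("alto:String", ALTO_NS):
                words.append({
                    "text": s.get("CONTENT"),
                    "block": block.get("ID"),
                    "line": line_no,
                    # ALTO coordinates may be fractional, so parse them as floats.
                    "box": tuple(float(s.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")),
                })
    return words


def search(words: list[dict], term: str) -> list[dict]:
    """Case-insensitive exact-word match; each hit keeps its box for highlighting."""
    term = term.casefold()
    return [w for w in words if w["text"].casefold() == term]


for hit in search(words_with_boxes(SAMPLE_ALTO), "Council"):
    print(hit)
```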
Until now, producing METS/ALTO without article segmentation has cost in the range of $0.50 to $0.60 per page, significantly more than producing PDFs or similar formats! The good news is that Veridian can now create automated page-level METS/ALTO for less than one-quarter of this cost.
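As a quick check on what "less than one-quarter" means in dollars, applying that fraction to both ends of the quoted range gives a per-page upper bound; the 100,000-page collection in the second line is purely a hypothetical size chosen for illustration:

```latex
\[
\tfrac{1}{4} \times \$0.50 = \$0.125, \qquad \tfrac{1}{4} \times \$0.60 = \$0.15
\]
\[
100{,}000 \text{ pages} \times (\$0.50 \text{ to } \$0.60) = \$50{,}000 \text{ to } \$60{,}000
\quad\text{versus under}\quad \$12{,}500 \text{ to } \$15{,}000
\]
```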
Automated page-level METS/ALTO is a great option if you are starting with:
- Good-quality images or PDFs,
- An archive of original newspapers,
- An in-house scanner producing TIFF files, or
- Images scanned from good-quality microfilm.
It is now very easy and cost-effective for us to create a high-quality METS/ALTO collection similar to the collections found in Chronicling America. Compared to a PDF display, this is a better and more standards-compliant option.
To see an example of a Veridian Automated METS/ALTO collection, visit the Sangamo Journal in the Illinois Digital Newspaper Collections.