1. Consider the entire project and all its steps from the outset, and talk to all your proposed vendors/suppliers early in the planning process.
2. Decisions about the platform that will host the finished collection often affect the type/format of digital objects you need to produce. Don’t be tempted to start scanning/digitizing before you make decisions about hosting platforms and long-term preservation.
3. Consider which parts of the project you’re able to do yourself and which you might outsource. For example, if the project is not too large and you have suitable equipment, you might consider doing the scanning work in-house. For larger projects, though, it might make sense to outsource all the scanning and data preparation work.
4. Carefully consider how the project fits with (and differs from) your other digitization projects. Are the platform and workflow you’ve used for previous digitization projects suitable for newspapers? Are there better alternatives?
5. Get advice from those who have worked on large newspaper digitization projects before. Newspaper projects have unique characteristics and are often more complex than other types of digitization projects, so institutions that have already gone through the process are likely to offer good advice. Following an approach/workflow similar to one you’ve developed for other types of projects may not always be the best option.
6. Produce digital objects in the METS/ALTO format. If done correctly, it should not cost more to produce METS/ALTO than something simpler like PDF, and there are many benefits to doing so.
7. Evaluate your scanning/digitization options carefully. If your newspapers have already been microfilmed it is easier and less expensive to scan from microfilm than from originals. Scanning from microfilm also gives you a larger choice of vendors, since microfilms are much more easily transported than original newspapers. On the flip side, scanning from originals may produce better digital images.
8. If using different vendors for scanning and OCR, as is often the case for large newspaper projects, or if you’re doing the scanning in-house, talk to the people responsible for the OCR process early in the project. If possible send sample images to the OCR vendor prior to going ahead with large-scale scanning. Filters and image processing algorithms applied after scanning to make images “look nice” can often have a detrimental effect on OCR accuracy.
9. Consider the costs and benefits of “article segmentation”. Many modern newspaper digitization projects now do this, to allow individual newspaper articles to be identified on the page. Article segmentation is generally considered to provide a nicer user experience, but there is additional cost. Typically you might expect it to cost $0.30 - $0.50 per page to digitize newspapers without article segmentation, or $0.70 - $1.00 with it.
10. Consider running a pilot project with a small number of newspaper pages, prior to making any final decisions. Many scanning, OCR, and hosting platform vendors are willing to process a small number of samples and put them online for evaluation, for little or no cost. Doing so allows you to ensure the entire process works as expected before you commit to it.
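To make the per-page figures in item 9 concrete, the arithmetic can be sketched as a small script. This is only an illustration: the function name `estimate_cost` and the 50,000-page collection size are hypothetical, and the rates are simply the ranges quoted above, not quotes from any particular vendor.

```python
# Per-page price ranges in US cents, taken from the ranges quoted in item 9.
# Integer cents are used to avoid floating-point rounding surprises.
RATES_CENTS = {
    "without_segmentation": (30, 50),
    "with_segmentation": (70, 100),
}


def estimate_cost(pages, option):
    """Return the (low, high) cost estimate in dollars for a page count."""
    low_cents, high_cents = RATES_CENTS[option]
    return (pages * low_cents / 100, pages * high_cents / 100)


if __name__ == "__main__":
    pages = 50_000  # hypothetical collection size, for illustration only
    for option in RATES_CENTS:
        low, high = estimate_cost(pages, option)
        print(f"{option}: ${low:,.2f} to ${high:,.2f}")
```

Running the same calculation against your own page count, and against real quotes gathered during a pilot, gives a rough sense of how much of the budget article segmentation would consume.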
This Top 10 list is the first segment in a new series. We will publish a new Top 10 list on our Knowledge Base each month and include the list in our newsletter.
Next month’s list: Top 10 things you may not know about Veridian Software.