November 29, 2021
Educational Portal of the Americas

Number: 71
Year: 2002
Author: Johann Van Reenen, Editor
Title: Digital Libraries and Virtual Workplaces. Important Initiatives for Latin America in the Information Age

Creation and conversion of data

      The most significant contribution of digital libraries is the creation of digital content, whether “born digital” or converted from print-based formats. Several representative digital library projects have been initiated to advance developments in both arenas. Noerr’s Digital Tool Kit and Fox’s Chapters 4 and 5 of this book provide extensive descriptions of current projects.

      Most library ventures into the creation of digital libraries include the conversion of local collections from print-based formats to digital formats. There are two major methods for converting text-based collections: manual re-keying of text and Optical Character Recognition (OCR) scanning. Before any text is converted, a document structure or schema must be determined and the markup method developed. The application software used to produce the collection must also be considered.
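      As an illustration of what deciding on a document structure before conversion can look like, the following minimal sketch wraps converted text in a small XML structure using Python's standard library. The element names (document, header, title, author, body, p) and the function name mark_up_document are hypothetical and chosen only for this example; a real project would more likely follow an established schema selected by the library.

```python
import xml.etree.ElementTree as ET


def mark_up_document(title, author, paragraphs):
    """Wrap converted text in a simple, hypothetical XML structure."""
    doc = ET.Element("document")

    # Descriptive metadata is kept apart from the converted text.
    header = ET.SubElement(doc, "header")
    ET.SubElement(header, "title").text = title
    ET.SubElement(header, "author").text = author

    # Each paragraph is numbered so it can be cited or retrieved later.
    body = ET.SubElement(doc, "body")
    for number, text in enumerate(paragraphs, start=1):
        paragraph = ET.SubElement(body, "p", n=str(number))
        paragraph.text = text
    return doc


if __name__ == "__main__":
    sample = mark_up_document(
        "Sample pamphlet",          # placeholder values, not real records
        "Unknown author",
        ["First converted paragraph.", "Second converted paragraph."],
    )
    print(ET.tostring(sample, encoding="unicode"))
```

      Because the structure is fixed in advance, the same function can be applied to text produced by either re-keying or OCR scanning.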

      If re-keying is chosen, the materials are removed from their shelves or cabinets, data entry is performed, and errors are detected and then corrected. This process is labor intensive and slow, and it can be expensive if executed in-house by the library. Commercial service bureaus or conversion vendors with highly skilled data entry operators located in several countries can provide cost-effective, quality conversions. Typically, re-keying is considered to be the most expensive means of conversion.

      OCR scanning is a viable option for some text data conversion projects. Rather than re-keying the text, scanners are used to “read” the characters and convert them into digitally encoded text.  Materials to be scanned must be removed from their storage containers. Bound materials may need to be photocopied or unbound. Scanning speeds can vary depending on the capacity of the scanners and the PC hardware and software selected. The quality of the original document ultimately will determine the quality of the scanned document.
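      One way to make the link between original quality and scan quality concrete is to compare a sample of OCR output against a hand-checked transcription of the same passage. The sketch below is only an illustration under that assumption: it computes a character accuracy figure from the Levenshtein edit distance, and the function names edit_distance and character_accuracy are hypothetical, not part of any OCR package.

```python
def edit_distance(reference, candidate):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if ref_char == cand_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a candidate character
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters recovered correctly, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    checked = "digital library"   # hand-checked sample text
    scanned = "digita1 1ibrary"   # OCR output with two misread characters
    print(f"{character_accuracy(checked, scanned):.3f}")
```

      Running such a check on a few pages before committing to a full conversion gives a rough indication of whether the originals are clean enough for OCR or whether re-keying should be considered instead.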

      Documents that include images (e.g., photos, drawings, maps, graphs) must be converted using scanners and document imaging techniques. Integrated commercial packages bundle hardware (servers, scanners, and workstations) with the software needed to facilitate indexing and retrieval of the processed collections.

      Considerable progress has been made in recent years in increasing the functionality and productivity of conversion techniques. Open standards should be used where available, affordable and feasible. Avoid investing in conversions that will leave data in long-term archival storage in proprietary data formats such as Adobe’s Acrobat. Will a reader program exist in 20 years that can read that data? Or will that data have to be repeatedly converted to keep pace with future versions of reader programs?

      Cornell University Library has published a Digital Imaging Tutorial in both English and Spanish (Cornell 2000). The Northeast Document Conservation Center conducts the School for Scanning; see their web site for more information (

      Saffady (1999, p. 291) and Lesk (1997, p. 48) offer comprehensive chapters on digital libraries and text conversions. The Archives Builders provides extensive information on document conversion techniques; see their web site for more information (

      While the focus of this chapter has been on the technical aspects of digital libraries, perhaps the most valuable investment an organization can make is in the human resources necessary to create and support the hardware infrastructure and the creation and conversion of content. The recruitment and hiring of knowledgeable technicians is crucial, but the commitment does not end there. Ongoing training is a must. The constant surveying of technological developments and new digital library projects will be required to maintain an awareness of new techniques.