  • Scanning 100.000 pages

  • I have a project to scan about 300 books (different sizes) and 100 magazines (about 100.000 pages total). It needs to be done fast. Books and magazines are ready as separate pages, with no binding. I have a limited budget (buying a $10,000 scanner is not an option for me; renting, maybe). I need a complete solution for converting books and mags into digital format, including what scanner or copier with a scanner would do this fast, what searchable format to choose, how to scan mags, image enhancement, etc.


  • perhaps it would be cheaper, if:

    you shipped all the books to India, China
    had someone scan it for you
    and then ship the books back?


  • Hello Mindaugas,

    I have selected some scanners in a wide range of prices, based on scanner reviews and recommendations. I feel sure one or more of these will handle the work you need done. I have also included some OCR software reviews.

    =======================
    About document scanners
    =======================

    "The overwhelming majority of document scanners are sheetfed with an automatic document feeder (ADF). Part of what distinguishes document scanners from other sheetfed scanners is that most offer both a simplex mode to scan one side of each page and a duplex mode to scan both sides.
    Don't confuse a duplex scanner, which scans both sides of the page at once, with a duplex ADF, which scans one side, turns the page over, and then scans the other side. You can confirm that the scanner duplexes from its claimed speeds. The rating for pages per minute (ppm) tells you how many sheets of paper it scans per minute. The rating for images per minute (ipm) tells you how many images it scans, with one on each side of the page. Make sure the duplexing ipm speed is double the ppm speed."
    "Adding recognition can add a significant amount of time or hardly any, depending on the software. Thus, one scanner can have a slower scan speed than another but be so fast at recognition that it's faster in real-world use. The only way to find that out is to test the scanners. If you can't run a test yourself, look for the information in our reviews."
    "If you want to scan and edit files, you'll need an OCR program or module that can send files to your word processing program."
    According to this site, scanning photos on a document scanner is not recommended. Please read this web site for complete information. http://www.pcmag.com/article2/0,1895,2006854,00.asp
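The "ipm should be double the ppm" rule of thumb quoted above can be written as a quick check. This is only a minimal sketch: the function name `is_true_duplex` and the 5% tolerance are my own assumptions, and the figures are the rated speeds quoted elsewhere in this answer, not measurements.

```python
def is_true_duplex(ppm: float, ipm: float, tolerance: float = 0.05) -> bool:
    """Return True if the duplex rating (ipm) is roughly double the simplex rating (ppm).

    The tolerance is a hypothetical allowance for rounding in vendor spec
    sheets; it does not come from any of the quoted reviews.
    """
    expected = 2 * ppm
    return abs(ipm - expected) <= tolerance * expected


# Rated speeds as quoted in this answer: (simplex ppm, duplex ipm)
ratings = {
    "Fujitsu ScanSnap S500": (18, 36),
    "Canon DR-2580C": (25, 50),
    "Canon DR-7080C (color duplex figure as quoted)": (70, 36),
}

for name, (ppm, ipm) in ratings.items():
    verdict = "ipm is about 2 x ppm" if is_true_duplex(ppm, ipm) else "ipm is NOT 2 x ppm, check the specs"
    print(f"{name}: {verdict}")
```

A failed check does not prove a scanner cannot duplex in one pass; it only flags a spec sheet worth reading more closely.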


    "If you're using OCR or other recognition technologies and you're dealing with highly variable or hard-to-read documents, look for state-of-the-art image processing, such as Kofax VRS or Kodak's similar Perfect Page and iThresholding combo. This will ensure best-possible image quality and higher recognition accuracy. All the better if the scanner can deliver 300 dpi images at rated or near-rated speeds." http://printscan.about.com/gi/dynamic/offsite.htm?site=http://www.transformmag.com/showArticle.jhtml%3FarticleID=53200355

    "In these environments, image quality is often more important than speed. In the past, if an end user needed to scan a document at 300 dpi or higher to increase OCR [optical character recognition] and ICR [intelligent character recognition] accuracy, the scanner would slow down to a crawl. However, now users can get a high-resolution image without sacrificing speed." http://www.businesssolutionsmag.com/index.php?option=com_jambozine&layout=article&view=page&aid=2650&Itemid=68

    Features to look for in a scanner:
    http://www.hp.com/sbso/buyguides/pg-scanners-features.html


    "Workgroup Scanners
    Price Range: $500-$2,000 / Speed: 10-25 ppm
    Workgroup scanners are typically used by individuals or by small groups of users where a single workstation is the scan station for the rest of the group."
    "Departmental Scanners
    Price Range: $2,000-$5,000 / Speed: 25-40 ppm
    With their faster speeds, these scanners meet the needs of variable low-volume, non-production applications. They are cost effective for larger groups such as departments or small to medium-sized businesses." http://canonscanningsuccess.com/scanners.asp?nav=1
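To put those speed classes in terms of this job, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions: I am treating the 100,000 pages as 100,000 images to capture, and the `efficiency` factor (the fraction of rated speed you actually sustain once you count reloading the feeder, clearing jams and saving files) is a guess of mine, not a figure from any review.

```python
def scan_hours(total_images: int, images_per_minute: float, efficiency: float = 0.5) -> float:
    """Estimate the hours of scanner time needed for a job.

    efficiency is an assumed fraction (0 to 1] of the rated speed that is
    actually achieved in practice; 0.5 is a placeholder, not a measurement.
    """
    if images_per_minute <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed must be positive and efficiency must be in (0, 1]")
    minutes = total_images / (images_per_minute * efficiency)
    return minutes / 60


TOTAL_IMAGES = 100_000  # assumption: one image per page in the asker's count

# Speeds taken from the ends of the two price classes quoted above
for label, speed in [
    ("workgroup, low end", 10),
    ("workgroup / departmental", 25),
    ("departmental, high end", 40),
]:
    hours = scan_hours(TOTAL_IMAGES, speed)
    print(f"{label:>24}: {speed:>2} per minute -> {hours:6.1f} hours")
```

Even at full rated speed (efficiency=1.0), 25 images per minute works out to roughly 67 hours of pure scanning, so the job is measured in weeks of part-time work rather than days.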


    =====================================================================
    Some scanners recommended by PC Magazine, a trusted resource
    =====================================================================

    Fujitsu ScanSnap S500
    $400-$475
    Document and business-card scanner. Rated at 18 pages per minute, or 36 images per minute for scanning both sides. http://www.pcmag.com/article2/0,1895,1990588,00.asp

    Review:
    "Fujitsu still focuses on scanning to PDF, with JPG as a second choice, and it relies on Adobe Acrobat 7.0 to handle the scanned files. But the ScanSnap S500 includes a version of Abbyy FineReader (FineReader for ScanSnap 2.0) as an alternative to optical character recognition. You also have the option to scan a document, recognize the text, and send it to, say, Microsoft Word in one step. But you still have to initiate the scan from within the ScanSnap software." http://www.pcmag.com/article2/0,1895,1992786,00.asp




    Rated as Very Good by PC Magazine,
    "The Canon DR-2580C is the fastest document scanner we've tested for scanning and saving in searchable PDF format, but those who need software for document management or indexing to help organize scanned files must buy that separately." http://www.pcmag.com/article2/0,1895,1891755,00.asp


    "The DR-2580C is not only one of the smallest scanners in its class at a mere 4.2 lbs, but also one of the most easy-to-use. Simply assign commonly used functions to its Scan-To Job buttons for one-touch operation. Offering advanced scanning technology, the DR-2580C achieves superior color quality and reproduction, as well as enhanced results from low contrast documents." http://canonscanningsuccess.com/detail.asp?id=45&nav=1&snav=1&action=1

    "In addition to a full-feature ISIS/TWAIN driver, the DR-2580C comes bundled with CapturePerfect 3.0 and Adobe Acrobat 7.0 Standard, giving you total control over scanning from start to finish." http://canonscanningsuccess.com/detail.asp?id=45&nav=1&snav=1&action=2

    Full Review
    "A document scanner's core task is turning large stacks of paper into digital format in a hurry, a task that the DR-2580C excels at. Canon claims that at resolutions of both 200 pixels per inch (ppi) and 300 ppi, the engine can process 25 pages per minute (ppm) in simplex mode (scanning one side of the page) and 50 images per minute (ipm) in duplex mode. Our test times when scanning to PDF image files were just below those speeds, at 24.5 ppm and 49.1 ipm. Although that is not the fastest we've seen, it's the fastest at this price or below, which is impressive." http://www.pcmag.com/article2/0,1895,1891756,00.asp





    Canon 2050 Document Capture Scanner 0433B002
    $539
    Up to 20 pages per minute
    • OmniPage SE: Bundled with ScanSoft's industry leading OCR software that accurately converts scanned documents into editable text.
    • CapturePerfect 3.0 New Features: 1) Zone OCR can be used as indexed filename. 2) PDF Encryption and Security features. 3) Add, Delete and Insert pages. 4) Adjust and save brightness/contrast onscreen, after scan.
    • Adobe Acrobat 7.0 Standard Full Version: Includes full version software application (not just the reader), a $299 US value. You can create PDF documents in any application that allows printing.
    And much more. http://www.ipaperlessoffice.com/cadrcoscca20.html




    Canon DR-2050C
    $547
    "The DR-2050C incorporates Canon's renowned, high-precision roller system that delivers smooth, jam-free feeding. Whether scanning single sheet documents or multiple sheets of mixed document sizes and weight, the DR-2050C offers uninterrupted performance with one of the most reliable feeding systems in its category."
    • High-speed scanning of up to 20 ppm in simplex or 40 ipm in duplex.
    • Simple connectivity with USB 2.0 interface.
    • Text Enhancement mode can overcome obstacles leading to illegible image files such as color backgrounds, light colored lettering or pencil writing.
    • Quickly and reliably scan mixed batches with efficiency-boosting features like Skip Blank Page and Automatic Page detection.
    • Up to 100 programmable user settings can be customized for improved productivity of frequent scan operations.
    http://www.1st-in-scanners.com/canon/dr2050c.htm

    http://www.thenerds.net/index.php?page=productpage&affid=3&pn=0433B002&srccode=cii_9324560&cpncode=08-13644886-2
    "When it comes to cost-effective, reliable document scanning, look no further than the DR-2050C. The DR-2050C incorporates Canon's renowned, high-precision roller system that delivers smooth, jam-free feeding. Whether scanning single sheet documents or multiple sheets of mixed document sizes and weight, the DR-2050C offers uninterrupted performance with one of the most reliable feeding systems in its category." http://www.imagingsolutions.com/products_scanners_Canon_2050c.htm



    ScanSnap Fujitsu FI-5110C Color Scanner
    15 pages per minute, 50 sheet feed
    http://chuck888.stores.yahoo.net/fujitsufi5110c.html



    Other Recommended Scanners
    ===========================

    Canon DR-2580C
    $645
    25 pages per minute, 50 sheet feed
    Included software: Adobe Acrobat 7.0, Capture Perfect 3.0
    http://www.doxtek.com/products/cust_product.jsp?id=255


    =======================

    HP Scanjet 7800 Document Sheetfeed Scanner
    $799.00
    "• Scan both sides of a page with one pass, at 50 ipm (25 ppm), using 50-page automatic document feeder.
    • Save time on recurring projects with up to 30 customized scan profiles; select profiles from display.
    • Easily scan different paper types, from business cards and plastic IDs up to legal-size documents.
    • Scan and manage business cards using a unique card feeder and NewSoft Presto! BizCard Reader.
    Workgroup document management solution
    • Increase OCR accuracy: HP Scanjets and Kofax VirtualReScan work together to optimize scan quality.
    • Easily convert scans into editable text using your HP Scanjet and IRIS Readiris Pro OCR software.
    • Save and manage scans using preset one-touch profiles created with HP Smart Document Scan Software.
    • Easily organize digital documents using your HP Scanjet and included ScanSoft PaperPort software." http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/15179-64195-215155-15202-215155-1151021.html
    Complete specs found here:
    "• Spend less time fine-tuning. Get optimized scans the first
    time, without manually adjusting color or contrast.
    HP Scanjet Scanners and Kofax VirtualReScan work
    together to make automatic adjustments for greater OCR
    accuracy and sharper detail, such as logos and barcodes.
    • Get editable text from hard copies. The included IRIS
    Readiris Pro OCR software converts scanned documents
    to editable text with accuracy, using Microsoft Word or
    Adobe PDF formats.
    • Make copies with the touch of a button. A 'copy' button
    delivers the convenience of a copier by sending a scan to
    your default printer, providing multiple copies of a single
    scan."
    "Convert your hardcopies into a variety of popular file formats: including
    Adobe PDF, Microsoft Word and Excel, Corel
    WordPerfect, TIFF, and JPEG. Electronic files can be easily
    edited, e-mailed, delivered to local or network folders and
    organized. A preview window lets you rearrange, delete
    or add scanned pages. Use bar codes to separate jobs
    and manage workflow."
    http://h10010.www1.hp.com/wwpc/pscmisc/vac/us/product_pdfs/1151021.pdf

    $651 and free shipping at Amazon.com
    • 1200 dpi Optical Resolution and 48-bit Color Depth
    • Automatically Scan on Both Sides of a Page in One Pass
    • Scan a Variety of Paper Sizes up to Legal 8.5 x 14 Inches
    • Convert Scanned Documents Into Editable Text with Included OCR Software
    http://www.amazon.com/HP-Scanjet-7800-Document-Scanner/dp/B000FL1OU4/ref=pd_sxp_f_r/103-6624925-2089466?ie=UTF8
    TigerDirect provides some clear illustrations and specs:
    http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=2371666



    =======================

    Canon 3080 Document Capture Scanner 9673A002
    $2,298
    32 pages per minute, 100 sheet feed
    • 100-sheet Automatic Document Feeder, ideal for continuous batch scanning of mixed documents (business card to legal size); the Canon DR-3080CII automatically adjusts for varying document sizes and thickness.
    • Built-in Skew Correction automatically straightens misaligned documents.
    http://www.ipaperlessoffice.com/cadrcoscca30.html

    =======================


    This is a heavy-duty, fast scanner. It's pricier than those above, but still far less than $10,000!
    Canon DR 7080C - Document scanner
    $4,864.82
    "With the same impressive scanning speed of 70ppm for both color and black and white documents (A4/landscape/200dpi), the dynamic DR-7080C provides the perfect way to get through more work in far less time.
    Duplex scanning in color is equally rapid, at 36ipm (images per minute). Despite its high speed, the DR-7080C produces unsurpassed quality scanning, giving you the very best of both worlds. By adapting a 3-line CCD sensor, it provides continuous tone quality scanning."
    "Powerful software brings you the versatility, ease and efficiency you need for extensive applications. The many benefits of CapturePerfect 2.0 include Scan to Mail, PC and Print as well as 90 Degree Auto Rotation. While wide-ranging ISIS/TWAIN driver advantages include MultiStream scanning and Book" http://www.dreamhardware.com/store/product/index.php?product_id=112982


    More specs on this scanner.
    http://review.zdnet.com/Canon_DR_7080C_document_scanner/4507-3136_16-30915877.html?tag=ut

    ============
    OCR Software
    ============

    "Optical Character Recognition (OCR) is a process of scanning printed pages as images on a flatbed scanner and then using OCR software to recognize the letters as ASCII text. The OCR software has tools for both acquiring the image from a scanner and recognizing the text.
    Ideal Source Material for OCR
    OCR works best with originals or very clear copies and mono-spaced fonts like Courier. If you have choices, use the following source material:
    • 12 point or greater font size.
    • Black text on a white background.
    • A clean copy; not a fuzzy multi-generation copy from a copy machine.
    • Standard type font (Times, New Roman, etc.) Fancy fonts may not be recognized.
    • Single column layout."
    http://www.plu.edu/~libr/workshops/scanning/ocr.html



    OCR Primer
    ==========
    Loads of tips for OCRing
    http://www.braille2000.com/brl2000/docs/OCRprimer.pdf#search=%22how%20OCR%20works%22
    "There are two basic methods used for OCR: Matrix matching and feature extraction. Of the two ways to recognize characters, matrix matching is the simpler and more common. Matrix Matching compares what the OCR scanner sees as a character with a library of character matrices or templates. When an image matches one of these prescribed matrices of dots within a given level of similarity, the computer labels that image as the corresponding ASCII character.
    Feature Extraction is OCR without strict matching to prescribed templates. Also known as Intelligent Character Recognition (ICR), or Topological Feature Analysis, this method varies by how much "computer intelligence" is applied by the manufacturer. The computer looks for general features such as open areas, closed shapes, diagonal lines, line intersections, etc. This method is much more versatile than matrix matching. Matrix matching works best when the OCR encounters a limited repertoire of type styles, with little or no variation within each style. Where the characters are less predictable, feature, or topographical analysis is superior." http://www.dataid.com/aboutocr.htm
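To make the matrix matching idea above concrete, here is a toy illustration. It is a sketch only: the 3x5 dot templates, the 0.8 similarity threshold and the function names are all made up for the example, and real OCR packages such as the ones listed below work on far larger matrices and do not necessarily work this way internally.

```python
# Hypothetical 3x5 dot-matrix templates, one string per row ("1" = ink, "0" = blank)
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def similarity(glyph, template):
    """Fraction of dots that agree between a scanned glyph and a template."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def match_glyph(glyph, threshold=0.8):
    """Return the best-matching character, or None if nothing is similar enough."""
    best_char, best_score = max(
        ((char, similarity(glyph, template)) for char, template in TEMPLATES.items()),
        key=lambda item: item[1],
    )
    return best_char if best_score >= threshold else None


# A "T" with one stray dot, as a smudge on the page might produce
noisy_t = ("111", "010", "010", "011", "010")
print(match_glyph(noisy_t))

# A blank cell is not similar enough to any template
print(match_glyph(("000", "000", "000", "000", "000")))
```

The threshold is the "given level of similarity" from the quote: set it too high and smudged letters are rejected, too low and look-alike letters get confused, which is one reason clean originals matter so much.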



    ReadIris Pro
    ============
    "Readiris Pro lets you reproduce your documents in more than 20 different applications such as Word, Excel, Acrobat, Internet Explorer, Netscape, WordPerfect, StarOffice and many others. The output file retains perfectly the lay-out of the original document." http://www.irislink.com/c2-480/Readiris-Pro-11-OCR-software.aspx?gclid=CMX-37HQ1IcCFSjTYAodcSxToA

    "How does it work?
    1. Scan your document
    Simply scan your paper text or open a PDF file or an image document. Readiris Pro 11 opens the most common used images and PDF files.
    2. Convert it into editable text
    Once you opened your file into Readiris, just click on recognize and save. Within seconds, your document is converted into digital files you can edit, share and save! It's fast and accurate.
    3. Export your file into your favourite application
    Automatically send the recognized document into your favourite application such as: Word, Excel, Acrobat (PDF), Internet Explorer (HTML), WordML, SpreadsheetML or save it as an external file." http://www.amazon.com/Readiris-Pro-11/dp/B000FID7N4/sr=1-3/qid=1159594170/ref=pd_bbs_3/103-6624925-2089466?ie=UTF8&s=software

    PaperPort
    =========
    "There's a lot to like in PaperPort Pro 9 Office ($199.99 direct), ScanSoft's latest iteration of its popular document management utility. Perhaps the biggest draw is the ability to create, edit, annotate, and even to search for text within PDF files, all from PaperPort's convenient interface.
    Designed to make scanned documents more manageable, PaperPort lets you organize them into folders in its main window. You can stack individual pages to create combined documents, then view them either as thumbnails or in full-screen mode. This version includes ScanSoft's invaluable Form Typer utility. Scan a form, then drag and drop it on the Form Typer icon; Form Typer automatically identifies the blanks on the form so that you can type in your responses. In addition, you can import any file to PaperPort's management system. If it's an image file, you can use the built-in editing features to adjust color, contrast, and brightness and even remove red eye." http://www.pcmag.com/article2/0,1759,1090218,00.asp




    OmniPage 15
    ===========
    $149.99
    This page shows how the product works and the file formats to which the scanned document can be converted, including Word, Excel, PowerPoint, PDF and 26 others.
    "OmniPage 15, the latest version of the world's best selling OCR software, is the most precise way to convert paper and PDF files into your favorite PC applications quickly and cost-effectively. Powerful new OCR technology, advanced layout analysis and intuitive editing tools allow you to quickly turn paper and PDF files into more than 30 different editable electronic file formats that look just like the original, complete with text, tables and graphics. What's more, improved speed and efficient workflow capabilities combine to make document conversion faster and easier than you could imagine. Save time and money like never before with the world's most powerful document conversion application." http://www.digitalriver.com/v2.0-img/operations/scansoft/site/html/omnipage/080206/omnipage15_standard.htm


    A scanning tutorial
    ===================
    http://www.abdn.ac.uk/dit/docu/facts/gc/fsgc05.pdf#search=%22scanning%20large%20number%20of%20documents%22
    Another tutorial
    ================
    http://www.its2.uidaho.edu/cti/res/tuts/scan/


    Corel Paint Shop Pro
    ====================
    For inexpensive image enhancement, consider Corel Paint Shop Pro.
    Corel's site will allow you to download a free trial version.
    http://www.corel.com/servlet/Satellite?c=Product_C1&cid=1155872554948&lc=en&pagename=CorelCom%2FLayout


    There you go! I didn't know your budget, or exactly how fast you need the job done. You may consider getting two lower priced scanners and scanning in tandem. Once the job is done, you could sell one on eBay, if you no longer need it for future jobs. You'd need two PCs, as scanning and OCRing use a great deal of memory and processor power.
    Of all the above scanners, if it were me, I'd choose the HP Scanjet 7800 Document Sheetfeed Scanner. It is reasonably priced, and comes with Readiris Pro 11 for OCR ease. The software lets you convert your scanned files into a PDF document or various other formats, including graphics files and Word documents. You won't need to purchase any other software.
    If this answer is unclear, or is not the information you were seeking, please request an Answer Clarification, and allow me to respond, before you rate.
    Sincerely, Crabcakes




    Search Terms
    ============
    Heavy duty document scanners
    Canon DR 7080C - Document scanner + review
    OCR software
    Best document scanners + cnet
    Best document scanners + PCMagazine
    Document scanners recommendations
    Leasing document scanners
    Batch scanners


  • I know of a company in Gainesville, Florida that does this; they do business as a record storage alternative. They scan many file cabinets full of records every day.
    The images can be in any format you like: .jpeg, .tiff, .pdf, whatever.

    They are NOT OCR'ed, but are high photo quality color images of the original.

    I'd post the contact e-mail, but Google Answers won't let me post e-mail addresses.


  • Have you tried Imaging Connections yet? Imaging Connections is an online marketplace for document scanning services and scanning systems. The service is free to use. All you do is complete a simple form with your scanning requirements to receive up to 5 free scanning quotes from multiple vendors. Check it out; here's the website: www.imagingconnections.com