Data Preservation Tips: Three Ways to Digitize Documents and Books #Digitization #BookScanners

Digitizing documents and books has never been easier. A number of products and apps are designed to scan printed material and turn it into digital text that you can navigate and read on a computer or smartphone. Is there an old book you’ve wanted to keep forever, even after the physical copy wears out? The data preservation techniques featured in this blog post can help you turn your old books, documents and more into ageless digital artifacts. These options range from free, simple smartphone scanner apps to more expensive document scanners with automatic page-turners. Read on to discover which option makes the most sense for your digitization journey.

What is OCR?

OCR stands for Optical Character Recognition. From Wikipedia:

OCR is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image.

Many of the products and services below have OCR built in, so it helps to know a little about OCR before considering the scanning options that follow. As you’ll see, with the scanning apps it is often a premium feature that you must pay for.
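To make the idea concrete, here is a deliberately tiny Python sketch of one classic OCR technique, template matching: each character image is compared pixel by pixel against a set of known glyphs, and the closest match wins. The 3×5 glyph bitmaps and the function names are hypothetical, invented for illustration. The apps and scanners in this post use far more sophisticated recognition engines, so treat this as a toy rather than a description of how any of them work.

```python
# Toy OCR by template matching: compare a character bitmap against known
# glyphs and return the closest one. The 3x5 glyphs below are hypothetical,
# invented for this illustration ("#" = ink, "." = paper).
GLYPHS = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "T": ("###",
          ".#.",
          ".#.",
          ".#.",
          ".#."),
}


def mismatches(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(bitmap):
    """Return the character whose template differs from `bitmap` the least."""
    return min(GLYPHS, key=lambda ch: mismatches(GLYPHS[ch], bitmap))


def recognize_line(bitmaps):
    """Recognize a sequence of character bitmaps as a string of text."""
    return "".join(recognize(b) for b in bitmaps)


# A "smudged" T with one stray pixel is still closest to the T template.
smudged_t = ("###",
             ".#.",
             ".##",
             ".#.",
             ".#.")
print(recognize(smudged_t))                                     # T
print(recognize_line([GLYPHS["L"], GLYPHS["I"], GLYPHS["T"]]))  # LIT
```

Picking the nearest template rather than requiring an exact match is what lets the smudged T still come out as “T”.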

Method #1: Scanner Apps (Free – $9.99/month)

I will cover two of the highest-rated smartphone apps for scanning documents and then use each of them to scan a book I’ve been meaning to digitize, so I can compare the results. The book I am using is Computers: Their History and How They Work by Richard B. Rusch. This one needs to be archived ASAP.

App #1: The Adobe Scan App

This app was super straightforward to use. To show how this app works, I’ll demonstrate the “book” feature which I found quite useful for scanning a bound book.

First I selected the “book” mode after opening the app. The app then automatically finds the corners of the book and captures a shot.

Then the app separates out the two pages and allows you to remove any marks such as fingers holding the book open. I found this process quite intuitive.

Additionally, the app makes it easy to send the finished pages to your computer or any other device, either as a single PDF or as individual PNG images, one per page.


The only downside I see to this app is that to annotate any pages, you must import the scanned PDF into the Adobe Acrobat Reader app.

The paid version of the app ($9.99/month) gives access to the following:

  • Export PDFs to Word, Excel, or PowerPoint
  • Combine multiple files
  • Compress files
  • Password protect your scans
  • Recognize text (OCR) on up to 100 pages
  • Up to 200 GB of storage on Document Cloud
  • More features accessible in the Acrobat Reader app

Here are some additional details I’ve gathered about this app:

  1. Cost: free
  2. Unlimited scans: yes
  3. Book scan feature: yes
  4. Extra Features/modes: whiteboard, business card 
  5. OCR / Text recognition: yes, paid
  6. Background mat needed: no
  7. Paid option: yes, $9.99/month
  8. Annotation feature: no, redirects to Adobe Acrobat Reader
  9. Available file formats: PDF, PNG 

App #2: Fine Reader

The Fine Reader app also has a book mode for scanning two pages at once; however, I found it was not effective at auto capture and didn’t crop the pages as nicely as the Adobe app. So I just used the document scanner mode, which worked a lot better. I did need to put a background mat behind the book for the app’s auto capture to work properly, which I did not need with Adobe’s app.

The neat thing about Fine Reader is that it offers free OCR / text recognition for up to 5 scans.


Here are some extra details about Fine Reader, including its $21 yearly paid plan:

  1. Cost: free
  2. Unlimited scans: no, first 5 free
  3. Book scan feature: yes
  4. OCR / Text recognition: yes
  5. Background mat needed: yes
  6. Auto capture: yes, but not great
  7. Paid option: yes, the first 5 scans are free, then $21 a year
  8. Annotation feature: yes, within the app
  9. Extra features/modes: annotation
    1. Pen, marker, sign, text
  10. Available file formats: PDF, JPG
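As a rough comparison of the two paid tiers quoted in this post ($9.99 per month for Adobe Scan and $21 per year for Fine Reader), this short Python snippet annualizes the monthly price. It uses `Decimal` so the currency arithmetic stays exact; the prices are the ones quoted here and may have changed since.

```python
from decimal import Decimal

# Prices as quoted in this post; they may have changed since.
ADOBE_SCAN_MONTHLY = Decimal("9.99")   # Adobe Scan premium, per month
FINE_READER_YEARLY = Decimal("21.00")  # Fine Reader plan, per year


def yearly_from_monthly(monthly_price):
    """Annualize a monthly subscription price."""
    return monthly_price * 12


adobe_yearly = yearly_from_monthly(ADOBE_SCAN_MONTHLY)
print(f"Adobe Scan premium: ${adobe_yearly} per year")        # $119.88
print(f"Fine Reader:        ${FINE_READER_YEARLY} per year")  # $21.00
print(f"Difference:         ${adobe_yearly - FINE_READER_YEARLY} per year")  # $98.88
```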

Bottom line: Adobe’s app is the more user-friendly and effective of the two. If you need OCR / text recognition, or want to annotate from within the app itself, go with Fine Reader.

Scanning tip: When scanning books with a smartphone app, it is often difficult to hold the phone in one hand and the book open with the other, and the quality and consistency of the scans generally suffer as a result. One helpful fix is to use a book holder to keep the book open while you scan. Position the holder’s “legs” so they do not block the text; you can then go back through the scans later and remove the legs from the images, just as I removed my fingers when I used them to hold the book open.

Method #2 – Document Scanner Products Designed for Bound Books (~$80 – $600)

The next category of scanners covers hardware that can scan bound books. This includes a wide array of devices: mouse scanners, handheld OCR scanners, overhead scanners, and automated page-turning scanners. Prices range from the inexpensive end with mouse scanners (~$80), to the high end with overhead book scanners (~$600), to the super high end with automated page-turning scanners. I will start with the simple, inexpensive scanners and work my way up to the more expensive ones with all the bells and whistles.

Mouse and “handheld” Scanners

Price range: ~$80 – $250

These handy little scanners are easily portable and pack in lots of features. Mouse scanners like the IRIScan Mouse Executive 2 are best for short jobs such as scanning a couple of documents or some business cards. They also include OCR, which makes it easy to search through the scanned text later.
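Searchable text is the main payoff of OCR. As a minimal sketch, assuming the OCR step has already produced one plain-text string per scanned page, a simple inverted index (a map from each word to the pages it appears on) is enough to find pages by keyword. The function names and the sample page text below are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each lowercase word to the set of page numbers (1-based) it appears on.

    `pages` is assumed to be a list of OCR output strings, one per scanned page.
    """
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in WORD.findall(text.lower()):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word in `query`."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


# Hypothetical OCR output for three pages.
pages = [
    "The abacus is one of the earliest counting devices.",
    "Charles Babbage designed the Analytical Engine.",
    "The Analytical Engine was never completed.",
]
index = build_index(pages)
print(search(index, "analytical engine"))  # [2, 3]
print(search(index, "abacus"))             # [1]
```

Requiring every query word to appear (a set intersection) keeps multi-word searches narrow; a union instead would return pages matching any of the words.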

While these products are better used for flat documents, they can be used to scan books. Check out this helpful video on how to best scan a bound book with a mouse scanner.

This category also includes mobile or “handheld” OCR scanners.

These scanners work like mouse scanners, but they are typically more portable and connect over Bluetooth. Their ergonomic shape also makes it easier to scan bound books. Some OCR pen products even offer text-to-speech and translation into other languages, and many let you digitally highlight text as well.

Overhead scanners

Price range: ~$100 – $500

These scanners are great for books and thickly bound documents. Their form factor resembles that of an overhead projector. They come with software that can “flatten” the images of curved pages so they appear flat in the digital PDF. They also typically include OCR, which makes them great for scanning textbooks that you want to search through later.

Bonus scanners

The next area of this category includes scanners in a significantly higher price range. These are the type of scanners used by libraries and for mass document archiving. They have intricate mechanical systems that automatically turn each page of the book for scanning.

Automatic book scanners

Price range: ~$10K+

Some automatic scanners such as the Treventus Scan Robot allow you to scan up to 2,500 pages per hour without lifting a finger.
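For a sense of scale, this Python snippet converts a pages-per-hour rating into time per page and time per book. The 2,500 pages per hour figure is the “up to” rate quoted above, so it is a best case; the 500-page book is a hypothetical example, and the function names are made up for illustration.

```python
def seconds_per_page(pages_per_hour):
    """Average time spent on each page at a constant throughput."""
    return 3600 / pages_per_hour


def minutes_for_book(pages, pages_per_hour):
    """Estimated scanning time in minutes for a book with `pages` pages."""
    return pages * 60 / pages_per_hour


RATE = 2500  # pages per hour, the best-case figure quoted for the Treventus Scan Robot
print(seconds_per_page(RATE))       # 1.44
print(minutes_for_book(500, RATE))  # 12.0 for a hypothetical 500-page book
```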

In development

This scanning method uses “monocular video” and has the user quickly flip through the book. It isn’t commercially available yet and it isn’t fully automated, but it is quite innovative in its efficiency and simplicity.

Method #3 – Document Scanners for Unbound Books and Flat Documents (~$100 – $500)

The last category of scanners covers those designed for flat, unbound documents: sheet feed and flatbed scanners.

Sheet feed Scanners

Price range: ~$200 – $500

These scanners must be fed unbound paper, which means that to scan a book this way you would have to remove its spine. The benefit is that you can scan many documents with very little manual work. Most let you load up to 50 pages at once, which the scanner scans front and back automatically before you need to load more, and most include built-in OCR. These scanners are often referred to as duplex scanners, meaning both sides of each page are scanned without any manual page-turning. Many can scan both sides of a page at a speed of 35 ppm (pages per minute).
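Here is a rough Python estimate of what scanning an unbound book with a sheet feed scanner involves, using the figures above (a 50-page feeder and 35 ppm). It assumes every sheet is printed on both sides, that the 35 ppm rating counts sheets with both sides captured in a single duplex pass (some manufacturers quote duplex speed in images per minute instead), and it ignores the time spent reloading the feeder. The function name and the 300-page example are hypothetical.

```python
import math


def sheet_feed_estimate(printed_pages, feeder_capacity=50, sheets_per_minute=35):
    """Estimate sheets, feeder loads and scanning minutes for an unbound book.

    Assumptions: every sheet is printed on both sides, the duplex scanner
    captures both sides in one pass at `sheets_per_minute`, and time spent
    reloading the feeder is ignored.
    """
    sheets = math.ceil(printed_pages / 2)
    loads = math.ceil(sheets / feeder_capacity)
    minutes = sheets / sheets_per_minute
    return sheets, loads, minutes


# A hypothetical 300-page book with its spine removed:
sheets, loads, minutes = sheet_feed_estimate(300)
print(f"{sheets} sheets, {loads} feeder loads, about {minutes:.1f} minutes")
# 150 sheets, 3 feeder loads, about 4.3 minutes
```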

Flatbed Scanners 

Price range: ~$100 – $500

These scanners are best for flat documents such as IDs, passports, single sheets of paper, and photos. They typically produce high-quality scans (~300 DPI) quickly. If you need to scan a large number of unbound documents, you may be better off with a sheet feed scanner.
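To see what a given DPI means in practice, this Python sketch computes the pixel dimensions of a scanned page and its raw, uncompressed size. The US Letter page size (8.5 × 11 inches), the 24-bit color assumption (3 bytes per pixel) and the function names are illustrative choices; saved PDF, PNG or JPEG files are compressed and will typically be smaller.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page of the given size scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Raw image size before compression (3 bytes per pixel = 24-bit RGB)."""
    return width_px * height_px * bytes_per_pixel


# Illustrative example: a US Letter page (8.5 x 11 inches) at 300 DPI.
w, h = scan_dimensions(8.5, 11, 300)
print(w, h)                               # 2550 3300
print(uncompressed_bytes(w, h) / 1e6)     # 25.245 (megabytes, 24-bit color)
print(uncompressed_bytes(w, h, 1) / 1e6)  # 8.415 (megabytes, 8-bit grayscale)
```

Because both dimensions scale with DPI, doubling the resolution to 600 DPI quadruples the pixel count and the raw size.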

Bookend

Overall, a free, simple scanning app can meet most of your scanning needs. However, if you regularly scan in high volume and want more reliability, the various hardware options can be quite helpful. I hope this post helps you find the right scanner or app for your digitization needs!

