For the past few years I’ve been endeavoring to move toward a “paperless” lifestyle, eliminating unwanted papers from our life. We live in a relatively small house for seven people and we already have tons of books taking up space. Stray papers, especially old receipts and invoices and statements that we may never need to consult, just add to the clutter.
Instead, I’ve set about digitizing the lot of them, a whole file cabinet full and then some. I’ve written before about Evernote, but that’s only part of the whole process. That’s where all the scanned and digitized paper ends up. To get it in there requires a scanner.
I started several years ago with the Fujitsu ScanSnap S300M, which was their low-cost scanner, but slower and less capable than their main product line. A couple of years ago, through the Amazon Vine program, I received a review unit of the Canon imageFormula P-215, another low-profile, portable scanner with many of the same speed and capacity limitations as the Fujitsu, which has gone to live in my office at work.
And now this past week, I got another scanner to review through Vine, the Epson WorkForce DS-510. This is a full-fledged desktop document scanner, so it is much faster at scanning (a lot!) and much less prone to jams. It’s not perfect, though, because the software isn’t as good as that from Canon and Fujitsu, most notably because it doesn’t do OCR as it scans like the others do.
Setting up the Epson was a pain because they give you a CD-ROM with the software on it. Unfortunately, my MacBook Pro doesn’t have a CD drive in it. I tried downloading the drivers from the Epson site, but it turns out that the download is bare-bones and none of the special features that make this scanner worthwhile were available in it. So I got out my external CD drive and did the regular install. As usual in my experience with Epson, the process was quirky. You have to run the software update over and over again, with it installing only one new component at a time, until at last it tells you there’s nothing else. Only then are you ready to go. There were some other quirks as well, related to the one-touch button and setting up various profiles.
Anyway, I decided to automate some of the more onerous parts of scanning the various bits of paper that I scan all the time. Using the preference pane Hazel from Noodlesoft, some Folder Action scripts, and PDFpenPro 6 from Smile Software, I can automate 90% of the process, from clicking go on the scanner to having the document in Evernote in the correct notebook with the correct tags and the correct name.
First, I have the Epson Document Capture software scan the document into a folder on my hard drive called “Scans”. I have set up a Folder Action script on that folder that takes any PDF put in it, opens it in PDFpenPro, which runs OCR on it, and then saves it in a folder called “HazelScans”. Hazel is watching that folder with a series of rules based on the most common kinds of documents I receive: credit card statements from my banks, utility bills, doctor’s bills, etc. Hazel can look at the content inside each document, so it knows what kind of document it is and renames it accordingly. Finally, Hazel runs an AppleScript that imports the file into Evernote with the appropriate notebook, note name, and tags.
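To make the rule step concrete, here is a rough sketch in Python of the kind of logic those Hazel rules perform: look for keywords in the OCR’d text, pick the first rule that matches, and build a dated file name. This is not how Hazel works internally (the rules are set up in its preference pane), and the `Rule` class, the keywords, the notebooks, and the tags are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Rule:
    """One content-based rule: a hypothetical stand-in for a Hazel rule."""
    keywords: tuple   # every keyword must appear in the OCR'd text
    label: str        # used in the new file name
    notebook: str     # destination Evernote notebook
    tags: tuple       # Evernote tags to apply


# Example rules only; keywords, notebooks, and tags are invented.
RULES = [
    Rule(("Visa", "Statement"), "Credit Card Statement", "Finances", ("bank", "statement")),
    Rule(("Electric", "Amount Due"), "Electric Bill", "Household", ("utility",)),
]


def match_rule(text, rules=RULES):
    """Return the first rule whose keywords all occur in the text, else None."""
    lowered = text.lower()
    for rule in rules:
        if all(keyword.lower() in lowered for keyword in rule.keywords):
            return rule
    return None


def new_name(rule, doc_date):
    """Build a dated file name from the rule label and the document date."""
    return f"{doc_date.isoformat()} {rule.label}.pdf"
```

Rule order matters in a setup like this: the first match wins, so more specific rules should come before general ones.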
I started with this article by Katie Floyd at Macworld, “How I went Paperless with Hazel and Evernote”. Her workflow is different, but it gave me the AppleScript snippet I needed for importing into Evernote.
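The snippet from that article isn’t reproduced here, so what follows is only my own rough sketch of the idea, written in Python. It builds a small AppleScript and hands it to `osascript`. I’m assuming Evernote’s Mac AppleScript dictionary offers `create note from file` with `title`, `notebook`, and `tags` parameters; check the dictionary in Script Editor before relying on it. The function names are mine, not Hazel’s or Evernote’s.

```python
import subprocess


def _quote(value):
    """Quote a Python string as an AppleScript string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_import_script(pdf_path, title, notebook, tags):
    """Return AppleScript source that imports one PDF as a new note.

    Assumption: Evernote's scripting dictionary accepts
    `create note from file ... title ... notebook ... tags ...`.
    """
    tag_list = "{" + ", ".join(_quote(tag) for tag in tags) + "}"
    return (
        'tell application "Evernote"\n'
        f"    create note from file (POSIX file {_quote(pdf_path)}) "
        f"title {_quote(title)} notebook {_quote(notebook)} tags {tag_list}\n"
        "end tell"
    )


def import_pdf(pdf_path, title, notebook, tags):
    """Run the generated script with osascript (macOS only)."""
    script = build_import_script(pdf_path, title, notebook, tags)
    subprocess.run(["osascript", "-e", script], check=True)
```

Keeping the script builder separate from the function that runs it means the generated AppleScript can be printed and checked before it ever touches Evernote.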
I also relied on this blog post from David Sparks, “Hazel 3.1 with Date Matching”, which explained how best to use Hazel’s new content date matching function to name and date the scans based on when they were actually sent, not when I got around to scanning them. Another post from David, “PDFpen OCR Folder Action Script”, is for an older version of PDFpen, but it still does the job of getting the OCR started.
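For anyone curious what “date matching” amounts to, here is a minimal Python sketch that pulls the first date out of a document’s OCR’d text. Hazel does this through its own pattern tokens, not with code like this; the two date formats below are just assumptions about what a typical US statement looks like, and the month-name format relies on an English locale.

```python
import re
from datetime import datetime

# Formats tried for each document; extend these to match the paper you receive.
_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),
]


def first_date(text):
    """Return the parsable date that appears earliest in the text, or None."""
    best = None  # (position in text, parsed date)
    for pattern, fmt in _PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(0), fmt).date()
            except ValueError:
                continue  # looked like a date but wasn't one, e.g. 13/45/2013
            if best is None or match.start() < best[0]:
                best = (match.start(), parsed)
            break  # only the first valid hit per format matters
    return best[1] if best else None
```

Taking the earliest date on the page is a guess that the statement date comes before the due date; for documents where that isn’t true, a rule would need to look for a label such as “Statement Date” first.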
Finally, an adaptation of this script on Stack Overflow let me move the files from one folder to another after they were OCR’d.
Here are a few screenshots to illustrate what I’m doing.
I look forward to developing this process even more. I would like to figure out how to automate the changing of the creation date in Evernote. I want the creation date for these kinds of documents to be the date I put in the name from Hazel. Unfortunately, Hazel cannot yet pass the matched date to an AppleScript, although the developer says he’s working on it. Once that happens, the process can be hands-off from end to end. I also wish the various scripts were a bit snappier. As it is now, the time from pressing the scan button to the note appearing in Evernote can be more than 60 seconds, which is a long time when you have a stack of paper to scan.
I’ll be sure to update here as necessary.
- Optical character recognition creates a layer within the PDF that makes the text selectable. Otherwise it’s just an image, like a photo you take of a road sign.
- paperless: Photopal604 | Dreamstime.com | Copyright by owner. Used with permission.