I have started reading Paperless by David Sparks, and one of the key ingredients to going paperless is being able to search your documents not only by title, but also by what is inside them.
In order to reach this nirvana of document searchability, you have to figure out how to work around the problem of PDFs. While PDF seems to be the universally accepted format across devices these days, a scanned PDF is really just a picture of a page, so it’s not exactly search friendly when it comes to OS X’s Spotlight or my favorite search tool, Alfred.
However, you can make this happen through the magic of OCR (Optical Character Recognition). With OCR, you can turn any PDF into a searchable document. This is actually quite easy if you have a copy of Adobe Acrobat Standard or PDFpen, both of which contain tools that will convert your PDF into searchable text.
This all sounds great; however, if you are like me, you don’t want to have to think about this. You are constantly downloading PDF attachments from email, but you don’t want to spend the time running OCR on them. You would prefer that it happen in the background while you are working, so that you don’t have to remember to open up your PDF application of choice and convert each PDF.
So I have dug around and found a way to make this happen. Using AppleScript and Folder Actions, you can automate the whole process so it runs in the background and you don’t have to think about it. (Note that one of the keys to making this work is designating a single folder where all of your email and web downloads will go. This is important because we will be applying Folder Actions to this folder.)
So if you are brave, here is how the process breaks down:
- You will need a copy of Adobe Acrobat Standard or Pro. You can also use PDFpen if you like.
- Go to Take Control Books and download these AppleScripts. Follow the directions in the readme.txt file as to where to place the AppleScripts.
- Create a folder where you want the PDFs to reside whenever they are downloaded from your email or the web. You can leave the default as the Downloads folder if you like; however, that tends to become a junk drawer for me, so I have created a folder called Action Items and made that my default downloads folder.
- Select the folder you have just created (or the Downloads folder) and then go to Finder > Services > Folder Action Setup. Once selected, a Folder Action Setup dialog box will appear asking you which script to attach to the folder. Select the script entitled “OCR This (Acrobat)”. Then check the box labeled “Enable Folder Actions”. (This assumes you have Adobe Acrobat installed; if you are using PDFpen, select the script for that program instead.)
- Now test the folder action by placing a PDF in the folder you set up (Downloads or Action Items). At this point Adobe Acrobat should start up and ask you to name the new file that will be OCR ready for you to search. (The first few times the script runs, you might be asked to verify that you want to perform the recognize text function on the file. It happened to me twice, and thereafter it quit popping up and started working smoothly.)
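If you are curious what a Folder Action is doing conceptually, here is a rough Python sketch of the same idea: notice PDFs that have newly landed in a folder and hand each one to some OCR step exactly once. This is not the Take Control script and it does not talk to Acrobat or PDFpen. The names find_new_pdfs, poll_once, and handle are made up for illustration, and the OCR step is left as a placeholder you would supply yourself.

```python
from pathlib import Path


def find_new_pdfs(folder, already_seen):
    """Return the names of PDFs in `folder` that are not in `already_seen`.

    Hypothetical helper: matches on the .pdf extension only (case-insensitive)
    and returns the names in sorted order.
    """
    names = sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [name for name in names if name not in already_seen]


def poll_once(folder, already_seen, handle):
    """Call `handle(path)` once for every new PDF, then remember its name.

    `handle` stands in for the OCR step (an assumption of this sketch);
    a real Folder Action would launch Acrobat or PDFpen here.
    """
    for name in find_new_pdfs(folder, already_seen):
        handle(Path(folder) / name)
        already_seen.add(name)
```

Calling poll_once repeatedly with the same already_seen set (say, every few seconds) gives you the "drop it in the folder and forget it" behavior. A real Folder Action is nicer because the system tells the script when a file arrives instead of the script having to keep checking.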
Now if you follow these steps, you should be able to search for text inside your PDFs using Spotlight or Alfred (in Alfred, simply type “in”, then a space, then the text you want to search for). Hope this works for you. Contact me if you have questions!
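If you would rather check from a script that the OCR worked, Spotlight also has a command-line front end called mdfind. Here is a minimal Python sketch, assuming you are on a Mac and that your PDFs live in a folder such as ~/Action Items. The function names are my own invention; the only real pieces are the mdfind command and its -onlyin option, which limits the search to one folder.

```python
import shutil
import subprocess


def spotlight_command(folder, text):
    """Build an mdfind command that searches `folder` for `text`.

    A plain-text mdfind query matches file contents as well as metadata,
    so text inside an OCR'd PDF should turn up here.
    """
    return ["mdfind", "-onlyin", folder, text]


def spotlight_search(folder, text):
    """Run the search and return the matching paths as a list of strings.

    Returns an empty list if mdfind is not available (for example, when
    this is run on something other than a Mac).
    """
    if shutil.which("mdfind") is None:
        return []
    result = subprocess.run(
        spotlight_command(folder, text),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

For example, spotlight_search("/Users/you/Action Items", "invoice") should list any files in that folder whose text or metadata contains "invoice", once Spotlight has had a moment to index the newly converted file.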