Workaround for OCR on PDF with Renderable Text (with Bookmarks)
One of the frustrating things about working with Adobe Acrobat is trying to OCR a PDF that already contains renderable text. If you run OCR on such a PDF, Acrobat stops whenever it reaches a page with renderable text and goes no further. This is often a problem for me because the documents I deal with tend to be scanned, with page numbers added to the PDF as renderable text.
The usual workaround is to export all the pages of the PDF as TIFF images, re-create the PDF from those images, and then run OCR on it.
Today, however, I ran into this issue with the added headache of needing to keep the bookmarks I had already added to the original PDF. The workaround I found on the Adobe forums (here) works, and I am writing it down so I do not have to hunt for the same instructions every time!
- Export the PDF to TIFFs, and merge them into a new PDF.
- Save this new PDF as a separate document.
- Run OCR on this new PDF and save it again.
- Go back to the original PDF.
- Use “Replace Pages…” and select the new, OCR’d PDF.
- Specify the full range of page numbers (1 to the end).
- Replace the pages. The original PDF should now keep all its bookmarks and also be OCR’d.