Is it possible to automatically remove information from a scanned document?
The task is to scan a large volume of documents. These documents contain information that should be removed automatically, identified by keywords: for example, a specific item in a specification. The scanned document (pdf, jpg, etc.) should end up with those words removed.
Is there software that can do this?
Recognize the scan with tesseract and output hOCR, then find the required words and their coordinates in it. Paint over the words at those coordinates on the scan with ImageMagick.
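A rough sketch of that pipeline in Python, assuming the hOCR was produced beforehand with something like `tesseract scan.png scan hocr` and that words appear as `ocrx_word` spans with a `bbox` in the `title` attribute. The function names `find_boxes` and `redact_command` are made up for illustration, and the matching is a simple case-insensitive whole-word comparison, which may be too naive for real documents.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and "ocrx_word" in (a.get("class") or ""):
            m = BBOX_RE.search(a.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def find_boxes(hocr_text, keywords):
    """Return bounding boxes of words that match any keyword (hypothetical helper)."""
    parser = HocrWords()
    parser.feed(hocr_text)
    wanted = {k.lower() for k in keywords}
    return [box for text, box in parser.words
            if text.strip(".,;:!?()\"'").lower() in wanted]


def redact_command(src, dst, boxes):
    """Build an ImageMagick argument list that draws black rectangles.

    Assumes the ImageMagick 6 `convert` binary; on ImageMagick 7 the
    executable is usually `magick` instead.
    """
    cmd = ["convert", src, "-fill", "black"]
    for x0, y0, x1, y1 in boxes:
        cmd += ["-draw", f"rectangle {x0},{y0} {x1},{y1}"]
    return cmd + [dst]
```

To run it on a real file, read `scan.hocr`, call `find_boxes(text, ["keyword"])`, and pass the result of `redact_command("scan.png", "redacted.png", boxes)` to `subprocess.run`. The hOCR coordinates are in pixels of the image that tesseract was given, so the rectangles should be drawn on that same image.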
Alternatively, recognize the scan with FineReader, export to djvu, extract the text layer with coordinates from the djvu, and parse it. Then do the same thing with ImageMagick.
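For the djvu route, one possible way to parse the text layer is sketched below. It assumes the layer was dumped with `djvused file.djvu -e print-txt` (a tool choice not named above), that the dump covers a single page, and that it uses the usual `(word xmin ymin xmax ymax "text")` form with the origin at the bottom-left corner. The helper name `djvu_word_boxes` is hypothetical. The y coordinates are flipped so the boxes can be used with image tools that count from the top-left.

```python
import re

# (page xmin ymin xmax ymax ...) gives the page size in DjVu pixels.
PAGE_RE = re.compile(r'\(page\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)')
# (word xmin ymin xmax ymax "text"), allowing backslash escapes in the text.
WORD_RE = re.compile(
    r'\(word\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"((?:[^"\\]|\\.)*)"'
)


def djvu_word_boxes(sexpr, keywords):
    """Return top-left-origin boxes (x0, y0, x1, y1) for matching words.

    Sketch only: handles one page and does not decode escaped characters.
    """
    page = PAGE_RE.search(sexpr)
    if page is None:
        return []
    height = int(page.group(4))
    wanted = {k.lower() for k in keywords}
    boxes = []
    for m in WORD_RE.finditer(sexpr):
        x0, y0, x1, y1 = (int(m.group(i)) for i in range(1, 5))
        if m.group(5).lower() in wanted:
            # Flip the vertical axis: DjVu counts y from the bottom edge.
            boxes.append((x0, height - y1, x1, height - y0))
    return boxes
```

The resulting boxes are only valid for an image with the same pixel dimensions as the djvu page; if the original scan has a different resolution, the coordinates would need to be scaled first.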
You can automate this with scripts.