Python-based optical character recognition (OCR) script for automatically processing documents and storing the results in an SMB folder.

dev_alex 24fa651f0a Edit Readme		1 year ago
input	Add sample image	1 year ago
.gitignore	Add ignore output	1 year ago
LICENSE	Initial commit	2 years ago
README.md	Edit Readme	1 year ago
_requirements.txt	Add initial docs	2 years ago
ocr_config.ini	Fixed paths for clean path joins	1 year ago
ocr_scan.py	Add non-english scanning	1 year ago

ocr_document_scanner

Install Guide:

Install Tesseract OCR for your OS (https://tesseract-ocr.github.io/tessdoc/Installation.html)
Install the language packs, or copy them manually into C:\Program Files\Tesseract-OCR\tessdata (download from https://github.com/tesseract-ocr/tessdata_fast)
Set the TESSDATA_PREFIX environment variable to the language pack folder (e.g. C:\Program Files\Tesseract-OCR\tessdata)
Adjust the config (ocr_config.ini)
Put the images to be scanned into the input folder
Run the script (ocr_scan.py)
Collect the results from the output folder
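
The first three steps can be sanity-checked before running the script. The following is a minimal sketch and not part of this repository: the function name check_tesseract_setup and the language codes are hypothetical, and it assumes that TESSDATA_PREFIX points directly at the folder holding the .traineddata files, as described above. The PATH check is only a hint, since Tesseract may be installed without being on PATH.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_setup(required_langs=("eng",), env=None):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper for illustration, not part of ocr_scan.py.
    """
    env = os.environ if env is None else env
    problems = []

    # Step 1: the tesseract binary should be reachable via PATH.
    if shutil.which("tesseract", path=env.get("PATH")) is None:
        problems.append("tesseract executable not found on PATH")

    # Step 3: TESSDATA_PREFIX should point at the language pack folder.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
        return problems
    tessdata = Path(prefix)
    if not tessdata.is_dir():
        problems.append(f"TESSDATA_PREFIX is not a directory: {tessdata}")
        return problems

    # Step 2: each language needs a <lang>.traineddata file in that folder.
    for lang in required_langs:
        if not (tessdata / f"{lang}.traineddata").is_file():
            problems.append(f"missing language pack: {lang}.traineddata")
    return problems


if __name__ == "__main__":
    # "eng" and "deu" are example language codes; use the ones you scan with.
    for problem in check_tesseract_setup(required_langs=("eng", "deu")):
        print("WARNING:", problem)
```

If the script prints nothing, the binary, the environment variable, and the requested language packs were all found.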