24fa651f0a | 1 year ago | |
---|---|---|
input | 1 year ago | |
.gitignore | 1 year ago | |
LICENSE | 2 years ago | |
README.md | 1 year ago | |
_requirements.txt | 2 years ago | |
ocr_config.ini | 1 year ago | |
ocr_scan.py | 1 year ago |
README.md
ocr_document_scanner
A Python-based optical character recognition (OCR) script for automatically processing documents and saving them to an SMB folder.
Install Guide:
- Install Tesseract OCR for your operating system (https://tesseract-ocr.github.io/tessdoc/Installation.html)
- Install the language packs, or copy them manually into C:\Program Files\Tesseract-OCR\tessdata (download from https://github.com/tesseract-ocr/tessdata_fast)
- Set the environment variable TESSDATA_PREFIX to the language pack folder (e.g. C:\Program Files\Tesseract-OCR\tessdata)
- Adjust the config (ocr_config.ini)
- Place the images to be scanned in the input folder
- Run the script (ocr_scan.py)
- Retrieve the contents of the output folder
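
Before running the script, it can help to confirm that the first three setup steps took effect. The following is a minimal sketch, not part of this repository: the function name `check_tesseract_setup` and its messages are hypothetical, and it assumes the `tesseract` executable is expected on `PATH` and that language packs are `*.traineddata` files located directly in the `TESSDATA_PREFIX` folder.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_setup(env=None):
    """Hypothetical helper: return a list of setup problems (empty list = looks OK)."""
    env = os.environ if env is None else env
    problems = []

    # Step 1: the tesseract executable should be reachable via PATH.
    if shutil.which("tesseract", path=env.get("PATH")) is None:
        problems.append("tesseract executable not found on PATH")

    # Steps 2-3: TESSDATA_PREFIX should point at a folder with language packs.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    else:
        tessdata = Path(prefix)
        if not tessdata.is_dir():
            problems.append(f"TESSDATA_PREFIX is not a directory: {tessdata}")
        elif not any(tessdata.glob("*.traineddata")):
            problems.append(f"no *.traineddata language files in {tessdata}")

    return problems


if __name__ == "__main__":
    issues = check_tesseract_setup()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Tesseract setup looks OK")
```

Saved under any file name and run with Python, this prints one line per detected problem. It only inspects the environment and does not call Tesseract or read ocr_config.ini.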