Interface Development for Digitization of Documents Using OCR
Bachelor thesis
Permanent link
https://hdl.handle.net/11250/3075625
Publication date
2023
Collections
- Student theses (TN-IDE)
Abstract
The purpose of this thesis is to develop a semi-automated interface that uses Optical Character Recognition (OCR) routines to identify text-based information in a large volume of digitized drawings from the oil and gas industry. The identified information is presented in an interface where it can be manually corrected as needed, with the aim of making the maintenance of large collections of older documents more efficient. The thesis outlines the design of the interface and the implementation of the Tesseract OCR engine, combined with tailor-made functions and classes that leverage OpenCV to improve the recognition process.
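Binarizing a scanned drawing before passing it to Tesseract is a common way to improve recognition on faded or unevenly exposed scans. The abstract does not say which OpenCV preprocessing steps the thesis uses, so this is only an illustration under that assumption: a plain-Python sketch of Otsu's global thresholding (the idea behind OpenCV's `cv2.THRESH_OTSU` flag), applied to a grayscale image stored as a list of rows of 8-bit values. The helper names `otsu_threshold` and `binarize` are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return Otsu's threshold t for 8-bit grayscale values (0-255).

    Hypothetical helper for illustration. Pixels <= t form the "dark" class
    and pixels > t the "light" class; t maximizes the between-class variance
    w_dark * w_light * (mean_dark - mean_light) ** 2 (unnormalized counts).
    """
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        # Grow the dark class by one gray level per iteration.
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is dark; a larger t cannot split the image
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:  # strict '>' keeps the smallest best t
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (ink) or 255 (background) using Otsu's threshold.

    Hypothetical helper; follows the THRESH_BINARY rule: 255 if p > t else 0.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic "scan": dark strokes (20, 30) on a light background.
    scan = [[200, 20, 220], [30, 200, 220]]
    print(otsu_threshold([p for row in scan for p in row]))  # 30
    print(binarize(scan))  # [[255, 0, 255], [0, 255, 255]]
```

With OpenCV available, the corresponding call would be `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, which returns the chosen threshold together with the binary image. That image could then be passed to Tesseract, for example through `pytesseract.image_to_string`. A single global threshold can struggle with uneven lighting across a large drawing, and in that case adaptive thresholding (`cv2.adaptiveThreshold`) is a common alternative.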