Interface Development for Digitization of Documents Using OCR
Bachelor thesis
Permanent link
https://hdl.handle.net/11250/3075673
Date issued
2023
Collections
- Student theses (TN-IDE) [823]
Abstract
The purpose of this thesis is to develop a semi-automated interface that uses Optical Character Recognition (OCR) routines to identify text-based information in a large volume of digitized drawings associated with the oil and gas industry. The identified information is presented in an appropriate interface for any necessary manual modification, with the aim of improving the efficiency of maintaining large amounts of older documents. The thesis outlines the design of the interface and the implementation of the Tesseract OCR engine, in combination with tailor-made functions and classes that leverage OpenCV to enhance the recognition process.