arXiv Table Extractor

dc.contributor.advisor	Krisztian Balog
dc.contributor.author	David Gordon Ramsay and Rebeca Pop
dc.date.accessioned	2021-09-07T16:30:12Z
dc.date.available	2021-09-07T16:30:12Z
dc.date.issued	2021
dc.identifier	no.uis:inspera:78872743:36748811
dc.identifier.uri	https://hdl.handle.net/11250/2774415
dc.description	Full text not available
dc.description.abstract	Tables are common and important in scientific publications, where they serve as the main means of presenting findings in a structured way. This project concerns the extraction of tables from scientific papers published on arxiv.org. arXiv is an open archive for scholarly articles; for most articles, the LaTeX sources are made available alongside the PDF. The specific project objectives are: (i) developing a method for identifying and extracting tables from a LaTeX document; (ii) enriching the extracted table data with metadata from the article; (iii) creating a large-scale table corpus that can be distributed; and (iv) setting up batch processes to continuously update the table corpus.
dc.language	eng
dc.publisher	uis
dc.title	arXiv Table Extractor
dc.type	Bachelor thesis
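The full text is not available, so the authors' extraction method is not described in this record. The following is only a minimal sketch of what objective (i) can involve, assuming that each table appears as a `table` or `table*` float wrapping a single `tabular` environment, that tables are not nested, and that column specifications contain no braces. The function name `extract_tables` and the output shape (a caption plus a list of cell rows) are hypothetical illustrations, not the thesis implementation.

```python
import re

# Matches table and table* floats; the backreference \1 requires the matching \end.
# Assumption (hypothetical): tables are never nested inside one another.
TABLE_ENV = re.compile(r"\\begin\{(table\*?)\}(.*?)\\end\{\1\}", re.DOTALL)
# Assumption: the caption text contains no nested braces.
CAPTION = re.compile(r"\\caption\{([^{}]*)\}")
# Assumption: the column spec (e.g. {l|c|r}) contains no nested braces such as p{3cm}.
TABULAR = re.compile(r"\\begin\{tabular\}\{[^{}]*\}(.*?)\\end\{tabular\}", re.DOTALL)


def extract_tables(latex: str) -> list[dict]:
    """Hypothetical helper: return the caption and cell rows of each table float."""
    tables = []
    for match in TABLE_ENV.finditer(latex):
        body = match.group(2)
        caption = CAPTION.search(body)
        tabular = TABULAR.search(body)
        rows = []
        if tabular:
            # Rows in a tabular are separated by the \\ line-break command.
            for raw_row in tabular.group(1).split("\\\\"):
                # Drop \hline rules, then skip rows that are now empty.
                cleaned = re.sub(r"\\hline", "", raw_row).strip()
                if cleaned:
                    # Split cells on & unless it is escaped as \&.
                    rows.append([cell.strip() for cell in re.split(r"(?<!\\)&", cleaned)])
        tables.append({"caption": caption.group(1) if caption else None, "rows": rows})
    return tables
```

Real arXiv sources also use `\multicolumn`, custom macros, `\input` files, and packages such as `booktabs`, so a production extractor would need a proper LaTeX parser rather than regular expressions; the sketch only shows the basic locate-then-split idea.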
