arXiv Table Extractor

David Gordon Ramsay and Rebeca Pop

Bachelor thesis

Permanent link

https://hdl.handle.net/11250/2774415

Publication date

2021

Collections

Student theses (TN-IDE)

Description

Full text not available

Abstract

Tables are common and important in scientific publications, where they serve as the main means of presenting findings in a structured way. This project concerns the extraction of tables from scientific papers published on arXiv.org. arXiv is an open archive for scholarly articles; most articles are published not only as PDFs but also together with their LaTeX sources.

The specific project objectives are:

(i) Developing a method for identifying and extracting tables from a LaTeX document;
(ii) Enriching the extracted table data with metadata from the article;
(iii) Creating a large-scale table corpus that can be distributed;
(iv) Setting up batch processes to continuously update the table corpus.
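Since the full text is not available, the following is only a minimal sketch of what objectives (i) to (iii) could look like, not the authors' method. It assumes tables are plain, non-nested `tabular` environments, finds them with a regular expression, splits rows on `\\` and cells on unescaped `&`, attaches article metadata, and serializes each table as one JSON Lines record. The function names (`extract_tables`, `enrich`, `to_jsonl`), the metadata keys, the placeholder arXiv ID, and the choice of JSON Lines as the corpus format are all hypothetical illustrations.

```python
import json
import re
from typing import Dict, List

# Hypothetical pattern: \begin{tabular}[pos]{colspec} ... \end{tabular}.
# Assumes no nested tabulars and at most one level of braces in the column spec (e.g. p{3cm}).
TABULAR_RE = re.compile(
    r"\\begin\{tabular\}(?:\[[^\]]*\])?\{(?:[^{}]|\{[^{}]*\})*\}(.*?)\\end\{tabular\}",
    re.DOTALL,
)
# Horizontal rules carry no cell content, so they are stripped before splitting.
RULE_RE = re.compile(r"\\(?:hline|toprule|midrule|bottomrule)|\\cline\{[^{}]*\}")


def extract_tables(latex: str) -> List[List[List[str]]]:
    """Return each tabular as a list of rows, each row a list of raw LaTeX cell strings."""
    tables = []
    for match in TABULAR_RE.finditer(latex):
        body = RULE_RE.sub("", match.group(1))
        rows = []
        for raw_row in re.split(r"\\\\", body):  # rows end with a literal \\
            if not raw_row.strip():
                continue  # skip empty fragments, e.g. after the last row
            # Split on & but not on the escaped ampersand \&.
            rows.append([cell.strip() for cell in re.split(r"(?<!\\)&", raw_row)])
        tables.append(rows)
    return tables


def enrich(tables: List[List[List[str]]], metadata: Dict[str, str]) -> List[Dict]:
    """Attach article-level metadata (hypothetical keys) to every extracted table."""
    return [{"table_index": i, "rows": rows, **metadata} for i, rows in enumerate(tables)]


def to_jsonl(records: List[Dict]) -> str:
    """Serialize records as JSON Lines, one table per line (an assumed corpus format)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


sample = r"""
\begin{table}
\begin{tabular}{l|c}
\hline
Model & Accuracy \\
\hline
A & 0.91 \\
B & 0.87 \\
\hline
\end{tabular}
\end{table}
"""

records = enrich(extract_tables(sample), {"arxiv_id": "0000.00000", "title": "Example"})
print(to_jsonl(records))
```

A real extractor would also need to handle `tabular*`/`tabularx`, `\multicolumn`, row spacing such as `\\[2pt]`, comments, and macros defined elsewhere in the source; the regular-expression approach is chosen here only because it keeps the sketch short.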

Publisher

UiS (University of Stavanger)