Automatic fact-checking relies on claim detection systems to find claims and estimate their check-worthiness. To improve current claim detection systems, we need high-quality labeled data sets, specifically a data set based on claims from general news articles. To our knowledge, no such data set currently exists. We explore an approach to collecting data for such a set by creating an annotation tool and distributing the work through crowdsourcing platforms. We show that such platforms can be viable, even for complex annotation tasks: with the right tools and systems in place, we can train participants and test the quality of the submitted data. We show that a structured approach to claim definitions using a claim taxonomy is beneficial when creating a labeling schema. Furthermore, we implement and test a rules-based claim detection system using natural language processing libraries, with the intention of integrating it into the data collection process.
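To give a sense of what a rules-based claim detector looks like, the following is a minimal, self-contained sketch. It is purely illustrative: the patterns and the `is_check_worthy` function are assumptions made for this example, not the system described above, which is built on natural language processing libraries.

```python
import re

# Illustrative surface-level rules for flagging potentially check-worthy
# sentences. Real systems would use richer linguistic features
# (part-of-speech tags, named entities, dependency parses).
CLAIM_PATTERNS = [
    re.compile(r"\b\d+(\.\d+)?\s*(%|percent)\b", re.I),              # statistics
    re.compile(r"\b(according to|reported that|claims? that)\b", re.I),  # attribution
    re.compile(r"\b(will|increased|decreased|caused)\b", re.I),      # prediction/causation
]

def is_check_worthy(sentence: str) -> bool:
    """Flag a sentence as a potential claim if any rule pattern matches."""
    return any(p.search(sentence) for p in CLAIM_PATTERNS)

sentences = [
    "Unemployment increased by 4 percent last year.",
    "What a lovely morning it is today.",
]
flags = [is_check_worthy(s) for s in sentences]
# flags -> [True, False]
```

Such hand-written rules are cheap to run over large news corpora, which is what makes them attractive as a pre-filter in a data collection pipeline, at the cost of missing claims that do not match any pattern.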