Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance
Alkhalifa, Rabab; Borkakoty, Hsuvas; Deveaud, Romain; El-Ebshihy, Alaa; Espinosa-Anke, Luis; Fink, Tobias; Galuscakova, Petra; Gonzalez-Saez, Gabriela; Goeuriot, Lorraine; Iommi, David; Liakata, Maria; Tayyar Madabushi, Harish; Medina-Alias, Pablo; Mulhem, Philippe; Piroi, Florina; Popel, Martin; Zubiaga, Arkaitz
Conference object
Published version

Date
2024
Original version
Alkhalifa, R., Borkakoty, H., Deveaud, R., El-Ebshihy, A., Espinosa-Anke, L., Fink, T., ... & Zubiaga, A. (2024, March). LongEval: Longitudinal evaluation of model performance at CLEF 2024. In European Conference on Information Retrieval (pp. 60-66). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-71908-0_10

Abstract
We describe the second edition of the LongEval shared task at CLEF 2024. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and text classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps and evaluates the drop in system quality (measured with NDCG) across these timestamps. Task 2 tackles binary sentiment classification at different points in time and evaluates the performance drop across different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2; ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
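As a rough illustration of the Task 1 protocol, the sketch below computes NDCG for one system's ranking at each corpus snapshot and reports the relative drop against the earliest timestamp. The function names, the gain formulation, and the toy relevance data are our own assumptions for illustration only; the official lab uses its own topics, qrels, and evaluation tooling.

```python
import math

def ndcg(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance judgments."""
    def dcg(rels):
        # Log2 position discount, ranks counted from 1 (hence rank + 2).
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical per-snapshot relevance lists for one query, as produced by
# running a fixed system against each corpus timestamp (toy data).
runs = {
    "t0": [3, 2, 3, 0, 1, 2],
    "t1": [2, 3, 0, 1, 2, 0],
    "t2": [1, 0, 2, 0, 1, 0],
}

baseline = ndcg(runs["t0"])
for timestamp, rels in runs.items():
    score = ndcg(rels)
    drop = (baseline - score) / baseline  # relative drop vs. earliest snapshot
    print(f"{timestamp}: NDCG@10={score:.3f}, drop={drop:+.1%}")
```

In the actual task, such scores would be averaged over all topics per timestamp before the drop is compared across systems.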