Improving Evaluation Metrics for Vision-and-Language Models


Priberam Machine Learning Lunch Seminar

By Priberam Labs

Date and time

Tue, 22 Apr 2025 13:00 - 14:00 WEST

Location

Instituto Superior Técnico, Anfiteatro PA2

1 Avenida Rovisco Pais, 1049-001 Lisboa, Portugal

About this event

  • Event lasts 1 hour

Abstract:

Evaluating image captions is essential for ensuring both linguistic fluency and accurate semantic alignment with visual content. While reference-free metrics such as CLIPScore have advanced automated caption evaluation, most existing work on learned evaluation metrics remains limited to pointwise, English-centric assessments, leaving significant gaps in the reliability, interpretability, and multilingual inclusivity of vision-and-language evaluation metrics. In this seminar session, I will explore extensions of current English-centric benchmarks to a multilingual scenario, promoting the development of more inclusive frameworks. Additionally, I will present two extensions of the CLIPScore metric that aim to improve its interpretability and reliability in real-world applications. Leveraging a model-agnostic conformal risk control framework, I will explore the calibration of CLIPScore distribution values for task-specific control variables. This tackles both the granular assessment of individual word errors within captions and the calibration of the raw score distributions, producing more reliable intervals for captioning evaluation by improving the correlation between uncertainty estimates and prediction errors.


Bio:

Gonçalo Gomes received an MSc degree in Data Science and Engineering from Instituto Superior Técnico, Universidade de Lisboa. He is currently a second-year PhD student at the same institution and a junior researcher at the Human Language Technologies Lab of INESC-ID and at SARDINE Labs of Instituto de Telecomunicações (IT). His research interests focus on developing more informative and trustworthy evaluation frameworks for vision-and-language applications, with a particular view to more inclusive AI frameworks for non-English environments.

www.priberam.com
