
cvpr18-learning to evaluate image学习以评估图像字幕.pdf

Published: 2025-05-12 · about 72,600 characters · 19 pages


Learning to Evaluate Image Captioning

Yin Cui¹,² Guandao Yang¹ Andreas Veit¹,² Xun Huang¹,² Serge Belongie¹,²

¹Department of Computer Science, Cornell University ²Cornell Tech

Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlates well with human judgments, but fails to capture the syntactic structure of a sentence. To address these two challenges, we propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. In addition, we further propose a data augmentation scheme to explicitly incorporate pathological transformations as nega-

[Figure: labels "Labeled training examples", "Learned critique", "CNN", "LSTM", "Image representation", "Caption representation", "Binary classification", "Caption Evaluation"; columns "Caption" and "Score". Example captions: "a yellow bird sitting on a skateboard on a blue blanket", "a close up of a holding a banana", "a cat is watching a television on a television" (score 0.1), "a cat is sitting on top of a" (truncated in the preview).]
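The preview gives no model details beyond the figure labels (image representation, caption representation, binary classification). As a rough illustration of that idea only, and not the authors' implementation, the sketch below assumes fixed-length image and caption vectors (hypothetical stand-ins for the CNN and LSTM outputs) and swaps in plain logistic regression as the binary classifier. All function names and the toy data are invented for this example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, image_vec, caption_vec):
    """Estimated probability that the caption is human-written.

    The two vectors are concatenated, mirroring the figure's
    'image representation' + 'caption representation' inputs.
    """
    features = image_vec + caption_vec  # list concatenation
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(examples, dim, epochs=200, lr=0.5):
    """Fit logistic regression with per-example gradient steps.

    examples: list of (image_vec, caption_vec, label),
    where label is 1 for human captions and 0 for machine captions.
    """
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for image_vec, caption_vec, label in examples:
            err = score(weights, bias, image_vec, caption_vec) - label
            features = image_vec + caption_vec
            for i, x in enumerate(features):
                weights[i] -= lr * err * x
            bias -= lr * err
    return weights, bias


# Toy, hand-made vectors (assumption: real features would come from
# trained image and caption encoders).
EXAMPLES = [
    ([1.0, 0.0], [0.9, 0.1], 1),  # human-written
    ([0.0, 1.0], [0.8, 0.2], 1),  # human-written
    ([1.0, 0.0], [0.1, 0.9], 0),  # machine-generated
    ([0.0, 1.0], [0.2, 0.8], 0),  # machine-generated
]

if __name__ == "__main__":
    w, b = train(EXAMPLES, dim=4)
    for image_vec, caption_vec, label in EXAMPLES:
        print(label, round(score(w, b, image_vec, caption_vec), 3))
```

The classifier's output probability plays the role of the caption score in this sketch: a caption the classifier cannot tell apart from a human one scores high. The data augmentation mentioned in the abstract would correspond to adding extra label-0 examples to `EXAMPLES`; the preview is cut off before it says how those examples are constructed.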

