cvpr18-learning to evaluate image学习以评估图像字幕.pdf
Learning to Evaluate Image Captioning
Yin Cui1,2  Guandao Yang1  Andreas Veit1,2  Xun Huang1,2  Serge Belongie1,2
1 Department of Computer Science, Cornell University  2 Cornell Tech
[Figure: Labeled training examples (e.g. "a yellow bird sitting on a skateboard on a blue blanket", "a close up of a holding a banana") are used to train a learned critique, in which a CNN image representation and an LSTM caption representation feed a binary classification. In caption evaluation, each candidate caption receives a caption score, e.g. "a cat is watching a television on a television": 0.1.]
Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified. For example, the newly proposed SPICE correlates well with human judgments, but fails to capture the syntactic structure of a sentence. To address these two challenges, we propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. In addition, we further propose a data augmentation scheme to explicitly incorporate pathological transformations as nega-