ReSearch: Using Reinforcement Learning to Let Large Language Models (LLMs) … (2503.19470v2-translate-chinese-5c13.pdf)

Published: 2025-04-04, approx. 32,500 characters, 13 pages
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen¹, Tianpeng Li¹, Haoze Sun¹, Yijie Zhou¹, Chenzheng Zhu¹,
Haofen Wang², Jeff Z. Pan³, Wen Zhang⁴, Huajun Chen⁴,
Fan Yang¹∗, Zenan Zhou¹, Weipeng Chen¹

¹Baichuan Inc. ²Tongji University ³The University of Edinburgh ⁴Zhejiang University

{chenmingyang,yangfan}@

/Agent-RL/ReSearch

Abstract

LLMs … OpenAI-o1 … DeepSeek-R1 … ReSearch … LLMs … ReSearch … Qwen2.5-7B(-Instruct) … Qwen2.5-32B(-Instruct) … ReSearch … ReSearch

1 Introduction

LLMs [1,4,9,27] … LLMs … [3,10,14,17] … RAG [2,5,26,30] … RAG … RAG … [15,21,23] … LLMs [25,29] … OpenAI-o1 [12] … DeepSeek-R1 [4] … LLMs [11,19] … RAG … LLMs … RAG

∗Corresponding author

[Figure: bar chart comparing ReSearch-Qwen-32B-Instruct, ReSearch-Qwen-32B, Iter-RetGen, IRCoT, Naive RAG, and Naive Generation. The vertical axis label ends in "Judge (%)"; the preview cuts off before the remaining labels and caption.]
