ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen¹, Tianpeng Li¹, Haoze Sun¹, Yijie Zhou¹, Chenzheng Zhu¹,
Haofen Wang², Jeff Z. Pan³, Wen Zhang⁴, Huajun Chen⁴,
Fan Yang¹∗, Zenan Zhou¹, Weipeng Chen¹
¹Baichuan Inc. ²Tongji University ³The University of Edinburgh ⁴Zhejiang University
{chenmingyang,yangfan}@
/Agent-RL/ReSearch
Abstract
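Large Language Models (LLMs) have shown remarkable reasoning capabilities, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search remains challenging, especially for complex multi-hop questions that require multiple retrieval steps. We propose ReSearch, a framework that trains LLMs to Reason with Search via reinforcement learning, without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain: text-based thinking guides when and how to search, and search results in turn influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Although trained on only one dataset, our models generalize well across various benchmarks. Analysis shows that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction during reinforcement learning.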
1 Introduction
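In recent years, LLMs have demonstrated remarkable performance across a wide range of tasks [1, 4, 9, 27]. Beyond the internal knowledge acquired during pretraining, LLMs can use external tools, particularly search engines, to retrieve factual and time-sensitive information and thereby reduce hallucination [3, 10, 14, 17]. This capability, often called Retrieval-Augmented Generation (RAG), has been studied extensively [2, 5, 26, 30]. Despite the effectiveness of RAG, designing robust multi-step RAG strategies for complex real-world problems remains a significant challenge, since many such problems inherently require several steps of reasoning [15, 21, 23].

The past year has seen considerable progress in the reasoning abilities of LLMs, particularly through chain-like reasoning before the final output is produced [25, 29], as exemplified by OpenAI-o1 [12] and DeepSeek-R1 [4]. These developments highlight the importance of test-time scaling in reasoning, which lets LLMs decompose intricate problems into manageable intermediate steps [11, 19]. This reasoning capacity is also vital for RAG, especially for complex questions that require multiple retrieval steps. Nevertheless, training LLMs to reason interactively alongside information retrieval remains an open challenge.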
∗Corresponding author
[Figure: bar chart comparing ReSearch-Qwen-32B-Instruct, ReSearch-Qwen-32B, IRCoT, Naive RAG, Iter-RetGen, and Naive Generation. Y-axis: LLM-as-a-Judge (%).]