Posted: 2025-04-16, about 55,500 characters, 11 pages
A Review of DeepSeek Models' Key Innovative Techniques

Chengen Wang    Murat Kantarcioglu

University of Texas at Dallas    Virginia Tech

chengen.wang@    muratk@

arXiv:…6v1 [cs.LG] 14 Mar 2025

Abstract

DeepSeek-V3 and DeepSeek-R1 are leading open-source Large Language Models (LLMs) for general-purpose tasks and reasoning, achieving performance comparable to state-of-the-art closed-source models from companies like OpenAI and Anthropic, while requiring only a fraction of their training costs. Understanding the key innovative techniques behind DeepSeek's success is crucial for advancing LLM research. In this paper, we review the core techniques driving the remarkable effectiveness and efficiency of these models, including refinements to the transformer architecture, innovations such as Multi-Head Latent Attention and Mixture of Experts, Multi-Token Prediction, the co-design of algorithms, frameworks, and hardware, the Group Relative Policy Optimization algorithm, post-training with pure reinforcement learning, and iterative training alternating between supervised fine-tuning and reinforcement learning. Additionally, we identify several open questions and highlight potential research opportunities in this rapidly advancing field.

Keywords: DeepSeek, Multi-Head Latent Attention, Mixture of Experts, Group Relative Policy Optimization (GRPO)

1 Introduction
