Hive高级编程-weibo-大数据文档资料.docx

Published: 2025-02-20 · approx. 11,000 characters · 71 pages

Advanced Hive Programming

天照

Agenda

- Hive Components
- MapReduce
- HiveQL
- Hive Optimization
- SQL Optimization

HIVE: Components

[Architecture diagram; only the component labels survive: HDFS, MapReduce, Web UI, Hive CLI, Browsing, DDL, Queries, Thrift API, MetaStore, Parser, Planner, Optimizer, Execution, DB, SerDe (Thrift, CSV, JSON, ...). The slide also carries the label "Facebook".]

(Simplified) MapReduce Review

Stages: Local Map -> Global Shuffle -> Local Sort -> Local Reduce

Stage                  Machine 1                     Machine 2
Input                  k1,v1  k2,v2  k3,v3           k4,v4  k5,v5  k6,v6
After Local Map        nk1,nv1  nk2,nv2  nk3,nv3     nk2,nv4  nk2,nv5  nk1,nv6
After Global Shuffle   nk1,nv1  nk3,nv3  nk1,nv6     nk2,nv4  nk2,nv5  nk2,nv2
After Local Sort       nk1,nv1  nk1,nv6  nk3,nv3     nk2,nv4  nk2,nv5  nk2,nv2
After Local Reduce     nk1,2  nk3,1                  nk2,3
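The four stages on this slide can be sketched in a few lines of Python. This is a minimal, single-process illustration and not how Hadoop itself is implemented. The names `run_mapreduce`, `LOOKUP`, `count_values`, and `slide_partition` are hypothetical, and the partitioner is hard-coded so that keys land on the same machines as in the slide.

```python
from itertools import groupby

# Hypothetical mapping from input keys to the new (key, value) pairs on the slide.
LOOKUP = {
    "k1": ("nk1", "nv1"), "k2": ("nk2", "nv2"), "k3": ("nk3", "nv3"),
    "k4": ("nk2", "nv4"), "k5": ("nk2", "nv5"), "k6": ("nk1", "nv6"),
}

def map_fn(record):
    key, _value = record
    yield LOOKUP[key]

def count_values(key, values):
    # The slide's reducer emits the number of values seen per key.
    yield (key, len(values))

def slide_partition(key, num_reducers):
    # Assumption: nk2 goes to machine 2, everything else to machine 1.
    return 1 if key == "nk2" else 0

def run_mapreduce(splits, map_fn, reduce_fn, partition_fn, num_reducers=2):
    # Local Map: every machine maps only its own input split.
    map_outputs = [[pair for rec in split for pair in map_fn(rec)] for split in splits]
    # Global Shuffle: route each pair to the reducer that owns its key.
    partitions = [[] for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            partitions[partition_fn(key, num_reducers)].append((key, value))
    results = []
    for pairs in partitions:
        # Local Sort: bring equal keys next to each other (stable sort).
        pairs.sort(key=lambda kv: kv[0])
        # Local Reduce: one reduce call per distinct key.
        reduced = []
        for key, group in groupby(pairs, key=lambda kv: kv[0]):
            reduced.extend(reduce_fn(key, [v for _, v in group]))
        results.append(reduced)
    return results

splits = [
    [("k1", "v1"), ("k2", "v2"), ("k3", "v3")],  # Machine 1
    [("k4", "v4"), ("k5", "v5"), ("k6", "v6")],  # Machine 2
]
result = run_mapreduce(splits, map_fn, count_values, slide_partition)
print(result)  # [[('nk1', 2), ('nk3', 1)], [('nk2', 3)]]
```

The printed result matches the "After Local Reduce" values on the slide: `nk1,2` and `nk3,1` on machine 1, `nk2,3` on machine 2.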

HiveQL - Join

page_view
pageid  userid  time
1       111     9:08:01
2       111     9:08:13
1       222     9:08:14

X

user
userid  age  gender
111     25   female
222     32   male

=

pv_users
pageid  age
1       25
2       25
1       32

SQL:

INSERT INTO TABLE pv_users
SELECT pv.pageid, u.age
FROM page_view pv JOIN user u ON (pv.userid = u.userid);

HiveQL - Join in MapReduce

page_view
pageid  userid  time
1       111     9:08:01
2       111     9:08:13
1       222     9:08:14

user
userid  age  gender
111     25   female
222     32   male

Map

Output from page_view:
key  value
111  1,1
111  1,2
222  1,1

Output from user:
key  value
111  2,25
222  2,32

Shuffle Sort

Reduce

Input for key 111:
key  value
111  1,1
111  1,2
111  2,25

Input for key 222:
key  value
222  1,1
222  2,32
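The dataflow above can be traced with a short Python sketch of a reduce-side join. It assumes that the first component of each map value is a table tag (1 for page_view, 2 for user) and the second is the column carried forward (pageid or age); that reading fits the values on the slide but is not stated in the extracted text. All function names are hypothetical.

```python
from collections import defaultdict

PAGE_VIEW_TAG = 1  # assumed tag for rows coming from page_view
USER_TAG = 2       # assumed tag for rows coming from user

page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
user = [(111, 25, "female"), (222, 32, "male")]

def map_page_view(row):
    pageid, userid, _time = row
    return (userid, (PAGE_VIEW_TAG, pageid))  # join key is userid

def map_user(row):
    userid, age, _gender = row
    return (userid, (USER_TAG, age))

def shuffle_sort(pairs):
    # Group values by join key; sorting puts page_view rows (tag 1) first.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}

def reduce_join(values):
    # Pair every pageid with every age that shares the same userid.
    pageids = [v for tag, v in values if tag == PAGE_VIEW_TAG]
    ages = [v for tag, v in values if tag == USER_TAG]
    return [(pageid, age) for pageid in pageids for age in ages]

map_output = [map_page_view(r) for r in page_view] + [map_user(r) for r in user]
grouped = shuffle_sort(map_output)
pv_users = [row for values in grouped.values() for row in reduce_join(values)]

print(grouped)   # {111: [(1, 1), (1, 2), (2, 25)], 222: [(1, 1), (2, 32)]}
print(pv_users)  # [(1, 25), (2, 25), (1, 32)]
```

The grouped values reproduce the reducer inputs listed above, and the final rows match the pv_users table from the HiveQL - Join slide.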

HiveQL - Group By

pv_users
pageid  age
1       25
2       25
1       32
2       25

pageid_age_sum
pageid  age  Count
1       25   1
2       25   2
1       32   1

SQL:

INSERT INTO TABLE pageid_age_sum
SELECT pageid, age, count(1)
FROM pv_users
GROUP BY pageid, age;

HiveQL - Group By in MapReduce

pv_users (first split)
pageid  age
1       25
2       25

pv_users (second split)
pageid  age
1       32
2       25

Map

Output of the first split:
key   value
1,25  1
2,25  1

Output of the second split:
key   value
1,32  1
2,25  1

Shuffle Sort

Reduce

Input of the first reducer:
key   value
1,25  1
1,32  1

Input of the second reducer:
key   value
2,25  1
2,25  1
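A small Python sketch of the same GROUP BY plan: each mapper emits the grouping columns as the key with a count of 1, and each reducer sums the counts per key. The partitioner (by pageid) is an assumption chosen so that keys land on the same reducers as in the slide. `map_row`, `partition`, and `run_group_by` are hypothetical names.

```python
from collections import defaultdict

# The two pv_users input splits from the slide, as (pageid, age) rows.
splits = [
    [(1, 25), (2, 25)],
    [(1, 32), (2, 25)],
]

def map_row(row):
    pageid, age = row
    return ((pageid, age), 1)  # key = GROUP BY columns, value = partial count

def partition(key, num_reducers):
    # Assumption: pageid 1 goes to the first reducer, pageid 2 to the second.
    pageid, _age = key
    return (pageid - 1) % num_reducers

def run_group_by(splits, num_reducers=2):
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    # Map + Shuffle: send each (key, 1) pair to the reducer that owns the key.
    for split in splits:
        for row in split:
            key, value = map_row(row)
            reducers[partition(key, num_reducers)][key].append(value)
    # Sort + Reduce: sum the counts of each key, reducer by reducer.
    output = []
    for groups in reducers:
        for key in sorted(groups):
            pageid, age = key
            output.append((pageid, age, sum(groups[key])))
    return output

pageid_age_sum = run_group_by(splits)
print(pageid_age_sum)  # [(1, 25, 1), (1, 32, 1), (2, 25, 2)]
```

The rows are the same as in the pageid_age_sum table on the HiveQL - Group By slide; only the order differs, because the output here is concatenated reducer by reducer.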

HiveQL-Group
