Python Web Crawling and Data Collection: Data Storage.pptx
Chapter 3
Data Storage
Contents
1. Files in Python
2. Strings
3. Python and Images
4. CSV Files
5. Using Databases
6. Other Document Types
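1. Files in Python
The open() built-in function
In open(), the first argument is the file path and the second is the mode string
File objects
Serialization
The pickle module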
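As a minimal sketch of the points above, the following writes a small text file and reads it back. The file name demo.txt and the temporary directory are assumptions made for illustration and do not come from the slides.

```python
import os
import tempfile

# Hypothetical file location, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# First argument: the file path; second argument: the mode string.
# "w" opens the file for writing text and truncates any existing content.
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.write("second line\n")

# "a" appends to the end of the file instead of truncating it.
with open(path, "a", encoding="utf-8") as f:
    f.write("third line\n")

# "r" opens the file for reading text; a file object can be iterated line by line.
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)     # ['first line', 'second line', 'third line']
print(f.closed)  # True: the with statement closed the file object
```

Adding "b" to the mode string (for example "wb" or "rb") switches the file object to binary mode, which is what pickle needs.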
import pickle

l1 = [1, 3, 5, 7]
with open('l1.pkl', 'wb') as f1:
    pickle.dump(l1, f1)  # serialize
with open('l1.pkl', 'rb') as f2:
    l2 = pickle.load(f2)  # deserialize
print(l2)  # [1, 3, 5, 7]
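Serialization does not have to go through a file. As a small sketch, pickle.dumps() and pickle.loads() perform the same round trip in memory. The record dictionary below is made-up sample data.

```python
import pickle

# Made-up sample data for illustration.
record = {"name": "mike", "scores": [1, 3, 5, 7]}

blob = pickle.dumps(record)    # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize back into a Python object

print(type(blob).__name__)  # bytes
print(restored == record)   # True: same contents
print(restored is record)   # False: a new, independent object
```

Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.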
2. Strings
s1 = 'mike'
s2 = 'miKE'
print(s1.capitalize())  # Mike
print(s2.capitalize())  # Mike
s1 = 'aaabb'
print(s1.count('a'))  # 3
print(s1.count('a', 2, len(s1)))  # 1
print(s1.endswith('bb'))  # True
print(s1.startswith('aa'))  # True
cities_str = ['Beijing', 'Shanghai', 'Nanjing', 'Shenzhen']
# A more complex usage: startswith() also accepts a tuple of prefixes
print([cityname for cityname in cities_str if cityname.startswith(('S', 'N'))])
# ['Shanghai', 'Nanjing', 'Shenzhen']
print(s1.find('aa'))  # 0
print(s1.index('aa'))  # 0
print(s1.find('c'))  # -1
# print(s1.index('c'))  # raises ValueError
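2. Strings (continued)
s1 = 'aaabb'
s2 = 'miKE'
cities_str = ['Beijing', 'Shanghai', 'Nanjing', 'Shenzhen']
print('There are some cities: ' + ', '.join(cities_str))
# There are some cities: Beijing, Shanghai, Nanjing, Shenzhen
print(s1.partition('b'))  # ('aaa', 'b', 'b')
print(s1.replace('b', 'c', 1))  # aaacb
print(s1.replace('b', 'c', 2))  # aaacc
print(s1.replace('b', 'c'))  # aaacc
print(s2.split('K'))  # ['mi', 'E']
s3 = ' a abc c '
# Quotes in the next three comments only mark where the whitespace is
print(s3.strip())  # 'a abc c'
print(s3.lstrip())  # 'a abc c '
print(s3.rstrip())  # ' a abc c'
# The most common way to use format()
print('{} is a {}'.format('He', 'Boy'))  # He is a Boy
# Specify argument indices
print('{1} is a {0}'.format('Boy', 'He'))  # He is a Boy
# Use argument names
print('{who} is a {what}'.format(who='He', what='boy'))  # He is a boy
print(s2.lower())  # mike
print(s2.upper())  # MIKE; note that this method differs from capitalize()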
3. Python and Images
PIL (Python Imaging Library)
Pillow
3. Python and Images (continued)
OpenCV (Open Source Computer Vision Library)
cv2
Install with the Homebrew package manager
4. CSV Files
Using the csv library
The writerow() and writerows() methods
Reading CSV online
from urllib.request import urlopen
from io import StringIO
import csv

# Download the CSV file and decode the bytes into text
data = urlopen('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv').read().decode()
dataFile = StringIO(data)  # wrap the text in a file-like object
dictReader = csv.DictReader(dataFile)
print(dictReader.fieldnames)
for row in dictReader:
    print(row)
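The difference between writerow() and writerows(), listed on this slide, can be sketched as follows. The header and rows are made-up sample data, and an in-memory StringIO buffer stands in for a real file so the example has no side effects.

```python
import csv
from io import StringIO

# Made-up sample rows for illustration.
header = ["name", "abbreviation"]
rows = [["Alabama", "AL"], ["Alaska", "AK"]]

# The buffer stands in for a file opened with open(path, "w", newline="").
buffer = StringIO()
writer = csv.writer(buffer)
writer.writerow(header)  # writes a single row
writer.writerows(rows)   # writes every row from an iterable of rows

text = buffer.getvalue()
print(text)

# Read the rows back to confirm the round trip.
parsed = list(csv.reader(StringIO(text)))
print(parsed)
```

When writing to a real file, the csv documentation recommends opening it with newline="" so that the writer controls the line endings itself.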
MySQL
Using PyMySQL
SQLite3
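A minimal sketch of storing crawled records with the standard-library sqlite3 module is shown below. The pages table, its columns, and the example.com rows are hypothetical and used only for illustration. SQLite is used here because it needs no server. PyMySQL follows the same connect, cursor, execute, commit pattern, but requires a running MySQL server and uses %s rather than ? as its placeholder.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table for crawled pages.
cur.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")

# Parameter placeholders (?) keep values out of the SQL string itself.
cur.executemany(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    [
        ("https://example.com/a", "Page A"),
        ("https://example.com/b", "Page B"),
    ],
)
conn.commit()

cur.execute("SELECT id, url, title FROM pages WHERE title = ?", ("Page B",))
result = cur.fetchall()
print(result)  # [(2, 'https://example.com/b', 'Page B')]

conn.close()
```

Passing a file path instead of ":memory:" to sqlite3.connect() would keep the data on disk between runs.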