Python web scraping: requests
2021/4/20 1:25:32
Python scraping with requests (part 1)
Method 2: with open
import requests

url = 'https://image.baidu.com/search/detail?ct=503316480&z=0&ipn=d&word=python%20%E5%9C%A8%E5%BA%93%E9%87%8C%E9%9D%A2%E5%AE%89%E8%A3%85json&step_word=&hs=0&pn=0&spn=0&di=3200&pi=0&rn=1&tn=baiduimagedetail&is=0%2C0&istype=0&ie=utf-8&oe=utf-8&in=&cl=2&lm=-1&st=undefined&cs=3292127761%2C2561460082&os=1102035953%2C4035766811&simid=0%2C0&adpicid=0&lpn=0&ln=1195&fr=&fmq=1618834820344_R&fm=&ic=undefined&s=undefined&hd=undefined&latest=undefined&copyright=undefined&se=&sme=&tab=0&width=undefined&height=undefined&face=undefined&ist=&jit=&cg=&bdtype=15&oriquery=&objurl=https%3A%2F%2Fgimg2.baidu.com%2Fimage_search%2Fsrc%3Dhttp%3A%2F%2Fimg-blog.csdnimg.cn%2F20200526170234488.png%3Fx-oss-process%3Dimage%2Fwatermark%2Ctype_ZmFuZ3poZW5naGVpdGk%2Cshadow_10%2Ctext_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ZpY3RvcnkwOTQz%2Csize_16%2Ccolor_FFFFFF%2Ct_70%26refer%3Dhttp%3A%2F%2Fimg-blog.csdnimg.cn%26app%3D2002%26size%3Df9999%2C10000%26q%3Da80%26n%3D0%26g%3D0n%26fmt%3Djpeg%3Fsec%3D1621426988%26t%3D12927a841d449e0a8c00666fac19a8cb&fromurl=ippr_z2C%24qAzdH3FAzdH3Fooo_z%26e3Bvf1g_z%26e3BgjpAzdH3F2wpij6_dcAzdH3FMpTwM2afNDvyM3QpY4xeZoOaOaOOaOaO_z%26e3Bip4s&gsm=1&rpstart=0&rpnum=0&islist=&querylist=&force=undefined'
res = requests.get(url)
# Method 1: open / write / close by hand
# fn = open('code.png', 'wb')
# fn.write(res.content)
# fn.close()
# Method 2: with open closes the file automatically
with open('code2.jpg', 'wb') as f:
    f.write(res.content)
Method 3: urlretrieve
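urlretrieve downloads a URL straight to a file in one call, replacing the manual open/write pattern above. A minimal sketch; the wrapper function `download` and the URL in the usage comment are illustrative, not part of the original code:

```python
import urllib.request

def download(url, filename):
    # urlretrieve fetches the URL and writes the response body to `filename`
    urllib.request.urlretrieve(url, filename)

# usage (placeholder URL):
# download('https://example.com/picture.jpg', 'code3.jpg')
```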
1. urllib: basic image scraping
Using urllib.request
Byte-stream conversion: bytes → str with decode()
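The conversion works in both directions; a quick stdlib-only illustration:

```python
# str -> bytes with encode(), bytes -> str with decode()
data = '百度'.encode('utf-8')   # bytes: b'\xe7\x99\xbe\xe5\xba\xa6'
text = data.decode('utf-8')     # back to str; this is what read().decode('utf-8') does
print(type(data), type(text))
```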
import urllib.request

url = 'https://www.baidu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
# build the request object with custom headers
req = urllib.request.Request(url, headers=headers)
# send the request and get the response object
res = urllib.request.urlopen(req)
# read() returns bytes; decode('utf-8') converts them to str
html = res.read().decode('utf-8')
print(html)
Using urllib.parse: handling Chinese characters in URLs
urllib.parse.urlencode(dict)
# url = https://www.baidu.com/s?wd=%E8%BF%AA%E5%8D%A2%E6%9C%A8%E5%A4%9A
# url = https://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&word=迪卢姆多
import urllib.parse

wd = {'wd': '迪卢姆多'}
result = urllib.parse.urlencode(wd)
print(result)
base_url = 'https://www.baidu.com/s?'
url = base_url + result
print(url)
urllib.parse.quote(str)
import urllib.parse

r = '迪卢姆多'
result = urllib.parse.quote(r)
print(result)  # percent-encoded form
base_url = 'https://www.baidu.com/s?'
url = base_url + result
print(url)
Extension: the percent signs in such URLs can be decoded back with urllib.parse.unquote.
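unquote reverses the percent-encoding produced by quote and urlencode; a quick sketch:

```python
import urllib.parse

encoded = urllib.parse.quote('迪卢姆多')
decoded = urllib.parse.unquote(encoded)
print(encoded)   # percent-encoded form, e.g. %E8%BF%AA...
print(decoded)   # the original string again
```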
Task: read a search query from input, then print and save the resulting HTML.
urlencode takes a dict; quote takes a string.
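The difference can be seen side by side:

```python
import urllib.parse

# urlencode: dict -> 'key=value' query string (values percent-encoded)
print(urllib.parse.urlencode({'wd': 'python'}))   # wd=python
# quote: plain string -> percent-encoded string (no key= prefix)
print(urllib.parse.quote('python 爬虫'))
```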
# Task: search for user-supplied content, print the result, and save it as a local HTML file
import urllib.request
import urllib.parse

base_url = 'https://www.baidu.com/s?'
key = input('Enter a search term: ')
params = {'wd': key}
key1 = urllib.parse.urlencode(params)
url = base_url + key1
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
req = urllib.request.Request(url, headers=headers)
res = urllib.request.urlopen(req)
html = res.read().decode('utf-8')
with open('search_result.html', 'w', encoding='utf-8') as f:
    f.write(html)
quote:
# Task: replace the hard-coded search term with arbitrary input (quote version)
key = input('Enter a search term: ')
result = urllib.parse.quote(key)
base_url = 'https://www.baidu.com/s?wd='
url = base_url + result
print(url)
# Task: replace the hard-coded search term with arbitrary input
import urllib.request
import urllib.parse

key = input('Enter a search term: ')
wd = {'wd': key}
result = urllib.parse.urlencode(wd)
base_url = 'https://www.baidu.com/s?'
url = base_url + result
# print(url)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
req = urllib.request.Request(url, headers=headers)
res = urllib.request.urlopen(req)
html = res.read().decode('utf-8')
with open('search.html', 'w', encoding='utf-8') as f:
    f.write(html)
Exercise: scraping Baidu Tieba pages to HTML
import urllib.request
import urllib.parse

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
name = input('Enter a Tieba forum name: ')
begin = int(input('Enter the start page: '))
end = int(input('Enter the end page: '))
kw = {'kw': name}
result = urllib.parse.urlencode(kw)
# build the URL for each page: Tieba lists 50 posts per page, so the offset is (page - 1) * 50
for i in range(begin, end + 1):
    pn = (i - 1) * 50
    base_url = 'https://tieba.baidu.com/f?'
    url = base_url + result + '&pn=' + str(pn)
    req = urllib.request.Request(url, headers=headers)
    res = urllib.request.urlopen(req)
    html = res.read().decode('utf-8')
    filename = 'page_' + str(i) + '.html'
    print('Scraping page %d' % i)
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(html)
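The URL arithmetic in that loop can be pulled out and checked without touching the network; a sketch, where `tieba_url` is a helper name introduced here and not part of the original code:

```python
import urllib.parse

def tieba_url(name, page):
    # Tieba lists 50 posts per page, so a 1-based page number maps to offset (page - 1) * 50
    pn = (page - 1) * 50
    return 'https://tieba.baidu.com/f?' + urllib.parse.urlencode({'kw': name}) + '&pn=' + str(pn)

print(tieba_url('python', 1))   # ...&pn=0
print(tieba_url('python', 3))   # ...&pn=100
```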