Python Web Scraping: Storing Data in TXT, JSON, and CSV Files (Part 5)
2022/9/4 14:24:24
1. TXT file storage
Saving data to a TXT file is very simple, and TXT files can be read on almost any platform. The drawback is that plain text is hard to search. If you do not need much in the way of retrieval or data structure and convenience comes first, TXT storage is a reasonable choice. This section shows how to save data to a TXT file with Python.

Code example:

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 1
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()


    def save_txt():
        # Open the file once in append mode and write one block of text per item
        with open('data.txt', 'a', encoding='utf-8') as file:
            for item in items:
                url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
                contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
                span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                    '.ExploreCollectionCard-contentCountTag').text()
                # attr() returns None when the attribute is missing, so fall back to ''
                file.write('\n'.join([url or '', contentExcerpt, span_txt]))
                # Separate items with a divider line
                file.write('\n' + '=' * 50 + '\n')


    if __name__ == '__main__':
        save_txt()
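The trade-off described above (easy to write, awkward to search) can be seen in a small offline sketch. The records, field names, and the helpers `save_txt` and `search_txt` below are hypothetical stand-ins for scraped items, not part of the original code; the file is written to a temporary directory so nothing is fetched from the network.

```python
import os
import tempfile

# Hypothetical records standing in for scraped items (not real data).
records = [
    {"url": "https://example.com/q/1", "excerpt": "How do generators work?", "tag": "12 answers"},
    {"url": "https://example.com/q/2", "excerpt": "What is a decorator?", "tag": "7 answers"},
]


def save_txt(path, records):
    """Write each record as three lines followed by a divider line."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write("\n".join([r["url"], r["excerpt"], r["tag"]]))
            f.write("\n" + "=" * 50 + "\n")


def search_txt(path, keyword):
    """Scan the file line by line; plain text offers no better lookup."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if keyword in line]


path = os.path.join(tempfile.mkdtemp(), "data.txt")
save_txt(path, records)
print(search_txt(path, "decorator"))
```

Writing is a one-liner per record, but the search only finds matching lines. Recovering which URL a matching excerpt belongs to would need extra parsing of the divider layout, which is the kind of structure JSON and CSV provide directly.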
2. JSON file storage
JSON, short for JavaScript Object Notation, represents data through combinations of objects and arrays. Its syntax is concise yet highly structured, which makes it a lightweight data-interchange format. This section shows how to save data to a JSON file with Python.

Code example:

    import json

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 1
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()


    def save_json():
        objs = []
        for item in items:
            url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
            contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
            span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                '.ExploreCollectionCard-contentCountTag').text()
            data = {
                "url": url,
                "contentExcerpt": contentExcerpt,
                "span_txt": span_txt
            }
            objs.append(data)
        print(objs)
        # Write the whole list as one JSON array. Mode 'w' is used because
        # appending a second array to the same file would make it invalid JSON.
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
        with open('data.json', 'w', encoding='utf-8') as file:
            file.write(json.dumps(objs, ensure_ascii=False, indent=2))
        # Alternative: append one JSON object per line instead of one array:
        # file.write(json.dumps(data, ensure_ascii=False) + '\n')


    if __name__ == '__main__':
        save_json()
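As a minimal offline sketch of the same idea, the snippet below saves a list of dictionaries with `json.dump` and reads it back with `json.load`, and also shows what `ensure_ascii` changes. The sample record and the helper names `save_json` and `load_json` are assumptions made for illustration; only the `json` calls themselves come from the standard library.

```python
import json
import os
import tempfile

# Hypothetical record standing in for one scraped item (not real data).
records = [
    {"url": "https://example.com/q/1", "contentExcerpt": "你好", "span_txt": "3 answers"},
]


def save_json(path, records):
    """Write the whole list as one JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read the array back into Python lists and dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "data.json")
save_json(path, records)
loaded = load_json(path)

# With the default ensure_ascii=True, non-ASCII characters are escaped.
escaped = json.dumps("你好")
# With ensure_ascii=False they are written as-is.
readable = json.dumps("你好", ensure_ascii=False)
print(escaped, readable)
```

Both forms decode to the same string, so `ensure_ascii=False` is a readability choice for files that people will open, provided the file is written with a UTF-8 encoding.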
3. CSV file storage
CSV, short for Comma-Separated Values (sometimes called character-separated values), stores tabular data as plain text.

Code example:

    import csv

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 1
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()


    def save_csv():
        # Open the file once; newline='' stops the csv module from
        # producing blank lines between rows on Windows
        with open('data.csv', 'a', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            for item in items:
                url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
                contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
                span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                    '.ExploreCollectionCard-contentCountTag').text()
                # One row per item: link, excerpt, count tag
                writer.writerow([url, contentExcerpt, span_txt])


    if __name__ == '__main__':
        save_csv()
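The example above writes bare rows without a header. As a hedged variation, the sketch below uses the standard library's `csv.DictWriter` to add a header row and `csv.DictReader` to read the rows back as dictionaries. The sample row, the `FIELDS` list, and the helper names `save_csv` and `load_csv` are assumptions for illustration and run offline against a temporary file.

```python
import csv
import os
import tempfile

# Column names reused from the scraping example; the row itself is made up.
FIELDS = ["url", "contentExcerpt", "span_txt"]
rows = [
    {"url": "https://example.com/q/1",
     "contentExcerpt": "Text, with a comma",
     "span_txt": "5 answers"},
]


def save_csv(path, rows):
    """Write a header row followed by one line per record."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path):
    """Read the file back; each row becomes a dict keyed by the header."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


path = os.path.join(tempfile.mkdtemp(), "data.csv")
save_csv(path, rows)
print(load_csv(path))
```

The excerpt contains a comma on purpose: the `csv` module quotes such fields when writing and unquotes them when reading, which is the main reason to prefer it over joining values with `','` by hand. Note that every value comes back as a string.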