Python: what [x for ok, x in results if ok] means in the item_completed method of scrapy.pipelines.images

2022/4/6 17:20:17

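While debugging today I ran into another problem: to download images with Scrapy, you need to override the item_completed method of the ImagesPipeline class.
The code in the book is as follows: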

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem('Image Downloaded Failed')
        return item

Roughly speaking, it collects the image paths, but the x and ok in [x['path'] for ok, x in results if ok] baffled me. And where does path come from?

In the Scrapy source, the item_completed method of the parent class contains a similar comprehension:

    def item_completed(self, results, item, info):
        if isinstance(item, dict) or self.images_result_field in item.fields:
            item[self.images_result_field] = [x for ok, x in results if ok]
        return item

So what are x and ok?

Going one level further up, the top-level base class MediaPipeline gave me a clue. Its item_completed method looks like this:

    def item_completed(self, results, item, info):
        """Called per item when all media requests has been processed"""
        if self.LOG_FAILED_RESULTS:
            for ok, value in results:
                if not ok:
                    logger.error(
                        '%(class)s found errors processing %(item)s',
                        {'class': self.__class__.__name__, 'item': item},
                        exc_info=failure_to_exc_info(value),
                        extra={'spider': info.spider}
                    )
        return item

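Could I rewrite the list comprehension as a plain for loop, like the one above, and then step through it to see the values of x and ok?
So I changed my code as follows: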

    def item_completed(self, results, item, info):
        # image_paths = [x['path'] for ok, x in results if ok]
        image_paths = []
        for ok, x in results:
            if ok:
                image_paths.append(x['path'])
        if not image_paths:
            raise DropItem('Image Downloaded Failed')
        return item

On single-step debugging a Scrapy project:
Create a new .py file in the same directory as scrapy.cfg, with the following content:

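    from scrapy.cmdline import execute
    import os
    import sys

    # Make the project directory importable when this script is run directly
    sys.path.append(os.path.dirname(os.path.abspath(__file__)))
    # Call Scrapy's built-in execute function to run the crawl under the debugger;
    # the last argument ('images' here) is the name of the spider
    execute(['scrapy', 'crawl', 'images'])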

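Set a breakpoint on the for loop; the debugger shows the following:
[Figure: debugger view of results]
This makes it clear that results is a list whose elements are tuples. The first element of each tuple is a bool, ok, which indicates whether the download succeeded or failed. For a successful download, the second element, x, is a dict holding the download result for that item, with three key-value pairs: url, path, and checksum.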
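
To make the unpacking concrete, here is a minimal sketch that runs without Scrapy. The results list is hypothetical sample data shaped like what the debugger showed; the URLs, paths, and checksums are made up, and a plain RuntimeError stands in for the failure object that accompanies ok == False in a real run.

```python
# Hypothetical sample data shaped like the `results` argument seen in the debugger.
# All URLs, paths and checksums below are invented for illustration.
results = [
    (True, {"url": "https://example.com/a.jpg",
            "path": "full/a.jpg",
            "checksum": "0123abcd"}),
    # Stand-in for a failed download; a real run carries a failure object here.
    (False, RuntimeError("download failed")),
    (True, {"url": "https://example.com/b.jpg",
            "path": "full/b.jpg",
            "checksum": "4567ef01"}),
]

# Comprehension form: each 2-tuple is unpacked into (ok, x),
# and x['path'] is kept only when ok is True.
paths_comprehension = [x["path"] for ok, x in results if ok]

# The same logic written as an explicit loop.
paths_loop = []
for ok, x in results:
    if ok:
        paths_loop.append(x["path"])

print(paths_comprehension)                # ['full/a.jpg', 'full/b.jpg']
print(paths_loop == paths_comprehension)  # True
```

The `if ok` filter matters: for a failed entry x is not a dict, so indexing x['path'] without the filter would raise an error.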
