A First Look at the Scrapy Framework (Part 2)

  • meta: The initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied. This parameter is also a dict; its purpose is to pass information between the different parse callbacks of a spider class (see the Zhihu article). Note that a Response object also carries the Request object that corresponds to it (Response.request): "The Request object that generated this response. This attribute is assigned in the Scrapy engine, after the response and the request have passed through all Downloader Middlewares." In particular, this means that:
    • HTTP redirections will cause the original request (to the URL before redirection) to be assigned to the redirected response (with the final URL after redirection).
    • Response.request.url doesn't always equal Response.url.
    • This attribute is only available in the spider code and in the Spider Middlewares, but not in Downloader Middlewares (although you have the Request available there by other means) or in handlers of the response_downloaded signal.
    Unlike the Response.request attribute, however, the Response.meta attribute is propagated along redirects and retries, so you will get the original Request.meta sent from your spider.
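  • Response object: only a few attributes of the Response object are covered here:
    • url: the URL this response came from
    • status: the HTTP status code of the response
    • headers: the response headers, as a dict-like (case-insensitive) Headers object
    • body: the response body, as bytes
    • request: the Request object corresponding to the response, introduced above; Response.url may differ from Response.request.url because of redirects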
  • Settings: Settings can be populated using different mechanisms, each of which has a different precedence. Here is the list of them in decreasing order of precedence:
    1. Command line options (most precedence)
    2. Settings per-spider
    3. Project settings module (settings.py)
    4. Default settings per-command
    5. Default global settings (least precedence)
    In practice we usually edit settings.py directly. Common additions and changes include specifying USER_AGENT, uncommenting ITEM_PIPELINES to enable the pipeline, setting LOG_LEVEL and LOG_FILE, and setting ROBOTSTXT_OBEY to False.
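A `settings.py` excerpt covering the changes listed above might look like the sketch below. The project name `myproject`, the pipeline class `MyprojectPipeline` (the name `scrapy startproject myproject` generates in its template), the user-agent string and the log file name are placeholders; the number 300 is the pipeline's order value, and lower values run earlier.

```python
# settings.py (excerpt); "myproject" is a placeholder project name.

# Identify the crawler; this string is an arbitrary example.
USER_AGENT = "Mozilla/5.0 (compatible; myproject-bot/0.1)"

# Uncommented to enable the pipeline; values conventionally range 0-1000,
# and pipelines with lower values process items first.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}

# Only log messages at INFO level or above, and write them to a file instead of stderr.
LOG_LEVEL = "INFO"
LOG_FILE = "scrapy.log"

# Do not download or honor robots.txt.
ROBOTSTXT_OBEY = False
```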
