
Date: 2025/7/9 17:03:56 Source: https://blog.csdn.net/m0_37134868/article/details/143441191

Scraping famous quotes with Scrapy: http://quotes.toscrape.com/

1 Create the crawler project by entering the following in a terminal:

scrapy startproject quotes


2 Once the project is created, add a spider file named quotes.py to the spiders folder with the following content:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class Quotes(CrawlSpider):
    name = "quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ['http://quotes.toscrape.com/']

    rules = (
        # Follow pagination links and parse the quotes on each page.
        Rule(LinkExtractor(allow=r'/page/\d+'), callback='parse_quotes', follow=True),
        # Visit author pages and parse the author's details.
        Rule(LinkExtractor(allow=r'/author/\w+'), callback='parse_author'),
    )

    def parse_quotes(self, response):
        # Each quote sits in an element with the CSS class "quote".
        for quote in response.css('.quote'):
            yield {
                'content': quote.css('.text::text').extract_first(),
                'author': quote.css('.author::text').extract_first(),
                # A quote can carry several tags, so collect all of them.
                'tags': quote.css('.tag::text').extract(),
            }

    def parse_author(self, response):
        name = response.css('.author-title::text').extract_first()
        author_born_date = response.css('.author-born-date::text').extract_first()
        author_born_location = response.css('.author-born-location::text').extract_first()
        author_description = response.css('.author-description::text').extract_first()
        return {
            'name': name,
            'author_born_date': author_born_date,
            'author_born_location': author_born_location,
            'author_description': author_description,
        }
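The two rules decide which links the spider follows and which callback handles each response. As a rough illustration that uses only the standard library rather than Scrapy itself, the sketch below applies the same two patterns with re.search, which is how LinkExtractor tests its allow patterns against a URL, and returns the callback of the first rule that matches. The classify helper and the sample URLs are made up for this example; the URL shapes are assumed to resemble those on the site.

```python
import re

# The two patterns used in the spider's rules.
PAGE_PATTERN = re.compile(r'/page/\d+')
AUTHOR_PATTERN = re.compile(r'/author/\w+')


def classify(url):
    """Hypothetical helper: name the callback of the first rule matching url."""
    # Rules are checked in order, so the pagination rule is tried first.
    if PAGE_PATTERN.search(url):
        return 'parse_quotes'
    if AUTHOR_PATTERN.search(url):
        return 'parse_author'
    return None  # no rule matches, so the link is not followed


# Illustrative URLs (assumed shapes, not taken from a real crawl).
SAMPLE_URLS = [
    'http://quotes.toscrape.com/',
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/author/Albert-Einstein',
    'http://quotes.toscrape.com/tag/love/page/1/',
    'http://quotes.toscrape.com/login',
]

for url in SAMPLE_URLS:
    print(url, '->', classify(url))
```

Because the pattern is searched anywhere in the URL, a tag listing such as /tag/love/page/1/ also matches the pagination rule, so the same quote may be yielded more than once. Anchoring the pattern, for example r'^https?://[^/]+/page/\d+', is one possible way to narrow it.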

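The directory structure is as follows (the screenshot from the original post is not available; this is the standard layout generated by scrapy startproject, plus the new spider file):

quotes/
    scrapy.cfg
    quotes/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            quotes.py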

3 Run the crawler

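From the project directory (the one containing scrapy.cfg), run scrapy crawl quotes in a terminal. With the default log level, Scrapy prints each scraped item to the console as the crawl proceeds.
That completes a simple crawler.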
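The spider yields two kinds of items: quotes from parse_quotes and author records from parse_author. If the crawl is run with Scrapy's -o option and a .jl file name, for example scrapy crawl quotes -o items.jl (the file name is an arbitrary choice), every item is written as one JSON object per line. The sketch below shows one possible way to separate the two kinds afterwards. The split_items helper and the sample records are invented for illustration and are not real crawl output.

```python
import json


def split_items(lines):
    """Hypothetical helper: split JSON-lines records into quotes and authors."""
    quotes, authors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        # Quote items carry a 'content' key; author items carry a 'name' key.
        if 'content' in item:
            quotes.append(item)
        elif 'name' in item:
            authors.append(item)
    return quotes, authors


# Placeholder records shaped like the spider's two item types.
sample = [
    '{"content": "A sample quote.", "author": "Some Author", "tags": ["example"]}',
    '{"name": "Some Author", "author_born_date": "January 1, 1900", '
    '"author_born_location": "in Somewhere", "author_description": "A placeholder biography."}',
]

quotes, authors = split_items(sample)
print(len(quotes), 'quote(s),', len(authors), 'author(s)')
```

To process a real feed, pass an open file object instead of the sample list, for example split_items(open('items.jl', encoding='utf-8')).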
