Node.js Crawler Example

Date: 2025/9/9 21:03:58 Source: https://blog.csdn.net/weixin_47818125/article/details/139228324

1. Installation:

npm install cheerio
npm install axios

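2. Introduction:

2.1 cheerio: features and uses:

HTML parsing and manipulation: Cheerio loads an HTML string into memory and turns it into a DOM tree that can be traversed and modified, which makes HTML documents easy to parse and manipulate.
jQuery-like API: Cheerio provides jQuery-style selectors and methods, so you can work on an HTML document with CSS selectors and DOM operations, for example finding elements, changing attributes, or adding styles.
Lightweight: Unlike jQuery in the browser, Cheerio is a lightweight library meant for server-side Node.js. It parses and manipulates HTML efficiently without running a full browser engine.
Convenient data extraction: Cheerio makes it easy to pull the data you need out of an HTML document, such as page content or parts of the HTML structure, so it is commonly used for web crawlers and data scraping.
Modular: Cheerio combines with other Node.js modules and tools, such as request libraries (Axios, request) and file system operations, to build more complex tasks and features.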

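2.2 Making network requests with axios
2.3 File operations with fs: writing the requested data into a specified folder

Key points:

response.data.pipe(): when the request is made with responseType: 'stream', response.data is a readable stream, and pipe() forwards its data to a writable stream
fs.createWriteStream(): creates a writable stream that writes to the given file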
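The two stream operations can be tried without any network access. The sketch below uses only Node.js built-in modules; an in-memory readable stream stands in for `response.data`, and `saveStream` and `demo` are hypothetical helper names. It uses `pipeline` from `stream/promises` (available in recent Node.js versions) rather than a bare `pipe()`, since `pipeline` settles only after the data has been written and reports errors from either stream.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Write any readable stream to filePath; resolves once the file is written.
const saveStream = async (readable, filePath) => {
  await pipeline(readable, fs.createWriteStream(filePath));
  return filePath;
};

// Stand-in for response.data: a readable stream built from in-memory chunks.
const demo = async () => {
  // A fresh temporary directory keeps repeated runs from colliding.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stream-demo-'));
  const target = path.join(dir, 'out.txt');
  const source = Readable.from([Buffer.from('hello '), Buffer.from('stream')]);
  await saveStream(source, target);
  return fs.readFileSync(target, 'utf8');
};

demo().then((text) => console.log(text));
```

With axios, the same helper could be called as `saveStream(response.data, filePath)`, provided the request was made with `responseType: 'stream'`.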

3. Example:

        const cheerio = require('cheerio');
        const axios = require('axios');
        const fs = require('fs');
        const path = require('path');

        // Browser-like User-Agent sent with the page requests.
        const headers = {
          'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36'
        };

        // Download one image as a stream and write it to filePath.
        const downloadImage = async (url, filePath) => {
          const response = await axios({
            url: url,
            method: 'GET',
            responseType: 'stream'
          });
          const writer = fs.createWriteStream(filePath);
          response.data.pipe(writer);
          return new Promise((resolve, reject) => {
            // Wait for the writer's 'finish' event so the file is fully written.
            writer.on('finish', resolve);
            writer.on('error', reject);
            response.data.on('error', reject);
          });
        };

        const crawler = async (options) => {
          // Make sure the output folder exists before any file is written.
          fs.mkdirSync(path.join(__dirname, 'img'), { recursive: true });
          for (let i = 1; i <= options.page; i++) {
            // Page 1 is the base URL; later pages are list_2.html, list_3.html, ...
            const url = i === 1 ? options.url : `${options.url}list_${i}.html`;
            console.log(url);
            try {
              const response = await axios.get(url, { headers: headers });
              const $ = cheerio.load(response.data);
              const imageElements = $('.pics img');
              imageElements.each((index, element) => {
                const imageUrl = $(element).attr('src');
                if (imageUrl) {
                  const imageName = `${i}-${index}.jpg`;
                  const imagePath = path.join(__dirname, 'img', imageName);
                  // Downloads start in parallel; each one logs its own result.
                  downloadImage(imageUrl, imagePath).then(() => {
                    console.log(`${i} ---- ${index}`, imageUrl, 'Downloaded successfully.');
                  }).catch((error) => {
                    console.error(`${i} ---- ${index}`, imageUrl, 'Download failed. Error:', error);
                  });
                }
              });
            } catch (err) {
              console.error('Error fetching or parsing the page:', err);
            }
          }
        };

        crawler({
          url: 'http://www.duoziwang.com/head/gexing/',
          page: 10
        });
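The page-URL rule used in the example can be pulled out into a small helper and checked on its own. The sketch below does that and also shows one way to resolve a relative `src` against the page address with the built-in `URL` class. `pageUrl` and `absoluteImageUrl` are hypothetical names, and the `/uploads/a.jpg` path is invented for illustration; whether the target site actually serves relative image paths is an assumption, not something the example confirms.

```javascript
// Build the URL of a given page, following the example's pattern:
// page 1 is the base URL, later pages are `${baseUrl}list_${page}.html`.
const pageUrl = (baseUrl, page) =>
  page === 1 ? baseUrl : `${baseUrl}list_${page}.html`;

// Resolve a possibly relative <img src> against the page it was found on.
// An already absolute src is returned unchanged.
const absoluteImageUrl = (src, pageAddress) => new URL(src, pageAddress).href;

const base = 'http://www.duoziwang.com/head/gexing/';
const firstPage = pageUrl(base, 1);
const thirdPage = pageUrl(base, 3);
// '/uploads/a.jpg' is a made-up path used only for illustration.
const resolved = absoluteImageUrl('/uploads/a.jpg', thirdPage);

console.log(firstPage);
console.log(thirdPage);
console.log(resolved);
```

Keeping the URL logic in pure functions like these makes it possible to test pagination without sending any requests.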
