Web Scraping Study Notes 2

Date: 2025/7/9 6:34:38 Source: https://blog.csdn.net/Zero___0_0/article/details/139340757

Chinese National Geography website (dili360.com)

Scraping a single image

import requests

# Direct URL of the image to download
url = 'http://img0.dili360.com/ga/M00/02/AB/wKgBzFQ26i2AWujSAA_-xvEYLbU441.jpg@!rw9'
# User-Agent header so the request identifies itself as a browser
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
}
# .content returns the response body as raw bytes
img_data = requests.get(url=url, headers=headers).content
with open('./img0.jpg', 'wb') as fp:
    fp.write(img_data)

Scraping multiple images

import requests
import re
import os

# Create the output directory if it does not exist yet
if not os.path.exists('./tupian'):
    os.mkdir('./tupian')
# User-Agent header
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
}
url = 'http://www.dili360.com/travel/sight/20400.htm'
page_text = requests.get(url=url, headers=headers).text
# Capture the src of the <img> inside each thumb-img div; re.S lets '.' match newlines
ex = '<div class="thumb-img">.*?<img src="(.*?)".*?</div>'
img_src_list = re.findall(ex, page_text, re.S)
print(img_src_list)
for src in img_src_list:
    img_data = requests.get(url=src).content
    # The file name is the last path segment, without the '@...' suffix
    img_name = src.split('/')[-1]
    img_name = img_name.split('@')[0]
    img_path = './tupian/' + img_name
    with open(img_path, 'wb') as fp:
        fp.write(img_data)
    print(img_name, "success")
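
The regular expression and the file-name handling above can be tried without touching the network. The sketch below runs the same pattern against a made-up HTML fragment; `sample_html`, its image URLs, and the helper `image_name_from_src` are hypothetical and only imitate the structure the pattern expects, so the real page may differ.

```python
import re

# Hypothetical fragment imitating the thumb-img markup (not copied from the site)
sample_html = '''
<div class="thumb-img">
    <a href="/travel/sight/1.htm"><img src="http://img0.dili360.com/ga/M00/02/AB/example.jpg@!rw9" alt=""></a>
</div>
<div class="thumb-img">
    <a href="/travel/sight/2.htm"><img src="http://img0.dili360.com/ga/M00/02/AB/other.jpg@!rw9" alt=""></a>
</div>
'''

# Same pattern as in the scraper; re.S lets '.' match across line breaks
ex = '<div class="thumb-img">.*?<img src="(.*?)".*?</div>'


def image_name_from_src(src):
    """Return the last path segment of src with any '@...' suffix removed."""
    return src.split('/')[-1].split('@')[0]


img_src_list = re.findall(ex, sample_html, re.S)
img_names = [image_name_from_src(src) for src in img_src_list]
print(img_src_list)
print(img_names)
```

Because both quantifiers are lazy (`.*?`), each match stops at the first `</div>` after the image tag, which is why one URL is captured per thumbnail block.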

Scraping multiple pages

import requests
import re
import os

# Create the output directory if it does not exist yet
if not os.path.exists('./tupian'):
    os.mkdir('./tupian')
# User-Agent header
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
}
# URL template; %d is replaced with the page number
url = 'http://www.dili360.com/Travel/sight/20400/%d.htm'
for page_num in range(1, 6):
    new_url = url % page_num
    page_text = requests.get(url=new_url, headers=headers).text
    ex = '<div class="thumb-img">.*?<img src="(.*?)".*?</div>'
    img_src_list = re.findall(ex, page_text, re.S)
    print(img_src_list)
    for src in img_src_list:
        img_data = requests.get(url=src).content
        # The file name is the last path segment, without the '@...' suffix
        img_name = src.split('/')[-1]
        img_name = img_name.split('@')[0]
        img_path = './tupian/' + img_name
        with open(img_path, 'wb') as fp:
            fp.write(img_data)
        print(img_name, "success")
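
As a quick offline check of the pagination logic, the sketch below only builds the page URLs from the template and makes no requests. The list name `page_urls` is introduced here for illustration. Note that `range(1, 6)` stops before 6, so it covers pages 1 through 5.

```python
# URL template from the scraper; %d is filled with the page number
url = 'http://www.dili360.com/Travel/sight/20400/%d.htm'

# range(1, 6) yields 1, 2, 3, 4, 5
page_urls = [url % page_num for page_num in range(1, 6)]
for page_url in page_urls:
    print(page_url)
```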
