
Scraping images from People's Daily Online (people.cn) news detail pages with Python


from lxml import etree
import requests
import os

BASE_DOMAIN = "http://m2.people.cn/"
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    # The HTTP request header is named "Cookie", not "Cookies"
    'Cookie': 'UM_distinctid=1651ee5c46fbf-0945655e26d91c-36465d60-13c680-1651ee5c47072; userid=1534074497776_fx1dg6661; vjuids=-aa0da9193.1652df7482a.0.32b2d7e1c2af1; prov=cn025; city=0516; weather_city=js_xz; region_ip=180.124.143.237; region_ver=1.30; FTAPI_BLOCK_SLOT=FUCKIE; FTAPI_ST=FUCKIE; ALPHA_BLOCK_SLOT=FUCKIE; ALPHA_ST=FUCKIE; ifengRotator_iis3_c=5; Hm_lvt_19e6abe24a3c609065f14647cc8597e8=1534983935; Hm_lpvt_19e6abe24a3c609065f14647cc8597e8=1534984054; Hm_lvt_03ee991a65e88f21553ed82a0ddec689=1534983935; Hm_lpvt_03ee991a65e88f21553ed82a0ddec689=1534984054; CNZZDATA5611988=cnzz_eid%3D1363530859-1534978730-%26ntime%3D1534978730; CNZZDATA1257375399=1876175329-1534980279-%7C1534980279; CNZZDATA1266101310=138415197-1534981003-%7C1534981003; FTAPI_ASD=1; ifengRotator_AP2842=0; CNZZDATA1263371036=570310223-1534983016-%7C1534983016; Hm_lvt_8e88b58d2a5a8144902c43831ed6a673=1534983942; Hm_lpvt_8e88b58d2a5a8144902c43831ed6a673=1534984060; ALPHA_ASD=1; Hm_lvt_a4df3210e63a95bd1fb74a6900965454=1534983942; Hm_lpvt_a4df3210e63a95bd1fb74a6900965454=1534984060; Hm_lvt_f7c75c1aeaf925055eb87d67d6645277=1534983945; Hm_lpvt_f7c75c1aeaf925055eb87d67d6645277=1534984061; Inner_IfengRotator_Ap2270=0; ifengWindowCookieName_Innernews=2; FTAPI_PVC=1013515-2-jl6b0t66|1006766-12-jl6b152i; ALPHA_PVC=180108-12-jl6b17fm|180107-24-jl6b1de8; vjlast=1534074505.1534074505.30'
}
# This script needs adjusting in a few places: 1. the page encoding (gbk or utf-8), 2. the detail-page XPath, 3. the output path and pagination


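def savepic(picurl, mulu):
    """Download every image on the detail page picurl into ./news/<mulu>."""
    # Build the target folder path portably and create it (including ./news) if missing
    savedir = os.path.join(os.getcwd(), 'news', mulu)
    os.makedirs(savedir, exist_ok=True)

    # Fetch the page with the configured headers, then parse it as UTF-8 HTML
    resp = requests.get(picurl, headers=HEADERS)
    resp.raise_for_status()
    parser = etree.HTMLParser(encoding='utf-8')
    html = etree.fromstring(resp.content, parser=parser)
    # Adjust this XPath to match the detail page's markup
    pic_url = html.xpath('//div[@class="content"]/p/img/@src')
    for i, each in enumerate(pic_url):
        print('Downloading: p' + str(i) + ' ' + each)
        pic = requests.get(each, headers=HEADERS)
        # Files are numbered in page order; the with block closes each file
        with open(os.path.join(savedir, str(i) + '.jpg'), 'wb') as fp:
            fp.write(pic.content)
        print(i + 1)  # number of images saved so far
    print('Folder ' + mulu + ' saved!')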


if __name__ == '__main__':
    savepic('http://m2.people.cn/r/MV8wXzExNDg3MjM1XzU4XzE1MzQ5MDI4ODY=', 'people')
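
The configuration comment in the script mentions the page encoding (gbk or utf-8) and the file path. The sketch below shows one way those two points could be handled with the standard library alone. The helper names `decode_html`, `resolve_image_urls`, and `image_filename` are hypothetical and are not part of the original script. The sketch also assumes that some `src` values may be relative or may not end in `.jpg`, which the source does not confirm.

```python
import os
from urllib.parse import urljoin, urlparse


def decode_html(raw, encodings=('utf-8', 'gbk')):
    """Try each candidate encoding in order and return the first clean decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes
    return raw.decode('utf-8', errors='replace')


def resolve_image_urls(page_url, srcs):
    """Turn raw @src values into absolute URLs, skipping empty entries."""
    return [urljoin(page_url, s.strip()) for s in srcs if s.strip()]


def image_filename(index, url, default_ext='.jpg'):
    """Name a file by its position, keeping the extension found in the URL path."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return str(index) + (ext if ext else default_ext)


if __name__ == '__main__':
    page = 'http://m2.people.cn/r/example'  # placeholder page URL
    urls = resolve_image_urls(page, ['/img/a.png', 'http://example.com/b.jpg'])
    for i, u in enumerate(urls):
        print(image_filename(i, u), u)
```

In `savepic`, the list returned by the XPath query could be passed through `resolve_image_urls(picurl, pic_url)` before downloading, and `image_filename(i, each)` could replace the hard-coded `str(i) + '.jpg'`.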



