How to Read a Local File and Parse Web Page Elements in Python

Source: 中文源码网 | Views: 371 | Date: 2024-05-17 03:23:14

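As shown below:
from bs4 import BeautifulSoup

path = './web/new_index.html'
# Read the local HTML file and parse it with the lxml parser
with open(path, 'r') as f:
    Soup = BeautifulSoup(f.read(), 'lxml')

# Select the <a> inside each article's <h3> heading
titles = Soup.select('ul > li > div.article-info > h3 > a')
for title in titles:
    print(title.text)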
Output:
Sardinia's top 10 beaches
How to get tanned
How to be an Aussie beach bum
Summer's cheat sheet
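# Here, for this page,
titles = Soup.select('ul > li > div.article-info > h3 > a')
# selects the same elements as
titles = Soup.select('h3 a')
# Likewise,
print(title.text)
# prints the same as
print(title.get_text())
# and, provided the link contains only text (otherwise .string is None), the same as
print(title.string)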
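The equivalences above hold for this page but not in general. The following is a minimal sketch of where they diverge, using a made-up HTML fragment (hypothetical, not the article's new_index.html) and the built-in html.parser so that lxml is not required. In a selector, `>` matches direct children only, while a space matches any descendant. Similarly, `.string` returns None when a tag has more than one child node, whereas `.get_text()` joins all the text inside the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; not the article's new_index.html.
html = """
<ul>
  <li><div class="article-info"><h3><a href="#">Plain title</a></h3></div></li>
  <li><div class="article-info"><h3><a href="#">Mixed <b>bold</b> title</a></h3></div></li>
</ul>
<h3><a href="#">Sidebar link</a></h3>
"""

soup = BeautifulSoup(html, 'html.parser')

# Child combinators: only links nested exactly as ul > li > div.article-info > h3 > a
strict = soup.select('ul > li > div.article-info > h3 > a')
# Descendant combinator: every <a> anywhere inside any <h3>
loose = soup.select('h3 a')

strict_texts = [a.get_text() for a in strict]
loose_texts = [a.get_text() for a in loose]
# .string is None for the second link because it has three child nodes
strings = [a.string for a in loose]

print(strict_texts)  # ['Plain title', 'Mixed bold title']
print(loose_texts)   # ['Plain title', 'Mixed bold title', 'Sidebar link']
print(strings)       # ['Plain title', None, 'Sidebar link']
```

When a page may contain headings outside the article list, the longer selector is the safer choice. When links may contain nested tags, `.get_text()` is the safer accessor.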
The following code also works:
import bs4

path = './web/new_index.html'
with open(path, 'r') as f:
    Soup = bs4.BeautifulSoup(f.read(), 'lxml')

titles = Soup.select('h3 a')
for title in titles:
    # .string is None if the <a> has more than one child node
    print(title.string)
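Both listings open the file without specifying an encoding, so Python uses the platform default, which can garble pages that contain non-ASCII text. The following is a minimal sketch that assumes the page is UTF-8 and uses a temporary file in place of ./web/new_index.html. The file contents and the helper name `read_titles` are hypothetical, and html.parser stands in for lxml to avoid the extra dependency. BeautifulSoup also accepts an open file object directly, so calling `f.read()` is optional.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def read_titles(path, selector='h3 a'):
    """Return the text of every element matching `selector` in a local HTML file."""
    # Assumption: the file is UTF-8 encoded; change this if the page uses another encoding.
    with open(path, 'r', encoding='utf-8') as f:
        # A file object can be passed in place of a string
        soup = BeautifulSoup(f, 'html.parser')
    # strip=True trims leading and trailing whitespace from each piece of text
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


# Stand-in page written to a temporary directory (hypothetical content)
sample = (
    '<ul><li><div class="article-info">'
    '<h3><a href="#"> Sardinia\'s top 10 beaches </a></h3>'
    '</div></li></ul>'
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new_index.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(sample)
    titles = read_titles(path)

print(titles)  # ["Sardinia's top 10 beaches"]
```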
Original HTML: