Python scraping: extracting the text of a span? How do I scrape <span></span>...

How do I scrape the content between <span></span> tags with Python?

That depends on which page-parsing tool you are using.
    # Sample HTML. The tags were stripped when this page was scraped; this
    # structure is reconstructed to match the outputs shown below.
    html = """<span>item1</span>
    <div><span id="s1">item2</span></div>"""

    # Using Scrapy's Selector
    from scrapy.selector import Selector

    # Scrapy's selectors support both CSS and XPath; the examples below use CSS.
    # If you know front-end jQuery, this syntax will look familiar.
    # (extract() is the older name for getall(); both return a list of strings.)
    print(Selector(text=html).css('span::text').extract())
    # Output: ['item1', 'item2']
    print(Selector(text=html).css('span#s1::text').extract())
    # Output: ['item2']
    print(Selector(text=html).css('div>span::text').extract())
    # Output: ['item2']

    # Using bs4
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    sl = soup.find_all("span")
    result = [span.get_text() for span in sl]
    print(result)
    # Output: ['item1', 'item2']
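If neither Scrapy nor bs4 is installed, a similar result can be sketched with only the standard library's `html.parser.HTMLParser`. This is a swapped-in alternative, not part of the answer above; the class name `SpanTextParser` and the sample HTML are made up for illustration. The sketch collects the text of each outermost span, so a nested span's text is folded into its parent instead of being listed separately (bs4's `find_all` would list both).

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every outermost <span>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <span> tags are currently open
        self.current = []   # text fragments of the span being read
        self.results = []   # one string per outermost span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost span closed: store its joined text and reset.
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        # Keep only text that appears inside a span.
        if self.depth > 0:
            self.current.append(data)


html = """<span>item1</span>
<div><span id="s1">item2</span></div>"""

parser = SpanTextParser()
parser.feed(html)
parser.close()  # flush any buffered data
print(parser.results)  # ['item1', 'item2']
```

For real pages, Scrapy or bs4 is still the more robust choice; this version is mainly useful when you cannot install third-party packages.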

With `xpath('//span/text()')` you directly get a list of the text inside every span.
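The XPath expression can be run directly with lxml, the parser that Scrapy's selectors are built on. A minimal sketch, assuming lxml is installed; the sample HTML is made up for illustration:

```python
from lxml import etree

html = """<span>item1</span>
<div><span id="s1">item2</span></div>"""

# etree.HTML wraps the fragment in <html><body>...</body></html> and
# returns the root element, so absolute XPath expressions work on it.
root = etree.HTML(html)

# //span/text() selects the direct text-node children of every <span>.
texts = root.xpath('//span/text()')
print(texts)  # ['item1', 'item2']
```

Note that `text()` selects only a span's direct text nodes: for `<span>a<b>b</b></span>`, `//span/text()` yields just `'a'`, while `//span//text()` yields `['a', 'b']`.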

2024© 车视网