Feedparser
Parse Atom and RSS feeds in Python.
Install⚑
pip install feedparser
Basic Usage⚑
Parsing content⚑
Parse a feed from a remote URL⚑
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d['feed']['title']
u'Sample Feed'
Parse a feed from a string⚑
>>> import feedparser
>>> rawdata = """<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>"""
>>> d = feedparser.parse(rawdata)
>>> d['feed']['title']
u'Sample Feed'
Access common elements⚑
The most commonly used elements in RSS feeds (regardless of version) are title, link, description, publication date, and entry ID.
Channel elements⚑
>>> d.feed.title
u'Sample Feed'
>>> d.feed.link
u'http://example.org/'
>>> d.feed.description
u'For documentation <em>only</em>'
>>> d.feed.published
u'Sat, 07 Sep 2002 00:00:01 GMT'
>>> d.feed.published_parsed
(2002, 9, 7, 0, 0, 1, 5, 250, 0)
from time import mktime
from datetime import datetime
dt = datetime.fromtimestamp(mktime(item['updated_parsed']))
Item elements⚑
>>> d.entries[0].title
u'First item title'
>>> d.entries[0].link
u'http://example.org/item/1'
>>> d.entries[0].description
u'Watch out for <span>nasty tricks</span>'
>>> d.entries[0].published
u'Thu, 05 Sep 2002 00:00:01 GMT'
>>> d.entries[0].published_parsed
(2002, 9, 5, 0, 0, 1, 3, 248, 0)
>>> d.entries[0].id
u'http://example.org/guid/1'
>>> d.feed.image
{'title': u'Example banner',
'href': u'http://example.org/banner.png',
'width': 80,
'height': 15,
'link': u'http://example.org/'}
Feeds and entries can be assigned to multiple categories, and in some versions of RSS, categories can be associated with a “domain”.
>>> d.feed.categories
[(u'Syndic8', u'1024'),
(u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]
As feeds in the real world may be missing some elements, you may want to test for the existence of an element before getting its value.
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> 'title' in d.feed
True
>>> 'ttl' in d.feed
False
>>> d.feed.get('title', 'No title')
u'Sample feed'
>>> d.feed.get('ttl', 60)
60
Advanced usage⚑
It is possible to interact with feeds that are protected with credentials.
Issues⚑
- Deprecation warning when using updated_parsed, once solved tweak theairss/adapters/extractor.py#RSS.getatupdated_at.