Parse the value between tags (e.g. XML, HTML)
e.g. <title
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom">New York Knicks' Jeremy Lin needs knee surgery, likely done for season</title>
Let's say you want to extract the value of the title tag, that is,
the headline text between the opening and closing tags above.
There are multiple ways; one of the simplest is listed
below:
>>> import xml.dom.minidom
>>> a = '''<title xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">New York Knicks' Jeremy Lin needs knee surgery, likely done for season</title>'''
>>> x = xml.dom.minidom.parseString(a)
>>> x.firstChild.firstChild.toxml()
u"New York Knicks' Jeremy Lin needs knee surgery, likely done for season"
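The firstChild.firstChild chain walks from the document to the title element and then to the text node inside it. A minimal sketch of the same lookup wrapped in a function follows; the name tag_text and the "Tom & Jerry" sample are made up for illustration, and it assumes the root element contains only text. It reads the node's .data instead of calling toxml(), because toxml() returns the text with characters such as & still escaped.

```python
import xml.dom.minidom

def tag_text(xml_string):
    """Return the text held by the first child of the root element.

    Hypothetical helper for illustration; assumes the root element
    contains a single text node and nothing else.
    """
    doc = xml.dom.minidom.parseString(xml_string)
    root = doc.documentElement   # the <title> element
    text_node = root.firstChild  # the text node inside <title>
    return text_node.data        # text with entities decoded

sample = '<title>Tom &amp; Jerry</title>'  # made-up sample input
print(tag_text(sample))
```

For the headline in this post both approaches give the same string, since it contains no characters that need escaping.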
There are other ways too, but I found this one simplest!
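As one example of another way, here is a rough sketch using xml.etree.ElementTree from the standard library. The variable names root and title_text are my own; the input is the same title string as above.

```python
import xml.etree.ElementTree as ET

# Same sample as above; triple quotes let the string hold both ' and ".
a = '''<title xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">New York Knicks' Jeremy Lin needs knee surgery, likely done for season</title>'''

root = ET.fromstring(a)  # root is the <title> element itself
title_text = root.text   # text directly inside the element
print(title_text)
```

Here the value comes straight from the element's text attribute, so there is no need to step down to a child text node.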