Sunday, April 1, 2012

Python XML node parsing


Parse the value between tags, e.g. in XML or HTML

e.g. <title xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">New York Knicks' Jeremy Lin needs knee surgery, likely done for season</title>

Let's say you want to extract the value of the title tag, i.e. the headline text between the opening and closing title tags above.
There are multiple ways; one of the simplest is listed below:

>>> import xml.dom.minidom
>>> a = '''<title xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">New York Knicks' Jeremy Lin needs knee surgery, likely done for season</title>'''
>>> x = xml.dom.minidom.parseString(a)
>>> x.firstChild.firstChild.toxml()
u"New York Knicks' Jeremy Lin needs knee surgery, likely done for season"

There are other ways too, but I found this one the simplest!
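
For comparison, here is a rough sketch of one of those other ways, using the standard library's xml.etree.ElementTree instead of minidom. This is not from the steps above; the names SAMPLE and element_text are made up for illustration, and it assumes the text you want sits directly inside the root element.

```python
import xml.etree.ElementTree as ET

# Same sample element as in the post, split over several lines for readability.
SAMPLE = (
    '<title xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:atom="http://www.w3.org/2005/Atom">'
    "New York Knicks' Jeremy Lin needs knee surgery, likely done for season"
    "</title>"
)


def element_text(xml_string):
    """Return the text directly inside the root element (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    # .text is None for an empty element, so fall back to an empty string.
    return root.text or ""


print(element_text(SAMPLE))
```

One difference worth knowing: toxml() on a text node gives back XML markup, so an ampersand in the title would come out as &amp;, while .text gives the decoded text.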
