Re: Failed parsing of an ATOM feed

Author: Sam Ruby
Date:  
To: Vasil Kolev
CC: devel
Subject: Re: Failed parsing of an ATOM feed
On Mon, Jun 2, 2008 at 3:47 PM, Vasil Kolev <vasil@???> wrote:
> This is what I get from this feed:
>
> INFO:planet.runner:Updating feed http://debian.fmi.uni-sofia.bg/~ogi/blog/feeds/atom10.xml
> ERROR:planet.runner:Error processing http://debian.fmi.uni-sofia.bg/~ogi/blog/feeds/atom10.xml
> ERROR:planet.runner:UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)
> ERROR:planet.runner: File "/home/vasil/www/pesho/venus/planet/spider.py", line 468, in spiderPlanet
> writeCache(uri, feed_info, data)
> ERROR:planet.runner: File "/home/vasil/www/pesho/venus/planet/spider.py", line 214, in writeCache
> output = xdoc.toxml().encode('utf-8')
> ERROR:planet.runner: File "xml/dom/minidom.py", line 47, in toxml
> ERROR:planet.runner: File "xml/dom/minidom.py", line 62, in toprettyxml
> ERROR:planet.runner: File "StringIO.py", line 271, in getvalue
> self.buf += ''.join(self.buflist)
>
> Looking at it, doesn't seem to be any problem with it (especially at
> position 0), and I don't seem to find 0xd1 anywhere where it should
> matter, any ideas? I updated to the latest snapshot and still see this.


This probably doesn't help much, but the feed is not well formed.
When a feed is not well formed, the feed parser falls back on a set of
heuristics to salvage what data it can. Those heuristics may not be as
good at extracting meaning from Cyrillic text as from Latin text. I can
try to debug further, but IMHO a fix upstream would also be worth
pursuing.
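
For what it's worth, here is a small sketch of where a byte like 0xd1
can come from. It is written for Python 3, while the traceback above
comes from Python 2, so it reproduces only the error message, not the
StringIO code path. The helper name decode_ascii_error is made up for
this illustration. In UTF-8, many Cyrillic letters start with the byte
0xd1, and "position 0" most likely refers to the chunk of text being
decoded rather than to the start of the feed.

```python
def decode_ascii_error(raw: bytes) -> str:
    """Return the error message from decoding raw as ASCII, or '' on success.

    Hypothetical helper for illustration; it is not part of Venus.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# CYRILLIC SMALL LETTER ER (U+0440) encodes to two bytes in UTF-8,
# and the first of them is 0xd1.
cyrillic = "\u0440".encode("utf-8")

print(cyrillic)                      # the raw UTF-8 bytes
print(decode_ascii_error(cyrillic))  # the ASCII decode failure message
```

In Python 2, joining a byte string like this with unicode strings
triggers the same implicit ASCII decode, which would be consistent with
the failure in StringIO.getvalue().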

http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fdebian.fmi.uni-sofia.bg%2F%7Eogi%2Fblog%2Ffeeds%2Fatom10.xml#l129
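
A local check along the same lines can be sketched with the expat
parser from the Python standard library. This is an assumption-laden
illustration, not what the validator or the feed parser actually does:
the function name first_wellformedness_error and the sample documents
are invented for this example, and it reports only the first error
expat finds.

```python
import xml.parsers.expat


def first_wellformedness_error(data: bytes):
    """Return (line, column, message) for the first XML error, or None.

    Hypothetical helper for illustration; it only tests well-formedness,
    not validity as an Atom feed.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        message = xml.parsers.expat.ErrorString(exc.code)
        return (exc.lineno, exc.offset, message)
    return None


# Invented sample documents: one well formed, one with an unclosed <title>.
good = b"<feed><title>ok</title></feed>"
bad = b"<feed><title>oops</feed>"

print(first_wellformedness_error(good))
print(first_wellformedness_error(bad))
```

Running it against a saved copy of the feed should point at roughly the
same place as the validator link above, assuming both stop at the same
first error.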

- Sam Ruby
-- 
devel mailing list
devel@???
http://lists.planetplanet.org/mailman/listinfo/devel