Some rawdog plugins

Author: Virgil Bucoci
Date:  
To: rawdog-users
Subject: Some rawdog plugins
Hi,

I've modified/created some plugins for rawdog and I'd like to share
them. They are as follows:

* slashdot.py

Modified to create proper template parameters from the
slash:department and slash:section contents, instead of appending
them to the end of the description. As template parameters they can
be shown conditionally and formatted as you see fit in the template.

* feedwise.py

Modified to use the proper hook to generate the feed headings,
added navigation links between feeds, sorted the feeds by name
(like feedwise-ca.py) and added a default value for the config
option articles_per_feed (idea taken from Dog Walker <forestiero at
qwest.net>).

* imgstrip.py

Replaces image tags with links to the image, or simply strips
them. This gets rid of the web-bug images from feedburner.

I'm not sure that having multiple plugins that do basically the same
thing (feedwise already has two and this makes a third; slashdot will
now have two as well) is a good idea. It might be better to merge them
into one version and add config options for the extra functionality.
What do you think?

Also, some plugins don't play nicely with other plugins. tagcat.py is
the one I hit. It binds to the output_write_files hook, which
completely bypasses rawdog.write_output_file, so tagcat.py would have
to make the output_bits* hook calls itself. It doesn't, so plugins
like slashdot.py or feedwise.py stop working (I saw on this list that
other plugins are affected as well).

It would be better if one hook couldn't shadow other hooks. Until
then, it might be useful for rawdoglib.plugins.attach_hook to output
a warning when a plugin attaches to such a hook, so you can at least
see in the log that some plugin overrides other hooks.

peace,
virgil
-- 
Do not meddle in the affairs of wizards, for they are subtle and quick to anger.

"""
Author BAM

Wed Sep 6 16:59:06 EEST 2006
Modified by Virgil Bucoci <vbucoci at acm.org>

Add support for 'from the ... dept.' and section lines to the Slashdot
articles (found in the feed as slash:department and slash:section).

Add the following lines to the item template. You have to use a file
template, this won't work with the default template. See the README
and the config files that come with rawdog.

__if_slash-section__
<br /><span class="slash-section">Category <em>__slash-section__</em></span>
__endif__


__if_slash-department__
<br /><span class="slash-dept">from the <em>__slash-department__</em> dept.</span>

__endif__

"""

class Slashdot:
    def output(self, rawdog, config, feed, article, itembits):
        try:
            itembits["slash-department"] = article.entry_info['slash_department']
            itembits["slash-section"] = article.entry_info['slash_section']
        except KeyError:
            pass
        return True


### Init code
import rawdoglib.plugins
rawdoglib.plugins.attach_hook("output_item_bits", Slashdot().output)
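
One thing to note about the try/except in the hook above: if an entry has slash_section but no slash_department, the KeyError fires before the section is copied, so neither parameter is set. A possible variant that treats each field independently is sketched below. FakeArticle is a hypothetical stand-in for a rawdog article (only entry_info is used), and entry_info is assumed to support dict-style .get().

```python
class FakeArticle(object):
    """Hypothetical stand-in for a rawdog article; only entry_info is used."""
    def __init__(self, entry_info):
        self.entry_info = entry_info

def slash_bits(rawdog, config, feed, article, itembits):
    """Copy whichever Slashdot fields are present into the template bits."""
    for key in ("department", "section"):
        value = article.entry_info.get("slash_" + key)
        if value is not None:
            itembits["slash-" + key] = value
    return True  # same return value as the plugin above

# An entry that carries a section but no department.
bits = {}
slash_bits(None, None, None, FakeArticle({"slash_section": "linux"}), bits)
print(bits)
```

Because each template parameter is only set when its field exists, the __if_slash-section__ and __if_slash-department__ blocks in the item template can be shown or hidden separately.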
"""
rawdog plugin to strip img tags from articles.
author Virgil Bucoci <vbucoci at acm.org>
version 0.1
license: GNU GPL v2.0

This rawdog plugin strips img tags from feed articles. More and more
feeds include web-bug and advertisement images these days, the most
notorious example being slashdot.

Having just a few dozen bugged articles in a rawdog page really slows
down the page reload (each web-bug image has a unique URL/name in
every article so that each article can be traced, even though the
images are identical and quite small as images go :D). That takes all
the fun out of aggregating the feeds locally and exposes you to
privacy invasion.

By default, images are replaced with the string [img] linked to the
image source, but they can also be removed without a trace.

Configuration options:

imgstrip link
(default) img tags are replaced with the string [img] linked to the
image source

imgstrip none
img tags are simply removed from the article

TODO
- make a per-feed setting, for feeds whose images you want to see (flickr?)
- something more general for stripping obnoxious tags: font, style,
script/javascript (maybe tidy already does part of this?)
"""
import rawdoglib.plugins, re

none_i = '<img\\s[^>]*/?>'
# this is kind of kludgy, but it works for now
link_i = """
    <img        # tag name
    \\s         # whitespace
    [^>]*       # anything but a >
    src=        # attribute name
    ['"]?       # optional apostrophe or quote
    ([^ '"]*)   # image URL
    ['"]?       # optional apostrophe or quote
    [^>]*       # anything but a >
    /?          # optional /
    >           # tag end
"""

none_repl = ' '
link_repl = '[<a href="\\1">img</a>]'

class ImgStripPlugin:
    """
    Strip img tags from articles.

    The image is replaced by default with a link to the image, but can
    also be simply removed with the "imgstrip none" option.
    """
    def __init__(self, pat, repl):
        self.repl = repl
        self.img = re.compile(pat, re.IGNORECASE | re.VERBOSE)

    def imgstrip(self, config, html, baseurl, inline):
        """
        Strip <img> tags from the feed HTML.
        """
        html.value = self.img.sub(self.repl, html.value)

    def config_option(self, config, name, value):
        """
        Configures the stripping through the config file.

        name  - the option name, 'imgstrip'
        value - 'none': simply remove the img tag
                'link': replace the image with a link to the image's source.
                        This is the default.
                anything else: raise ValueError
        """
        if name == 'imgstrip':
            if value == 'none':
                self.__init__(none_i, none_repl)
                return False
            elif value == 'link':
                self.__init__(link_i, link_repl)
                return False
            else:
                raise ValueError(
                    "imgstrip error: option '%s' has invalid value '%s'"
                    % (name, value))
        return True

istrip = ImgStripPlugin(link_i, link_repl)
rawdoglib.plugins.attach_hook("clean_html", istrip.imgstrip)
rawdoglib.plugins.attach_hook('config_option', istrip.config_option)
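
A quick way to try the two patterns outside rawdog. This is only a sketch: it restates link_i and none_i as raw strings under the hypothetical names IMG_LINK and IMG_ANY, and the sample HTML and example.com URL are made up.

```python
import re

# Same idea as link_i above, written as a raw verbose pattern.
IMG_LINK = re.compile(r"""
    <img        # tag name
    \s          # whitespace
    [^>]*       # anything but a >
    src=        # attribute name
    ['"]?       # optional apostrophe or quote
    ([^ '"]*)   # image URL
    ['"]?       # optional apostrophe or quote
    [^>]*       # anything but a >
    /?          # optional /
    >           # tag end
""", re.IGNORECASE | re.VERBOSE)

# Same idea as none_i above: match any img tag at all.
IMG_ANY = re.compile(r"<img\s[^>]*/?>", re.IGNORECASE)

sample = '<p>Hi <img src="http://example.com/bug.gif" width="1" /> there</p>'

# "imgstrip link": replace the tag with a link to the image source.
linked = IMG_LINK.sub(r'[<a href="\1">img</a>]', sample)

# "imgstrip none": replace the tag with a single space.
stripped = IMG_ANY.sub(" ", sample)

print(linked)
print(stripped)
```

Note that the link pattern needs a src attribute to match, so in link mode an img tag without one is left in the article as it is.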
# Feedwise Plugin version 0.2 for Rawdog.
# Ian Glover ian@???
#
# Sort articles into chunks by feed rather than date.
#
# Wed Sep 6 16:32:40 EEST 2006
# Modified by Virgil Bucoci <vbucoci at acm.org>
# * use the right hook to write the feed headings
# * sort feeds by feed name
# * added navigation links between feeds
# * feed content gets wrapped in ordered-list tags, so if you add
# list-item tags <li> to your item template, you will get per-feed
# numbered articles
# * added default value for articles_per_feed config option (idea
# stolen from Dog Walker <forestiero at qwest.net>,
# http://lists.us-lot.org/pipermail/rawdog-users/2006-September/000254.html)
# * minor code cleanup

import rawdoglib.plugins

class FeedwisePlugin:
    def __init__(self):
        self.last_feed = None
        self.feed_no = 0

    def startup(self, rawdog, config):
        # The daysections and timesections config options are not used
        # when this plugin is active. We disable them just to be safe.
        msg = "FeedwisePlugin: %s config option is NOT used when this plugin is active."
        if config["daysections"]:
            print msg % "daysections"
            config["daysections"] = 0  # not really necessary
        if config["timesections"]:
            print msg % "timesections"
            config["timesections"] = 0  # not really necessary
        try:
            articles_per_feed = config["articles_per_feed"]
        except KeyError:
            config["articles_per_feed"] = 100

    # Scan the sorted list of articles and throw away old ones so
    # we don't exceed the configured limit.
    def limit_articles_per_feed(self, rawdog, config, articles):
        feed_counts = {}
        to_remove = []
        for i in range(len(articles)):
            feed = articles[i].feed
            if feed not in feed_counts:
                feed_counts[feed] = 1
            else:
                feed_counts[feed] += 1
            if feed_counts[feed] > config["articles_per_feed"]:
                # We don't want to remove from the list whilst iterating
                # through it, so collect the articles to delete first.
                to_remove.append(articles[i])
        for x in to_remove:
            articles.remove(x)

    # Sort the articles by feed and inside each feed by time.
    def sort_by_feed(self, rawdog, config, articles):
        def comparator(lhs, rhs):
            if cmp(lhs.feed, rhs.feed) != 0:
                #return cmp(lhs.feed, rhs.feed)
                return cmp(rawdog.feeds[lhs.feed].get_html_name(config).lower(),
                           rawdog.feeds[rhs.feed].get_html_name(config).lower())
            else:
                # Inverted as we want the most recent first.
                return -cmp(lhs.date, rhs.date)
        articles.sort(comparator)
        self.limit_articles_per_feed(rawdog, config, articles)

    def _link(self, no):
        a = """<a href="#%s">%s</a>"""
        if no == 0:  # first link
            return a % (no+1, 'next')
        elif no < 0:  # last link
            return a % (-no-1, 'prev')
        else:
            return ("%s | %s" % (a, a)) % (no-1, 'prev', no+1, 'next')

    # Split each feed into its own block and add links to the other
    # feeds.
    def write_divider(self, rawdog, config, f, article, date):
        feed = rawdog.feeds[article.feed]

        # Basic division header.
        basic_divider = '''<div class="feeddisplay">
<h3 class="feedtitle"><a id="%s">%s</a></h3>
%s
<ol class="feedarticles">''' \
            % (self.feed_no, feed.get_html_link(config), self._link(self.feed_no))

        if self.last_feed != feed:  # A new feed
            if self.last_feed is not None:  # not the first feed
                print >>f, '</ol></div>\n'
            print >>f, basic_divider
            self.feed_no += 1
            self.last_feed = feed
        return False

    # All the items have been written but we have a couple of tags
    # open, so close those and add the last anchor and link.
    def write_end(self, rawdog, config, f):
        f.write('</ol></div>\n<a id="%s">%s</a>'
                % (self.feed_no, self._link(-self.feed_no)))

    # Handle the articles_per_feed configuration option.
    def handle_config(self, config, name, value):
        if name == "articles_per_feed":
            config["articles_per_feed"] = int(value)
            return False
        return True


plugin = FeedwisePlugin()

rawdoglib.plugins.attach_hook( "startup", plugin.startup )
rawdoglib.plugins.attach_hook( "output_sort", plugin.sort_by_feed )
rawdoglib.plugins.attach_hook( "output_items_heading", plugin.write_divider )
rawdoglib.plugins.attach_hook( "output_items_end", plugin.write_end )
rawdoglib.plugins.attach_hook( "config_option", plugin.handle_config )
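
The sort above relies on cmp() and a comparator function, which only exist in Python 2. On a Python without cmp(), roughly the same ordering and per-feed limit can be had with a sort key. This is a sketch only: Article is a hypothetical stand-in for rawdog's article objects, the feed name is stored directly instead of being looked up through rawdog.feeds[...].get_html_name(config), and dates are assumed to be plain numbers.

```python
from collections import namedtuple

# Hypothetical stand-in for rawdog articles: feed display name, timestamp, title.
Article = namedtuple("Article", "feed_name date title")

def sort_by_feed(articles):
    """Order by feed name (case-insensitive), newest first within a feed."""
    return sorted(articles, key=lambda a: (a.feed_name.lower(), -a.date))

def limit_per_feed(articles, limit):
    """Keep at most `limit` articles per feed, preserving the given order."""
    counts = {}
    kept = []
    for a in articles:
        counts[a.feed_name] = counts.get(a.feed_name, 0) + 1
        if counts[a.feed_name] <= limit:
            kept.append(a)
    return kept

articles = [
    Article("slashdot", 10, "s-old"),
    Article("LWN", 30, "l-new"),
    Article("slashdot", 20, "s-new"),
    Article("LWN", 5, "l-old"),
]
ordered = limit_per_feed(sort_by_feed(articles), 1)
print([a.title for a in ordered])
```

Building a new list of kept articles also avoids the repeated articles.remove() calls, each of which scans the list from the start.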
_______________________________________________
rawdog-users mailing list
rawdog-users@???
http://lists.us-lot.org/mailman/listinfo/rawdog-users