simonodell has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

With my limited and old fashioned understandings in PERL I've written this getrss plugin to my aXML system. Whilst this plugin works already, I beseech information on how to modernise and improove this code such that it passes the poetic standards of perl in 2007.

The plugin is called from within the aXML doc using the <getrss> tag in the following way:

(getrss url="http://news.bbc.co.uk/rss/newsonline_world_edition/front_ +page/rss.xml") <h3><d>title</d></h3> <p><d>description</d></p> <p><link to="<d>link</d>">Read More...</link></p> (/getrss)

The ( ) brackets indicate to the aXML parser that this tag must be run before its child tag, since it contains a link command which depends on data which would not be available if the link were run first.

The contents of the (getrss) tag is a "mask" for each item found in the rss return, and <d> represents an item of data within that item. Link is a hyperlink generator plugin.

The data within the getrss tag is handed to the getrss plugin and is referred to as $data within the plugin. The plugin builds the string $result, which is then handed back.

my $module = 'LWP::Simple'; my $return_val = $module->use; if ($return_val) { $raw_data = get($command_args->{url}); $result = ""; if ($raw_data =~ m@<channel>(.*?)</channel>@s) { $channel_data = $1; $channel_data =~ s@\[@<lftsqbrk/>@g; $channel_data =~ s@\]@<rtsqbrk/>@g; $channel_data =~ s@\(@<lftbrk/>@g; $channel_data =~ s@\)@<rtbrk/>@g; $channel_data =~ s@\+@<plus/>@g; $channel_data =~ s@\?@<ques/>@g; $channel_data =~ s@\&apos;@<apos/>@g; $channel_data =~ s@\$@<dollar/>@g; #these escapes are needed for the regex's below to #work properly. while ($channel_data =~ m@<item>(.*?)</item>@s) { $item_info = $1; $mask = $data; #the output mask is set to a copy of the data #handed to the plugin while ($mask =~ m@<d>(.*?)</d>@s) { $get_item = $1; my $item_string; if ($item_info =~ m@<$get_item>(.*?)</$get_item>@s) { $ +item_string = $1; } $mask =~ s@<d>$get_item</d>@$item_string@; } $result .= $mask; $channel_data =~ s@<item>$item_info</item>@@; } } else { $result = "<error>Couldnt find channel tag</error>"; } } else { $result = "<error>Couldnt retrieve url</error>"; }

Replies are listed 'Best First'.
Re: Modernisation Needed
by Joost (Canon) on Apr 14, 2007 at 02:34 UTC
Re: Modernisation Needed
by diotalevi (Canon) on Apr 14, 2007 at 02:58 UTC

    If you were using XML::Twig, you could have a handler watching for the start of a getrss tag. When that handler fired, it'd temporarilly remove all the other handlers until the closing tag was detected. You'd then have a complete tag and could then handle the innards however you liked (I suppose just a recursive call back to Twig). In general, this kind of flow control is much easier to achieve once you've delegated your parsing to a module so you can concentrate on the real task.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Modernisation Needed
by GrandFather (Saint) on Apr 14, 2007 at 04:32 UTC

    If you would like a hint of what it would look like as an OO module rather than as an eval'd text file then consider the following:

    First a sketch of the plug in base class:

    use strict; use warnings; package PluginBase; sub new { my ($class, %params) = @_; return bless {%params}, $class; } sub get_arg { my ($self, $arg) = @_; return $self->{command_args}{$arg}; } sub quote_special { my ($self, $data) = @_; $data =~ s@\[@<lftsqbrk/>@g; $data =~ s@\]@<rtsqbrk/>@g; $data =~ s@\(@<lftbrk/>@g; $data =~ s@\)@<rtbrk/>@g; $data =~ s@\+@<plus/>@g; $data =~ s@\?@<ques/>@g; $data =~ s@\&apos;@<apos/>@g; $data =~ s@\$@<dollar/>@g; return $data; } sub get_raw_data { my $self = shift; return $self->{raw_data}; } sub get_data { my $self = shift; return $self->{data}; } 1;

    and then the plug in:

    use strict; use warnings; package GetURL; use base 'PluginBase'; sub new { my ($class, %params) = @_; return $class->SUPER::new (%params); } sub doit { my $self = shift; my $module = 'LWP::Simple'; return "<error>Couldnt retrieve url</error>" unless $module->use; my $w_data = $self->get_arg ('url'); return "<error>Couldnt find channel tag</error>" unless $self->get_raw_data () =~ m@<channel>(.*?)</channel>@s; my $channel_data = $self->quote_special ($1); my $result = ''; while ($channel_data =~ m@<item>(.*?)</item>@s) { my $item_info = $1; my $mask = $self->get_data (); #the output mask is set to a copy of the data #handed to the plugin while ($mask =~ m@<d>(.*?)</d>@s) { my $get_item = $1; my $item_string; $item_string = $1 if $item_info =~ m@<$get_item>(.*?)</$ge +t_item>@s; $mask =~ s@<d>$get_item</d>@$item_string@; } $result .= $mask; $channel_data =~ s@<item>$item_info</item>@@; } return $result; }

    Note that a lot of guesses are being made about where stuff might come from in the original code. Using an object that becomes pretty clear. There is also a huge benefit when (for example) you decide you need another level of nesting. Make the changes in the base class and all past, current and future plug ins do the right thing (cf. quote_special).


    DWIM is Perl's answer to Gödel
Re: Modernisation Needed
by blazar (Canon) on Apr 14, 2007 at 10:51 UTC
    $channel_data =~ s@\[@<lftsqbrk/>@g; $channel_data =~ s@\]@<rtsqbrk/>@g; $channel_data =~ s@\(@<lftbrk/>@g; $channel_data =~ s@\)@<rtbrk/>@g; $channel_data =~ s@\+@<plus/>@g; $channel_data =~ s@\?@<ques/>@g; $channel_data =~ s@\&apos;@<apos/>@g; $channel_data =~ s@\$@<dollar/>@g;

    All those bindings look clumsy and people often like to use a one-shot for loop:

    for ($channel_data) { s@\[@<lftsqbrk/>@g; s@\]@<rtsqbrk/>@g; s@\(@<lftbrk/>@g; s@\)@<rtbrk/>@g; s@\+@<plus/>@g; s@\?@<ques/>@g; s@\&apos;@<apos/>@g; s@\$@<dollar/>@g; }

    Also, you may want to maintain a "hash of substitutions" and generate the pattern:

    # Outside the loop my %subst = ( '[' => '<lftsqbrk/>', ']' => '<rtsqbrk/>', '(' => '<lftbrk/>', ')' => '<rtbrk/>', '+' => '<plus/>', '?' => '<ques/>', '&apos;' => '<apos/>', '$' => '<dollar/>', ); my $re=join '|', map quotemeta, keys %subst; # ... # Inside the loop $channel_data =~ s/($re)/$subst{$1}/g;