in reply to Perl, SQLite3, and Parsing the Chatterbox Feed.

You definitely need to use non-greedy regexes. Instead of:

my $pat = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;

use:

my $pat = qr{ .*?<author>(.*?)<\/author>.*?<text>(.*?)<\/text }xs;
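A quick way to see the difference is to run both patterns against a small sample. The two-message `$data` below is a made-up stand-in for the real chatterbox XML, and I've closed the `</text>` tag in both patterns. In list context, `m//g` returns the captures from every match, flattened into one list, so the number of pairs shows how many messages each pattern found.

```perl
use strict;
use warnings;

# Hypothetical two-message sample; the real feed will differ in detail.
my $data = <<'XML';
<message><author>alice</author><text>hello there</text></message>
<message><author>bob</author><text>hi alice</text></message>
XML

my $greedy = qr{ .*<author>(.*)</author>.*<text>(.*)</text> }xs;
my $lazy   = qr{ .*?<author>(.*?)</author>.*?<text>(.*?)</text> }xs;

# In list context, m//g returns all captures from all matches, flattened.
my @greedy_pairs = $data =~ m/$greedy/g;
my @lazy_pairs   = $data =~ m/$lazy/g;

# Even indices hold authors, odd indices hold message texts.
printf "greedy: %d match(es), author=%s\n", @greedy_pairs / 2, $greedy_pairs[0];
printf "lazy:   %d match(es), authors=%s\n", @lazy_pairs / 2,
    join ', ', @lazy_pairs[ grep { $_ % 2 == 0 } 0 .. $#lazy_pairs ];
```

With the greedy version, the leading `.*` (under `/s`) runs to the end of the string and backs off only as far as the last `<author>`, so alice's message is skipped and only bob's is captured. The non-greedy version stops at the first matching tag each time and finds both.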

Also, I'm not sure you are using the /g option correctly. I've had better luck with:

while ($data =~ m/$pat/gc) {
    my ($auth, $text) = ($1, $2);
    for ($text) {
        s/[ ]+/ /g;
        s/^\s+//;
        s/\s+$//;
    }
    printf "%s: %s\n\n" , $auth , $text;
}
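Here is a minimal, self-contained sketch of that loop, run against a hypothetical sample string (the `$data` contents and the `@lines` array are illustrative additions). Each trip through `while (... m//g)` resumes at `pos($data)`, so `$1`/`$2` hold the next author/text pair; the inner `for` aliases `$_` to `$text`, so the substitutions tidy its whitespace in place. I've dropped the `/c` flag, since the loop ends on the first failed match either way.

```perl
use strict;
use warnings;

# Hypothetical sample data, with messy whitespace inside <text>.
my $data = "<message><author>alice</author><text>  hello    there \n</text></message>"
         . "<message><author>bob</author><text>hi   alice</text></message>";

my $pat = qr{ .*?<author>(.*?)</author>.*?<text>(.*?)</text> }xs;

my @lines;
while ( $data =~ m/$pat/g ) {
    my ( $auth, $text ) = ( $1, $2 );
    for ($text) {        # alias $_ to $text for the substitutions below
        s/[ ]+/ /g;      # collapse runs of spaces into one
        s/^\s+//;        # trim leading whitespace
        s/\s+$//;        # trim trailing whitespace (including the newline)
    }
    push @lines, "$auth: $text";
    printf "%s: %s\n\n", $auth, $text;
}
```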

Re^2: Perl, SQLite3, and Parsing the Chatterbox Feed.
by ikegami (Patriarch) on Feb 14, 2008 at 18:35 UTC
    I don't see why either of you is using /c. It's definitely not useful, and I suspect it's harmful.
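A minimal sketch of what `/c` actually changes, using an illustrative string rather than the chatterbox data. Normally a failed `m//g` resets `pos()` to `undef`, so the next `//g` match on that string starts from the beginning; with `/c`, the failed match leaves `pos()` where the last successful match ended. Once the `while` loop has finished, that only matters if the same string is matched with `//g` again, as the last match below shows.

```perl
use strict;
use warnings;

my $data = 'a1 a2 a3';

# Without /c: the final, failing match resets pos() to undef.
my @plain;
push @plain, $1 while $data =~ /a(\d)/g;
my $pos_plain = pos($data);            # undef after the failed match

# With /c: the failing match leaves pos() at the end of the last success.
my @with_c;
push @with_c, $1 while $data =~ /a(\d)/gc;
my $pos_with_c = pos($data);           # 8, just past 'a3'

# A later //g match on the same string now starts at offset 8 and finds nothing.
my $again = $data =~ /a(\d)/g ? $1 : 'no match';

print 'plain:  ', join(',', @plain),  ' pos=', $pos_plain  // 'undef', "\n";
print 'with c: ', join(',', @with_c), ' pos=', $pos_with_c // 'undef', "\n";
print "later match: $again\n";
```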

      I am responsible for the /c in the match condition and the simultaneous assignment in the while loop (I replied in a hurry and misread the /c description). Here is what works:

      # Without /g, it would be an endless loop, because the match would
      # always start at the beginning of $data.
      while ( $data =~ m/$parse/g ) {
          my ( $auth , $text ) = ( "$1" , "$2" );
          ...
      }
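A small sketch of why the `/g` matters, under stated assumptions: the `$data` string is made up, `$parse` is a stand-in for whatever pattern you built, and `$max_iterations` is a hypothetical safety cap so the first loop terminates. Without `/g`, each test starts again at offset 0, so the condition stays true and keeps returning the first author; with `/g`, each test resumes at `pos($data)` and the loop ends when no further match is found.

```perl
use strict;
use warnings;

my $data  = '<author>alice</author><text>hi</text><author>bob</author><text>yo</text>';
my $parse = qr{<author>(.*?)</author>.*?<text>(.*?)</text>}s;

# Without /g: every test restarts at offset 0, so this would never end.
my @seen_no_g;
my $max_iterations = 5;                 # safety cap, for illustration only
while ( $data =~ m/$parse/ ) {
    push @seen_no_g, "$1";
    last if @seen_no_g >= $max_iterations;
}

# With /g: each test resumes at pos($data), and the loop ends on failure.
my @seen_g;
while ( $data =~ m/$parse/g ) {
    my ( $auth, $text ) = ( "$1", "$2" );
    push @seen_g, "$auth: $text";
}

print "no /g:   @seen_no_g\n";
print "with /g: @seen_g\n";
```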

      (There are some examples, circa 2001-2005, of XML::(Twig|Simple) being used to parse the chatterbox XML around here somewhere.)