downer has asked for the wisdom of the Perl Monks concerning the following question:

I have a lot of data from which I am trying to extract information. The data is pretty well ordered, so this shouldn't be a problem. Sadly, I don't have the ability to install any packages, which makes this slightly more complicated. Here is a sample of my code:
my $name = '';
if($x =~ /<name>(.*?)<\/name>/igs) {
    $name = $1;
}
my $time = '';
if($x =~ /<published>(.*?)<\/published>/igs) {
    $time = $1;
}
my $content = '';
if($x =~ /<content type='text'>(.*?)<\/content>/igs) {
    $content = $1;
    $content =~ s/\n/ /ig;
}
print "$id\t$name\t$time\t$content\n";
Here is an example of the data to be parsed:
<id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 </id>
<published>2007-04-05T12:05:42.000-07:00 </published>
<updated>2007-04-05T12:05:42.000-07:00 </updated>
<category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
<title type='text'>Fantastisk video,, ... </title>
Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (: </content>
<link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
<link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
<link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
<author>
<name>cajaneil </name>
<uri>http://gdata.youtube.com/feeds/api/users/cajaneil </uri>
</author>
For some reason, my regular expressions aren't matching any field except for content. Any idea what the problem is?

Re: trouble with regular expressions, don't know why patterns aren't matching
by GrandFather (Saint) on Mar 31, 2008 at 23:14 UTC

    You need to provide more context, but I suspect you are reading the data one line at a time while trying to match strings that span several lines.
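    A minimal core-Perl sketch of the failure suspected above, assuming the input is read from a filehandle. The sample string, the in-memory filehandle, and the variable names are hypothetical stand-ins for the real input. A pattern that spans a newline can never match a single line, but the same pattern matches once the whole input is slurped with local $/.

```perl
use strict;
use warnings;

# Hypothetical sample: the <name> element is split across two lines,
# as the posted data appears to be.
my $data = "<author>\n<name>cajaneil\n</name>\n</author>\n";

# Reading one line at a time: no single line contains both <name> and </name>,
# so the pattern never matches.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $line_match;
while (my $line = <$fh>) {
    $line_match = $1 if $line =~ m{<name>(.*?)</name>}s;
}
close $fh;

# Slurping: with $/ undefined, <$fh> returns the whole input as one string,
# and /s lets . match the embedded newline.
open $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $all = do { local $/; <$fh> };
close $fh;
my $slurp_match = $all =~ m{<name>(.*?)</name>}s ? $1 : undef;

print defined $line_match ? "line-by-line: $line_match\n" : "line-by-line: no match\n";
print "slurped: $slurp_match";
```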

    I strongly suggest that you use a module such as XML::Twig to parse XML! Consider:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new (twig_roots => {content => \&contents});
    my $xml = do {local $/; <DATA>};

    $twig->parse ($xml);

    sub contents {
        my ($twig, $contents) = @_;
        my @children = $contents->children ();
        my @wanted = qw(id title published);
        my $match = join '|', @wanted;
        my %params;

        for my $child (grep {$_->tag () =~ /^($match)$/} @children) {
            $params{$child->tag ()} = $child->text ();
        }

        print join "\t", @params{@wanted};
    }

    __DATA__
    <data>
    <content>
    <id>http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 </id>
    <published>2007-04-05T12:05:42.000-07:00 </published>
    <updated>2007-04-05T12:05:42.000-07:00 </updated>
    <category scheme='http://schemas.google.com/g/2005#kind' term='http://gdata.youtube.com/schemas/2007#comment'/>
    <title type='text'>Fantastisk video,, ... </title>
    Keep up the good work. - jeg glæder mig meget til at se flere videoer fra dig..uper billeder du har fundet (: </content>
    <link rel='related' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o'/>
    <link rel='alternate' type='text/html' href='http://www.youtube.com/watch?v=5InqyMvRZ8o'/>
    <link rel='self' type='application/atom+xml' href='http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8'/>
    <author>
    <name>cajaneil </name>
    <uri>http://gdata.youtube.com/feeds/api/users/cajaneil </uri>
    </author>
    </data>

    Prints:

    http://gdata.youtube.com/feeds/api/videos/5InqyMvRZ8o/comments/7307D6E7F6E2D1B8 Fantastisk video,, ... 2007-04-05T12:05:42.000-07:00

    Note that I altered the XML to make it valid and that I chose elements that exist as children of content for demonstration purposes. You will need to alter the code to suit what you are actually doing.


    Perl is environmentally friendly - it saves trees
Re: trouble with regular expressions, don't know why patterns aren't matching
by jettero (Monsignor) on Mar 31, 2008 at 22:52 UTC
    XML is notoriously difficult to parse with regular expressions. The packages are really the way to go, even if getting them installed seems like a lot of effort (even political effort, if your admin is an adversary or something). I wouldn't be surprised at all if one of the many XML modules is core by now, although I haven't checked recently.
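    To make the fragility concrete, here is a small sketch with hypothetical inputs, using the content pattern from the question. XML permits either quote style around attribute values and whitespace around the = sign, so a parser treats the two strings below as equivalent, yet the hard-coded regex matches only the first.

```perl
use strict;
use warnings;

# The pattern from the question hard-codes one serialization of the tag.
my $pattern = qr{<content type='text'>(.*?)</content>}s;

# Two hypothetical inputs that an XML parser would treat as equivalent.
my $single = "<content type='text'>Keep up the good work.</content>";
my $double = qq{<content type = "text">Keep up the good work.</content>};

my $m1 = $single =~ $pattern ? $1 : undef;
my $m2 = $double =~ $pattern ? $1 : undef;

print "single quotes: ", (defined $m1 ? $m1 : 'no match'), "\n";
print "double quotes: ", (defined $m2 ? $m2 : 'no match'), "\n";
```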

    Otherwise, based on that data, it looks like it ought to match. Are you sure the data in $x looks the way you think it does?
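    One hedged way to check what $x really contains, using only the core Data::Dumper module: with $Data::Dumper::Useqq set, control characters such as carriage returns and tabs are shown as escapes instead of being rendered invisibly. The value assigned to $x here is a made-up example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical value of $x: it looks like the posted sample when printed,
# but contains a carriage return and a tab that print would hide.
my $x = "<name>cajaneil\r\n</name>\t";

# Useqq makes Dumper emit a double-quoted string with \r, \n and \t escaped,
# so invisible characters in the input become visible.
local $Data::Dumper::Useqq = 1;
my $dump = Dumper($x);
print $dump;
```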

    -Paul

Re: trouble with regular expressions, don't know why patterns aren't matching
by ikegami (Patriarch) on Mar 31, 2008 at 23:00 UTC
    Get rid of those g modifiers.
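    A sketch of why the g modifier matters, assuming the patterns run against the same $x in the order shown in the question. In scalar context, m//g starts searching at pos($x) and leaves pos($x) just past a successful match, so a later pattern for an element that appears earlier in the string cannot find it. The record below is a hypothetical, trimmed-down version of the posted data (where <published> comes before <name>); it illustrates the mechanism rather than fully diagnosing the original data.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down record: <published> precedes <name>.
my $x = "<published>2007-04-05T12:05:42.000-07:00</published>"
      . "<author><name>cajaneil</name></author>";

# With /g in scalar context, a successful match sets pos($x).
my $name = '';
if ($x =~ /<name>(.*?)<\/name>/igs) { $name = $1; }
my $after_name = pos($x);    # just past </name>

# This search begins at pos($x), after </name>, so <published> is never seen.
# The failed //g match (no /c) also resets pos($x).
my $time = '';
if ($x =~ /<published>(.*?)<\/published>/igs) { $time = $1; }

# Without /g, each match scans from the start of the string.
my ($name2, $time2) = ('', '');
if ($x =~ /<name>(.*?)<\/name>/is)           { $name2 = $1; }
if ($x =~ /<published>(.*?)<\/published>/is) { $time2 = $1; }

print "with /g:    name='$name' time='$time'\n";
print "without /g: name='$name2' time='$time2'\n";
```

Dropping /g (and keeping /i and /s) makes each match start from the beginning of the string, which is what the code in the question appears to intend.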
Re: trouble with regular expressions, don't know why patterns aren't matching
by BrowserUk (Patriarch) on Mar 31, 2008 at 23:07 UTC
      That was a copying mistake, sorry; this is the actual data (I had another line in the code that printed it out). I spoke to the admin, got myself root privileges, and installed XML::Simple. At that point, solving the problem and getting the data was, well, simple! Thanks!