rsennat has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have a TXT file to which I'm printing the data like this:
print FILE "<user_name>$userId</user_name>\n";
print FILE "<job_id>$jobId</job_id>\n";
print FILE "<finish_time>$timeF$ampmF</finish_time>\n";
print FILE "<status>COMPLETED</status>\n";
How do I extract the info within the tags?

Thanks

Replies are listed 'Best First'.
Re: perl extraction from a file
by prasadbabu (Prior) on Nov 08, 2005 at 09:26 UTC

    You can make use of regular expressions or modules like XML::Twig.

    One way to do this is:

    undef $/;
    $str = <DATA>;
    while ($str =~ /<([^\/]*[^>]*)>((?:(?!<\/\1>).)*)<\/\1>/gsi) {
        print "Data inside $1 tag:\t$2\n";
    }

    This works only if the tags have no attributes, as in your input.

    Thanks in advance

    Prasad

      Hi, the code below will work with or without attributes. Try this:

      undef $/;
      $str = <DATA>;
      while ($str =~ /<([^\/ ]+) ?[^>]*>((?:(?!<\/\1>).)*)<\/\1>/gsi) {
          print "Data inside $1 tag:\t$2\n";
      }
      __DATA__
      <user_name type="new">$userId</user_name>
      <job_id>$jobId</job_id>
      <finish_time>$timeF$ampmF</finish_time>
      <status cate="true">COMPLETED</status>

      Regards,
      Velusamy R.


      eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

Re: perl extraction from a file
by tphyahoo (Vicar) on Nov 08, 2005 at 09:54 UTC
    This doesn't seem to be HTML, but it's in tags, and HTML::TreeBuilder can parse it, so if all you want to do is strip out the tags, maybe this is helpful as a quick and dirty solution. It's probably easier to maintain than the regexes anyway.

    use strict;
    use warnings;
    use HTML::TreeBuilder;
    use Data::Dumper;

    while ( <DATA> ) {
        my $tree = HTML::TreeBuilder->new_from_content($_);
        my $body = $tree->look_down("_tag" => "body" );
        my $contents = ( $body->content_list() )[0];
        print "$contents\n";
    }
    __DATA__
    <user_name>$userId</user_name>
    <job_id>$jobId</job_id>
    <finish_time>$timeF$ampmF</finish_time>
    <status>COMPLETED</status>
    outputs:
    $userId
    $jobId
    $timeF$ampmF
    COMPLETED
    UPDATED: Same basic idea, but I like this better.
Re: perl extraction from a file
by blazar (Canon) on Nov 08, 2005 at 10:07 UTC

    Huh?!? If you're printing the data, you already have the info contained in it. Or else it is irrelevant that you're printing, and you may have just shown the format of the file you have to parse. And since it appears to be XML (as opposed to generic "TXT"), an XML parsing module would be the best tool to use. XML::Twig is a common recommendation. But also check the "Ways to Rome" series for some comparisons between Perl XML modules.

    Incidentally, if you have to print all that text (or even more), you might consider a "here-doc" instead (see perldoc perlop).
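
    For illustration, the here-doc suggestion could look roughly like the sketch below. The sample values and the $record variable are made up for the example; the original code prints to a filehandle named FILE, which is only mentioned in a comment here so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for the poster's variables.
my ($userId, $jobId, $timeF, $ampmF) = ('rsennat', 1234, '09:26', 'AM');

# One interpolating here-doc replaces the four separate print statements.
my $record = <<"END_RECORD";
<user_name>$userId</user_name>
<job_id>$jobId</job_id>
<finish_time>$timeF$ampmF</finish_time>
<status>COMPLETED</status>
END_RECORD

# In the original code this would be: print FILE $record;
print $record;
```

    The double-quoted terminator ("END_RECORD") makes the here-doc interpolate variables, just like the double-quoted strings in the original print statements.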

Re: perl extraction from a file
by rsennat (Beadle) on Nov 08, 2005 at 09:50 UTC
    How about extracting it like this:
    $JobId = $1 if $string =~ m{<job_id>(.*?)</job_id>};
    Thanks

      Well, if you already knew, why did you ask in the first place? Anyway, the answer is that it may work in simple cases, but because of the many subtleties involved, it is not considered applicable to more complex ones. So it all depends on your actual data and on how much you can rely on them.

      Incidentally, you may also use /g and the return value of m/// in list context, although I admit that here, because you're assigning to a single scalar, you would have to force list context, e.g. by parenthesising the left-hand side (the "goatse operator" =()= would only give you the number of matches), and it wouldn't make for a much cleaner solution.
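
      As a rough sketch of the list-context behaviour of m// mentioned above: the sample string and the %field hash are made up for the example, and the pattern assumes simple tags with no attributes and no nesting, as in the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input in the format shown in the question.
my $string = join '',
    "<user_name>rsennat</user_name>\n",
    "<job_id>1234</job_id>\n",
    "<finish_time>09:26AM</finish_time>\n",
    "<status>COMPLETED</status>\n";

# Parentheses on the left-hand side put the match in list context,
# so the first capture group is assigned directly to the scalar.
my ($job_id) = $string =~ m{<job_id>(.*?)</job_id>};

# With /g in list context, all (tag, value) captures come back as one
# flat list, which can be assigned straight to a hash.
my %field = $string =~ m{<(\w+)>(.*?)</\1>}g;

print "job_id: $job_id\n";
print "$_ => $field{$_}\n" for sort keys %field;
```

      If the match fails, $job_id is simply left undefined, so it is worth checking with defined before using it.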