perl extraction from a file

rsennat has asked for the wisdom of the Perl Monks concerning the following question:

Re: perl extraction from a file by prasadbabu (Prior) on Nov 08, 2005 at 09:26 UTC
You can use regular expressions or a module such as XML::Twig. One way to do this is `undef $/; $str = <DATA>; while ($str =~ /<([^\/][^>]*)>((?:(?!<\/\1>).)*)<\/\1>/gsi) { print "Data inside $1 tag:\t$2\n"; }` This works only if the tags have no attributes, as in your input. Thanks in advance, Prasad
Re^2: perl extraction from a file by Samy_rio (Vicar) on Nov 08, 2005 at 09:52 UTC
Hi, the code below works with or without attributes. Try this: `undef $/; $str = <DATA>; while ($str =~ /<([^\/ ]+) ?[^>]*>((?:(?!<\/\1>).)*)<\/\1>/gsi) { print "Data inside $1 tag:\t$2\n"; } __DATA__ <user_name type="new">$userId</user_name> <job_id>$jobId</job_id> <finish_time>$timeF$ampmF</finish_time> <status cate="true">COMPLETED</status>` Regards, Velusamy R. eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@\|6%,53!-9@2~j';
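For anyone who wants to reuse the regex idea from the two replies above, here is a minimal sketch that wraps it in a sub and returns the tag/content pairs instead of printing them. The sub name `extract_tags` and the sample data are my own assumptions, not anything from the original posts, and the pattern is slightly tightened so that the tag name stops at the first whitespace or `>`. It is still only a regex, so it will not cope with nested tags of the same name, comments, or CDATA.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [tag, content] pairs for simple, non-nested <tag ...>content</tag> runs.
sub extract_tags {
    my ($text) = @_;
    my @pairs;
    # $1 = tag name, optional attributes are skipped, $2 = content up to
    # the first matching close tag. /s lets "." cross newlines.
    while ($text =~ m{<([^/\s>]+)(?:\s[^>]*)?>((?:(?!</\1>).)*)</\1>}gs) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Single-quoted here-doc, so $userId and friends stay literal text.
my $sample = <<'END';
<user_name type="new">$userId</user_name>
<job_id>$jobId</job_id>
<finish_time>$timeF$ampmF</finish_time>
<status cate="true">COMPLETED</status>
END

printf "Data inside %s tag:\t%s\n", @$_ for extract_tags($sample);
```

Returning pairs rather than printing makes it easy to load the result into a hash, for example `my %field = map { @$_ } extract_tags($sample);`.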
Re: perl extraction from a file by tphyahoo (Vicar) on Nov 08, 2005 at 09:54 UTC
This doesn't seem to be HTML, but it is in tags, and HTML::TreeBuilder can parse it, so if all you want to do is strip out the tags, this may be helpful as a quick and dirty solution. Probably easier to maintain than the regexes anyway. `use strict; use warnings; use HTML::TreeBuilder; use Data::Dumper; while ( <DATA> ) { my $tree = HTML::TreeBuilder->new_from_content($_); my $body = $tree->look_down("_tag" => "body" ); my $contents = ( $body->content_list() )[0]; print "$contents\n"; } __DATA__ <user_name>$userId</user_name> <job_id>$jobId</job_id> <finish_time>$timeF$ampmF</finish_time> <status>COMPLETED</status>` outputs: `$userId $jobId $timeF$ampmF COMPLETED` UPDATED: Same basic idea, but I like this better.
Re: perl extraction from a file by blazar (Canon) on Nov 08, 2005 at 10:07 UTC
Huh?!? If you're printing the data, you already have the info contained in it. Or else it is irrelevant that you're printing, and you may have just shown the format of the file you have to parse. And since it appears to be XML (as opposed to generic "TXT"), an XML parsing module would be the best tool to use. XML::Twig is a common recommendation. But also check the "Ways to Rome" series for some comparisons between Perl XML modules. Incidentally, if you had to print all that text (or even more), you might have considered a "here-doc" instead (see perldoc perlop).
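To illustrate the here-doc remark: a minimal sketch, assuming the original script really was printing the tagged text from variables it already held. The variable values and the `END_REPORT` terminator are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical values standing in for whatever the real script computes.
my ($userId, $jobId, $timeF, $ampmF) = ('rsennat', 42, '10:07', 'AM');

# A double-quoted here-doc interpolates variables like a "..." string,
# but keeps the multi-line layout readable. See perldoc perlop.
my $report = <<"END_REPORT";
<user_name>$userId</user_name>
<job_id>$jobId</job_id>
<finish_time>$timeF$ampmF</finish_time>
<status>COMPLETED</status>
END_REPORT

print $report;
```

With `<<'END_REPORT'` (single quotes) nothing is interpolated, which is handy when the text should contain literal `$` signs.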
Re: perl extraction from a file by rsennat (Beadle) on Nov 08, 2005 at 09:50 UTC
How about extracting it like this: `$JobId = $1 if $string =~ m{<job_id>(.*?)</job_id>};` Thanks
Re^2: perl extraction from a file by blazar (Canon) on Nov 08, 2005 at 10:30 UTC
Well, if you already knew, why did you ask in the first place? Whatever, the answer is that it may work in some simple cases, but due to the many subtleties involved, it is deemed not to be applicable to more complex ones. So it all depends on your actual data and on how much you can rely on them. Incidentally you may also use `/g` and the return value of `m///` in list context, although I admit that here, due to the fact that you're assigning to a single scalar, the "goatse operator" `=()=` would be needed and it wouldn't make for a much cleaner solution.
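A short sketch of the list-context idea, using a made-up `$string` with two `job_id` elements. As far as I can tell, assigning through `=()=` to a scalar gives the number of matches rather than the captured text, so the capture itself is taken with a parenthesized list assignment here.

```perl
use strict;
use warnings;

# Hypothetical input; the real data in the thread has a single job_id.
my $string = '<job_id>17</job_id> <job_id>23</job_id>';

# Parentheses on the left force list context: the first capture lands in $JobId.
my ($JobId) = $string =~ m{<job_id>(.*?)</job_id>};

# With /g in list context, the match returns every capture at once.
my @all_ids = $string =~ m{<job_id>(.*?)</job_id>}g;

# Assigning through an empty list to a scalar yields the number of matches.
my $count = () = $string =~ m{<job_id>(.*?)</job_id>}g;

print "first: $JobId, all: @all_ids, count: $count\n";
# prints "first: 17, all: 17 23, count: 2"
```

If the pattern does not match, `$JobId` is left undefined, so check it with `defined` before use.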