Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
The information I am trying to extract lies between the <notes > and </notes> tags (which may or may not fall across lines). Once extracted, I want to put the information into one file, with each record's information on a single line (line returns stripped). This is what I have created so far; being new to Perl, I have added bits and pieces from scripts found on the site.
I am having difficulty removing the line returns from the final output file. Also, my current output includes text from outside the notes tags, but only from the second record.
Can anyone help?
    #!/usr/bin/perl -w
    use strict;
    use File::Find;
    use HTML::TokeParser;

    ##Here define the directory to work across
    my $root_dir = 'c:/test1';

    ##Search the directory, when a file is found run the sub.
    find(\&wanted, $root_dir);

    sub wanted {
        # if the extension fits...
        if ( /(LOG[^\n]*)|(REC[^\n]*)\.xml?/i ) {
            ##Grab the filename for error to screen if cannot open.
            my $input = $_;
            open (OUTPUT, ">>c:\\1-Actnte.txt");
            open INPUT, "$input" or die "Cannot open $input";
            select OUTPUT;
            $\ = "\n";
            my $foundstart;
            while (<INPUT>) {
                chomp;
                next unless ($foundstart || /<notes[^>]*>/i);
                while (/<notes[^>]*>/i && ! $foundstart) {
                    $_ =~ s/^.*?<notes [^>]*>$/\n<notes $1\/i;
                    $foundstart++;
                    next unless($_);
                }
                while ($_ =~ m|<notes[^\r\n]*</notes>|i) {
                    $_ =~ s|^(.*?)</notes>.*$|$1|i;
                    print if($_);
                    last;
                }
                print;
            }
            close INPUT;
        }
    }
    close OUTPUT;
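For comparison, here is a minimal sketch of one way the extraction described above might be approached. It is not a reply from the thread. It reads each file whole and matches across lines, instead of tracking state line by line. Two things in the script above appear to cause the symptoms described: `$foundstart` is never reset after a closing tag, so every later line in the same file is printed, and `$\ = "\n"` appends a newline to every `print`, which puts back the line returns that `chomp` removed. The helper name `extract_notes`, the `main` wrapper, the command-line arguments, and the filename pattern (names beginning with LOG or REC and ending in .xml) are assumptions made for illustration, not something the original post specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the text of every <notes ...>...</notes> element in $text,
# with line returns and other runs of whitespace collapsed to one space.
# (Hypothetical helper; the name is not from the original script.)
sub extract_notes {
    my ($text) = @_;
    my @notes;
    # /s lets . match newlines, so a note may span several lines;
    # /i ignores tag case; the non-greedy .*? stops at the first </notes>.
    while ( $text =~ m{<notes\b[^>]*>(.*?)</notes\s*>}gis ) {
        my $note = $1;
        $note =~ s/\s+/ /g;    # line returns become single spaces
        $note =~ s/^ //;       # trim the leading space, if any
        $note =~ s/ $//;       # trim the trailing space, if any
        push @notes, $note;
    }
    return @notes;
}

# Walk $root_dir and append one line per note to $out_file.
sub main {
    my ( $root_dir, $out_file ) = @_;
    open my $out, '>>', $out_file or die "Cannot open $out_file: $!";
    find(
        sub {
            # Assumed intent: plain files named LOG*.xml or REC*.xml
            return unless -f $_ && /^(?:LOG|REC).*\.xml$/i;
            open my $in, '<', $_ or die "Cannot open $File::Find::name: $!";
            my $text = do { local $/; <$in> };    # slurp the whole file
            close $in;
            print {$out} "$_\n" for extract_notes($text);
        },
        $root_dir
    );
    close $out or die "Cannot close $out_file: $!";
}

# Runs only when both paths are supplied on the command line.
main(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing (say `notes.pl`), it could be run as `perl notes.pl c:/test1 c:/1-Actnte.txt`. Slurping keeps the logic short but assumes each file fits comfortably in memory. A regex is also only adequate for simple, non-nested tags like these; for anything less regular, a real XML parser would be the safer choice.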
Re: Extracting information from multiple files in a directory
by BrowserUk (Patriarch) on May 09, 2003 at 04:05 UTC

Re: Extracting information from multiple files in a directory
by chromatic (Archbishop) on May 09, 2003 at 05:34 UTC

Re: Extracting information from multiple files in a directory
by zby (Vicar) on May 09, 2003 at 08:22 UTC