RE: RE: RE: Re: Stripping page headers

Let's try this again, hehe.

```perl
#!/usr/bin/perl -w
use strict;

my $report;

undef $/;

open REPFILE, "report.rpt" or die "Can't open report.rpt: $!\n";

$report = <REPFILE>;

$report =~ s/^User Report//g;
$report =~ s/^All Users//g;
$report =~ s/^User Name//g;
$report =~ s/^-> Token//g;

print $report;

close REPFILE;
```

First section of report.rpt follows:

```
User Report Date: 09/26/2000 09:55:13
All Users Page: 1 of 114
User Name Default Login Name Default Shell Name
-> Token Serial No. Replacement Last Login Original Token Type
Temp 1 Temp1
-> 000050488538 01/01/1986 00:00:00 SoftID
Temp 2 temp2
-> 000050488537 01/01/1986 00:00:00 SoftID
Temp 3 temp3
-> 000050488536 01/01/1986 00:00:00 SoftID
```

RE: RE: RE: RE: Re: Stripping page headers by Dermot (Scribe) on Sep 28, 2000 at 01:54 UTC
Ok, I've made two modifications to what I originally posted, and now it works with your data. My original script would never have worked properly with your report file; it worked with the trivial example I tested it on, but anyway. Here is a working version:

```perl
#!/usr/bin/perl -w
use strict;

my $report;

undef $/;    # slurp mode: one read returns the whole file

open REPFILE, "report.rpt" or die "Can't open report.rpt: $!\n";

$report = <REPFILE>;

# ^ anchors at every line start (m); .* takes the rest of that line
$report =~ s/^User Report.*//mg;
$report =~ s/^All Users.*//mg;
$report =~ s/^User Name.*//mg;
$report =~ s/^-> Token.*//mg;

print $report;

close REPFILE;
```

1. Addition of the `m` modifier to the substitutions. Because the whole report file is read into one scalar, it is effectively one string, and by default `^` matches only at the start of the string (and `$` only at its end, or just before a final newline), not at the start and end of each line. Adding the `m` modifier makes `^` and `$` match at the start and end of every line, so the string in `$report` is treated as a series of lines delimited by `\n` characters. Combined with the `g` modifier that was already there, each header line is removed wherever it appears in the report, not just at the top.
2. Addition of `.*` to each regex to deal with the rest of the line. With `.*` the regex matches (i) the start of a line, (ii) the piece of text we're using as a tag for that line, and (iii) the rest of the line up to the next `\n`. A dot in a regex matches any character except a newline (`\n`), which is why `.*` stops at the end of the line. If you want dot to match a newline as well, use the `s` modifier.

Just to top off the confusion, you can use both the `s` and `m` modifiers on the same regex. Most people assume they mean "single-line" vs "multi-line", but they actually control two independent things: `s` lets dot match newlines, and `m` lets `^` and `$` match at line boundaries instead of only at the ends of the whole string.
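A minimal sketch of the `m` and `.*` changes at work, assuming the first lines of report.rpt are embedded in the script instead of read from disk (the in-memory filehandle and the names `$sample`, `$without_m`, `$with_m` and `$stripped` are just for illustration). It counts how often `^All Users` matches with and without `m`, then applies the fixed substitutions.

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: report text embedded here instead of read from report.rpt
# (column spacing from the original file is not reproduced).
my $sample = <<'END';
User Report Date: 09/26/2000 09:55:13
All Users Page: 1 of 114
User Name Default Login Name Default Shell Name
-> Token Serial No. Replacement Last Login Original Token Type
Temp 1 Temp1
-> 000050488538 01/01/1986 00:00:00 SoftID
END

# Read it the way the script reads report.rpt: with $/ undefined,
# a single <$fh> returns the whole file as one string.
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
my $report = do { local $/; <$fh> };
close $fh;

# Without m, ^ can only match at the very start of the string.
my $without_m = () = $report =~ /^All Users/g;    # 0 matches
# With m, ^ also matches right after every "\n".
my $with_m    = () = $report =~ /^All Users/mg;   # 1 match

# The fixed substitutions: ^ at each line start (m), .* eats the rest
# of that line but stops before its "\n", g repeats for every page.
(my $stripped = $report) =~ s/^User Report.*//mg;
$stripped =~ s/^All Users.*//mg;
$stripped =~ s/^User Name.*//mg;
$stripped =~ s/^-> Token.*//mg;

print "without m: $without_m, with m: $with_m\n";
print $stripped;
```

Running it prints `without m: 0, with m: 1`, then four empty lines followed by the two data lines: since `.*` stops before the `\n`, each stripped header leaves its newline behind. Writing the pattern as `s/^User Report.*\n//mg` removes the newline as well.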
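The point about combining `s` and `m` can be checked on a tiny made-up string (`$text` and the `show` helper are hypothetical, purely for illustration). Each match changes one modifier at a time; the comments give the captured text with newlines written as `\n`.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical three-line string, purely for illustration.
my $text = "alpha\nbeta\ngamma\n";

# No modifiers: ^ only at the start of the string, . stops at "\n".
my ($plain)  = $text =~ /^(a.*)/;      # alpha
# s: . also matches "\n", so .* runs to the end of the string.
my ($s_only) = $text =~ /^(a.*)/s;     # alpha\nbeta\ngamma\n
# m: ^ can match at any line start, but . still stops at "\n".
my ($m_only) = $text =~ /^(b.*)/m;     # beta
# sm: a line-start anchor and a dot that crosses newlines.
my ($both)   = $text =~ /^(b.*)/sm;    # beta\ngamma\n
# Without m, ^ never matches in front of "beta".
my $no_m     = $text =~ /^b/ ? 1 : 0;  # 0

# Helper: show newlines as a literal \n so each result fits on one line.
sub show {
    my $s = shift;
    return 'undef' unless defined $s;
    $s =~ s/\n/\\n/g;
    return $s;
}

print "plain: ", show($plain),  "\n";
print "s:     ", show($s_only), "\n";
print "m:     ", show($m_only), "\n";
print "sm:    ", show($both),   "\n";
print "no m:  $no_m\n";
```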
RE: RE: RE: RE: RE: Re: Stripping page headers by Anonymous Monk on Sep 29, 2000 at 04:31 UTC
You rock! :) It works flawlessly. BIG Thanks!