Removing white-lines...

pyro.699 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I know that there is a way to remove whitespace:

sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

$str = " Hello ";
$str =~ trim($str);
print $str; #"Hello"

I need to do the same thing, but vertically. I have this file:

This is a line.

:O and this is another line.


This line is a bit more of a loner, because hes 2 lines away.

And I would like it to look like this:

This is a line.
:O and this is another line.
This line is a bit more of a loner, because hes 2 lines away.

So, basically, it should read the file and remove all blank lines. (Some of these lines have spaces and indents on them; it should clear those out first :)) This is what I have so far:

opendir(DIR, ".");
for $file (glob("*.shtml"))
{
    $line = "";
    $file_data = "";
    open(FILE, $file);
    my @data = <FILE>;
    foreach my $line (@data)
    {
        #Work to be done on each line of the file

        $file_data .= $line; #rebuild file
    }
        open (WRITE_FILE, ">$file");
        print WRITE_FILE $file_data;
        close WRITE_FILE;
        close FILE;
        print "Processing... ".$file."... Done!\n";
}
closedir DIR;

Thanks a ton guys :) ~Cody Woolaver

Replies are listed 'Best First'.
Re: Removing white-lines... by kyle (Abbot) on Apr 06, 2007 at 02:35 UTC
```perl
foreach my $line (@data)
{
    #Work to be done on each line of the file

    $file_data .= $line if ( $line =~ /\S/ );
}
```

This way, a line is only added to the `$file_data` if it has a character that is not white space.
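The same `/\S/` test can also be applied to the whole list at once. A minimal sketch, assuming the lines are already in an array; the helper name `non_blank_lines` and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that contain at least one
# non-whitespace character.
sub non_blank_lines {
    my @lines = @_;
    return grep { /\S/ } @lines;
}

# Sample data standing in for the contents of a file.
my @data = (
    "This is a line.\n",
    "\n",
    "  \t\n",
    ":O and this is another line.\n",
);

print non_blank_lines(@data);
```

Whether `grep` or the explicit loop reads better is mostly a matter of taste; the loop leaves room for other per-line work.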
Re: Removing white-lines... by bobf (Monsignor) on Apr 06, 2007 at 02:41 UTC
```perl
# open the input and output files
open( my $in_fh, '<' ...
open( my $out_fh, '>' ...

# operate on a single line at a time rather than reading the
# input file in all at once (more memory-efficient if the
# input file is very large)
while( my $line = <$in_fh> ) {

    # skip the line if it contains only white space,
    # otherwise print it to the output file
    next if $line =~ m/^\s*$/;
    print $out_fh $line;
}

close $in_fh;
close $out_fh;
```
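For trying the line-by-line approach without touching real files, here is a rough sketch that uses in-memory filehandles. The helper name `copy_non_blank` and the sample text are assumptions for illustration, not part of the reply above:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_fh to $out_fh one line at a time,
# skipping lines that are empty or contain only white space.
sub copy_non_blank {
    my ($in_fh, $out_fh) = @_;
    while ( my $line = <$in_fh> ) {
        next if $line =~ m/^\s*$/;
        print {$out_fh} $line;
    }
    return;
}

# In-memory filehandles stand in for the real input and output files.
my $input  = "This is a line.\n\n:O and this is another line.\n \t\n\nLast line.\n";
my $output = '';

open( my $in_fh,  '<', \$input )  or die "Can't open input: $!";
open( my $out_fh, '>', \$output ) or die "Can't open output: $!";
copy_non_blank( $in_fh, $out_fh );
close $in_fh;
close $out_fh;

print $output;
```

With real files, the output would normally go to a separate file that is renamed over the original once the copy has finished.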
Re: Removing white-lines... by saintly (Scribe) on Apr 06, 2007 at 02:35 UTC
```perl
$file_data =~ s/\n\n+/\n/gs;
```

Should work. On an unrelated note:

```perl
# Suggest using scalar filehandles and checking to
# see if open succeeded...
open( my $my_filehandle, $file ) || next;
# possibly: || die "Can't open file: $!";
local $/ = undef; # tell perl not to stop reading at newline
my $file_data = <$my_filehandle>;
close $my_filehandle;
# process $file_data
# etc...
```

The 'open' command is very prone to failure on UNIX systems for lots of reasons (you don't have access to the file, it's not readable, the file's really a directory, etc...), so getting in the habit of checking it is a plus. If you use scalar filehandles, you can use them with 'my' to keep them local to your block.

`local $/ = undef;` tells perl that newlines shouldn't be considered the 'end of input'. If you want to look at the whole file instead of individual lines, you can undefine it inside the block and the next read on the filehandle will give you the whole thing.

The 's' option after the substitution regex makes `.` match newlines as well. It isn't strictly needed here, because the pattern names `\n` explicitly: newlines are already treated as normal characters, so you can remove them when there are a few in a row.

If lines have spaces on them (and you want to remove those 'empty' lines too), then:

```perl
$file_data =~ s/\n[\s\n]+/\n/gs;
```

should work. (Note that this also strips any leading indentation from the line that follows.)

You can do this task from the command line:

```
$ perl -ni -e 'print if /\S/' *.shtml
```

Although that's technically a line-by-line approach.

Update: Fixed typo ($/ not $\);

Update 2: Arr! Fixed problem #2 (parens around open, cause I don't like to use 'or' instead of '||');
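For the whole-file approach, a variation worth trying is a different pattern from the ones in this reply, `s/^\s*\n//mg`, swapped in here because it leaves the indentation of non-blank lines alone. A minimal sketch, assuming every line ends in a newline; the helper name `strip_blank_lines` and the sample text are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: remove empty and whitespace-only lines from a
# string that holds a whole file.
sub strip_blank_lines {
    my ($text) = @_;
    # /m lets ^ match at the start of every line; \s* can span a run of
    # consecutive blank lines, and the final \n ends the last of them.
    $text =~ s/^\s*\n//mg;
    return $text;
}

my $file_data = "first\n\n  \t\n    indented\n\n\nlast\n";
print strip_blank_lines($file_data);
```

The indented line keeps its four leading spaces, which matters if the `.shtml` files rely on indentation for readability.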
Re^2: Removing white-lines... by ikegami (Patriarch) on Apr 07, 2007 at 06:41 UTC
`open my $my_filehandle, $file || next;` means `open my $my_filehandle, ($file || next);` which is quite wrong.
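The two usual ways around that precedence problem can be sketched side by side. The helper names and the path below are hypothetical, and the path is simply assumed not to exist:

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 if $path could be opened for
# reading and 0 otherwise.

# Parentheses close open's argument list, so || tests open's return value.
sub can_open_with_parens {
    my ($path) = @_;
    open( my $fh, '<', $path ) || return 0;
    close $fh;
    return 1;
}

# 'or' binds more loosely than the comma, so no parentheses are needed.
sub can_open_with_or {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    close $fh;
    return 1;
}

my $missing = '/no/such/dir/page.shtml';   # assumed not to exist
print can_open_with_parens($missing), can_open_with_or($missing), "\n";
```

Both forms react to a failed `open`; which one to prefer is largely a style choice.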
Re: Removing white-lines... by f00li5h (Chaplain) on Apr 06, 2007 at 03:02 UTC
Since it looks like you're tidying up HTML, perhaps HTML::Tidy could be of help to you.

Also, `perl -p -i -e ' s/^\s+//; s/[^\S\n]+$//;' *.shtml` will do as you ask.

- `-i` does in-place editing of files: input is read from the file(s) listed and the output is written back to them
- `-e` runs this code
- `-p` sticks each line of your files (listed in `@ARGV`) into `$_` and prints `$_` back out after your `-e`'ed code

perlrun tells more on these, and the `-e` code does these things:

```perl
s/^\s+//;        # strip leading whitespace
s/[^\S\n]+$//;   # strip trailing whitespace
```

I'm not sure which part of this is removing the all-whitespace lines, but it does appear to be happening `<_<`

Update: of course! The all-whitespace lines (including their `\n`) are being removed by the first regex!

`@_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;`

Update to make a little more sense
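To see what the two substitutions do to individual lines, here is a small sketch that applies them the way `-p` would, one line at a time. The helper name `tidy_line` and the sample lines are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the one-liner's two substitutions to a
# single line, as -p does for each line held in $_.
sub tidy_line {
    my ($line) = @_;
    $line =~ s/^\s+//;        # leading whitespace; a blank line is eaten whole, "\n" included
    $line =~ s/[^\S\n]+$//;   # trailing whitespace other than "\n"
    return $line;
}

for my $line ("  indented text  \n", " \t \n", "\n", "plain\n") {
    print tidy_line($line);
}
```

Note that the first substitution also removes the indentation of non-blank lines, which may or may not be wanted.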
Re: Removing white-lines... by osunderdog (Deacon) on Apr 06, 2007 at 10:51 UTC
Just for the record, there's a syntax oddity in your example code. `$str =~ trim($str);` isn't really doing anything. Your output from this would still be `Hello` with its leading and trailing spaces.

The `=~` operator binds a string to a pattern match. It is not assignment of one scalar to another. In this case, `trim` is called, but its return value is only used as a pattern to match against `$str`, and the result of that match is discarded, so `$str` never changes. I've modified the example to return the expected result:

```perl
use strict;

sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $str = " Hello ";
$str = trim($str);
print $str; #"Hello"
```

Hazah! I'm Employed!
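A quick way to check what the binding operator does with a function call on its right-hand side is to print the string before and after. This is only a sketch; the brackets in the output are there to make the spaces visible:

```perl
use strict;
use warnings;

sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $str = " Hello ";

# trim's return value is used as a pattern and matched against $str;
# $str itself is not modified.
my $matched = ( $str =~ trim($str) );
print "matched: ", ( $matched ? "yes" : "no" ), "\n";
print "after =~ : [$str]\n";

# Plain assignment stores the trimmed copy.
$str = trim($str);
print "after =  : [$str]\n";
```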