Yoda_Oz has asked for the wisdom of the Perl Monks concerning the following question:

If I had a text file, how would I write a small script that reprints the file to the screen with all the newline characters removed and everything converted to lowercase?
So far I've learnt how to work out how many lines there are in the file, i.e.:
#!/usr/local/bin/perl
use strict;
use warnings;

my %lineHash;
my @lineArray;

print ("Enter filename to count lines: ");
my $path = <STDIN>;
print ("\n");

open(DATA, "<$path") || die "Couldn't open $path for reading: $!\n";
while (<DATA>) {
    while (s/(\n)(.*)/$2/) {
        my $count = $1;
        $lineHash{$count}++;
    }
}
close (DATA);

my %charname = ( "\n" => "Lines" );
while ( my ($lines, $count) = each(%lineHash) ) {
    push @lineArray, "$count\t$lines";
    print ("$count\t$charname{$lines}\n");
}

Re: calculating lines
by Zaxo (Archbishop) on Jan 12, 2007 at 00:25 UTC

    You don't want to use DATA as a user file handle. It's reserved for inline data after __END__, __DATA__, or Ctrl-Z.
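
A minimal sketch of opening the user's file with a lexical handle instead of DATA, assuming the three-argument form of open. The helper name open_for_reading is hypothetical and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path read-only and return a lexical file handle.
sub open_for_reading {
    my ($path) = @_;
    chomp $path;    # drop the newline left over from reading <STDIN>
    open(my $fh, '<', $path)
        or die "Couldn't open $path for reading: $!\n";
    return $fh;
}
```

It could be called as my $fh = open_for_reading(scalar <STDIN>); the handle is closed automatically when $fh goes out of scope.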

    I'll call the file handle $fh. Reading line by line,

        while (<$fh>) {

    convert all clusters of whitespace to a single space,

            s/\s+/ /g;

    get rid of initial whitespace to handle "\n\t" and such,

            s/^\s//;

    and print the lowercased line,

            print lc;
        }

    That's it. Each of those operations takes advantage of $_ as the default argument.

    With that method, there's no need to do any special accounting of lines or to store any of your work as you go. If you do need to store the lines for some other purpose, you only need to push the output of lc onto an array where the print occurs.
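
Put together, the loop might look like the sketch below, which stores the lowercased lines in an array rather than printing them. The helper name squeezed_lines is hypothetical, and the leading-whitespace step is written with the ^ anchor on the assumption that "initial whitespace" means whitespace at the start of the line.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from $fh and return the cleaned lines.
sub squeezed_lines {
    my ($fh) = @_;
    my @out;
    local $_;              # keep the caller's $_ untouched
    while (<$fh>) {
        s/\s+/ /g;         # collapse each run of whitespace (newline included) to one space
        s/^\s//;           # drop a leading space so joined lines do not get double spaces
        push @out, lc;     # store the lowercased line; "print lc;" would print it instead
    }
    return @out;
}
```

Printing the result with print squeezed_lines($fh); gives the whole file on one line, with single spaces between words.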

    If you just want a line count, you can immediately follow the while block with,

    my $linecount = $. ;
    $. is a running linecount of the most recently accessed read handle. It will be volatile in an application with several read handles, hence the assignment to a user variable.
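
A minimal sketch of the line count on its own, assuming nothing else needs to be done with the lines. The helper name count_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines that can still be read from $fh.
sub count_lines {
    my ($fh) = @_;
    local $_;               # keep the caller's $_ untouched
    1 while <$fh>;          # read to end of file, discarding each line
    my $linecount = $.;     # copy right away, before any other handle is read
    return $linecount;
}
```

Copying $. into $linecount immediately is the point of the exercise: a later read from a different handle would make $. refer to that handle instead.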

    After Compline,
    Zaxo

      I don't understand. All that does is remove all the characters and leave behind all the punctuation. I want to remove all the punctuation and print out all the letters on one line with no spaces and stuff.

        Oops, I meant \s, whitespace, not \w, word characters. Corrected, sorry for the brain fart.

        After Compline,
        Zaxo

      my $linecount = $. ;
      is the code for a line count...
      How do I do a word count?

        Putting $wc += split; inside the while loop will do it. That's another case of default arguments: with none given, split breaks $_ on whitespace.
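
A spelled-out sketch of that word count, assuming a word is any run of non-whitespace characters. The helper name count_words is hypothetical, and the intermediate array is used here only to make the counting step explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words readable from $fh.
sub count_words {
    my ($fh) = @_;
    my $wc = 0;
    local $_;                  # keep the caller's $_ untouched
    while (<$fh>) {
        my @words = split;     # split $_ on whitespace, skipping leading whitespace
        $wc += @words;         # an array in numeric context gives its element count
    }
    return $wc;
}
```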

        After Compline,
        Zaxo

      How do I remove the punctuation and whitespace characters too?

        Just use a character class other than \s. Non-word, \W, might be what you want; it matches anything that is not a letter, digit, or underscore.

        If all else fails you can make a custom character class with square brackets. See perlre, the regex doc.
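
To make the difference concrete, here is a small sketch comparing \W with a custom bracketed class. Both helper names are hypothetical, and the custom class assumes only ASCII letters should survive.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every non-word character, then lowercase.
# Digits and underscores count as word characters, so they are kept.
sub strip_nonword {
    my ($line) = @_;
    $line =~ s/\W+//g;
    return lc $line;
}

# Hypothetical helper: delete everything except ASCII letters, then lowercase.
# [^A-Za-z] is a custom negated character class built with square brackets.
sub letters_only {
    my ($line) = @_;
    $line =~ s/[^A-Za-z]+//g;
    return lc $line;
}
```

Which one fits depends on whether digits and underscores should be kept in the output.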

        After Compline,
        Zaxo

        Never mind, worked it out...
        s/\s+//g; s/^\s//; s/([\041-\057]|[\72-\100]|[\133-\140]|[\173-\176])+//g; print lc;
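
For ASCII text, the four octal ranges above cover the same characters as the POSIX class [[:punct:]], so a more compact sketch is possible. The helper name strip_space_and_punct is hypothetical, and the equivalence is only assumed to hold for ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper: delete whitespace and punctuation, then lowercase.
sub strip_space_and_punct {
    my ($line) = @_;
    $line =~ s/[\s[:punct:]]+//g;   # one class covers whitespace and punctuation
    return lc $line;
}
```

Inside the read loop this would be used as print strip_space_and_punct($_); and it keeps digits as well as letters, like the version above.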