http://qs1969.pair.com?node_id=45295

ashok has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am counting the number of lines in a file using cat like this on a Unix system:
$line_count = `cat $filename | wc -l`;
This works fine. But one of the files has a blank in its path, like this:
Module: /org/trans/program files/src/apptask1.c
Then my code above fails to count the lines. Also, sometimes when the path is long it breaks into two lines, and I am unable to pick up the file name properly. I am reading the file names from a file. So how do I find the file name if it is split over two lines? For example:
Module: /org/trans/program files/dir1/dir2/dir3/dir4/
dir5/apptask2.c
I am able to read only the first line, /org/trans/program files/dir1/dir2/dir3/dir4/. I do not know in advance when a file name breaks into two lines. Each file name is preceded by "Module: ", so my regular expression matches a line beginning with 'Module: ' and picks up the remaining text as the file name. Can anyone help me? Thanks, Ashok

Replies are listed 'Best First'.
Re: path is broken
by btrott (Parson) on Dec 06, 2000 at 23:55 UTC
    Why are you using cat? Wouldn't
    $line_count = `wc -l $filename`
    work just as well?

    Of course one could argue that you could just do the entire thing in Perl rather than using system calls; I might argue that, for instance. Here's a version of wc in Perl; you could extract the bits you need.

    Or you could do something like this:

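    sub lines_in_file {
        my $f = shift;
        # Three-argument open keeps odd characters in $f from being interpreted.
        open my $fh, '<', $f or die "Can't open $f: $!";
        1 while <$fh>;
        my $count = $.;
        close $fh;    # closing resets $. for the next call
        return $count;
    }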
    This would take care of your "space-in-filename" problem as well, since Perl won't care; it's only a problem when you pass it off to the shell like that.

    You can use this like:

    my $line_count = lines_in_file($filename);
      'Cause
      cat file | wc -l and wc -l file
      produce different output:
      9
      vs.
      9 file (or some approximation)

      a

Re: path is broken
by kilinrax (Deacon) on Dec 06, 2000 at 23:53 UTC
    Don't use cat to read the file!

    Seriously, you can do all of this very easily from within Perl itself. Here's an example of how:
    #!/usr/bin/perl -w
    use strict;
    my $filename = "foo.txt";
    open FILE, $filename or die "could not open $filename";
    my @data = <FILE>;
    close FILE;
    print scalar @data;
      that's well and good... but what if the file is large? why not do basically the same thing except...
      use strict;
      my $filename = "foo.txt";
      my $count;
      open FILE, $filename or die "could not open $filename";
      while(<FILE>) {
          $count++;
      } # while
      close FILE;
      print $count;
      I imagine it should run about the same speed wise, but shouldn't require quite so much memory...
        Perl keeps track of the line number of the file you are in so you don't have to:
        perl -e "while(<>){};print $." somefile.txt
        Gotta love that $. variable :)

        Cheers,
        Ovid


      Hi, I appreciate your help, but I still have one more problem. As I explained, I am reading file names from a file. This file contains some history of each file in the entire source code, like:
      Module: /org/trans/program files/dir1 /dir2/file1.cpp some flags some history
      So I pick up the file name using a regular expression. My problem is when it breaks into two lines, like
      Module: /org/trans/program files/dir1/subdir1/subdir2/file1.cpp some flags some history
      In this case, how should I write my regular expression? I do not know on which line of the file a file name breaks into two lines. The file I read is big and I cannot keep it in memory. Thanks again to you all. I am learning a lot. Ashok
        So you're going through your file looking for ^Module and grabbing what may be the filename.
        You may want to test whether the file exists (-e $filename), and if it does not, grab the next line and tack it on.
        Just make sure you don't tack on the next Module line.
        Does it break on line length or on spaces? Consider that as well.
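The exists-test idea above could be sketched roughly like this. The helper name module_paths, the optional $exists callback, and the sample data are assumptions made for illustration; the sketch also assumes a name wraps onto at most one extra line, and it tries joining with and without a space because the thread does not say which kind of break occurs.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file names from "Module: " lines, joining a
# name with the following line when the first piece alone does not exist.
# $exists is an optional test callback; it defaults to the -e file test.
sub module_paths {
    my ($fh, $exists) = @_;
    $exists ||= sub { -e $_[0] };
    my @paths;
    my $line = <$fh>;
    while (defined $line) {
        chomp $line;
        my ($path) = $line =~ /^Module: (.*)/;
        $line = <$fh>;                      # look ahead one line
        next unless defined $path;
        if (!$exists->($path) && defined $line && $line !~ /^Module: /) {
            chomp(my $rest = $line);
            # The break may or may not have swallowed a space, so try both.
            for my $candidate ($path . $rest, "$path $rest") {
                next unless $exists->($candidate);
                $path = $candidate;
                $line = <$fh>;              # continuation line is used up
                last;
            }
        }
        push @paths, $path;
    }
    return @paths;
}

# Demo with made-up data: a hash stands in for the real file system.
my %on_disk = map { $_ => 1 } (
    '/org/trans/program files/src/apptask1.c',
    '/org/trans/program files/dir1/dir2/dir3/dir4/dir5/apptask2.c',
);
my $log = "Module: /org/trans/program files/src/apptask1.c\n"
        . "Module: /org/trans/program files/dir1/dir2/dir3/dir4/\n"
        . "dir5/apptask2.c\n";
open my $fh, '<', \$log or die "Can't open in-memory file: $!";
print "$_\n" for module_paths($fh, sub { $on_disk{ $_[0] } });
```

Only one line is held in memory at a time, so this should also cope with a large history file. With real files, call module_paths($fh) and let the default -e test do the checking.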
        I assume that what you're looking for here is the string '/org/trans/program files/dir1/subdir1/subdir2/file1.cpp'.
        Crafting a regex to extract the file name without having a format you can rely on is going to be tricky.
        However, if you are always going to be looking for a '.cpp' file, then the following will work:
        #!/usr/bin/perl -w
        use strict;
        my $data = join '', <DATA>;
        my ($filename) = ($data =~ m|^Module: (.*\.cpp)$|ms);
        $filename =~ s|\n||;
        print $filename;
        __DATA__
        Module: /org/trans/program files/dir1/subdir1/subdir2/file1.cpp
        some flags
        some history
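Since the history file is said to be too big to hold in memory, here is a rough variant of the same regex idea that reads one "Module: " record at a time instead of slurping everything. The helper name cpp_paths and the sample data (including where the path wraps) are assumptions for illustration, and it still relies on every file name ending in '.cpp' at the end of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: read one "Module: " record at a time and pull out the
# .cpp path, even when the path wraps onto a second line.
sub cpp_paths {
    my ($fh) = @_;
    local $/ = "\nModule: ";          # each read returns one whole record
    my @paths;
    while (my $record = <$fh>) {
        chomp $record;                # strips the trailing "\nModule: "
        $record =~ s/\AModule: //;    # only the first record still has it
        # Non-greedy: stop at the first ".cpp" that ends a line.
        next unless $record =~ /\A(.*?\.cpp)$/ms;
        (my $path = $1) =~ s/\n//g;   # rejoin a wrapped path
        push @paths, $path;
    }
    return @paths;
}

# Demo with made-up data; the position of the line break is an assumption.
my $log = "Module: /org/trans/program files/dir1/subdir1/\n"
        . "subdir2/file1.cpp\n"
        . "some flags\n"
        . "some history\n";
open my $fh, '<', \$log or die "Can't open in-memory file: $!";
print "$_\n" for cpp_paths($fh);
```

Setting $/ to "\nModule: " makes each read stop at the start of the next record, so the match can never run on into a later entry. Rejoining with s/\n//g assumes the break did not swallow a space.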
Re: path is broken
by mirod (Canon) on Dec 07, 2000 at 00:07 UTC

    To pass a file name with spaces to the shell, you have to backslash the spaces in the file name:

    $filename =~ s/(\s)/\\$1/g;
    $line_count = `wc -l $filename`;
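If you would rather keep calling wc, another option is to skip the shell altogether, so that nothing needs escaping. This is only a sketch: the helper name wc_lines is made up, and it assumes a Unix system with wc on the PATH and a Perl that supports the list form of pipe open.

```perl
use strict;
use warnings;

# Hypothetical helper: run "wc -l" on a file without going through the shell.
sub wc_lines {
    my ($filename) = @_;
    # The list form of pipe open hands $filename to wc as a single argument,
    # so spaces in the name need no backslashes.
    open my $wc, '-|', 'wc', '-l', $filename
        or die "Can't run wc: $!";
    my $output = <$wc>;
    close $wc or die "wc failed on $filename";
    defined $output or die "No output from wc for $filename";
    # wc prints the count followed by the file name; keep only the number.
    my ($count) = $output =~ /^\s*(\d+)/
        or die "Unexpected wc output: $output";
    return $count;
}
```

You would call it as my $line_count = wc_lines($filename);. The regex also takes care of the file name that wc prints after the count when it is given a file argument.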