fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am still new to Perl: I've only written about 10 very simple scripts, all with the help of a lot of Googling and two lifesaver books on Perl. Unfortunately I'm stuck again on a simple task, and I would be grateful for some hints on how to solve it.

I have a big text file with lots of whitespace-separated columns. I want to ignore the first 12 lines and the first two columns, and copy the data from the third column onwards, until the end of the file. I have "written" (well, I tried a million things I read online until something worked, which doesn't really count as "writing" code) the code copied below, but I have a few problems:

1. I have started copying data from where I find a match for /^0.00000/, as this is line 13, where I want to start. I don't like matching the zeros, though, as I realised that it's not specific enough. After trying /^0.00/, /^0.0/ and even /^0/, I found that the pattern doesn't require the five decimal zeros: all of them match 0.00000. As the file contains angles in degrees, I'd like to match something that only appears once, just to be safe.

In line 12, which is one line above where I want to start data copying, there is a unique pattern (@TYPE xy). I'd prefer to match this line and then tell the script to start copying from the next (13) line onwards. I've tried doing that in various different ways but I can't get it right. Unfortunately I don't have an example of what I wrote as I've changed the script more than 200 times, making different attempts until I stopped at matching the zeros since it worked.
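A minimal sketch of the "match the @TYPE xy line, then start on the next line" idea, using a flag variable (the replies below use the range operator instead). The sample lines and the names $started and @kept are made up for illustration, and the data comes from an in-memory string so the example runs on its own; in the real script the loop would read from $input.

```perl
use strict;
use warnings;

# Hypothetical sample: header lines, the "@TYPE xy" marker, then data rows.
# Single-quoted heredoc, so "@TYPE" is not interpolated as an array.
my $text = <<'END';
# header line
@ title "example"
@TYPE xy
0.00000   1.0   2.0   3.0
1.00000   4.0   5.0   6.0
END

open my $in, '<', \$text or die $!;   # read the string as if it were a file

my $started = 0;   # becomes true once the @TYPE xy line has been seen
my @kept;
while (my $line = <$in>) {
    if (!$started) {
        $started = 1 if $line =~ /^\@TYPE xy/;   # the marker line itself is skipped too
        next;
    }
    push @kept, $line;   # every line after the marker
}
close $in;

print for @kept;
```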

2. I have managed to get my printf formatting right ("%8.3f%10.3f"), but I'm stuck on another problem. I want to import data from column 3 through to the last column, but the file is too big for me to count the columns by hand, and the number of columns might vary between files. Since I've already split each line on whitespace, I thought I could ask Perl to count the columns for me, but I had no success. I also can't work out how to tell printf to print the first column (column 3) in %8.3f format and the rest, however many there are, in %10.3f format.

This is my script so far:

```perl
#/bin/perl/
use strict;
my @files;
@files = `ls 2kc29-out.txt`;
my $file = $_;
$_= $file;
my $start = 0;
open my $input, '<', './2kc29-out.txt' or die $!;
open my $output, '>', 'test_mon.txt' or die $!;
while (<$input>) {
    s/^\s+//;
    next unless (($start) || (/^0.00000/));
    $start = 1;
    my @columns = split /\s+/, $_;
    printf $output ("%8.3f%10.3f","$columns[2] $columns[3]"); #here I'm testing how columns 3 and 4 print
    #print $output (@columns[2..10]); #here I'm trying to see how to write "print from column 3 until the end - pretending that 10 is the end just as a test
    #printf $output ("%8.3f%10.3f",(@columns[2..10])); #same as above, but trying to see how to use printf, obviously this only prints columns 3 and 4 and nothing else
    printf $output "\n";
}
```
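One problem in the script above, separate from the two questions: "$columns[2] $columns[3]" interpolates both values into a single string, so printf receives one argument for two %f conversions. Under use warnings Perl should complain about a non-numeric argument and a missing argument. Passing each value as its own argument fixes this. A sketch with a made-up @columns, using sprintf so the result can be checked:

```perl
use strict;
use warnings;

my @columns = ('1', '2', '12.5', '30.25');   # hypothetical split result

# Wrong: one interpolated string, so %10.3f gets no argument of its own.
# my $bad = sprintf("%8.3f%10.3f", "$columns[2] $columns[3]");

# Right: one argument per format conversion.
my $good = sprintf("%8.3f%10.3f", $columns[2], $columns[3]);
print "$good\n";   # "  12.500    30.250"
```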

Any hints or guidelines would be much appreciated; I already feel ridiculous for spending 6 days on this and only getting this "far" :(

Update:

I've now tried the following:

print $output $columns[2] .. $columns[$#columns];

...am I getting there? The output is such a mess that I can't tell whether it's actually printing the angles, but does this make sense as "from column 3 to the last column"?
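On the update: in $columns[2] .. $columns[$#columns] the range operator works on the values stored in those two elements, not on their positions, so it generates a sequence between the two values. To select "column 3 to the last column" you want an array slice over the indices, @columns[2 .. $#columns]. A small comparison, with made-up integer values so the difference is easy to see:

```perl
use strict;
use warnings;

my @columns = ('a', 'b', '3', '7', '5');   # hypothetical split result

# Range over the VALUES "3" .. "5": a counting sequence, not the columns.
my @range = ($columns[2] .. $columns[$#columns]);

# Slice over the INDICES 2 .. $#columns: column 3 through the last column.
my @slice = @columns[2 .. $#columns];

print "range: @range\n";   # range: 3 4 5
print "slice: @slice\n";   # slice: 3 7 5
```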

Re: column counter and printf question
by Athanasius (Archbishop) on Oct 28, 2015 at 12:47 UTC

    Hello fasoli,

    I want to ignore the first 12 lines...

    If you know in advance that the number of lines to ignore is exactly 12, you can use the range operator in scalar context like this:

    next if 1 .. 12;

    (This works because it’s short for next if ($. == 1 .. $. == 12);) But:

    In line 12, which is one line above where I want to start data copying, there is a unique pattern (@TYPE xy). I'd prefer to match this line and then tell the script to start copying from the next (13) line onwards.

    In this case, change the range as follows:

```perl
while (<$input>) {
    next if 1 .. /\@TYPE xy/;
    printf ...;
}
```

    See perlop#Range-Operators.
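    A self-contained sketch of this range-operator approach. The sample lines are made up, and the data is read from an in-memory string instead of the real file. Because the left operand is the constant 1, it is compared with $. (the current line number), so the range becomes true on line 1 and stays true up to and including the line that matches @TYPE xy; next therefore skips every line through the marker.

```perl
use strict;
use warnings;

# Hypothetical sample data (single-quoted heredoc: no interpolation).
my $text = <<'END';
# header line
@ title "example"
@TYPE xy
0.00000   1.0   2.0   3.0
1.00000   4.0   5.0   6.0
END

open my $input, '<', \$text or die $!;

my @data;
while (<$input>) {
    # Flip-flop: true from line 1 ($. == 1) through the @TYPE xy line.
    next if 1 .. /\@TYPE xy/;
    push @data, $_;   # only lines after the marker get here
}
close $input;

print @data;
```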

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks so much for that! I had also found this suggestion somewhere and tried it, but without any context I used it the wrong way and discarded it as "not working".

      I have updated my question with a new attempt I made at counting the columns, do you think I'm getting a bit closer...? Also, should I put it in a separate post too instead of just an update to the original post?

      Thank you again :)

        toolic and scorpio17 have given you two ways to print out the columns. Here’s a variation on the theme:

```perl
while (my $line = <$input>) {
    next if 1 .. $line =~ /\@TYPE xy/;
    my @columns = split ' ', $line;   # split ' ' ignores leading whitespace
    printf $output "%8.3f", $columns[2];
    printf $output " %10.3f", $columns[$_] for 3 .. $#columns;
    print $output "\n";
}
```

        This makes use of the fact that for any array @array, the variable $#array contains the value of the highest array index. (Note that this is one less than the number of elements in the array, because array indices start at 0.)
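        A short illustration of the difference between the element count and $#array, on a made-up line of data. It also uses split ' ', which, unlike split /\s+/, ignores leading whitespace, so no separate s/^\s+// step is needed.

```perl
use strict;
use warnings;

my @columns = split ' ', "  1.0  2.0  10.5  20.25  30.0\n";   # made-up line

my $count   = scalar @columns;   # number of elements: 5
my $last_ix = $#columns;         # highest index: 4 (= $count - 1)

# Columns 3 .. last, i.e. indices 2 .. $#columns
my @wanted = @columns[2 .. $#columns];

print "count=$count last_index=$last_ix wanted=@wanted\n";
```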

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: column counter and printf question
by toolic (Bishop) on Oct 28, 2015 at 12:27 UTC

    That regex doesn't match what you think it does. The . is special. Tip #6 from the Basic debugging checklist: YAPE::Regex::Explain:

```
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  0                        '0'
----------------------------------------------------------------------
  .                        any character except \n
----------------------------------------------------------------------
  00000                    '00000'
----------------------------------------------------------------------
```
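    A quick check of the difference on a few made-up strings. Besides the unescaped dot, a regex is only anchored where you anchor it, so /^0/ matches any line that merely starts with 0; that is why the shorter patterns still matched 0.00000. The stricter pattern /^0\.00000\s/ (literal dot, then whitespace) is only a suggestion; matching the @TYPE xy header line, as the other replies suggest, is more robust.

```perl
use strict;
use warnings;

my @samples = ('0.00000 12.3', '0x00000 12.3', '0.000001 12.3');

my %result;   # sample => [loose_matches, strict_matches]
for my $s (@samples) {
    my $loose  = $s =~ /^0.00000/    ? 1 : 0;   # "." matches ANY character
    my $strict = $s =~ /^0\.00000\s/ ? 1 : 0;   # literal ".", then whitespace
    $result{$s} = [$loose, $strict];
    printf "%-15s loose=%d strict=%d\n", $s, $loose, $strict;
}
```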

    You could use map to format certain elements of your array:

```perl
use warnings;
use strict;
my @cols = qw(3.456 8.903 1.223);
@cols = map { sprintf '%8.1f', $_ } @cols;
print "@cols\n";

__END__
     3.5      8.9      1.2
```
    Show some input and desired output.
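    Applied to the original question, a sketch: give column 3 its own %8.3f, use map to apply %10.3f to the remaining columns, and join the pieces into one output line. The input line is made up for illustration.

```perl
use strict;
use warnings;

my $line    = "  1  2  10.5  20.25  30\n";   # hypothetical input line
my @columns = split ' ', $line;              # whitespace split, leading blanks ignored

my $out = join '',
    sprintf('%8.3f', $columns[2]),                            # column 3: %8.3f
    map { sprintf '%10.3f', $_ } @columns[3 .. $#columns];    # the rest: %10.3f
print "$out\n";   # "  10.500    20.250    30.000"
```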

      Hi toolic,

      Yes, I should have remembered that . is a special character :( I only got the match I wanted by accident.

      Regarding your suggestion, I don't understand how using qw (quote words) would help. How would I include all the elements of my array, given that there are thousands of angle values? This map function looks useful, and I'll try it and post the results once I work out how to use it with or without qw.

        qw(3.456 8.903 1.223) is there only to set an example of an array with three values, to then show you how to process the fields of that array. You don't need to use it in your case.
Re: column counter and printf question
by scorpio17 (Canon) on Oct 28, 2015 at 14:12 UTC
```perl
use strict;

# for actual use, read from an open file handle like this:
#while (my $line = <$input>) {

# for testing, read from special DATA file handle
# (data will come from end of file, below DATA tag)
while( my $line = <DATA>) {
    chomp $line;                 # remove end-of-line character
    $line =~ s/^\s+//;           # strip leading whitespace
    next unless $line;           # skip blank lines

    # skip first 12 lines
    next unless $. > 12;         # $. contains current line number

    my @columns = split(/\s+/, $line);   # split columns on whitespace

    my $col1 = shift @columns;   # column 1 (throw away)
    my $col2 = shift @columns;   # column 2 (throw away)
    my $col3 = shift @columns;   # column 3 (special - keep this one)

    my $result = sprintf("%8.3f", $col3);   # special format for col 3

    # loop over remaining columns, appending to result string
    for my $c (@columns) {
        my $data = sprintf("%10.3f", $c);
        $result .= " $data";     # note the space character (change if required)
    }

    print "$result\n";           # print the final result (note end of line character)
}

# my guess at the input data
__DATA__
SKIP 1
SKIP 2
SKIP 3
SKIP 4
SKIP 5
SKIP 6
SKIP 7
SKIP 8
SKIP 9
SKIP 10
SKIP 11
SKIP 12
Line_1 0.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
Line_2 0.0 0.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
Line_3 0.0 0.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 1.0
Line_4 0.0 0.0 4.0 5.0 6.0 7.0 8.0 9.0 1.0 2.0
Line_5 0.0 0.0 5.0 6.0 7.0 8.0 9.0 1.0 2.0 3.0
```

    update: fixed typo in "strip whitespace" regex (added '^' character)

Re: column counter and printf question
by Tux (Canon) on Oct 28, 2015 at 17:41 UTC

    Your problem description sounds almost as if you are dealing with a file with fixed-width columns. If that is true, use unpack instead of split, so that empty columns are not lost; with split, an empty column makes the rest of the data shift into the wrong columns.
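    A sketch of the difference, assuming a hypothetical layout in which the first two fields are 8 characters wide and the next two are 10 (the real widths would have to be read off the file). The third field is left empty: unpack keeps it as an empty string, while split ' ' drops it and shifts 4.5 into column 3.

```perl
use strict;
use warnings;

# Hypothetical fixed-width line: widths 8, 8, 10, 10; the third field is empty.
my $line = sprintf '%8s%8s%10s%10s', 'a1', 'b1', '', '4.5';

# By position: "A" fields drop trailing spaces; the map strips leading ones.
my @by_pos = map { my $f = $_; $f =~ s/^\s+//; $f } unpack 'A8 A8 A10 A10', $line;

# By whitespace: the empty field disappears and 4.5 moves into column 3.
my @by_ws = split ' ', $line;

print scalar(@by_pos), " fields by position, ", scalar(@by_ws), " by whitespace\n";
```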

    Maybe you can show a few lines of the data, and we can tell you a better approach.


    Enjoy, Have FUN! H.Merijn
Re: column counter and printf question
by BillKSmith (Monsignor) on Oct 28, 2015 at 15:02 UTC
    Perl can count the columns.
```perl
use strict;
use warnings;

while( my $line = <DATA>) {
    next unless $. > 12;             # $. contains current line number
    next if $line =~ /^$/;           # skip blank lines
    chomp $line;                     # remove end-of-line character
    $line =~ s/^\s+//;               # strip leading whitespace
    my @columns = split(/\s+/, $line);   # split columns on whitespace
    my $format = "%8.3f" . "%10.3f"x(@columns-3) . "\n";
    printf $format, @columns[2..$#columns];
}

# my guess at the input data
__DATA__
SKIP 1
SKIP 2
SKIP 3
SKIP 4
SKIP 5
SKIP 6
SKIP 7
SKIP 8
SKIP 9
SKIP 10
SKIP 11
SKIP 12
Line_1 0.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
Line_2 0.0 0.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
Line_3 0.0 0.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 1.0
Line_4 0.0 0.0 4.0 5.0 6.0 7.0 8.0 9.0 1.0 2.0
Line_5 0.0 0.0 5.0 6.0 7.0 8.0 9.0 1.0 2.0 3.0
```
    Bill
      $line =~ s/\s+//; # strip leading whitespace

      The comment doesn't match the code: it strips the first run of whitespace it finds, wherever that is. Or perhaps you're missing a ^.

        Oops... thank you. I've corrected that.

      Hi and thanks for the help,

      Can you explain something from this bit:

      my $format = "%8.3f" . "%10.3f"x(@columns-3) . "\n";

      Why "@columns-3"? What does the "-3" mean? And what does "x" mean there: literally how many times to repeat the format?

        A printf format is a Perl string. In this case, it is constructed using Perl's string operators. The dot ('.') is the concatenation operator. The 'x' is the repetition operator (see the Multiplicative Operators section of perlop). The specification '%10.3f' is repeated @columns-3 times; in that numeric context, @columns gives the total number of columns. (The 3 comes from the fact that the first two columns are not printed and the third has its own format specification.)
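        To see what the format string looks like, here is a sketch with a made-up five-column line. Because x binds more tightly than ., the expression is "%8.3f" . ("%10.3f" x 2) . "\n", and @columns - 3 is 5 - 3 = 2.

```perl
use strict;
use warnings;

my @columns = ('c1', 'c2', '10.5', '20.25', '30');   # hypothetical split result

# x binds tighter than ., so this is "%8.3f" . ("%10.3f" x 2) . "\n".
# @columns - 3 is 5 - 3 = 2: the columns that follow column 3.
my $format = "%8.3f" . "%10.3f" x (@columns - 3) . "\n";

print "format: $format";                       # format: %8.3f%10.3f%10.3f
printf $format, @columns[2 .. $#columns];      #   10.500    20.250    30.000
```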
        Bill

        This is the second of your questions in this thread that is easily answered from the documentation that came with your Perl distribution.

        See perldoc perldoc et seq.