How to process several files with different line numbers

thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Once more I need your wisdom. I am processing files passed as arguments from the terminal. It is very important for me to process the files and store the data in the correct order. I thought that using an array of arrays would solve my problem, but the data does not seem to be stored correctly.

Sample of code:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Fcntl qw(:flock); # import LOCK_* and SEEK_END constants

$| = 1;
my @value;
my @value_2;
my $i = 0;

sub first {

    foreach my $argnum (0 .. $#ARGV) {

    open (READ, "<" , $ARGV[$argnum])
        or die ("Could not open: ".$ARGV[$argnum]." - $!\n");
    
    flock(READ, LOCK_EX)
        or die "Could not lock '".$ARGV[$argnum]."' - $!\n";

    if (-z "".$ARGV[$argnum]."") {
        print READ "is empty!\n";
        # -z  File has zero size (is empty).
    }

    print "This is the \$ARGV[$argnum]: ".$ARGV[$argnum]."\n";

    while ( my @doc_read = <READ> ) {

        chomp @doc_read;

        foreach $_ (@doc_read) {
        my @result = split (':', $_);

        if (/^\s*$/) { # /^\s*$/ check for "blank" lines may contain spaces or tabs
            next;
        }

        push (@{ $value[$i++] }, $result[3]);
        }
        $i = 0 if eof;
    }

    close (READ)
        or die ("Could not close: ".$ARGV[$argnum]." - $!\n");
    }

    return @value;
}

sub second {
    
    foreach my $arg (@_) {

    open (READ, "<" , $arg)
        or die ("Could not open: ".$arg." - $!\n");

    flock(READ, LOCK_EX)
        or die "Could not lock '".$arg."' - $!\n";

    if (-z "".$arg."") {
        print READ "is empty!\n";
        # -z  File has zero size (is empty).
    }
    
    print "This is the \$arg: ".$arg."\n";

    while ( my @doc_read = <READ> ) {

        chomp @doc_read;

        foreach $_ (@doc_read) {
        my @result = split (':', $_);

        if (/^\s*$/) { # /^\s*$/ check for "blank" lines may contain spaces or tabs
            next;
        }

        push (@{ $value_2[$i++] }, $result[3]);
        }
        $i = 0 if eof;
    }

    close (READ)
        or die ("Could not close: ".$arg." - $!\n");
    }
    
    return @value_2;
}

my @result = &first();
my @result_2 = &second(@ARGV);

print "\nValues \@result\n";
print "@$_\n" for @result;
print "\nValues \@result_2\n";
print "@$_\n" for @result_2;

The problem appears when the files have different numbers of lines. I am trying to keep the values in separate arrays, one per file, in the order in which the files were given.

Sample of sample.txt:

Line_1:Line_1_1:Line_1_2:Line_1_3:Line_1_4
Line_2:Line_2_1:Line_2_2:Line_2_3:Line_2_4

Sample of sample_2.txt:

Line_3:Line_3_1:Line_3_2:Line_3_3:Line_3_4

Sample of sample_3.txt:

Line_4:Line_4_1:Line_4_2:Line_4_3:Line_4_4
Line_5:Line_5_1:Line_5_2:Line_5_3:Line_5_4
Line_6:Line_6_1:Line_6_2:Line_6_3:Line_6_4
Line_7:Line_7_1:Line_7_2:Line_7_3:Line_7_4

Sample of output after processing the files:

Values @result
Line_1_3 Line_3_3 Line_4_3
Line_2_3 Line_5_3
Line_6_3
Line_7_3

Values @result_2
Line_1_3 Line_3_3 Line_4_3
Line_2_3 Line_5_3
Line_6_3
Line_7_3

It is very important for me to know the order in which values are stored in the array, and for each value to be placed in the correct position. I am going to use more than 6 files in my experiment, which will involve mathematical calculations, so the order of the values in the array matters a great deal.

Thank you all in advance for your time and effort to assist me.

I have written the same routine in two different ways, in case someone is interested in speed and processing time as I was. I measured the difference, and the second version is much faster than the first.

Update and solution:

I just needed to push all the values from one file into one array, and then push a copy of that array onto a separate array. It sounds a bit confusing the way I have described it.

So here is the code that I used:

push (@value , $result[3]);

push (@array, [@value]);

# Important to empty the array for the next file!
@value = ();

All together as a solution:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Benchmark qw(:all); # For timing reasons
use Fcntl qw(:flock); # import LOCK_* and SEEK_END constants

$| = 1;

my @value_2;
my @array_2;

sub second {
    
    foreach my $arg (@_) {

    open (READ, "<" , $arg)
        or die ("Could not open: ".$arg." - $!\n");

    flock(READ, LOCK_EX)
        or die "Could not lock '".$arg."' - $!\n";

    if (-z "".$arg."") {
        print READ "is empty!\n";
        # -z  File has zero size (is empty).
    }

    while ( my @doc_read = <READ> ) {

        chomp @doc_read;

        foreach $_ (@doc_read) {
        my @result = split (':', $_);

        if (/^\s*$/) { # /^\s*$/ check for "blank" lines may contain spaces or tabs
            next;
        }

        push (@value_2 , $result[3]);
        }
        
        push (@array_2, [@value_2]);

    }
    @value_2 = ();

    close (READ)
        or die ("Could not close: ".$arg." - $!\n");
    }
    
    return @array_2;
}

my @result_2 = &second(@ARGV);

print Dumper(@result_2);

The solution includes only the second sub because its processing time is much smaller.
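For readers who want the idea in isolation, here is a minimal sketch of the "one array per file" approach. It is not the code from the post: the helper name fourth_fields_per_file and the temporary demo files are assumptions made for illustration, file locking is left out, and the file is read line by line with a lexical filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): returns one array reference
# per file, in the same order as the paths passed in.
sub fourth_fields_per_file {
    my @paths = @_;
    my @per_file;

    for my $path (@paths) {
        open my $fh, '<', $path or die "Could not open '$path': $!\n";

        my @fields;    # fresh for every file, so no manual reset is needed
        while ( my $line = <$fh> ) {
            chomp $line;
            next if $line =~ /^\s*$/;    # skip blank lines
            push @fields, ( split /:/, $line )[3];
        }

        close $fh or die "Could not close '$path': $!\n";
        push @per_file, \@fields;
    }

    return @per_file;
}

# Demo data: two files with different numbers of lines.
my $dir     = tempdir( CLEANUP => 1 );
my @files   = ( "$dir/sample.txt", "$dir/sample_2.txt" );
my @content = (
    "Line_1:Line_1_1:Line_1_2:Line_1_3:Line_1_4\n"
  . "Line_2:Line_2_1:Line_2_2:Line_2_3:Line_2_4\n",
    "Line_3:Line_3_1:Line_3_2:Line_3_3:Line_3_4\n",
);

for my $i ( 0 .. $#files ) {
    open my $out, '>', $files[$i] or die "Could not write '$files[$i]': $!\n";
    print {$out} $content[$i];
    close $out or die "Could not close '$files[$i]': $!\n";
}

my @result = fourth_fields_per_file(@files);
print "@$_\n" for @result;
```

Because @fields is declared inside the per-file loop, each file gets its own array, so $result[0] always belongs to the first file no matter how many lines the other files have.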

Seeking for Perl wisdom...on the process...not there...yet!


Replies are listed 'Best First'.

Re: How to process several files with different line numbers
by Athanasius (Archbishop) on Jul 05, 2014 at 14:56 UTC

Hello thanos1983,

I’m glad to see from your update that the script is now working as you want. However, I still have a couple of observations:

1. You open the filehandle READ for reading ("<"), but then try to write to it:
```
if (-z "".$arg."") {
    print READ "is empty!\n";
```
If the test ever succeeds (because the file is empty), the print will fail, because READ was opened only for input.

2. In the loop condition:
```
while ( my @doc_read = <READ> ) {
```
Assigning the output of the diamond operator to an array puts the operation into list context, which means the entire file is read on the first iteration of the loop. So there is no point in having a while loop here.

3. Calling a subroutine with a leading ampersand ( &second(@ARGV); ) is usually a bad idea, unless you really need to bypass prototypes. Note also that omitting the trailing parentheses ( &second; ) passes the caller's current @_ to the subroutine (see What's the difference between calling a function as &foo and foo()?). Prefer an ordinary subroutine call: second(@ARGV);
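The point about list context can be seen in a small sketch. It reads from an in-memory string rather than a real file (an assumption made so the example is self-contained), and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $data = "a:b:c:d\n\ne:f:g:h\n";

# List context: the first read returns every line, the second returns
# nothing, so the loop body runs only once.
open my $list_fh, '<', \$data or die "Could not open in-memory data: $!\n";
my $passes = 0;
while ( my @lines = <$list_fh> ) {
    $passes++;
}
close $list_fh or die "Could not close in-memory data: $!\n";

# Scalar context: one line per iteration, which is what while is for.
open my $line_fh, '<', \$data or die "Could not open in-memory data: $!\n";
my @fourth;
while ( my $line = <$line_fh> ) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines
    push @fourth, ( split /:/, $line )[3];
}
close $line_fh or die "Could not close in-memory data: $!\n";

print "list-context loop ran $passes time(s)\n";
print "fourth fields: @fourth\n";
```

Either form works for small files; reading line by line simply avoids holding the whole file in memory and makes the while loop meaningful.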

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: How to process several files with different line numbers

by thanos1983 (Parson) on Jul 05, 2014 at 15:43 UTC

Hello Athanasius

It seems that we have the same name.

1. I am applying this condition to check whether the file is empty, and to break if it is. I am not trying to write to the file afterwards, unless I misunderstand what you mean by write.

2. You are absolutely right about that. I did not even think about it: since I am reading the whole file into an array, there is no point in the while () condition. Thanks.

3. I was not aware of that, thanks for telling me. I am still on the learning curve. :D

Seeking for Perl wisdom...on the process...not there...yet!


Re^3: How to process several files with different line numbers

by Athanasius (Archbishop) on Jul 05, 2014 at 16:15 UTC

...unless I do not understand what you mean with write.

The line in question is:

print READ "is empty!\n";

This syntax has the form print FILEHANDLE LIST, as documented in print, and it says: print the string “is empty!”, followed by a newline, to the filehandle READ. But, as I pointed out, the READ filehandle has been opened for reading (input), not writing (output). The only reason you are not seeing this fail is that the files you are testing aren’t empty, so the condition never evaluates to true and the print statement is never actually called. If it were called, print would return false and, with warnings enabled, Perl would report that the filehandle was opened only for input.

You probably meant to write just print "is empty!\n"; which is equivalent to print STDOUT "is empty!\n";, but you should also consider using warn "is empty!"; which is equivalent to print STDERR "is empty!\n"; (see warn).
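A small sketch of both behaviours, assuming an empty temporary file stands in for an empty input file (the temporary file and the variable names are illustrative, not from the original script):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty temporary file to stand in for an empty input file.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh or die "Could not close '$path': $!\n";

open my $in, '<', $path or die "Could not open '$path': $!\n";

# Try to print to the read-only handle, capturing any warning
# instead of letting it go to STDERR.
my ( $ok, @warnings );
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $ok = print {$in} "is empty!\n";
}

close $in or die "Could not close '$path': $!\n";

print "print to the read handle ", ( $ok ? "succeeded" : "failed" ), "\n";

# What was probably intended: report the empty file on STDOUT.
# Using warn instead of print would send the message to STDERR.
my $is_empty = -z $path;
print "'$path' is empty!\n" if $is_empty;
```

The failed print does not stop the program, which is why checking the return value of print (or simply not printing to an input handle) matters.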

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^4: How to process several files with different line numbers

by thanos1983 (Parson) on Jul 05, 2014 at 23:05 UTC

Re^3: How to process several files with different line numbers

by soonix (Chancellor) on Jul 05, 2014 at 20:59 UTC

print READ "is empty!\n";

as Athanasius points out: this tries to write "is empty!" to the file READ.

Probably what you intended was to print the file name with that message instead of (mis)using the file handle - should look like

print "$arg is empty!\n";


Re^4: How to process several files with different line numbers

by thanos1983 (Parson) on Jul 05, 2014 at 23:08 UTC