Re^3: help merging perl code routines together for file processing

JaeDre619:

When you call a subroutine, you typically want to give it some data to work on. In your case, we give it the name of the file to process. When you call the subroutine, you can give it a list of values, so your subroutine needs a way to access them.

The subroutine gets an array @_ which contains the argument list. While you can use the arguments in the @_ array directly, it can cause you difficulties^[1]. Instead, I like to copy the values into local variables, which is what I'm doing in the line you asked about. On the left, I put the list of local variables I want, so the first argument goes into the first variable, the second goes into the next variable, and so on. If any of your local variables is an array, it will consume all remaining values in the argument list, so anything listed after it (like @fourth below) never receives a value.

$ cat test3.pl
#!/usr/bin/perl
use strict;
use warnings;

mysub(1);
mysub(2,3,4);
mysub(5,6,7,8,9,0);

sub mysub {
    my ($first, $second, @third, @fourth)=@_;
    print "First: $first, Second: $second, Third: @third, Fourth: @fourth.\n";
}
$ perl test3.pl
Use of uninitialized value $second in concatenation (.) or string at test3.pl line 11.
First: 1, Second: , Third: , Fourth: .
First: 2, Second: 3, Third: 4, Fourth: .
First: 5, Second: 6, Third: 7 8 9 0, Fourth: .
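
Since an array in the list of local variables swallows everything that is left, one common way to hand a subroutine two separate lists is to pass array references instead. This is only a minimal sketch: the name mysub2 and the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variation on mysub: each list is passed as a reference,
# so each one arrives as a single scalar in @_.
sub mysub2 {
    my ($first, $third_ref, $fourth_ref) = @_;
    return "First: $first, Third: @$third_ref, Fourth: @$fourth_ref.";
}

my @third  = (2, 3);
my @fourth = (4, 5, 6);

print mysub2(1, \@third, \@fourth), "\n";
```

This should print "First: 1, Third: 2 3, Fourth: 4 5 6." because each reference takes up exactly one slot in @_.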

Notes:

[1] The difficulty is that @_ contains aliases to the caller's values, so if you modify the elements of @_ directly, you change the caller's variables as well. That can be useful at times, but it can also be a pernicious bug:

$ cat test4.pl
#!/usr/bin/perl
use strict;
use warnings;

sub noalias {
    my ($arg)=@_;
    $arg = uc($arg);
    print "Arg: $arg.\n";
}

sub alias {
    $_[0] = uc($_[0]);
    print "Arg: $_[0].\n";
}

my $t1 = "foobar";
my $t2 = "barbaz";

noalias($t1);
alias($t2);

print "T1: $t1, T2: $t2.\n";

$ perl test4.pl
Arg: FOOBAR.
Arg: BARBAZ.
T1: foobar, T2: BARBAZ.

[2] You'll probably want to read perlsub for all the details of how subroutines work. When you do, though, skip the section on prototypes. It turns out that (a) prototypes don't work the way you'd expect them to, (b) beginners should avoid using them as much as possible, and (c) they can be a source of problems. In fact, I don't think I've ever used them.
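
To give one concrete case of point (a): a ($) prototype does not check that the caller passed a single scalar, it imposes scalar context on the argument. The sketch below uses two made-up subroutines (with_proto and without_proto) that differ only in the prototype.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, identical except for the ($) prototype.
sub with_proto($) { my ($arg) = @_; return $arg; }
sub without_proto { my ($arg) = @_; return $arg; }

my @list = (10, 20, 30);

# The prototype forces @list into scalar context, so the sub
# receives the element count rather than the first element.
print "With prototype:    ", with_proto(@list), "\n";

# Without a prototype, @list is flattened into @_ and the sub
# receives its first element.
print "Without prototype: ", without_proto(@list), "\n";
```

The first line reports 3 (the number of elements) and the second reports 10, which is rarely what someone coming from another language expects a "parameter declaration" to do.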

Sorry for the delay in replying, but I was watching the last few episodes of Azumanga Daioh, and just couldn't pry myself away. ;^)

...roboticus

When your only tool is a hammer, all problems look like your thumb.


Replies are listed 'Best First'.

Re^4: help merging perl code routines together for file processing
by JaeDre619 (Acolyte) on Feb 21, 2011 at 00:25 UTC

Hi,

I was hoping you might help me figure this part out. I'm now able to print to a file. I think I had to open a file handle for the file I found, use that for processing, and then create a file handle for the new processed file. It works to an extent, but I am creating some duplicate records. I put a print statement on $data to view it in the terminal output, and the number of records looks OK there, but not in the output file. What am I doing wrong? Thanks again!

My sample records:

col1,col2,col3,col4,col5
coor@tra.co.nz#hv,mac-l@lists.listmoms.net,8,2009-09-24 21:00:46,1
vw@tra.co.nz#i3,poalad888@test.com,16,2007-08-18 22:53:12,33
esmith@tra.co.nz#hv,gabmonsh@mymail.com,16,2007-08-18 23:41:23,33

Updated code

use strict;
use Digest::MD5 qw(md5_hex);


my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
}

sub process_file {
    my ($filename, $foo_x, $name_x, $p_x) = @_;
    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    open my $in_fh, '<', $filename or die "cannot read $filename: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. scalar(@values));
        $data .= "$extra$eorec"; 
        print $out_fh $data;
        print $data;
    }
    close $out_fh or die "Failed to close file: $!";
}