JaeDre619 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I need some guidance in putting these two processes/pieces of code to work together. Most of my experience with Perl is basic file processing, just a few lines of code.

Anyway, I was testing these two pieces of code and I am stuck on figuring out how to get them to work together.

I was able to get them working individually for testing, but I need help bringing them together, especially with the loop constructs and producing a new output file after reading and processing the input file. I'm not sure if I should go with a foreach or a while... anyway, the code is below.

Also, any best practices would be great too, as I'm learning this language. I was told I was writing this in an old style of Perl programming, and I'm sure I am. Any insight on getting these pieces to work smoothly would be appreciated. Thanks for your help.

Here's the process flow I am looking for:

• read a directory
• look for a particular file
• use the file name to strip out some key information to create a newly processed file
• process the input file
• create a newly processed file for each input file read (if I read in 10, I create 10 new files)

Part 1:

my $target_dir = "/backups/test/";

opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";

while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);

    #Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        print "$1\n";
        print "$2\n";
        print "$3\n";
    }
    print "$file\n";
}

Part 2:

use strict;
use Digest::MD5 qw(md5_hex);

#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";

my $data  = '';
my $line1 = <>;
chomp $line1;

my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

while (<>) {
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2";
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec";
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);

Re: help merging perl code routines together for file processing
by roboticus (Chancellor) on Feb 19, 2011 at 22:48 UTC

    JaeDre619:

    Well, both scripts are simple enough that you could include the second one inside the while loop of the first, something like this:

    Note: I didn't change anything except move a couple of blocks of code (marked) and change the indentation (which should be clear enough). You'll have to make any fixes yourself.

    Rather than that, though, I think I'd just make the second script a subroutine and call it from the first script. Again, something like the following (again, I'm not changing anything, just rearranging the pieces):

    # from second script
    use strict;
    use Digest::MD5 qw(md5_hex);

    # from first script
    my $target_dir = "/backups/test/";

    opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";

    while (defined(my $file = readdir($dh))) {
        next if ($file =~ /^\.+$/);

        #Get filename attributes
        if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
            print "$1\n";
            print "$2\n";
            print "$3\n";
        }
        print "$file\n";

        # New line added: call the routine to process file
        process_file($file);
    }

    # New line: tell perl that we're making a subroutine
    sub process_file {
        # Need to get arguments here:
        my ($filename) = @_;

        # from second script
        #Create new file
        open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out")
            || die "cannot create file";

        my $data  = '';
        my $line1 = <>;
        chomp $line1;
        my @heading = split /,/, $line1;
        my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

        while (<>) {
            my $digest = md5_hex($data);
            chomp;
            my (@values) = split /,/;
            my $extra = "__mykey__$sep1$digest$sep2";
            $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
            $data .= "$extra$eorec";
            print NEWFILE "$data";
        }
        #print $data;
        close (NEWFILE);
    }
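
    One of the fixes you'll have to make, by the way, is having process_file actually open the $filename it's handed, rather than reading from <> and reusing the caller's $1/$2/$3. Purely as an illustration (a rough, untested skeleton; the directories and the pattern are just lifted from your own script), that part might look something like:

    sub process_file {
        my ($filename) = @_;

        # Re-derive the name pieces from the filename we were passed.
        my ($num, $name, $part) =
            $filename =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/
            or return;

        # Open the input file and the matching output file.
        open my $in_fh,  "<", "/backups/test/$filename"
            or die "cannot read $filename: $!";
        open my $out_fh, ">", "/backups/processed/foo$num.name.$name-foo_p$part.out"
            or die "cannot create output file: $!";

        # ... the heading/MD5 work from the second script goes here,
        #     reading from $in_fh and printing to $out_fh ...

        close $in_fh;
        close $out_fh;
    }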

    Oh, yes, you asked for some best practice suggestions, too. The only ones that jump out at me are:

    • Use the three-argument form of open, like this:
      open (NEWFILE, ">", "/backups/processed/foo$1.name.$2-foo_p$3.out")
      It has the advantage of being safer. In some cases (such as the one here), the filename string can cause a command to be executed by the operating system: with the two-argument form, a leading or trailing pipe character turns the open into a command pipe. If the string is provided by the user, that could cause you some serious problems. With the three-argument form you're safe from that: perl treats the string strictly as a filename and doesn't try to execute anything.
    • If you use the lower-precedence or operator, you can omit the parentheses on your open statement, like this:
      open NEWFILE, ">", "/backups/processed/foo$1.name.$2-foo_p$3.out" or die "cannot create file";
      (There's a small combined sketch after this list.)
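
    Putting those two suggestions together, along with a lexical filehandle (another habit worth picking up), a sketch might look like the following. The $num, $name and $part variables here are just stand-ins for the $1/$2/$3 captures from your filename match, and the path is the one from your script:

      # Sketch only: three-argument open, a lexical filehandle, and low-precedence 'or'.
      # $num, $name and $part stand in for the $1/$2/$3 filename captures.
      my $out_name = "/backups/processed/foo$num.name.$name-foo_p$part.out";
      open my $out_fh, ">", $out_name
          or die "cannot create $out_name: $!";
      print {$out_fh} "some data\n";
      close $out_fh;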

    Update: Added readmore tags and best practice suggestions.

    Is this the kind of suggestion you were seeking?

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Yes, roboticus. Thanks for your suggestions. I will play around with this and let you know how it goes. What happens is that I'll test something, a process I'd like to do, and then write another script to try something else. Then I'll get to a point where I should combine them, but confuse myself on how to do so. I haven't really tried writing subroutines. For me, good examples help out a lot. Thanks.

      Hi, thanks again for your help. I've been doing some testing with this and I'm stuck. The match succeeds and it gets into the subroutine (I added a print statement to confirm). But now I'm trying to figure out your use of the following line of code:

      my ($filename) = @_;

      Can you elaborate more on that part, and on what you meant by "Need to get arguments here"?

      Here's an update with some additional checks. Thank you.

      use strict;
      use Digest::MD5 qw(md5_hex);

      # from first script
      my $target_dir = "/backups/test/";
      my $regex_flag = 0;

      opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";

      while (defined(my $file = readdir($dh))) {
          next if ($file =~ /^\.+$/);

          #Get filename attributes
          if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
              print "$1\n";
              print "$2\n";
              print "$3\n";
              $regex_flag = 1; #Set to true
          }
          if ($regex_flag == 0) {
              print "No matching files found \n";
              exit;
          }
          print "$file\n";
          print "We have a match.\n";

          # New line added: call the routine to process file
          process_file($file);
      }

      # New line: tell perl that we're making a subroutine
      sub process_file {
          # Need to get arguments here:
          print "Starting process file..\n";
          my ($filename) = @_;

          # from second script
          #Create new file
          open NEWFILE, ">", "/backups/processed/foo$1.name.$2-foo_p$3.out"
              or die "cannot create file";

          my $data  = '';
          my $line1 = <>;
          chomp $line1;
          my @heading = split /,/, $line1;
          my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

          while (<>) {
              my $digest = md5_hex($data);
              chomp;
              my (@values) = split /,/;
              my $extra = "__pkey__$sep1$digest$sep2";
              $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
              $data .= "$extra$eorec";
              print NEWFILE "$data";
          }
          #print $data;
          close (NEWFILE);
      }

        JaeDre619:

        When you call a subroutine, you typically want to give it some data to work on. In your case, we give it the name of the file to process. When you call the subroutine, you can give it a list of values, so your subroutine needs a way to access them.

        The subroutine gets an array @_ which contains the argument list. While you can use the arguments in the @_ array directly, it can cause you difficulties[1]. Instead, I like to copy the values into local variables, which is what I'm doing in the line you asked about. On the left, I put the list of local variables I want, so the first argument goes into the first variable, the second goes into the next variable, and so on. If any of your local variables is an array, it will consume all remaining values in the argument list--so be aware of it.

        $ cat test3.pl
        #!/usr/bin/perl
        use strict;
        use warnings;

        mysub(1);
        mysub(2,3,4);
        mysub(5,6,7,8,9,0);

        sub mysub {
            my ($first, $second, @third, @fourth) = @_;
            print "First: $first, Second: $second, Third: @third, Fourth: @fourth.\n";
        }

        $ perl test3.pl
        Use of uninitialized value $second in concatenation (.) or string at test3.pl line 11.
        First: 1, Second: , Third: , Fourth: .
        First: 2, Second: 3, Third: 4, Fourth: .
        First: 5, Second: 6, Third: 7 8 9 0, Fourth: .

        Notes:

        [1] The difficulty is that @_ contains aliases to the calling values, so if you use the values in the argument list directly, you risk changing the values from the caller's point of view. That can be useful at times, but it can also be a pernicious bug:

        $ cat test4.pl
        #!/usr/bin/perl
        use strict;
        use warnings;

        sub noalias {
            my ($arg) = @_;
            $arg = uc($arg);
            print "Arg: $arg.\n";
        }

        sub alias {
            $_[0] = uc($_[0]);
            print "Arg: $_[0].\n";
        }

        my $t1 = "foobar";
        my $t2 = "barbaz";

        noalias($t1);
        alias($t2);

        print "T1: $t1, T2: $t2.\n";

        $ perl test4.pl
        Arg: FOOBAR.
        Arg: BARBAZ.
        T1: foobar, T2: BARBAZ.

        [2] You'll probably want to read perlsub for all the details of how subroutines work. When you do, though, skip the section on prototypes. It turns out that (a) prototypes don't work the way you'd expect them to, (b) beginners should avoid using them as much as possible, and (c) they can be a source of problems. In fact, I don't think I've ever used them.

        Sorry for the delay in replying, but I was watching the last few episodes of Azumanga Daioh, and just couldn't pry myself away. ;^)

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.