james28909 has asked for the wisdom of the Perl Monks concerning the following question:

I have this subroutine (it's my first one); anyway, when i run this script there will be two files that i will be run through this to extract and check data. What i really want to know is if i can use $ARGV[0] for both files if i run them through the subroutine one at a time. Here is the code in question:
    use File::Path qw(make_path remove_tree);
    use Digest::MD5;
    use File::Slurp;

    my $infile = $ARGV[0];
    infile ($infile);

    sub infile {
        open( my $infile, '<', $ARGV[0] ) or die "cannot open file: $!";
        binmode($infile);
        my $rmdir = "extracted";
        remove_tree $rmdir;
        make_path('extracted');    #or die "Failed to create Direcotry: $!";

        my $fileLocation = '';
        my $fileSize     = '';
        my $fileName     = '';
        my $file         = '';
        my $chunk        = '';
        my $exit         = '';

        #GET File Location, File Size, File Name and write to file
        seek( $infile, 0x10, 0 );
        until ($exit) {
            read( $infile, $fileLocation, 0x08 );
            read( $infile, $fileSize,     0x08 );
            read( $infile, $fileName,     0x20 );
            if ( $fileLocation =~ 'SCE' ) { last; }
            $fileLocation =~ s/(.)/sprintf("%02x",ord($1))/eg;
            $fileSize     =~ s/(.)/sprintf("%02x",ord($1))/eg;
            $fileName     =~ s/\0+$//;
            if ( $fileLocation =~ 'ffffffffffffffff' ) { last; }
            open( $file, '>', "extracted/$fileName" ) or die "Cannot open $fileName $!";
            binmode($file);
            sysseek( $infile, hex($fileLocation), 0 );
            sysread( $infile, $chunk, hex($fileSize) );
            syswrite( $file, $chunk );
            $fileLocation = '';
            $fileSize     = '';
        }

        my $dirname  = "extracted";
        my @md5s     = read_file "C:/md5";
        my $md5s     = join( '', @md5s );
        my $filesize = '';
        open( my $buf, '<', "extracted/sdk_version" ) or die "cannot open sdk_version: $!";
        seek( $buf, 0x00, 0 );
        read( $buf, my $sdk, 0x03 );
        foreach my $file (<$dirname/*>) {
            next if -d $file;
            open( my $FILE, $file );
            binmode($FILE);
            $filesize = -s $FILE;
            $file =~ s{.*/}{};
            $md5 = Digest::MD5->new->addfile($FILE)->hexdigest;
            if ( $md5s =~ $md5 ) {
                print "$md5 Match! $sdk $file $filesize\n";
            }
            else {
                print "WARNING !\n";
            }
        }
    }
also do i declare my subroutines in the beginning of my script and then call them later on in the script like this?
    sub new_sub {
        # data parsing, file handling etc.
    }
    new_sub($file1);
    new_sub($file2);
Any input will be invaluable to me, and thanks for looking, and if there is anything i can explain better just let me know.

Replies are listed 'Best First'.
Re: sending data thru a sub routine
by choroba (Cardinal) on May 11, 2014 at 16:00 UTC
    The @ARGV array holds the parameters to the script. Subroutine parameters are retrieved from a different array: @_. See both in perlvar.

    The first line of a subroutine usually looks like this:

    my @parameters = @_;

    or

    my ($x, $y) = @_;

    or even

    my $x = shift;

    shift is special: if you don't give it an argument, it shifts the first element from @ARGV in the main body, or the first element from @_ in a subroutine.
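
    A minimal sketch of the difference between the two arrays. The sub names first_arg and join_args and the file names are made up for illustration, and @ARGV is filled in by hand so the snippet runs without command-line parameters:

```perl
use strict;
use warnings;

# Hypothetical helper: returns its first argument, taken from @_ (not @ARGV).
sub first_arg {
    my $x = shift;    # inside a sub, a bare shift works on @_
    return $x;
}

# Hypothetical helper: unpacks two arguments from @_ in one list assignment.
sub join_args {
    my ($x, $y) = @_;
    return "$x,$y";
}

# Simulate command-line parameters (normally supplied by the shell).
@ARGV = ('file_1', 'file_2', 'file_3');

my $from_sub  = first_arg('passed_in');    # value arrives through @_
my $from_main = shift;                     # in the main body, shift works on @ARGV

# @ARGV now holds the two remaining names; pass them on explicitly.
my $rest = join_args(@ARGV);

print "$from_sub $from_main $rest\n";
```

    The point of the sketch is that nothing reaches a subroutine through @ARGV unless the caller passes it in as an argument.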

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: sending data thru a sub routine
by Laurent_R (Canon) on May 11, 2014 at 17:20 UTC
    Although it should probably work in this case, I would advise you against using the same name for different types of things:
    my $infile = $ARGV[0];
    infile ($infile);
    sub infile {
        open( my $infile, '<', $ARGV[0] ) or die "cannot open file: $!";
        # ...
    The name infile is used for three different types of things: a file name in the first line above, a function name in the second line, and a file handle at the last one. This is at best very confusing for yourself. You could rewrite this as follows:
    my $infile = $ARGV[0];
    process_file ($infile);
    sub process_file {
        my $current_file = shift;    # or: my $current_file = $_[0];
        open my $FILEHANDLE, '<', $current_file
            or die "cannot open file $current_file: $!";
        # ...
    At least there is no danger of mixing up the various entities. Actually, although the $infile name is perfectly acceptable, it might be even better to have a name reflecting the content of the file, such as, for example $resources_infile or $employees_infile, whatever you have in the file, or even simply $resources or $employees. We can see from your code that you are going to open it as a file, but have no idea of the contents. Also naming the file that you cannot open in the message passed to die can be useful when you have to open several files and something goes wrong.
      yeah you're right, i should rename them because it can get very confusing.
Re: sending data thru a sub routine
by AnomalousMonk (Archbishop) on May 11, 2014 at 16:51 UTC
    ... do i declare my subroutines in the beginning of my script and then call them later on in the script like this?

    In this particular script, the order of subroutine definition and invocation doesn't matter. There are some situations in which you must define or declare a subroutine before you call it, but you're a long way away from having to worry about that kind of detail.
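
    A small sketch of the ordering point, using a made-up sub called greet. A call written with parentheses works before the definition; the parenthesis-free form is one of the cases that needs the sub to be known first:

```perl
use strict;
use warnings;

# Calling with parentheses works even though greet() is defined further down,
# because Perl compiles the whole file before it starts running it.
my $early = greet('monk');

sub greet {
    my ($name) = @_;
    return "hello, $name";
}

# Once the definition has been seen, the sub can also be called without
# parentheses. Moving this line above the definition would not compile.
my $late = greet 'novice';

print "$early\n$late\n";
```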

Re: sending data thru a sub routine
by AnomalousMonk (Archbishop) on May 11, 2014 at 17:19 UTC
    ... when i run this script there will be two files that i will be run through this to extract and check data. What i really want to know is if i can use $ARGV[0] for both files if i run them through the subroutine one at a time.

    I don't understand this. Do you mean that you will invoke the script twice, with a different file name given each time:
        system_prompt>perl your_script.pl file_1
        system_prompt>perl your_script.pl file_2
    passing a single file name string to the  infile() subroutine on each script invocation? Or do you want to invoke the script with two file names at once:
        system_prompt>perl your_script.pl file_1 file_2
    and process both files during one invocation of the script?

    In the first case, two separate invocations of the script, using  $ARGV[0] for the file name is fine. In the second case, a single invocation with two file names, you need to realize that the two strings representing the file names will end up in  $ARGV[0] and  $ARGV[1] respectively, and you must process these two elements of the  @ARGV array independently and reorganize the logic of your script accordingly.

    Update: Finally realized that most Perl scripts are invoked not as
        script_name.pl parameter_1 param_2 ...
    but as
        perl script_name.pl parameter_1 param_2 ...
    and changed the command-line (pseudo-)code examples above accordingly.

      yes i will invoke this script or subroutine twice, once for each file. so....:
      system_prompt>my_script.pl $file1
      system_prompt>my_script.pl $file2
      and i guess for each file i can do something like the following pseudocode to send it to the subroutine, correct?
      my $file1 = 'extracted/file1;
      sub infile($file1); #will this be passed to ARGV[0]?
      #THEN further in the script...
      my $file2 = 'extracted/file2;
      sub infile(file2); #will this be sent to ARGV[0] as well after the first file?
        system_prompt>my_script.pl $file1
        system_prompt>my_script.pl $file2

        If you're really doing this, with no intervening actions, and always having two files to process, you could instead do this:

        system_prompt>my_script.pl $file1 $file2

        And then, in my_script.pl:

        die "Usage: $0 file1 file2" unless @ARGV == 2;
        ...
        for (@ARGV) {
            process_file($_);
        }
        ...
        sub process_file {
            my ($filename) = @_;
            open my $input_fh, '<', $filename or die "Can't open '$filename': $!";
            ...
        }

        You have other issues in your code which you'll need to address. One that leapt out at me was this infinite loop:

        my $exit = '';
        until ($exit) {
            ... code where $exit never becomes TRUE ...
        }

        You have two last statements but both are conditional on a pattern match. You should really have a bailout option, i.e. if you've done everything possible in the loop but are still looping, then die, warn and last or similar — and, instead of until ($exit) {...}, use while (1) {...} and get rid of the $exit variable altogether.
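
        One way the bailout could look, as a sketch only. The @records array stands in for the entries read from the input file, and the 1000-entry limit is an arbitrary assumption, not something taken from the file format:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the table of entries read from the input file;
# the last element plays the role of the 'SCE' end marker.
my @records = ('a.bin', 'b.bin', 'SCE');

my $max_entries = 1000;    # assumed upper bound; pick one that suits the format
my $seen        = 0;
my @names;

while (1) {
    # Bailout: stop with an error instead of looping forever.
    die "No end marker after $max_entries entries\n" if $seen++ >= $max_entries;

    my $record = shift @records;
    last if !defined $record;     # ran out of input
    last if $record eq 'SCE';     # normal exit: end marker found

    push @names, $record;
}

print "extracted: @names\n";
```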

        sub infile($file1); #will this be passed to ARGV[0]?
        ...
        sub infile(file2); #will this be sent to ARGV[0] as well after the first file?

        Your (commented) questions about passing/sending to ARGV[0] [which should be $ARGV[0]] suggest you haven't really got a handle on the @ARGV array but, unfortunately, I don't know what you haven't understood. Take a look at "perlvar: Variables related to filehandles" and "perlop: I/O Operators". See what both of those sections say about @ARGV: that should either clarify the purpose and usage of @ARGV or, if not, provide you with the basis for more specific questions.

        -- Ken

        I think that you are somewhat confused. You can do either of two (or possibly more) things:

        1. Launch your script only once, with the two files as arguments, and process each argument one after the other with the same subroutine; or
        2. Launch the script twice, each time with only one argument.

        Both approaches are valid, and it is up to you to decide how you want to do it, but I would personally tend to favor the first approach (it makes it possible to take into account things that happened while reading the first file when reading the second one, which would be much more difficult with the second approach). The first approach could look more or less as follows:
        perl process_files.pl file1.txt file2.txt
        and, inside the program:
        for my $inputfile (@ARGV) {
            process_file($inputfile);
        }
        The second approach would probably require a shell script under Un*x, or *.bat command script under Windows (or *.com command file under VMS, or whatever with other OS's) to loop over the two filenames. One of the advantages of the first approach is that it can be more portable across platforms.
      im using AS perl. so script.pl param1 param2 is just fine
Re: sending data thru a sub routine
by ww (Archbishop) on May 11, 2014 at 17:10 UTC

    Just a minor extension of choroba's observations (and mentioning, not just BTW, that the quoting here is for the windows box which was readily at hand):

    C:\>perl -E "sub doit {for my $passed(@_) { say 'passed is: '. $passed;}}my @input=@ARGV; for my $input (@input) {say $input;} doit(@input);" "one" "two" "three"
    one
    two
    three
    passed is: one
    passed is: two
    passed is: three

    C:\>

    Alternately, you could use shift inside the loop in the sub. And, nota bene, if your CLI arguments are enclosed in a single set of (appropriate) quotes, @ARGV will have them all as a single element, in which case, you need to (for one example) split @ARGV and push its arguments into whatever array you're going to use to pass to the sub.

    C:\>perl -E "sub doit {for my $passed(@_) { say 'passed is: '. $passed;}} my @input = @ARGV; for my $input (@input) {say $input;} doit(@input);" "one two three"    # NOTE QUOTING VARIANCE!
    one two three
    passed is: one two three

    C:\>


    Quis custodiet ipsos custodes. Juvenal, Satires

      ... all as a single element, in which case, you need to (for one example) split @ARGV and push its arguments ...

      This seems needlessly confusing advice to offer a novice Perler. Literally calling split on  @ARGV e.g.:
          my @array = split @ARGV;
      is likely to produce (unpleasantly) surprising results. Can you be more clear?

        Good point: clarification herewith:

        Some processing is required; it's definitely not a matter of simply splitting @ARGV, because split expects to work on a string, not an array.

        C:\>perl -E "my ($input) = @ARGV; my @input = split / /, $input; for $_(@input) {say $_;}" "trez zwei uno"
        trez
        zwei
        uno

        Parenthesize $input so that the assignment happens in list context: $input then receives the first element of @ARGV rather than a count of its (here, single) elements.
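
        The effect of the parentheses can be seen in isolation. In this sketch @args is a stand-in for @ARGV, filled by hand with one quoted argument:

```perl
use strict;
use warnings;

# Stand-in for @ARGV after running: script.pl "trez zwei uno"
my @args = ('trez zwei uno');

my $count   = @args;    # scalar context: the number of elements
my ($first) = @args;    # list context: the first element itself

# split works on the string, not on the array.
my @words = split ' ', $first;

print "count=$count first='$first' words=", scalar(@words), "\n";
```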

        Quibble: AnomalousMonk could improve on this 'clarification' and probably would have done it better the first time.


        Come, let us reason together: Spirit of the Monastery

        Quis custodiet ipsos custodes. Juvenal, Satires

      i will def keep this in mind, could come in handy :P
Re: sending data thru a sub routine
by GrandFather (Saint) on May 12, 2014 at 02:08 UTC

    A few general tips:

    1. Always use strictures (use strict; use warnings; - see The strictures, according to Seuss).
    2. Declare variables in the smallest sensible scope and don't initialise them with a bogus value.
    3. Avoid unless and until. They invert the sense of their expression and often cause confusion.
    4. Don't use a regular expression match where a string compare is intended. $fileLocation =~ 'SCE' is not the same as $fileLocation eq 'SCE'!
    5. If you are dealing with binary data use unpack and pack.
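
    To illustrate tip 4, a short sketch with made-up sample values. A quoted string on the right of =~ is treated as a pattern, so it matches anywhere in the value, while eq compares the whole string:

```perl
use strict;
use warnings;

my @samples = ('SCE', 'xxSCExx', 'sce');
my %result;

for my $value (@samples) {
    my $matches = $value =~ 'SCE' ? 'yes' : 'no';    # same as $value =~ /SCE/
    my $equals  = $value eq 'SCE' ? 'yes' : 'no';    # exact string comparison
    $result{$value} = "match=$matches eq=$equals";
    print "$value: $result{$value}\n";
}
```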
    Perl is the programming world's equivalent of English
      if you open this file in a hex editor, i am dealing with binary and plain text. $filename comes from plain text. $filelocation and $filesize comes from binary in the hex editor. 0x00, read 8 bytes = $filelocation, read 8 more bytes, $filesize, read 32 more bytes, $filename. repeat from current position.
      and thank you for the pointers :)
        "... binary and plain text ..."

        Hmm, not really. You are dealing with a binary file that happens to have some plain text fields. unpack makes the code easier and clearer - in this case it even makes it correct. Consider:

        use strict;
        use warnings;

        # NB: in the real script the heredoc lines and the BIN terminator start in
        # column 0, and the two name lines are space-padded to 32 bytes.
        (my $binStr = <<BIN) =~ s/\n//g;
        \x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00
        The end of the world is neigh   
        \x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00
        unless you use pack and unpack  
        BIN

        open my $fIn, '<', \$binStr;
        binmode $fIn;

        print "Using unpack\n";
        while (read($fIn, (my $rec), 48)) {
            my ($fileLocL, undef, $fileSizeL, undef, $fileName) =
                unpack('VVVVa32', $rec);
            printf "Loc: %d, Size: %d, Name: '%s'\n", $fileLocL, $fileSizeL, $fileName;
        }

        seek $fIn, 0, 0;
        print "Using bogus substitution code\n";
        while (!eof $fIn) {
            my ($fileLoc, $fileSize, $fileName);
            read($fIn, $fileLoc,  0x08);
            read($fIn, $fileSize, 0x08);
            read($fIn, $fileName, 0x20);
            $fileLoc  =~ s/(.)/sprintf("%02x",ord($1))/eg;
            $fileSize =~ s/(.)/sprintf("%02x",ord($1))/eg;
            $fileName =~ s/\0+$//;
            printf "Loc: %d, Size: %d, Name: '%s'\n", $fileLoc, $fileSize, $fileName;
        }

        Prints:

        Using unpack
        Loc: 1, Size: 2, Name: 'The end of the world is neigh   '
        Loc: 3, Size: 4, Name: 'unless you use pack and unpack  '
        Using bogus substitution code
        Loc: -1, Size: -1, Name: 'The end of the world is neigh   '
        Loc: -1, Size: -1, Name: 'unless you use pack and unpack  '

        Note that I was using a build of Perl that doesn't have support for the quad word pack/unpack specification so I used the "VAX" long (32 bit) V specification and ignored the high words (that's the undefs in the variable list).
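
        On a Perl built with 64-bit integer support (and 5.10 or later for the byte-order modifier), the quad word format can be used directly. The following is only a sketch: the record is built by hand, the name is a sample value, and the little-endian byte order is assumed to match the sample data above; a big-endian file would need Q> instead.

```perl
use strict;
use warnings;

# One 48-byte record laid out like the sample data: two 64-bit little-endian
# integers followed by a 32-byte name field (pack pads 'a32' with NULs).
my $rec = pack('Q< Q< a32', 1, 2, 'sdk_version');

# 'Z32' reads the 32-byte field and keeps only the text before the first NUL.
my ($loc, $size, $name) = unpack('Q< Q< Z32', $rec);

printf "Loc: %d, Size: %d, Name: '%s'\n", $loc, $size, $name;
```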

        Oh, and the trailing spaces on the two "file name" lines in the sample data are important. Don't lose them copying this test script!

        Perl is the programming world's equivalent of English