sending data thru a sub routine

james28909 has asked for the wisdom of the Perl Monks concerning the following question:

Re: sending data thru a sub routine by choroba (Cardinal) on May 11, 2014 at 16:00 UTC
The `@ARGV` array holds the parameters to the script. Subroutine parameters are retrieved from a different array: `@_`. See both in perlvar. The first line of a subroutine usually looks like this: `my @parameters = @_;` or `my ($x, $y) = @_;` or even `my $x = shift;`
shift is special: if you don't give it an argument, it shifts the first element from `@ARGV` in the main body, or the first element from `@_` in a subroutine.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
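A minimal sketch of the three idioms above side by side. The sub names (`sum_all`, `full_name`, `shout`) are invented here purely for illustration.

```perl
use strict;
use warnings;

# Copy every argument into a named array.
sub sum_all {
    my @numbers = @_;
    my $total = 0;
    $total += $_ for @numbers;
    return $total;
}

# Unpack a fixed number of arguments with one list assignment.
sub full_name {
    my ($first, $last) = @_;
    return "$first $last";
}

# Take only the first argument; inside a sub, shift defaults to @_.
sub shout {
    my $text = shift;
    return uc $text;
}

print sum_all(1, 2, 3), "\n";    # 6
print full_name('Larry', 'Wall'), "\n";
print shout('hello'), "\n";
```

None of these subs looks at `@ARGV`; whatever the caller puts in the parentheses is what arrives in `@_`.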
Re: sending data thru a sub routine by Laurent_R (Canon) on May 11, 2014 at 17:20 UTC
Although it should probably work in this case, I would advise you against using the same name for different types of things:
    my $infile = $ARGV[0];
    infile ($infile);
    sub infile {
        open( my $infile, '<', $ARGV[0] ) or die "cannot open file: $!";
        # ...
The name `infile` is used for three different types of things: a file name in the first line above, a function name in the second line, and a file handle in the last one. This is at best very confusing for yourself. You could rewrite this as follows:
    my $infile = $ARGV[0];
    process_file ($infile);
    sub process_file {
        my $current_file = shift; # or: my $current_file = $_[0];
        open my $FILEHANDLE, '<', $current_file or die "cannot open file $current_file: $!";
        # ...
At least there is no danger of mixing up the various entities. Actually, although the $infile name is perfectly acceptable, it might be even better to have a name reflecting the content of the file, such as $resources_infile or $employees_infile, whatever you have in the file, or even simply $resources or $employees. We can see from your code that you are going to open it as a file, but have no idea of the contents. Also, naming the file that you cannot open in the message passed to `die` can be useful when you have to open several files and something goes wrong.
Re^2: sending data thru a sub routine by james28909 (Deacon) on May 11, 2014 at 19:21 UTC
Yeah, you're right, I should rename them because it can get very confusing.
Re: sending data thru a sub routine by AnomalousMonk (Archbishop) on May 11, 2014 at 16:51 UTC
"... do i declare my subroutines in the beginning of my script and then call them later on in the script like this?"
In this particular script, the order of subroutine definition and invocation doesn't matter. There are some situations in which you must define or declare a subroutine before you call it, but you're a long way away from having to worry about that kind of detail.
Re: sending data thru a sub routine by AnomalousMonk (Archbishop) on May 11, 2014 at 17:19 UTC
"... when i run this script there will be two files that i will be run through this to extract and check data. What i really want to know is if i can use $ARGV[0] for both files if i run them through the subroutine one at a time."
I don't understand this. Do you mean that you will invoke the script twice, with a different file name given each time:
    system_prompt>perl your_script.pl file_1
    system_prompt>perl your_script.pl file_2
passing a single file name string to the `infile()` subroutine on each script invocation? Or do you want to invoke the script with two file names at once:
    system_prompt>perl your_script.pl file_1 file_2
and process both files during one invocation of the script?
In the first case, two separate invocations of the script, using `$ARGV[0]` for the file name is fine. In the second case, a single invocation with two file names, you need to realize that the two strings representing the file names will end up in `$ARGV[0]` and `$ARGV[1]` respectively, and you must process these two elements of the `@ARGV` array independently and reorganize the logic of your script accordingly.
Update: Finally realized that most Perl scripts are invoked not as `script_name.pl parameter_1 param_2 ...` but as `perl script_name.pl parameter_1 param_2 ...` and changed the command-line (pseudo-)code examples above accordingly.
Re^2: sending data thru a sub routine by james28909 (Deacon) on May 11, 2014 at 19:15 UTC
Yes, I will invoke this script or subroutine twice, once for each file. So:
    system_prompt>my_script.pl $file1
    system_prompt>my_script.pl $file2
and I guess for each file I can do something like the following pseudocode to send it to the subroutine, correct?
    my $file1 = 'extracted/file1';
    sub infile($file1); #will this be passed to ARGV[0]?
    #THEN further in the script...
    my $file2 = 'extracted/file2';
    sub infile(file2); #will this be sent to ARGV[0] as well after the first file?
Re^3: sending data thru a sub routine by kcott (Archbishop) on May 11, 2014 at 21:16 UTC
    system_prompt>my_script.pl $file1
    system_prompt>my_script.pl $file2
If you're really doing this, with no intervening actions, and always having two files to process, you could instead do this:
    system_prompt>my_script.pl $file1 $file2
And then, in `my_script.pl`:
    die "Usage: $0 file1 file2" unless @ARGV == 2;
    ...
    for (@ARGV) {
        process_file($_);
    }
    ...
    sub process_file {
        my ($filename) = @_;
        open my $input_fh, '<', $filename or die "Can't open '$filename': $!";
        ...
    }
You have other issues in your code which you'll need to address. One that leapt out at me was this infinite loop:
    my $exit = '';
    until ($exit) {
        ... code where $exit never becomes TRUE ...
    }
You have two `last` statements but both are conditional on a pattern match. You should really have a bailout option, i.e. if you've done everything possible in the loop but are still looping, then `die`, `warn` and `last` or similar. Also, instead of `until ($exit) {...}`, use `while (1) {...}` and get rid of the `$exit` variable altogether.
    sub infile($file1); #will this be passed to ARGV[0]?
    ...
    sub infile(file2); #will this be sent to ARGV[0] as well after the first file?
Your (commented) questions about passing/sending to `ARGV[0]` [which should be `$ARGV[0]`] suggest you haven't really got a handle on the `@ARGV` array but, unfortunately, I don't know what you haven't understood. Take a look at "perlvar: Variables related to filehandles" and "perlop: I/O Operators". See what both of those sections say about `@ARGV`: that should either clarify the purpose and usage of `@ARGV` or, if not, provide you with the basis for more specific questions.
-- Ken
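The bailout advice in the reply above (a `while (1)` loop that stops itself when there is nothing left to try) might be sketched like this. The helper name `find_first_match` and its arguments are hypothetical, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the index of the first line that matches
# $pattern, or -1 if the lines run out before anything matches.
sub find_first_match {
    my ($pattern, @lines) = @_;
    my $index = 0;
    while (1) {
        # Bailout: nothing left to examine, so stop instead of spinning.
        if ($index >= @lines) {
            warn "no line matched $pattern\n";
            return -1;
        }
        last if $lines[$index] =~ $pattern;
        $index++;
    }
    return $index;
}

print find_first_match(qr/SCE/, 'abc', 'xSCEx', 'def'), "\n";    # 1
```

The loop has exactly two ways out, the conditional `last` and the unconditional bailout, so it cannot run forever however the input looks.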
Re^4: sending data thru a sub routine by james28909 (Deacon) on May 12, 2014 at 01:54 UTC
Re^3: sending data thru a sub routine by Laurent_R (Canon) on May 11, 2014 at 21:02 UTC
I think that you are somewhat confused. You can do either of two (or possibly more) things:
1. Launch your script only once, with the two files as arguments, and process each argument one after the other with the same subroutine; or
2. Launch the script twice, each time with only one argument.
Both approaches are valid; it is up to you to decide how you want to do it, but I would personally tend to favor the first approach (it makes it possible to take into account things that happened while reading the first file when reading the second one, which would be much more difficult with the second solution). The first solution could more or less look as follows:
    perl process_files.pl file1.txt file2.txt
and, inside the program:
    for my $inputfile (@ARGV) {
        process_file ($inputfile);
    }
The second approach would probably require a shell script under Unix, or a .bat command script under Windows (or a *.com command file under VMS, or whatever with other OSes) to loop over the two filenames. One of the advantages of the first approach is that it can be more portable across platforms.
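To illustrate the point about the first approach letting one file's results inform the next, here is a rough sketch in which `process_file` also receives a shared hash. The `%state` hash and its keys `total_lines` and `files` are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in one file and add them to a
# running total that is shared by every file in this run.
sub process_file {
    my ($filename, $state) = @_;
    open my $input_fh, '<', $filename
        or die "cannot open file $filename: $!";
    my $lines = 0;
    while (my $line = <$input_fh>) {
        $lines++;
    }
    close $input_fh;
    $state->{total_lines} += $lines;
    $state->{files}++;
    return $lines;
}

my %state = (total_lines => 0, files => 0);
for my $inputfile (@ARGV) {
    process_file($inputfile, \%state);
}
print "read $state{total_lines} lines from $state{files} file(s)\n";
```

Run as `perl process_files.pl file1.txt file2.txt`, the second call sees the totals left by the first, which two separate invocations could not do without writing the state somewhere in between.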
Re^4: sending data thru a sub routine by james28909 (Deacon) on May 12, 2014 at 01:44 UTC
Re^5: sending data thru a sub routine by Laurent_R (Canon) on May 12, 2014 at 06:15 UTC
Re^2: sending data thru a sub routine by james28909 (Deacon) on May 14, 2014 at 02:01 UTC
I'm using AS Perl, so `script.pl param1 param2` is just fine.
Re: sending data thru a sub routine by ww (Archbishop) on May 11, 2014 at 17:10 UTC
Just a minor extension of choroba's observations (and mentioning, not just BTW, that the quoting here is for the Windows box which was readily at hand):
    C:\>perl -E "sub doit {for my $passed(@_) { say 'passed is: '. $passed;}}my @input=@ARGV; for my $input (@input) {say $input;} doit(@input);" "one" "two" "three"
    one
    two
    three
    passed is: one
    passed is: two
    passed is: three
    C:\>
Alternately, you could use `shift` inside the loop in the sub. And, nota bene, if your CLI arguments are enclosed in a single set of (appropriate) quotes, @ARGV will have them all as a single element, in which case, you need to (for one example) `split` @ARGV and `push` its arguments into whatever array you're going to use to pass to the sub.
    C:\>perl -E "sub doit {for my $passed(@_) { say 'passed is: '. $passed;}} my @input = @ARGV; for my $input (@input) {say $input;} doit(@input);" "one two three" # NOTE QUOTING VARIANCE!
    one two three
    passed is: one two three
    C:\>
Quis custodiet ipsos custodes. Juvenal, Satires
Re^2: sending data thru a sub routine by AnomalousMonk (Archbishop) on May 11, 2014 at 17:28 UTC
"... all as a single element, in which case, you need to (for one example) `split` @ARGV and `push` its arguments ..."
This seems needlessly confusing advice to offer a novice Perler. Literally calling split on `@ARGV`, e.g. `my @array = split @ARGV;`, is likely to produce (unpleasantly) surprising results. Can you be more clear?
Re^3: sending data thru a sub routine by ww (Archbishop) on May 12, 2014 at 00:16 UTC
Good point; clarification herewith: Some processing is required; it's definitely not a matter of simply splitting @ARGV, because split expects to work on a string, not an array.
    C:\>perl -E "my ($input) = @ARGV; my @input = split / /, $input; for $_(@input) {say $_;}" "trez zwei uno"
    trez
    zwei
    uno
Parenthesize `$input` so that the assignment happens in list context and `$input` receives the first (here, the only) element of `@ARGV` rather than the count of its elements.
Quibble: AnomalousMonk could improve on this 'clarification' and probably would have done it better the first time.
Come, let us reason together: Spirit of the Monastery
Quis custodiet ipsos custodes. Juvenal, Satires
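As a sketch of the same idea inside a script rather than a one-liner, the hypothetical helper `split_arguments` below splits every argument on whitespace, so both quoting styles give the same list. It assumes that no argument is meant to contain a space (a file name with spaces would be broken apart).

```perl
use strict;
use warnings;

# Hypothetical helper: split each argument on whitespace so that
# "one two three" and "one" "two" "three" produce the same list.
sub split_arguments {
    my @arguments = @_;
    return map { split ' ', $_ } @arguments;
}

my @words = split_arguments(@ARGV);
print "passed is: $_\n" for @words;
```

The `' '` pattern is split's special whitespace mode: it splits on runs of whitespace and ignores leading whitespace, unlike `/ /`, which splits on every single space.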
Re^2: sending data thru a sub routine by james28909 (Deacon) on May 12, 2014 at 01:47 UTC
I will def keep this in mind, could come in handy :P
Re: sending data thru a sub routine by GrandFather (Saint) on May 12, 2014 at 02:08 UTC
A few general tips:
- Always use strictures (use strict; use warnings; - see The strictures, according to Seuss).
- Declare variables in the smallest sensible scope and don't initialise them with a bogus value.
- Avoid unless and until. They invert the sense of their expression and often cause confusion.
- Don't use a regular expression match where a string compare is intended. `$fileLocation =~ 'SCE'` is not the same as `$fileLocation eq 'SCE'`!
- If you are dealing with binary data use unpack and pack.
Perl is the programming world's equivalent of English
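A small sketch of the match-versus-compare tip; the sub names `contains_sce` and `is_sce` are invented for the demonstration.

```perl
use strict;
use warnings;

# A pattern match succeeds if 'SCE' appears anywhere in the string.
sub contains_sce {
    my ($text) = @_;
    return $text =~ 'SCE' ? 1 : 0;
}

# A string comparison succeeds only if the whole string is 'SCE'.
sub is_sce {
    my ($text) = @_;
    return $text eq 'SCE' ? 1 : 0;
}

for my $location ('SCE', 'xSCEx') {
    printf "%-5s  match: %d  eq: %d\n",
        $location, contains_sce($location), is_sce($location);
}
```

For `'xSCEx'` the match succeeds while the comparison fails, which is exactly the kind of silent difference the tip warns about.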
Re^2: sending data thru a sub routine by james28909 (Deacon) on May 12, 2014 at 02:32 UTC
If you open this file in a hex editor, I am dealing with binary and plain text. $filename comes from plain text. $filelocation and $filesize come from binary in the hex editor: at 0x00, read 8 bytes = $filelocation, read 8 more bytes = $filesize, read 32 more bytes = $filename, then repeat from the current position. And thank you for the pointers :)
Re^3: sending data thru a sub routine by GrandFather (Saint) on May 12, 2014 at 03:54 UTC
"... binary and plain text ..."
Hmm, not really. You are dealing with a binary file that happens to have some plain text fields. unpack makes the code easier and clearer - in this case it even makes it correct. Consider:
    use strict;
    use warnings;
    (my $binStr = <<BIN) =~ s/\n//g;
    \x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00
    The end of the world is neigh   
    \x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00
    unless you use pack and unpack  
    BIN
    open my $fIn, '<', \$binStr;
    binmode $fIn;
    print "Using unpack\n";
    while (read($fIn, (my $rec), 48)) {
        my ($fileLocL, undef, $fileSizeL, undef, $fileName) = unpack('VVVVa32', $rec);
        printf "Loc: %d, Size: %d, Name: '%s'\n", $fileLocL, $fileSizeL, $fileName;
    }
    seek $fIn, 0, 0;
    print "Using bogus substitution code\n";
    while (!eof $fIn) {
        my ($fileLoc, $fileSize, $fileName);
        read($fIn, $fileLoc, 0x08);
        read($fIn, $fileSize, 0x08);
        read($fIn, $fileName, 0x20);
        $fileLoc =~ s/(.)/sprintf("%02x",ord($1))/eg;
        $fileSize =~ s/(.)/sprintf("%02x",ord($1))/eg;
        $fileName =~ s/\0+$//;
        printf "Loc: %d, Size: %d, Name: '%s'\n", $fileLoc, $fileSize, $fileName;
    }
Prints:
    Using unpack
    Loc: 1, Size: 2, Name: 'The end of the world is neigh   '
    Loc: 3, Size: 4, Name: 'unless you use pack and unpack  '
    Using bogus substitution code
    Loc: -1, Size: -1, Name: 'The end of the world is neigh   '
    Loc: -1, Size: -1, Name: 'unless you use pack and unpack  '
Note that I was using a build of Perl that doesn't have support for the quad word pack/unpack specification so I used the "VAX" long (32 bit) V specification and ignored the high words (those are the undefs in the variable list). Oh, and the trailing spaces on the two "file name" lines in the sample data are important (three after "neigh" and two after "unpack", padding each name to 32 bytes). Don't lose them copying this test script!
Perl is the programming world's equivalent of English
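On a perl that does support quad words, the same 48-byte record could probably be decoded without discarding the high words. The sketch below assumes a perl built with 64-bit integer support and version 5.10 or later (for the `<` little-endian modifier), and assumes the name field is padded with NULs or spaces. `parse_record` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one 48-byte record laid out as an 8-byte
# location, an 8-byte size and a 32-byte name (little-endian integers).
sub parse_record {
    my ($record) = @_;
    my ($location, $size, $name) = unpack('Q< Q< a32', $record);
    $name =~ s/[\0 ]+\z//;    # drop NUL or space padding from the name
    return ($location, $size, $name);
}

# Build a sample record with pack; a32 pads the name with NULs.
my $record = pack('Q< Q< a32', 1, 2, 'example.bin');
my ($location, $size, $name) = parse_record($record);
printf "Loc: %d, Size: %d, Name: '%s'\n", $location, $size, $name;
```

On a build without 64-bit support, `Q` raises an exception, in which case the `VVVV` form from the post above remains the way to go.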
Re^4: sending data thru a sub routine by AnomalousMonk (Archbishop) on May 12, 2014 at 06:13 UTC
Re^4: sending data thru a sub routine by james28909 (Deacon) on May 15, 2014 at 01:25 UTC