Misunderstood array behavior

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Misunderstood array behavior by wfsp (Abbot) on Sep 20, 2008 at 07:05 UTC
If you add `use strict; use warnings;` near the top of your code, perl complains that `Global symbol "$sample2" requires explicit package name at...`. This is because you are doing the compare outside the inner loop (where `$sample2` is out of scope). Move the `if` block inside the inner loop and it will run. Fix that first, and if it still doesn't run as expected, show us what the first line in both files looks like.
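A minimal sketch of the corrected loop structure follows. The question's own code isn't shown here, so the two arrays are hypothetical stand-ins for the split header rows `@firstLine1` and `@firstLine2`. The point is only where the comparison sits: inside the inner loop, where `$sample2` exists.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two split header rows.
my @firstLine1 = qw(MS01-A MS02-B MS03-C);
my @firstLine2 = qw(MS03-C MS04-D MS01-A);

my @matches;
foreach my $sample1 (@firstLine1) {
    foreach my $sample2 (@firstLine2) {
        # $sample2 is declared by this inner foreach, so it only exists
        # inside these braces; the comparison has to live here too.
        push @matches, $sample1 if $sample1 eq $sample2;
    }
}
print "Matched $_\n" for @matches;
```

With these sample arrays it reports `MS01-A` and `MS03-C`. Note that this nested loop is O(n*m), which is one reason the hash approach suggested below scales better.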
Re: Misunderstood array behavior by GrandFather (Saint) on Sep 20, 2008 at 07:12 UTC
`$sample2` is local to the inner loop but is tested outside the inner loop; it doesn't exist there. `use strict` would have told you about that, unless you have another lexical `$sample2` whose scope encloses both for loops.

Generally, when you want to perform this sort of matching task in Perl, you should first think "hash". Consider:

```perl
use strict;
use warnings;

my $file1Data = "1\t2\t3\t4";
my $file2Data = "5\t6\t7\t4";

open my $fileText, '<', \$file1Data;
my @firstLine1 = split /\t/, <$fileText>;
close $fileText;

open my $fileText2, '<', \$file2Data;
my %firstLine2Fields = map {$_ => 1} split /\t/, <$fileText2>;

foreach my $sample1 (@firstLine1) {
    print "Matched $sample1\n" if exists $firstLine2Fields{$sample1};
}
```

Prints:

```
Matched 4
```

Perl reduces RSI - it saves typing
Re: Misunderstood array behavior by AnomalousMonk (Archbishop) on Sep 20, 2008 at 07:29 UTC
The other Usual Suspect in a split situation is a trailing split character in the input string, possibly with whitespace after it. E.g., if one of the strings you are splitting looks like `"foo\tbar\tbaz\t"` (note the trailing `\t` at the end), `split` quietly drops the empty trailing field (unless you pass a negative LIMIT), but if anything follows that tab, such as a space, you will get that whitespace as the final string in the split output array.

The other suggestion I would make would be to lose the confusing code construct

```perl
while (<$fileText>) {
    chomp;
    if ($count++ == 0) {
        # I will eventually read the whole file...
        @firstLine1 = split(/\t/);
        last;
    }
}
```

in favor of something like

```perl
chomp($_ = <$file_handle>);   # read, chomp one line
my @split_fields = split /\t/;
```

and eventually read the whole file separately.
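A quick way to see how `split` treats the end of a line is to try the cases side by side. The strings below are made up for illustration: by default `split` discards trailing empty fields, a negative LIMIT keeps them, and any character after the last tab (a space, or an invisible carriage return) ends up glued into the final field.

```perl
use strict;
use warnings;

# Trailing tab, nothing after it: the empty last field is discarded.
my @plain  = split /\t/, "foo\tbar\tbaz\t";
# Same string with a negative LIMIT: the empty last field is kept.
my @kept   = split /\t/, "foo\tbar\tbaz\t", -1;
# Trailing tab followed by a space: the space becomes a real final field.
my @spaced = split /\t/, "foo\tbar\tbaz\t ";
# Line still carrying a carriage return: it sticks to the last field.
my @cr     = split /\t/, "foo\tbar\tbaz\r";

printf "%d %d %d %d\n", scalar @plain, scalar @kept, scalar @spaced, scalar @cr;
```

This prints `3 4 4 3`. The last case is the sneaky one: the field count looks right, but the final element is `"baz\r"`, so an `eq` comparison against `"baz"` fails.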
Re: Misunderstood array behavior by jethro (Monsignor) on Sep 20, 2008 at 11:40 UTC
May I suggest a change to your testing code:

```perl
# print "@firstLine1\n"; # All values print fine! MAYBE
# print "@firstLine2\n"; # All values print fine!
print '##', join('##', @firstLine1), "##\n";
print '##', join('##', @firstLine2), "##\n";
```

Provided there are no '##' in your lines, this will show you exactly what your arrays look like. (Showing those lines would have helped, since your code is fine apart from the issue already mentioned.) Even better is the CPAN module Data::Dumper, especially when your data structures become more complex:

```perl
use Data::Dumper;
# print "@firstLine1\n"; # All values print fine! MAYBE
print Dumper(@firstLine1);
```

PS: You could test your script with only one file, i.e. `yourscript filex filex`. If you still get a missing value in the output, the script is to blame; otherwise, your data is.
Re^2: Misunderstood array behavior by toolic (Bishop) on Sep 20, 2008 at 15:44 UTC
> Even better is the CPAN module Data::Dumper

Not only is it a CPAN module, but it is also a "Core Module". This means that it is part of the Perl distribution and does not have to be downloaded and installed separately. It also means that you can use `[doc://Data::Dumper]` to link to the Perl doc, like so: Data::Dumper. You probably knew all this, but just in case others were unaware...
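If you want to check from Perl itself whether a module ships with the core distribution, the core module Module::CoreList can tell you. A small sketch follows; Text::Chomp is used only as an example of a non-core module (it comes up later in this thread), and the exact version strings printed depend on the Module::CoreList you have installed.

```perl
use strict;
use warnings;
use Module::CoreList;

my %first_release;
for my $module (qw(Data::Dumper Text::Chomp)) {
    # first_release returns the first perl version that shipped the
    # module, or undef if the module has never been in core.
    $first_release{$module} = Module::CoreList->first_release($module);
    printf "%-12s %s\n", $module,
        defined $first_release{$module}
            ? "core since perl $first_release{$module}"
            : "not in core";
}
```

The `corelist` command-line tool installed alongside Module::CoreList answers the same question from a shell.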
Re^3: Misunderstood array behavior by jethro (Monsignor) on Sep 20, 2008 at 15:57 UTC
Actually, I didn't know this. Since I use Linux distributions that make it easy to add lots of non-core modules to the installed perl at installation time, the distinction between core and non-core is, in practice, replaced by distribution and non-distribution.
Re: Misunderstood array behavior by Anonymous Monk on Sep 20, 2008 at 14:53 UTC
Thank you everyone for your suggestions. I need to clarify a little more.

wfsp and GrandFather: I apologize for misplacing the `if` statement. I accidentally pasted it outside the inner loop, but in my code it is inside the inner loop. The condition works in every instance except on the last item in either array.

GrandFather: I thought about using a hash, but I need to gather the files in order (by column) so that I can correctly order the second one. I cannot think of how to do that with a hash. It seems that a 2D array would be optimal. Do you have a suggestion on how to do it with a hash?

AnomalousMonk: By the comment about "eventually" reading the whole file, I mean that I will eventually read it into a 2D array during that loop, but I simplified it for the posting. However, it still has the same behavior as is. I kept the loop to maintain what I would do later. Is there a better way to read in each row and column?

jethro: Thank you for suggesting Dumper; I was not aware of it. It also perfectly shows my problem. When it prints the last item in both arrays, it's all messed up:

```
#### BEGIN ####
$VAR215 = 'MS02-19196-A6-DCIS';
$VAR216 = 'MS02-19196-A6-INVASIVE';
$VAR217 = 'MS01-9167-A7-DCIS';
';AR218 = 'MS06-1878-D2-DCIS
#### END ####
```

That is exactly how it prints. Also, when it gets to the `if` condition, the condition fails. However, at that very moment, I can print the value in the debugger with `p $sample1`. AnomalousMonk suggested that it could be a problem with extra tab(s) at the end of the line, but I have double-checked that. This is really confusing to me. I also tried another file that is totally unrelated to what I'm doing, and it had the same behavior. I would like to post the file, but it has 218 columns and I don't see a way to upload it. Thanks for your help.
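On the question above about a better way to read in each row and column: a minimal sketch, assuming a tab-separated file whose first line is a header. It reads the header once, then loads every remaining row into an array of array references (Perl's usual 2D array). The in-memory sample data and the `read_table` name are hypothetical; real code would open the actual file path instead. Note that plain `chomp` here has the same line-ending weakness that the rest of this thread uncovers.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one of the input files.
my $data = "id\tMS01-A\tMS02-B\nrow1\t1\t2\nrow2\t3\t4\n";

# Read a tab-separated table: returns (header arrayref, rows arrayref).
sub read_table {
    my ($fh) = @_;
    chomp(my $header_line = <$fh>);          # first line only
    my @header = split /\t/, $header_line;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [ split /\t/, $line ];   # one arrayref per row
    }
    return (\@header, \@rows);
}

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($header, $rows) = read_table($fh);
close $fh;

print "columns: @$header\n";
print "row 2, column 3: $rows->[1][2]\n";    # 2D access: [row][column]
```

Keeping the header in its own array preserves the column order, which is what the ordering step needs.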
Re^2: Misunderstood array behavior by jethro (Monsignor) on Sep 20, 2008 at 16:18 UTC
```
$VAR217 = 'MS01-9167-A7-DCIS';
';AR218 = 'MS06-1878-D2-DCIS
```

If this is exactly what you get from Data::Dumper, then there is a carriage return (hex 0D) at the end of the line: printing it sends the cursor back to the start of the line, so the closing `';` overwrites the `$V` of `$VAR218`. It might mean that you are using an MS-DOS file on Unix, and your chomp only removes the line feed, not the carriage return. See the man page of chomp and its dependence on `$/`. Setting `$/` to `"\r\n"` would correct that, but then real Unix files would not work. If you need both file types to work, use a regex instead of chomp.

About GrandFather's suggestion: is the ordering of both files important to the result? If not, you might put the second file into a hash instead of the first. But if you want helpful answers to that question, you might open a new thread and tell us exactly what you want to do with those two files.
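A minimal sketch of the "regex instead of chomp" idea, using hypothetical in-memory lines that mix DOS (`\r\n`) and Unix (`\n`) endings. The substitution `s/[\r\n]+\z//` strips any trailing run of carriage returns and line feeds, so both file types come out clean.

```perl
use strict;
use warnings;

# Hypothetical sample: two DOS-style lines and one Unix-style line.
my $data = "MS01-A\tMS02-B\r\nMS03-C\tMS04-D\r\nMS05-E\tMS06-F\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @last_fields;
while (my $line = <$fh>) {
    # Strip any trailing run of CR and/or LF; plain chomp only
    # removes whatever $/ holds ("\n" by default).
    $line =~ s/[\r\n]+\z//;
    my @fields = split /\t/, $line;
    push @last_fields, $fields[-1];
}
close $fh;

print join(',', @last_fields), "\n";
```

This prints `MS02-B,MS04-D,MS06-F` with no stray `\r`. Another option worth checking in the PerlIO documentation is opening the file with the `:crlf` layer (`open my $fh, '<:crlf', $path`), which should translate CRLF to LF on input so that ordinary `chomp` works again.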
Re^3: Misunderstood array behavior by Anonymous Monk on Sep 20, 2008 at 17:25 UTC
For Pete's sake... I never would have suspected that, because it seemed to be stomping on memory. I've dealt with these different line endings before, but never ran into that behavior. Thanks a ton to everyone who helped. I have to mention that this has been the most pleasant forum I've ever worked with. Thanks! By the way, I'm using tchomp (http://cpan.uwinnipeg.ca/htdocs/Text-Chomp/Text/Chomp.pm.html) to solve the problem. Do you see any reason not to always use tchomp in place of chomp?
Re^4: Misunderstood array behavior by AZed (Monk) on Sep 20, 2008 at 19:05 UTC
Re^2: Misunderstood array behavior by tinita (Parson) on Sep 21, 2008 at 10:44 UTC
```
#### BEGIN ####
$VAR215 = 'MS02-19196-A6-DCIS';
$VAR216 = 'MS02-19196-A6-INVASIVE';
$VAR217 = 'MS01-9167-A7-DCIS';
';AR218 = 'MS06-1878-D2-DCIS
#### END ####
```

This is why I always recommend `$Data::Dumper::Useqq` in such situations. Putting ~~quotes~~ some kind of delimiters around the variables you want to debug is of course a good thing, but if you're dealing with lines and have a problem, just use

```perl
use Data::Dumper;
$Data::Dumper::Useqq = 1; # shows all non-printable characters
print Dumper \@lines;
```

(I also prefer to dump a reference; this avoids the big mess of many `$VAR314159`...)

edit: I even have a useful mapping for vim on my homenode which lets you debug with only very few keystrokes. (For emacs it looks a bit more complicated.)
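To see why this helps with exactly the output above, here is a small sketch that dumps an array whose last element still carries a carriage return (sample values modelled on the dump in this thread). With `Useqq` set, Data::Dumper emits double-quoted strings and writes the CR as the visible escape `\r` instead of sending a raw carriage return to the terminal.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;   # double-quote strings, escape non-printables

# Last element mimics a header field read from a DOS file with chomp.
my @fields = ('MS01-9167-A7-DCIS', "MS06-1878-D2-DCIS\r");

# Dumping a reference yields one $VAR1 instead of one $VARn per element.
my $dump = Dumper \@fields;
print $dump;
```

The second element shows up as `"MS06-1878-D2-DCIS\r"`, so the culprit is visible at a glance rather than garbling the line.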
Re^3: Misunderstood array behavior by Anonymous Monk on Sep 22, 2008 at 02:14 UTC
Thank you. That is a valuable tip. Do you see any reason not to always use tchomp in place of chomp?
Re^4: Misunderstood array behavior by JadeNB (Chaplain) on Sep 22, 2008 at 19:04 UTC