same script, different results

jb60606 has asked for the wisdom of the Perl Monks concerning the following question:

I had applied for a job and for the 'technical' phone screen interview, I was given two scripting tasks - one of which was the following:

##################
#TASK DESCRIPTION#
##################

Write a Perl script to find out which symbols in FILE_A are not contained in fileB for groupA and exchanges B and C.

##################
#FILE_A (sym.txt)#
##################

A
AA
ABC
ADF
BFD
EFF
ZFF
ZZD

###################
#FILE_B (data.txt)#
###################

exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 ADF 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10
exchangeC_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10

###########
#MY SCRIPT#
###########

#use warnings;
use strict;
#use feature qw(switch say);

my $symbols = "sym.txt";
my $data = "data.txt";

open (SYM, $symbols) or die ("$symbols not found");

while (my $symbol = <SYM>)
    { 
    my $lineNum = 0;
    chomp $symbol;
    open (DATA, $data) or die ("$data not found");

    while (my $line=<DATA>) {
    $lineNum++;
        chomp $line;
        my @array = split ( /[\." "]/, $line );
        my $sym1 = $array[4];
        my $sym2 = $array[7];
        my $sym3 = $array[10];
        my $sym4 = $array[13];
        my $sym5 = $array[16];
        my $sym6 = $array[19];
        my $sym7 = $array[22];
        my $sym8 = $array[25];
        if (( $array[0] =~ m/^exchangeB_groupA$/ ) || ( $array[0] =~ m/^exchangeC_groupA$/ ))
        {

            if ($symbol eq $sym1) { next; }
            elsif ($symbol eq $sym2) { next; }
            elsif ($symbol eq $sym3) { next; }
            elsif ($symbol eq $sym4) { next; }
            elsif ($symbol eq $sym5) { next; }
            elsif ($symbol eq $sym6) { next; }
            elsif ($symbol eq $sym7) { next; }
            elsif ($symbol eq $sym8) { next; }
            else 
            { 
                print "$symbol\t$array[0](ln$lineNum)\tnot found\n"
            }

        }
    }
}

############
#MY RESULTS#
############

A    exchangeB_groupA(ln4)    not found
A    exchangeC_groupA(ln7)    not found
ADF    exchangeB_groupA(ln4)    not found
ADF    exchangeC_groupA(ln7)    not found
BFD    exchangeB_groupA(ln4)    not found
BFD    exchangeC_groupA(ln7)    not found
ZZD    exchangeB_groupA(ln4)    not found

If you manually sift through the data, you'll see that the results I get are correct. However, when he ran my script, he said that he received ALL symbols back (meaning every symbol in FILE_A was reported missing).

I was not near a computer with Perl on it and could not troubleshoot the issue 'live' during the interview. Once I was near a computer again, I began to troubleshoot the issue, and I get the same (correct) results on Macintosh/FreeBSD (where it was written) as I do on Ubuntu Linux (ubuntu 3.11.0-12-generic). The only way I could make the script fail and reproduce his results was by altering each of the $sym[x] declarations in my script to match up with the wrong $array[element].

I know that my script may not be the most efficient way to perform the task, but I thought that the code was intelligible and "fool-proof". I packaged the script I sent to him with the same symbol file and data file that I used, so that we could run the script in virtually the same environment and there would be no formatting issues.

Sorry for the long setup, but with all of that being said, can you think of any reason why this script wouldn't work and why it would give the results that he received?

Thanks



Re: same script, different results
by Athanasius (Archbishop) on Oct 25, 2013 at 12:38 UTC

Hello jb60606,

When I run your code on my 32-bit Vista system:

This is perl 5, version 18, subversion 1 (v5.18.1) built for MSWin32-x86-multi-thread-64int

I get the same results as you. Looking through the code, nothing leaps out as a potential trigger for different behaviour on a different system. (Did you ask the interviewer for details of the system on which he ran your code?) So I can’t help you there, sorry. But the following points should be noted:

Commenting-out use warnings; is a red flag. For lines with fewer than 26 fields, $array[25] is undefined, raising a warning on the equality test. One way to avoid the warning is to test for definedness first, and short-circuit if the value is undefined:
```
...
next if defined $sym8 && $symbol eq $sym8;
...
```
As Lennotoecom has pointed out, your results do not take account of symbol A in the data file. This is because the character class used in the call to split doesn’t include =. Note also that within a character class, the dot does not need to be escaped, and enclosing the space in double quotes merely adds the " character to the class. You should use: my @array = split /[. =]/, $line;.
If the files get bigger, the inefficiency of opening and closing and reopening the data file for each symbol in the symbol file will quickly become prohibitive. I would read the symbols into ~~a hash~~ an array first, then open the data file once and process it line by line, testing each symbol against the current line before moving on to the next.
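
The three points above can be combined into a minimal sketch. It is one possible rewrite, not the script that was submitted. The sub names symbols_in_line and missing_symbols are hypothetical, the assumption that fields come in groups of three (symbol, number, number) is read off the sample data, and a few data lines are inlined so the example runs on its own.

```perl
use strict;
use warnings;

# Split a data line on dot, space, or equals sign (the corrected character
# class) and return the key plus a hash of the symbols found on that line.
# Assumption from the sample data: after "key.gateway_risk=", the fields
# come in groups of three (symbol, number, number).
sub symbols_in_line {
    my ($line) = @_;
    my ($key, undef, @fields) = split /[. =]/, $line;
    my %seen;
    for (my $i = 0; $i < @fields; $i += 3) {
        $seen{ $fields[$i] } = 1;
    }
    return ($key, \%seen);
}

# Scan the data lines once. For each line whose key matches $wanted,
# record which of the symbols do not appear on that line.
sub missing_symbols {
    my ($symbols, $lines, $wanted) = @_;
    my %missing;
    for my $line (@$lines) {
        my ($key, $seen) = symbols_in_line($line);
        next unless defined $key && $key =~ $wanted;
        $missing{$key} = [ grep { !$seen->{$_} } @$symbols ];
    }
    return \%missing;
}

# Inlined sample input (in a real script these would be read from the files).
my @symbols = qw(A AA ABC ADF BFD EFF ZFF ZZD);
my @lines   = (
    'exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 ADF 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10',
    'exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10',
    'exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10',
);

my $missing = missing_symbols(\@symbols, \@lines, qr/^exchange[BC]_groupA$/);
for my $key (sort keys %$missing) {
    print "$key: @{ $missing->{$key} }\n";
}
```

Because the symbols on each line are collected in a hash, the eight $symN variables and the undefined $array[25] comparison disappear, and lines with a different number of symbols need no special handling.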

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: same script, different results

by jb60606 (Acolyte) on Oct 25, 2013 at 13:42 UTC

Hi folks, thanks for your expertise and advice. To answer some of your questions, I hadn't originally planned on writing it this way, as I thought something like a hash of arrays would have been more appropriate. The problem was that a HoA is kind of uncharted territory for me, as I am still relatively new to this or any language. Additionally, I only had the remainder of the day to put a script together and send it back to the interviewer. The interview was yesterday. I had hoped to figure out what went wrong and package an explanation with the thank-you letter today, but seeing that you've discovered additional unforgivable problems, I might be better off leaving it alone. Besides, apart from the 2nd script being a success, I pretty much bombed the rest of the interview after learning that this script failed, and I highly doubt they consider me a competitive candidate.


Re: same script, different results
by Lennotoecom (Pilgrim) on Oct 25, 2013 at 11:57 UTC

exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10

A    exchangeB_groupA(ln4)    not found
A    exchangeC_groupA(ln7)    not found
ADF    exchangeB_groupA(ln4)    not found
ADF    exchangeC_groupA(ln7)    not found
BFD    exchangeB_groupA(ln4)    not found
BFD    exchangeC_groupA(ln7)    not found
ZZD    exchangeB_groupA(ln4)    not found


Re^2: same script, different results

by jb60606 (Acolyte) on Oct 25, 2013 at 12:25 UTC

Crap - you're right - I missed that. But even if 'A' was overlooked, any idea how the rest would come back as 'Not Found' when the script expects each symbol in a particular element/column? Thanks for your reply.


Re: same script, different results
by Dallaylaen (Chaplain) on Oct 25, 2013 at 15:07 UTC

chomp

s/\s*$//s

perlsec

while (<SYM>) {
    m/($insert_expected_symbol_format_here)/ 
        or die "Bad input format"; # or maybe just "or next"
    my $symbol = $1;
};
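
The fragments above (chomp versus s/\s*$//s) point at one possible cause, though nothing in the thread confirms it. If the files reached the interviewer with CRLF line endings and were run on a Unix-like system, chomp would remove only the "\n" and leave a "\r" on every symbol, so no eq test could succeed and every symbol would be reported missing. A minimal sketch of that assumed scenario follows; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Strip only the trailing newline, as the original script does.
# With the default $/ ("\n"), chomp leaves a preceding "\r" in place.
sub clean_with_chomp {
    my ($line) = @_;
    chomp $line;
    return $line;
}

# Strip all trailing whitespace, including the "\r" of a CRLF line ending.
sub clean_with_regex {
    my ($line) = @_;
    $line =~ s/\s*$//s;
    return $line;
}

my $raw = "AA\r\n";    # a symbol line as it would look with CRLF endings

printf "chomp: %s\n", clean_with_chomp($raw) eq 'AA' ? 'match' : 'no match';
printf "regex: %s\n", clean_with_regex($raw) eq 'AA' ? 'match' : 'no match';
```

With the literal string above, the chomp version keeps the "\r" and fails the comparison, while the regex version compares equal to AA.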

Re: same script, different results
by Lennotoecom (Pilgrim) on Oct 25, 2013 at 12:33 UTC

fileA:
A
AA
ABC
ADF
BFD
EFF
ZFF
ZZD

fileB:
exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 ADF 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10
exchangeC_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10

script:
open IN, "<fileA" or die $!;
        %fileA = map {/$/; $` => 1} <IN>;
close IN;

open IN, "<fileB" or die $!;
while($line = <IN>){
if($line=~m/(exchange[BC]_groupA)\.gateway_risk=/){
        print "in line $1:\n"; $line = $';
        foreach $key (keys %fileA){
                print "\tnot found $key\n" if !($line=~s/$key//);
        }
}
}
close IN;


result:
in line exchangeB_groupA:
        not found BFD
        not found ADF
        not found ZZD
in line exchangeC_groupA:
        not found BFD
        not found ADF

UPDATE

Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 4.
A       exchangeB_groupA(ln4)   not found
A       exchangeC_groupA(ln7)   not found
Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 40.
ADF     exchangeB_groupA(ln4)   not found
ADF     exchangeC_groupA(ln7)   not found
Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 52.
BFD     exchangeB_groupA(ln4)   not found
BFD     exchangeC_groupA(ln7)   not found
Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 88.
ZZD     exchangeB_groupA(ln4)   not found
