jb60606 has asked for the wisdom of the Perl Monks concerning the following question:

I had applied for a job and for the 'technical' phone screen interview, I was given two scripting tasks - one of which was the following:
##################
#TASK DESCRIPTION#
##################

Write a Perl script to find out which symbols in FILE_A are not contained in FILE_B for groupA and exchanges B and C.

##################
#FILE_A (sym.txt)#
##################
A
AA
ABC
ADF
BFD
EFF
ZFF
ZZD
###################
#FILE_B (data.txt)#
###################
exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 ADF 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeA_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeB_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10
exchangeC_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeC_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
exchangeD_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
###########
#MY SCRIPT#
###########
#use warnings;
use strict;
#use feature qw(switch say);

my $symbols = "sym.txt";
my $data = "data.txt";

open (SYM, $symbols) or die ("$symbols not found");

while (my $symbol = <SYM>) {
    my $lineNum = 0;
    chomp $symbol;
    open (DATA, $data) or die ("$data not found");
    while (my $line=<DATA>) {
        $lineNum++;
        chomp $line;
        my @array = split ( /[\." "]/, $line );
        my $sym1 = $array[4];
        my $sym2 = $array[7];
        my $sym3 = $array[10];
        my $sym4 = $array[13];
        my $sym5 = $array[16];
        my $sym6 = $array[19];
        my $sym7 = $array[22];
        my $sym8 = $array[25];
        if (( $array[0] =~ m/^exchangeB_groupA$/ ) || ( $array[0] =~ m/^exchangeC_groupA$/ )) {
            if ($symbol eq $sym1) { next; }
            elsif ($symbol eq $sym2) { next; }
            elsif ($symbol eq $sym3) { next; }
            elsif ($symbol eq $sym5) { next; }
            elsif ($symbol eq $sym5) { next; }
            elsif ($symbol eq $sym6) { next; }
            elsif ($symbol eq $sym7) { next; }
            elsif ($symbol eq $sym8) { next; }
            else { print "$symbol\t$array[0](ln$lineNum)\tnot found\n" }
        }
    }
}
############
#MY RESULTS#
############
A       exchangeB_groupA(ln4)   not found
A       exchangeC_groupA(ln7)   not found
ADF     exchangeB_groupA(ln4)   not found
ADF     exchangeC_groupA(ln7)   not found
BFD     exchangeB_groupA(ln4)   not found
BFD     exchangeC_groupA(ln7)   not found
ZZD     exchangeB_groupA(ln4)   not found

If you manually sift through the data, you'll see that the results I get are correct. However, when he ran my script, he said that he received ALL symbols back (meaning every symbol in FILE_A was reported missing).

I was not near a computer with Perl on it and could not troubleshoot the issue 'live' during the interview. Once I was back at a computer, I began troubleshooting, and I get the same (correct) results on Macintosh/FreeBSD (where it was written) as I do on Ubuntu Linux (ubuntu 3.11.0-12-generic). The only way I could make the script fail and reproduce his results was by altering each of the $sym[x] declarations in my script to point at the wrong $array[element].

I know that my script may not be the most efficient way to perform the task, but I thought that the code was intelligible and "fool-proof". I packaged the script I sent to him with the same symbol file and data file that I used, so that we could run the script in virtually the same environment and there would be no formatting issues.

Sorry for the long setup, but with all of that being said, can you think of any reason why this script wouldn't work and why it would give the results that he received?

Thanks

Re: same script, different results
by Athanasius (Archbishop) on Oct 25, 2013 at 12:38 UTC

    Hello jb60606,

    When I run your code on my 32-bit Vista system:

    This is perl 5, version 18, subversion 1 (v5.18.1) built for MSWin32-x86-multi-thread-64int

    I get the same results as you. Looking through the code, nothing leaps out as a potential trigger for different behaviour on a different system. (Did you ask the interviewer for details of the system on which he ran your code?) So I can’t help you there, sorry. But the following points should be noted:

    • Commenting-out use warnings; is a red flag. For lines with fewer than 26 fields, $array[25] is undefined, raising a warning on the equality test. One way to avoid the warning is to test for definedness first, and short-circuit if the value is undefined:

      ... next if defined $sym8 && $symbol eq $sym8; ...
    • As Lennotoecom has pointed out, your results do not take account of symbol A in the data file. This is because the character class used in the call to split doesn’t include =. Note also that within a character class, the dot does not need to be escaped, and enclosing the space in double quotes merely adds the " character to the class. You should use: my @array = split /[. =]/, $line;.

    • If the files get bigger, the inefficiency of opening, closing, and reopening the data file for each symbol in the symbol file will quickly become prohibitive. I would read the symbols into an array first, then open the data file once and process it line by line, testing each symbol against the current line before moving on to the next.
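The split fix and the single-pass idea can be sketched together. This is a minimal illustration, not the original poster's script: the helper name missing_symbols and the inline sample data are assumptions made for the example, and real code would read sym.txt and data.txt from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: return the symbols from @$symbols that do not
# appear on one data line.
sub missing_symbols {
    my ($line, $symbols) = @_;
    # Split on dot, space or '='. The first two fields are the
    # exchange/group name and the literal "gateway_risk"; discard them.
    my (undef, undef, @fields) = split /[. =]/, $line;
    # Lookup table of everything on the line (the "10" values end up
    # here too, which is harmless for this test).
    my %present = map { $_ => 1 } @fields;
    return grep { !$present{$_} } @$symbols;
}

# Stand-in for the contents of sym.txt, read once into an array.
my @symbols = qw(A AA ABC ADF BFD EFF ZFF ZZD);

# Stand-in for the two relevant lines of data.txt.
my @data = (
    'exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10',
    'exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10',
);

# Single pass over the data: each line is tested against every symbol.
for my $line (@data) {
    next unless $line =~ /^(exchange[BC]_groupA)\./;
    my $name = $1;
    print "$_\t$name\tnot found\n" for missing_symbols($line, \@symbols);
}
```

For these two sample lines the sketch reports ADF, BFD and ZZD as missing from exchangeB_groupA, and ADF and BFD from exchangeC_groupA. Because it uses a hash lookup rather than fixed array positions, no undefined element is ever compared. Note that with split /[. =]/ the symbols sit at indexes 2, 5, 8 and so on, so hard-coded indexes such as $array[4] would have to change as well.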

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi folks, thanks for your expertise and advice. To answer some of your questions, I hadn't originally planned on writing it this way, as I thought something like a hash of arrays would have been more appropriate. The problem was that a HoA is kind of uncharted territory for me, as I am still relatively new to this or any language. Additionally, I only had the remainder of the day to put a script together and send it back to the interviewer. The interview was yesterday. I had hoped to figure out what went wrong and package an explanation with the thank-you letter today, but seeing that you've discovered additional unforgivable problems, I might be better off leaving it alone. Besides, apart from the 2nd script being a success, I pretty much bombed the rest of the interview after learning that this script failed, and I highly doubt they consider me a competitive candidate.
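For reference, the hash-of-arrays idea mentioned above could look roughly like the following. It is only a sketch under assumptions: the name %symbols_for and the shortened sample lines are invented for illustration, and it relies on the split /[. =]/ pattern suggested in the reply above.

```perl
use strict;
use warnings;

# Shortened, made-up sample lines in the same layout as data.txt.
my @lines = (
    'exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ADF 10 10',
    'exchangeB_groupA.gateway_risk=A 10 10 AA 10 10',
);

# Hash of arrays: the key is "exchangeX_groupY", the value is a
# reference to the list of symbols found on that line.
my %symbols_for;
for my $line (@lines) {
    my ($name, undef, @fields) = split /[. =]/, $line;
    # Each symbol is followed by two numbers, so keep every third field.
    my @syms = @fields[ grep { $_ % 3 == 0 } 0 .. $#fields ];
    $symbols_for{$name} = \@syms;
}

# Walk the structure: one line of output per exchange/group key.
for my $name (sort keys %symbols_for) {
    print "$name: @{ $symbols_for{$name} }\n";
}
```

Once the data is in this shape, checking a symbol against a given exchange and group is a lookup by key followed by a search of that key's array.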
Re: same script, different results
by Lennotoecom (Pilgrim) on Oct 25, 2013 at 11:57 UTC
    How are your results correct?
    exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10
    from your examples:
    exchangeB_groupA contains:
    A AA ABC EFF ZFF
    and exchangeC_groupA:
    A AA ABC EFF ZFF ZZD
    but your results say:
    A       exchangeB_groupA(ln4)   not found
    A       exchangeC_groupA(ln7)   not found
    ADF     exchangeB_groupA(ln4)   not found
    ADF     exchangeC_groupA(ln7)   not found
    BFD     exchangeB_groupA(ln4)   not found
    BFD     exchangeC_groupA(ln7)   not found
    ZZD     exchangeB_groupA(ln4)   not found
    A for exchangeB_groupA is not found?
    The symbol A is in exchangeB_groupA, so it should be found.
      Crap - you're right - I missed that. But even if 'A' was overlooked, any idea how the rest would come back as 'Not Found' when the script expects each symbol in a particular element/column? Thanks for your reply.
Re: same script, different results
by Dallaylaen (Chaplain) on Oct 25, 2013 at 15:07 UTC
    If you run the script on Unix (where lines end in "\n") but the files were saved on Windows (where lines end in "\r\n"), you'll get a botched result, because chomp leaves the "\r" on the end of every line. Don't trust chomp, use s/\s*$//s instead. Or better yet, use regexp matching and capturing (this is what -T mode and perlsec would suggest):
    while (<SYM>) {
        m/($insert_expected_symbol_format_here)/
            or die "Bad input format"; # or maybe just "or next"
        my $symbol = $1;
    };
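The line-ending effect is easy to reproduce in isolation. The sketch below is illustrative only; the string "ADF\r\n" stands in for a line read on Unix from a file saved with Windows line endings, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Simulated input line with a Windows line ending.
my $raw = "ADF\r\n";

# chomp removes only the trailing "\n" (with the default $/),
# so the carriage return stays attached to the symbol.
my $chomped = $raw;
chomp $chomped;

# Stripping all trailing whitespace removes "\r\n" in one go.
my $stripped = $raw;
$stripped =~ s/\s*$//s;

print "chomp:      ", ($chomped  eq 'ADF' ? "match" : "no match"), "\n";
print "s/\\s*\$//s:  ", ($stripped eq 'ADF' ? "match" : "no match"), "\n";
```

The chomped value is "ADF\r", so a string comparison against "ADF" fails. If both files carried such endings, every comparison against a clean field would fail in the same way.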
Re: same script, different results
by Lennotoecom (Pilgrim) on Oct 25, 2013 at 12:33 UTC
    I wrote the following code as a test, and it works for me:

    fileA:
    A
    AA
    ABC
    ADF
    BFD
    EFF
    ZFF
    ZZD

    fileB:
    exchangeA_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 ADF 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeA_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeA_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeB_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeB_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeB_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeC_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10 ZZD 10 10
    exchangeC_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeC_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeD_groupA.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 RFD 10 10 ZFF 10 10
    exchangeD_groupB.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10
    exchangeD_groupC.gateway_risk=A 10 10 AA 10 10 ABC 10 10 EFF 10 10 MMM 10 10 NDB 10 10 RFD 10 10 ZFF 10 10

    script:
    open IN, "<fileA" or die $!;
    %fileA = map {/$/; $` => 1} <IN>;
    close IN;
    open IN, "<fileB" or die $!;
    while($line = <IN>){
        if($line=~m/(exchange[BC]_groupA)\.gateway_risk=/){
            print "in line $1:\n";
            $line = $';
            foreach $key (keys %fileA){
                print "\tnot found $key\n" if !($line=~s/$key//);
            }
        }
    }
    close IN;

    result:
    in line exchangeB_groupA:
        not found BFD
        not found ADF
        not found ZZD
    in line exchangeC_groupA:
        not found BFD
        not found ADF
    Now I'm going to test your code. I will post an update with the results.
    UPDATE
    result of your code on my machine:
    Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 4.
    A       exchangeB_groupA(ln4)   not found
    A       exchangeC_groupA(ln7)   not found
    Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 40.
    ADF     exchangeB_groupA(ln4)   not found
    ADF     exchangeC_groupA(ln7)   not found
    Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 52.
    BFD     exchangeB_groupA(ln4)   not found
    BFD     exchangeC_groupA(ln7)   not found
    Use of uninitialized value $sym8 in string eq at ./b.pl line 39, <DATA> line 88.
    ZZD     exchangeB_groupA(ln4)   not found
    P.S. did you get the job?