My work in the microelectronics industry involves the automated testing of semiconductor wafers.
This generates large text files containing pass/fail information for each of the thousands of devices on those wafers.
Each device is identified by its co-ordinates and a unique serial number. Datalog files approaching 100MB with one or two million lines are not unusual.
Naturally, I use Perl for manipulating the data.
As an example: when I have to retest a wafer, I sometimes find that a few devices which passed the first time fail at retest, or vice versa.
What I needed was a tool which 'merged' two input files and produced a single output file containing, for each serial number, the 'best' result from the two input files.
The key to this kind of operation can be summed up as 'know thy data':
Each datalog file includes a header and footer.
The headers on the two input files should be identical since they describe the test program, wafer batch, operator, etc.
The footers contain summary information - this is obviously going to change, but I have another Perl script which takes care of that.
In between the header and footer, the data for each device is a block of lines which looks like this:

```
SERIAL NUMBER: nnn   DIE COORDINATES: X=xx,Y=yy
[several lines of measurement values]
END OF TEST
BIN NAME: FAIL_FUNCTIONAL_TEST
RESULT: FAIL
SERIAL NUMBER: nnn   (the next device's block begins here)
```
Ideally, BIN NAME and RESULT indicate a PASS.
I had put off writing something to 'merge' the datalogs, but when I tried it I was surprised how little Perl was necessary to do the job:

```perl
#!/usr/bin/perl
use warnings;
use strict;

die "Usage: $0 <file1> <file2>\n" unless scalar(@ARGV) > 1;

undef $/;    # slurp mode: each <> now reads a whole file at once

# Split each file at (and keep, thanks to the zero-width lookahead)
# the SERIAL NUMBER lines; element 0 of each array is the header.
my @f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
my @f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);

die "Error: file1 has $#f1 serials, file2 has $#f2\n"
    if $#f1 != $#f2;

# For each device, prefer the second file's block if it passed.
foreach my $i (0 .. $#f1) {
    print(($f2[$i] =~ m/RESULT:\s+PASS/) ? $f2[$i] : $f1[$i]);
}

print STDERR "$#f1 serials found\n";
```
So, I slurp the input files into two arrays, splitting at (and preserving) the SERIAL NUMBER lines. Then, looping through the arrays, my output is the data from the second file if it passes, otherwise, from the first file.
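The trick that makes the slurp-and-split approach work is the zero-width lookahead in the split pattern: because `(?=...)` matches a position rather than consuming text, the `SERIAL NUMBER` line stays at the start of each chunk instead of being discarded as a delimiter. A minimal sketch on toy data (the sample strings here are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for a slurped datalog: a header followed by two blocks.
my $data = "HEADER\nSERIAL NUMBER: 1\nRESULT: PASS\nSERIAL NUMBER: 2\nRESULT: FAIL\n";

# Split *before* each SERIAL NUMBER line; the lookahead keeps the
# delimiter text inside the following chunk.
my @blocks = split(/(?=SERIAL NUMBER:\s+\d+)/, $data);

print scalar(@blocks), " chunks\n";      # header plus one chunk per device
print "delimiter kept\n" if $blocks[1] =~ /^SERIAL NUMBER: 1/;
```

A plain `split(/SERIAL NUMBER:\s+\d+/, ...)` would throw away the serial-number lines, which are exactly the part the output needs to preserve.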
I can redirect the output through another perl script which recalculates the summary in the footer.
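That footer-recalculating script isn't shown here, but the core of it is just a tally over the merged stream. A minimal sketch, assuming the `RESULT:` line format from the sample block above (the real script would also rewrite the footer fields, which depend on a format not given here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass the merged datalog through unchanged while recounting results;
# report the new pass rate on STDERR.
my ($pass, $total) = (0, 0);
while (my $line = <>) {
    print $line;
    next unless $line =~ /^RESULT:\s+(\w+)/;
    $total++;
    $pass++ if $1 eq 'PASS';
}
printf STDERR "%d of %d devices pass (%.1f%%)\n",
    $pass, $total, 100 * $pass / $total if $total;
```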
It works like a charm, but, if the truth be known, I've been reluctant to actually use this technique on production data. I can picture the scenario now: 'So, if the yield was 90% both times you tested the wafer - how come you now claim it's 95% ?'
Management can take all the fun out of creative data crunching.
---

Replies are listed 'Best First'.

- Re: If at first you don't succeed ... by graff (Chancellor) on Nov 06, 2005 at 02:11 UTC
  - Reply by Anonymous Monk on Nov 06, 2005 at 10:07 UTC
- Re: If at first you don't succeed ... by chanio (Priest) on Nov 06, 2005 at 03:40 UTC
- Re: If at first you don't succeed ... by bart (Canon) on Nov 07, 2005 at 00:38 UTC