My work in the microelectronics industry involves the automated testing of semiconductor wafers.
This generates large text files containing pass/fail information for each of the thousands of devices on those wafers.
Each device is identified by its co-ordinates and a unique serial number. Datalog files approaching 100MB with one or two million lines are not unusual.
Naturally, I use Perl for manipulating the data.
As an example: when I have to retest a wafer, I sometimes find that a few devices which passed the first time fail at retest, or vice versa.
What I needed was a tool which 'merged' two input files and produced a single output file containing, for each serial number, the 'best' result from the two input files.
The key to this kind of operation can be summed up as 'know thy data':
Each datalog file includes a header and footer.
The headers on the two input files should be identical since they describe the test program, wafer batch, operator, etc.
The footers contain summary information - this is obviously going to change, but I have another Perl script which takes care of that.
In between the header and footer, the data for each device is a block of lines which looks like this:

```
SERIAL NUMBER: nnn   DIE COORDINATES: X=xx,Y=yy
[several lines of measurement values]
END OF TEST
BIN NAME: FAIL_FUNCTIONAL_TEST
RESULT: FAIL
SERIAL NUMBER: nnn   (the next device's block begins here)
```
Ideally, BIN NAME and RESULT indicate a PASS.
I had put off writing something to 'merge' the datalogs, but when I tried it I was surprised how little Perl was necessary to do the job:

```perl
#!/usr/bin/perl
use warnings;
use strict;

die "Usage: $0 <file1> <file2>\n" unless scalar(@ARGV) > 1;

undef $/;    # slurp mode: each <> now reads a whole file at once

# Split each file at (and keep, thanks to the zero-width lookahead)
# the SERIAL NUMBER lines; element 0 of each array is the header.
my @f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
my @f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);

die "Error: file1 has $#f1 serials, file2 has $#f2\n"
    if $#f1 != $#f2;

# For each device, prefer the second file's block if it passed.
foreach my $i (0 .. $#f1) {
    print(($f2[$i] =~ m/RESULT:\s+PASS/) ? $f2[$i] : $f1[$i]);
}

print STDERR "$#f1 serials found\n";
```
So, I slurp the input files into two arrays, splitting at (and preserving) the SERIAL NUMBER lines. Then, looping through the arrays, my output is the data from the second file if it passes, otherwise, from the first file.
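The trick that makes the slurp-and-split approach work is the zero-width lookahead in the split pattern: because `(?=...)` matches a position rather than consuming text, the `SERIAL NUMBER` line stays at the start of each chunk instead of being discarded as a delimiter. A minimal sketch on toy data (the sample strings here are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for a slurped datalog: a header followed by two blocks.
my $data = "HEADER\nSERIAL NUMBER: 1\nRESULT: PASS\nSERIAL NUMBER: 2\nRESULT: FAIL\n";

# Split *before* each SERIAL NUMBER line; the lookahead keeps the
# delimiter text inside the following chunk.
my @blocks = split(/(?=SERIAL NUMBER:\s+\d+)/, $data);

print scalar(@blocks), " chunks\n";      # header plus one chunk per device
print "delimiter kept\n" if $blocks[1] =~ /^SERIAL NUMBER: 1/;
```

A plain `split(/SERIAL NUMBER:\s+\d+/, ...)` would throw away the serial-number lines, which are exactly the part the output needs to preserve.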
I can redirect the output through another perl script which recalculates the summary in the footer.
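That footer-recalculating script isn't shown here, but the core of it is just a tally over the merged stream. A minimal sketch, assuming the `RESULT:` line format from the sample block above (the real script would also rewrite the footer fields, which depend on a format not given here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass the merged datalog through unchanged while recounting results;
# report the new pass rate on STDERR.
my ($pass, $total) = (0, 0);
while (my $line = <>) {
    print $line;
    next unless $line =~ /^RESULT:\s+(\w+)/;
    $total++;
    $pass++ if $1 eq 'PASS';
}
printf STDERR "%d of %d devices pass (%.1f%%)\n",
    $pass, $total, 100 * $pass / $total if $total;
```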
It works like a charm, but, if the truth be known, I've been reluctant to actually use this technique on production data. I can picture the scenario now: 'So, if the yield was 90% both times you tested the wafer - how come you now claim it's 95% ?'
Management can take all the fun out of creative data crunching.
---

Replies are listed 'Best First'.

- Re: If at first you don't succeed ... by graff (Chancellor) on Nov 06, 2005 at 02:11 UTC
  - Reply by Anonymous Monk on Nov 06, 2005 at 10:07 UTC
- Re: If at first you don't succeed ... by chanio (Priest) on Nov 06, 2005 at 03:40 UTC
- Re: If at first you don't succeed ... by bart (Canon) on Nov 07, 2005 at 00:38 UTC