
My work in the microelectronics industry involves the automated testing of semiconductor wafers.

This generates large text files containing pass/fail information for each of the thousands of devices on those wafers.

Each device is identified by its co-ordinates and a unique serial number. Datalog files approaching 100 MB, with one or two million lines, are not unusual.

Naturally, I use Perl for manipulating the data.

For example, when I have to retest a wafer, I sometimes find that a few devices which passed the first time fail at retest, or vice versa.

What I needed was a tool which 'merged' two input files and produced a single output file which, for each serial number, contained the better of the two results found in the input files.

The key to this kind of operation can be summed up as 'know thy data':

Each datalog file includes a header and footer.

The headers on the two input files should be identical since they describe the test program, wafer batch, operator, etc.

The footers contain summary information, which is obviously going to change, but I have another Perl script which takes care of that.

In between the header and footer, the data for each device is a block of lines which looks like:

SERIAL NUMBER:    nnn 
DIE COORDINATES:  X=xx,Y=yy
[several lines of measurement values]
END OF TEST
BIN NAME:         FAIL_FUNCTIONAL_TEST
RESULT:           FAIL
SERIAL NUMBER:    nnn

Ideally, BIN NAME and RESULT indicate a PASS.
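To make the block layout concrete, here is a minimal sketch of pulling the identifying fields out of one device block. It is not part of the merge script: parse_block is a hypothetical helper, the serial number and coordinates in the sample are invented, and it assumes the labels appear exactly as shown above with integer coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the identifying fields from one device block.
# Assumes the labels shown in the sample above and integer coordinates.
sub parse_block {
    my ($block) = @_;
    my %dev;
    ($dev{serial}) = $block =~ /^SERIAL NUMBER:\s+(\d+)/m;
    @dev{qw(x y)}  = $block =~ /^DIE COORDINATES:\s+X=(-?\d+),Y=(-?\d+)/m;
    ($dev{bin})    = $block =~ /^BIN NAME:\s+(\S+)/m;
    ($dev{result}) = $block =~ /^RESULT:\s+(\S+)/m;
    return \%dev;
}

# Invented sample values, laid out like the block above.
my $sample = <<'END_BLOCK';
SERIAL NUMBER:    17
DIE COORDINATES:  X=12,Y=34
END OF TEST
BIN NAME:         FAIL_FUNCTIONAL_TEST
RESULT:           FAIL
END_BLOCK

my $dev = parse_block($sample);
printf "serial %d at (%d,%d): %s [%s]\n", @{$dev}{qw(serial x y result bin)};
# prints: serial 17 at (12,34): FAIL [FAIL_FUNCTIONAL_TEST]
```

For the merge itself only the RESULT line matters, which is why the script gets away with a single regex per block.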

I had put off writing something to 'merge' the datalogs, but when I tried it I was surprised how little Perl was necessary to do the job:

#!/usr/bin/perl

use warnings;
use strict;

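# Expect exactly two datalog files: the first test and the retest.
die "Usage: $0 <file1> <file2>\n" unless @ARGV == 2;

# Slurp mode: each <> now returns one whole file.
undef $/;

# Split before every SERIAL NUMBER line. The lookahead keeps that line
# with its block; element 0 of each array is the file header.
my @f1 = split(/(?=SERIAL NUMBER:\s+\d+)/, <>);
my @f2 = split(/(?=SERIAL NUMBER:\s+\d+)/, <>);

die "Error: file1 has $#f1 serials, file2 has $#f2\n" if $#f1 != $#f2;

# Blocks are paired by position, so both files must list the serials
# in the same order. Take the retest block if it passed, else the original.
foreach my $i (0 .. $#f1) {
    print(($f2[$i] =~ m/RESULT:\s+PASS/) ? $f2[$i] : $f1[$i]);
}

print STDERR "$#f1 serials found\n";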

So, I slurp the input files into two arrays, splitting at (and preserving) the SERIAL NUMBER lines. Then, looping through the arrays, my output is the data from the second file if it passes, otherwise, from the first file.
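The script pairs blocks by their position in each file, which works as long as both runs log the serials in the same order. If that could not be relied on, a variant keyed on the serial number might look like the sketch below. This is an assumption-laden alternative, not what I actually run: split_blocks and merge_by_serial are hypothetical names, the inline data is invented, and it assumes a non-empty header precedes the first block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a slurped datalog into header + device blocks.
# The lookahead matches before each SERIAL NUMBER line without consuming it,
# so every block keeps its own SERIAL NUMBER line.
sub split_blocks {
    my ($text) = @_;
    my ($header, @blocks) = split /(?=SERIAL NUMBER:\s+\d+)/, $text;
    return ($header, @blocks);
}

# Hypothetical helper: pick the better block for each serial number.
# Output follows the order of the first file. As in the original script,
# any footer stays attached to the last block.
sub merge_by_serial {
    my ($text1, $text2) = @_;
    my ($header, @first)  = split_blocks($text1);
    my (undef,   @second) = split_blocks($text2);

    # Index the retest blocks by serial number.
    my %retest;
    for my $block (@second) {
        my ($serial) = $block =~ /SERIAL NUMBER:\s+(\d+)/;
        $retest{$serial} = $block;
    }

    my @merged = ($header);
    for my $block (@first) {
        my ($serial) = $block =~ /SERIAL NUMBER:\s+(\d+)/;
        my $other = $retest{$serial};
        # Prefer the retest only when it exists and passed.
        push @merged, (defined $other && $other =~ /RESULT:\s+PASS/) ? $other : $block;
    }
    return join '', @merged;
}

# Invented data: the retest lists the serials in a different order.
my $run1 = "HEADER\n"
         . "SERIAL NUMBER: 1\nRESULT: PASS\n"
         . "SERIAL NUMBER: 2\nRESULT: FAIL\n";
my $run2 = "HEADER\n"
         . "SERIAL NUMBER: 2\nRESULT: PASS\n"
         . "SERIAL NUMBER: 1\nRESULT: FAIL\n";

print merge_by_serial($run1, $run2);
```

Here both serials come out as PASS: serial 1 from the first run and serial 2 from the retest. The hash costs extra memory on a 100 MB datalog, which is one reason to stick with the positional version when the order is known to match.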

I can redirect the output through another Perl script which recalculates the summary in the footer.
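The footer has to be recalculated because the merged yield can be higher than the yield of either run. A rough sketch of that arithmetic follows. It is not the footer script: yield_percent is a hypothetical helper, and the 20 devices and their failures are invented sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percentage of device blocks whose RESULT is PASS.
sub yield_percent {
    my @blocks = @_;
    return 0 unless @blocks;
    my $passed = grep { /RESULT:\s+PASS/ } @blocks;
    return 100 * $passed / scalar @blocks;
}

# Invented sample: 20 devices, two failures per run, one of them shared.
my %fail1 = (1 => 1, 2 => 1);
my %fail2 = (2 => 1, 15 => 1);

my (@run1, @run2, @merged);
for my $n (1 .. 20) {
    my $first  = "SERIAL NUMBER: $n\nRESULT: " . ($fail1{$n} ? 'FAIL' : 'PASS') . "\n";
    my $retest = "SERIAL NUMBER: $n\nRESULT: " . ($fail2{$n} ? 'FAIL' : 'PASS') . "\n";
    push @run1, $first;
    push @run2, $retest;
    # Same rule as the merge script: keep the retest only if it passed.
    push @merged, ($retest =~ /RESULT:\s+PASS/) ? $retest : $first;
}

printf "run 1: %.0f%%, run 2: %.0f%%, merged: %.0f%%\n",
    yield_percent(@run1), yield_percent(@run2), yield_percent(@merged);
# prints: run 1: 90%, run 2: 90%, merged: 95%
```

Only the device that failed both times still counts as a failure after the merge.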

It works like a charm but, if the truth be known, I've been reluctant to actually use this technique on production data. I can picture the scenario now: 'So, if the yield was 90% both times you tested the wafer, how come you now claim it's 95%?'

Management can take all the fun out of creative data crunching.

In reply to If at first you don't succeed ... by pavium
