skyler has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have written this script to parse a flat HL7 file. It seems to work, but I have noticed that it doesn't separate all the accounts. The segments are in HL7 format with tags like MSH, PID, PV1, FT1. The script is supposed to go to the "FT1" tag, then go to the sixth position and check whether the account matches; if it doesn't, it is supposed to move the line to another file. The script below only searches for the accounts listed, but not for the tag as of yet. I'm not familiar with using a temporary buffer or storing values in memory yet. Does anyone know how to parse an HL7 file based on the tags? I was thinking of modifying the script to search for the "FT1" tag and then compare it with the existing accounts: if it matches, leave the line, and if it doesn't, move the line to another file, until the end of the file. Do you have any ideas on how to get it done? Thanks.

#!/usr/bin/perl -w
use strict;

my $infile = 'c:\\hl7file2.txt';
my ( $yr, $mo, $dy ) = (localtime)[5,4,3];
my $outfile = sprintf( "%04d%02d%02d.txt", $yr+1900, $mo+1, $dy );
my $counter;

open IN, "<$infile" or die "Couldn't open $infile, $!";
open OUT, ">$outfile" or die "Couldn't open $outfile, $!";

# $counter++;
# print $counter;

my @finds = qw( 00000 00001 00002 00003 00004 76370 76375 76950 77403
                77404 77406 77407 77408 77409 77411 77412 77413 77414
                77416 77418 77370 77336 77417 );

# my $finds_re = join '|', map { quotemeta } @finds;
# Group the alternation so that \b applies to every account,
# not just to the first and the last one.
my $finds_re = '\b(?:' . join( '|', map { quotemeta } @finds ) . ')\b';
$finds_re = qr/$finds_re/;
# print $finds_re;

# Copy every line that does not mention one of the accounts.
while (<IN>) {
    next if m/$finds_re/;
    print OUT;
}
close IN;
close OUT;

Re: Parse an ADT file
by Old_Gray_Bear (Bishop) on May 18, 2004 at 17:01 UTC
    A couple of lifetimes ago I had the pleasure of dealing with an HL7 feed from an Emergency Room. I finally ended up, in sheer desperation, building individual routines to cope with the five or six (seventeen at last count) different record-types I was getting in my input. I ended up with a switch/case construct reading the file and calling the appropriate routine for each record type.

    After the fifth "Oh, didn't we mention we are getting XXX types as well" from the medical 'Analysts', I started work on an OO framework to handle a generalized HL7 record-type, which I still have somewhere in the Archive: methods to extract each of the subfields correctly and build a common Parse Structure for HL7 v2.x. It was most ugly: variable numbers of fields with variable lengths and variable field-markers (depending on where in the structure you are, the eof mark changes) -- oh my. I finally figured out that an HL7 data type is really a four-dimensional structure -- an array of 3-d arrays. It helped to visualize what I was trying to do, but it was a main-pain to process. Can you say sparse variable-length arrays? Can you say storage hog??
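
    To picture the four levels described above, here is a minimal sketch (not the framework mentioned in this post). It assumes the default HL7 v2 delimiters: | between fields, ~ between repetitions, ^ between components and & between subcomponents. The name parse_segment and the sample PID line are made up for illustration, and escape sequences and the special layout of the first MSH fields are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one HL7 v2 segment into nested arrays:
# fields -> repetitions -> components -> subcomponents.
# Assumes the default delimiters | ~ ^ & (an assumption, see above).
sub parse_segment {
    my ($segment) = @_;
    my @fields;
    for my $field ( split /\|/, $segment, -1 ) {
        my @reps;
        for my $rep ( split /~/, $field, -1 ) {
            my @comps;
            for my $comp ( split /\^/, $rep, -1 ) {
                # Innermost level: a plain list of subcomponents.
                push @comps, [ split /&/, $comp, -1 ];
            }
            push @reps, \@comps;
        }
        push @fields, \@reps;
    }
    return \@fields;
}

# Made-up sample data, for illustration only.
my $pid    = 'PID|1||12345^^^HOSP~67890^^^CLINIC||DOE^JOHN';
my $parsed = parse_segment($pid);

# Index order: [field][repetition][component][subcomponent]
print $parsed->[0][0][0][0], "\n";    # PID
print $parsed->[3][1][0][0], "\n";    # 67890
print $parsed->[5][0][1][0], "\n";    # JOHN
```

    Empty fields come back as empty array references, which is where the "sparse variable-length arrays" show up: most slots in a real message hold nothing.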

    Your approach is pretty much going the right way (in so far as there is a 'right way' w.r.t. HL7). You might ask the folks who are supplying your data if they plan to go to Version 3 any time soon. V3 was supposed to define the XML support constructs for all the HL7 record types and that will make the parsing problem much easier/more complicated (Take your pick).

    Best of luck

    ----
    I Go Back to Sleep, Now.

    OGB

Re: Parse an ADT file
by TrekNoid (Pilgrim) on Jun 14, 2004 at 22:10 UTC
    I'm still a lowly Novice here, so be gentle oh great ones :)

    While I'm not sure I can get you exactly what you're after, I can provide you some of the general logic I use to parse HL7 files. I know there's better ways to do it, but this is the project I learned Perl with, and so it's very rudimentary.

    I work primarily with HL7 lab results, but the logic is pretty general for any HL7 record.

    First off, I call the procedure with a file as a parameter, and assign it to the variable $b... Then call the following subroutine logic:

    sub parse_lab_test {
        @segs = split('\r', $b);    # \r is the record terminator
        foreach $seg (@segs) {
            $segtype = substr($seg, 0, 3);    # first three characters name the segment
            if ($segtype eq "MSH") {
                msh_fields($seg);
            } elsif ($segtype eq "PID") {
                pid_fields($seg);
            } elsif ($segtype eq "PV1") {
                pv1_fields($seg);
            # ... etc ...
            } else {
                &segment_error;
            }
        }
    }

    Then, for example, if you're working on the MSH segment, you'd have the following subroutine:

    sub msh_fields {
        ($mshseg) = @_;
        @vals = split('\|', $mshseg);    # $vals[0] is the segment name, "MSH"
        $docsource     = $vals[3];
        $trandate      = $vals[6];
        $doctype2      = $vals[7];
        $rectype       = $vals[8];
        $unique_hl7_id = $vals[9];
        if ($rectype ne 'ORU^R01') {     # Discrete Results only
            next MAIN;                   # Label that ends the script
        }
    }

    Then, you just define all the HL7 segments you're interested in. In your case, you'd have $segtype = 'FT1', then call ft1_fields.
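
    For the FT1 case in the original question, a rough sketch of what that could look like follows. Everything specific in it is an assumption: the names ft1_account and route_segments are made up, the account list is shortened, the file is taken to hold one segment per line, and the account is taken to sit in $vals[6] (the "sixth position" from the question), so adjust the index to your feed. The sample data is invented, and in-memory filehandles stand in for the real input and output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened account list; use the full list in practice.
my %is_known = map { $_ => 1 } qw( 00000 00001 76370 77403 );

# Assumption: the account is in $vals[6] after splitting on '|'
# ($vals[0] is the segment name). Adjust to match your feed.
my $ACCOUNT_FIELD = 6;

# Return the account of an FT1 segment, or undef for any other segment.
sub ft1_account {
    my ($seg) = @_;
    return unless substr( $seg, 0, 3 ) eq 'FT1';
    my @vals  = split /\|/, $seg, -1;
    my $field = $vals[$ACCOUNT_FIELD];
    return '' unless defined $field;
    my ($account) = split /\^/, $field;    # keep the first component only
    return defined $account ? $account : '';
}

# Copy each line to $keep, except FT1 lines whose account is not in
# the list; those go to $other instead.
sub route_segments {
    my ( $in, $keep, $other ) = @_;
    while ( my $line = <$in> ) {
        ( my $seg = $line ) =~ s/[\r\n]+\z//;    # strip line endings
        my $account = ft1_account($seg);
        if ( defined $account && !$is_known{$account} ) {
            print {$other} "$seg\n";
        }
        else {
            print {$keep} "$seg\n";
        }
    }
}

# Demo with made-up data on in-memory filehandles.
my $input = join "\n",
    'MSH|^~\&|LAB|HOSP',
    'FT1|1|||20040518|20040518|76370',
    'FT1|2|||20040518|20040518|99999',
    '';
my ( $kept, $moved ) = ( '', '' );
open my $in,    '<', \$input or die "Couldn't open input, $!";
open my $keep,  '>', \$kept  or die "Couldn't open keep buffer, $!";
open my $other, '>', \$moved or die "Couldn't open other buffer, $!";
route_segments( $in, $keep, $other );
close $_ for $in, $keep, $other;

print "kept:\n$kept";
print "moved:\n$moved";
```

    Unlike a regex run over the whole line, this only looks at the one FT1 field, so an account number that happens to appear in a date or an ID elsewhere in the line doesn't count as a match.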

    Anyways, that's how I do it... It's not the smallest or most elegant way to do it, but it's pretty easy to maintain :)

    Trek