matthew_t_dooley has asked for the wisdom of the Perl Monks concerning the following question:

I have EDI files to examine: ASCII files with records starting "H00", "H01" for header records and "D01"..."D99" for detail records. I want to process one "H00".."D99" block at a time and have used

$/="\nH00"

In order to extract the "H01" records from this set, I have used

$_ =~ /(\nH01)(.*)(\n)/ ; my $H01record="H01".$2 ;

then checking the "H01" record, as follows

if ( substr ($H01record,43,2) eq "SA" ) ...

Is this very ugly? Is there a more elegant way to do it?
Please do not laugh, this is serious

Replies are listed 'Best First'.
Re: Simple but not elegant ?
by kcott (Archbishop) on Jun 01, 2015 at 19:14 UTC

    G'day matthew_t_dooley,

    Welcome to the Monastery.

    "Is this very ugly? Is there a more elegant way to do it?"

    This is somewhat "ugly" and certainly could be made "more elegant"; however, there are issues with your code that go beyond its beauty.

    "I want to process each "H00".."D99" block at a time and have used $/="\nH00""

    When changing a special variable (such as $/), you should use local in an anonymous block. See "Localization of special variables".

    "In order to extract the "H01" records from this set, I have used ..."

    Here you've hard-coded the literal 'H01' in two places. In general, this is a bad idea. Multiple, hard-coded literals increase the chances of introducing a typo, without introducing a syntax error which Perl could tell you about, potentially causing bugs which are hard to track down. Additionally, if you need to change this literal (perhaps as part of some future code enhancement), you'll need to make multiple, identical changes: more work for you and more chances of introducing errors.
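
    A minimal sketch of the point about literals, assuming a block of text like the one described in the question. The names $tag and $block and the sample data are hypothetical and only for illustration; the tag is stored once and interpolated into the regex, so a later change touches a single line.

```perl
use strict;
use warnings;

# Hypothetical names and data, for illustration only.
my $tag   = 'H01';
my $block = "H00... don't care ...\nH01... block1 SA ...\nD01... don't care ...\n";

# \Q...\E quotes any regex metacharacters in $tag;
# /m lets ^ and $ match at line boundaries inside the block.
my ($record) = $block =~ /^(\Q$tag\E.*)$/m;

print defined $record ? "$record\n" : "no $tag record found\n";
```

    Because the capture already includes the tag, there is no need to glue the literal back on afterwards.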

    Beyond the actual code issues, there are some problems with your post. My intention here is not to berate you but rather just to point out the issues.

    • A prosaic description of your data is rarely helpful. A small, representative example is much better: we can see exactly what the data looks like and we can use it directly in any code samples we provide to you. At the time of writing this, I see two monks have made (different) guesses as to your data and, in my code example below, I've made a third. Are any of us right?
    • You provide no context for the few lines of code you've shown. There might be a much better way to write your code but, again, you've left us guessing. A short, working example (with the output it produces - where appropriate), greatly increases your chances of getting (more) helpful answers from us: which would be why you spent the time to post your question here in the first place.
    • The markup was fixed before I first viewed your post. Few monks, myself included, will bother to answer posts that are illegible: we're happy to respond to questions but less happy if we have to rewrite the question first.

    These guidelines should help you with any future posts.

    Here's my take on a better way to write the code:

    #!/usr/bin/env perl -l
    use strict;
    use warnings;
    {
        local $/ = "\nH00";
        while (<DATA>) {
            my ($h01) = / ^ ( H01 .* ) $ /mx;
            if (substr($h01, 14, 2) eq 'SA') {
                print 'WANTED: ', $h01;
            }
        }
    }
    __DATA__
    H00... don't care ...
    H01... block1 SA ...
    D01... don't care ...
    H00... don't care ...
    H01... block2 SX ...
    D01... don't care ...
    H00... don't care ...
    H01... block3 SA ...
    D01... don't care ...

    Output:

    $ pm_1128584_process_edi_file.pl
    WANTED: H01... block1 SA ...
    WANTED: H01... block3 SA ...

    For difficulties with the regular expression I've used, see perlre. Specifics of the modifiers (/.../mx) can be found in the Modifiers section of that document.

    -- Ken

Re: Simple but not elegant ?
by Eily (Monsignor) on Jun 01, 2015 at 15:48 UTC

    If your records can't be multiline, as your regex seems to imply, you can read the file line by line and populate a hash like this:

    use v5.14;
    use Data::Dumper;
    my %header;
    LINE: while (<DATA>) # reading line by line
    {
        chomp; # remove the \n at the end
        if (/^(H\d\d)(.*)/) # ^ beginning of string
        {
            $header{$1} = $2;
        }
        else
        {
            last LINE; # Stop running the LINE loop
        }
    }
    print Dumper \%header;
    __DATA__
    H00
    H01 Hi
    H02 Hello
    H03 Bonjour
    D01
    D02
    If your records can be multiline, you may want to read about $/ or the input record separator, and the /m modifier (which will make ^ match the beginning of any line, not just the beginning of the string). The logic may still be the same.

Re: Simple but not elegant ?
by ww (Archbishop) on Jun 01, 2015 at 15:44 UTC

    It's truly ugly when you fail to use the para and code tags (markup) which are explained immediately above the text-entry box where you created this node. Thank you for editing to make it more "elegant" (or, in this case, readable -- which is highly valued when you ask busy volunteers for free help).

    And you can be quite confident that few if any will laugh ... at least, not at a well-formatted question.

    Update: s/// in first sentence. Appears that I caught davido in the process of correcting the lack of markup.

      Sincerest apologies

Re: Simple but not elegant ?
by GotToBTru (Prior) on Jun 01, 2015 at 19:57 UTC

    I've been using Perl in EDI for about 14 years, and I think it's the ideal tool. Other than a dedicated translator.

    Your example does not seem to match your code. If the beginning record header is H01..., \nH00 will not find it. Our applications produce similarly formatted files:

    H0010|...
    H0020|...
    H0040|...
    D0010|1|..
    D0020|...
    D0030|...
    D0010|2|...
    D0020|...
    D0020|...
    D0030|...

    I tend to use a state machine and read thru the records sequentially, rather than try to separate the data into blocks using regexes. YMMV.
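
    A rough sketch of that sequential approach, not the code from any real application. It assumes pipe-delimited records in the style shown above and that an H0010 record opens each block; the sample data and the names @lines, @blocks and $current are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample in the pipe-delimited style shown above.
my @lines = (
    'H0010|A', 'H0020|A', 'D0010|1', 'D0020|1',
    'H0010|B', 'D0010|1',
);

# Assumption: an H0010 record opens a new block; other H records are further
# headers of the current block and D records are its details.
my @blocks;     # completed blocks
my $current;    # block being built; undef until the first H0010 is seen

for my $line (@lines) {
    my ($tag) = split /\|/, $line;

    if ($tag eq 'H0010') {
        push @blocks, $current if $current;
        $current = { headers => [$line], details => [] };
    }
    elsif (!$current) {
        warn "record before first H0010 skipped: $line\n";
    }
    elsif ($tag =~ /^H/) { push @{ $current->{headers} }, $line }
    elsif ($tag =~ /^D/) { push @{ $current->{details} }, $line }
    else                 { warn "unrecognised record: $line\n" }
}
push @blocks, $current if $current;    # flush the last block

for my $i (0 .. $#blocks) {
    printf "block %d: %d header(s), %d detail(s)\n",
        $i + 1,
        scalar @{ $blocks[$i]{headers} },
        scalar @{ $blocks[$i]{details} };
}
```

    Reading from a file instead only changes the loop header to while (my $line = <$fh>) plus a chomp; the state lives entirely in $current.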

    EDI files typically are delimited, so you should look into Text::CSV which, despite the name, can use most any delimiter and not just commas. It helps a great deal when your delimiter may occur within the data itself. It will reliably divide your records into fields. So will split, but the module handles special cases you may not think of until they occur and mess up your process.
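
    A small sketch of the Text::CSV suggestion, assuming a pipe delimiter and assuming the producer quotes any field that contains the delimiter. The sample record is made up; Text::CSV is a CPAN module and has to be installed separately.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Assumption: fields are pipe-delimited and may be double-quoted.
my $csv = Text::CSV->new({ sep_char => '|', binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical record: the third field contains the delimiter inside quotes.
my $line = 'D0010|1|"WIDGET|LARGE"|12';

my @fields;
if ($csv->parse($line)) {
    @fields = $csv->fields;
    print join(' / ', @fields), "\n";
}
else {
    warn "parse failed: " . $csv->error_diag . "\n";
}
```

    A plain split /\|/ on the same line would cut the quoted field in two and return five pieces instead of four.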

    If your fields are fixed length, substr() will work just fine. You may also want to look into unpack(), but it can be unforgiving if your data does not match expectations. Either will let you easily divide a record into named variables, and $ordernumber will be much easier to understand than substr($H01record,32,30) when you revisit the program 6 months from now.
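
    To illustrate the named-variable idea, here is a sketch with unpack. The layout (3-character record type, 10-character order number, 2-character status) and the sample record are invented for the example; real offsets and widths have to come from the file's specification.

```perl
use strict;
use warnings;

# Hypothetical layout: 3-char record type, 10-char order number, 2-char status.
my $template = 'A3 A10 A2';

# Hypothetical fixed-width record.
my $h01_record = 'H01ORD0000042SA';

# Each 'A<n>' takes n characters and strips trailing spaces.
my ($type, $order_number, $status) = unpack $template, $h01_record;

print "order $order_number is wanted\n" if $status eq 'SA';
```

    The template doubles as documentation of the record layout, which the scattered substr() offsets do not.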

    Dum Spiro Spero

Re: Simple but not elegant ?
by crusty_collins (Friar) on Jun 01, 2015 at 15:53 UTC
    that is ugly

    I would do something like this.

    #!perl
    use strict;
    foreach my $line (<DATA>){
    if ( $line =~ /\bH00(.*)$/ ) {
    # print " hoo $1 \n";
    } elsif ( $line =~ /\bH01(.*)$/ ) {
    # print " ho1 $1 \n";
    } elsif ( $line =~ /\bD0(.*)$/ ) {
    # print " do1 $1 \n";
    }
    }
    __DATA__
    H00 Header1
    H01 Header2
    D01 data1
    D02 data2
      that is ugly

      And some people think if-elsif chains and blocks without indentation are ugly...

      What does your code do with a line like "H01 H00"?
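
      To make that question concrete, a short sketch comparing the \b test with one anchored at the start of the line. The variable names are hypothetical; the input is the line from the question above.

```perl
use strict;
use warnings;

my $line = "H01 H00";

# \b only needs a word boundary, so H00 is found in the middle of the line.
my $unanchored_h00 = $line =~ /\bH00(.*)$/ ? 1 : 0;

# ^ ties the tag to the start of the line, so only the real tag can match.
my $anchored_h00 = $line =~ /^H00(.*)$/ ? 1 : 0;
my $anchored_h01 = $line =~ /^H01(.*)$/ ? 1 : 0;

print "unanchored H00: $unanchored_h00\n";
print "anchored H00:   $anchored_h00\n";
print "anchored H01:   $anchored_h01\n";
```

      Since the if-elsif chain tests H00 first, the unanchored version would treat this H01 line as an H00 record.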
