
This smells a lot like homework, so I'm not posting a full answer -- hints only!

Heh - well, not really, since I already have a solution.

It was just that I was curious to see if such a task could be handled with a single regex, or whether my instinct to break the job into separate regexes was correct.

Thanks to all for input - it looks like my original instinct was correct. Other solutions exist (of course - hey, it's perl), but split is inconvenient, since I'd need to rejoin later. Maybe it would be a little faster - I dunno - but (high) speed isn't really an issue with this particular job.

Once again, thanks to all, and if someone does come up with a single regex, I'd still be curious to see it.

Here, btw, is the full code I'm using - if anyone notices anything dangerous or just plain silly, feel free to comment. Always happy to learn.

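use strict;

# Slurp the whole file, then reopen the same file for writing and rewrite it.
open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0] for read: $!\n";
my @lines = <IN>;
close(IN) or die "Cannot close $ARGV[0]: $!\n";

open(OUT, '>', $ARGV[0]) or die "Cannot open $ARGV[0] for write: $!\n";
foreach (@lines) {
    if (m/(^OBX\|)([^\|]*\|){2}([^\^]*)(.*$)/) {
        my $pre  = $1 . $2;
        my $read = $3;
        my $post = $4;
        $read =~ s/[A-Z]/U$&/g;    # prefix each upper-case letter with U
        $read =~ s/[a-z]/L$&/g;    # then each lower-case letter with L
        $_ = $pre . $read . $post;
    }
    # 'or' binds more loosely than '||', so die runs only if print fails
    print OUT $_ or die "Cannot write to $ARGV[0]: $!\n";
}
close(OUT) or die "Cannot close $ARGV[0]: $!\n";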
Tom Melly, tom@tomandlu.co.uk

Re^3: Can anyone make this regex job neater?
by GrandFather (Saint) on Oct 11, 2005 at 19:56 UTC

    Some minor tidying (?), a little bug fixing, and a demonstration of how sample code can be altered slightly to make it run stand-alone:

    use warnings;
    use strict;

    my @lines = <DATA>;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ m/(^OBX\|)([^\|]*\|)([^\^]*)(.*$)/) {
            my ($pre, $read, $post) = ($1 . $2, $3, $4);
            # Upper-case ASCII letters sort before 'a', so this picks U or L
            $read =~ s/([a-zA-Z])/($1 lt 'a' ? 'U' : 'L').$1/ge;
            $line = "$pre$read$post";
        }
        print "$line\n";
    }

    __DATA__
    NTE||L|obr note
    OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
    NTE||L|obx note 1/1 for 3058
    OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||

    Prints:

    NTE||L|obr note
    OBX|NM|LaLaUA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
    NTE||L|obx note 1/1 for 3058
    OBX|NM|LdUBLf..^TWC^RD2|8.9|10*9/L|4.0-11.0||

    Note the use of <DATA> and the __DATA__ section to provide the test data without needing an additional file that has to be created somewhere and would likely require editing the code to hook up anyway. Note too that the expected output is shown, to make the intended result clear.

    The two prefixing lines were changed to a single line that uses a single regex (an s///e) to do the prefixing. Not sure it's clearer than your code though. The same could be said for the list assignment: it saves lines, but probably obfuscates the code. Your call.
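    Comparing the two versions, the bug fix on this sample data appears to be the dropped {2} quantifier. When a capture group is repeated, Perl keeps only the text from its last repetition, so $1 . $2 skips a field and the wrong text lands in $3. A small sketch that runs both patterns against the first OBX line from the __DATA__ section above:

```perl
use strict;
use warnings;

# First OBX line from the sample data.
my $line = 'OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|';

# With {2} the group runs twice ("NM|", then "aaA..^Haem^RD2|"),
# but $2 holds only the last repetition, so "NM|" is lost from $1 . $2
# and $3 picks up the rest of the line instead of the third field.
if ($line =~ m/(^OBX\|)([^\|]*\|){2}([^\^]*)(.*$)/) {
    print "with {2}:    pre='$1$2' read='$3'\n";
}

# Without the quantifier, $2 is the second field and $3 is the part
# of the third field before its first '^'.
if ($line =~ m/(^OBX\|)([^\|]*\|)([^\^]*)(.*$)/) {
    print "without {2}: pre='$1$2' read='$3'\n";
}
```

    The first pattern should report pre='OBX|aaA..^Haem^RD2|' and read='7.5|g/dL|13.0-18.0|OR|'; the second reports pre='OBX|NM|' and read='aaA..', which is the text that actually needs the U/L markers.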


    Perl is Huffman encoded by design.
Re^3: Can anyone make this regex job neater?
by blazar (Canon) on Oct 12, 2005 at 07:47 UTC
    It was just that I was curious to see if such a task could be handled with a single regex, or whether my instinct to break the job into separate regexes was correct.
    I'm sure some regex wizard could come up with a single regex to do it. If nothing else, you can always put some code into the replacement part of s/// (with the /e modifier) and do other matches or substitutions in it. Either way, it would be clumsy and unnecessary...
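    For the curious, a sketch of that idea: one outer s///e whose replacement is ordinary Perl code running two inner substitutions. The helper name mark_case is purely illustrative, and the [^^|]* class (stop at the first '^' or at the end of field 3, whichever comes first) is an assumption about the intent. It works, but it is still really three regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: on OBX lines, put U before each upper-case letter
# and L before each lower-case letter in field 3, up to its first '^'.
# All other lines come back unchanged.
#
# The outer s///e matches "OBX|<field 2>|" plus the leading part of
# field 3. Its replacement copies $1 and $2 first (the inner
# substitutions overwrite them) and returns the rebuilt prefix.
sub mark_case {
    my ($line) = @_;
    $line =~ s{^(OBX\|[^|]*\|)([^^|]*)}{
        my ($pre, $read) = ($1, $2);
        $read =~ s/([A-Z])/U$1/g;
        $read =~ s/([a-z])/L$1/g;
        $pre . $read;
    }e;
    return $line;
}

print mark_case($_), "\n" for
    'NTE||L|obr note',
    'OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|',
    'NTE||L|obx note 1/1 for 3058',
    'OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||';
```

    On the sample data this should print the same four lines as GrandFather's version, with the NTE lines untouched.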
    Thanks to all for input - it looks like my original instinct was correct. Other solutions exist (of course - hey, it's perl), but split is inconvenient, since I'd need to rejoin later. Maybe it would be a little faster - I dunno - but (high) speed isn't really an issue with this particular job.
    I really can't see why it should be "inconvenient": would you work on some inner component of your car without removing it first? I suppose you take it out, and then put everything back together when you're done, don't you? Here it's just as reasonable to break your string into chunks and operate on them, especially since
    • you have to check the first one to decide whether to proceed with the rest or skip to the next line, and
    • you must do a substitution on the third one.
    This is IMNSHO the cleanest WTDI.

    All in all, I would do it like this:

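    #!/usr/bin/perl -lpi
    use strict;
    use warnings;

    # -p: loop over the lines of the files in @ARGV and print each one,
    #     even after 'next'; -i: edit the files in place;
    # -l: chomp each line and add "\n" back on print.
    my @chunks = split /\|/, $_, -1;   # -1 keeps trailing empty fields
    next unless $chunks[0] eq 'OBX';
    # Marks every letter of field 3, including any text after its first '^'
    s/(?=[A-Z])/U/g, s/(?=[a-z])/L/g for $chunks[2];
    $_ = join '|', @chunks;
    __END__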
    Note how simple and concise the effective code is.
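    One thing to watch with any split-and-join approach on these lines: by default split discards trailing empty fields, and both sample OBX lines end in empty fields ("...|OR|" and "...||"), so join would not restore them. Passing a negative LIMIT keeps them. A minimal check:

```perl
use strict;
use warnings;

my $line = 'OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||';

# Default split drops the two trailing empty fields, so join cannot restore them.
my $lossy = join '|', split /\|/, $line;

# A negative LIMIT keeps trailing empty fields, so the round trip is exact.
my $exact = join '|', split /\|/, $line, -1;

print "$lossy\n";   # OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0
print "$exact\n";   # OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||
```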