Can anyone make this regex job neater?

Melly has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Can anyone make this regex job neater? by jeffa (Bishop) on Oct 11, 2005 at 16:40 UTC
I would first `split` each line on the pipe character and do something with the resulting array. `my @line = split('\\|', $_); if ($line[0] eq 'NTE') { # parse NTE line } elsif ($line[0] eq 'OBX') { # parse OBX line } else { # got something unexpected }` [download] I try to avoid using $1, $2, etc. directly. jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l]
Re^2: Can anyone make this regex job neater? by Tanktalus (Canon) on Oct 11, 2005 at 16:48 UTC
Don't forget that if you split it and want to recreate it (as the OP seems to want to do), you should use the "-1" flag to split so as not to lose any blank fields at the end. `$ perl -e 'print join("::", split q"\\|", q"OBX\|foo\|blah\|\|\|"),$/' OBX::foo::blah $ perl -e 'print join("::", split q"\\|", q"OBX\|foo\|blah\|\|\|", -1),$/' OBX::foo::blah::::::` [download]	[reply] [d/l]
Re: Can anyone make this regex job neater? by Roy Johnson (Monsignor) on Oct 11, 2005 at 17:01 UTC
You can match and edit substr: `if (/^OBX(?:[^\|]\|){2}([^^])/) { substr($_, $-[1], length($1)) =~ s/(?=([a-z]))/$1 > 'Z' ? 'L' : 'U +'/ge; }` [download] Can't test it, so it might need some tweaking. Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re: Can anyone make this regex job neater? by Aristotle (Chancellor) on Oct 11, 2005 at 21:59 UTC
Find the location with a match, then run a replace from that point on. `#!/usr/bin/perl use strict; use warnings; while(<DATA>) { if( m{ \A OBX \\| [^\|]* \\| }gmsx ) { s{ \G (?: ([[:upper:]]) \| ([[:lower:]]) \| ([^\^]) ) }{ defined $1 ? "U$1" : defined $2 ? "L$2" : $3 }sexmg; } print; } __END__ NTE\|\|L\|obr note OBX\|NM\|aaA..^Haem^RD2\|7.5\|g/dL\|13.0-18.0\|OR\| NTE\|\|L\|obx note 1/1 for 3058 OBX\|NM\|dBf..^TWC^RD2\|8.9\|109/L\|4.0-11.0\|\|` [download] Makeshifts last the longest.*	[reply] [d/l]
Re: Can anyone make this regex job neater? by blazar (Canon) on Oct 11, 2005 at 16:44 UTC
I was just curious is there was a more elegant way to do this - preferably a single s///, rather than one match and two substitutions. Thanks in advance... This smells a lot like homework, so I'm not posting a full answer -- hints only! In any case I'd also take into account split, and then of course you can join again with `'\|'` (too explicit a hint?!?). Whatever, there's nothing wrong using two or more statements: it may well be doable with a single `s///`, but it would just be an exercise in doing it the difficult way. Your code must also be clear, mantainable and easy to understand. Incidentally you may also do `for ($stuff[2]) { s/(?=[A-Z])/U/g; s/(?=[a-z])/L/g; }` [download]	[reply] [d/l] [select]
Re^2: Can anyone make this regex job neater? by Melly (Chaplain) on Oct 11, 2005 at 18:59 UTC
This smells a lot like homework, so I'm not posting a full answer -- hints only! Heh - well, not really, since I already have a solution. It was just that I was curious to see if such a task could be handled with a single regex, or whether my instinct to break the job into seperate regexes was correct. Thanks to all for input - it looks like my original instinct was correct. Other solutions exist (of course - hey, it's perl), but split is inconvenient, since I'd need to rejoin later. Maybe it would be a little faster - I dunno - but (high) speed isn't really an issue with this particular job. Once again, thanks to all, and if someone does come up a single regex, I'd still be curious to see it. Here, btw, is the full code I'm using - if anyone notices anything dangerous or just plain silly, feel free to comment. Always happy to learn. `use strict; open(IN, $ARGV[0])\|\|die "Cannot open $ARGV[0] for read:$!\n"; my @lines = <IN>; close IN\|\|die "Cannot close $ARGV[0]:$!\n"; open(OUT, ">$ARGV[0]")\|\|die "Cannot open $ARGV[0] for write:$!\n"; foreach(@lines){ if(m/(^OBX\\|)([^\\|]\\|){2}([^\^])(.*$)/){ my $pre = $1 . $2; my $read = $3; my $post = $4; $read =~ s/[A-Z]/U$&/g; $read =~ s/[a-z]/L$&/g; $_ = $pre . $read . $post; } print OUT\|\|die "Cannot write to $ARGV[0]:$!\n"; } close OUT\|\|die "Cannot close $ARGV[0]:$!\n";` [download] Tom Melly, tom@tomandlu.co.uk	[reply] [d/l]
Re^3: Can anyone make this regex job neater? by GrandFather (Saint) on Oct 11, 2005 at 19:56 UTC
Some minor tidyling (?), a little bug fixing and a demonstration of how sample code can be altered slightly to make it run stand alone: `use warnings; use strict; my @lines = <DATA>; for my $line (@lines){ chomp $line; if($line =~ m/(^OBX\\|)([^\\|]\\|)([^\^])(.$)/){ my ($pre, $read, $post) = ($1 . $2, $3, $4); $read =~ s/([a-zA-Z])/($1 lt 'a' ? 'U' : 'L').$1/ge; $line = "$pre$read$post"; } print "$line\n"; } __DATA__ NTE\|\|L\|obr note OBX\|NM\|aaA..^Haem^RD2\|7.5\|g/dL\|13.0-18.0\|OR\| NTE\|\|L\|obx note 1/1 for 3058 OBX\|NM\|dBf..^TWC^RD2\|8.9\|109/L\|4.0-11.0\|\|` [download] Prints: `NTE\|\|L\|obr note OBX\|NM\|LaLaUA..^Haem^RD2\|7.5\|g/dL\|13.0-18.0\|OR\| NTE\|\|L\|obx note 1/1 for 3058 OBX\|NM\|LdUBLf..^TWC^RD2\|8.9\|10*9/L\|4.0-11.0\|\|` [download] Note the use of `<DATA>` and the data section to provide the test data without requiring an additional file that has to be created somewhere and will likely require editing the code to hook up anyway. Note too that the output data is provided to make the expected outcome clear. The two prefixing lines were changed to a single line that uses a single regext to do the prefixing. Not sure it's clearer than your code though. The same could be said for the list assignment - saves lines, but probably obfusicates the code: your call. Perl is Huffman encoded by design.	[reply] [d/l] [select]
Re^3: Can anyone make this regex job neater? by blazar (Canon) on Oct 12, 2005 at 07:47 UTC
It was just that I was curious to see if such a task could be handled with a single regex, or whether my instinct to break the job into seperate regexes was correct. I'm sure that some regex wizard may come up with a single regex to do it. If nothing else, you can always put some code into the replacement part of `s///` and you can do other matches or substitutions in it. Whatever, it would be clumsy and unncessary... Thanks to all for input - it looks like my original instinct was correct. Other solutions exist (of course - hey, it's perl), but split is inconvenient, since I'd need to rejoin later. Maybe it would be a little faster - I dunno - but (high) speed isn't really an issue with this particular job. I really can't see why it should be "inconvenient": do you happen to work on some inner component of your car without unmounting it first? I suppose you do, instead, and then remount everything together when you're done, don't you? Here it's reasonable to break your string into chunks and operate on them, especially since you have to operate on the first one to decide whether to proceed with the rest or pass to the next line and you must do a substitution on the third one. This is IMNSHO the cleanest WTDI. All in all I would do it like thus: `#!/usr/bin/perl -lpi use strict; use warnings; my @chunks=split /\\|/; next unless $chunks[0] eq 'OBX'; s/(?=[A-Z])/U/g, s/(?=[a-z])/L/g for $chunks[2]; $_=join '\|', @chunks; __END__` [download] Note how simple and concise the effective code is.	[reply] [d/l] [select]
Re: Can anyone make this regex job neater? by TedPride (Priest) on Oct 11, 2005 at 18:09 UTC
`use strict; use warnings; while (<DATA>) { s/\\|(.?)\\|(.?)\^/"\|$1\|".case($2)."^"/e if !index $_, 'OBX'; print; } sub case { my $s = $_[0]; $s =~ s/[A-Z]/U$&/g; $s =~ s/[a-z]/L$&/g; return $s; } __DATA__ NTE\|\|L\|obr note OBX\|NM\|aaA..^Haem^RD2\|7.5\|g/dL\|13.0-18.0\|OR\| NTE\|\|L\|obx note 1/1 for 3058 OBX\|NM\|dBf..^TWC^RD2\|8.9\|10*9/L\|4.0-11.0\|\|` [download]	[reply] [d/l]
Re: Can anyone make this regex job neater? by Skeeve (Parson) on Oct 11, 2005 at 21:11 UTC
Just for the sake of having a one-liner: `s#^(OBX\\|[^\|]+\\|)([^^])#$1 . join("",map( { s/([A-Z])/U$1/ \|\| s/([a-z +])/L$1/; $_ } split //,$2)) #e; OBX` [download] s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%.+=%;.#_}\&"^"-+%).}%:##%}={~=~:.")&e&&s""`$''`"e	[reply] [d/l] [select]
Re^2: Can anyone make this regex job neater? by cLive ;-) (Prior) on Oct 11, 2005 at 21:31 UTC
lol - mine turned out incredibly similar: `s#^(OBX(?:.*?\\|){2})([^^]+)#$1.join'',map{/[A-Z]/?"U$_":/[a-z]/?"L$_": +$_}split'',$2#e;` [download] And no, I didn't read yours first :) cLive ;-)	[reply] [d/l]
Re^3: Can anyone make this regex job neater? by Skeeve (Parson) on Oct 11, 2005 at 21:54 UTC
So let's go for another one: `$_="OBX\|NM\|aaA..^Haem^RD2\|7.5\|g/dL\|13.0-18.0\|OR\|";s#^(OBX\\|[^\|]+\\|)([^ +^])#$1 . join("",map( { my $x=$_; y/A-Z/U/ \|\| y/a-z/L/; "$_$x" } spl +it //,$2)) #e` [download] Update:* 2 downvotes? Would be nice to know why. s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%.+=%;.#_}\&"^"-+%*) +.}%:##%}={~=~:.")&e&&s""`$''`"e [download]	[reply] [d/l] [select]
Re: Can anyone make this regex job neater? by Perl Mouse (Chaplain) on Oct 11, 2005 at 16:46 UTC
Untested: `s{^(OBX\\|[^\|]\\|)([^^])\^} {my ($pref, $str) = ($1, $2); $str =~ s{([A-Z])}{"U$1"}g; $str =~ s{([a-z])}{"L$1"}g; "$pref$str"}eg;` [download] It'll be really awkward doing this with a single s///. `Perl --((8:>*`	[reply] [d/l]