Re: Can anyone make this regex job neater?
by jeffa (Bishop) on Oct 11, 2005 at 16:40 UTC
|
my @line = split('\|', $_);
if ($line[0] eq 'NTE') {
# parse NTE line
}
elsif ($line[0] eq 'OBX') {
# parse OBX line
}
else {
# got something unexpected
}
I try to avoid using $1, $2, etc. directly.
| [reply] [d/l] |
|
|
$ perl -e 'print join("::", split q"\|", q"OBX|foo|blah|||"),$/'
OBX::foo::blah
$ perl -e 'print join("::", split q"\|", q"OBX|foo|blah|||", -1),$/'
OBX::foo::blah::::::
| [reply] [d/l] |
Re: Can anyone make this regex job neater?
by Roy Johnson (Monsignor) on Oct 11, 2005 at 17:01 UTC
|
You can match and edit substr:
if (/^OBX(?:[^|]*|){2}([^^]*)/) {
substr($_, $-[1], length($1)) =~ s/(?=([a-z]))/$1 > 'Z' ? 'L' : 'U
+'/ge;
}
Can't test it, so it might need some tweaking.
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
Re: Can anyone make this regex job neater?
by Aristotle (Chancellor) on Oct 11, 2005 at 21:59 UTC
|
Find the location with a match, then run a replace from that point on.
#!/usr/bin/perl
use strict;
use warnings;
while(<DATA>) {
if( m{ \A OBX \| [^|]* \| }gmsx ) {
s{
\G (?: ([[:upper:]]) | ([[:lower:]]) | ([^\^]) )
}{
defined $1 ? "U$1" :
defined $2 ? "L$2" :
$3
}sexmg;
}
print;
}
__END__
NTE||L|obr note
OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
NTE||L|obx note 1/1 for 3058
OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||
Makeshifts last the longest. | [reply] [d/l] |
Re: Can anyone make this regex job neater?
by blazar (Canon) on Oct 11, 2005 at 16:44 UTC
|
I was just curious is there was a more elegant way to do this - preferably a single s///, rather than one match and two substitutions. Thanks in advance...
This smells a lot like homework, so I'm not posting a full answer -- hints only! In any case I'd also take into account split, and then of course you can join again with '|' (too explicit a hint?!?).
Whatever, there's nothing wrong using two or more statements: it may well be doable with a single s///, but it would just be an exercise in doing it the difficult way. Your code must also be clear, mantainable and easy to understand. Incidentally you may also do
for ($stuff[2]) {
s/(?=[A-Z])/U/g;
s/(?=[a-z])/L/g;
}
| [reply] [d/l] [select] |
|
|
This smells a lot like homework, so I'm not posting a full answer -- hints only!
Heh - well, not really, since I already have a solution.
It was just that I was curious to see if such a task could be handled with a single regex, or whether my instinct to break the job into seperate regexes was correct.
Thanks to all for input - it looks like my original instinct was correct. Other solutions exist (of course - hey, it's perl), but split is inconvenient, since I'd need to rejoin later. Maybe it would be a little faster - I dunno - but (high) speed isn't really an issue with this particular job.
Once again, thanks to all, and if someone does come up a single regex, I'd still be curious to see it.
Here, btw, is the full code I'm using - if anyone notices anything dangerous or just plain silly, feel free to comment. Always happy to learn.
use strict;
open(IN, $ARGV[0])||die "Cannot open $ARGV[0] for read:$!\n";
my @lines = <IN>;
close IN||die "Cannot close $ARGV[0]:$!\n";
open(OUT, ">$ARGV[0]")||die "Cannot open $ARGV[0] for write:$!\n";
foreach(@lines){
if(m/(^OBX\|)([^\|]*\|){2}([^\^]*)(.*$)/){
my $pre = $1 . $2;
my $read = $3;
my $post = $4;
$read =~ s/[A-Z]/U$&/g;
$read =~ s/[a-z]/L$&/g;
$_ = $pre . $read . $post;
}
print OUT||die "Cannot write to $ARGV[0]:$!\n";
}
close OUT||die "Cannot close $ARGV[0]:$!\n";
Tom Melly, tom@tomandlu.co.uk
| [reply] [d/l] |
|
|
use warnings;
use strict;
my @lines = <DATA>;
for my $line (@lines){
chomp $line;
if($line =~ m/(^OBX\|)([^\|]*\|)([^\^]*)(.*$)/){
my ($pre, $read, $post) = ($1 . $2, $3, $4);
$read =~ s/([a-zA-Z])/($1 lt 'a' ? 'U' : 'L').$1/ge;
$line = "$pre$read$post";
}
print "$line\n";
}
__DATA__
NTE||L|obr note
OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
NTE||L|obx note 1/1 for 3058
OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||
Prints:
NTE||L|obr note
OBX|NM|LaLaUA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
NTE||L|obx note 1/1 for 3058
OBX|NM|LdUBLf..^TWC^RD2|8.9|10*9/L|4.0-11.0||
Note the use of <DATA> and the data section to provide the test data without requiring an additional file that has to be created somewhere and will likely require editing the code to hook up anyway. Note too that the output data is provided to make the expected outcome clear.
The two prefixing lines were changed to a single line that uses a single regext to do the prefixing. Not sure it's clearer than your code though. The same could be said for the list assignment - saves lines, but probably obfusicates the code: your call.
Perl is Huffman encoded by design.
| [reply] [d/l] [select] |
|
|
#!/usr/bin/perl -lpi
use strict;
use warnings;
my @chunks=split /\|/;
next unless $chunks[0] eq 'OBX';
s/(?=[A-Z])/U/g,
s/(?=[a-z])/L/g for $chunks[2];
$_=join '|', @chunks;
__END__
Note how simple and concise the effective code is. | [reply] [d/l] [select] |
Re: Can anyone make this regex job neater?
by TedPride (Priest) on Oct 11, 2005 at 18:09 UTC
|
use strict;
use warnings;
while (<DATA>) {
s/\|(.*?)\|(.*?)\^/"|$1|".case($2)."^"/e
if !index $_, 'OBX';
print;
}
sub case {
my $s = $_[0];
$s =~ s/[A-Z]/U$&/g;
$s =~ s/[a-z]/L$&/g;
return $s;
}
__DATA__
NTE||L|obr note
OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|
NTE||L|obx note 1/1 for 3058
OBX|NM|dBf..^TWC^RD2|8.9|10*9/L|4.0-11.0||
| [reply] [d/l] |
Re: Can anyone make this regex job neater?
by Skeeve (Parson) on Oct 11, 2005 at 21:11 UTC
|
Just for the sake of having a one-liner:
s#^(OBX\|[^|]+\|)([^^]*)#$1 . join("",map( { s/([A-Z])/U$1/ || s/([a-z
+])/L$1/; $_ } split //,$2)) #e;
OBX
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
| [reply] [d/l] [select] |
|
|
lol - mine turned out incredibly similar:
s#^(OBX(?:.*?\|){2})([^^]+)#$1.join'',map{/[A-Z]/?"U$_":/[a-z]/?"L$_":
+$_}split'',$2#e;
And no, I didn't read yours first :)
cLive ;-) | [reply] [d/l] |
|
|
So let's go for another one:
$_="OBX|NM|aaA..^Haem^RD2|7.5|g/dL|13.0-18.0|OR|";s#^(OBX\|[^|]+\|)([^
+^]*)#$1 . join("",map( { my $x=$_; y/A-Z/U/ || y/a-z/L/; "$_$x" } spl
+it //,$2)) #e
Update: 2 downvotes? Would be nice to know why.
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%.+=%;.#_}\&"^"-+%*)
+.}%:##%}={~=~:.")&e&&s""`$''`"e
| [reply] [d/l] [select] |
Re: Can anyone make this regex job neater?
by Perl Mouse (Chaplain) on Oct 11, 2005 at 16:46 UTC
|
s{^(OBX\|[^|]*\|)([^^]*)\^}
{my ($pref, $str) = ($1, $2);
$str =~ s{([A-Z])}{"U$1"}g;
$str =~ s{([a-z])}{"L$1"}g;
"$pref$str"}eg;
It'll be really awkward doing this with a single s///.
| [reply] [d/l] |