Re: Stuck in komplexer regex, at least for me
by johngg (Canon) on Mar 26, 2007 at 15:49 UTC
From your output it looks as if you want to replace the first occurrence of multiple zeros with a single zero and leave any subsequent group of multiple zeros alone. This seems to work:
#!/usr/local/bin/perl
#
use strict;
use warnings;
print
map { s{0{2,}}{0}; $_ }
<DATA>;
__END__
215000007801
300000324002
890000457651
210004563401
201045139158
and the output is
21507801
30324002
890457651
2104563401
201045139158
I hope this is of use. Cheers, JohnGG
Update: Looks like I've misunderstood. If you say the first zero should not be touched, why does 890000457651 become 89457651 and not 890457651? Perhaps you could clarify what you require.
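In the code above, only the first group of zeros changes because s/// without the /g modifier stops after its first successful match. A small comparison sketch shows the difference on one of the sample numbers; the two sub names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# Without /g, s/// stops after the first run of two or more zeros.
sub squeeze_first { my ($n) = @_; $n =~ s/0{2,}/0/;  return $n }

# With /g, every run of two or more zeros is collapsed.
sub squeeze_all   { my ($n) = @_; $n =~ s/0{2,}/0/g; return $n }

my $num = '300000324002';
print squeeze_first($num), "\n";   # prints 30324002
print squeeze_all($num),   "\n";   # prints 3032402
```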
print
map { s{0{2,}}{0}; $_ }
<DATA>;
I know this is completely OT, and in this particular case it wouldn't make a difference, but we recommend all the time against slurping whole files unless it's really needed, so please don't set an example that goes against this recommendation:
s/0{2,}/0/, print while <DATA>;
is not terribly more verbose. (I also replaced the curly braces used as delimiters in the s operator with slashes, because in this case they seemed confusing to me.)
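To make the line-by-line idea concrete without needing a data file, here is a sketch that runs the same substitution over in-memory filehandles (open on a scalar reference, available since Perl 5.8). The sub name and the sample data are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: the "s/0{2,}/0/, print while <DATA>" loop wrapped in a sub that
# takes an input and an output filehandle. Only the current line is held in
# memory, unlike print map {...} <DATA>, which reads every line first.
sub squeeze_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {        # while() adds the defined() test itself
        $line =~ s/0{2,}/0/;          # first run of 2+ zeros -> one zero
        print {$out} $line;
    }
    return;
}

# In-memory filehandles keep the example self-contained;
# a real run would open the data file instead.
my $input  = "215000007801\n300000324002\n201045139158\n";
my $result = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
squeeze_stream($in, $out);
close $out or die "Cannot close output: $!";
print $result;
```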
we recommend all the time against slurping files in all at once unless really needed
Do we? How strange. Why?
Seems to me that slurping is a perfectly valid technique in the right circumstances, for instance, parsing command output or working with small to middling data sets. People have even gone to the trouble of writing modules to support the idiom.
In this case it would appear that ultibuzz has a data set of 5 million numbers, so slurping, as it turns out, would definitely not be appropriate.
Cheers, JohnGG
Re: Stuck in komplexer regex, at least for me
by saintly (Scribe) on Mar 26, 2007 at 17:17 UTC
Well, I can make a ruleset that seems to fit:
- Always keep the first two digits intact.
- If any zeros start at position 3, remove them all.
- If you didn't do that, then if two or more zeros start after position 3, truncate them to a single 0.
- If you didn't do either of the first two, then remove a single 0 starting at position 3 or later entirely.
What happens if a single 0 starts at position 3? What does '35010333356' turn into?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
my %output_for
= ( '215000007801' => '21507801',
'300000324002' => '30324002',
'890000457651' => '89457651',
'210004563401' => '214563401',
'201045139158' => '20145139158', );
plan tests => scalar keys %output_for;
while ( my ($input, $correct_output) = each %output_for ) {
my $orig_input = $input;
$input = compress_numstring( $input );
is( $input, $correct_output, "Solved '$orig_input'" );
}
sub compress_numstring {
my $starting_num = shift;
return $starting_num
unless(defined $starting_num && $starting_num =~ /^(\d{2})(\d+)/);
my( $keep, $modify ) = ($1,$2);
( $modify =~ s/^0+// ) ||
( $modify =~ s/0{2,}/0/ ) ||
( $modify =~ s/(?<!0)0([1-9]|$)/$1/ );
return $keep . $modify;
}
It passes the validation test, but it may fail further tests since the rules aren't clearly spelled out.
sub compress_numstring {
substr($_[0], 2) =~ s/(^0+|(0)0+|(?<!0)0([1-9]|$))/$2$3/;
return $_[0];
}
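does the same thing as the other function on the test data (but with more job security).
Unfortunately, 'use warnings' will complain: whenever the branch of the alternation that sets $2 or $3 doesn't take part in the match, that variable is undefined, and interpolating it into the replacement triggers an "uninitialized value" warning.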
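One way to keep the compact version under use warnings is to switch off only the 'uninitialized' category inside the sub; warnings are lexically scoped, so the rest of the program keeps them. A sketch, with a hypothetical sub name: unlike the version above it works on a copy and returns it instead of modifying its argument in place, and it assumes inputs of at least two characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the compact one-liner with the "uninitialized value" warning
# silenced only inside this sub. $2 or $3 is undef whenever its group did
# not take part in the match.
sub compress_numstring_quiet {
    my ($num) = @_;                  # work on a copy, not on $_[0]
    no warnings 'uninitialized';     # lexical: affects this block only
    substr($num, 2) =~ s/(^0+|(0)0+|(?<!0)0([1-9]|$))/$2$3/;
    return $num;
}

print compress_numstring_quiet($_), "\n"
    for qw(215000007801 300000324002 201045139158);
```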
Re: Stuck in komplexer regex, at least for me
by Moron (Curate) on Mar 26, 2007 at 15:55 UTC
I think you are suffering slightly from the popular compulsion to do everything in the regexp. Personally, I would have gone for the simplest thing that came to mind, e.g.:
my $keep = substr( $_, 0, 2 );  # the first two digits are never touched
$_ = substr( $_, 2 );           # work on the rest of the number
s/0+/0/;                        # collapse the first run of zeros to a single 0
$_ = $keep . $_;
and then reassured myself that such a simple regexp should be significantly faster, easily winning the trade-off against the extra substr operations, which are cheap by comparison to m//.
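For what it's worth, substr can also be used as an lvalue, so the keep/modify/rejoin steps above can be folded into one statement that lets s/// edit only the tail of the string. A sketch with a hypothetical sub name, intended to give the same results as the multi-line version:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same result as keeping the first two characters,
# collapsing the first run of zeros in the rest, and joining them again.
sub collapse_after_prefix {
    my ($num) = @_;
    substr($num, 2) =~ s/0+/0/;   # s/// sees only the characters from offset 2 on
    return $num;
}

print collapse_after_prefix($_), "\n" for qw(215000007801 890000457651);
```

This reproduces the suggestion as posted, so 890000457651 becomes 890457651 rather than the 89457651 listed in the thread's expected output.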
Re: Stuck in komplexer regex, at least for me
by kyle (Abbot) on Mar 26, 2007 at 16:18 UTC
use Test::More;
my %output_for
= ( '215000007801' => '21507801',
'300000324002' => '30324002',
'890000457651' => '89457651',
'210004563401' => '214563401',
'201045139158' => '20145139158', );
plan tests => scalar keys %output_for;
sub solution {
# (this doesn't work)
$_[0] =~ s{ \A ([^0]+ 0) ([^0]*) 0+ ([^0]+0) }{$1$2$3}x
# ||
# $_[0] =~ s{ \A ([^0]+) 0+ ([^0]) }{$1$2}x;
}
while ( my ($input, $correct_output) = each %output_for ) {
my $orig_input = $input;
solution( $input );
is( $input, $correct_output, "Solved '$orig_input'" );
}
Re: Stuck in komplexer regex, at least for me
by saintly (Scribe) on Mar 26, 2007 at 16:26 UTC
Hmm, I don't think I understand the rule...
201045139158 -> 20145139158
Why is the 2nd 0 eliminated?
Because it's not Thursday after dark. If it were, the 0 would have been replaced with a jack and he'd be on the way to a royal fizzbin. Unless he got a kronk, but that'd just be bad luck.
I think the problem is that he doesn't understand his own spec well enough either to write the regex or to explain it to us. It's probably time for the OP to sit down, go back to square one, and enumerate just what it is that's trying to be accomplished.
Re: Stuck in komplexer regex, at least for me
by chrism01 (Friar) on Mar 26, 2007 at 22:53 UTC
Just out of curiosity, can you tell us what this does/is for in real life?
Seems like a weird set of rules.
Cheers
Chris
Sure. We get 12-digit numbers from another firm. The underlying numbers can be anywhere from 4 to 12 digits long and are filled up with 0s. The problem is that we got a letter describing how they fill up the numbers, but it misses several possibilities, so I'm stuck decoding these 12-digit numbers: removing the right 0s and not the 0s that are part of the actual number. Their explanation was 2 sentences, which doesn't really help with anything ^^. We are not in a position to force a change in their process (if we did, we would get these numbers on paper), so we need workarounds to get the right numbers, or at least not many failures :D
kd ultibuzz
I feel the need to ask: might it be easier to convert your numbers to the other firm's numbers? Or maybe not try to convert anyone's numbers, but use both yours and theirs and match them up using a database or something?
Maybe if I understood how and why they seem to add 0 in very odd places, it would be easier to figure out how to deal with this.
I'm not a regex person, so this is my stab at this:
I'm sure there is a serious performance hit for not using a regex.
| [reply] [d/l] |
|
|
Re: Stuck in komplexer regex, at least for me
by ultibuzz (Monk) on Mar 26, 2007 at 19:17 UTC
Sorry for the bad explanation; I'll try to explain it better. The following regex
s/(?<=\d{2})0*(?<=\d{4})0+//
produces this output:
215000007801 -> 2157801
300000324002 -> 30324002
890000457651 -> 89457651
210004563401 -> 214563401
201045139158 -> 201045139158
These 2 outputs are wrong:
201045139158 -> 201045139158
215000007801 -> 2157801
They should be:
215000007801 -> 21507801
201045139158 -> 20145139158
Now I'll try to explain the rules better. Counting starts at 0:
- Digits 0 and 1 should be left untouched.
- If digit 2 is a 0 and the next digit is a 0, remove all 0s up to the first non-0.
- If digit 2 is a 0 and the next digit is a non-0, remove the 0.
- If digit 3 is a 0 and is followed by a 0, remove all the 0s except the one at digit 3.
- If digit 3 is a 0 and the next digit is a non-0, remove only that 0.
- If digit 4 is a 0, remove it and all following 0s up to the first non-0.
I hope that makes it a bit clearer; I know it's kind of confusing.
kd ultibuzz
I'll test the help you've all given me tomorrow at the office. Thanks a lot!
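Rules stated this way are precise enough to transcribe almost literally into positional checks. A sketch, checked only against the sample numbers in this thread: it assumes every input is a full 12-digit string, counts positions from 0 as in the post, and applies only the first rule that matches. decode_padded is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical transcription of the stated rules; positions count from 0.
# Assumes a 12-digit input. Digits 0 and 1 are never touched.
sub decode_padded {
    my ($n) = @_;
    return $n unless $n =~ /^\d{12}\z/;   # leave anything unexpected alone

    if (substr($n, 2, 1) eq '0') {
        # Digit 2 is 0: drop it and every following 0 up to the first non-0.
        substr($n, 2) =~ s/^0+//;
    }
    elsif (substr($n, 3, 1) eq '0') {
        # Digit 3 is 0: keep it and drop the zeros after it if there are any,
        # otherwise drop just this one 0.
        if (substr($n, 4, 1) eq '0') { substr($n, 3) =~ s/^0+/0/ }
        else                         { substr($n, 3, 1, '') }
    }
    elsif (substr($n, 4, 1) eq '0') {
        # Digit 4 is 0: drop it and every following 0 up to the first non-0.
        substr($n, 4) =~ s/^0+//;
    }
    return $n;
}

print decode_padded($_), "\n"
    for qw(215000007801 300000324002 890000457651
           210004563401 201045139158 215100069395);
```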
| [reply] [d/l] |
|
|
Hi, you need an anchor '^' to make sure the matching starts at the beginning of your strings. Your requirements could also be written as two patterns, which would be much easier to understand (the order of the two s/// expressions matters):
#!/usr/bin/perl
use warnings;
use strict;
while(<DATA>) {
s/^(\d\d[1-9])0(?=[1-9])/$1/;
s/^(\d\d(?:[1-9]0)?)0+/$1/;
print;
}
__DATA__
215000007801
300000324002
890000457651
210004563401
201045139158
Regards,
Xicheng
You're right, 2 patterns look easier.
I'm currently testing 5 million numbers, and afterwards they will be checked against the system; then I'll know whether they all fit or some fail. I'm running the same test right now for the regex-fanboy pattern ;)
Thanks a lot for the quick and very good help.
kd ultibuzz
UPDATE: there is a problem with numbers like
215100069395
215100069395
215100153821
They should change into:
215169395
215169395
2151153821
but they remain unchanged.
UPDATE 2: I have it running with an if: if digit 2 or 3 is 0, I use the new patterns, otherwise my old one ^^. This isn't nice at all and I don't like it ;)
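The numbers in the update slip through because neither of the two patterns above looks at digit 4, which is the last rule in ultibuzz's list. Rather than choosing between pattern sets with an if, one option is a third anchored substitution, chained with or so that only the first matching rule fires. A sketch, assuming 12-digit inputs; decode_patterns is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: Xicheng's two anchored patterns plus one for zeros at digit 4
# (positions count from 0). The first substitution that succeeds wins.
sub decode_patterns {
    my ($n) = @_;
    $n =~ s/^(\d\d[1-9])0(?=[1-9])/$1/      # digit 3 is a lone 0: drop it
      or $n =~ s/^(\d\d(?:[1-9]0)?)0+/$1/   # zeros from digit 2, or after a kept 0 at digit 3
      or $n =~ s/^(\d\d[1-9]{2})0+/$1/;     # added: zeros starting at digit 4
    return $n;
}

print decode_patterns($_), "\n"
    for qw(215100069395 215100153821 201045139158);
```

Chaining with or is a small change from running the substitutions in sequence: once one rule has fired, the later patterns never see the already-shortened string.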