Re: Regexp substitution on variable-length ranges with embedded code?
by choroba (Cardinal) on May 26, 2021 at 11:18 UTC
|
You can use regex to extract the data, but I'd reach for something like Set::IntSpan to handle the intervals.
#!/usr/bin/perl
use warnings;
use strict;
use Set::IntSpan;
my $data = '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27
+:3:8; 27:3:9; 65:1:4; 65:1:18';
my %result;
my $out = sub {
my $key = (keys %result)[0] or return;
print "$key:", delete $result{$key}, ';';
};
while ($data =~ /([0-9]+:[0-9]+):([0-9]+)/g) {
my ($key, $value) = ($1, $2);
if (exists $result{$key}) {
$result{$key}->U($value);
} else {
$out->();
$result{$key} = 'Set::IntSpan'->new($value);
}
}
$out->();
Output:
43:1:1-6;27:3:7-9;65:1:4,18;
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
I saw this and was going to reply with something to the effect of Set::IntSpan but I was too late and mine wouldn't have had as clever a trick with that closure coderef . . . bah.
Also as far as the original question, IMHO most times if you're thinking about embedding code inside your regexen you're pretty well already deep into (quasi-apocryphical) JWZ-quote / xkcd 1171 territory.
The cake is a lie.
The cake is a lie.
The cake is a lie.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
|
|
Re: Regexp substitution on variable-length ranges with embedded code?
by tybalt89 (Monsignor) on May 26, 2021 at 18:28 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11133054
use warnings;
$_ = '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8;
+27:3:9; 65:1:4; 65:1:18';
print "$_\n";
s/\b(\d+:\d+:)(\d+)\K(; \1((??{$+ + 1}))\b)+/-$+/g;
print "$_\n";
Outputs:
43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9
+; 65:1:4; 65:1:18
43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
43:1:1-6; 27:3:7-9; 65:1:4,18
I hadn't really considered such possible, but you experts make me think I may have underestimated.
| [reply] [Watch: Dir/Any] [d/l] |
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11133054
use warnings;
$_ = '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 43:1:9; 43:1:10;
+ 27:3:2; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18; 65:1:28';
print "$_\n";
s/\b(\d+:\d+:)(\d+)\K(; \1((??{$+ + 1}))\b)+/-$+/g;
1 while s/\b(\d+:\d+:)[\d,-]+\K; \1([\d,-]+)/,$2/;
print "$_\n";
Outputs:
43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 43:1:9; 43:1:10; 27:3:
+2; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18; 65:1:28
43:1:1-6,9-10; 27:3:2,7-9; 65:1:4,18,28
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
One regex for both would be very complicated. But two regexes almost trivial now.
First regex should transform in comma separated list: 65:1:2,3,4,18
Second regex should apply shown technique to condense lists to ranges: 65:1:2-4,18
| [reply] [Watch: Dir/Any] |
Re: Regexp substitution on variable-length ranges with embedded code?
by kcott (Archbishop) on May 27, 2021 at 11:18 UTC
|
G'day Polyglot,
I see you already have a number of solutions.
Here's another with no complex regexes nor any CPAN requirements.
#!/usr/bin/env perl
use strict;
use warnings;
my $data = '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27
+:3:8; 27:3:9; 65:1:4; 65:1:18';
$data .= '; 43:1:8; 65:1:9; 65:1:3; 1:1:2; 1:1:1';
my %seqs_for;
my @key_order;
for (split /; /, $data) {
my ($key, $seq) = /^(\d+:\d+):(\d+)$/;
push @key_order, $key unless exists $seqs_for{$key};
push @{$seqs_for{$key}}, $seq;
}
my @output;
for my $key (@key_order) {
my ($out, $last_seq, $end_range) = ('', -2, '');
my @sorted_seqs = sort { $a <=> $b } @{$seqs_for{$key}};
for my $i (0 .. $#sorted_seqs) {
my $seq = $sorted_seqs[$i];
if (length $out) {
if ($seq == $last_seq + 1) {
$end_range = $seq;
}
else {
$out .= "-$end_range" if $end_range;
push @output, $out;
$out = "$key:$seq";
$end_range = '';
}
}
else {
$out = "$key:$seq";
}
$last_seq = $seq;
if ($i == $#sorted_seqs) {
$out .= "-$end_range" if $end_range;
push @output, $out;
}
}
}
print join('; ', @output), "\n";
Note that I added a few more data elements mainly for testing purposes:
$data .= '; 43:1:8; 65:1:9; 65:1:3; 1:1:2; 1:1:1';
Output:
43:1:1-6; 43:1:8; 27:3:7-9; 65:1:3-4; 65:1:9; 65:1:18; 1:1:1-2
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regexp substitution on variable-length ranges with embedded code? (updated)
by AnomalousMonk (Archbishop) on May 26, 2021 at 13:18 UTC
|
I agree "pure" regex isn't the way to go, but...
Win8 Strawberry 5.30.3.1 (64) Wed 05/26/2021 9:07:19
C:\@Work\Perl\monks
>perl
use 5.018; # need lexicals in regexes, regex extensions
use strict;
use warnings;
my @Test = (
'43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:
+3:9; 65:1:4; 65:1:18',
'987:23:45; 987:23:46; 65:1:17; 65:1:19',
);
for my $data (@Test) {
print "'$data' \n";
my $rx_base = qr{ (?> \d+ : \d+ :) }xms;
my $rx_tail = qr{ (?> \d+) }xms;
my $rx_sep = qr{ (?> ;? \s*) }xms;
my @run;
$data =~ s{
($rx_base) ($rx_tail) (?{ push @run, $^N })
(?: $rx_sep \1 ($rx_tail) (?{ push @run, $^N })
(?(?{ $run[-1] - $run[-2] != 1 }) (*F))
)+
}
{$1$2-$3}xmsg;
print "'$data' \n\n";
}
^Z
'43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:
+9; 65:1:4; 65:1:18'
'43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18'
'987:23:45; 987:23:46; 65:1:17; 65:1:19'
'987:23:45-46; 65:1:17; 65:1:19'
(I think this could be scaled back to pre-5.10 regexes if necessary.)
Update: Here's another version that I think is a bit nicer.
It avoids "absolute" capture group variables and backreferences. It
is also not push-y, using plain scalars that are
self-initializing.
Give a man a fish: <%-{-{-{-<
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Your solution works for me, with the addition of:
use re 'eval';
(I'm on Perl 5.12.)
...but I don't understand it. Specifically, this line is the most difficult one for me:
(?(?{ $run[-1] - $run[-2] != 1 }) (*F))
Is that double-eval'ed or executed? What is the '*F' referencing, and what is the '$^N' from the lines above? I've never seen this kind of regex before.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
+----------------------+------ embedded Perl code
| |
v v
(?(?{ $run[-1] - $run[-2] != 1 }) (*F))
This is the "(?(*condition*)*yes-pattern*)" regex expression
added with Perl version 5.10 (see Extended Patterns
in perlre).
In this case, the *condition*
is the true/false result of evaluating the code. If true, the
(*F) (a.k.a. (*FAIL))
backtracking
control verb is executed and the match fails and backtracks to the
most recent successfully matched substring:
a sequence with incrementing values for $3.
... what is the '$^N' ...
The $^N Perl special variable (see
Variables related to regular expressions
in perlvar) returns the value of the most recently closed
capture group.
Give a man a fish: <%-{-{-{-<
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regexp substitution on variable-length ranges with embedded code?
by LanX (Saint) on May 26, 2021 at 13:03 UTC
|
IF sequences are always consecutive and never interrupted, one could try to grab the first and the last of each sequence and replace each sequence in a loop.
BUT you need to be sure that neither
- neither 43:1:1; 43:1:2; 43:1:5; 43:1:6;
- nor 43:1:1; 43:1:2; 27:3:7; 43:1:3; 43:1:4;
happens.
Still I wouldn't choose to embed Perl into a regex to solve that.
That's only complicating things, because the complex logic would be only obfuscated
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regexp substitution on variable-length ranges with embedded code?
by AnomalousMonk (Archbishop) on May 26, 2021 at 13:27 UTC
|
$seq =~ s/
(\d+):(\d+):(\d+)
(?:;|\s)*
(\1):(\2):(?{1+($3|$6)})
/$1:$2:$3-$6/xg;
FWIW, the (?{1+($3|$6)})
expression | embedded code expression
in the quoted regex just
adds 1 to the bitwise-or of $3 and $6 (treated
as numbers... I think) and then throws away the result;
it's a no-op.
Give a man a fish: <%-{-{-{-<
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Regexp substitution on variable-length ranges with embedded code? ( $^N )
by Anonymous Monk on May 26, 2021 at 12:46 UTC
|
| [reply] [Watch: Dir/Any] |
|
use v5.12;
use warnings;
my $str ='43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3
+:8; 27:3:9; 65:1:4; 65:1:18';
say $str;
$str =~ s#(\d+:\d+:)(\d);(?: \1((??{$^N+1}));)+#$1$2-$3#g;
say $str;
C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/sequencer.pl
43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9
+; 65:1:4; 65:1:18
43:1:1-6 27:3:7-9 65:1:4; 65:1:18
Compilation finished at Wed May 26 15:46:31
PS: beautifying with /x modifier and adding missing ; in replacement left for the interested reader. :) | [reply] [Watch: Dir/Any] [d/l] [select] |
|
use v5.12;
use warnings;
use Test::More;
my $str = '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:
+3:8; 27:3:9; 65:1:4; 65:1:18';
my $exp = '43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18';
my $got = $str;
$got =~ s/
(
\d+:\d+:
) # $1
(
\d+ # (updated +)
) # $2 = first $^N
;
(?: # group w/o match
\s
\1
(
(??{ $^N+ 1 }) # include last_match + 1
) # $3 = next $^N
;
)+ # repeat
/$1$2-$3;/xg;
is($got, $exp, "fits");
done_testing;
Worth noting that the OP was wrong with his expectation, it's 43:1:1-6; not 43:1:1-5;
PS: still not perfect, because the input isn't terminated with a semicolon, but I'm tired now. :) | [reply] [Watch: Dir/Any] [d/l] [select] |
|
|