While editing Musicbrainz, I wanted to check something and at one point needed a quick script to munge data like this:
Attacca Quartet:1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,18,19,20,22,23 +,24,25,26,27,28 John Patitucci:1,2,3,5,8,11,12,13,14,15,16,17,19,25,26,27,28 Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28 Sō Percussion:1,3,4,5,6,7,8,11,12,13,14,15,16,18,19,21,22,23,24,2 +5,26,27,28
into this:
Attacca Quartet:1-8,10-20,22-28 John Patitucci:1-3,5,8,11-17,19,25-28 Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28 Sō Percussion:1,3-8,11-16,18-19,21-28

So I thought: Perl must have an easy way to find consecutive numbers inside a regex. And indeed it does!

Here is my script, which was originally a one-liner. It consolidates ranges in text using the (??{code}) construct in the search pattern to find consecutive numbers, and leaves everything intact that doesn't look like a number range.

It passes all the tests I threw at it, as long as the ranges are sane, non-overlapping and sorted in ascending order. I made it so that it can handle pre-existing ranges in the input, since I needed some of that code anyway and now it looks cool and has some nice internal symmetry. It does not merge duplicate ranges, nor does it try to handle whitespace. So it is basically only useful if the input data for this stage is generated by your own code (or your own data manipulations in Vim, as in my case). Definitely don't use it for processing arbitrary user input, there are good modules for that!

#!/usr/bin/perl -wp 1 while s/-(\d+),(??{1+$1})-/-/ or s/-(\d+),((??{1+$1}))\b/-$2/ or s/\b(\d+),(??{1+$1})-/$1-/ or s/\b(\d+),((??{1+$1}))\b/$1-$2/;

This was an interesting learning experience to use the (??{code}) construct! Note that I put capturing parentheses around the (?{...}) items only where it was necessary.

If you want to sort your data before consolidating the ranges, you could first do something like this, but note that this does not ignore extra text:

perl -pe 'chomp; $_ = join(",", sort { $a <=> $b } split /,/) . "\n"'

Replies are listed 'Best First'.
Re: Consolidate ranges (quick and dirty with a cool regexp)
by tybalt89 (Monsignor) on Oct 30, 2024 at 03:49 UTC

    Since you're just doing it for fun, here's an alternate way. Just change comma to dashes where needed and clean up the runs later :)

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11162490 use warnings; for ( split /^/, <<END ) Attacca Quartet:1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,18,19,20,22,23 +,24,25,26,27,28 John Patitucci:1,2,3,5,8,11,12,13,14,15,16,17,19,25,26,27,28 Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28 S&#333; Percussion:1,3,4,5,6,7,8,11,12,13,14,15,16,18,19,21,22,23,24,2 +5,26,27,28 Has Dashes:1,3-6,7-9,11,13-20,21,22-25,30,33-34,351,520 END { print "\n$_"; print s/\b(\d+)\K,(?=(??{$1+1})\b)/-/gr =~ s/-[\d-]*-/-/gr; }

    Outputs:

    Attacca Quartet:1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,18,19,20,22,23 +,24,25,26,27,28 Attacca Quartet:1-8,10-20,22-28 John Patitucci:1,2,3,5,8,11,12,13,14,15,16,17,19,25,26,27,28 John Patitucci:1-3,5,8,11-17,19,25-28 Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28 Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28 S&#333; Percussion:1,3,4,5,6,7,8,11,12,13,14,15,16,18,19,21,22,23,24,2 +5,26,27,28 S&#333; Percussion:1,3-8,11-16,18-19,21-28 Has Dashes:1,3-6,7-9,11,13-20,21,22-25,30,33-34,351,520 Has Dashes:1,3-9,11,13-25,30,33-34,351,520
Re: Consolidate ranges (quick and dirty with a cool regexp)
by choroba (Cardinal) on Oct 29, 2024 at 22:22 UTC
    Or, reach for CPAN and Number::Util::Range.
    #!/usr/bin/perl use warnings; use strict; use experimental qw( signatures ); use utf8; use Number::Util::Range qw{ convert_number_sequence_to_range }; sub consolidate($s) { my ($name, $range) = $s =~ /(.*):((?:[0-9]+,)+[0-9]+)/g; my @range = split /,/, $range; my $new_range = convert_number_sequence_to_range( array => \@range, min_range_len => 2, separator => '-'); return "$name:" . join ',', @$new_range }
    use Test::More tests => 4;
    
    is consolidate('Attacca Quartet:1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,18,19,20,22,23,24,25,26,27,28'),
        'Attacca Quartet:1-8,10-20,22-28';
    
    is consolidate('John Patitucci:1,2,3,5,8,11,12,13,14,15,16,17,19,25,26,27,28'),
        'John Patitucci:1-3,5,8,11-17,19,25-28';
    
    is consolidate('Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28'),
        'Roomful of Teeth:1,3,5,7,9,12,14,16,19,21,24,26,28';
    
    is consolidate('Sō Percussion:1,3,4,5,6,7,8,11,12,13,14,15,16,18,19,21,22,23,24,25,26,27,28'),
        'Sō Percussion:1,3-8,11-16,18-19,21-28';
    
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Absolutely (and definitely when I'd have to write production code!), but using just basic Perl reduces my cognitive load. And that is also why I framed this as a learning experience and marked it quick & dirty. ;-)

      Also good to see the min_range_len option, as I actually don't like ranges of length 2; curious to see the default is 4, I would put it at 3.

      (Hmm, so you can use Unicode in general markup, but not in code blocks. Good to know.)

        If you are up for similar learning experiences then have a browse of the https://theweeklychallenge.org site.

        (Hmm, so you can use Unicode in general markup, but not in code blocks. Good to know.)

        Yes - you too can have hippos in your sig! Generally the rule is always to put code and data within <code>...</code> tags unless it includes utf-8 in which case enclose it in <pre>...</pre> tags instead.


        🦛

      You'll find a bunch more if you search for "intspan". I'm partial to Set::IntSpan::Fast.