in reply to String expansion script

Well that's fairly nifty. However, it has a few shortcomings... but rather than say "Here's my version, which is better," I'd like to propose a set of enhancements and see how you would approach them - if you're up for the game.
  1. The essential functionality is not reusable, except as a command-line tool. Extract it into a module, or at least a perl library file, so that I can call it from other perl code.
  2. The expander is hard-coded to expect only "-[1:10]-{a:b:c}-" patterns. What about "-{a:b:c}-[1:10]-" or "-[1:10]-[2:11]-" or "-{a:b:c}-{d:e:f}-" or even "-[1:10]-" or "-{a:b:c}-", etc? Make the expander recognize and expand any number of "[1:10]" and "{a:b:c}" patterns in the string.
-- 
jdporter

Re^2: String expansion script
by blazar (Canon) on Feb 03, 2005 at 16:57 UTC
    The essential functionality is not reusable, except as a command-line tool. Extract it into a module, or at least a perl library file, so that I can call it from other perl code.
    Well, basically you're tempting me to let my hubris take over my laziness. However consider that
    1. as a general rule I still consider myself to be at most an advanced newbie,
    2. unfortunately time is not really an option. See for example this article in clpmisc, also available from Google groups or Google groups-beta.
    The expander is hard-coded to expect only "-[1:10]-{a:b:c}-" patterns.
    No, it isn't!
    What about "-{a:b:c}-[1:10]-" or "-[1:10]-[2:11]-" or "-{a:b:c}-{d:e:f}-" or even "-[1:10]-" or "-{a:b:c}-", etc? Make the expander recognize and expand any number of "[1:10]" and "{a:b:c}" patterns in the string.

    But it already does!! (Maybe you missed the point in which I stated that I wrote this as a general purpose solution to avoid having to create many ad hoc ones.)

    I apologize for not pinpointing all the details and for only hinting at the "format" of input strings...

    However:

    1. a range of "numbers" (but not only, thanks to Perl's smart .. operator) [<num1>:<num2>] expands to the list of numbers from <num1> to <num2> in a smart way, e.g. with the correct number of leading zeroes,
    2. a colon separated list of "words" {<word1>:<word2>:...:<wordn>} expands to that list.
    I am perfectly aware that this description is not too clear and foolproof either, but I'm confident it will shed some light on the damned thing.
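    To illustrate the "smart" behaviour of the .. operator mentioned above, here is a minimal sketch. The helper name range_list is hypothetical and not part of the original script; it only wraps Perl's built-in range operator so the effect of string operands is easy to see.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): expand <lo>:<hi>
# with Perl's range operator. When the left operand is a string that
# starts with '0', or is alphabetic, Perl uses its magical string
# increment, so leading zeroes are kept.
sub range_list {
    my ( $lo, $hi ) = @_;
    return $lo .. $hi;
}

print join( ' ', range_list( '08', '11' ) ), "\n";    # 08 09 10 11
print join( ' ', range_list( '8',  '11' ) ), "\n";    # 8 9 10 11
print join( ' ', range_list( 'aa', 'ad' ) ), "\n";    # aa ab ac ad
```

    The operands must reach the operator as strings (as regex captures do) for the zero-padded form to work.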

    I am aware it could be improved in many other ways as well. As I wrote in the first place it is well suited to the use I'm making of it. Of course I'd be curious to see any suggestion about it both from the UI and the implementation POVs.

      D'Oh. --me (jdporter--) for not actually testing your code before making a statement about its behavior.

      So I tested your code, and it doesn't do for me what you say it does for you:

      /[09:10]/
      /09/ /10/
      /{x:y}/
      /x/ /y/
      /[09:10]/{x:y}/
      /[09:10]/x/ /[09:10]/y/
        So I tested your code, and it doesn't do for me what you say it does for you:
        You're right, there's an obvious bug. I am really surprised that it had not popped up before (I mean, because I make relatively intense use of it).

        In the meantime I have slightly changed my mind about the "format" of input strings in a way that will add to functionality without requiring additional coding efforts, nay, probably making the code even lighter...

        I'll post an updated version ASAP, but not now. I must go to study!

      O.k., here's how I would do it. The following sub is stand-alone. I've changed the spec slightly: lists within curlies are comma-separated rather than colon separated. That seems a bit more natural to me.
      sub expand {
          local $_ = shift;
          if ( /^(.*?)\{([^}]+)\}(.*)$/ ) {
              my( $pre, $spec, $post ) = ( $1, $2, $3 );
              return map expand($pre.$_.$post), split /,/, $spec
          }
          if ( /^(.*?)\[(\d+):(\d+)\](.*)$/ ) {
              my( $pre, $lo, $hi, $post ) = ( $1, $2, $3, $4 );
              return map expand($pre.$_.$post), $lo .. $hi
          }
          $_
      }
      One thing about it that I think could use some investigation and tweaking is whether it might be preferable to use /s or /g (or both) on the regexes.
      -- 
      jdporter

      Update: Here's a slightly different way to code it:
      sub expand {
          local $_ = shift;
          my @a;
          (@a=/^(.*?)\{([^}]+)\}(.*)$/)
              ? map(expand($a[0].$_.$a[2]), split/,/, $a[1])
          : (@a=/^(.*?)\[(\d+):(\d+)\](.*)$/)
              ? map(expand($a[0].$_.$a[3]), $a[1]..$a[2])
          : $_
      }
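      On the /s question raised above, a small sketch of what the modifier changes. The input string is made up for illustration; the two regexes are the curly-brace pattern from the sub, with and without /s. Without /s, "." does not match a newline, so an input with an embedded newline fails to match at all and would be returned unexpanded.

```perl
use strict;
use warnings;

# Illustrative input (an assumption, not from the thread): a pattern
# followed by an embedded newline.
my $input = "a{x,y}\nb";

# Without /s, (.*) cannot cross the newline, so the anchored match fails.
my $plain = ( $input =~ /^(.*?)\{([^}]+)\}(.*)$/ ) ? 1 : 0;

# With /s, "." also matches "\n" and the whole string is covered.
my $dotall = ( $input =~ /^(.*?)\{([^}]+)\}(.*)$/s ) ? 1 : 0;

print "without /s: $plain\n";    # without /s: 0
print "with /s:    $dotall\n";   # with /s:    1
```

      For single-line input, such as lines read with -n and chomped by -l, the two forms behave the same.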
        O.k., here's how I would do it. The following sub is stand-alone. I've changed the spec slightly: lists within curlies are comma-separated rather than colon separated. That seems a bit more natural to me.
        Indeed, just as the infamous bug popped up, by pure coincidence I had a "real world case" in which it would have been desirable to have list expansion along with range expansion. More precisely, why use two different delimiters where one would suffice? I will use commas to separate items and colons to specify ranges, and the former will take precedence over the latter. I think that the following code is self-explanatory:
        #!/usr/bin/perl -ln

        use strict;
        use warnings;

        sub expand;
        sub doit;

        print for expand $_;

        sub expand {
            doit split /\[(.*?)\]/, shift, -1;
        }

        sub doit {
            return @_ if @_ == 1;
            my ($pre,$pat,$post)=splice @_, 0, 3;
            map { doit $pre . $_ . $post, @_ }
              map { /(\w+):(\w+)/ ? $1..$2 : $_ }
              split /,/, $pat;
        }

        __END__
        Example:
        echo pre_[aa,01:03,bb]_[x,y]_post | ./xstr.pl
        pre_aa_x_post
        pre_aa_y_post
        pre_01_x_post
        pre_01_y_post
        pre_02_x_post
        pre_02_y_post
        pre_03_x_post
        pre_03_y_post
        pre_bb_x_post
        pre_bb_y_post
        Now you may wonder why I still leave in expand() when it's not really needed anymore. Well, this was originally meant as a quick hack, not tremendously concerned with efficiency, for example. But it is somewhat less and less so. So I thought: well, we can improve it by allowing quoting of "special characters" here, for example with
        sub expand { doit split / (?<!\\)\[ (.*?) (?<!\\)\]/x, shift, -1; }
        instead of the above, and similarly for the regexen matching commas and colons. Of course all this would require additional work if we want (as is reasonable) to remove the quoting character from the output and (just as consistently and reasonably) to allow literal backslashes there by quoting them the same way (i.e. '\\').
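        A rough sketch of how the quoting idea might look, under stated assumptions: the names expand_quoted, doit_quoted and unquote are hypothetical, the lookbehind (?<!\\) is the one suggested above, and the quoting character is only stripped at the very end. It is not a complete solution: a literal backslash written as '\\' immediately before a real bracket, comma or colon is still misread by the lookbehind.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one level of backslash quoting,
# so "\[" becomes "[" and "\\" becomes "\".
sub unquote {
    my ($s) = @_;
    $s =~ s/\\(.)/$1/gs;
    return $s;
}

# Same recursion as doit(), but splitting only on unquoted commas
# and unquoting each finished string.
sub doit_quoted {
    return map { unquote($_) } @_ if @_ <= 1;
    my ( $pre, $pat, $post ) = splice @_, 0, 3;
    return map { doit_quoted( $pre . $_ . $post, @_ ) }
      map { /^(\w+):(\w+)$/ ? $1 .. $2 : $_ }
      split /(?<!\\),/, $pat;
}

# Split only on brackets that are not preceded by a backslash.
sub expand_quoted {
    my ($str) = @_;
    return doit_quoted( split /(?<!\\)\[(.*?)(?<!\\)\]/, $str, -1 );
}

print "$_\n" for expand_quoted('a\[1\]_[x,y\,z]');
# a[1]_x
# a[1]_y,z
```

        A quoted colon such as [1\:3] is left alone simply because the backslash is not a \w character, so the range regex does not match it.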

        Once we have done all this, I thought, as a side effect we can have nested patterns, at which point expand() would become handy once again. On second thought, however, which occurred something like a nanosecond later, I realized that we can't, at least not so naively, since this would require matching balanced text.

        Now, if we had Perl6's rules at our disposal we could still use a split() statement like the above (modulo any syntactical change). Talking Perl5, I think it should be possible to do it anyway, possibly using some (still marked as) experimental regex feature. Or else we could resort to Text::Balanced, in which case I would probably write a separate split()ty sub for clarity, but we would lose the IMHO aesthetically appealing immediacy of the

        sub expand { doit split /<SOME_REGEX>/, shift, -1; }
        construct.
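        For what it is worth, a sketch of one way nested patterns could be handled without an external module. It assumes Perl 5.10 or later, where recursive subpatterns such as (?2) are available; it is not the approach hinted at in this thread, and the names expand_nested and split_top_level are hypothetical. It also gives up the split-based one-liner in favour of the recursive style of the earlier expand().

```perl
use strict;
use warnings;

# Hypothetical helper: split a spec on commas that are not inside
# nested brackets, by tracking bracket depth character by character.
sub split_top_level {
    my ($spec) = @_;
    my @items  = ('');
    my $depth  = 0;
    for my $ch ( split //, $spec ) {
        $depth++ if $ch eq '[';
        $depth-- if $ch eq ']';
        if ( $ch eq ',' && $depth == 0 ) { push @items, ''; next }
        $items[-1] .= $ch;
    }
    return @items;
}

# Expand the first balanced [...] group, then recurse on each result.
# Group 2 is the bracketed text; (?2) lets it contain further groups.
sub expand_nested {
    my ($str) = @_;
    return $str
      unless $str =~ /^(.*?)(\[((?:[^\[\]]++|(?2))*)\])(.*)$/s;
    my ( $pre, $spec, $post ) = ( $1, $3, $4 );
    my @alts = map { /^(\w+):(\w+)$/ ? $1 .. $2 : $_ }
      split_top_level($spec);
    return map { expand_nested( $pre . $_ . $post ) } @alts;
}

print "$_\n" for expand_nested('a[b,c[1:2]]d');
# abd
# ac1d
# ac2d
```

        Inner groups survive the first pass untouched because an alternative containing brackets never matches the range regex; they are expanded when the function recurses on the rebuilt string.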

        I must say that I am intrigued by the possibility of doing all this with no external module, and I think I have devised a relatively simple way to do it, albeit not in one sweep but with the aid of some pre- and post-transformations, which could be sensible after all, since we've never been tremendously concerned about efficiency in the first place. But I will save this for another post!