pjc955 has asked for the wisdom of the Perl Monks concerning the following question:

This is what I have:
|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
This is what I want:
xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,

To summarize, I want to remove the first pipe and replace the remaining pipes with commas. I then want to replace all dashes with commas, but ONLY after the first three dashes (counting from left to right). I know how to use a regular expression to replace all dashes and pipes with commas, but I need to keep the first three dashes intact.

1) How do I change all dashes to commas except the first three?

2) Is there a way to do the first question along with removing white spaces and replacing pipes with commas (except the first), all in one line?

Thank you in advance!

Replies are listed 'Best First'.
Re: REGEX detailed character replace
by CountZero (Bishop) on Nov 11, 2008 at 21:40 UTC
    Although not in one regex, this works:
    use strict;
    my $string = '|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
    print "$string\n";
    $string =~ s/\|//;      # gets rid of the first pipe
    $string =~ s/\|/,/g;    # replaces all other pipes by commas
    print "$string\n";
    my @chunks = split '-', $string;
    $string = join( '-', @chunks[ 0 .. 3 ] ) . ',' . join( ',', @chunks[ 4 .. $#chunks ] );
    print "$string\n";
    giving:
    |xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
    xx-xxx-xxxxx-xxx x/xx,xx-xxxx-xxx-xx-xxxx-xx-xx,
    xx-xxx-xxxxx-xxx x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
    Please note that it appears you also replace a space by a comma in your example.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: REGEX detailed character replace
by ikegami (Patriarch) on Nov 12, 2008 at 05:49 UTC

    How do i change all dashes to commas except the first three

    my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;
    or
    local our $c = 0; s/-(?(?{ $c++ < 3 })(?!))/,/g;
    or
    # As is, assumes at least 3 dashes are found.
    my ($pre, $post) = /^((?:[^-]*-){3})(.*)/s;
    $post =~ s/-/,/g;
    $_ = "$pre$post";
    or
    # As is, assumes at least 3 dashes (4 parts) are found.
    my @parts = split(/-/, $_, -1);
    $_ = join(',', join('-', @parts[0..3]), @parts[4..$#parts]);

    is there a way to do the first question along with removing white spaces, replacing pipes with commas (except first) all in one line

    Yup

    s/\|//; s/[|\s]+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;
    s/\|//; s/\|/,/g; s/\s+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;

    Update: Fixed rushed code as per reply.

      s/\|//; s/\s+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg; # Here the pipe is not replaced by a comma; it just removes all pipes.
      s/^\|//; s/\s+//g; s/\|/,/g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;
      This should work fine.
        We can also shrink further :)
        s/\|/$a++ < 1 ? '':','/eg;s/\s+//g;s/-/$c++ < 3 ? '-' : ','/eg;
Re: REGEX detailed character replace
by ww (Archbishop) on Nov 11, 2008 at 22:08 UTC
    Consider (and, answer too, please) this question.
    • Are there guaranteed to be exactly three dashes before the first space?

    If so, you may immediately see a regex solution: have your first capture contain everything up to the first space.
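
    A minimal sketch of that idea, under the assumption that the part to protect really does end at the first space (which the OP has not confirmed). The sub name convert is made up for illustration, and the space itself becomes a comma, as in the OP's example output:

```perl
use strict;
use warnings;

# Hypothetical helper; assumes everything before the first
# whitespace is the part whose dashes must be kept.
sub convert {
    my ($s) = @_;
    $s =~ s/^\|//;                                 # drop the leading pipe
    my ($keep, $rest) = $s =~ /^(\S+)\s+(.*)$/s
        or return $s;                              # no space: only the leading pipe is removed
    $rest =~ tr/|\-/,/;                            # remaining pipes and dashes become commas
    return "$keep,$rest";                          # the space is replaced by a comma
}

print convert('|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|'), "\n";
```

    If the protected part can contain a different number of dashes, or no space follows it, this assumption breaks and a counting approach is the safer choice.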

    And since you say you know how to deal with your other desired replacements, the answer to question two is embedded in this node's question, above.

    And, just BTW, please use code tags ( <c>...</c> ) around data like that you've shown. See Writeup Formatting Tips and Markup in the Monastery.

Re: REGEX detailed character replace
by marcussen (Pilgrim) on Nov 12, 2008 at 05:27 UTC

    s/|(xx)-(xxx)-(xxxxx)-(xxx x/xx)|(xx)-(xxxx)-(xxx)-(xx)-(xxxx)-(xx)-(xx)|/\1-\2-\3-\4,\5,\6etc;

    Confucius says kill mosquito unless cannon
      Well, though this is rather distant from a general solution, it does answer the OP's stated question so no downvote even though the regex is scarcely more than an identity.

      It could be generalized a bit:

      $string =~ s/\|(\S+)\s([^-|]+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+).*/$1,$2,$3,$4,$5,$6,$7,$8,$9,/;

      But that's still far from general, relying on "x" rather than [A-Za-z0-9]+ -- which I suspect comes closer (but still clumsily) to OP's "real" question. See, for example, ikegami's reply.
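
      For what it's worth, here is a rough sketch of a version that makes no assumption about what the fields contain, only about the delimiters. It counts dashes as it goes, in the spirit of ikegami's counter. The sub name normalize and its optional second argument are invented for this example, and the space is turned into a comma as in the OP's sample output:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $keep dashes (default 3),
# turn every later dash, every pipe and every space into a comma.
sub normalize {
    my ($s, $keep) = @_;
    $keep = 3 unless defined $keep;
    $s =~ s/^\|//;                      # drop the leading pipe
    my $seen = 0;                       # dashes seen so far
    $s =~ s/([-| ])/ ($1 eq '-' && $seen++ < $keep) ? '-' : ',' /ge;
    return $s;
}

print normalize('|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|'), "\n";
```

      Because the counter only advances on dashes, it does not matter where the space or the pipes fall relative to the first three dashes.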

      Update: Linkified to clarify my "this"

        Granted, I should probably have been more helpful, for instance Anonymous Monk does this, but I have a tendency to get lazy/sloppy/careless when giving out fish.

        Confucius says kill mosquito unless cannon
Re: REGEX detailed character replace
by Anonymous Monk on Nov 12, 2008 at 15:05 UTC

    Learn some regex-fu. :)

    Alright, well, let's take what you have:

    |xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|

    Now re-write that in a way that we preserve what you want to keep:

    |(xx)-(xxx)-(xxxxx)-(xxx) (x/xx)|(xx)-(xxxx)-(xxx)-(xx)-(xxxx)-(xx)-(xx)|

    Make that a bit more flexible:

    |(\w{2})-(\w{3})-(\w{5})-(\w{3}) (\w/\w{2})|(\w{2})-(\w{4})-(\w{3})-(\w{2})-(\w{4})-(\w{2})-(\w{2})|
    Now you'll match all of the interesting bits, so let's set up the inline replace:
    $mess =~ s/\|(\w{2})-(\w{3})-(\w{5})-(\w{3}) (\w\/\w{2})\|(\w{2})-(\w{4})-(\w{3})-(\w{2})-(\w{4})-(\w{2})-(\w{2})\|/$1-$2-$3-$4,$5,$6,$7,$8,$9,$10,$11,$12,/;
      Thank you everyone for your help! I had looked all over and couldn't find a solution and you all have provided me with multiple ways to go about what I need which is fantastic.
      I really appreciate the input.

      What these solutions seem to be missing is the OP wants to achieve a transformation, rather than capturing the individual 'x' groupings. More specifically, part of the string will be preserved in a known manner, while the rest is transformed.

      In addition, I assume the 'x' characters are stand-ins for real alphanumeric data. So, now I think we can generalize the regex a lot via these rules:

      - Skip a pipe at the beginning of the line.
      - Grab a) alphanumerics and dashes up to the third occurrence of a dash, b) the rest.
      - In "the rest", replace pipes and dashes with commas.
      - OP says "remove spaces", but his example shows the space being replaced by a comma, so do that.

      $s='|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
      # in one line, if not one command
      ($a, $b)=$s=~/^\|?(\w+-\w+-\w+-)(.+)/; $b=~s/[-| ]/,/g; $s=$a . $b;
      print $s
      # xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
Re: REGEX detailed character replace
by DrHyde (Prior) on Nov 12, 2008 at 10:52 UTC
    You seem to have forgotten to show us what you've tried so far. Without that, I can't help you do your work, I can only do your work for you, and you ain't paying me enough for that.
Re: REGEX detailed character replace
by eye (Chaplain) on Nov 16, 2008 at 04:51 UTC
    > This is what I have:
    > |xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
    > This is what I want:
    > xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
    Here's a slightly different approach to the problem. Note that the treatment of the space is different in the example and the specification.

    Since all the exceptions are at the beginning of the string, we can handle the removal of the leading pipe and then sequester the first part of the string. The remainder can be handled with the "tr" command.

    Advantages of this approach:

    • the regex is simple
    • the regex only assumes that the mysterious 'x's are not whitespace
    • "tr" is potentially better than "s" for character replacement
    #!/usr/bin/perl
    use strict;
    my $string = '|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
    print "$string\n";
    # remove pipe, protect everything before the space
    $string =~ s{^[|](\S+)}{};
    my $result = $1;
    # translate pipes, spaces, and hyphens to commas
    $string =~ tr[| -][,];
    $result .= $string;
    print "$result\n";