Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I am trying to figure out what kind of regex would replace the 4th space in a line of data with \n.

I'm working with name and address data and printing it to a PDF file to print on labels. However, some of the names are spilling over into the next column of labels. My thought was to replace the 4th instance of \s with \n so that:

School of Wisdom and Knowledge College Preparatory

would become

School of Wisdom and
Knowledge College Preparatory

All I could think to do, as a regex, was to:

$name =~ s/.+\s.+\s.+\s.+\s/.+\s.+\s.+\s.+\n/;

but that got nowhere at all.

Is it even possible to count the number of matches in a regex without using some kind of loop? Is it even possible to replace the 4th occurrence of something inside a regex at all?

Thanks in advance!

Some people fall from grace. I prefer a running start...

Re: Need help with regex to replace 4th \s with \n in data line
by japhy (Canon) on Sep 05, 2002 at 23:22 UTC
    Here's another way to do it:
    while ($name =~ /\s/g) {
        substr($name, $-[0], 1, "\n") if ++$i % 4 == 0;
        pos($name) = $-[0] + 1;
    }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Need help with regex to replace 4th \s with \n in data line
by Django (Pilgrim) on Sep 05, 2002 at 23:49 UTC

    It might be better to count characters rather than spaces, since that is a better guess at the actual line length. You can wrap any string to a specified minimum and maximum number of characters per line (here: {3,30}) with this regex:

    $_ = 'School of Wisdom and Knowledge College Preparatory';
    s/(.{3,30})\s+(.+?)/$1\n$2/g;
    print;

    ~Django
    "Why don't we ever challenge the spherical earth theory?"

      Actually you're right in that thinking. After looking at some of my data, I realized that the 4 space theory is going to blow up on a name like John F. Kennedy Middle School - which I believe is small enough to fit on the label.

      I had been working out in my head how to go about checking actual string length and make the decision then, but your solution is excellent! ++!
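
      A minimal sketch of that "check the length, then decide" idea, assuming 30 characters fit on one label line. The limit and the helper name break_if_long are illustrative only and not from this thread. Short names pass through untouched; longer ones are broken at the last whitespace that keeps the first line within the limit:

```perl
use strict;
use warnings;

# Assumption: 30 characters fit on one label line.
my $max_len = 30;

# Hypothetical helper: insert one line break only when the name is too long.
sub break_if_long {
    my ($name) = @_;

    # Names that already fit are returned unchanged.
    return $name if length($name) <= $max_len;

    # Greedy match: take as many characters as fit (up to $max_len),
    # then replace the whitespace that follows with a newline.
    $name =~ s/^(.{1,$max_len})\s+/$1\n/;
    return $name;
}

print break_if_long('John F. Kennedy Middle School'), "\n";
print break_if_long('School of Wisdom and Knowledge College Preparatory'), "\n";
```

      Only the first break is inserted here, so a very long name could still leave an over-long second line.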

      Some people fall from grace. I prefer a running start...

Re: Need help with regex to replace 4th \s with \n in data line
by sauoq (Abbot) on Sep 05, 2002 at 23:05 UTC
    $ perl -le '$_="1 2 3 4 5 6"; s/^((?:[^\s]*\s){3}[^\s]*)\s/$1\n/ and print'
    1 2 3 4
    5 6

    That might be more readable as:

    s/^([^\s]*(?:\s[^\s]*){3})\s/$1\n/

    or expanded:

    s/^([^\s]*\s[^\s]*\s[^\s]*\s[^\s]*)\s/$1\n/

    The idea is to capture zero or more non-spaces, followed by three repetitions of "a space and zero or more non-spaces". Then match one more space and replace the whole match with what you captured plus a newline at the end.

    Expanding and commenting gives us:

    s/
        ^           # Start at the beginning.
        (           # Start capturing.
          [^\s]*    # 0 or more non-space.
          \s [^\s]* # A space and 0 or more non-spaces... once.
          \s [^\s]* # twice.
          \s [^\s]* # three times.
        )           # Stop capturing.
        \s          # Match another space.
    /$1\n/x         # Replace with what we captured and a newline.

    Of course, that all looks rather ugly with the negated char classes written like that. You should probably use \S instead. So, finally:

    s/^(\S*(?:\s\S*){3})\s/$1\n/
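
    The same pattern should generalize to the Nth space by interpolating the repeat count. This is only a sketch: the helper name break_at_nth_space is made up for illustration, and it assumes $n is at least 1.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the $n-th whitespace character with a newline.
sub break_at_nth_space {
    my ($str, $n) = @_;

    # Number of whitespace characters to keep before the one we replace.
    my $skip = $n - 1;

    # Capture the first word plus $skip "space and word" groups,
    # then swap the next whitespace character for a newline.
    $str =~ s/^(\S*(?:\s\S*){$skip})\s/$1\n/;
    return $str;
}

print break_at_nth_space('School of Wisdom and Knowledge College Preparatory', 4), "\n";
```

    If the string has fewer than $n spaces, the substitution does not match and the string comes back unchanged.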
    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Need help with regex to replace 4th \s with \n in data line
by jmcnamara (Monsignor) on Sep 05, 2002 at 23:38 UTC

    The regex methods have been done so here is another approach:
    #!/usr/bin/perl -wl

    use strict;

    my $str = "School of Wisdom and Knowledge College Preparatory";
    my $pos = 0;

    $pos = 1 + index $str, " ", $pos for 1..4;

    substr $str, $pos - 1, 1, "\n" if $pos;

    print $str;

    --
    John.

Re: Need help with regex to replace 4th \s with \n in data line
by zigdon (Deacon) on Sep 05, 2002 at 23:10 UTC
    Well, there's a few ways to go at it that I can think of:
    $name =~ s/(.*?\s.*?\s.*?\s.*?)\s/$1\n/;
    Basically, we want to take as few characters as possible and count our spaces. We also have to save the first part of the match for the replacement (hence the ()'s), since we don't want to throw it away.

    we should allow for multiple spaces in a row, since I think those should count as one:

    $name =~ s/(.*?\s+.*?\s+.*?\s+.*?)\s+/$1\n/;
    You could also write the same thing here as:
    $name =~ s/((?:.*?\s+){3}.*?)\s+/$1\n/;
    and just repeat a section 3 times. A similar approach would be to replace the .*? parts with \S+ - I think this will be faster, as the regex engine will have fewer options on how to match:
    $name =~ s/((?:\S+\s+){3}\S+)\s+/$1\n/;
    Hope that helps! See man perlre for lots more info!

    -- Dan

Re: Need help with regex to replace 4th \s with \n in data line
by Popcorn Dave (Abbot) on Sep 05, 2002 at 23:28 UTC
    Boy, just when I think I'm making progress in regexes. : )

    Thanks for both of those!

    Some people fall from grace. I prefer a running start...

    Update: After further contemplation, I have come to the realization that my initial idea was at fault, since I was using the PDF::Labels module and it's set up to take lines of text, not lines with line breaks.

    Thanks to Django for the excellent suggestion of splitting based on length here.
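
    For anyone who needs separate lines rather than embedded line breaks, here is a rough sketch using the core Text::Wrap module to split a name into a list of lines by length. The 30-character limit and the helper name label_lines are assumptions for illustration, and no PDF::Labels calls are shown:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumption: 30 characters fit on one label line. Text::Wrap produces
# lines of at most $columns - 1 characters, hence the + 1.
$Text::Wrap::columns = 30 + 1;

# Hypothetical helper: return the wrapped text as a list of lines.
sub label_lines {
    my ($text) = @_;

    # wrap() takes the first-line indent, the subsequent-line indent,
    # and the text; splitting on newlines yields one element per line.
    return split /\n/, wrap('', '', $text);
}

my @lines = label_lines('School of Wisdom and Knowledge College Preparatory');
print "$_\n" for @lines;
```

    Each element of @lines can then be handed to whatever expects one line of text at a time.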

Re: Need help with regex to replace 4th \s with \n in data line
by fglock (Vicar) on Sep 06, 2002 at 14:53 UTC