ww has asked for the wisdom of the Perl Monks concerning the following question:

Full disclosure: a newbie seeks your wisdom here; one who decided to re-invent 'cookie' or 'fortune' as his next learning exercise. Much of it works (and please forgive, but don't hesitate to chastise, the un-idiomatic or unwise usages, tho I do know that the m// (s) should be in conditionals).

Several monks kindly offered pointers to references about use of \G anchors on CB t'other day, but clearly I haven't absorbed some of the wisdom at perlretut, perlre and others (and refs I'd already studied, such as the Owl book, the Cookbook, etc., have provided minimal enlightenment). Perhaps Friedl will help, but if a monk has an online source of a clearer explanation or examples I would be most thankful.

The sticking point, however, is that I'm not getting the anchors right -- as in the minimal test instance below:

#! c:/perl -w
use strict;
use diagnostics;

##############################################
# vars
use vars qw($aphor $max $min $new @sect $sect $splBRK);
$max=65;
$min=55;
$new="";
@sect="";
$sect[0]="x";
$splBRK="~~";

# uppercase "Z" before each double-tilde solely for ease of checking CLI output.
$aphor="Now is the time for all good men to come toZ ~~the aid of their country as the quickZ ~~red fox jumps over the lazy brown dog's back and the knife runs away with the spoon since this nonsense goes on for waaay faaaaahrrr too long.";

################################################
# convert ~~ s ####
if (length($aphor) > ($max+6) || $aphor =~ /~~/ ) {   # plus six is an ARBITRARY allowed_overrun
    $aphor =~ s/$splBRK/\n\t/g;
} else {
    $new = "\n\t" . $aphor;
    print "$new\n";
    exit();
}

&split2;
exit();

##################################################
# sub SPLIT2 #####
sub split2 {
    @sect= $aphor =~ m/\G           # match with anchor
        (?:([\w\x20]{,$min}\n))     # NONcapture (ie, grouping only) NMT $min(letters or spaces)
        /gmsx;
    # THEN
    @sect=$aphor =~ m/\G
        (.{$min,$max}               # anything of length (between $min and $max) followed by
        (?:[\b\x20])                # a NONcapture group of one word boundary (In case no space after ~~) or space
        )/gsx;                      # global, include \n in ., extended patterns
    print "\t$_\n" foreach @sect;
    if ($') {
        print "\t$'\n";
    }
}

OUTPUT (initial tabs stripped for ease of viewing):

Now is the time for all good men to come toZ the aid of their country as the quickZ red fox jumps over the lazy brown dog's back and the knife runs away with the spoon since this nonsense goes on for waaay faaaaahrrr too long.

OUTPUT INTENDED:

Now is the time for all good men to come toZ                      < \n in file so short line OK
the aid of their country as the quickZ                            < \n in file so short line OK
red fox jumps over the lazy brown dog's back and the knife runs   < 65 chars, including tab & trailing space: OK
away with the spoon since this nonsense goes on for waaay         < 59 chars: OK
faaaaahrrr too long.                                              < Last line: short is OK

With thanks for St. japhy's observations (below) ...and re the question on input and output: I have an ASCII text file of aphorisms, fortune-style observations, jokes, etc., which range from a single line of approx 40 chars to complex, multi-line quotes, including a fragment of a SCOTUS opinion. Some make no sense if I fail to reproduce the original linebreaks, as in this entry, which may illuminate the intent:

Q: How many surrealists does it take to change a lightbulb? ~~A: Three; one to hold the giraffe and two to fill the bathtub with brightly painted machine tools.

Desired output would break the quote into 3 lines: the first ending before the double-tilde; the second ending at the last space preceding $max, with the origin of $max reset to the zero-width location before "A: Three..."; and the third containing the balance of the string.

Q: How many surrealists does it take to change a lightbulb?
A: Three; one to hold the giraffe and two to fill the bathtub
with brightly painted machine tools.
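
For readers following along, the behaviour described above could be sketched roughly as follows. This is only one possible approach, not the code from the post: wrap_aphorism is a hypothetical helper name, the sketch ignores $min entirely, and it assumes each piece between double-tildes contains no embedded newlines. It splits on "~~" first, then walks each piece with \G in scalar context so that every match resumes where the previous one stopped.

```perl
use strict;
use warnings;

# Hypothetical helper: break $text at each "~~", then wrap every
# resulting piece so that no line is longer than $max characters.
sub wrap_aphorism {
    my ($text, $max) = @_;
    my @lines;
    for my $piece (split /\s*~~\s*/, $text) {
        pos($piece) = 0;
        while (pos($piece) < length $piece) {
            # \G pins the match to where the previous one stopped;
            # /c keeps pos() intact if the match fails.
            if ($piece =~ /\G(.{1,$max})(?:\x20+|\z)/gc) {
                push @lines, $1;
            }
            elsif ($piece =~ /\G(\S+)\x20*/gc) {
                # A single word longer than $max: emit it whole.
                push @lines, $1;
            }
            else {
                last;    # guard against an endless loop
            }
        }
    }
    return @lines;
}

my $joke = "Q: How many surrealists does it take to change a lightbulb? "
         . "~~A: Three; one to hold the giraffe and two to fill the bathtub "
         . "with brightly painted machine tools.";
print "\t$_\n" for wrap_aphorism($joke, 65);
```

With $max set to 65, the lightbulb entry comes out as the three lines shown above. Because each "~~" piece is wrapped on its own, the count toward $max restarts at every forced break.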

Eventually, the output of the full program will go to a constrained space on a webpage, with $min and $max set dynamically for various user screen resolutions using $ENV. Hope I'm answering the question asked.

re {,$min}, I will fix. I stole that usage from a tutorial or example online and, checking later, did not find it documented, but it appeared to work. And slaps forehead, falls dazed but still shamed! \b in charclass: another fix to come!

Replies are listed 'Best First'.
Re: \G anchor useage
by japhy (Canon) on Aug 06, 2004 at 18:19 UTC
    I cannot understand what it is you're trying to accomplish. What are you trying to use \G for? Could you explain your input and output formats better?

    One mistake I can see, though, is that you have {,$min} as a quantifier. That's not a valid quantifier: you need a leading 0 in there. {0,$min}.
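
    A minimal sketch of the quantifier with an explicit lower bound may help here. The variable names and sample strings are made up for the illustration. It sticks to the {0,$min} form because that is accepted by every perl, whereas the treatment of a bare {,$min} has varied between perl versions.

```perl
use strict;
use warnings;

my $min  = 5;                 # hypothetical small limit for the demo
my $text = "abc def ghi\n";

# {0,$min} means "between 0 and $min of the preceding item".
# It is greedy, so it takes as many as it can, up to $min.
my ($head) = $text =~ /^([\w\x20]{0,$min})/;
print "matched: '$head'\n";               # matched: 'abc d'

# Zero repetitions are allowed, so the pattern still succeeds
# when the string starts with something outside the class.
my ($empty) = "~~abc" =~ /^([\w\x20]{0,$min})/;
print "length: ", length($empty), "\n";   # length: 0
```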

    Oh, another mistake: \b is only a word boundary outside of a character class. Inside a character class, it matches a backspace.
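
    The difference can be seen in a short sketch. The sample strings are invented for the demonstration, and the alternation at the end is only one possible way to express "a space or a word boundary", not necessarily the fix the original poster wants.

```perl
use strict;
use warnings;

my $str = "quick fox";

# Outside a character class, \b is a zero-width word boundary.
my @words = $str =~ /\b(\w+)\b/g;
print scalar(@words), " words\n";              # 2 words

# Inside a character class, \b is the backspace character (\x08),
# so [\b\x20] means "a backspace or a space", nothing zero-width.
my $in_class = "a\x08b" =~ /a[\b]b/ ? "yes" : "no";
print "class matches backspace: $in_class\n";  # yes

# One way to say "a space, or else a word boundary" is alternation.
# This (?:...) group is a hypothetical replacement for [\b\x20].
my ($chunk) = "lazy dog" =~ /(.{1,6})(?:\x20|\b)/;
print "chunk: '$chunk'\n";                     # chunk: 'lazy '
```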

    _____________________________________________________
    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart