Re: How do I match lines of 40 characters long in a block of text?
by sauoq (Abbot) on Sep 25, 2002 at 20:06 UTC
This really isn't as hard as everyone has made it out to be. You almost had it, but you need to anchor so as not to match partial lines. You also probably want to avoid matching the empty string. I'm guessing that you really want "1 to 4 lines each consisting of up to 40 characters followed by a newline."
/
( # Assuming you want to capture these lines.
(?: # Group each line.
^ # Beginning of the line.
.{0,40}\n # 0 to 40 characters followed by a newline.
){1,4} # 1 to 4 lines. (0 will permit an empty match.)
) # Done capturing.
/mx; # /m so that ^ anchor works, /x for comments.
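For anyone who wants to try it, here is a minimal sketch of the pattern above applied to a block of text. The sample string and the names `$text`, `$re` and `$block` are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample: two short lines, then a line that is too long.
my $text = "first line\n"
         . "second line\n"
         . ("x" x 41) . "\n";

my $re = qr/
    (               # Capture the matched lines.
      (?:           # Group each line.
        ^           # Beginning of a line.
        .{0,40}\n   # 0 to 40 characters followed by a newline.
      ){1,4}        # 1 to 4 such lines.
    )
/mx;

# In list context the match returns the captured group.
my ($block) = $text =~ $re;
print defined $block ? $block : "no match\n";
```

With this input the capture should stop after the second line, because the 41-character line cannot satisfy `.{0,40}\n` from its beginning.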
-sauoq
"My two cents aren't worth a dime.";
I think you are probably right about what he actually needs, re: 1 to 4 rather than 0 to 4, but there is one possibility that yours won't cater for: a string containing fewer than 40 chars but no newline.
It's probably a spurious requirement, but trying to achieve it hung me up for ages.
(Knowing you, you'll add a 4 character, positively backward, forward-looking, zero-width assertion to your regex and achieve that too:)
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
but there is one possibility that yours won't cater for: a string containing fewer than 40 chars but no newline.
You're right. Mine doesn't account for it. I guess I assumed they all would end with newlines. That was a bad assumption on my part. Of course, I might blame it on poorly stated requirements. :-)
(Knowing you, you'll add a 4 character, positively backward, forward-looking, zero-width assertion to your regex and achieve that too:)
Nah... it should be easier than that. Use a $ to match the end of the line (not including the newline) and then \n? to match an optional newline. So, I tried that:
/
( # Assuming you want to capture these lines.
(?: # Group each line.
^ # Beginning of the line.
.{0,40}$\n? # 0 to 40 chars, an end-of-line and optional newline.
){1,4} # 1 to 4 lines. (0 will permit an empty match.)
) # Done capturing.
/mx; # /m so that ^ anchor works, /x for comments.
But that didn't work! I was vexed until I realized that it looks an awful lot like "match 0 to 40 characters followed by $\ (the output record separator variable) followed by an optional "n"". So, then I tried:
/
( # Assuming you want to capture these lines.
(?: # Group each line.
^ # Beginning of the line.
.{0,40}$ # 0 to 40 characters followed by an end-of-line.
\n? # An optional newline.
){1,4} # 1 to 4 lines. (0 will permit an empty match.)
) # Done capturing.
/mx; # /m so that ^ anchor works, /x for comments.
And that worked like a charm.
That additional requirement did make the whole exercise more fun. There is another workaround. Sometime before I actually figured out why it was breaking, I tried (?:\n|\Z) and that worked as well but I thought it was ugly. So, I'm left wondering whether there is a better way around it than using /x and whitespace.
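A small sketch comparing the two workarounds on input whose last line has no newline. The sample string and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical input whose last line has no trailing newline.
my $text = "line one\nline two";

# The anchor and the optional newline sit on separate lines under the x flag.
my $with_x = qr/
    (
      (?:
        ^
        .{0,40}$    # 0 to 40 characters, then end of line.
        \n?         # An optional newline.
      ){1,4}
    )
/mx;

# The alternation form: a newline or the end of the string.
my $with_alt = qr/((?:^.{0,40}(?:\n|\Z)){1,4})/m;

my ($got_x)   = $text =~ $with_x;
my ($got_alt) = $text =~ $with_alt;

print "x form:   [$got_x]\n";
print "alt form: [$got_alt]\n";
```

Both forms should capture the whole two-line string here, including the final line that lacks a newline.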
Thanks for making this so much more entertaining. :-)
Update: This was my 300th node! :-)
-sauoq
"My two cents aren't worth a dime.";
Re: How do I match lines of 40 characters long in a block of text?
by Zaxo (Archbishop) on Sep 25, 2002 at 19:25 UTC
my $foo =
"1234567890123456789012345678901234567890
1234567890123456789012345678901234567890
12345678901234567890123456789012345678901
123456789012345678901234567890123456789012
1234567890123456789012345678901234567890
1234567890123456789012345678901234567890
1234567890123456789012345678901234567890
1234567890123456789012345678901234567890";
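my ($this, $prev, @short) = (0, 0);
while (@short < 4) {
    # Find the end of the current line.
    $this = index $foo, "\n", $prev;
    $this = length $foo if $this < 0;    # last line has no trailing newline
    push @short, substr $foo, $prev, $this - $prev
        if $this - $prev <= 40;
    $prev = $this + 1;
    last if $prev >= length $foo;        # ran out of text before finding 4 lines
}
{
    local $, = $/;
    print @short;
}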
I expanded the data to show it's doing the right thing.
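For comparison, the same line-by-line idea can be sketched with split and grep instead of index and substr. This is a different technique from the code above, not a rewrite of it, and the sample data and the name `@lines` are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data: lines of 40, 41, 5, 42 and 10 characters,
# joined so that the last line has no trailing newline.
my @lines = ("a" x 40, "b" x 41, "short", "c" x 42, "d" x 10);
my $foo   = join "\n", @lines;

# Keep only the lines of at most 40 characters.
my @short = grep { length($_) <= 40 } split /\n/, $foo;

# Trim the list to the first four such lines, if there are more.
splice @short, 4 if @short > 4;

print "$_\n" for @short;
```

Note that split discards trailing empty fields, so empty lines at the very end of the text would not be counted by this sketch.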
After Compline, Zaxo
Re: How do I match lines of 40 characters long in a block of text?
by fglock (Vicar) on Sep 25, 2002 at 19:04 UTC
if ($foo =~ m/(?:.{0,40}\n){0,4}/m) { print "ok" }
/m is for "multi-line" match
update: see bart below for why this is wrong. See sauoq's answer instead.
update: now I've got it (I did learn something today!) :
The RE is: /((?:[^\n]{0,40}\n){0,4})/
or: /((?:.{0,40}\n){0,4})/ # thanks sauoq!
roughly meaning: group( (up-to-40 non-line-breaks, line-break) up-to-4 times)close-group
Test code:
$line = "x" x 38 . "\n";
sub test {
    ($res) = ($_[0] =~ /((?:[^\n]{0,40}\n){0,4})/);
    if ($res) {
        print "yes\n";
    }
    else {
        print "no\n";
    }
}
test( "x$line" x 4 ); # 39 x 4
test( "xx$line" x 4 ); # 40 x 4
test( "xxx$line" x 4 ); # 41 x 4
test( "xx$line" x 3 ); # 40 x 3
test( "x$line" x 5 ); # 39 x 5
output:
yes
yes
no
yes
yes
Thanks thelenm and BrowserUk. sauoq got something very similar too.
sauoq noted that [^\n] is the same as a "dot".
Won't that match 0 to 4 lines of (0 to 40 characters + nl)?
Update: Of course it will, 'cos that's what he asked for! D'oh!
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
I'm still curious why he would want to match zero lines. I'd expect him to want to match at least one line.
And: as soon as you ask for at least one line without an "^" anchor, your first line might contain more than 40 characters, and the regex will simply grab the last 40 of them!
In short: it is definitely not a bad idea to add an anchor.
/^((?:.{0,40}\n){0,4})/;
or
/^((?:.{0,40}\n){1,4})/m;
The latter case can grab 1 to 4 whole lines anywhere in your text.
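A quick sketch of the difference, assuming a made-up text whose first line is 50 characters long (the names `$loose` and `$anchored` are just for illustration):

```perl
use strict;
use warnings;

# Hypothetical text: a 50-character line followed by a short one.
my $text = ("x" x 50) . "\n" . "short\n";

# Without an anchor the match may begin in the middle of the long line.
my ($loose)    = $text =~ /((?:.{0,40}\n){1,4})/;

# With ^ and the m flag the match can only begin at the start of a line.
my ($anchored) = $text =~ /^((?:.{0,40}\n){1,4})/m;

print "unanchored match length: ", length($loose), "\n";
print "anchored match: $anchored";
```

The unanchored version picks up the tail end of the long line plus the short line, while the anchored one skips the long line entirely.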
The /m modifier only changes the behavior of "^" and "$" (whether they match at embedded newlines). Since your regex doesn't use either of those, the /m has no effect.
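A short sketch of that point, using an invented two-line sample: the same pattern is run with and without the flag, first without anchors and then with "^".

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";   # hypothetical sample

# No anchors in the pattern, so the m flag changes nothing.
my @plain = $text =~ /(\w+)\n/g;
my @multi = $text =~ /(\w+)\n/mg;

# With ^ in the pattern the flag matters.
my @start_plain = $text =~ /^(\w+)/g;
my @start_multi = $text =~ /^(\w+)/mg;

print "without anchors: @plain | @multi\n";
print "with ^: @start_plain | @start_multi\n";
```

Without the flag, "^" only matches at the very start of the string, so the anchored pattern finds just the first word.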
-- Mike
--
just,my${.02}
Your update isn't really any different than your first crack at it. A dot is equivalent to your character class, [^\n], as long as there is no /s modifier on the regex. A dot means "match any character except for a newline."
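A tiny sketch to check this, with a made-up three-character string containing an embedded newline:

```perl
use strict;
use warnings;

my $text = "a\nb";   # hypothetical sample with an embedded newline

my $dot        = $text =~ /a.b/     ? 1 : 0;   # dot does not match the newline
my $negated    = $text =~ /a[^\n]b/ ? 1 : 0;   # neither does the character class
my $dot_with_s = $text =~ /a.b/s    ? 1 : 0;   # with the s flag the dot matches it

print "dot: $dot, class: $negated, dot with s: $dot_with_s\n";
```

Only the version with the s flag should match.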
-sauoq
"My two cents aren't worth a dime.";
Re: How do I match lines of 40 characters long in a block of text?
by BrowserUk (Patriarch) on Sep 25, 2002 at 19:05 UTC
Update: Answered the wrong question.
This is tougher than it looks.
Try m/(.{40}\n){4}/. It works for me...
perl> $" = ''
perl> $s = ((qq(@{[0..8]}) x 4).$/)x4
perl> print $s
012345678012345678012345678012345678
012345678012345678012345678012345678
012345678012345678012345678012345678
012345678012345678012345678012345678
perl> print 'Matched'.$/ if ( $s =~ m/(.{40}\n){4}/ )
perl> $s = ((qq(@{[0..9]}) x 4).$/)x4
perl> print $s
0123456789012345678901234567890123456789
0123456789012345678901234567890123456789
0123456789012345678901234567890123456789
0123456789012345678901234567890123456789
perl> print 'Matched'.$/ if ( $s =~ m/(.{40}\n){4}/ )
Matched
perl>
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Sorry, I phrased my question wrongly. I wanted it to match lines of 0 to 40 characters, but no more than 40. Thanks.
Re: How do I match lines of 40 characters long in a block of text?
by BrowserUk (Patriarch) on Sep 25, 2002 at 21:19 UTC
This (finally) will do what you asked. It tests a string and determines whether it contains 0 to 4 lines and, if it has 1 or more lines, whether each of those lines is no more than 40 characters long.
It won't search a scalar that has more than 4 lines to determine whether there are 4 consecutive lines of no more than forty characters, which I think is what Zaxo's solution does and could be what you want, but it ain't what you asked for :)
To put it another way, it will fail if the string has more than 4 lines or if any of the lines it contains are > 40 chars. Is that what you wanted? Should it be true if it contains no lines? Or rather, some chars with no newline?
Oh, and it's a 'pure regex' solution. (For some definition of pure :)
#! perl -sw
use strict;
local $"='';#"
my ($tmp1, $tmp2);

for my $lines (0..5) {
    # $lines lines of exactly 40 characters each
    my $willmatch = ((qq(@{[0..9]}) x 4).$/)x $lines;
    my $half = int($lines/2);
    # some over-long lines, a stray 'X', then more 40-character lines
    my $wontmatch = ((qq(@{[0..10]}) x 4).$/)x$half . 'X'
                  . ((qq(@{[0..9]}) x 4).$/)x(1+$lines-$half);

    for ($willmatch, $wontmatch) {
        # if the string contains at most 4 newlines
        if( ($tmp1=()=m/\n/mg) <=4
            # and every line is at most 40 chars
            and ( $tmp2=()=m/^(?:[^\n]{0,40}\n)/mg ) == $tmp1
        ) {
            print "\nMatches! ($tmp1 lines with $tmp2 < 40 chars)\n";
        }
        else {
            print "\nNo Match! ($tmp1 lines with only $tmp2 < 40 chars)\n";
        }
        print $_.$/;
    }
}
__END__
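For comparison, the same whole-string test can be sketched as a single regex anchored at both ends. This is an alternative, not the counting approach used above, and the helper name `fits` and the sample lines are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole string is 0 to 4
# newline-terminated lines of at most 40 characters each.
sub fits {
    my ($str) = @_;
    return $str =~ /\A(?:[^\n]{0,40}\n){0,4}\z/ ? 1 : 0;
}

my $ok_line   = ("0123456789" x 4) . "\n";     # 40 characters
my $long_line = ("0123456789" x 4) . "X\n";    # 41 characters

my @results = (
    fits($ok_line x 4),            # four good lines
    fits($ok_line x 5),            # too many lines
    fits($ok_line . $long_line),   # one line too long
    fits(""),                      # no lines at all
);

print "@results\n";
```

One difference worth noting: unlike the counting approach, this sketch rejects trailing characters that are not followed by a newline, so which behaviour is right depends on the answers to the questions above.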