An error message and a small sample of test data would have been nice.

The problem appears to be that, when the source filter is in use, the regex contains code that is eval'd at match time. As your regexes also contain embedded variables that require interpolation, and Perl by default refuses to run eval'd code in a regex that is interpolated at run time, we need to add

use re 'eval';

to the program under test. I had hoped to add it to the filter module itself, but that doesn't work. (The reason is obvious once you've tried it: the pragma is lexically scoped, so it only affects the file in which it appears.) Anyway, with that line added at the top of the program under test, the filter seems to work fine again, without modification from the version presented above.
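For anyone who wants to see the restriction in isolation, here is a minimal sketch that does not use the filter at all. The names `$n`, `$hits` and `$pattern` are made up for the illustration. On the perls that were current when this was written, any run-time interpolation into a regex containing a code block tripped the check; as far as I know, newer perls (5.18 and later) only complain when the code block itself arrives via interpolation, so the sketch puts the block in the interpolated string to behave the same either way.

```perl
use strict;
use warnings;

our $hits = 0;   # package variable, so the embedded code can name it unambiguously
my $n     = 3;   # hypothetical value interpolated into the pattern

# The pattern text is only known at run time and contains a (?{ ... }) block.
my $pattern = 'a{' . $n . '}(?{ $main::hits++ })';

# Without the pragma, perl refuses to compile the interpolated code block.
my $ok_without = eval { my $m = 'aaa' =~ /$pattern/; 1 };
my $error      = $@;

# With the pragma in (lexical) scope, the same match is allowed.
my $ok_with = eval {
    use re 'eval';
    my $m = 'aaa' =~ /$pattern/;
    1;
};

print 'without: ', ($ok_without ? "ok\n" : "died: $error");
print 'with:    ', ($ok_with    ? "ok\n" : "died: $@");
```

The first match should die with an "Eval-group not allowed at runtime" message, and the second should go through. Because `use re 'eval'` sits inside the second block, it has no effect on the first one, which is the same scoping that stops it working from inside the filter module.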

A quick test program:

#! perl -slw
use strict;
use re 'eval'; #! <<< ADD THIS LINE
use My::Filter;

my ($short_line_threshold, $short_line_counter, $long_line_threshold)
    = (40,2,50);

my $data = q[
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+xxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+xxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
];

my $tmp;
for (1..1000) {
    $tmp = $data;
    $tmp =~ s/((?:<line>\s*(?:.{1,$short_line_threshold})<\/line>\s*){$short_line_counter,})(<line>\s*(?:.{$long_line_threshold,}?)<\/line>)/$1<\/para><para>$2/gs;
}
print $tmp;


print '=' x 20, 'Timing of regexs in ', $0, '=' x 20;
print My::Filter::report();

__END__
C:\test>testmyfilter

<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
</para><para><line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+xxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
</para><para><line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+xxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>
<line>  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</line>

====================Timing of regexs in C:\test\testMyFilter.pl====================
2000 trials of ((?:<line>\s*(?:.{1,$short_line_threshold})</line>\s*){$short_line_counter,})(<line>\s*(?:.{$long_line_threshold,}?)</line>)
 (460.000ms total), 230us/trial

I'd like to suggest using the /x option on your regexes to make them a little more readable, but I tried it, and whilst they still work, it has a significant effect on performance, which is presumably what you're trying to improve.
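For reference, this is roughly what I mean by a /x layout. It is only a sketch for readability, run without the filter, and says nothing about the timing cost mentioned above. The variable names `$compact` and `$spaced` and the way the sample data is built are my own choices for the illustration.

```perl
use strict;
use warnings;

my ($short_line_threshold, $short_line_counter, $long_line_threshold) = (40, 2, 50);

# Same pattern as in the test program, on one line.
my $compact = qr/((?:<line>\s*(?:.{1,$short_line_threshold})<\/line>\s*){$short_line_counter,})(<line>\s*(?:.{$long_line_threshold,}?)<\/line>)/s;

# The same pattern laid out with /x. Comments inside avoid sigils and
# braces, since they are still subject to interpolation and delimiter matching.
my $spaced = qr{
    (                                       # group 1: a run of short lines
        (?:
            <line> \s*
            (?: .{1,$short_line_threshold} )
            </line> \s*
        ){$short_line_counter,}
    )
    (                                       # group 2: the long line after them
        <line> \s*
        (?: .{$long_line_threshold,}? )
        </line>
    )
}xs;

# Sample data shaped like the test program's: 2 short, long, 3 short, long, 2 short.
my $short = '<line>  ' . ('x' x 33) . "</line>\n";
my $long  = '<line>  ' . ('x' x 70) . "</line>\n";
my $data  = "\n" . ($short x 2) . $long . ($short x 3) . $long . ($short x 2);

(my $out_compact = $data) =~ s/$compact/$1<\/para><para>$2/g;
(my $out_spaced  = $data) =~ s/$spaced/$1<\/para><para>$2/g;

print $out_spaced;
print 'same result: ', ($out_compact eq $out_spaced ? 'yes' : 'no'), "\n";
```

The pattern has no literal whitespace of its own (it uses `\s*` throughout), which is why adding /x does not change what it matches.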

One minor improvement to the readability of the output report can be obtained by changing

$My::Filter::t->start('$_')

to $My::Filter::t->start('$/$_$/')

Make sure you make the same change to the stop() line as well.

I also tried a version of the filter that used a simple numbering scheme for the start/stop labels. That makes the output more readable, but it makes relating the number in the report back to the individual regex in the code considerably harder. Post a reply if you want a copy of that version.
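To give an idea of the numbering scheme, here is a rough sketch, not the code from my filter. The subs `label_for` and `legend` are hypothetical names; the idea is just to hand out a short numeric label per distinct regex and to keep a legend so the numbers can be mapped back to the regex text afterwards.

```perl
use strict;
use warnings;

my %id_for;     # regex text => number (hypothetical registry)
my @text_for;   # number - 1  => regex text, in order of first use

# Return a short label for a regex, allocating a new number on first sight.
sub label_for {
    my ($regex_text) = @_;
    unless (exists $id_for{$regex_text}) {
        push @text_for, $regex_text;
        $id_for{$regex_text} = scalar @text_for;   # numbers start at 1
    }
    return "regex#$id_for{$regex_text}";
}

# Build a legend relating each number back to its regex text.
sub legend {
    return join '',
        map { sprintf "regex#%d => %s\n", $_ + 1, $text_for[$_] } 0 .. $#text_for;
}

print label_for('a+'), "\n";
print label_for('b*'), "\n";
print label_for('a+'), "\n";   # same regex, same label as the first call
print legend();
```

Printing the legend after the timing report goes some way towards the "which regex was number 3?" problem, though you still have to find that text in the source by eye.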

I still think that if I could find a way of using the __LINE__ macro as the timer label, it would be a better option than the text of the regex itself, but that doesn't work for obvious reasons.

Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

In reply to Re: Re: Re: Profiling regular expressions by BrowserUk
in thread Profiling regular expressions by Mur
