
Thanks, yet again. Benchmark, loaded with qw(:hireswallclock), really shows the difference between the two code segments:

        my $t0 = Benchmark->new;
        # build TOC
        foreach my $h (split(/\n/, $page)) {
            if ($h =~ m/^={2,3}\s/) {
                push(@toc, $file."\t".$h);
            }
        }
        my $t1 = Benchmark->new;
        my $td = timediff($t1, $t0);
        print "the code took:",timestr($td),"\n";

and

        my $t0 = Benchmark->new;
        # build TOC
        push(@toc,
              map(m/^={2,3}\s/ ? $file."\t".$_ : (),
                   split(/\n/, $page) ) );  
        my $t1 = Benchmark->new;
        my $td = timediff($t1, $t0);
        print "the code took:",timestr($td),"\n";

With the data sets I have, the first one is about a second faster per subset of data. I hope those two segments are similar enough to compare. I don't think I can use substr(), because the pieces sought are of variable length and alternation slows it way down:

        foreach my $h (split(/\n/, $page)) {
            if (substr($h,0,3) eq '== ' 
                 or substr($h,0,4) eq '=== ') {
                push(@toc, $file."\t".$h);
            }
        }

That adds up to 1.5 seconds.
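
For anyone who wants to repeat the comparison, here is a rough sketch that runs all three variants against the same input with Benchmark's cmpthese. The sample data, the file name, and the sub names are made up for illustration; real documents will have a different mix of headings and body lines, so the ranking may well differ from what I saw.

```perl
use strict;
use warnings;
use Benchmark qw(:hireswallclock cmpthese);

# Synthetic stand-in for the real data (an assumption): mostly body text,
# with a heading on every tenth line.
my $file = 'example.md';    # hypothetical file name
my @lines;
for my $i (1 .. 50_000) {
    if    ($i % 20 == 0) { push @lines, "== Heading $i" }
    elsif ($i % 10 == 0) { push @lines, "=== Subheading $i" }
    else                 { push @lines, "body text line $i" }
}
my $page = join "\n", @lines;

# Variant 1: foreach loop with a regex test.
sub toc_foreach {
    my @toc;
    foreach my $h (split /\n/, $page) {
        push @toc, $file . "\t" . $h if $h =~ m/^={2,3}\s/;
    }
    return \@toc;
}

# Variant 2: map with the same regex, returning an empty list for non-matches.
sub toc_map {
    my @toc;
    push @toc, map { m/^={2,3}\s/ ? $file . "\t" . $_ : () } split /\n/, $page;
    return \@toc;
}

# Variant 3: two fixed-length substr() comparisons instead of a regex.
# Unlike \s, this only accepts a literal space after the equals signs.
sub toc_substr {
    my @toc;
    foreach my $h (split /\n/, $page) {
        push @toc, $file . "\t" . $h
            if substr($h, 0, 3) eq '== ' or substr($h, 0, 4) eq '=== ';
    }
    return \@toc;
}

# Sanity check: all three variants should find the same number of headings.
my $n = @{ toc_foreach() };
die "variants disagree\n"
    unless @{ toc_map() } == $n and @{ toc_substr() } == $n;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    foreach_regex => \&toc_foreach,
    map_regex     => \&toc_map,
    substr_eq     => \&toc_substr,
});
```

Putting each variant in its own sub and building a fresh @toc per call keeps the runs from influencing each other, which the timediff snippets above do not guarantee if @toc keeps growing between measurements.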

For what it's worth, the data is by that point markdown text, and a lot of it. The sources are OpenDocument Format word processing files. I got the first 90% of the task done in pure XML using XML::Twig in two afternoons, thanks to advice here. It's the second 90% (dealing with nested lists, cross-references, and individual chapters) that is taking even more time. It seems that few people use sections within their ODF files, so "chapters" all end up as part of the same parent element. Parkinson's law applies as well.

In reply to Re^4: Speed comparison of foreach vs grep + map by mldvx4
in thread Speed comparison of foreach vs grep + map by mldvx4
