thewebsi has asked for the wisdom of the Perl Monks concerning the following question:

I have a string that looks like key: value pairs. A couple of the keys are "headings". I want to:
- Add a blank line above the first heading
- Indent everything from the 1st heading to the 2nd heading by 2 spaces
- Indent everything from the 2nd heading to the end of the section by 2 spaces
The heading names and end-of-section keys are known.

Here is what I came up with. It works as desired, but seems to me mighty inefficient as it traverses the string #-of-headings * #-of-items-to-indent times. I wonder if it can be done in fewer iterations...

#!/usr/bin/perl

local $/ = undef;
my $string = <DATA>;

$string =~ s/^(Heading Here:)$/\n$1/m;
while ( $string =~ s/^(Heading Here:.*\n)(\S.+)(\nAnother Heading:)$/$1  $2$3/gsm ) {}
while ( $string =~ s/^(Another Heading:.*\n)(\S.+)(\nEnd of section:)/$1  $2$3/gsm ) {}

print $string;

__DATA__
Key: Value
Item: Another
Item: Text
Etc Etc: Whatever
Heading Here:
Indent This: Stuff
By 2 spaces: Every line
Until The Next: Heading
Efficiently: If possible
Another Heading:
More: Stuff
To: Indent
End of section: Stop indenting
Random: Test
There: Is more stuff
Further down: The string

Output:

Key: Value
Item: Another
Item: Text
Etc Etc: Whatever

Heading Here:
  Indent This: Stuff
  By 2 spaces: Every line
  Until The Next: Heading
  Efficiently: If possible
Another Heading:
  More: Stuff
  To: Indent
End of section: Stop indenting
Random: Test
There: Is more stuff
Further down: The string

Replies are listed 'Best First'.
Re: Regexp help
by ikegami (Patriarch) on Sep 16, 2011 at 19:33 UTC

    Seems to me this would be quite easy to implement reading a line at a time. Just keep a flag indicating whether you should be indenting or not.

    while (<>) {
        if ( /^Heading Here:/ ) {
            $_ = "\n" . $_;
            $indent = 1;
        }
        elsif ( /^Another Heading:/ ) {
            $indent = 1;
        }
        elsif ( /^End of section:/ ) {
            $indent = 0;
        }
        elsif ( $indent ) {
            $_ = "  " . $_;
        }
        print;
    }

      I agree. I might even take it a step further and write it with an eye for maintainability and allow myself to easily add new headings as needed.

      use feature 'state';    # needed for the state variable below

      my @new_line_strings = ( 'Heading Here', );
      my @opening_strings  = ( 'Heading Here', 'Another Heading', );
      my @closing_strings  = ( 'End of section', @opening_strings, );

      my $new_line_pattern = '^(?:' . join('|', @new_line_strings) . ')';
      my $opening_pattern  = '^(?:' . join('|', @opening_strings) . ')';
      my $closing_pattern  = '^(?:' . join('|', @closing_strings) . ')';

      while ( defined (my $line = <DATA> ) ) {
          state $indent_char = '';
          $indent_char = '' if $line =~ m/$closing_pattern/;
          print "\n" if $line =~ m/$new_line_pattern/;
          print $indent_char . $line;
          $indent_char = '  ' if $line =~ m/$opening_pattern/;
      }

        This tweak adds a newline before the first heading in a series:

        while ( defined (my $line = <DATA> ) ) {
            state $indent_char = '';
            print "\n" if $line =~ m/$opening_pattern/ && ! $indent_char;
            $indent_char = '' if $line =~ m/$closing_pattern/;
            print $indent_char . $line;
            $indent_char = '  ' if $line =~ m/$opening_pattern/;
        }
Re: Regexp help
by johngg (Canon) on Sep 16, 2011 at 21:53 UTC

    Not answering your question but your use of local $/ = undef; is not particularly local as you have not confined its effect to a logical block of code. Rather, you have changed the value from that point forward in the script. It is better to restrict the localization to where it is required so as to avoid collateral damage further down the script where line by line input might be required.

    You could employ a bare block

    my $string;
    {
        local $/;
        $string = <DATA>;
    }

    Or, perhaps better, a do block

    my $string = do { local $/; <DATA>; };
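
    To see the scoping in action, here is a minimal sketch (not from the original post). It slurps from an in-memory filehandle inside a do block, then reads the same handle line by line once the block has ended. The sample text and variable names are assumptions made for the demo.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

# $/ is undefined only inside the do block, so this read slurps everything.
my $slurped = do { local $/; <$fh> };

# Outside the block $/ is "\n" again, so a list-context read returns single lines.
seek $fh, 0, 0;
my @lines = <$fh>;
close $fh;

print "slurped ", length($slurped), " characters\n";
print "then read ", scalar(@lines), " separate lines\n";
```

    If the localization had leaked, the second read would have returned one big chunk rather than three lines.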

    I hope this is of interest.

    Update: Corrected missing semi-colon, thanks AnomalousMonk.

    Cheers,

    JohnGG

Re: Regexp help
by Anonymous Monk on Sep 16, 2011 at 21:10 UTC

    This uses the range operator:

    use strict;
    use warnings;
    use feature 'state';    # state variables need this (or "use 5.010")

    my $head_re = qr/^(?:Heading Here|Another Heading):/;
    my $end_re  = qr/^End of section:/;

    while (<DATA>) {
        state $saw = 0;
        if (my $seq = /$head_re/ ... /$head_re|$end_re/) {
            if ($seq == 1) {
                s/^/\n/ unless $saw++;
            }
            elsif ($seq =~ /E0$/) {
                $saw = 0 if /$end_re/;
                redo;
            }
            else {
                s/^/  /;
            }
        }
        print;
    }
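
    The $seq tests above rely on what the range operator returns in scalar context: an empty string while the range is off, a sequence number starting at 1 while it is on, and a sequence number with "E0" appended on the line that closes the range. A small sketch with made-up input (the START/END markers are hypothetical, not from the question) shows the values:

```perl
use strict;
use warnings;

# Hypothetical input lines, not the data from the original question.
my @input = ( 'a', 'START', 'b', 'c', 'END', 'd' );

my @seen;
for (@input) {
    # In scalar context the range operator acts as a flip-flop tested against $_.
    my $seq = /^START$/ .. /^END$/;
    push @seen, "$_=" . ( $seq eq '' ? 'off' : $seq );
}

print join( ' ', @seen ), "\n";
```

    This prints a=off START=1 b=2 c=3 END=4E0 d=off, which is why matching /E0$/ identifies the closing line of a range.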

      This didn't run as fast as the previously posted solution. It took me a while to wrap my head around the range operator, but I modified the code as follows, and now it runs faster than Kc12349's solution, though not quite as fast as ikegami's:

      foreach ( split ( /^/m, $string ) ) {
          state $saw = 0;
          if ( my $seq = ( /^Heading Here:$/ .. /^Another Heading:$/
                        or /^Another Heading:$/ .. /^End of section:/ ) ) {
              redo if $seq =~ /E0$/;
              print $seq != 1 ? "  " : ! $saw++ ? "\n" : "";
          }
          print;
      }

      I do find this solution quite elegant though - thanks!
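
      For anyone wanting to reproduce timing comparisons like the ones mentioned in this thread, a rough sketch using the core Benchmark module follows. The sub names and the shortened sample input are assumptions, and only two of the approaches are included (the repeated substitution from the question and a flag-based single pass along the lines of ikegami's reply). Results will vary with input size and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical, shortened sample input modelled on the question's data.
my $input = <<'END_INPUT';
Key: Value
Heading Here:
Indent This: Stuff
By 2 spaces: Every line
Another Heading:
More: Stuff
End of section: Stop indenting
Random: Test
END_INPUT

# Repeated whole-string substitution, as in the original question.
sub indent_by_regex {
    my ($string) = @_;
    $string =~ s/^(Heading Here:)$/\n$1/m;
    while ( $string =~ s/^(Heading Here:.*\n)(\S.+)(\nAnother Heading:)$/$1  $2$3/gsm ) {}
    while ( $string =~ s/^(Another Heading:.*\n)(\S.+)(\nEnd of section:)/$1  $2$3/gsm ) {}
    return $string;
}

# Single pass over the lines with an indent flag.
sub indent_by_flag {
    my ($string) = @_;
    my ( $indent, $out ) = ( 0, '' );
    for my $line ( split /^/m, $string ) {
        if    ( $line =~ /^Heading Here:/ )    { $line = "\n$line"; $indent = 1 }
        elsif ( $line =~ /^Another Heading:/ ) { $indent = 1 }
        elsif ( $line =~ /^End of section:/ )  { $indent = 0 }
        elsif ($indent)                        { $line = "  $line" }
        $out .= $line;
    }
    return $out;
}

# Sanity check: both approaches must agree before timing them.
die "outputs differ" unless indent_by_regex($input) eq indent_by_flag($input);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    regex => sub { indent_by_regex($input) },
    flag  => sub { indent_by_flag($input) },
} );
```

      With input this small the differences are modest; a longer section to indent makes the cost of the repeated substitutions more visible.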