Regexp help

thewebsi has asked for the wisdom of the Perl Monks concerning the following question:

I have a string that looks like key: value pairs. A couple of the keys are "headings". I want to:
- Add a blank line above the first heading
- Indent everything from the 1st heading to the 2nd heading by 2 spaces
- Indent everything from the 2nd heading to the end of the section by 2 spaces
The heading names and end-of-section keys are known.

Here is what I came up with. It works as desired, but seems to me mighty inefficient as it traverses the string #-of-headings * #-of-items-to-indent times. I wonder if it can be done in fewer iterations...

#!/usr/bin/perl

local $/ = undef;
my $string = <DATA>;

$string =~ s/^(Heading Here:)$/\n$1/m;
while ( $string =~ s/^(Heading Here:.*\n)(\S.+)(\nAnother Heading:)$/$1  $2$3/gsm ) {}
while ( $string =~ s/^(Another Heading:.*\n)(\S.+)(\nEnd of section:)/$1  $2$3/gsm ) {}

print $string;

__DATA__
Key: Value
Item:
Another Item: Text
Etc Etc: Whatever
Heading Here:
Indent This: Stuff
By 2 spaces:
Every line: Until
The Next: Heading
Efficiently: If possible
Another Heading:
More:
Stuff:
To: Indent
End of section: Stop indenting
Random: Test
There: Is more stuff
Further down: The string

Output:

Key: Value
Item:
Another Item: Text
Etc Etc: Whatever

Heading Here:
  Indent This: Stuff
  By 2 spaces:
  Every line: Until
  The Next: Heading
  Efficiently: If possible
Another Heading:
  More:
  Stuff:
  To: Indent
End of section: Stop indenting
Random: Test
There: Is more stuff
Further down: The string

Arnon Weinberg Back2Front - The Web Site People


Replies are listed 'Best First'.
Re: Regexp help by ikegami (Patriarch) on Sep 16, 2011 at 19:33 UTC
Seems to me this would be quite easy to implement reading a line at a time. Just keep a flag indicating whether you should be indenting or not.

my $indent = 0;
while (<>) {
    if ( /^Heading Here:/ ) {
        $_ = "\n" . $_;    # blank line above the first heading
        $indent = 1;
    }
    elsif ( /^Another Heading:/ ) {
        $indent = 1;
    }
    elsif ( /^End of section:/ ) {
        $indent = 0;
    }
    elsif ( $indent ) {
        $_ = "  " . $_;    # indent by two spaces
    }
    print;
}
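Since the original question starts from a string already held in memory rather than from a filehandle, here is a minimal sketch of the same flag idea applied to a string. The sub name `indent_sections` and the shortened sample data are assumptions made for illustration, not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply the flag technique to a
# string held in memory and return the re-indented text.
sub indent_sections {
    my ($text) = @_;
    my $indent = 0;     # true between a heading and the end of the section
    my $out    = '';

    # split /^/ breaks the string into lines and keeps each trailing newline
    for my $line ( split /^/, $text ) {
        if ( $line =~ /^Heading Here:/ ) {
            $out .= "\n";       # blank line above the first heading
            $indent = 1;
        }
        elsif ( $line =~ /^Another Heading:/ ) {
            $indent = 1;
        }
        elsif ( $line =~ /^End of section:/ ) {
            $indent = 0;
        }
        elsif ($indent) {
            $out .= '  ';       # two-space indent for lines under a heading
        }
        $out .= $line;
    }
    return $out;
}

# Shortened sample in the same key: value shape as the question's data
my $sample = <<'END_SAMPLE';
Key: Value
Heading Here:
Indent This: Stuff
Another Heading:
More:
End of section: Stop indenting
Random: Test
END_SAMPLE

print indent_sections($sample);
```

Each line is examined exactly once, so the work grows with the number of lines rather than with headings times items.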
Re^2: Regexp help by Kc12349 (Monk) on Sep 16, 2011 at 19:45 UTC
I agree. I might even take it a step further and write it with an eye to maintainability, so that new headings can easily be added as needed.

use feature 'state';    # needed for the state variable below

my @new_line_strings = ( 'Heading Here', );
my @opening_strings  = ( 'Heading Here', 'Another Heading', );
my @closing_strings  = ( 'End of section', @opening_strings, );

my $new_line_pattern = '^(?:' . join('|', @new_line_strings) . ')';
my $opening_pattern  = '^(?:' . join('|', @opening_strings) . ')';
my $closing_pattern  = '^(?:' . join('|', @closing_strings) . ')';

while ( defined (my $line = <DATA> ) ) {
    state $indent_char = '';
    $indent_char = '' if $line =~ m/$closing_pattern/;     # headings and the end marker are not indented
    print "\n" if $line =~ m/$new_line_pattern/;
    print $indent_char . $line;
    $indent_char = '  ' if $line =~ m/$opening_pattern/;   # indent the lines that follow a heading
}
Re^3: Regexp help by Anonymous Monk on Sep 16, 2011 at 20:57 UTC
This tweak adds a newline before the first heading in a series:

while ( defined (my $line = <DATA> ) ) {
    state $indent_char = '';
    print "\n" if $line =~ m/$opening_pattern/ && ! $indent_char;
    $indent_char = '' if $line =~ m/$closing_pattern/;
    print $indent_char . $line;
    $indent_char = '  ' if $line =~ m/$opening_pattern/;
}
Re^4: Regexp help by thewebsi (Scribe) on Sep 16, 2011 at 21:41 UTC
Re: Regexp help by johngg (Canon) on Sep 16, 2011 at 21:53 UTC
Not answering your question but your use of `local $/ = undef;` is not particularly local as you have not confined its effect to a logical block of code. Rather, you have changed the value from that point forward in the script. It is better to restrict the localization to where it is required so as to avoid collateral damage further down the script where line by line input might be required.

You could employ a bare block

my $string;
{
    local $/;
    $string = <DATA>;
}

Or, perhaps better, a do block

my $string = do { local $/; <DATA>; };

I hope this is of interest.

Update: Corrected missing semi-colon, thanks AnomalousMonk.

Cheers, JohnGG
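To see the effect of confining `local $/` to a block, here is a small sketch. It reads from in-memory filehandles instead of `DATA` so that it is self-contained; the variable names and sample text are made up for illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# In-memory filehandles stand in for DATA so the sketch runs on its own.
open my $slurp_fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $whole = do { local $/; <$slurp_fh> };    # $/ is undef only inside the do block
close $slurp_fh;

# Outside the block $/ has its previous value again, so this read is line by line.
open my $line_fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @lines = <$line_fh>;
close $line_fh;

printf "slurped %d characters, then read %d lines\n", length($whole), scalar(@lines);
```

Had `local $/ = undef;` been left at file scope, the second read would also have returned the whole text as a single element.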
Re: Regexp help by Anonymous Monk on Sep 16, 2011 at 21:10 UTC
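This uses the range operator:

use strict;
use warnings;
use feature 'state';    # needed for the state variable below

my $head_re = qr/^(?:Heading Here|Another Heading):/;
my $end_re  = qr/^End of section:/;

while (<DATA>) {
    state $saw = 0;
    if ( my $seq = /$head_re/ ... /$head_re|$end_re/ ) {
        if ( $seq == 1 ) {
            s/^/\n/ unless $saw++;    # blank line above the first heading only
        }
        elsif ( $seq =~ /E0$/ ) {
            # Last line of the range: either a new heading or the end of the section
            $saw = 0 if /$end_re/;
            redo;                     # re-test this line so a heading can open a new range
        }
        else {
            s/^/  /;                  # indent by two spaces
        }
    }
    print;
}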
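The `$seq == 1` and `/E0$/` tests rely on what the range operator returns in scalar context: the empty string while the range is closed, then a sequence number starting at 1, with `E0` appended to the number on the line that closes the range. A minimal sketch follows; the `START`/`END` markers and the sub name `range_values` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect what the three-dot range operator returns
# for each line, using made-up START/END markers.
sub range_values {
    my @values;
    for (@_) {
        # Scalar-context range (flip-flop) matching against $_
        my $seq = /^START/ ... /^END/;
        push @values, $seq;
    }
    return @values;
}

my @values = range_values(qw(before START inside END after));

# Show the empty string (false) as a visible placeholder
print join( ', ', map { length($_) ? $_ : '(false)' } @values ), "\n";
# prints "(false), 1, 2, 3E0, (false)"
```

Because `3E0` is still numerically 3, the final value can be used as a number or matched as a string to detect the last line of the range.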
Re^2: Regexp help by thewebsi (Scribe) on Sep 17, 2011 at 07:57 UTC
This didn't run as fast as the previously posted solution. It took me a while to wrap my head around the range operator, but I modified the code as follows, and now it runs faster than Kc12349's solution, though not quite as fast as ikegami's:

foreach ( split( /^/m, $string ) ) {
    state $saw = 0;
    if ( my $seq = ( /^Heading Here:$/ .. /^Another Heading:$/
                  or /^Another Heading:$/ .. /^End of section:/ ) ) {
        redo if $seq =~ /E0$/;    # closing line: re-test it against the next range
        print $seq != 1 ? "  " : ! $saw++ ? "\n" : "";
    }
    print;
}

I do find this solution quite elegant though - thanks!

Arnon Weinberg Back2Front - The Web Site People
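For anyone wanting to repeat this kind of timing comparison, here is a rough sketch using the core Benchmark module. It compares only the repeated-substitution approach from the question with a line-at-a-time flag approach; the sub names, the shortened sample string, and the iteration count are assumptions, and results on such a tiny input say little about real data.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Shortened sample string (an assumption; real data would be much longer)
my $string = join '', map { "$_\n" }
    'Key: Value', 'Heading Here:', 'Indent This: Stuff', 'By 2 spaces:',
    'Another Heading:', 'More:', 'To: Indent',
    'End of section: Stop indenting', 'Random: Test';

# The repeated-substitution approach from the original question
sub by_substitution {
    my ($s) = @_;
    $s =~ s/^(Heading Here:)$/\n$1/m;
    while ( $s =~ s/^(Heading Here:.*\n)(\S.+)(\nAnother Heading:)$/$1  $2$3/gsm ) {}
    while ( $s =~ s/^(Another Heading:.*\n)(\S.+)(\nEnd of section:)/$1  $2$3/gsm ) {}
    return $s;
}

# A single pass over the lines with an indent flag
sub by_flag {
    my ($s) = @_;
    my ( $indent, $out ) = ( 0, '' );
    for my $line ( split /^/, $s ) {
        if    ( $line =~ /^Heading Here:/ )    { $out .= "\n"; $indent = 1 }
        elsif ( $line =~ /^Another Heading:/ ) { $indent = 1 }
        elsif ( $line =~ /^End of section:/ )  { $indent = 0 }
        elsif ($indent)                        { $out .= '  ' }
        $out .= $line;
    }
    return $out;
}

# Only compare timings if both approaches agree on the result
die "outputs differ" unless by_substitution($string) eq by_flag($string);

timethese( 20_000, {
    substitution => sub { by_substitution($string) },
    line_flag    => sub { by_flag($string) },
} );
```

Timings will vary with the Perl version, the machine, and above all the size and shape of the input, so it is worth running this against representative data before drawing conclusions.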