Re: Removing multiple trailing comment lines from a string

G'day eyepopslikeamosquito,

I can see what you've done to run the tests; however, I don't know how that translates to your real-world code. The solution I've provided below is substantially different from your code. The main differences are:

- In your code, you pass the entire INI file as a string to &get_section and parse it with a regex every time that function is called. In my solution, I read the INI file once, clean it up, and store the result in a hash (&get_clean_ini_data); &get_section now contains just a single statement, which reads the data from that hash.
- I've reduced your four whitespace removal regexes to a single regex: s/^\s*(.*?)\s*$/$1/.
- There's only one other regex (for capturing the section name): /^\[([^]]+)/.
- The removal of trailing comments is done by &strip_trailing_comments. This simply works backwards through a section's lines, removing comments until a non-comment line is found. The index function, rather than a regex, is used to identify these comments.
- I've also added a [WhitespaceSection] with test data for checking the whitespace cleanup.
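
To make the three building blocks listed above easier to try in isolation, here's a minimal sketch. The helper names (trim, section_name, is_comment) are hypothetical and only for illustration; the full script applies the same operations inline rather than through separate functions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace
# (including the newline) with the single regex from the post.
sub trim {
    my ($line) = @_;
    $line =~ s/^\s*(.*?)\s*$/$1/;
    return $line;
}

# Hypothetical helper: return the name from a "[Name]" header line,
# or undef if the line is not a section header.
sub section_name {
    my ($line) = @_;
    return $line =~ /^\[([^]]+)/ ? $1 : undef;
}

# Hypothetical helper: a trimmed line is a comment when ';' is its
# first character, i.e. index() finds it at position 0.
sub is_comment {
    my ($line) = @_;
    return index($line, ';') == 0;
}

printf "[%s]\n", trim("\t ; a comment \t\n");       # prints "[; a comment]"
printf "%s\n",   section_name('[MySection]');       # prints "MySection"
printf "%s\n",   is_comment('; x')   ? 'yes' : 'no';  # prints "yes"
printf "%s\n",   is_comment('a=b;c') ? 'yes' : 'no';  # prints "no"
```

Because the lines are trimmed first, checking for ';' at index 0 is enough; no regex is needed to allow for leading whitespace.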
You could probably adapt this solution to your real-world requirements by making the INI filename an argument to &get_clean_ini_data, adding an open statement, and changing <DATA> to <$ini_fh>. I think everything else should work as is.
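
As a rough sketch of that adaptation (untested against your real files), something like the following might do. Assumptions that are mine and not part of the script in this post: the use of autodie, the guard that ignores lines before the first section header, the more compact pop loop, and the temporary file used purely so the example runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;                   # open/close die with a message on failure
use File::Temp qw(tempfile);   # only needed for the self-contained demo

{
    my %section_lines_for;

    # Read $ini_file once and store each section's cleaned lines.
    sub get_clean_ini_data {
        my ($ini_file) = @_;
        my $current_section;

        open my $ini_fh, '<', $ini_file;

        while (<$ini_fh>) {
            s/^\s*(.*?)\s*$/$1/;
            next unless length;

            if (/^\[([^]]+)/) {
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            elsif (defined $current_section) {
                # Assumption: lines before the first header are ignored.
                push @{$section_lines_for{$current_section}}, $_;
            }
        }

        close $ini_fh;
        strip_trailing_comments($current_section);
        return;
    }

    # Compact variant: pop while the last line starts with ';'.
    sub strip_trailing_comments {
        my ($section) = @_;
        return unless defined $section;
        my $lines = $section_lines_for{$section} or return;
        pop @$lines while @$lines && 0 == index($lines->[-1], ';');
        return;
    }

    sub get_section { join "\n", @{ $section_lines_for{$_[0]} || [] } }
}

# Demo only: write a small INI file to a temporary location.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "[MySection]\n; comment\nfld2 = 42\n\n; heading for next\n[Other]\nx=1\n";
close $tmp_fh;

get_clean_ini_data($tmp_name);
print get_section('MySection'), "\n";   # prints "; comment" then "fld2 = 42"
```

In real use, you'd pass your own INI path to get_clean_ini_data instead of the temporary file.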

Here's "pm_1178405_ini_file_clean.pl":

#!/usr/bin/env perl

use strict;
use warnings;

get_clean_ini_data();

for (qw{MySection AnotherSection WhitespaceSection}) {
    print "Contents of '$_':\n", get_section($_), "\n";
}

{
    my %section_lines_for;

    sub get_clean_ini_data {
        my $current_section;

        while (<DATA>) {
            s/^\s*(.*?)\s*$/$1/;
            next unless length;

            if (/^\[([^]]+)/) {
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            elsif (defined $current_section) {
                # Lines before the first [Section] header are skipped.
                push @{$section_lines_for{$current_section}}, $_;
            }
        }

        strip_trailing_comments($current_section);
    }

    # Work backwards through a section's lines, popping comment lines
    # until the first non-comment line is found.
    sub strip_trailing_comments {
        my $section = shift;

        return unless defined $section;

        for my $i (reverse 0 .. $#{$section_lines_for{$section}}) {
            if (0 == index $section_lines_for{$section}[$i], ';') {
                pop @{$section_lines_for{$section}};
            }
            else {
                last;
            }
        }
    }

    sub get_section { join "\n", @{$section_lines_for{$_[0]}} }
}

__DATA__
[MySection]
; This is a comment line for MySection
fld1 = 'value of field 1' 
fld2 = 42

; This is the heading for AnotherSection
[AnotherSection]
; another comment
asfld=69

 ; Heading for WhitespaceSection
[WhitespaceSection]
    ; Comment starting with a tab
     ; Comment starting with a tab and a space
 ; Comment starting with a space
; Comment ending with a tab 
; Comment ending with a tab and a space  
; Comment ending with a space 
     ; tab+space+comment+space+tab  
    ; space+tab+comment+tab+space    

qwe=rty
asd=fgh
; trailing 1
    ; tab + trailing 2
 ; space + trailing 3
; trailing 4

Output:

$ pm_1178405_ini_file_clean.pl
Contents of 'MySection':
; This is a comment line for MySection
fld1 = 'value of field 1'
fld2 = 42
Contents of 'AnotherSection':
; another comment
asfld=69
Contents of 'WhitespaceSection':
; Comment starting with a tab
; Comment starting with a tab and a space
; Comment starting with a space
; Comment ending with a tab
; Comment ending with a tab and a space
; Comment ending with a space
; tab+space+comment+space+tab
; space+tab+comment+tab+space
qwe=rty
asd=fgh

Because whitespace is difficult to see (especially differentiating spaces from tabs), I passed the script and output through `cat -vet`. I used this for my own testing; you might also find it useful. The relevant parts are in the spoiler.
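
If `cat -vet` isn't available on your system, a rough Perl stand-in is sketched below. The show_whitespace name is hypothetical, and it only approximates the tab (^I) and end-of-line ($) markers; other non-printing characters are left untouched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: mark tabs and line ends the way `cat -vet` does.
# Other non-printing characters are not rewritten by this sketch.
sub show_whitespace {
    my ($text) = @_;
    $text =~ s/\t/^I/g;      # a tab becomes the two characters ^I
    $text =~ s/\n/\$\n/g;    # a '$' marks the end of each line
    return $text;
}

print show_whitespace("\t; tab + trailing 2 \n");   # prints "^I; tab + trailing 2 $"
```

This makes a stray trailing space visible as a gap before the `$`.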

— Ken
