G'day eyepopslikeamosquito,
I can see what you've done to run the tests;
however, I don't know how that translates to your real-world code.
The solution I've provided below is substantially different from your code.
The main differences are:
- In your code, you pass the entire INI file as a string to &get_section
  and parse it with a regex. You do this every time that function is called.
  In my solution, I read the INI file once, clean it up and store the result in a hash (&get_clean_ini_data).
  &get_section now contains only a single statement, which accesses the data in that hash.
- I've reduced your four whitespace removal regexes to a single regex: s/^\s*(.*?)\s*$/$1/.
- There's only one other regex (for capturing the section name): /^\[([^]]+)/.
- The removal of trailing comments is done by &strip_trailing_comments.
  This simply works backwards through a section's lines,
  removing comments until a non-comment line is found.
  The index function, rather than a regex, is used to identify these comments.
- I've also added a [WhitespaceSection] with test data for checking the whitespace cleanup.
- You could probably adapt this to your real-world requirements
  by making the INI filename an argument to &get_clean_ini_data,
  adding an open statement,
  and changing <DATA> to <$ini_fh>.
  I think everything else should work as is.
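Here's a rough, untested-against-your-data sketch of that adaptation. The subs are the same as in my full script further down, except that &get_clean_ini_data takes a filename, opens it, and reads from <$ini_fh>. The temporary file, its contents, and the names $tmp_fh and $tmp_file are my own assumptions, there only so the sketch runs by itself; I've also emptied the hash at the start of each call, which the original doesn't need to do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw{tempfile};

{
    my %section_lines_for;

    # Assumed variant: the INI filename is passed in as an argument.
    sub get_clean_ini_data {
        my ($ini_file) = @_;
        my $current_section;

        %section_lines_for = ();    # start afresh on each call

        open my $ini_fh, '<', $ini_file
            or die "Can't open '$ini_file' for reading: $!";

        while (<$ini_fh>) {
            # The trailing \s* also consumes the newline,
            # so neither chomp nor the -l switch is needed here.
            s/^\s*(.*?)\s*$/$1/;
            next unless length;

            if (/^\[([^]]+)/) {
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            else {
                push @{$section_lines_for{$current_section}}, $_;
            }
        }

        close $ini_fh;
        strip_trailing_comments($current_section);
        return;
    }

    # Unchanged: pop comment lines off the end of a section.
    sub strip_trailing_comments {
        my $section = shift;
        return unless defined $section;

        for my $i (reverse 0 .. $#{$section_lines_for{$section}}) {
            if (0 == index $section_lines_for{$section}[$i], ';') {
                pop @{$section_lines_for{$section}};
            }
            else {
                last;
            }
        }
    }

    # Unchanged: return a section's cleaned lines as one string.
    sub get_section { join "\n", @{$section_lines_for{$_[0]}} }
}

# Demo only: write a small INI file to a temporary location (made-up data).
my ($tmp_fh, $tmp_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "[MySection]\n; a comment\nfld2 = 42\n",
                "; heading for next\n[AnotherSection]\nasfld=69\n";
close $tmp_fh;

get_clean_ini_data($tmp_file);
print "Contents of '$_':\n", get_section($_), "\n"
    for qw{MySection AnotherSection};
```

In real code you'd drop the demo part and call get_clean_ini_data() with the path to your actual INI file.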
Here's "pm_1178405_ini_file_clean.pl":
#!/usr/bin/env perl -l

use strict;
use warnings;

get_clean_ini_data();

for (qw{MySection AnotherSection WhitespaceSection}) {
    print "Contents of '$_':\n", get_section($_);
}

{
    my %section_lines_for;

    sub get_clean_ini_data {
        my $current_section;

        while (<DATA>) {
            s/^\s*(.*?)\s*$/$1/;
            next unless length;

            if (/^\[([^]]+)/) {
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            else {
                push @{$section_lines_for{$current_section}}, $_;
            }
        }

        strip_trailing_comments($current_section);
    }

    sub strip_trailing_comments {
        my $section = shift;
        return unless defined $section;

        for my $i (reverse 0 .. $#{$section_lines_for{$section}}) {
            if (0 == index $section_lines_for{$section}[$i], ';') {
                pop @{$section_lines_for{$section}};
            }
            else {
                last;
            }
        }
    }

    sub get_section { join "\n", @{$section_lines_for{$_[0]}} }
}
__DATA__
[MySection]
; This is a comment line for MySection
fld1 = 'value of field 1'
fld2 = 42
; This is the heading for AnotherSection
[AnotherSection]
; another comment
asfld=69
; Heading for WhitespaceSection
[WhitespaceSection]
; Comment starting with a tab
; Comment starting with a tab and a space
; Comment starting with a space
; Comment ending with a tab
; Comment ending with a tab and a space
; Comment ending with a space
; tab+space+comment+space+tab
; space+tab+comment+tab+space
qwe=rty
asd=fgh
; trailing 1
; tab + trailing 2
; space + trailing 3
; trailing 4
Output:
$ pm_1178405_ini_file_clean.pl
Contents of 'MySection':
; This is a comment line for MySection
fld1 = 'value of field 1'
fld2 = 42
Contents of 'AnotherSection':
; another comment
asfld=69
Contents of 'WhitespaceSection':
; Comment starting with a tab
; Comment starting with a tab and a space
; Comment starting with a space
; Comment ending with a tab
; Comment ending with a tab and a space
; Comment ending with a space
; tab+space+comment+space+tab
; space+tab+comment+tab+space
qwe=rty
asd=fgh
Because whitespace is difficult to see (especially differentiating spaces from tabs),
I passed the script and output through `cat -vet`.
I used this for my own testing; you might also find it useful.
The relevant parts are in the spoiler.
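If you'd rather do that sort of check from within Perl, something along these lines might do. It's only a sketch: show_whitespace() and trim() are names I've made up for illustration. The first imitates just two things `cat -vet` does (tabs shown as ^I, and a $ marking the end of each line); the second wraps the whitespace cleanup regex from the script. The sample lines are made up to resemble the [WhitespaceSection] test data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: show tabs as ^I and mark the end of the line with $,
# roughly as `cat -vet` would.
sub show_whitespace {
    my ($line) = @_;
    (my $shown = $line) =~ s/\t/^I/g;
    return $shown . '$';
}

# Hypothetical helper: the single whitespace cleanup regex from the script.
sub trim {
    my ($line) = @_;
    $line =~ s/^\s*(.*?)\s*$/$1/;
    return $line;
}

my @samples = (
    "\t ; tab+space+comment+space+tab \t",
    " \t; space+tab+comment+tab+space\t ",
    "qwe=rty",
);

# Print each line before and after cleanup, with whitespace made visible.
for my $line (@samples) {
    printf "%-45s => %s\n",
        show_whitespace($line), show_whitespace(trim($line));
}
```

Any ^I, or any space next to the closing $, on the right-hand side would mean the cleanup had missed something.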