PerlMonks  

Re: Removing multiple trailing comment lines from a string

by kcott (Archbishop)
on Dec 23, 2016 at 17:19 UTC


in reply to Removing multiple trailing comment lines from a string

G'day eyepopslikeamosquito,

I can see what you've done to run the tests; however, I don't know how that translates to your real-world code. The solution I've provided below is substantially different from your code. The main differences are:

  • In your code, you pass the entire INI file as a string to &get_section and parse it with a regex. You do this every time that function is called. In my solution, I read the INI file once, clean it up and store the result in a hash (&get_clean_ini_data). &get_section now only contains a single statement which accesses the data in that hash.
  • I've reduced your four whitespace removal regexes to a single regex: s/^\s*(.*?)\s*$/$1/.
  • There's only one other regex (for capturing the section name): /^\[([^]]+)/.
  • The removal of trailing comments is done by &strip_trailing_comments. This simply works backwards through a section's lines, removing comments until a non-comment line is found. The index function, rather than a regex, is used to identify these comments.
  • I've also added a [WhitespaceSection] with test data for checking the whitespace cleanup.
  • You could probably adapt this to your real-world requirements by making the INI filename an argument to &get_clean_ini_data, adding an open statement, and changing <DATA> to <$ini_fh>. I think everything else should work as is.
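Here's a minimal sketch of that adaptation, assuming the filename is simply passed as the first argument to &get_clean_ini_data. The `open ... or die` error handling and the `|| []` guard in &get_section (so an unknown section yields an empty string rather than an attempt to dereference an undefined value) are my own additions; the parsing logic is the same as in the full script.

```perl
use strict;
use warnings;

{
    my %section_lines_for;

    # Variant of get_clean_ini_data that reads a named file instead of <DATA>.
    # Assumption: the INI filename is passed as the first argument.
    sub get_clean_ini_data {
        my ($ini_file) = @_;
        open my $ini_fh, '<', $ini_file
            or die "Can't open '$ini_file': $!";

        my $current_section;
        while (<$ini_fh>) {
            s/^\s*(.*?)\s*$/$1/;    # trim both ends (this also removes the newline)
            next unless length;     # skip lines that are now empty

            if (/^\[([^]]+)/) {     # section header: [SectionName]
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            else {
                push @{$section_lines_for{$current_section}}, $_;
            }
        }
        close $ini_fh;

        strip_trailing_comments($current_section);
    }

    # Pop comment lines (those starting with ';') off the end of a section,
    # stopping at the first non-comment line.
    sub strip_trailing_comments {
        my $section = shift;
        return unless defined $section;

        for my $i (reverse 0 .. $#{$section_lines_for{$section}}) {
            if (0 == index $section_lines_for{$section}[$i], ';') {
                pop @{$section_lines_for{$section}};
            }
            else {
                last;
            }
        }
    }

    # '|| []' is an addition: an unknown section returns '' instead of failing.
    sub get_section { join "\n", @{ $section_lines_for{$_[0]} || [] } }
}
```

Calling it would then look like `get_clean_ini_data('settings.ini');` (a placeholder filename) followed by `print get_section('MySection');` as before.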

Here's "pm_1178405_ini_file_clean.pl":

```perl
#!/usr/bin/env perl -l

# The -l switch sets $\ to "\n", so every print ends with a newline.
use strict;
use warnings;

get_clean_ini_data();

for (qw{MySection AnotherSection WhitespaceSection}) {
    print "Contents of '$_':\n", get_section($_);
}

{
    my %section_lines_for;

    # Read <DATA> once: trim whitespace, skip blank lines, and store each
    # section's lines (minus any trailing comments) in %section_lines_for.
    sub get_clean_ini_data {
        my $current_section;

        while (<DATA>) {
            s/^\s*(.*?)\s*$/$1/;    # trim both ends (this also removes the newline)
            next unless length;     # skip lines that are now empty

            if (/^\[([^]]+)/) {     # section header: [SectionName]
                my $new_section = $1;
                strip_trailing_comments($current_section);
                $current_section = $new_section;
            }
            else {
                push @{$section_lines_for{$current_section}}, $_;
            }
        }

        strip_trailing_comments($current_section);
    }

    # Work backwards through a section's lines, popping comments (lines
    # starting with ';') until the first non-comment line is reached.
    sub strip_trailing_comments {
        my $section = shift;
        return unless defined $section;

        for my $i (reverse 0 .. $#{$section_lines_for{$section}}) {
            if (0 == index $section_lines_for{$section}[$i], ';') {
                pop @{$section_lines_for{$section}};
            }
            else {
                last;
            }
        }
    }

    sub get_section { join "\n", @{$section_lines_for{$_[0]}} }
}

__DATA__
[MySection]
; This is a comment line for MySection
fld1 = 'value of field 1'
fld2 = 42
; This is the heading for AnotherSection
[AnotherSection]
; another comment
asfld=69
; Heading for WhitespaceSection
[WhitespaceSection]
	; Comment starting with a tab
	 ; Comment starting with a tab and a space
 ; Comment starting with a space
; Comment ending with a tab	
; Comment ending with a tab and a space	 
; Comment ending with a space 
	 ; tab+space+comment+space+tab 	
 	; space+tab+comment+tab+space	 
qwe=rty
asd=fgh
; trailing 1
	; tab + trailing 2
 ; space + trailing 3
; trailing 4
```

Output:

```
$ pm_1178405_ini_file_clean.pl
Contents of 'MySection':
; This is a comment line for MySection
fld1 = 'value of field 1'
fld2 = 42
Contents of 'AnotherSection':
; another comment
asfld=69
Contents of 'WhitespaceSection':
; Comment starting with a tab
; Comment starting with a tab and a space
; Comment starting with a space
; Comment ending with a tab
; Comment ending with a tab and a space
; Comment ending with a space
; tab+space+comment+space+tab
; space+tab+comment+tab+space
qwe=rty
asd=fgh
```

Because whitespace is difficult to see (especially differentiating spaces from tabs), I passed the script and output through `cat -vet`. I used this for my own testing; you might also find it useful. The relevant parts are in the spoiler.
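If you'd rather get a similar view from Perl itself, here's a rough sketch approximating what `cat -vet` shows: ASCII control characters (tabs included) in caret notation, so a tab appears as `^I` and a carriage return as `^M`, plus a `$` marking each line end. The `show_whitespace` name is my own, and unlike `cat -v` it makes no attempt at the `M-` notation for bytes above 127.

```perl
use strict;
use warnings;

# Approximate `cat -vet` for one line of input.
sub show_whitespace {
    my ($line) = @_;
    my $had_newline = $line =~ s/\n\z//;

    # Caret notation: control char XOR 64, so TAB (0x09) -> ^I,
    # CR (0x0D) -> ^M, DEL (0x7F) -> ^?
    $line =~ s/([\x00-\x1F\x7F])/'^' . chr(ord($1) ^ 64)/ge;

    # Like cat -e, mark the end of each line that had a newline with '$'.
    return $had_newline ? "$line\$\n" : $line;
}

# Only process input when file names are given on the command line.
if (@ARGV) {
    print show_whitespace($_) while <>;
}
```

For example, `perl show_ws.pl pm_1178405_ini_file_clean.pl`, where `show_ws.pl` is a hypothetical name for the saved sketch.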

— Ken

Re^2: Removing multiple trailing comment lines from a string
by eyepopslikeamosquito (Archbishop) on Dec 23, 2016 at 21:29 UTC

    I don't know how that translates to your real-world code. The solution I've provided below is substantially different from your code.
    It's early days yet and requirements are a bit unclear right now. I was after ideas for general approaches and you've provided some interesting and useful code. Thanks.
