PerlMonks  

It works fine (as long as the format does not change too much), however the more complex things that I want to do these kind of snippets tend to become very complex and difficult to maintain. ... I am looking for a very simple approach (search and replace, not reading the entire data file to memory)

It depends a lot on how far you can trust the configuration file to stick to a strict format. For example, if you can be absolutely certain that, as in your example, the opening and closing braces are always on a line by themselves, then it's possible to implement a fairly simple line-by-line parser that keeps the names of the currently open sections on a stack, so that you can tell apart nested sections that happen to share a name. I'm thinking something like the following:

```perl
use warnings;
use strict;

# $sep needs to be some char guaranteed not to be in the section name
# (not the most elegant solution, but this isn't meant to be)
my $sep = "\n";
my $target_sec   = "ObjectType1${sep}NestedObject";
my $target_param = "Param1";
my $replace_val  = "Hello";

my ($secname, @sec_stack);
my $cur_sec = '';  # empty at top level, so the "eq" below never sees undef
while (<DATA>) {
    # note: comments and blank lines are skipped here and so not printed
    next if /^\s*#/ || !/\S/;
    if ( /^ (\s*) (.+?) \s* = \s* (.+?) \s* $/x ) {    # Name = Value
        die "section name without a following block" if defined $secname;
        my ($indent,$param,$val) = ($1,$2,$3);
        if ($cur_sec eq $target_sec && $param eq $target_param) {
            $_ = $indent.$param.' = '.$replace_val."\n";
        }
    }
    elsif ( /^ \s* (\w+) \s* $/x ) {                   # section name
        die "two section names following one another" if defined $secname;
        $secname = $1;
    }
    elsif ( /^ \s* \{ \s* $/x ) {                      # block opens
        die "'{' without name" unless defined $secname;
        push @sec_stack, $secname;
        $cur_sec = join $sep, @sec_stack;
        $secname = undef;
    }
    elsif ( /^ \s* \} \s* $/x ) {                      # block closes
        die "section name without a following block" if defined $secname;
        die "'}' without '{'" unless @sec_stack;
        pop @sec_stack;
        $cur_sec = join $sep, @sec_stack;
    }
    else { die "Failed to parse '$_'" }
    print $_;
}

__DATA__
# comment
ObjectType1
{
  Param1 = Foo
  NestedObject
  {
    Param1 = Bar
  }
  # just another comment
}
ObjectType2
{
  Param1 = Quz
  NestedObject
  {
    Param1 = Baz
  }
}
```
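If you want to run a line-by-line filter like this against a file on disk rather than DATA, without slurping the file, one possible sketch is below. The helper name rewrite_file and the callback interface are my own invention for illustration, not something from a module; it assumes it's acceptable to write a temp file next to the original and rename it over the original at the end.

```perl
use warnings;
use strict;
use File::Temp qw/tempfile/;
use File::Basename qw/dirname/;

# Hypothetical helper: pass each line of $path through $filter (a code ref
# that receives one line and returns the text to write in its place),
# writing to a temp file in the same directory, then rename that file
# over the original. Only one line is held in memory at a time.
sub rewrite_file {
    my ($path, $filter) = @_;
    open my $in, '<', $path or die "open $path: $!";
    my ($out, $tmpname) = tempfile( DIR => dirname($path) );
    while ( my $line = <$in> ) {
        print {$out} $filter->($line);
    }
    close $in;
    close $out or die "close $tmpname: $!";
    rename $tmpname, $path or die "rename $tmpname -> $path: $!";
    return;
}

# Example call (only runs if a filename is given on the command line);
# the substitution here is just a placeholder for real per-line logic.
rewrite_file( $ARGV[0], sub { my $l = shift; $l =~ s/\bFoo\b/Hello/; $l } )
    if @ARGV;
```

The section-stack logic from the loop above would live inside the filter as a closure over @sec_stack and friends. Note that the temp file gets File::Temp's default permissions, so you may want to copy the original file's mode before the rename.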

But once things start getting more complex, I'd recommend a "real" parser instead. You can check the Config:: namespace on CPAN to see if any module happens to match your config format. 500k lines isn't all that much to read into memory at once, IMO, unless you're running on a really memory-restricted machine. In the worst case, you can write a parser yourself, e.g. using the m/\G.../gc technique (there's an example in the Perl docs in perlop under "\G assertion"), or using a full grammar (Parse::RecDescent, Regexp::Grammars, Marpa::R2, ...).
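To show the m/\G.../gc idea in isolation, here's a minimal sketch of a tokenizer. The sub name tokenize and the two token categories are made up for illustration and deliberately much simpler than a real config format: each regex is anchored at the current pos() with \G, and the /c modifier keeps pos() where it is when a match fails, so the next alternative can be tried at the same spot.

```perl
use warnings;
use strict;

# Hypothetical helper (not from any module): split a string into tokens.
# Every pattern starts with \G, so it can only match exactly at pos($str);
# /g advances pos($str) on success, /c leaves it untouched on failure.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    pos($str) = 0;
    while ( pos($str) < length($str) ) {
        if    ( $str =~ /\G \s+       /gcx ) { }   # skip whitespace
        elsif ( $str =~ /\G ( [{}=] ) /gcx ) { push @tokens, [ punct => $1 ] }
        elsif ( $str =~ /\G ( \w+ )   /gcx ) { push @tokens, [ word  => $1 ] }
        else  { die "Unexpected input at offset " . pos($str) . "\n" }
    }
    return @tokens;
}

my @tokens = tokenize("Name { Param = Value }");
print join( ' ', map { "$_->[0]:$_->[1]" } @tokens ), "\n";
# prints: word:Name punct:{ word:Param punct:= word:Value punct:}
```

The final else branch is the important part of the pattern: if none of the alternatives matches at the current position, the input is something the parser doesn't understand, and it's better to die than to silently skip it.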

Here's a solution using m/\G.../gc, followed by a Regexp::Grammars example (the latter only parses; it doesn't do the replacement). In both, I've made some assumptions about the file format, such as that a Name = Value pair must appear on a single line by itself, that the section names may or may not contain whitespace, and so on (I've chosen slightly different rules in each). What I like about these kinds of solutions is that they're "just" regular expressions, and as long as one can deal with those, they should hopefully be understandable.

```perl
use warnings;
use strict;
use Data::Compare qw/Compare/;

my @target_block = ('Object Type1','NestedObject');
my $target_param = 'Param 1';
my $new_val      = 'Hello!';

my $data = do { local $/; <DATA> };
my @stack;
pos($data)=0;
while ( pos($data)<length($data) ) {
    use re '/msx';
    my $repl;
    if ( $data=~m{\G ^ \h* \# [^\n]* (?:\z|\n) }gc ) {}  # comment, nothing to do
    elsif ( $data=~m{\G \s* ( \w(?:[\w\h]*\w)? ) \s* \{ \h*\n* }gc ) {
        push @stack, $1;
    }
    elsif ( $data=~m{\G
            (?<pre>   ^\h* )
            (?<name>  [^\n=]+? )
            (?<mid>   \h*=\h* )
            (?<value> [^\n]+? )
            (?<post>  \h*(?:\z|\n) )
        }gc ) {
        if ( Compare(\@stack,\@target_block) && $+{name} eq $target_param ) {
            $repl = $+{pre}.$+{name}.$+{mid}.$new_val.$+{post};
        }
    }
    elsif ( $data=~m{\G \s* \} \h*\n* }gc ) {
        die "'}' with no opening '{'?" unless @stack;
        pop @stack;
    }
    else {
        die "Failed to parse at: \""
            .substr($data, pos $data, 50)."...\"";
    }
    print $repl//substr($data, $-[0], $+[0]-$-[0]);
}

__DATA__
# comment
Object Type1 {
  Param1 = Foo
  NestedObject {
    Param 1 = Bar
  }
  # just another comment
}
# comment
ObjectType2
{
  Param1 = Quz = z
  NestedObject
  {
    Param1 = Baz
  }
}
```

```perl
use warnings;
use strict;
use Regexp::Grammars;

my @blockstack;
my $grammar = do {
    use Regexp::Grammars;
    qr{
        \A (?: <.comment> | <[confblock]> )* \z

        <rule: confblock>
            ^ <blockname=([^\s\{\}=]+)>
            \{
                (?: <[param]> | <[confblock]> | <.comment> )*
            \} <.ws>

        <rule: param>
            ^ <name=([^\n=]+?)> = <value=([^\n]+?)> (?:\n|\z)

        <token: comment>
            ^ \h* \# [^\n]* (?:\n|\z)
    }xms
};

my $data = do { local $/; <DATA> };
$data =~ $grammar or die "failed to parse";
my %conf = %/; #/
use Data::Dump;
dd \%conf;

__DATA__
# comment
ObjectType1 {
  Param1 = Foo
  NestedObject {
    Param1 = Bar
  }
  # just another comment
}
# comment
ObjectType2
{
  Param1 = Quz
  NestedObject
  {
    Param1 = Baz
  }
}
```

In reply to Re: Contextual find and replace large config file by haukex
in thread Contextual find and replace large config file by Veltro
