Contextual find and replace large config file

Veltro has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Contextual find and replace large config file by haukex (Archbishop) on Jan 02, 2019 at 17:27 UTC
> It works fine (as long as the format does not change too much), however the more complex things that I want to do these kind of snippets tend to become very complex and difficult to maintain. ... I am looking for a very simple approach (search and replace, not reading the entire data file to memory)

It depends a lot on how much you can trust how strict the configuration file format is. For example, if you can be absolutely certain that, like in your example, the opening and closing braces are always on a line by themselves, then it'd be possible to implement a fairly simple line-by-line parser that keeps the names of the current sections on a stack, so that you can differentiate between different nested sections that happen to have the same name. I'm thinking something like the following:

[Example code (2 kB)]

But once things start getting more complex, I'd recommend a "real" parser instead. You can check the Config:: namespace to see if there happen to be any modules that match your config format. 500k lines isn't all too much to read into memory at once, IMO, unless you're running on some really memory-restricted machine. In the worst case, you can write a parser yourself, e.g. using the `m/\G.../gc` technique (there's one example in the Perl docs in perlop under "`\G` assertion"), or using a full grammar (Parse::RecDescent, Regexp::Grammars, Marpa::R2, ...).

Here's a solution using `m/\G.../gc`, followed by a Regexp::Grammars example (the latter only parses, it doesn't do the replacement). In both, I've made some assumptions about the file format, such as that a `Name = Value` pair must appear on a single line by itself, that the section names may or may not contain whitespace, and so on (I've chosen slightly different rules in both). What I like about these kinds of solutions is that they're "just" regular expressions, and as long as one can deal with those, it should hopefully be understandable.

[Example code (3 kB)]
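For illustration, the line-by-line idea described above (a stack of open section names) could look roughly like the sketch below. This is not haukex's code, which is not shown here. The sub name `replace_in_sections`, the `/`-joined section path and the shape of the `$changes` hash are assumptions made up for this example. It also assumes that every brace sits on a line by itself and that a section name is the last plain-word line before its opening brace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite "Name = Value" lines, but only inside the
# section whose full path (e.g. "ObjectType1/NestedObject") is a key of
# $changes. Lines are streamed one by one, nothing is slurped.
sub replace_in_sections {
    my ($in, $out, $changes) = @_;
    my @stack;      # names of the currently open sections
    my $pending;    # last plain-word line seen, waiting for its "{"
    while ( my $line = <$in> ) {
        if ( $line =~ /^\h*\{\h*$/ ) {              # opening brace on its own line
            die "Line $.: '{' without a section name\n" unless defined $pending;
            push @stack, $pending;
            undef $pending;
        }
        elsif ( $line =~ /^\h*\}\h*$/ ) {           # closing brace on its own line
            die "Line $.: unbalanced '}'\n" unless @stack;
            pop @stack;
        }
        elsif ( $line =~ /^(\h*)(\w[\w ]*?)\h*=/ ) {   # Name = Value
            my ($indent, $name) = ($1, $2);
            my $section = $changes->{ join '/', @stack };
            $line = "$indent$name = $section->{$name}\n"
                if $section && defined $section->{$name};
        }
        elsif ( $line =~ /^\h*(\w[\w ]*?)\h*$/ ) {  # candidate section name
            $pending = $1;
        }
        print {$out} $line;                          # everything else passes through
    }
    die "Unclosed section(s): @stack\n" if @stack;
    return;
}

# Small demonstration on an in-memory "file"
my $config = <<'END';
ObjectType1
{
  Param1 = 8
  NestedObject
  {
    Param1 = 3
  }
}
END

open my $in,  '<', \$config    or die $!;
open my $out, '>', \my $result or die $!;
replace_in_sections( $in, $out, { 'ObjectType1' => { Param1 => 0 } } );
close $out;
print $result;
```

Because the lookup key is the whole path, the `Param1` inside `NestedObject` is left alone even though it has the same name as the one being replaced.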
Re^2: Contextual find and replace large config file by Veltro (Hermit) on Jan 03, 2019 at 10:57 UTC
This is great stuff, haukex.

I think that using Regexp::Grammars is probably the best solution, however I am getting this YACC feeling over me and think this kind of thing is programming on an entirely different level. So currently I am looking at your second approach, which I think will offer me the flexibility that I am looking for. Actually, I think this will help me to take this even one step further and build a more advanced configuration which will allow me to specify a filter and formulas to act on parameters. And for this I am thinking along the same lines as LanX (using a cache, separating functionality into functions, etc.).

I understand about 95% of the code, but I am still struggling with some of the regex items, which are:

- Why `(?:\z|\n)` and not just `\z`, when `\z` is 'up to and including `\n`'?
- Why `\h*\n` and not `\s*`?

Thanks for your elaborate post.
Re^3: Contextual find and replace large config file by haukex (Archbishop) on Jan 03, 2019 at 12:40 UTC
> Why `(?:\z|\n)` and not just `\z` when `\z` is 'up to and including `\n`'

Not quite, `\z` only ever matches at the very end of the string, whereas `\Z` also matches before the newline at the end of the string, and the meaning of `$` is changed by the `/m` modifier to match before every newline or at the end of the string. When I want to express "match up to the end of this line", I sometimes prefer `(?:\z|\n)` over `$`+`/m` because the former explicitly consumes the `\n`.

> Why `\h*\n` and not `\s*`

Because `/\s*/` would also match e.g. `\t\n\t`, which causes a following `/^.../` to no longer match, since `/\s*/` consumed the `\t` at the beginning of the line.

Update: Regarding the first point:

    $ perl -MData::Dump -e 'dd split /($)/m, "x\ny\nz"'
    ("x", "", "\ny", "", "\nz")
    $ perl -MData::Dump -e 'dd split /(\z|\n)/m, "x\ny\nz"'
    ("x", "\n", "y", "\n", "z")
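The second point can be checked with a small self-contained sketch. The sub name `next_line_matches` and the sample text are made up for this illustration. The sub consumes one line with either `\h*\n` or `\s*` and then tests whether a `\G^...` match on the following line still succeeds.

```perl
use strict;
use warnings;

# Hypothetical helper: consume "Foo" plus whatever $trailing matches, then
# report whether the next line can still be matched from its very beginning.
sub next_line_matches {
    my ($text, $trailing) = @_;
    pos($text) = 0;
    $text =~ /\GFoo$trailing/gc or return 0;       # consume the first line
    return $text =~ /\G^\tBar/mgc ? 1 : 0;         # ^ needs pos right after "\n"
}

my $text = "Foo \t\n\tBar = 1\n";

# \h*\n stops right after the newline, so ^ still matches on the next line
print "with \\h*\\n: ", next_line_matches( $text, qr/\h*\n/ ), "\n";

# \s* also eats the newline and the tab that starts the next line,
# so pos is in the middle of that line and ^ can no longer match
print "with \\s*:    ", next_line_matches( $text, qr/\s*/ ), "\n";
```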
Re: Contextual find and replace large config file by tybalt89 (Monsignor) on Jan 02, 2019 at 19:26 UTC
"The data files often don't have any official format." -> Then it's hopeless and you should give up. :) Or The following program works for your test case #2 (and some things you might have missed). You should only have to change the "configuration section" to alter different things, after, of course, fixing it to actually read and write files. If it doesn't work on one of your large files, please show a small failed test case, and we'll see what we can do :) #!/usr/bin/perl # https://perlmonks.org/?node_id=1227916 use strict; use warnings; ##################### configuration section my $section = 'ObjectType1'; my %changes = ( Param1 => 0, Param2 => 'SomeOtherText', Param3 => 'Foo +bar'); ##################### end configuration section my $allkeys = join '\|', keys %changes; my $pattern = qr/\b($allkeys)\b/; local $/ = "\n}\n"; while( <DATA> ) { if( /\b$section\b/ ) { my @context; print $& while @context && $context[-1] eq $section && /\G(\h$pattern = ).\n/ +gc ? "$1$changes{$2}\n" =~ /./s : @context && /\G\h\}\n/gc ? pop @context : /\G\h([\w ]+)\n\h\{\n/gc ? push @context, $1 : /\G.*\n/gc; } else { print; } } __DATA__ ObjectType1 { Param1 = 8 NestedObject { Param1 = 3 Param2 = SomeText } Param2 = SomeText } ObjectType2 { Foo { Param1 = StaySame ObjectType1 { Param3 = ReplaceThis } } } ObjectType1 { ... } [download] Outputs: `ObjectType1 { Param1 = 0 NestedObject { Param1 = 3 Param2 = SomeText } Param2 = SomeOtherText } ObjectType2 { Foo { Param1 = StaySame ObjectType1 { Param3 = Foobar } } } ObjectType1 { ... }` [download] I'm also curious about benchmark times vs any other solution (since I'm not going to generate a 500000 line test file).	[reply] [d/l] [select]
Re: Contextual find and replace large config file by kschwab (Vicar) on Jan 02, 2019 at 18:34 UTC
"Does anyone know of a better or more generic way to do these kind of things?" There's lots of choices for config files. JSON and YAML are popular. Your second example is pretty close to JSON already. It would look like this as JSON: `{ "ObjectType1": { "Param1": 8, "Param2": "SomeText", "NestedObject": { "Param1": 3, "Param2": "SomeText" } }, "ObjectType2": { "Param1": 10 } }` [download] There are perl modules to parse JSON, some streaming, if you really can't load it all into memory. There's also a really nice command line utility called "jq", see some examples here. Note that JSON doesn't support comments, which is probably the biggest complaint about it as a configuration file format.	[reply] [d/l]
Re^2: Contextual find and replace large config file by LanX (Saint) on Jan 03, 2019 at 23:09 UTC
> Note that JSON doesn't support comments, which is probably the biggest complaint about it as a configuration file format.

I never noticed this, most probably because I never came into a situation to need it.

What's surprising me is that JSON historically started as eval'ed JS object, so why did they skip the comment feature? Especially since CSS inherited JS comments too.

So I did some research to find out that Douglas Crockford disabled it deliberately, because he wanted to prevent people from hiding data there. ...

... well, Douglas again. :/

Anyway, for config purposes I'd try to split up the data into multiple JSON chunks and comment them, or resort to YAML, which allows JSON as a subset.

    ---
    # Comment
    { "name": "John Smith", "age": 33 }

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
Re^2: Contextual find and replace large config file by karlgoethebier (Abbot) on Jan 03, 2019 at 17:23 UTC
"... JSON doesn't support comments..." It does if you treat them as data: `#!/usr/bin/env perl use strict; use warnings; use JSON::Tiny qw(decode_json encode_json); use Data::Dump; my $conf = encode_json { foo => qw(bar), nose => qw(cuke), comment => qw(RTFM) }; my $hash = decode_json($conf); dd $hash; __END__ { comment => "RTFM", foo => "bar", nose => "cuke" }` [download] Best regards, Karl �The Crux of the Biscuit is the Apostrophe� `perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`Help	[reply] [d/l] [select]
Re^3: Contextual find and replace large config file by choroba (Cardinal) on Jan 03, 2019 at 17:41 UTC
Comments in most languages can appear anywhere where insignificant whitespace is possible. Your approach can't transform structures that comment both on the keys and values, as in

    { "name" /* represented as "shortname" in the DB */ : "John Doe" /* full name */,

`map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]`
Re^4: Contextual find and replace large config file by karlgoethebier (Abbot) on Jan 04, 2019 at 14:36 UTC
Re: Contextual find and replace large config file by LanX (Saint) on Jan 02, 2019 at 21:52 UTC
These are several different questions.

First let me warn you that your code has an error. This will fail if you don't care about indentation:

    if ( $line =~ /ObjectType1/ ) {
        $context = "ObjectType1" ;
    }
    if ( $line =~ /\}/ ) {
        $context = "" ;
    }

Here you rather want to test for `/^\}/` at the line's start!

My suggestions:

- separate parsing of syntax logic from processing of semantic logic
- parse all lines of an object into a cache (a string or nested hashes) before handling it
- with nested objects use recursion
- keep track of the indentation level, like counting open and closed braces
- you should handle parsing errors in case the input is corrupted
- use functions and packages instead of piling up `if` cases
- use a function dispatcher if you need to handle semantics of different "ObjectTypes"

Like this you will get reusable and maintainable code!

edit: some may miss example code, but you got a generic answer for a generic question. Feel free to pick some points and ask for clarification.

Cheers Rolf
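Two of the points above, the function dispatcher and counting braces to catch corrupted input, could be sketched as follows. The handler table, the sub names `process_object` and `check_braces`, and the changes the handlers make are all hypothetical. The sketch assumes an object has already been parsed into a hash ref and that braces sit on lines by themselves.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one handler per "ObjectType". Each handler
# gets the parsed parameters of one object and returns them modified.
my %handler_for = (
    ObjectType1 => sub { my ($obj) = @_; $obj->{Param1} = 0; return $obj },
    ObjectType2 => sub {
        my ($obj) = @_;
        $obj->{Param2} = 'SomeOtherText' if exists $obj->{Param2};
        return $obj;
    },
);

# Semantics live in the handlers; types without a handler pass through.
sub process_object {
    my ($type, $obj) = @_;
    my $handler = $handler_for{$type} or return $obj;
    return $handler->($obj);
}

# Syntax check: count opening and closing braces, die on corrupted input.
sub check_braces {
    my ($text) = @_;
    my $depth = 0;
    for my $line ( split /\n/, $text ) {
        $depth++ if $line =~ /^\h*\{\h*$/;
        $depth-- if $line =~ /^\h*\}\h*$/;
        die "Unbalanced '}'\n" if $depth < 0;
    }
    die "Missing $depth closing brace(s)\n" if $depth;
    return 1;
}

check_braces("ObjectType1\n{\n  Param1 = 8\n}\n");
my $changed = process_object( 'ObjectType1', { Param1 => 8 } );
print "Param1 is now $changed->{Param1}\n";
```

Adding support for another object type then means adding one entry to the table instead of another `if` branch.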
Re^2: Contextual find and replace large config file by Veltro (Hermit) on Jan 03, 2019 at 11:06 UTC
Hi LanX,

Yes, I was actually aware of that error; since you mentioned it, I edited the OP.

It is not strictly necessary to provide example code (plus others have already done so). I am just trying to redesign some code and trying to find a different approach, so your generic answer is welcome of course.

The only thing I am unsure about is your first suggestion (separate parsing...semantic logic). What do you mean with that? Do you mean parsing and gathering data first and then splitting the processing of that data into different function blocks, or something else?

Thanks, Veltro
Re^3: Contextual find and replace large config file by LanX (Saint) on Jan 03, 2019 at 14:49 UTC
> (separate parsing...semantic logic). What do you mean with that?

Your two examples seem to hold the same information (semantic) while having different format (syntax).

So better write parsers for the different formats which "cache" them in an intermediate format. These parsers should be ignorant about the meaning, just concentrating on correctness.

The semantics - the meaning of the data - could be handled by one central module which only operates on the intermediate format. This module could be re`use`d for all formats.

A possible intermediate format could be nested hashes:

    $cache = {
        ObjectType1 => {
            Param1       => 8,
            Param2       => "SomeText",
            NestedObject => {
                Param1 => 3,
                Param2 => "SomeText"
            }
        }
    };

Of course this highly depends on the nature of your data, like does order matter? Are repeated elements allowed? Using nested arrays may be better then¹

And after transforming your data you can also have emitter modules to write them into a new out file. Like this you are even capable of transforming between different formats, or adding new ones.

HTH! :)

edit NB: this approach is also useful when handling only one input format, because you can cleanly separate code, hence much better maintain it.

update ¹) or a mix of hashes and arrays. Or even using Perl objects, blessing elements into different "ObjectTypes", ...

Cheers Rolf
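A minimal sketch of the semantics and emitter halves of this split, working only on the nested-hash intermediate format, might look like this. The sub names `apply_changes` and `emit` are invented for the example, and the emitter assumes the brace-on-its-own-line syntax from the thread. Since plain hashes are used, keys are written in sorted order and the original order is lost, which is exactly the caveat about arrays mentioned above.

```perl
use strict;
use warnings;

# Intermediate format: plain nested hashes, as produced by some parser
my $cache = {
    ObjectType1 => {
        Param1       => 8,
        Param2       => "SomeText",
        NestedObject => { Param1 => 3, Param2 => "SomeText" },
    },
};

# Semantics: knows what to change, knows nothing about file syntax.
sub apply_changes {
    my ($data, $changes) = @_;
    for my $key ( keys %$changes ) {
        if ( ref( $changes->{$key} ) eq 'HASH' ) {
            next unless ref( $data->{$key} ) eq 'HASH';   # no such section
            apply_changes( $data->{$key}, $changes->{$key} );
        }
        else {
            $data->{$key} = $changes->{$key};
        }
    }
    return $data;
}

# Emitter: knows the brace syntax, knows nothing about meaning.
sub emit {
    my ($data, $depth) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;
    my $out = '';
    for my $key ( sort keys %$data ) {
        my $value = $data->{$key};
        $out .= ref($value) eq 'HASH'
            ? "${pad}$key\n${pad}{\n" . emit( $value, $depth + 1 ) . "${pad}}\n"
            : "${pad}$key = $value\n";
    }
    return $out;
}

apply_changes( $cache, { ObjectType1 => { Param1 => 0 } } );
print emit($cache);
```

A second emitter (for JSON, say) could be added without touching `apply_changes`, which is the point of keeping the two concerns apart.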
Re: Contextual find and replace large config file by tybalt89 (Monsignor) on Jan 03, 2019 at 15:32 UTC
Here's a version that uses a HoH to allow multiple changes to multiple contexts in one pass.

    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1227916

    use strict;
    use warnings;
    $SIG{__WARN__} = sub {die @_};

    ##################### configuration section

    my %changes = (
      ObjectType1 => { Param1 => 0, Param2 => 'SomeOtherText' },
      ObjectType4 => { Param3 => 'Replacement' },
      Foo => { Param2 => 'FooChanged' },
      );

    ##################### end configuration section

    my $allcontexts = join '|', sort keys %changes;
    my $contextpattern = qr/\b($allcontexts)\b/;
    my %patterns;    # one parameter-name pattern per context
    for my $section (keys %changes)
      {
      my $all = join '|', keys %{ $changes{$section} };
      $patterns{$section} = qr/\b($all)\b/;
      }

    local $/ = "\n}\n";    # read one top-level object per record
    while( <DATA> )
      {
      if( /$contextpattern/ )
        {
        my @context;       # stack of open section names
        print $& while
          @context && $patterns{$context[-1]} &&
            /\G(\h*$patterns{$context[-1]} = ).*\n/gc
            ? "$1$changes{$context[-1]}{$2}\n" =~ /.*/s :
          @context && /\G\h*\}\n/gc ? pop @context :
          /\G\h*([\w ]+)\n\h*\{\n/gc ? push @context, $1 :
          /\G.*\n/gc;
        }
      else
        {
        print;
        }
      }

    __DATA__
    ObjectType1
    {
      Param1 = 8
      NestedObject
      {
        Param1 = 3
        Param2 = SomeText
      }
      Param2 = SomeText
    }
    ObjectType2
    {
      Foo
      {
        Param1 = StaySame
        Param2 = FooChange
        ObjectType4
        {
          Param1 = DoNotReplaceThis
          Param3 = ReplaceThis
        }
      }
    }
    ObjectType1
    {
      Param1 = ReplaceThis
      Param3 = DoNotReplaceThis
      Foo
      {
        Param1 = StaySame
        ObjectType4
        {
          Param1 = DoNotReplaceThis
          Param3 = ReplaceThis
        }
      }
    }
Re: Contextual find and replace large config file by Veltro (Hermit) on Jan 05, 2019 at 11:35 UTC
Thanks again for your input everyone. With your help I am now able to change a foreign datafile like:

    # comment
    GlobalParam = 1
    Object Type1
    {
      Param1 = Foo
      NestedObject
      {
        Param 1 = Bar
      }
      # just another comment
    }
    # comment
    ObjectType2
    {
      Param1 = Quz = z
      Param2 = 3
      NestedObjectX
      {
        Param1 = Baz
        NestedObjectZ
        {
          Param1 = Baz
        }
      }
      NestedObjectY
      {
        Param1 = 5
      }
    }

by applying a filter like:

    [
      [
        # Filter
        {
          'Object Type1' => {
            'Param1' => [ "Foo" ],
          },
          'GlobalParam' => [ '1' ],
          # 'Junk' => [ 'more junk' ], # Will break the filter
        },
        # Changes
        {
          'Object Type1' => {
            'NestedObject' => {
              'Param 1' => "\"Box\"",
            },
          },
        }
      ],
      [
        # Filter
        {
          'Object Type1' => {
            'Param1' => [ "Foo" ],
            'NestedObject' => {
              'Param 1' => [ "Box" ],
            },
          },
          # 'GlobalParam' => [ '2' ], # Will disable this filter,
          #                           # but first filter is still
          #                           # applied
        },
        # Changes
        {
          'Object Type1' => {
            'NestedObject' => {
              'Param 1' => "\$curVal . \" Baz\"",
            },
          },
        }
      ],
      [
        # Filter
        {
          'ObjectType2' => {
            'Param2' => [ '1', '2', '3' ],
          },
        },
        # Changes
        {
          'ObjectType2' => {
            'NestedObjectY' => {
              'Param1' => "\$curVal * 2",
            },
          },
        }
      ],
    ] ;

Which changes the configured parameters into:

    # comment
    GlobalParam = 1
    Object Type1
    {
      Param1 = Foo
      NestedObject
      {
        Param 1 = Box Baz
      }
      # just another comment
    }
    # comment
    ObjectType2
    {
      Param1 = Quz = z
      Param2 = 3
      NestedObjectX
      {
        Param1 = Baz
        NestedObjectZ
        {
          Param1 = Baz
        }
      }
      NestedObjectY
      {
        Param1 = 10
      }
    }

edit 2019 Jan 07: Without further testing of this particular program I have removed a '`^`' from `my $re_comment = qr/ ^ \h* \# [^\n]* \n / ;` and `qr/ (?<pre> ^\h* )` because it was killing the performance of this program.

Code, if you want it: [Read more... (13 kB)]
Re: Contextual find and replace large config file by trippledubs (Deacon) on Jan 08, 2019 at 19:41 UTC
Not sure if this is too much or too little for you to plug in, but it was fun to learn some Parse::RecDescent. I could not figure out how to get the array list as the hash I wanted except to use unroll. Each parsing module requires its own learning investment, judging from just browsing Regexp::Grammars from haukex's answer. If you need such a thing.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_ERRORS = 1;
    $::RD_WARN = 1;
    $::RD_HINT = 1;
    #$::RD_TRACE = 1;
    #$::RD_AUTOACTION = q { print Dumper \@item };

    my $grammar = q{
        {
            use Data::Dumper;
            # merge a list of single-level hash refs into one hash ref
            sub unroll {
                my @list = @{$_[0]};
                my $unrolled;
                for my $href (@list) {
                    for my $key (keys %{$href}) {
                        $unrolled->{$key} = $href->{$key};
                    }
                }
                return $unrolled;
            };
        }
        Expression: Object(s) { $return = unroll($item[1]) }
        Object: String '{' Param(s) '}' { $return = { $item[1] => unroll($item[3]) } }
        Param: String '=' String { $return = { $item[1] => $item[3] } }
             | Object(s) { $return = unroll($item[1]) }
        String: /[\w\d]+/ { $return = $item[1] }
    };

    my $parser = Parse::RecDescent->new($grammar);
    my $text = do { local $/; <DATA> };    # slurp the whole DATA section
    my $tree = $parser->Expression($text) or die "Parse failed\n";
    $tree->{ObjectType1}{NestedObject}{DeeplyNested}{Param60} = 'tuna';
    print Dumper $tree;

    __DATA__
    ObjectType1
    {
        Param1 = 8
        Param2 = SomeText
        NestedObject
        {
            Param1 = 3
            Param2 = MoreText
            DeeplyNested
            {
                Param50 = 500
                Param60 = squid
            }
        }
    }
    ObjectType2
    {
        Param1 = 3
        Param2 = 40
    }

