John007 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks !!Very Happy New Year!!
I need help regarding below pattern matching. I have a text file with below data and I need to find and replace data for customer 2 for bakjob1 and bakjob2.
I need to start my search at a specific line to find and replace customer 2 in bakjob1 and bakjob2. If I search using normal regx then multiple line match for same keyword.
I could use
perl -ne 'print if /bakjob1_details/ .. /bakjob2_details/' filename
to restrict my search to few lines but that also does not help me here. I need something which helps me in starting my search from a specific line. Pleaes help...thanks in advance
bakjob1_details = { credit = { customer1= 2000.0, customer2 = -1500.0 customer3 = 0.0, }, debit = { customer1= 50000.0, customer2 = -2000.0, customer3 = 0.0, } }, bakjob2_details = { credit = { customer1= 1000.0, customer2 = 200.0, customer3 = 500.0, }, debit = { customer1= 600.0, customer2 = 659.0, customer3 = 887.0, } }

Replies are listed 'Best First'.
Re: Pattern Matching - a Tricky one
by kyle (Abbot) on Jan 02, 2009 at 16:12 UTC

    I don't recommend this solution because it uses string eval, it changes the order of the output data (and a little bit of the format), it reads the entire input file to do its work, and I wrote it without the aid of my usual morning chemicals. I'm not proud. The TIMTOWTDI made me do it.

    On the plus side, it doesn't do anything so perverse as try to parse a nested data structure with regular expressions, and it uses recursion, if you're into that sort of thing.

Re: Pattern Matching - a Tricky one
by kdj (Friar) on Jan 02, 2009 at 15:38 UTC

    A solution using seek may be better, but I came up with this for finding the customer2 data:

    #!/usr/local/bin/perl use strict; use warnings; while ( <DATA> ) { next if ($_ !~ /bakjob(\d)_details/); print "bakjob$1:\n"; my $found = 0; while ($found < 2) { my $line = <DATA>; if ($line =~ /customer2/) { $found++; my @customer2_data = split(/=/, $line); print "$found- $customer2_data[1]"; } } } END; __DATA__ bakjob1_details = { credit = { customer1= 2000.0, customer2 = -1500.0 customer3 = 0.0, }, debit = { customer1= 50000.0, customer2 = -2000.0, customer3 = 0.0, } }, bakjob2_details = { credit = { customer1= 1000.0, customer2 = 200.0, customer3 = 500.0, }, debit = { customer1= 600.0, customer2 = 659.0, customer3 = 887.0, } }

    Prints:

    % testing.pl bakjob1: 1- -1500.0 2- -2000.0, bakjob2: 1- 200.0, 2- 659.0,
      my $found = 0; while ($found < 2) { my $line = <DATA>; if ($line =~ /customer2/) { $found++; my @customer2_data = split(/=/, $line); print "$found- $customer2_data[1]"; } }

      Don't you run into a infinite loop with this? In case $found is less than 2 and <DATA> has reached its end-of-file, there will be no more matches and $found won't be incremented... So it will loop quite a while ;o))

      So, maybe this would be better:

      my $found = 0; while ( $found < 2 and my $line = <DATA> ) { if ( $line =~ m/customer2/ ) { # ... } }

      Even if this case only appears, if the input data is corrupt, I would try to avoid such construct.

        Ah, good call on the infinite loop. Sometimes I forget that you can't trust your input.

        There's a bit more I could have done with this, such as detecting/capturing whether I was in a credit or a debit section, as that would presumably be good to know. But, this has the basics at least.

Re: Pattern Matching - a Tricky one
by pobocks (Chaplain) on Jan 02, 2009 at 16:08 UTC

    I'd suggest matching after the unique id...

    This works, but is oogly.

    #!/usr/bin/perl @chumptastic = <DATA>; $chumpstain = join "\n", @chumptastic; $chumpstain =~ m/bakjob2_details .+? credit .+? # The ? makes these minimal matches; # thus catching the nearest match. customer2\ =\ (-?\d+\.?\d*),/sx; # Modified s to treat it as one string # (. matches newlines) print "$1\n"; __DATA__ # Your Data goes here

    Compressed, the regex looks like this: /bakjob2_details.+?credit.+?customer2 = (-?\d+\.?\d*),/s

    for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";
Re: Pattern Matching - a Tricky one
by linuxer (Curate) on Jan 02, 2009 at 17:46 UTC

    My idea for your problem looks like this:

    #!/usr/bin/perl use strict; use warnings; my $datafile = 'account.data'; my $searched_customer = 'customer2'; # modifications of values my %modify = ( credit => -100, debit => 250, ); my %indent = ( bakjob => ' 'x2, type => ' 'x4, customer => ' 'x6, ); # my %re = ( bakjob => qr{^$indent{bakjob}bakjob(\d+)_details}, type => qr{^ (debit|credit) =}, ); #open my $fh, '<', $datafile or die $!; my $fh = *DATA; # don't start before this line my $line_num = 4; while ( my $line = <$fh> ) { # only parse complete bakjob blocks after line $line_num if ( $. > $line_num && $line =~ m/$re{bakjob}/ .. $line =~ m/^$indent{bakjob}},?/ ) { # parse internal structure if ( $line =~ m/$re{type}/ ) { print $line; parse( $fh, $1 ); } else { print $line; } } # maybe this is wanted, if you need to get all lines printed: # if ( $. < $line_num ) { # print $line; # elsif ( $line =~ m/$re{bakjob}/ .. $line =~ m/^$indent{bakjob}}, +?/ ) { # # parse internal structure # .... # } # else { # print $line; # } } sub parse { my ( $fh, $type ) = @_; while ( my $line = <$fh> ) { chomp $line; my ( $customer, $value ) = split m{\s*=\s*|,}, $line; $customer =~ s/^\s+//; if ( $customer eq $searched_customer ) { printf "$indent{customer}%s = %.1f\n", $customer, $value+$ +modify{$type}; } else { print $line, "\n"; } last if $line =~ m/^$indent{type}},?/; } } __DATA__ bakjob1_details = { credit = { customer1= 2000.0, customer2 = -1500.0 customer3 = 0.0, }, debit = { customer1= 50000.0, customer2 = -2000.0, customer3 = 0.0, } }, bakjob2_details = { credit = { customer1= 1000.0, customer2 = 200.0, customer3 = 500.0, }, debit = { customer1= 600.0, customer2 = 659.0, customer3 = 887.0, } }

    updates:

    1. typo fixed; thanks kyle
Re: Pattern Matching - a Tricky one
by toolic (Bishop) on Jan 02, 2009 at 21:27 UTC
    If you have control over the syntax of your text data file, read further; otherwise, stop reading now.

    I do not recognize the syntax of your data being any standard syntax. However, if you could format your data according to some standard syntax, such as XML, then you could leverage the power of existing parsers, such as the CPAN module XML::Twig

Re: Pattern Matching - a Tricky one
by kennethk (Abbot) on Jan 02, 2009 at 15:15 UTC

    If you need to work with a specific known line number and have a known record width, perhaps you should consider using seek to control your input rather than reading in the entire file. This could also be used to estimate a start position to avoid a fair bit of reading time. Otherwise, a sample of your input file might be helpful (or is that the hash you provided? - if so, do you have the option of changing your input format?).

    Update: Upon further consideration, did you not mean a given line number by "specific line"? Is "bakjob1_details" a unique identifier? Does every record header (and only record headers) end with "_details"? It seems like this should all be done in a database, but assuming that the answer to all the above is yes, the following should function:

Re: Pattern Matching - a Tricky one
by Perlbotics (Archbishop) on Jan 03, 2009 at 00:05 UTC

    A fuzzy approach...

    Advantage: may handle large files (no slurping). 'Regexp's have to handle a single line only.
    Disadvantage: bakjob*_details, credit/debit, customer2 have to appear on distinct lines (fragile, relies on proper input).

    use strict; my %replacement = ("bakjob1_details.debit" => 123.45, "bakjob2_details.credit" => -543.21 ); sub new_value { my ($key, $value) = @_; if (defined $replacement{$key}) { $value = $replacement{$key}; } else { warn "WARNING: '$key' unmodified ($value)\n"; } return $value; } my ($bjob, $payment); while ( <DATA> ) { $bjob = $1, next if /\b(bakjob[12]_details)\b/; $payment = $1, next if /\b(credit|debit)\b/; # next unless /\bcustomer2\b/; #speed-up? s{^ (\s* customer2 \s* = \s*) (\S+?) (\s*?,?\s*) $ } { $1 . new_value("$bjob.$payment",$2) . $3 }ex; } continue { print; } __DATA__ bakjob1_details = { credit = { customer1= 2000.0, customer2 = -1500.0 # comma missing? customer3 = 0.0, }, debit = { customer1= 50000.0, customer2 = -2000.0, customer3 = 0.0, } }, bakjob2_details = { credit = { customer1= 1000.0, customer2 = 200.0, customer3 = 500.0, }, debit = { customer1= 600.0, customer2 = 659.0, customer3 = 887.0, } }