maxamillionk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I want to work with a file formatted like this:

ASDF {
tmp
plz_match
tmp
}
string2 {
tmp
}
string3 {
tmp
plz_match
tmp
}

I want to know if some arbitrary string (plz_match) exists within one of those brace-delimited blocks. I am interested only in the block called ASDF and do not want to scan the other blocks.

I'm not sure why this regex keeps telling me there is no match. I have tried all manner of variations and I am obviously missing something...

#!/usr/bin/perl
use warnings;
use strict;

my $file = "/path/to/file.txt";
local $/; # added after post
open FILE, '<', $file;
my $content = <FILE>;
close FILE;

if ( $content =~ m/(?<=ASDF {)(.*)plz_match(.*?)(?=})/s ) {
    print "Matched: |$`<$&>$'|\n";
} else {
    print "No match: |$content|\n";
}

Re: Matching a string in a parenthesized block (regex help)
by LanX (Saint) on Mar 05, 2021 at 22:40 UTC
    The best answer depends on the things you haven't told us, like
    • are blocks never nested?
    • do they always finish in a single } per line?

    If both are true, use the flip-flop operator .. to match the start and end of a block.

    Use a normal regex to match the insides.

    edit

    if( /block-start/ .. /block-end/ ) {
        $block .= $line;
        $hit = 1 if /match-plz/;
    } else {
        print $block if $hit;
        $block = $hit = undef; # reset
    }

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I must admit my knowledge is not advanced enough to understand the significance of the .= operator here. What is the reason behind adding strings to a string here? Not sure how to implement this solution.
        I already linked to the docs for the Flip-Flop operator

        Here is an implementation

        Please note how ...

        • it avoids slurping the whole (potentially huge) file into RAM
        • it's self documenting (well better than one big regex)
        • you can now easily add more complicated tests when maintaining

        use strict;
        use warnings;

        my $section;
        my $hit;

        while (<DATA>) {
            my $start = /^ASDF \{\s*$/;    #(2)
            my $end   = /^\}\s*$/;

            if ($start .. $end) {
                $section .= $_;
                $hit = 1 if /foo_match/;
            }

            if ($end and $hit) {
                print $section;
                $section = $hit = "";      # reset (1)
            }
        }

        __DATA__
        ASDF {
        tmp
        foo_match
        tmp
        }
        string2 {
        tmp
        }
        ASDF {
        tmp
        bar_match
        tmp
        }

        NB:

        • 1) you can also exit instead of resetting
        • 2) allowing potential "invisible" whitespace \s* at the end makes it more robust

        Cheers Rolf

Re: Matching a string in a parenthesized block (regex help)
by jwkrahn (Abbot) on Mar 06, 2021 at 00:21 UTC
    $ echo "ASDF {
    tmp
    plz_match
    tmp
    }
    string2 {
    tmp
    }
    string3 {
    tmp
    }
    " | perl -e'
    local $/ = "}\n";
    while ( <> ) {
        if ( /^ASDF/ && /plz_match/ ) {
            print "Matched: $_";
            ++$match;
        }
    }
    print "No match\n" unless $match;
    '
    Matched: ASDF {
    tmp
    plz_match
    tmp
    }

      I tried implementing your solution here:

      use warnings;
      use strict;

      my $file = "/path/to/file.txt";

      sub has_word {
          my $arg = $_[0];
          local $/;
          open FILE, '<', $file;
          while ( <FILE> ) {
              if ( /^ASDF_$arg/ && /magic/ ) {
                  close FILE;
                  return 1;
              } else {
                  close FILE;
                  return 0;
              }
          }
      }

      sub main {
          if (has_word("ONE")) {
              print "ONE already has the word.\n";
          } else {
              print "ONE does not have the word.\n";
          }
          if (has_word("TWO")) {
              print "TWO already has the word.\n";
          } else {
              print "TWO does not have the word.\n";
          }
      }

      main;

      Content of file in this particular case:

      ASDF_ONE {
      magic
      tmp
      tmp
      }
      ASDF_TWO {
      tmp
      magic
      tmp
      }
      string3 {
      tmp
      tmp
      magic
      }

      The output is not what I expect:

      ONE already has the word.
      TWO does not have the word.

      Indeed, all the sections in this case have the word.

        >     local $/;

        That's not how it works. His solution is based on setting $INPUT_RECORD_SEPARATOR ($/) to the closing brace "}\n" (though "\n}" would be better in the case of trailing whitespace).

        Cheers Rolf
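
The difference between a bare `local $/;` and a block-oriented separator can be seen in a minimal sketch like the one below. It assumes the data has the same shape as the file in the question; the `$text` variable and the `read_records` helper are made up for illustration and read from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical sample data, same shape as the file in the question.
my $text = "ASDF_ONE {\nmagic\ntmp\n}\nASDF_TWO {\ntmp\nmagic\n}\nstring3 {\ntmp\n}\n";

# Hypothetical helper: read $text through an in-memory filehandle,
# splitting records on the given separator.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;    # undef means "read everything as one record"
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @slurped = read_records(undef);      # what a bare "local $/;" does
my @blocks  = read_records("\n}\n");    # one record per brace block

printf "undef separator: %d record(s)\n", scalar @slurped;
printf "block separator: %d record(s)\n", scalar @blocks;
```

With an undefined separator the `while` loop sees the whole file as a single record, so `/^ASDF_TWO/` (without `/m`) can only be tested against the very start of the file. With the block separator each record starts at a block name, which is what the `^` anchor relies on.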

        This seems to work:

        $ cat file.txt
        ASDF_ONE {
        magic
        tmp
        tmp
        }
        ASDF_TWO {
        tmp
        magic
        tmp
        }
        string3 {
        tmp
        tmp
        magic
        }

        #!/usr/bin/perl
        use warnings;
        use strict;
        use feature 'state';

        my $file_name = 'file.txt';

        sub get_file_data {
            state $data;
            unless ( length $data ) {
                open my $FH, '<', $file_name or die "Cannot open '$file_name' because: $!";
                my $read = read $FH, $data, -s $FH;
                $read == -s _ or die "Error reading '$file_name'";
            }
            return $data;
        }

        sub has_word {
            my $query = shift;
            my $file = get_file_data();
            local $/ = "\n}\n";
            open my $FH, '<', \$file;
            while ( <$FH> ) {
                if ( /^ASDF_\Q$query/ && /magic/ ) {
                    return 1;
                }
            }
            return;
        }

        if ( has_word( 'ONE' ) ) {
            print "ONE already has the word.\n";
        } else {
            print "ONE does not have the word.\n";
        }
        if ( has_word( 'TWO' ) ) {
            print "TWO already has the word.\n";
        } else {
            print "TWO does not have the word.\n";
        }

        And it produces this output:

        $ perl 11129184.pl
        ONE already has the word.
        TWO already has the word.

Re: Matching a string in a parenthesized block (regex help)
by hippo (Archbishop) on Mar 06, 2021 at 11:01 UTC

    This works for me. See also How to ask better questions using Test::More and sample data.

    use strict;
    use warnings;
    use Test::More tests => 2;

    my $in = <<EOT;
    ASDF {
    tmp
    foo_match
    tmp
    }
    string2 {
    tmp
    }
    string3 {
    tmp
    bar_match
    tmp
    }
    EOT

    my $re = '^ASDF {[^{]*foo_match[^}]*}';
    like $in, qr/$re/m, 'foo_match found in ASDF';
    $re =~ s/foo/bar/;
    unlike $in, qr/$re/m, 'bar_match not found in ASDF although present in string3';

    🦛

Re: Matching a string in a parenthesized block (regex help)
by LanX (Saint) on Mar 05, 2021 at 23:59 UTC
    I played around with your regex, what exactly is wrong, except that you didn't slurp it all into $file?

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $file = "/path/to/file.txt";
    local $/; # added after post
    my $content = <DATA>;

    if ( $content =~ m/(ASDF \{)(.*?)plz_match(.*?)(\})/s ) {
        print "Matched: <<< $& >>>\n";
    } else {
        print "No match: |$content|\n";
    }

    __DATA__
    ASDF {
    tmp
    plz_match
    tmp
    }
    string2 {
    tmp
    }
    string3 {
    tmp
    }

    Matched: <<< ASDF {
    tmp
    plz_match
    tmp
    } >>>

    Cheers Rolf

      Now that you mention it, I think the slurp mode fixed my little program.

      *edit*

      Actually no, it seems to match greedily all the way down to the end of the file... I will check the other responses. For instance, it looks outside ASDF and will match the other blocks if they contain plz_match.

        Look at my regex, I made both .*? non-greedy.

        jwkrahn's solution isn't bad either, if your records are that consistent.

        Edit

        Though [^}]*? is certainly better for more complex input.

        Cheers Rolf
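
To illustrate why a negated character class is safer than a non-greedy `.*?` here, the following is a small sketch using made-up input in which the ASDF block does not contain plz_match but a later block does. The variable names are illustrative only, and the braces are escaped in both patterns.

```perl
use strict;
use warnings;

# Hypothetical input: ASDF has no plz_match, but string3 does.
my $content = "ASDF {\ntmp\n}\nstring3 {\ntmp\nplz_match\ntmp\n}\n";

# Non-greedy dot with /s: it prefers short matches, but it is still
# allowed to run past the first closing brace if that is the only way to match.
my $dot_version = $content =~ m/ASDF \{.*?plz_match.*?\}/s ? 1 : 0;

# Negated character class: [^}]* cannot cross a closing brace,
# so the match is confined to the ASDF block.
my $class_version = $content =~ m/ASDF \{[^}]*plz_match[^}]*\}/ ? 1 : 0;

print "dot version matched:   $dot_version\n";      # prints 1 (false positive)
print "class version matched: $class_version\n";    # prints 0
```

Non-greedy quantifiers only change which match is tried first; they do not stop the engine from extending the match into the following blocks. This sketch assumes blocks are never nested, as asked earlier in the thread.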

Re: Matching a string in a parenthesized block (regex help)
by haukex (Archbishop) on Mar 06, 2021 at 03:58 UTC
Re: Matching a string in a parenthesized block (regex help)
by maxamillionk (Acolyte) on Mar 05, 2021 at 22:33 UTC

    Ah darn it I forgot

    local $/;

    That's one mistake...
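
For reference, the effect of the missing `local $/;` can be reproduced with a short sketch like this one. It uses an in-memory filehandle and made-up data in place of the real file, so the variable names here are assumptions rather than part of the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of the real file.
my $text = "ASDF {\ntmp\nplz_match\ntmp\n}\nstring2 {\ntmp\n}\n";

# Default $/ is "\n": a scalar read returns only the first line.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $first_line = <$fh>;
close $fh;

# With $/ undefined inside the do-block, the same read returns everything.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $whole_file = do { local $/; <$fh> };
close $fh;

print "line mode sees plz_match:  ", ( $first_line =~ /plz_match/ ? "yes" : "no" ), "\n";
print "slurp mode sees plz_match: ", ( $whole_file =~ /plz_match/ ? "yes" : "no" ), "\n";
```

In line mode the regex is only ever applied to the first line, which would explain a "No match" result regardless of how the pattern is written.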