PerlMonks  

Regex matching after ASCII characters

by Anonymous Monk
on Oct 05, 2011 at 20:15 UTC ( [id://929873]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Thank you in advance for answering my question. I have been searching all afternoon for the correct regex to use, and I feel the need to escalate the issue.

I am searching through a series of files. The first 6 characters of each filename are guaranteed to be A-Z, the next two characters are any combination of A-Z and 0-9, and the following two characters are A-Z. It is these last two characters that I am trying to match. I am running into problems where I accidentally match characters from the first part of the filename. E.g., if I were searching for TL, STTLWA02TL01_2011-10-05.00.00.00.txt would correctly match, but STTLWA02HJ01_2011-10-05.00.00.00.txt would erroneously match.

So my question is: how can I match my desired characters after the 8 initial characters?

This regex is obviously insufficient: if $prompt_host = "TL", it will match anywhere in the filename.

    # Populate @traffic_file_list
    while (my $file = readdir(DIR)) {
        if ($file =~ $prompt_host) {
            push(@traffic_file_list, $file);
        }
    }

Thank you!
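To make the false positive concrete, here is a minimal sketch that compares the unanchored match from the question with a match anchored to the start of the filename. The helper name `host_matches` is hypothetical, and the sketch assumes the search code is a plain two-letter string (it is escaped with `\Q...\E`, so it is not treated as a pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: true if the two characters right after the
# 8-character prefix (6 letters, then 2 letters or digits) equal $code.
# Assumption: $code is a literal string, so it is quoted with \Q...\E.
sub host_matches {
    my ($file, $code) = @_;
    return $file =~ m{ \A [A-Z]{6} [A-Z0-9]{2} \Q$code\E }x ? 1 : 0;
}

# The two filenames from the question.
my @files = (
    'STTLWA02TL01_2011-10-05.00.00.00.txt',
    'STTLWA02HJ01_2011-10-05.00.00.00.txt',
);

for my $file (@files) {
    my $unanchored = $file =~ /TL/                ? 'match' : 'no match';
    my $anchored   = host_matches($file, 'TL')    ? 'match' : 'no match';
    print "$file: unanchored=$unanchored, anchored=$anchored\n";
}
```

The unanchored test accepts both names because "TL" also occurs inside "STTL"; the anchored version accepts only the first.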

Replies are listed 'Best First'.
Re: Regex matching after ASCII characters
by suaveant (Parson) on Oct 05, 2011 at 20:31 UTC
    $file =~ m{ \A [A-Z]{6} [A-Z0-9]{2} $prompt_host }xms;

    But it wouldn't be a bad idea to do some sanity checks on $prompt_host before injecting user-entered text into your regexp.

                    - Ant
                    - Some of my best work - (1 2 3)
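One possible shape for the sanity check suggested above is an allow-list: accept the input only if it looks like one or more two-letter codes separated by `|`, and refuse everything else. This is a sketch under that assumption, not the only reasonable policy; the helper name `build_host_rx` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex for the filename layout in the
# question, or nothing (undef in scalar context) if $input is not made of
# uppercase two-letter codes separated by '|'.
sub build_host_rx {
    my ($input) = @_;
    return unless defined $input
        && $input =~ m{ \A [A-Z]{2} (?: \| [A-Z]{2} )* \z }x;

    # The (?: ... ) group keeps the alternation attached to the prefix.
    return qr{ \A [A-Z]{6} [A-Z0-9]{2} (?: $input ) }x;
}

for my $input ('TL', 'BR|GR', '++', 'T.') {
    my $rx = build_host_rx($input);
    print "'$input': ", (defined $rx ? 'accepted' : 'rejected'), "\n";
}
```

Because only letters and `|` get through, the input never reaches the regex engine with quantifiers, code blocks, or other metacharacters in it.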

      Thanks for your quick response. I tried your code and am getting the same false positives.

      Example: with $prompt_host = "BR|GR", filenames such as YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt and GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt are erroneously being pushed. Here is how I implemented your code:

      while (my $file = readdir(DIR)) {
          if ($file =~ m{ \A [A-Z0-9]{8} $prompt_host }xms) {
              push(@traffic_file_list, $file);
          }
      }
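        The problem with your code is the "BR|GR". Alternation has the lowest precedence in a regex, so after interpolation Perl will match either of these two alternatives:
        \A [A-Z0-9]{8} BR
        OR:
        GR
        The second one matches anywhere in the filename. The solution would be to wrap the alternation in a non-capturing group, i.e. set $prompt_host to:
        $prompt_host = '(?:BR|GR)';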
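The precedence issue can be checked directly with the two filenames from the reply above. In this sketch the helper names `ungrouped_match` and `grouped_match` are hypothetical, and the third filename is an invented example added only to show a case that should match.

```perl
use strict;
use warnings;

# Interpolates the alternation as-is: "prefix BR" OR "GR anywhere".
sub ungrouped_match {
    my ($file, $alt) = @_;
    return $file =~ m{ \A [A-Z0-9]{8} $alt }xms ? 1 : 0;
}

# Wraps the alternation in (?: ... ) so both branches follow the prefix.
sub grouped_match {
    my ($file, $alt) = @_;
    return $file =~ m{ \A [A-Z0-9]{8} (?: $alt ) }xms ? 1 : 0;
}

my @files = (
    'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt',
    'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt',
    'ABCDEF01GR01-1.1.1.1_2011.10.04.00.00.00.txt',   # invented: GR at position 9-10
);

for my $file (@files) {
    printf "%s ungrouped=%d grouped=%d\n",
        $file, ungrouped_match($file, 'BR|GR'), grouped_match($file, 'BR|GR');
}
```

Without the group, all three names match because each contains "GR" somewhere; with the group, only the invented third name does.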

        And with some sanity checking:

        >perl -wMstrict -le
        "my @t = (
           'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt',
           'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt',
           );
         ;;
         ENTRY:
         for my $entry ('BR|RG', 'FI', '(?{ `rm -R *` })', '++') {
           my $rx = eval { qr{ \A [A-Z]{6} [A-Z\d]{2} (?: $entry) }xms };
           if ($@) {
             print qq{user entered '$entry' is evil: $@};
             next ENTRY;
             }
           for my $t (@t, @ARGV) {
             printf qq{%7s %-3smatch with '$t' \n},
               qq{'$entry'}, $t =~ $rx ? '' : 'NO'
               ;
             }
           }
        "
        "STTLWA02RG01_2011-10-05.00.00.00.txt"
        'BR|RG' NO match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
        'BR|RG' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
        'BR|RG'    match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
           'FI'    match with 'YUGRABCKFI01-1.1.1.1_2011.10.04.00.00.00.txt'
           'FI' NO match with 'GRREPCCOBE10-1.1.1.1_2011.10.04.00.00.00.txt'
           'FI' NO match with 'STTLWA02RG01_2011-10-05.00.00.00.txt'
        user entered '(?{ `rm -R *` })' is evil: Eval-group not allowed at runtime, use re 'eval' in regex m/ \A [A-Z]{6} [A-Z\d]{2} (?: (?{ `rm -R *` })) / at ...
        user entered '++' is evil: Quantifier follows nothing in regex; marked by <-- HERE in m/ \A [A-Z]{6} [A-Z\d]{2} (?: + <-- HERE +) / at ...
