Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I need to check whether an account number is part of a file name and, if it is, do some processing, but I can't find the best way of doing this. Here is some sample code that simulates what I am trying to do:
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "000231263444_01_XY_20130110_061717.txt";
#my $filename = "17034513_01_WQ_20130511_053551.txt";

$filename =~ /(^\w+)_(\w{1,2})_(\w{1,2})_(\w+)_(\w+)\.txt$/i;

my $accountnumber = $1;

# test condition
#my $accountnumber = "0";

print "\n *$accountnumber* \n";

#if($filename=~/$accountnumber/gi) {
if($accountnumber=~/$filename/gi) {
    print "\n Found - *$accountnumber* - *$filename*\n";
}else{
    print "\n Not Found - *$accountnumber* - *$filename*\n";
}
Thanks for looking!

Replies are listed 'Best First'.
Re: Checking number in file name
by hdb (Monsignor) on May 19, 2013 at 15:06 UTC

    If you want to check whether the account number is part of the filename, it has to be

    if( $filename =~ /$accountnumber/gi ) {

    and not the other way round. But it seems you have tested that already. You might want to add further checks, for example that there is no digit immediately before or after the account number. Otherwise there could be a match if the account number is part of another number in the filename.

    Also, if you are after digits only, use \d, not \w.
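
    A minimal sketch of the stricter check described above, assuming the account number is a run of digits. The helper name contains_account is hypothetical. \Q...\E quotes any regex metacharacters in the interpolated value, and the two lookarounds reject a hit that is only part of a longer number:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true if $account occurs in $filename and is not
    # embedded in a longer run of digits.
    sub contains_account {
        my ($filename, $account) = @_;
        return $filename =~ /(?<!\d)\Q$account\E(?!\d)/ ? 1 : 0;
    }

    my $filename = "000231263444_01_XY_20130110_061717.txt";

    # "2312" occurs only inside the longer number, so it should be rejected.
    for my $account ("000231263444", "2312") {
        my $verdict = contains_account($filename, $account) ? "Found" : "Not Found";
        print "$verdict - *$account* - *$filename*\n";
    }
    ```

    The /g and /i modifiers are left out on purpose: a single yes/no test on digits needs neither.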

Re: Checking number in file name
by space_monk (Chaplain) on May 19, 2013 at 15:07 UTC

    Whilst (\w+) will also match digits, you're probably better off using ^(\d+); it is also better to put the start anchor outside the brackets. Note too that "\w" accepts "_" characters, so if you know a filename is going to follow a certain pattern, you are best off tying it down as tightly as possible.

    Note also that you probably need to look at greedy and non-greedy matching: that initial (\w+) may be getting all the characters.
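
    A small sketch of the difference, using a made-up file name whose first field itself contains an underscore (the name is an assumption for illustration, not one of the real files):

    ```perl
    use strict;
    use warnings;

    # Hypothetical file name: the first field contains an underscore.
    my $filename = "AB_CD_01_XY_20130110_061717.txt";

    # Greedy: (\w+) takes as much as it can while the rest still matches.
    my ($greedy) = $filename =~ /^(\w+)_(\w{1,2})_(\w{1,2})_(\w+)_(\w+)\.txt$/;

    # Non-greedy: (\w+?) takes as little as it can.
    my ($lazy) = $filename =~ /^(\w+?)_(\w{1,2})_(\w{1,2})_(\w+)_(\w+)\.txt$/;

    # A character class without "_" cannot run past the first underscore.
    my ($strict) = $filename =~ /^([A-Za-z0-9]+)_/;

    print "greedy: $greedy\n";   # AB_CD
    print "lazy:   $lazy\n";     # AB
    print "strict: $strict\n";   # AB
    ```

    With the file names from the original question, which contain exactly four underscores, the greedy and non-greedy versions capture the same first field; the difference only shows up when a field can contain "_".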

    I admit regexes still manage to beat me occasionally. One way of building up a pattern is to start with the simplest version and incrementally make it more complex:

    1) $filename =~ /^(\d+)_/
    2) $filename =~ /^(\d+)_(\d+)/
    3) $filename =~ /^(\d+)_(\d+)_(\w+)/
    ....
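
    One possible end point of that incremental process, as a sketch only. It assumes account numbers are digits only, and the meaning and width of the remaining fields are guesses from the sample file name, not a known specification:

    ```perl
    use strict;
    use warnings;

    my $filename = "000231263444_01_XY_20130110_061717.txt";

    # Field layout guessed from the sample name; adjust to the real format.
    my $pattern = qr/
        ^ (\d+)            # account number, digits only
        _ (\d{1,2})        # one or two digits
        _ ([A-Za-z]{1,2})  # one or two letters
        _ (\d{8})          # eight digits, as in 20130110
        _ (\d{6})          # six digits, as in 061717
        \.txt \z
    /x;

    if (my @fields = $filename =~ $pattern) {
        print "account number: $fields[0]\n";
    } else {
        print "file name does not fit the expected layout\n";
    }
    ```

    The /x modifier lets each step of the build-up sit on its own commented line, so a field that fails to match is easier to spot.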

    You can also use glob to find all files that match a certain pattern.
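
    A minimal sketch of the glob approach, assuming the account number always comes first and is followed by an underscore. The helper name files_for_account and the use of the current directory are assumptions for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: list the .txt files in $dir whose names start with
    # "<account>_". Assumes neither argument contains spaces or glob
    # metacharacters such as * ? [ ] { }. Call it in list context.
    sub files_for_account {
        my ($dir, $account) = @_;
        return glob("$dir/${account}_*.txt");
    }

    my $accountnumber = "000231263444";
    my @matches = files_for_account(".", $accountnumber);

    if (@matches) {
        print "Found - *$accountnumber* - *$_*\n" for @matches;
    } else {
        print "Not Found - *$accountnumber*\n";
    }
    ```

    Unlike a regex match against a string, glob looks at files that actually exist on disk, so it suits the case where the names come from a directory rather than from a list.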

    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
      The file name could start with some letters as well. Here is what I think could work, since I know that a true file name needs to be at least 7 chars long:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my @files = qw(
          000231263444_01_XY_20130110_061717.txt
          17034513_01_WQ_20130511_053551.txt
          12345670_01_XY_20130110_061717.txt
          BOA034513_01_WQ_20130511_053551.txt
      );

      foreach my $file(@files) {
          $file =~ /(^\w+)_(\w{1,2})_(\w{1,2})_(\w+)_(\w+)\.txt$/i;

          #my $accountnumber = $1;
          # test
          my $accountnumber = "BOA034513";

          if( ($file=~/$accountnumber/gi) && ($accountnumber=~/\w{7,}/g) ) {
              print "\n Found - *$accountnumber* - *$file*\n";
          }else{
              print "\n Not Found - *$accountnumber* - *$file*\n";
          }
      }
      Any example of how the same process could be done using "GLOB"?
      Thanks!
Re: Checking number in file name
by hippo (Archbishop) on May 19, 2013 at 15:40 UTC

    Would I be correct in inferring from your code that the "account number" is always at the start of the filename and followed by an underscore? In that case, a simple index test would do (and I think should be pretty efficient). Something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $filename = "000231263444_01_XY_20130110_061717.txt";
    my $accountnumber = '000231263444';

    if (index($filename, "${accountnumber}_") == 0) {
        print "Found\n";
    }
    else {
        print "Not found\n";
    }
      Yes, it will be at the start of the file name, but the underscore will not be part of the account number:
      my $filename = "000231263444_01_XY_20130110_061717.txt";
      # after getting it with reg exp.
      my $accountnumber = '000231263444';

        Splitting is probably better for you than using a regex. Especially if you later want to do something with the other parts of the filename, too.

        my ( $accountnumber, $part2, $part3, $part4, $part5, $extension )
            = split /[_.]/, $filename;
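
        A short sketch of how the split result could drive the check, assuming the account number to look for is already known. The variable $wanted is a hypothetical name:

        ```perl
        use strict;
        use warnings;

        my $filename = "000231263444_01_XY_20130110_061717.txt";
        my $wanted   = "000231263444";   # hypothetical account number to look for

        # Split on "_" or "."; the first field is the account number.
        my ($accountnumber, @rest) = split /[_.]/, $filename;

        # An exact string comparison avoids regex metacharacter and
        # partial-match problems altogether.
        if ($accountnumber eq $wanted) {
            print "Found - *$accountnumber* - *$filename*\n";
        } else {
            print "Not Found - *$wanted* - *$filename*\n";
        }
        ```

        Because the comparison uses eq rather than a pattern, an account number that is merely a substring of the first field does not count as a match.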