fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks,

Can somebody help me match this filename? 1M01_F00121.npt.gro

What I'm doing, first for testing on one file, is

    #/bin/perl/
    use strict;

    #test only for file 1M01_F00121.npt.gro
    my $file = /(1M01)(\_)(F)(\0+)(121)/;
    print $file; # for testing
    `grompp -f input.mdp -c "$file".npt.gro -p topol.top -o "$file".md.tpr`;

And then, if this succeeded, I wanted to use the variable $i in the matching, as in

    #/bin/perl/
    use strict;
    my $i;

    #now loop through all the files in the directory
    for ($i=121; $i<=150; $i++) {
        my $file = /(1M01)(\_)(F)(\0+)($i)/;
        print $file; # for testing
        `grompp -f input.mdp -c "$file".npt.gro -p topol.top -o "$file".md.tpr`;
    }

Or do these two code attempts make no sense? I am kind of confused especially about the ending (.npt.gro). Because I want to use the file further on with Gromacs to do some simulations, I'm wondering, does it make sense to also match the .npt.gro or not? What confuses me is the fact that then I'll have to change the file extension each time depending on the simulation step and what kind of output my file becomes, so that's why I decided I shouldn't match the ending.

edit: I forgot to say that all my filenames follow the pattern 1M01_F and then a number of zeros and the index number of my file. The total number of digits after the "F" is always 5. So the files range from 1M01_F00001 to 1M01_F00150, in this directory. That's why I'm trying to match for "one or more zeros".

I'd be grateful if you could provide me with some feedback/corrections as to how I can fix this :)

Thank you very much!

Replies are listed 'Best First'.
Re: matching characters in filename
by shmem (Chancellor) on Aug 11, 2015 at 13:14 UTC
    my $file = /(1M01)(\_)(F)(\0+)(121)/;

    In the line above, you are matching against the default variable, $_, and assigning the result of the match to the variable $file, which is probably not what you want. The parentheses in the regular expression are capture groups, whose contents end up in $1, $2, $3, and so on.

    Furthermore, the character '_' needs no escape, and \0 matches a NUL byte, not the digit 0.

    That corrected, the following prints those captures mentioned above, if there is a match:

        my $file = '1M01_F00121.npt.gro';
        if ( $file =~ /(1M01)(_)(F)(0+)(121)/ ) {
            print join( "\n",
                "\$1 = '$1'",
                "\$2 = '$2'",
                "\$3 = '$3'",
                "\$4 = '$4'",
                "\$5 = '$5'",
            ),"\n";
        }
        else {
            print "'$file': no match\n";
        }
        __END__
        $1 = '1M01'
        $2 = '_'
        $3 = 'F'
        $4 = '00'
        $5 = '121'

    Read perlop and perlre.
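
    For the looped version in the original post, the same kind of pattern can take the index from a variable. What follows is only a minimal sketch, assuming the zero-padded numbering described in the question; the helper name matches_index is made up for illustration. Note that it does not check that there are exactly five digits.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true if $name is file number $i in the 1M01_F series.
    # 0* allows any amount of zero padding before the index, and (?!\d)
    # refuses a trailing digit, so index 12 does not match file 00121.
    sub matches_index {
        my ($name, $i) = @_;
        return $name =~ /^1M01_F0*${i}(?!\d)/ ? 1 : 0;
    }

    for my $i (121 .. 123) {
        my $name = sprintf '1M01_F%05d.npt.gro', $i;
        print "$name matches $i\n" if matches_index($name, $i);
    }
    ```

    Using 0* rather than 0+ means an index that already fills all the digits would still match.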

Re: matching characters in filename
by Anonymous Monk on Aug 11, 2015 at 13:13 UTC

    Your question is a bit unclear to me: what do you mean by "match"? Do you have a directory full of files, and are you trying to locate the files whose names match a certain pattern? Here's a quick example of one way to do it (there are plenty of other ways, but this works fine if everything is in one directory). It uses glob to list and preselect files, and then a regular expression to keep only those that exactly match the format you specified:

        for my $file (glob '1M01_F*.npt.gro') {
            next unless $file=~/^1M01_F\d{5}\.npt\.gro$/;
            print "$file\n";
        }
        __END__
        1M01_F00121.npt.gro
        1M01_F00130.npt.gro
        1M01_F00140.npt.gro
        1M01_F00150.npt.gro

    ... or, are you just trying to generate filenames? Like:

        for my $n (121,130,140,150) {
            my $file = sprintf '1M01_F%05d.npt.gro', $n;
            print "$file\n";
        }
        __END__
        1M01_F00121.npt.gro
        1M01_F00130.npt.gro
        1M01_F00140.npt.gro
        1M01_F00150.npt.gro

    There are a few other issues with your script, such as that the first line should start with #!/ with no space at the beginning and no slash at the end, you should be using warnings, and if you're not interested in capturing the output of the grompp command, you should use system instead of backticks (``). Since your regular expressions also aren't going to work, I suggest you have a look at perlintro and perlretut, and the other tutorials linked from there.
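
    As a rough sketch of the warnings and system points, the script could build the base name once and append whichever extension each step needs, so the extension never has to be part of a match. The grompp flags are copied from the original post; the helper name grompp_args is invented for this example, and the actual system call is left commented out because it assumes grompp is installed.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: build the grompp argument list for one base name
    # such as '1M01_F00121'. The flags are the ones from the original post.
    sub grompp_args {
        my ($base) = @_;
        return ('grompp', '-f', 'input.mdp', '-c', "$base.npt.gro",
                '-p', 'topol.top', '-o', "$base.md.tpr");
    }

    my $base = sprintf '1M01_F%05d', 121;
    my @cmd  = grompp_args($base);
    print "@cmd\n";

    # Uncomment to actually run the command. The list form of system
    # bypasses the shell, and a non-zero return value signals failure.
    # system(@cmd) == 0 or die "grompp failed for $base: $?";
    ```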

      ... and in the first example, if you want to limit the files by the number that appears in the filename, one way to do that is:

          for my $file (glob '1M01_F*.npt.gro') {
              next unless $file=~/^1M01_F(\d{5})\.npt\.gro$/
                  && $1>120 && $1<=150;
              print "$file\n";
          }
          __END__
          1M01_F00121.npt.gro
          1M01_F00130.npt.gro
          1M01_F00140.npt.gro
          1M01_F00150.npt.gro

      The doubling of the filename pattern may not be particularly elegant though. Other ways to list files in a directory include readdir, File::Find (which goes into subdirectories too), Path::Class, or Path::Tiny.
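
      A possible readdir variant, which states the filename pattern only once, might look like the sketch below. The helper name numbered_files and its arguments are invented for this example, and it assumes all the files sit directly in one directory.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: list the files in $dir named 1M01_F<5 digits>.npt.gro
      # whose number lies between $min and $max, inclusive.
      sub numbered_files {
          my ($dir, $min, $max) = @_;
          opendir my $dh, $dir or die "Cannot open '$dir': $!";
          my @found = grep {
              /^1M01_F(\d{5})\.npt\.gro$/ && $1 >= $min && $1 <= $max
          } readdir $dh;
          closedir $dh;
          return sort @found;
      }

      print "$_\n" for numbered_files('.', 121, 150);
      ```

      Note that readdir returns bare names without the directory part, so prepend the directory before opening a file that lives somewhere other than the current directory.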