in reply to Help constructing proper regex to extract values from filenames and concurrently opening those same files to access records

Your program logic is actually pretty good. Without more information on what the school name can be, I'd say the only way to determine whether a school is specified (if other non-school-address data is allowed after "/.date_...") is to check your second regex capture against known-good or known-bad values.
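A minimal sketch of the known-good check, assuming you can list the acceptable school names up front. The %known_school hash and the checked_school sub are hypothetical names, and cole/drew are just the two schools that appear in the sample data:

```perl
use strict;
use warnings;

# Hypothetical whitelist of acceptable school names; replace with your own.
my %known_school = map { $_ => 1 } qw(cole drew);

# Return the captured name if it is on the whitelist, otherwise 'null'.
# An undef capture (no school part in the filename) also becomes 'null'.
sub checked_school {
    my ($captured) = @_;
    return 'null' unless defined $captured;
    return exists $known_school{$captured} ? $captured : 'null';
}

print checked_school('cole'), "\n";
print checked_school('dp3'), "\n";
```

A known-bad list works the same way with the test inverted.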

Or, if it's always a US-based school web address, then checking that it ends in .edu (or I guess .net, .com, or .org, and maybe .us) might be enough.
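A sketch of the suffix check instead, assuming the list of top-level domains above is the complete set you care about. school_from_tld is a hypothetical helper that returns the label just before the suffix, or undef when the filename has no such suffix:

```perl
use strict;
use warnings;

# Hypothetical helper: return the label just before a .edu/.net/.com/.org/.us
# suffix at the very end of the filename, or undef if there is none.
sub school_from_tld {
    my ($filename) = @_;
    return $filename =~ /\.(\w+)\.(?:edu|net|com|org|us)\z/ ? $1 : undef;
}

for my $f ('/home/test/abc/.date_run_dir',
           '/home/test/abc/.date_file_sent.email@wolverine.cole.edu',
           '/home/test/abc/.date_file_sent.dp3.drew.net') {
    my $school = school_from_tld($f);
    $school = 'null' unless defined $school;    # works on pre-5.10 perls too
    print "$f => $school\n";
}
```

Note the single quotes around the filenames: in double quotes, "@wolverine" would be interpolated as an array.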

As for readability and best practices, I'd write that snippet as follows, tested with the provided data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # errors from open made fatal

sub DEBUG { 1 }

my $filelist = 'tmp.txt';
open my $filelist_handle, '<', $filelist;

while (<$filelist_handle>) {
    chomp;
    my ($type, $school) = m!
        ^                    # anchor to beginning
        /home/test/          # common to all lines
        (\w{3})              # capture 'type'
        /\.date_[^.]+        # common to all lines
        (?:                  # non-capturing group
            .+?(\w+)\.\w+$   # capture domain name?
        |                    # or don't capture
        )                    # end group
    !x    # /x flag means ignore white space in pattern
        or next;    # skip line if it doesn't match

    # do extra check that $school is acceptable
    $school //= 'null';    # regex gives undef if not found

    if (DEBUG) {
        print "match: $_\n";
        print "\ttype: $type\n";
        print "\tschool: $school\n";
    }
    else {
        open my $line_handle, '<', $_;
        while (<$line_handle>) {
            chomp;    # avoid a blank line after every record
            print "Type:$type:School:$school:File:$_\n";
        }
    }
}
```

Example debug output:

```
match: /home/test/abc/.date_run_dir
	type: abc
	school: null
match: /home/test/def/.date_run_dir
	type: def
	school: null
match: /home/test/abc/.date_file_sent.email@wolverine.cole.edu
	type: abc
	school: cole
match: /home/test/abc/.date_file_sent.dp3.drew.net
	type: abc
	school: drew
match: /home/test/def/.date_file_sent.email@wolverine.cole.edu
	type: def
	school: cole
match: /home/test/def/.date_file_sent.dp3.drew.net
	type: def
	school: drew
```

Re^2: Help constructing proper regex to extract values from filenames and concurrently opening those same files to access records
by JaeDre619 (Acolyte) on Dec 11, 2010 at 16:04 UTC
    Thank you. This is great. Thanks a lot for breaking down that cryptic regex as well. I did try and test this, but had some issues with:
    # do extra check that $school is acceptable
    $school //= 'null'; # regex gives undef if not found
    Error msg:

    Search pattern not terminated

    I commented that out and it ran, although it produced these warnings:

    match: /home/test/abc/.date_run_dir
    	type: abc
    Use of uninitialized value in concatenation (.) or string at ./test7.pl line 31, <$_[...]> line 1.
    	school:
    match: /home/test/def/.date_run_dir
    	type: def
    Use of uninitialized value in concatenation (.) or string at ./test7.pl line 31, <$_[...]> line 2.
    	school:

    Also, would you please show me how to extract values from the files I match? Can I do this in the same pass in which I perform the regex? Example values from the files (.date_run_dir, etc.):

    $ cat .date_run_dir .date_file_sent.*
    /project/school/data/feed_abc_2010120816.ext3
    mail_abc.dat.2010120816.ext3
    mail_abc.dat.2010120816.ext3
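      Oh, sorry about that error. The defined-or assignment operator //= is only available in Perl 5.10.0 and later, and I should have noted that. The statement is equivalent to $school = 'null' unless defined $school; and commenting it out is what produced those "uninitialized value" warnings, because $school stays undef for filenames that have no school part.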

      For the values inside the listed files, you could use a similar regex (or build it and the original from another which contains the common parts of both) inside that inner while loop, yes?
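One way to share the common parts is to build both patterns from qr// pieces. This sketch assumes the file contents look like the sample lines in the question (an underscore, the three-character type, any non-digits, then a 10-digit stamp before the extension); $type_re, $filename_re and $content_re are hypothetical names:

```perl
use strict;
use warnings;

# Shared building block: the three-character 'type' (abc, def, ...).
my $type_re = qr/(\w{3})/;

# Pattern for the list of filenames, reusing the shared piece.
my $filename_re = qr{^/home/test/$type_re/\.date_};

# Assumed pattern for lines inside those files: "_<type>", any
# non-digits, a 10-digit stamp such as 2010120816, then an extension.
my $content_re = qr{_$type_re\D*(\d{10})\.\w+$};

for my $line ('/project/school/data/feed_abc_2010120816.ext3',
              'mail_abc.dat.2010120816.ext3') {
    if (my ($type, $stamp) = $line =~ $content_re) {
        print "type: $type\tstamp: $stamp\n";
    }
}
```

Interpolating a qr// object keeps its capture group, so the groups are numbered across the combined pattern ($1 is the type, $2 the stamp in $content_re).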

        I wouldn't need another regex inside the while loop. At this point, the regex you helped me with lists all the files I need to read, while at the same time extracting the particular keys I wanted. Since I'm already in the loop with those keys, I don't need another regex. I just need to open the files by name and get the values. I hope I am making sense.
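If no second regex is needed, the inner loop only has to read each matched file. A sketch, with read_values as a hypothetical helper: it reads into a named lexical instead of $_, so the caller's $_ (the filename) is not overwritten, and it chomps each line so the print does not emit a blank line after every value:

```perl
use strict;
use warnings;

# Hypothetical helper: return the non-blank lines of one matched file,
# with trailing newlines removed.
sub read_values {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @values;
    # A named lexical keeps the caller's $_ (the filename) intact.
    while (my $line = <$fh>) {
        chomp $line;
        push @values, $line if length $line;
    }
    close $fh;
    return @values;
}

# In the original loop, the non-DEBUG branch could then read:
#
#     for my $value (read_values($_)) {
#         print "Type:$type:School:$school:File:$value\n";
#     }
```

This still happens in the same pass over tmp.txt: each filename is matched, and the file is read right away with $type and $school already in hand.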

        Would I do this within the while loop of the if (DEBUG) section?

        UPDATE:

        Never mind. Thanks for your help! This does what I need it to do. I haven't really used DEBUG before. Once I changed it to 1 it does what I need to do. This is a cool script, and I can always use that debug technique. Thanks for showing me the way.