I liked your solution, and posted a version using split at Re: Find Not Working.

After some reflection, I think that something like this is probably better than either:

    while (<DATA>) {
        if (my ($name_column_deleted) = m/((?:H0|HT)\d{8,}.*)/) {
            print "$name_column_deleted\n";
        }
    }

The OP doesn't show exactly what can go in the "NAME" field, but I suspect it could contain spaces, e.g. "John Smith, Jr.". In that case, both of our solutions fail in the general case, because the name could consist of several space-separated tokens.
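To make the failure concrete, here is a minimal sketch with made-up records. The OP's real layout isn't shown, so I'm assuming the name comes first, followed by an HT/H0 ID and a status field. This is a simplified stand-in, not the exact code from either post: it drops the first whitespace-separated token, assuming that token is the whole name.

```perl
use strict;
use warnings;

# Hypothetical records: NAME, then an ID, then a status field.
my @lines = (
    "Smith HT12345678 ACTIVE",
    "John Smith, Jr. HT87654321 ACTIVE",
);

my @results;
for my $line (@lines) {
    # Drop the first whitespace-separated token, assuming it is the whole name.
    my (undef, @rest) = split ' ', $line;
    push @results, join ' ', @rest;
}

print "$_\n" for @results;
```

The first record comes out as "HT12345678 ACTIVE", but the second leaves "Smith, Jr." in front of the ID.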

My suggestion now is to use the regex approach, but not to anchor it to the beginning of the line. Instead, use a regex that requires HO (or HT) to be followed by a minimum number of digits (it could be 4, 5, 6 or more; above I used 8). That way this field will not be confused with a name; "HO" on its own could be a last name.
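Here is a small sketch comparing a loose pattern with the digit-qualified one. The records are hypothetical, and I've kept the regex exactly as written in the code above (H0 with a zero); adjust the prefix if the OP's IDs actually use the letter O. The third record has a name that itself starts with "HT", which is the case the digit requirement protects against.

```perl
use strict;
use warnings;

# Hypothetical records: names may contain spaces or even start with "HT".
my @lines = (
    "Smith HT12345678 ACTIVE",
    "John Smith, Jr. HT87654321 ACTIVE",
    "HTOO AUNG H012345678 ACTIVE",
);

my (@naive, @qualified);
for my $line (@lines) {
    # Loose: any H0/HT anywhere in the line, no digit requirement.
    my ($loose) = $line =~ m/((?:H0|HT).*)/;
    push @naive, $loose;

    # Not anchored with ^, but H0/HT must be followed by at least 8 digits,
    # so a name that merely starts with "HT" is skipped over.
    my ($name_column_deleted) = $line =~ m/((?:H0|HT)\d{8,}.*)/;
    push @qualified, $name_column_deleted;
}

print "naive:     $_\n" for @naive;
print "qualified: $_\n" for @qualified;
```

For the third record the loose pattern grabs the whole line starting at "HTOO", while the qualified pattern skips ahead to "H012345678 ACTIVE".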

There was a suggestion to use a fixed-width field solution such as unpack or substr. That can work well if there is only one producer of the file. However, I often work with files whose spec says "field X is 32 columns", but some producers put 30, 31, 32 or 33 columns in the output! As a defense, I write such files exactly as specified, but when reading files generated by others I allow more flexibility whenever I can.
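A minimal sketch of why that bites. Assume a hypothetical spec where NAME is 16 columns followed by the rest of the record; the second line comes from a producer that used only 15 columns. unpack with a fixed template slices at column 16 regardless, while the unanchored regex from above still finds the ID.

```perl
use strict;
use warnings;

# Hypothetical spec: NAME is 16 columns, followed by the ID and status.
my $template = 'A16 A*';

my @lines = (
    sprintf('%-16s%s', 'John Smith, Jr.', 'HT87654321 ACTIVE'),  # as spec'd
    sprintf('%-15s%s', 'John Smith, Jr.', 'HT87654321 ACTIVE'),  # one column short
);

my (@by_unpack, @by_regex);
for my $line (@lines) {
    # Fixed-width: always split at column 16 ('A' strips trailing spaces).
    my (undef, $rest) = unpack $template, $line;
    push @by_unpack, $rest;

    # Content-based: locate the ID wherever it actually starts.
    my ($name_column_deleted) = $line =~ m/((?:H0|HT)\d{8,}.*)/;
    push @by_regex, $name_column_deleted;
}

print "unpack: $_\n" for @by_unpack;
print "regex:  $_\n" for @by_regex;
```

On the short line, unpack returns "T87654321 ACTIVE" (the H went into the name field), while the regex still returns "HT87654321 ACTIVE".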

PS: I prefer to assign the match directly to a variable rather than going through the intermediate $1. I think the code reads better, but of course that's your call.
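For comparison, here are the two styles side by side on made-up input. Both collect the same result; the second works because a list assignment used as a condition evaluates to the number of elements on its right-hand side, which is 0 when the match fails.

```perl
use strict;
use warnings;

my @lines = ("Smith HT12345678 ACTIVE", "no id on this line");

my (@via_dollar1, @via_assign);
for my $line (@lines) {
    # Style 1: test the match, then copy the capture out of $1.
    if ($line =~ m/((?:H0|HT)\d{8,}.*)/) {
        push @via_dollar1, $1;
    }

    # Style 2: assign the capture directly; the condition is true only
    # when the match produced a value, and the variable is scoped to the if.
    if (my ($name_column_deleted) = $line =~ m/((?:H0|HT)\d{8,}.*)/) {
        push @via_assign, $name_column_deleted;
    }
}

print "via \$1:     @via_dollar1\n";
print "via assign: @via_assign\n";
```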

Re^3: Find Not Working
by stevieb (Canon) on Jun 03, 2016 at 19:45 UTC

    ++ ... very nice, Marshall. I know my method wasn't very efficient for the data supplied; I just wanted to give an example of what a full-string regex would look like. I was going to give a substr example, but didn't, for the reason above.

    I (mostly) assign directly to a variable instead of using the special numbered vars, but since the OP didn't seem to know much about regexes, I wanted to be explicit in my example.