in reply to Find Not Working

Right off the bat, your regex will never match: ^\s{32} says "match exactly 32 whitespace characters at the very beginning of the string", but each of your lines starts with a word character (\w). That's not the only issue, but I digress. Try this:

    use warnings;
    use strict;

    my $find = qr/
        ^           # start of string
        \w+         # one or more word chars (last name)
        \s+         # one or more whitespace
        (           # begin capture (goes into $1)
        (?:H0|HT)   # H0 or HT
        .*          # everything to end of string
        )           # end capture
    /x;

    open my $fh, '<', 'in.txt' or die $!;

    while (<$fh>){
        if (/$find/){
            my $string = $1; # $1 contains what we captured in the rex
            print "$string\n";
        }
    }

Output:

    HT00000000 I
    HT00000000 S
    HT00000000 I
    HT00000000 M
    HT00000000 I
    HT00000000 I
    H000000000 I
    H000000000 O
    H000000000 I

Here's the regex without breaking it up for explanation: /^\w+\s+((?:H0|HT).*)/
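To check that the commented /x form and the one-line form really are the same pattern, a quick sketch like the following runs both over a few lines and collects what each captures. The sample lines are hypothetical stand-ins, since the contents of in.txt aren't shown in the thread.

```perl
use strict;
use warnings;

# The /x version: whitespace and #-comments inside the pattern are ignored.
my $verbose = qr/
    ^           # start of string
    \w+         # one or more word chars (last name)
    \s+         # one or more whitespace
    (           # begin capture group 1
    (?:H0|HT)   # H0 or HT
    .*          # everything to end of string
    )           # end capture
/x;

# The same pattern written on one line, without /x.
my $compact = qr/^\w+\s+((?:H0|HT).*)/;

# Hypothetical sample lines (the OP's real in.txt is not shown).
my @lines = (
    "Smith HT00000000 I",
    "Jones    H000000000 O",
    "Brown XX00000000 I",    # neither H0 nor HT, so no match
);

my (@from_verbose, @from_compact);
for my $line (@lines) {
    push @from_verbose, $1 if $line =~ $verbose;
    push @from_compact, $1 if $line =~ $compact;
}

print "$_\n" for @from_verbose;
```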

Have a read of perlretut and perlre.

Re^2: Find Not Working
by Marshall (Canon) on Jun 03, 2016 at 12:20 UTC
    I liked your solution, and posted a version using split at Re: Find Not Working.

    After some reflection, I think that something like this is probably better than either:

        while (<DATA>) {
            if (my ($name_column_deleted) = m/((?:H0|HT)\d{8,}.*)/) {
                print "$name_column_deleted\n";
            }
        }
    The OP doesn't show exactly what can go in the "NAME" field, but I suspect it could contain spaces: "John Smith, Jr." or whatever. In that case both of our solutions fail in the general case, because the name could consist of multiple space-separated tokens.

    My suggestion now is to go with the regex approach, but not to anchor it to the beginning of the line. Instead, use a regex that qualifies H0 (or HT) with a minimum number of digits (4, 5, 6 or more would do; above I used 8). That way this field will not be confused with a name. HO could be a last name.
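    A minimal sketch of the difference, assuming made-up input lines (the OP never showed the real NAME format): it runs the anchored regex from the first reply, an unanchored regex with no digit requirement, and the digit-qualified version over three lines, and records what each one captures.

```perl
use strict;
use warnings;

# Hypothetical input lines: the real NAME format is not shown by the OP.
my @lines = (
    "Smith HT00000000 I",            # one-word name
    "John Smith, Jr. HT00000000 S",  # multi-token name
    "OHTANI H000000000 O",           # name that happens to contain "HT"
);

my %patterns = (
    # First reply: anchored, expects exactly one \w+ token before the code.
    anchored => qr/^\w+\s+((?:H0|HT).*)/,
    # Unanchored with no digit requirement: can start matching inside a name.
    naive    => qr/((?:H0|HT).*)/,
    # Unanchored, H0/HT followed by at least 8 digits.
    digits   => qr/((?:H0|HT)\d{8,}.*)/,
);

# Collect what each pattern captures; undef means "no match".
my %got;
for my $name (sort keys %patterns) {
    for my $line (@lines) {
        my ($field) = $line =~ $patterns{$name};
        push @{ $got{$name} }, $field;
    }
}

for my $name (sort keys %got) {
    print "$name: ",
        join(' | ', map { defined $_ ? $_ : '(no match)' } @{ $got{$name} }),
        "\n";
}
```

    With these sample lines the anchored pattern misses the multi-token name, the naive unanchored pattern starts its capture inside "OHTANI", and only the digit-qualified pattern pulls out the code field on every line.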

    There was a suggestion to use a fixed-field solution like unpack or substr. That can work well if there is a single producer of the file. However, I often work with files whose spec says "field X is 32 columns", yet some producers put 30, 31, 32 or 33 columns in the output! As a defense, I write such files exactly as specified, but when I can, I allow more flexibility when reading files generated by others.
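    To illustrate the column-drift problem, here is a sketch that assumes a hypothetical layout where NAME is padded to exactly 32 columns, followed by the code field. A strict unpack "A32 A*" splits a correctly padded record fine, but a record padded to only 30 columns pushes the first two characters of the code field into the name. Locating the field by its content instead of its position still works.

```perl
use strict;
use warnings;

# Hypothetical layout: NAME padded to exactly 32 columns, then the code field.
my $exact = sprintf "%-32s%s", "John Smith, Jr.", "HT00000000 I";
# A producer that padded NAME to 30 columns instead of 32.
my $short = sprintf "%-30s%s", "John Smith, Jr.", "HT00000000 I";

# Strict fixed-field read: "A32" takes 32 chars (trailing spaces stripped),
# "A*" takes the rest of the record.
my @strict_exact = unpack "A32 A*", $exact;
my @strict_short = unpack "A32 A*", $short;

# Tolerant read: find the code field by what it looks like, not where it is.
my ($tolerant_short) = $short =~ /((?:H0|HT)\d{8,}.*)/;

print "strict, exact:   [$strict_exact[1]]\n";
print "strict, short:   [$strict_short[1]]\n";
print "tolerant, short: [$tolerant_short]\n";
```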

    PS: I prefer to assign directly to a variable rather than going through the intermediate $1. I think the code "reads" better, but of course that's your call.
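    For comparison, here are both styles side by side (the sample line is hypothetical). One practical point in favor of direct assignment: $1 and friends keep their values from the last successful match, so they should only be read inside the branch where the match is known to have succeeded.

```perl
use strict;
use warnings;

my $line = "Smith HT00000000 I";   # hypothetical sample line

# Style 1: test the match, then copy the numbered capture variable.
# $1 still holds the previous successful match's value if this one fails,
# so it is only read inside the if.
my $via_dollar1;
if ($line =~ /((?:H0|HT)\d{8,}.*)/) {
    $via_dollar1 = $1;
}

# Style 2: a match in list context returns its captures, so they can be
# assigned directly; the list assignment is false when the match fails.
my $direct;
if (my ($field) = $line =~ /((?:H0|HT)\d{8,}.*)/) {
    $direct = $field;
}

print "$via_dollar1\n$direct\n";
```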

      ++ ... very nice, Marshall. I know my method wasn't the most efficient for the data supplied; I just wanted to show what a full-string regex would look like. I was going to give a substr example too, but didn't, for the reason you gave above.

      I (mostly) assign directly to a variable instead of using the special numbered variables, but since the OP didn't seem to know much about regexes, I wanted to be explicit in my example.