m/^\s{32}\S\s$find/
...can match only at the beginning of a string because of the ^ metacharacter -- an anchor that matches only at the start of a string, or a line if the /m modifier is in use.
This subpattern:
qr/^(?:H0|HT)/
...can match only at the beginning of a string because it starts with the ^ metacharacter. But you are embedding $find at a position within the consuming pattern that cannot be at the beginning of the string. Consequently there is no string that could match.
At minimum, you probably should remove the ^ metacharacter from the embedded subpattern.
Also, this: /(?:H0|HT)/ might be more clearly written as /H[0T]/.
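To see the effect of the embedded anchor, here is a minimal sketch. The sample line is made up to fit the consuming pattern (32 spaces, one non-space character, one space, then the field), and the helper name consuming_pattern_matches is hypothetical; none of it comes from the original post.

```perl
use strict;
use warnings;

# Hypothetical input line: 32 spaces, one non-space char, one space, then the field.
my $line = (' ' x 32) . 'X HT00000000 I';

my $anchored   = qr/^(?:H0|HT)/;   # the original subpattern, including ^
my $unanchored = qr/H[0T]/;        # ^ removed, alternation written as a character class

# Returns 1 if the consuming pattern matches with the given subpattern, else 0.
sub consuming_pattern_matches {
    my ($text, $find) = @_;
    return $text =~ m/^\s{32}\S\s$find/ ? 1 : 0;
}

print "anchored:   ", consuming_pattern_matches($line, $anchored),   "\n";
print "unanchored: ", consuming_pattern_matches($line, $unanchored), "\n";
```

With the anchor in place the match fails, because the embedded ^ is reached at offset 34, not at the start of the string. Without it, the same line matches.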
| [reply] [d/l] [select] |
Right off the bat, your regex will never match, as this: ^\s{32} says "match exactly 32 whitespace characters at the very beginning of the string", but each line starts with a word character (\w). That's not the only issue, but I digress. Try this:
use warnings;
use strict;
my $find = qr/
    ^            # start of string
    \w+          # one or more word chars (last name)
    \s+          # one or more whitespace
    (            # begin capture (group 1)
      (?:H0|HT)  # H0 or HT
      .*         # everything up to, but not including, the newline
    )            # end capture
/x;
open my $fh, '<', 'in.txt' or die $!;
while (<$fh>){
    if (/$find/){
        my $string = $1; # $1 contains what we captured in the regex
        print "$string\n";
    }
}
Output:
HT00000000 I
HT00000000 S
HT00000000 I
HT00000000 M
HT00000000 I
HT00000000 I
H000000000 I
H000000000 O
H000000000 I
Here's the regex without breaking it up for explanation: /^\w+\s+((?:H0|HT).*)/
Have a read of perlretut and perlre.
| [reply] [d/l] [select] |
while (<DATA>)
{
if (my ($name_column_deleted) = m/((?:H0|HT)\d{8,}.*)/)
{
print "$name_column_deleted\n";
}
}
The OP doesn't show exactly what can go in the "NAME" field, but I suspect that it could contain spaces: "John Smith, Jr." or whatever. In that case, both of our solutions fail in the general case, because the name could consist of multiple space-separated tokens.
My suggestion now is to go with the regex approach, but do not anchor it to the beginning of the line. Instead, use a regex that qualifies H0 (or HT) with a minimum number of digits (it could be 4, 5, 6, or more; I used 8). That way, this field will not be confused with a name. HO could be a last name.
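As a rough illustration of why the unanchored, digit-qualified pattern copes with names that contain spaces, here is a small sketch. The sub name strip_name_column and the multi-word sample lines are assumptions made up for the example; the OP's data only shows single-word names.

```perl
use strict;
use warnings;

# Returns the text from the H0/HT field onward, or undef if no such field is found.
sub strip_name_column {
    my ($line) = @_;
    my ($rest) = $line =~ m/((?:H0|HT)\d{8,}.*)/;
    return $rest;
}

# Hypothetical sample lines; only the first resembles the OP's data.
for my $line ('DOE HT00000000 I', 'John Smith, Jr. H000000000 O', 'HO HT00000000 S') {
    my $rest = strip_name_column($line);
    print defined $rest ? "$rest\n" : "no match: $line\n";
}
```

The third line shows that a name of "HO" is skipped, since neither H0 nor HT followed by eight digits occurs there.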
There was a suggestion to use a fixed field solution like unpack or substr. That can work well if there is one producer of the file. However, I often work with files that say "field X is 32 columns", but some guys put 30,31,32,33 columns in the output! As a defense, I write files like that exactly as spec'd, but allow more flexibility when reading files generated by others when I can.
As a PS: I prefer to assign directly to a variable rather than using the intermediate $1. I think the code "reads" better, but of course, your call on that.
| [reply] [d/l] |
++... very nice Marshall. I know my method wasn't overly efficient for the data supplied, so I just wanted to give an example of what a full string regex would look like. I was going to give a substr example, but didn't for the reason above.
I assign direct to a variable instead of the special numbered vars (mostly), but since it didn't seem like OP knew much about regexes, I wanted to be explicit in my example.
| [reply] |
As another possible idea, you could just use a split.
#!/usr/bin/perl
use warnings;
use strict;
while (<DATA>)
{
next if /^NAME/ or /^\s*$/; #update changed \s+ to \s*
print ''.(split (' ',$_,2))[1];
}
=prints
HT00000000 I
HT00000000 S
HT00000000 I
HT00000000 M
HT00000000 I
HT00000000 I
H000000000 I
H000000000 O
H000000000 I
=cut
__DATA__
NAME PT # AT
DOE HT00000000 I
DOE HT00000000 S
DOE HT00000000 I
SMITH HT00000000 M
DOE HT00000000 I
DOE HT00000000 I
DOE H000000000 I
DOE H000000000 O
SMITH H000000000 I
Update: see the reply to stevieb above.
| [reply] [d/l] |
Given your data format, you might also consider substr or unpack instead of a regex.
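A minimal sketch of what that could look like, assuming the name occupies exactly the first 31 columns so that the field of interest starts at column 32. That width is a guess, and the sub names and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): columns 1-31 hold the name, the rest starts at column 32.
my $NAME_WIDTH = 31;

# Returns everything from column 32 onward using substr ('' for short lines).
sub rest_by_substr {
    my ($line) = @_;
    return length($line) > $NAME_WIDTH ? substr($line, $NAME_WIDTH) : '';
}

# Returns (name, rest) using unpack; the 'A' format strips trailing spaces.
sub fields_by_unpack {
    my ($line) = @_;
    return unpack("A$NAME_WIDTH A*", $line);
}

my $line = sprintf('%-31s%s', 'DOE', 'HT00000000 I');   # made-up fixed-width record
print rest_by_substr($line), "\n";
my ($name, $rest) = fields_by_unpack($line);
print "$name|$rest\n";
```

When reading from a file you would normally chomp each line first. Both approaches depend on the producer honoring the column width exactly.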
| [reply] |
If you are only interested in what starts at the 32nd column and don't care what comes before it, then:
my $find = qr/^.{31}((?:H0|HT).*)/;
while ( <FILE> ) {
print NEW "$1\n" if /$find/;
}
| [reply] [d/l] |
Please explain what each part of each regex pattern means, and the answer will become clear:
^(?:H0|HT)
^\s{32}\S\s
| [reply] [d/l] |