in reply to Modifying a regex

If you simply want to strip the NA from the beginning of a string, you can use s/PATTERN/REPLACEMENT/ (see perlop). This will not affect IDs that do not start with NA. For example,

use strict;
use warnings;

for ( 'NA12345', 67890 ) {
    my $id = $_;
    print "$id -> ";
    $id =~ s/^NA//;
    print "$id\n";
}
__END__
NA12345 -> 12345
67890 -> 67890

Update: I think I misread the question. If you want to allow an optional NA in the line that reads

next unless $line =~ m{^(\S+) (\d+) (.*)};
then you can change it by using a noncapturing set of parens (see perlre). For example:
use strict;
use warnings;

for ( 'string1 NA12345 other stuff', 'string2 67890 more stuff' ) {
    if ( $_ =~ m/^(\S+) ((?:NA)?\d+) (.*)/ ) {
        print "matched: $2\n";
    }
}
__END__
matched: NA12345
matched: 67890

Update 2: It looks like your regex is simply capturing 3 fields separated by a single space. If that is the case, split might be more appropriate.

use strict;
use warnings;

for ( 'string1 NA12345 other stuff', 'string2 67890 more stuff' ) {
    my @elements = split( /\s/, $_, 3 );
    print( '[', join( '][', @elements ), "]\n" );
}
__END__
[string1][NA12345][other stuff]
[string2][67890][more stuff]
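
One caveat on the split approach: /\s/ splits on every single whitespace character, so a tab followed by a space, or a run of spaces, would produce empty fields. A minimal sketch, assuming the real file might use tabs or repeated spaces (an assumption; the sample lines here use single spaces), is to use split's special ' ' pattern, which splits on runs of whitespace. The helper name split_fields is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into at most three fields on runs of
# whitespace. The special pattern ' ' also ignores leading whitespace.
sub split_fields {
    my ($line) = @_;
    return split ' ', $line, 3;
}

for my $line ( 'string1 NA12345 other stuff', "string2\t 67890   more stuff" ) {
    print '[', join( '][', split_fields($line) ), "]\n";
}
```

The limit of 3 keeps everything after the second separator together as the last field, as in the split example above.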

HTH

Re^2: Modifying a regex
by seni (Initiate) on Oct 27, 2006 at 18:03 UTC
    Thank you to grep and bobf!!
Re^2: Modifying a regex
by seni (Initiate) on Oct 27, 2006 at 18:31 UTC
    Hi bobf, the thing is, for this second data set I only have IDs with the NA prefix, so I don't have to be concerned with data that lack the prefix. I have tried both your and grep's initial suggestions, as well as your last update, but the output file comes up blank. What is going on?
      Take a couple of lines from your data, then write a small test script to parse them. If you can't get that to work, post it here (munging any sensitive data).

      something like:

      my @lines = (
          'Foo9 NA1234 blah blah blah',
          'Bar8 NA2345 blah blah blah',
          'Baz7 NA3456 blah blah blah',
      );

      foreach my $line (@lines) {
          next unless $line =~ m{^(\S+) NA(\d+) (.*)};
          my ($site, $userID, $data) = ($1, $2, $3);
          print "SITE: $site USER: $userID DATA: $data\n";
      }
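
A blank output file usually means that no line matched, so every line was silently skipped by next unless. A variation on the test above, as a sketch only: it warns about each skipped line instead of dropping it quietly, and it uses \s+ between fields in case the real file has tabs or repeated spaces (an assumption, since the actual file is not shown). The helper name parse_line and the third test line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (site, userID, data) for a matching line,
# or an empty list when the line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # \s+ tolerates tabs or repeated spaces between fields (an assumption
    # about the real file).
    return $line =~ m{^(\S+)\s+NA(\d+)\s+(.*)} ? ( $1, $2, $3 ) : ();
}

my @lines = (
    'Foo9 NA1234 blah blah blah',
    "Bar8\tNA2345\tblah",
    'Baz7 3456 no prefix here',
);

for my $line (@lines) {
    my ( $site, $userID, $data ) = parse_line($line);
    unless ( defined $site ) {
        # Report the line rather than skipping it silently.
        warn "skipped (no match): $line\n";
        next;
    }
    print "SITE: $site USER: $userID DATA: $data\n";
}
```

If every line of the real data shows up as skipped, the pattern (not the report code) is the place to look.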


      grep
      One dead unjugged rabbit fish later
        Alright, the following is a portion of the data set I am using, followed by the format I would eventually like the output to have:
        DATA SET
        012345 NA13333 C C
        012345 NA13334 F F
        012345 NA13335 E F
        012346 NA13333 U U
        012346 NA13334 I I
        012346 NA13335 Y O

        IDEAL OUTCOME
        **note the spacing comes out weird, SORRY! There is a site number above every pair of letters.
        SITES 012345 012346
        NA13333 C C U U

        SITES 012345 012346
        NA13334 F F I I

        SITES 012345 012346
        NA13335 E F Y O
        ***** The code I am using again is:
        #!/usr/bin/perl
        use strict;

        my $inFile = 'fanca.txt';
        open (IN, $inFile) or die "open $inFile: $!";

        my %user;
        while (my $line = <IN>) {
            next unless $line =~ m{^(\S+) (\d+) (.*)};
            my ($site, $userID, $data, $data2) = ($1, $2, $3, $4);
            $user{$userID}{$site} = $data, $data2;
        }
        close(IN) or die "close $inFile: $!";

        my $outfile = "parsingoutput_for_fanca.txt";
        open(REPORT, ">$outfile") or die "open >$outfile: $!";

        foreach my $userID (sort {$a <=> $b} keys %user) {
            my %sites = %{$user{$userID}};
            my $line1 = 'SITES';
            my $line2 = "$userID";
            while (my ($site, $data, $data2) = each %sites) {
                $line1 .= ' ' x (length($line2)-length($line1));
                $line2 .= ' ' x (length($line1)-length($line2));
                #add on next site
                $line1 .= ' '. ' ' . $site;
                $line2 .= ' '. ' '. $data . ' ' . ' '. $data2;
            }
            print REPORT $line1 . "\n";
            print REPORT $line2 . "\n";
            print REPORT "\n";
        }
        close (REPORT) or die "close $outfile: $!";
        PLEASE see the anonymous posting after grep's; it is mine (I forgot to log in). I posted some test data to try out. Thanks for your help!!
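
For reference, here is a minimal sketch of one way to get from the sample data above to the SITES layout. It is not a drop-in fix for the posted script. It assumes that fields are separated by whitespace, that every ID carries the NA prefix, and that the two letters can be kept together as one data field. The helper names group_by_id and report_for are hypothetical, and the data is inlined rather than read from fanca.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: group "site NAid data" lines by ID, then by site.
sub group_by_id {
    my (@lines) = @_;
    my %user;
    for my $line (@lines) {
        chomp $line;
        # Keep the NA prefix as part of the ID; capture the rest as one field.
        next unless $line =~ m{^(\S+)\s+(NA\d+)\s+(.*\S)};
        my ( $site, $id, $data ) = ( $1, $2, $3 );
        $user{$id}{$site} = $data;
    }
    return \%user;
}

# Hypothetical helper: build the two-line "SITES" report for one ID.
sub report_for {
    my ( $id, $sites ) = @_;
    my ( $line1, $line2 ) = ( 'SITES', $id );
    for my $site ( sort keys %$sites ) {
        # Pad both lines to the longer one so each site sits above its data.
        my $width = length($line1) > length($line2) ? length($line1) : length($line2);
        $line1 = sprintf '%-*s  %s', $width, $line1, $site;
        $line2 = sprintf '%-*s  %s', $width, $line2, $sites->{$site};
    }
    return "$line1\n$line2\n";
}

my $user = group_by_id(
    '012345 NA13333 C C', '012345 NA13334 F F',
    '012346 NA13333 U U', '012346 NA13334 I I',
);

# IDs are sorted as strings because they keep the non-numeric NA prefix.
print report_for( $_, $user->{$_} ), "\n" for sort keys %$user;
```

Sorting the sites by name, instead of iterating with each, keeps the column order the same for every ID.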