Re: Modifying a regex

If you simply want to strip the NA from the beginning of a string, you can use s/PATTERN/REPLACEMENT/ (see perlop). This will not affect IDs that do not start with NA. For example,

use strict;
use warnings;

for ( 'NA12345', 67890 )
{
    my $id = $_;
    print "$id -> ";
    $id =~ s/^NA//;
    print "$id\n";
}

__END__
NA12345 -> 12345
67890 -> 67890

Update: I think I misread the question. If you want to allow an optional NA in the line that reads

next unless $line =~ m{^(\S+) (\d+) (.*)};

then you can change it by using a noncapturing set of parens (see perlre). For example:

use strict;
use warnings;

for( 'string1 NA12345 other stuff',
      'string2 67890 more stuff' )
{
    if( $_ =~ m/^(\S+) ((?:NA)?\d+) (.*)/ )
    {
        print "matched: $2\n";
    }
}

__END__
matched: NA12345
matched: 67890

Update 2: It looks like your regex is simply capturing 3 fields separated by a single space. If that is the case, split might be more appropriate.

use strict;
use warnings;

for( 'string1 NA12345 other stuff',
      'string2 67890 more stuff' )
{
    my @elements = split( /\s/, $_, 3 );
    print( '[', join( '][', @elements ), "]\n" );
}

__END__
[string1][NA12345][other stuff]
[string2][67890][more stuff]

HTH


Re^2: Modifying a regex
by seni (Initiate) on Oct 27, 2006 at 18:03 UTC

Thank you to grep and bobf!!


Re^2: Modifying a regex
by seni (Initiate) on Oct 27, 2006 at 18:31 UTC

Hi bobf, the thing is, for this second data set I only have IDs with the NA prefix, so I don't have to be concerned with data that lack it. I have tried both your and grep's initial suggestions and your last updated one; however, the output file comes up blank. What is going on?


Re^3: Modifying a regex

by grep (Monsignor) on Oct 27, 2006 at 19:34 UTC

something like:

@lines = ('Foo9 NA1234 blah blah blah',
          'Bar8 NA2345 blah blah blah',
          'Baz7 NA3456 blah blah blah');
foreach my $line (@lines) {
    next unless $line =~ m{^(\S+) NA(\d+) (.*)};
    my ($site, $userID, $data) = ($1, $2, $3);
    print "SITE: $site   USER: $userID   DATA: $data\n";
}

grep

One dead unjugged rabbit fish later


Re^4: Modifying a regex

by Anonymous Monk on Oct 27, 2006 at 20:15 UTC

DATA SET

012345 NA13333 C C
012345 NA13334 F F
012345 NA13335 E F
012346 NA13333 U U
012346 NA13334 I I
012346 NA13335 Y O

IDEAL OUTCOME **note the spacing comes out weird, SORRY! There is a site number above every pair of letters.

SITES    012345    012346
NA13333  C C       U U

SITES    012345    012346
NA13334  F F       I I

SITES    012345    012346
NA13335  E F       Y O

#!/usr/bin/perl

use strict;

my $inFile = 'fanca.txt';

open (IN, $inFile) or die "open $inFile: $!";

my %user;

while (my $line = <IN>) {
     next unless $line =~ m{^(\S+) (\d+) (.*)};
     my ($site, $userID, $data, $data2) = ($1, $2, $3, $4);
    $user{$userID}{$site} = $data, $data2;
}


close(IN) or die "close $inFile: $!";

my $outfile = "parsingoutput_for_fanca.txt";
open(REPORT, ">$outfile") or die "open >$outfile: $!";

foreach my $userID (sort {$a <=> $b} keys %user) {
    my %sites = %{$user{$userID}};

    my $line1 =  'SITES';
    my $line2 = "$userID";

    while (my ($site, $data, $data2) = each %sites) {
        $line1 .= ' ' x (length($line2)-length($line1));
        $line2 .= ' ' x (length($line1)-length($line2));

        #add on next site
        $line1 .= ' '. ' ' . $site;
        $line2 .= ' '. ' '. $data . ' ' . ' '. $data2;
    }

    print REPORT $line1 . "\n";
    print REPORT $line2 . "\n";
    print REPORT "\n";
}

close (REPORT) or die "close $outfile: $!";


Re^5: Modifying a regex

by grep (Monsignor) on Oct 27, 2006 at 21:00 UTC
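
A minimal sketch of one way the posted script might be repaired, assuming every input line looks exactly like the DATA SET in the parent post (single spaces, NA-prefixed IDs, two data letters). The likely reason for the blank output is that (\d+) cannot match NA13333, so every line is skipped by the next. A few other things would bite afterwards: the regex has only three capture groups, so $4 is undefined; "= $data, $data2" assigns only $data; each returns one key/value pair, not three values; and <=> is a numeric sort on keys that are not numbers. The sub names parse_lines and format_report are made up for illustration, and the data is inlined here instead of being read from fanca.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: build $user{$userID}{$site} = [ $data, $data2 ]
# from lines shaped like "012345 NA13333 C C".
sub parse_lines {
    my @lines = @_;
    my %user;
    for my $line (@lines) {
        # Four captures: site, NA-prefixed ID, and the two data letters.
        next unless $line =~ m{^(\S+) (NA\d+) (\S+) (\S+)};
        my ( $site, $userID, $data, $data2 ) = ( $1, $2, $3, $4 );

        # Keep both values in an array ref.
        $user{$userID}{$site} = [ $data, $data2 ];
    }
    return \%user;
}

# Hypothetical helper: render one SITES/ID line pair per user ID.
sub format_report {
    my ($user) = @_;
    my $report = '';

    # String sort, because the keys start with "NA".
    for my $userID ( sort keys %$user ) {
        my $line1 = 'SITES';
        my $line2 = $userID;

        for my $site ( sort keys %{ $user->{$userID} } ) {
            # Pad both lines to the same width so each site number
            # starts in the same column as its pair of letters.
            my $width = length($line1) > length($line2)
                      ? length($line1)
                      : length($line2);
            $line1 = sprintf '%-*s', $width, $line1;
            $line2 = sprintf '%-*s', $width, $line2;

            my ( $data, $data2 ) = @{ $user->{$userID}{$site} };
            $line1 .= "  $site";
            $line2 .= "  $data  $data2";
        }
        $report .= "$line1\n$line2\n\n";
    }
    return $report;
}

# Test data copied from the parent post.
my @lines = (
    '012345 NA13333 C C',
    '012345 NA13334 F F',
    '012345 NA13335 E F',
    '012346 NA13333 U U',
    '012346 NA13334 I I',
    '012346 NA13335 Y O',
);

print format_report( parse_lines(@lines) );
```

If the real file is tab-separated or has runs of spaces, the literal spaces in the regex would need to become \s+ (or the line could be split on whitespace instead). To write to a file as in the original script, print the returned string to the REPORT handle.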

Re^4: Modifying a regex

by seni (Initiate) on Oct 27, 2006 at 20:21 UTC

PLEASE see the anonymous posting after grep's; it is mine (I forgot to log in). I posted some test data to try out. Thanks for your help!!
