in reply to Re^3: Modifying a regex
in thread Modifying a regex

Alright, here is a portion of the data set I am using, followed by the format I would eventually like it to have:

DATA SET
012345 NA13333 C C
012345 NA13334 F F
012345 NA13335 E F
012346 NA13333 U U
012346 NA13334 I I
012346 NA13335 Y O

IDEAL OUTCOME
**note the spacing comes out weird, SORRY! There is a site number above every pair of letters.
SITES 012345 012346
NA13333 C C U U

SITES 012345 012346
NA13334 F F I I

SITES 012345 012346
NA13335 E F Y O
***** The code I am using again is:
#!/usr/bin/perl
use strict;

my $inFile = 'fanca.txt';
open (IN, $inFile) or die "open $inFile: $!";

my %user;
while (my $line = <IN>) {
    next unless $line =~ m{^(\S+) (\d+) (.*)};
    my ($site, $userID, $data, $data2) = ($1, $2, $3, $4);
    $user{$userID}{$site} = $data, $data2;
}
close(IN) or die "close $inFile: $!";

my $outfile = "parsingoutput_for_fanca.txt";
open(REPORT, ">$outfile") or die "open >$outfile: $!";

foreach my $userID (sort {$a <=> $b} keys %user) {
    my %sites = %{$user{$userID}};
    my $line1 = 'SITES';
    my $line2 = "$userID";
    while (my ($site, $data, $data2) = each %sites) {
        $line1 .= ' ' x (length($line2)-length($line1));
        $line2 .= ' ' x (length($line1)-length($line2));
        #add on next site
        $line1 .= ' '. ' ' . $site;
        $line2 .= ' '. ' '. $data . ' ' . ' '. $data2;
    }
    print REPORT $line1 . "\n";
    print REPORT $line2 . "\n";
    print REPORT "\n";
}
close (REPORT) or die "close $outfile: $!";

Re^5: Modifying a regex
by grep (Monsignor) on Oct 27, 2006 at 21:00 UTC
    If you had read my previous post, you would've written a simple test program (which was mostly written for you):
    use strict;
    use warnings;

    my @lines = (
        '012345 NA13333 C C',
        '012345 NA13334 F F',
        '012345 NA13335 E F',
        '012346 NA13333 U U',
        '012346 NA13334 I I',
        '012346 NA13335 Y O');

    foreach my $line (@lines) {
        next unless $line =~ m{^(\S+) NA(\d+) (.*)};
        my ($site, $userID, $data) = ($1, $2, $3);
        print "SITE: $site USER: $userID DATA: $data\n";
    }
    Then you would've seen output like this:
    SITE: 012345 USER: 13333 DATA: C C
    SITE: 012345 USER: 13334 DATA: F F
    SITE: 012345 USER: 13335 DATA: E F
    SITE: 012346 USER: 13333 DATA: U U
    SITE: 012346 USER: 13334 DATA: I I
    SITE: 012346 USER: 13335 DATA: Y O
    This would've shown you that the regex is no longer the problem, and you could've started looking for the real problem and posted pertinent information, instead of reposting the code you had already posted.

    Some problems you have not addressed from my original post:

    next unless $line =~ m{^(\S+) (\d+) (.*)};
    my ($site, $userID, $data, $data2) = ($1, $2, $3, $4);
    # you have 3 capturing parens but you try to use $4
    # your 2 data columns get folded together in $3 because of your greedy .*
    $user{$userID}{$site} = $data, $data2;
    # $data2 is useless here and I think you are trying to use an array ref,
    # but that is not what you are doing; [ ] signifies an array ref
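One way to address both comments is sketched below, assuming each line has exactly two single-token data columns and that the user ID always starts with NA, as in the sample. The helper name parse_line and the in-line sample list are made up for illustration. The regex has four capture groups, so $4 is defined, and the two values are stored as an array ref instead of relying on the comma operator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one line into its fields.
sub parse_line {
    my ($line) = @_;
    # Four capture groups: site, numeric part of the user ID, data column 1, data column 2.
    return unless $line =~ m{^(\S+)\s+NA(\d+)\s+(\S+)\s+(\S+)\s*$};
    return ($1, $2, $3, $4);
}

my %user;
foreach my $line ('012345 NA13333 C C', '012346 NA13333 U U', 'not a data line') {
    # An empty list from parse_line means the line did not match; skip it.
    my @fields = parse_line($line) or next;
    my ($site, $userID, $data, $data2) = @fields;

    # [ ] builds an array ref, so both columns are kept under one hash value.
    $user{$userID}{$site} = [ $data, $data2 ];
}

# Show what was stored for one user, site by site.
foreach my $site (sort keys %{ $user{13333} }) {
    print "$site: @{ $user{13333}{$site} }\n";
}
```

In the original statement, `=` binds more tightly than the comma, so only $data was assigned and $data2 was evaluated and thrown away.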
    The rest of your code has several problems. Print out the %user hash with Data::Dumper and then fix your code.
    use Data::Dumper;
    print Dumper \%user;
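A minimal sketch of that inspection step, assuming the hash has already been filled with array refs per site (the literal data below is hand-written from the sample, not produced by the original script). Setting $Data::Dumper::Sortkeys is optional; it only makes the key order stable between runs.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys in the dump so the output is the same on every run.
$Data::Dumper::Sortkeys = 1;

# Hand-built stand-in for the structure the parsing loop is meant to produce:
# user ID => site => [ data column 1, data column 2 ]
my %user = (
    13333 => { '012345' => [ 'C', 'C' ], '012346' => [ 'U', 'U' ] },
    13334 => { '012345' => [ 'F', 'F' ], '012346' => [ 'I', 'I' ] },
);

# Pass a reference so the whole hash is dumped as a single nested structure.
my $dump = Dumper(\%user);
print $dump;
```

If a value in the dump shows up as a plain string rather than a nested array, the assignment in the parsing loop is still not storing both columns.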


    grep
    One dead unjugged rabbit fish later