garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am a Perl newbie and have a real-life problem that I am trying to solve:

I have a file with 4 columns of data:

HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 - 1449586 1449619
HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 - 1449544 1449577
HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1 - 4744083 4744113
HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1 - 4867122 4867157
HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2 - 4866942 4866977
HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1 + 1930232 1930266
HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2 + 1930354 1930389

I would like to match lines based on the name in the first column, i.e. HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 and HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 belong together, and then parse out the value from column 3 of the first line (1449586) and the value from column 4 of the matched second line (1449577).

Where there is no matched pair, e.g. HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1, I want to skip to the next line and check whether it starts another matched pair.

Any pointers would be appreciated.

Thanks

Gary

Re: compare lines within a file
by kennethk (Abbot) on Mar 09, 2011 at 17:35 UTC
    What have you tried? It's hard to give guidance if we don't know where you are coming from and where your issues lie.

    As your input file looks a lot like a variant of CSV (with spaces and newlines as delimiters), I would suggest using one of the modules for parsing these types of files from CPAN. My usual choice is Text::CSV, but the search results for CSV will give you an idea of how many solutions are out there for this parsing problem. The above module will generate an array of arrays for you - let us know if you have difficulty dealing with Perl references.

    Your code may end up looking something like this (adapted from the documentation):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my @result;
    my $csv = Text::CSV->new ( { sep_char => ' ' } )  # should set binary attribute.
        or die "Cannot use CSV: ".Text::CSV->error_diag ();

    open my $fh, "<", "test.csv" or die "test.csv: $!";
    while ( my $row = $csv->getline( $fh ) ) {
        if ($row->[0] =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0/\E}) {
            if (@result) {
                $result[1] = $row->[3];
            } else {
                $result[0] = $row->[2];
            }
        }
    }
    printf "%s\t%s\n", @result;
    $csv->eof or $csv->error_diag();
    close $fh;

    As you did not wrap your input text in <code> tags, it's possible the files were mangled during posting. For example, if your file is actually tab delimited, you would need to specify "\t" as your delimiter, not " ".
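    One detail worth checking when switching to a tab delimiter: Perl only interprets escape sequences such as \t inside double quotes. The following is a minimal core-Perl sketch of the difference, not part of the original reply; the sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line with real tab characters between the four fields.
my $line = join "\t", 'HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1', '-', 1449586, 1449619;

my $single = '\t';   # single quotes: a backslash followed by the letter t
my $double = "\t";   # double quotes: one actual tab character

printf "single-quoted length: %d\n", length $single;
printf "double-quoted length: %d\n", length $double;

# Splitting on the real tab recovers the four columns.
my @fields = split /$double/, $line;
printf "fields: %d\n", scalar @fields;
```

    The same quoting rule applies to whatever string is passed as a module's separator option.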

    Update: Fixed typo in code

      Thanks for helping out with this. I really appreciate the input from you guys.

      I tried the above code but I get an error:

      Use of uninitialized value in printf at parse_result.txt.pl line 23, <$fh> line 45.

      The code I used was practically the same as yours except for changing the name of the input file to results.txt and changing the " " to "\t"

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::CSV;

      my @result;
      my $csv = Text::CSV->new ( { sep_char => '\t ' } )  # should set binary attribute.
          or die "Cannot use CSV: ".Text::CSV->error_diag ();

      open my $fh, "<", "result.txt" or die "result.txt: $!";
      while ( my $row = $csv->getline( $fh ) ) {
          if ($row->[0] =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0//E}) {
              if (@result) {
                  $result[1] = $row->[3];
              } else {
                  $result[0] = $row->[2];
              }
          }
      }
      #printf "%s\t%s\n", @result;
      printf @result;
      $csv->eof or $csv->error_diag();
      close $fh;

        Oops, I shouldn't have included the printf @result; line in the code, but it still doesn't work with the #printf "%s\t%s\n", @result; line uncommented.

Re: compare lines within a file
by wind (Priest) on Mar 09, 2011 at 17:26 UTC
    What have you tried thus far? You just need to use regexes to compare, mate:

    use strict;
    use warnings;

    my @last;
    my $lastKey = '';    # empty string so the first comparison is well-defined
    while (<DATA>) {
        my @rec = /(\S+)/g;                   # /g captures every whitespace-separated field
        my ($key) = $rec[0] =~ m/([^#]+)/;    # read name up to the '#'
        if ($key eq $lastKey) {
            print "matching $key\n";
        }
        @last = @rec;        # remember the previous record for later use
        $lastKey = $key;
    }
    __DATA__
    HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 - 1449586 1449619
    HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 - 1449544 1449577
    HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1 - 4744083 4744113
    HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1 - 4867122 4867157
    HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2 - 4866942 4866977
    HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1 + 1930232 1930266
    HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2 + 1930354 1930389
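    Going one step further than detecting a match, the extraction described in the question (column 3 from the /1 line, column 4 from the /2 line, unpaired reads skipped) could be sketched as below. This is only one possible approach, assuming whitespace-separated columns and names ending in /1 or /2; the pair_reads helper and its hash layout are hypothetical names introduced for illustration, and it groups by name with a hash rather than comparing adjacent lines.

```perl
use strict;
use warnings;

# Sample lines copied from the question; a real script would read them from the file.
my @lines = (
    'HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 - 1449586 1449619',
    'HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 - 1449544 1449577',
    'HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1 - 4744083 4744113',
    'HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1 - 4867122 4867157',
    'HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2 - 4866942 4866977',
    'HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1 + 1930232 1930266',
    'HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2 + 1930354 1930389',
);

# Hypothetical helper: returns one "name<TAB>col3 of /1<TAB>col4 of /2"
# string for every read name that has both mates.
sub pair_reads {
    my @input = @_;
    my (%mates, @order);
    for my $line (@input) {
        my ($id, $strand, $start, $end) = split ' ', $line;
        next unless defined $end;                          # skip blank or short lines
        my ($name, $mate) = $id =~ m{^(.+)/([12])$} or next;
        push @order, $name unless $mates{$name};           # keep first-seen order
        $mates{$name}{$mate} = { start => $start, end => $end };
    }
    my @pairs;
    for my $name (@order) {
        my ($first, $second) = @{ $mates{$name} }{1, 2};
        next unless $first && $second;                     # no matched pair: skip it
        push @pairs, join "\t", $name, $first->{start}, $second->{end};
    }
    return @pairs;
}

my @pairs = pair_reads(@lines);
print "$_\n" for @pairs;
```

    With the sample data, the 6417:1078 read has no mate and is skipped, and the first pair yields 1449586 and 1449577, as requested in the question. Because the hash collects mates by name, the two lines of a pair do not need to be adjacent in the file.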