rebkirl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

In am new to Perl scripting. I have done some research for solving this question but I would need some help with the code.
I have the following file.txt (X and Y state for above and below text lines):

XXXXXXXXXX YYYYYYYY TIFF 1..203 /def="Z/AA:XGproxy1" /pattern="gdgdfn6N6" TIFF trans(256..298) /def="Z/AA:ZYprompt5" /pattern="HbgREV5ehe757gAH" TIFF trans(303..323) /pattern="hfftvt&&jdgY=)" XXXXXXXX YYYYYYYY
Further details:
- TIFF lines are repeated several times along the text.
-columns are separated by whitespaces.
- TIFF length is defined as (example):
TIFF 1..203
or (trans before interval number)
TIFF trans(256..298)

I want to:
1. calculate total summed length of all TIFFs (1-203 + 256-298 etc..)
2. calculate gaps in between (203-256 + 298-303 etc..)
I tried the following code but it print all the lines as output:
foreach $row (@file){ if ($row =~ /TIFF\s+([0-9.]+)/g){ @columns = split(/\s+../, $row); $difference = scalar(@columns[2]) - scalar(@columns[1]); print ("$difference\n"); } }

In addition I want to print all patterns in another file2.txt, taking part of definition (i.e. XGproxy1) as an header:

Example outputs:
>XGproxy1
gdgdfn6N6
>ZYprompt5
HbgREV5ehe757gAH

I tried the following code but it prints only the first pattern and I do not know how to print into FILE2.

open (FILE1, 'file1.txt'); open (FILE2, 'file2.txt'); $content =~ /pattern="([A-Z\s]+)/g; $query = $1; $query =~ s/\s+//g; print("$query\n”);

It would be very appreciated if anyone could give me some help with writing a proper code or point me in the correct direction.

Replies are listed 'Best First'.
Re: Extract length sum from columns and print pattern into another file
by poj (Abbot) on Jun 07, 2019 at 14:32 UTC

    Try but beware of off-by-1 errors in the lengths and gaps

    #!/usr/bin/perl use strict; my $outfile = 'file2.txt'; open OUT,'>',$outfile or die "Could not open $outfile : $!"; my $sum_len = 0; my $sum_gap = 0; my $def; my $end; while (<DATA>){ if (/TIFF[^\d]+(\d+)\.\.(\d+)/g){ # gaps if (defined $end){ my $gap = $1 - $end; $sum_gap += $gap; printf "%5d to %5d = %5d gap\n",$end,$1,$gap; } # lengths my $len = $2 - $1; $sum_len += $len; printf "%5d to %5d = %5d\n",$1,$2,$len; $end = $2; $def = ''; } elsif (/def="([^"]+)/){ (undef,$def) = split ':',$1; } elsif (/pattern="([^"]+)/){ printf OUT ">%s\n%s\n",$def,$1 if $def; } } close OUT; printf " Sum = %5d %5d\n",$sum_len,$sum_gap; __DATA__ XXXXXXXXXX YYYYYYYY TIFF 1..203 /def="Z/AA:XGproxy1" /pattern="gdgdfn6N6" TIFF trans(256..298) /def="Z/AA:ZYprompt5" /pattern="HbgREV5ehe757gAH" TIFF trans(303..323) /pattern="hfftvt&&jdgY=)" XXXXXXXX YYYYYYYY
    poj
      Thank you! So, in your code, file1.txt would be DATA, right? How is <DATA> given as input input file?
        my $infile = 'file1.txt'; open IN,'<',$infile or die "Could not open $infile : $!"; while (<IN>){ } close IN;
        See perlopentut

        poj

Re: Extract length sum from columns and print pattern into another file
by stevieb (Canon) on Jun 07, 2019 at 16:02 UTC

    Not an answer, but some advice...

    First, always put use strict; and use warnings; at the top of every script you write. These pragmas catch the vast majority of common issues (typos, scoping problems etc), and all fatal errors.

    Second, always use the three argument form of open, use lexical file handles as opposed to bareword ones, and throw an error if there are issues opening the file(s):

    open my $fh1, '<', 'file1.txt' or die "can't open the damned 'file1.tx +t' file!: $!"; open my $fh2, '<', 'file2.txt' or die "can't open the bloody 'file2.tx +t' file!: $!";

    Lastly, although it's valid and often reasonable to slurp in an entire file into an array, it's much more common and far less memory intensive if you iterate over the file handle. Instead of this:

    my @file = <$fh>; foreach $row (@file){ ...

    Do:

    while (my $row = <$fh>){ chomp $row; # remove line ending ...