giridharreddy9 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am new to Perl and I am already facing a few challenges. Any help would be appreciated. To explain my problem properly, I am pasting a sample of the data I am working on:
input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat,n3100gat, test_si, test_se;
output n3104gat, n3105gat, n3106gat, n3107gat, n3108gat, n3109gat, n3110gat;
What I need to do is write all the words between the substrings "input" and "output", excluding both keywords, to another file. I have managed part of the task: I can print from CK (i.e. just after "input") to the end of that line (i.e. n3070gat), but I cannot print the lines below it.
open(DATA, "s5378_scan.v") or die "error: $!";
while (<DATA>) {
    if ($_ =~ /input/) {
        $length_line = length ($_);
        $x = $length_line - 7;
        $portion = substr($_, 7, $x);
    }
    my $str = $portion;
    my $find = ",";
    my $replace = ";";
    $find = quotemeta $find;
    $str =~ s/$find/$replace/g;
    open (MYFILE, '>data.txt');
    print MYFILE "$str\n";
    close (MYFILE);
} # End while
close(DATA)
Please note that I am also replacing all the commas with semicolons while writing to the file. Also, the substring "input" always starts at an offset of 7. I would appreciate any suggested modifications or additions to the code. --Giridhar

Re: writing all the strings between two particular substrings
by CountZero (Bishop) on Oct 28, 2009 at 07:27 UTC
    Assuming that there will be more than one line of data to process, I wrote the following:
    use strict;
    {
        local $/ = 'output';
        while (<DATA>) {
            chomp;    # drop 'output'
            my (undef, $line) = split /input/;
            $line =~ s/,/;/g;
            print "$line\n";
        }
    }
    __DATA__
    input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat,n3100gat, test_si, test_se;
    output n3104gat, n3105gat, n3106gat, n3107gat, n3108gat, n3109gat, n3110gat;
    input CK, n3065gat_2, n3066gat_2, n3067gat_2, n3068gat_2, n3069gat_2, n3070gat_2,n3100gat_2, test_si, test_se;
    output n3104gat_2, n3105gat_2, n3106gat_2, n3107gat_2, n3108gat_2, n3109gat_2, n3110gat_2;
    input CK, n3065gat_3, n3066gat_3, n3067gat_3, n3068gat_3, n3069gat_3, n3070gat_3,n3100gat_3, test_si, test_se;
    output n3104gat_3, n3105gat_3, n3106gat_3, n3107gat_3, n3108gat_3, n3109gat_3, n3110gat_3;
    input CK, n3065gat_4, n3066gat_4, n3067gat_4, n3068gat_4, n3069gat_4, n3070gat_4,n3100gat_4, test_si, test_se;
    output n3104gat_4, n3105gat_4, n3106gat_4, n3107gat_4, n3108gat_4, n3109gat_4, n3110gat_4;
    It works by setting the input record separator '$/' to 'output', so every line you read will be terminated by 'output', rather than by the usual EOL.

    chomp removes the 'output'.

    Then I split the line just read on 'input' and throw away everything before 'input', leaving me with nothing but what is between 'input' and 'output'. The substitution s/,/;/g takes care of changing commas into semi-colons. There is no need to quotemeta them as they are not special in a regex. The result then gets printed.
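    To see exactly what the record-separator trick hands to the loop body, here is a minimal sketch that collects the pieces instead of printing them. It assumes the data is already in a string and reads it through an in-memory filehandle (Perl 5.8 and later); the sub name between_input_output is hypothetical. It also skips the final record, which follows the last 'output' and contains no 'input', so split returns only one field there.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text found between each 'input' and the
# following 'output', with commas turned into semicolons.
sub between_input_output {
    my ($text) = @_;
    my @pieces;

    # Read from the string as if it were a file (in-memory filehandle).
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    local $/ = 'output';    # each record now ends at 'output'
    while (my $record = <$fh>) {
        chomp $record;      # removes the trailing 'output' (the current $/)
        my (undef, $line) = split /input/, $record, 2;
        next unless defined $line;    # text after the last 'output' has no 'input'
        $line =~ s/,/;/g;
        push @pieces, $line;
    }
    close $fh;
    return @pieces;
}

# Sample data (assumed), with one declaration spread over two lines.
my $data = "input CK, n3065gat,\n n3070gat;\noutput n3104gat;\n"
         . "input CK_2, n3065gat_2;\noutput n3104gat_2;\n";

print "[$_]\n" for between_input_output($data);
```

    Note that each piece keeps its line breaks, so a declaration that spans several lines comes back as several lines.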

    BTW: you are opening, printing and closing the file for every line. Not only is this very slow, it is also wrong as each opening of the file for output with '>' will delete everything which was in that file, so you end up with only the result of your last print. Keep your open and close outside of the loop.
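    A sketch of the original loop restructured along those lines: the filehandles are opened once by the caller and passed in, and a flag remembers whether we are between 'input' and 'output', so lines below the 'input' line are copied too. The sub name copy_ports is hypothetical, and a regex replaces the fixed offset of 7 from the original code. The demo uses in-memory filehandles; with real files the caller would instead open 's5378_scan.v' for reading and 'data.txt' for writing, once each, before calling it.

```perl
use strict;
use warnings;

# Hypothetical helper: copies everything between 'input' and 'output'
# from $in to $out, replacing commas with semicolons.
sub copy_ports {
    my ($in, $out) = @_;
    my $inside = 0;    # true while we are between 'input' and 'output'
    while (my $line = <$in>) {
        if ($line =~ s/^.*?\binput\b//) {    # drop everything up to and including 'input'
            $inside = 1;
        }
        next unless $inside;
        if ($line =~ s/\boutput\b.*//s) {    # drop 'output' and the rest of the line
            $inside = 0;
        }
        $line =~ s/,/;/g;
        print {$out} $line;
    }
}

# Demo with in-memory handles: each handle is opened once, outside the loop.
my $src    = "input CK, n3065gat,\nn3100gat, test_se;\noutput n3104gat,\nn3110gat;\n";
my $result = '';
open my $in,  '<', \$src    or die "Cannot read: $!";
open my $out, '>', \$result or die "Cannot write: $!";
copy_ports($in, $out);
close $in;
close $out;
print $result;
```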

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    Update: fixed typo.
Re: writing all the strings between two particular substrings
by FalseVinylShrub (Chaplain) on Oct 28, 2009 at 06:23 UTC

    Hi

    I don't think I quite understand what you are aiming to do. Perhaps if you posted the desired output to go with the input data, it would be clearer.

    But here are some notes on your code:

    • use strict; use warnings; and probably use diagnostics;
    • Indent your code to show what is in the while loop.
    • You seem to be opening and closing the output file within the while loop - this means you'll only have the most recent line from the input file in the output file.
    • You should check the return code from opening the output file as well as the input file.
    • substr($_, 7, $x) where $x = length($_) - 7 seems to be taking the rest of the line after character 7. You can do that with substr($_, 7)
    • For me, your code outputs to test_se, not n3070gat.
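    The substr point from the list above can be checked directly. This sketch assumes a sample line with one leading space, so that the text after "input " starts at offset 7; the sample is made up for illustration.

```perl
use strict;
use warnings;

# Assumed sample: one leading space, so the ports start at offset 7.
my $line = " input CK, n3065gat, test_se;\n";

# The original two-step version: compute the remaining length by hand.
my $x    = length($line) - 7;
my $long = substr($line, 7, $x);

# With no LENGTH argument, substr returns everything from OFFSET to the end.
my $short = substr($line, 7);

print $long eq $short ? "same\n" : "different\n";    # prints "same"
```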

    Taking a guess at what you want to do, try this for starters:

    use strict;
    use warnings;
    use diagnostics;

    while (<>) {
        s/^\s*input\s*//;
        s/^\s*output\s*//;
        s/,/;/g;
        print;
    }

    This simply strips 'input' and 'output' from the start of each line (with variable amounts of white space) and turns ',' into ';'. I have done it using <> so that it acts as a filter: you specify the input and output files on the command line, e.g.:

    $ perl test.pl 5378_scan.v >data.txt
    $ cat data.txt
    CK; n3065gat; n3066gat; n3067gat; n3068gat; n3069gat; n3070gat;n3100gat; test_si; test_se;
    n3104gat; n3105gat; n3106gat; n3107gat; n3108gat; n3109gat; n3110gat;

    If you decide to open the files within the script, look up the 3-argument form of open, lexical filehandles, and checking the return value of open.
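    A minimal sketch of the same filter with the files opened inside the script: 3-argument open, lexical filehandles, and a check on every open (and on closing the output, where a failed write would show up). The sub name filter_file is hypothetical; as written it only runs when two file names are given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: strips a leading 'input'/'output' keyword from each
# line of $in_file, turns commas into semicolons, and writes to $out_file.
sub filter_file {
    my ($in_file, $out_file) = @_;

    # 3-argument open with lexical filehandles; die with the OS error if it fails.
    open my $in,  '<', $in_file  or die "Cannot open '$in_file' for reading: $!";
    open my $out, '>', $out_file or die "Cannot open '$out_file' for writing: $!";

    while (my $line = <$in>) {
        $line =~ s/^\s*input\s*//;
        $line =~ s/^\s*output\s*//;
        $line =~ s/,/;/g;
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close '$out_file': $!";
}

# Usage: perl script.pl s5378_scan.v data.txt
filter_file(@ARGV) if @ARGV == 2;
```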

    Come back and let us know whether this helps, or if you have any other questions.

    Regards, F.V.S.

Re: writing all the strings between two particular substrings
by bichonfrise74 (Vicar) on Oct 28, 2009 at 17:20 UTC
    I think you can use a flip-flop operator.
    #!/usr/bin/perl
    use strict;

    while (<DATA>) {
        print if ( /^input/ ... /\;/ );
    }

    __DATA__
    input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat,
    n3100gat, test_si, test_se;
    output n3104gat, n3105gat, n3106gat, n3107gat, n3108gat, n3109gat,
    n3110gat;
    input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat;

    [jcua@jcua tmp]$ perl 501.pl
    input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat,
    n3100gat, test_si, test_se;
    input CK, n3065gat, n3066gat, n3067gat, n3068gat, n3069gat, n3070gat;
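    The flip-flop can also do the stripping the original question asks for, because in scalar context it returns a sequence number: 1 on the first line of the range, counting up after that, with "E0" appended on the last line, and the empty string outside the range. This sketch uses that number to remove the 'input' keyword only on the first line. The sub name input_ports is hypothetical, and it uses `..` rather than `...`: with `..` the right-hand test is also applied to the line that starts the range, so a declaration that fits on one line ends there, whereas `...` would run on to the next line containing ';'. Note that a flip-flop's state persists between calls, so an unterminated declaration at the end of one file would carry over into the next call.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the port lists of every 'input' declaration,
# which may continue over several lines until the terminating ';'.
sub input_ports {
    my ($fh) = @_;
    my $ports = '';
    while (my $line = <$fh>) {
        # Scalar '..': 1, 2, ... inside the range ("NE0" on its last line),
        # empty string outside it.
        my $seq = (($line =~ /^\s*input\b/) .. ($line =~ /;/));
        next unless $seq;
        $line =~ s/^\s*input\s*// if $seq == 1;    # drop the keyword itself
        $line =~ s/,/;/g;
        $ports .= $line;
    }
    return $ports;
}

# Sample data (assumed): one two-line declaration and one single-line one.
my $text = "input CK, n3065gat,\nn3100gat, test_se;\noutput n3104gat;\n"
         . "input a, b;\noutput c;\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print input_ports($fh);
close $fh;
```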