in reply to Re^5: Shorten a list...
in thread Shorten a list...

I guess it would have helped if I had been more specific about exactly what my file looks like. Sorry about that. In reality, there are thousands of groups, and my input file looks like this:

>Group 42
0 25ap, >name_06-T_1_0... at 92.00%
1 28ap, >name_06-T_1_0... *

>Group 43
0 28ap, >name_07-N_1_0... *

>Group 44
0 29ap, >name_07-N_1_0... *

>Group 45
0 25ap, >name_03-T_1_0... *
1 25ap, >name_06-T_1_0... at 100.00%
2 25ap, >name_07-N_1_0... at 100.00%
3 25ap, >name_11-N_1_0... at 100.00%
4 25ap, >name_14-T_1_0... at 100.00%

I would want an output that looks like this:

>Group 42
0 25ap, >name_06-T_1_0... at 92.00%
1 28ap, >name_06-T_1_0... *

>Group 45
0 25ap, >name_03-T_1_0... *
1 25ap, >name_06-T_1_0... at 100.00%
2 25ap, >name_07-N_1_0... at 100.00%
3 25ap, >name_11-N_1_0... at 100.00%
4 25ap, >name_14-T_1_0... at 100.00%

Re^7: Shorten a list...
by Cristoforo (Curate) on Oct 18, 2011 at 15:59 UTC
    Yes, it is necessary to see exactly what your file looks like. Could have saved some typing knowing that :-)

    A possible solution is close to my first answer.

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        local ($/, $\) = ('', "\n\n");
        while (<DATA>) {
            chomp;
            print if tr/\n// > 1;
        }
    }

    __DATA__
    >Group 42
    0 25ap, >name_06-T_1_0... at 92.00%
    1 28ap, >name_06-T_1_0... *

    >Group 43
    0 28ap, >name_07-N_1_0... *

    >Group 44
    0 29ap, >name_07-N_1_0... *

    >Group 45
    0 25ap, >name_03-T_1_0... *
    1 25ap, >name_06-T_1_0... at 100.00%
    2 25ap, >name_07-N_1_0... at 100.00%
    3 25ap, >name_11-N_1_0... at 100.00%
    4 25ap, >name_14-T_1_0... at 100.00%

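    The statement local ($/, $\) = ('', "\n\n"); sets the INPUT_RECORD_SEPARATOR, $/, to the empty string, which makes Perl read the file one paragraph at a time. A paragraph is one or more lines of text followed by one or more blank lines (that is, two or more consecutive newlines). The OUTPUT_RECORD_SEPARATOR, $\, is set to "\n\n", and $\ is appended to the output of every print statement. After chomp, a group with a header and N member lines contains N newlines, so tr/\n// > 1 keeps only the groups with at least two members.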
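
To see the paragraph-mode behaviour in isolation, here is a minimal sketch that reads from an in-memory string instead of DATA. The sub name keep_multi_member_groups and the sample records are made up for illustration; only $/, chomp and tr come from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the paragraphs that have a header
# line plus at least two member lines.
sub keep_multi_member_groups {
    my ($text) = @_;
    my @kept;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    {
        local $/ = '';    # paragraph mode: a record ends at 1+ blank lines
        while (my $para = <$fh>) {
            chomp $para;  # in paragraph mode this strips all trailing newlines
            # After chomp, a header plus N member lines contains N newlines.
            push @kept, $para if ($para =~ tr/\n//) > 1;
        }
    }
    close $fh;
    return @kept;
}

# Made-up sample: three groups; note the extra blank line after Group 1.
my $input = ">Group 1\n0 a *\n1 b at 90.00%\n\n\n"
          . ">Group 2\n0 c *\n\n"
          . ">Group 3\n0 d *\n1 e at 100.00%\n";

print join("\n\n", keep_multi_member_groups($input)), "\n";
```

With this sample, only Group 1 and Group 3 are printed. The run of blank lines after Group 1 is treated as a single separator, which is the point of using '' rather than "\n\n" for $/.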
      Hey,

      I'm sorry about not having it exact from the start. I will keep that in mind for any later posts. Thanks again for all of your help.

      Between the time I left and the time I arrived at work this morning, the format was changed again. They took out the blank lines between the groups:

      >Group 42
      0 25ap, >name_06-T_1_0... at 92.00%
      1 28ap, >name_06-T_1_0... *
      >Group 43
      0 28ap, >name_07-N_1_0... *
      >Group 44
      0 29ap, >name_07-N_1_0... *
      >Group 45
      0 25ap, >name_03-T_1_0... *
      1 25ap, >name_06-T_1_0... at 100.00%
      2 25ap, >name_07-N_1_0... at 100.00%
      3 25ap, >name_11-N_1_0... at 100.00%
      4 25ap, >name_14-T_1_0... at 100.00%

      I will try out the code you gave me and see if I can modify it for this new format. Thanks again for all of the help.

        So in this case, I would be changing the input record separator from splitting paragraphs at blank lines to splitting them at ">Group". Then I need the script to print a paragraph only if it has more than 2 lines.

        Is this along the right track?

        local ($/, $\) = ('', ">Group");
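
As written, that assignment leaves $/ as '' (still paragraph mode) and makes ">Group" the output separator instead. A minimal sketch of the idea described above, with ">Group" as the input separator, might look like the following. It assumes that ">Group" only ever appears at the start of a header line and that the file ends with a newline; the sub name filter_groups and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keeps groups that span more than 2 lines
# (the header plus at least two member lines).
sub filter_groups {
    my ($text) = @_;
    my $out = '';
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    {
        local $/ = '>Group';    # each read ends at the next ">Group" marker
        while (my $rec = <$fh>) {
            chomp $rec;                # strips the trailing ">Group", if present
            next unless length $rec;   # the text before the first marker is empty
            # $rec is now " 42\n" followed by the member lines, each ending
            # in a newline, so a group with N members holds N + 1 newlines.
            $out .= ">Group$rec" if ($rec =~ tr/\n//) > 2;
        }
    }
    close $fh;
    return $out;
}

# Made-up sample in the new layout (no blank lines between groups).
my $sample = ">Group 1\n0 a *\n1 b at 92.00%\n"
           . ">Group 2\n0 c *\n"
           . ">Group 3\n0 d *\n1 e at 100.00%\n";

print filter_groups($sample);
```

Because chomp removes the separator from the end of each record, the sketch puts ">Group" back in front of every record it keeps. With the sample above, Groups 1 and 3 are printed and Group 2 is dropped.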