raidermike has asked for the wisdom of the Perl Monks concerning the following question:

I have a super wacky text file and want to use Perl to make it readable in other applications. What do you think? I have been using perl pie to try to get the | characters out, but it's not working. I also want to get all the text on one line.
| | | SNDYN000036000 | Unloadable | The SKU is not valid for the | | | | | | vendor with vendor number | | | | | | 875323. |

Replies are listed 'Best First'.
Re: tricky text file
by choroba (Cardinal) on Sep 10, 2015 at 13:48 UTC
    Process the table line by line. Keep an array of text buffers (@t). If a line contains data, split it on the | characters and append each part to the corresponding buffer. If a line contains only dashes, output the buffers and clear them.
    perl -lne 'sub out {
                   s/\s+/ /g, print for @t;        # Normalize whitespace, print buffers.
                   @t = ();                        # Clear the buffers.
               }
               if (/^-+$/) {                       # Separator.
                   out();
               } else {
                   @p = split /\|/;                # Split the line on vertical bars.
                   $t[$_] .= $p[$_] for 0 .. $#p;  # Add each part to its buffer.
               }' < input

    If the last line in the input is not a --- line, you'll need to add

    }{ out();

    to the end of the script to print the last accumulated buffers.
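
    For anyone who prefers a script to a one-liner, the same per-column buffer idea might be sketched as below. This is only an illustration, not the code above: the sub name records_by_column is hypothetical, it assumes every data line begins with a bar, and it joins the buffers with tabs so that each record ends up on a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: one text buffer per column, flushed at each
# separator line. Returns one tab-joined string per record.
sub records_by_column {
    my @lines = @_;
    my (@out, @buf);

    my $flush = sub {
        return unless grep { /\S/ } @buf;        # nothing accumulated yet
        for (@buf) { s/\s+/ /g; s/^ | $//g }     # normalize whitespace
        push @out, join "\t", @buf;
        @buf = ();                               # clear the buffers
    };

    for my $line (@lines) {
        $line =~ s/\s+$//;                       # drop newline and trailing blanks
        if ($line =~ /^\s*-+$/) {                # separator: record is complete
            $flush->();
            next;
        }
        next unless $line =~ /\|/;               # ignore lines without columns
        my @parts = split /\|/, $line;
        shift @parts;                            # empty field before the first bar
        $buf[$_] .= " $parts[$_]" for 0 .. $#parts;
    }
    $flush->();                                  # last record, if no trailing separator
    return @out;
}

my $sample = <<'END';
--------------------------------------------------------------------------------
| Vendor | Vendor | Sku Number     | Status     | Status Detail                |
| ID     | Name   |                |            |                              |
--------------------------------------------------------------------------------
|        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the |
|        |        |                |            | vendor with vendor number    |
|        |        |                |            | 875323.                      |
END

print "$_\n" for records_by_column(split /\n/, $sample);
```

    Because each column keeps its own buffer, the two header rows merge column by column, so the first two header fields come out as "Vendor ID" and "Vendor Name". Empty columns stay in the output as empty tab-separated fields, which should keep the columns lined up when the file is imported elsewhere.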

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      ah, very cool
Re: tricky text file
by MidLifeXis (Monsignor) on Sep 10, 2015 at 13:20 UTC

    First, please update your OP to include <code>...</code> tags around the data - the formatting is entirely lost. I am also unsure of what you have tried so far.

    It sounds like you want the opposite of format. The thread "opposite of format?" has some suggestions. Personally, I would:

    • read a line
    • split into columns
    • Add non-empty data into an accumulator variable
    taking care to handle new data sets properly (emit previous data, store in an array or hash, ...)
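
    The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tested solution for the real file: the sub name collapse_records is hypothetical, and it assumes that a line of dashes marks the start of a new data set.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines, split them into columns, and gather
# the non-empty cells into an accumulator. Assumption: a line of dashes
# ends the current data set.
sub collapse_records {
    my @lines = @_;
    my (@records, @acc);

    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\s*-+\s*$/) {                 # new data set begins
            push @records, join ' ', @acc if @acc;   # emit previous data
            @acc = ();
            next;
        }
        for my $cell (split /\|/, $line) {           # split into columns
            $cell =~ s/^\s+|\s+$//g;                 # trim the cell
            push @acc, $cell if length $cell;        # keep non-empty data only
        }
    }
    push @records, join ' ', @acc if @acc;           # emit the final data set
    return @records;
}

my $sample = <<'END';
--------------------------------------------------------------------------------
| Vendor | Vendor | Sku Number     | Status     | Status Detail                |
| ID     | Name   |                |            |                              |
--------------------------------------------------------------------------------
|        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the |
|        |        |                |            | vendor with vendor number    |
|        |        |                |            | 875323.                      |
END

print "$_\n" for collapse_records(split /\n/, $sample);
```

    Cells are appended row by row, so the result reads naturally only when a single column wraps, as it does in the sample. If several columns wrap at once, keeping one accumulator per column would be the safer choice.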

    --MidLifeXis

      yeah, the data is pretty awful, it's thousands of "transactions" that look just like this
      ------------------------------------------------------------------------------------
      | Vendor | Vendor | Sku Number     | Status     | Status Detail                    |
      | ID     | Name   |                |            |                                  |
      ------------------------------------------------------------------------------------
      |        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the     |
      |        |        |                |            | vendor with vendor number        |
      |        |        |                |            | 875323.                          |
      i just want to put the readable data (the text) in one line each
Re: tricky text file
by Laurent_R (Canon) on Sep 10, 2015 at 13:40 UTC
    Yes, without formatting due to the lack of <code> and </code> tags, it is difficult to know what you need exactly.

    Perhaps a starting point:

    $ echo ' | | | SNDYN000036000 | Unloadable | The SKU is not valid for the | | | | | | vendor with vendor number | | | | | | 875323. | ' | perl -pe 's/\|//g;'
    SNDYN000036000 Unloadable The SKU is not valid for the vendor with vendor number 875323.
    Or, if you also want to remove the extra spaces:
    $ echo ' | | | SNDYN000036000 | Unloadable | The SKU is not valid for the | | | | | | vendor with vendor number | | | | | | 875323. | ' | perl -pe 's/\|//g; s/\s+/ /g;'
    SNDYN000036000 Unloadable The SKU is not valid for the vendor with vendor number 875323.
    Update: I had not seen your new post with the formatted data when I wrote my post. Obviously, a bit more is required, though possibly not so much:
    $ echo '------------------------------------------------------------------------------------
    > | Vendor | Vendor | Sku Number     | Status     | Status Detail                    |
    > | ID     | Name   |                |            |                                  |
    > ------------------------------------------------------------------------------------
    > |        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the     |
    > |        |        |                |            | vendor with vendor number        |
    > |        |        |                |            | 875323.                          |' | perl -pe 's/\|//g; s/\s+/ /g;'
    ------------------------------------------------------------------------------------ Vendor Vendor Sku Number Status Status Detail ID Name ------------------------------------------------------------------------------------ SNDYN000036000 Unloadable The SKU is not valid for the vendor with vendor number 875323.
    In brief, you have to tell us what to do with the column headers, dashes, etc., and we can get pretty close to your needs. For example, something perhaps closer to what you want:
    $ echo '------------------------------------------------------------------------------------
    > | Vendor | Vendor | Sku Number     | Status     | Status Detail                    |
    > | ID     | Name   |                |            |                                  |
    > ------------------------------------------------------------------------------------
    > |        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the     |
    > |        |        |                |            | vendor with vendor number        |
    > |        |        |                |            | 875323.                          |
    > ' | perl -ne 'next if /^\s*-/; s/\|//g; s/\s+/ /g; print;'
    Vendor Vendor Sku Number Status Status Detail ID Name SNDYN000036000 Unloadable The SKU is not valid for the vendor with vendor number 875323.
Re: tricky text file
by locked_user sundialsvc4 (Abbot) on Sep 10, 2015 at 16:34 UTC

    When tackling problems like this, it might also be useful to surf through a few tutorials on the awk tool, which is readily available on any Unix/Linux system. (It is a venerable old tool and, in fact, one of the original inspirations for Perl.) Awk is a much simpler yet more targeted tool, designed specifically for this purpose, and its general approach is both informative and somewhat easier to see.

    In short, an awk program consists of a series of regular-expression patterns (as well as the special BEGIN and END patterns, which run before the first line and after the last), each accompanied by a block of code to be executed when its pattern matches. Generally, a programmer uses the patterns to gather information from one or more lines, and then to recognize when a complete output record has been accumulated and should be produced. It is not a general-purpose programming language, as Perl is, but it very clearly illustrates a powerful approach that is well proven and widely applicable to situations like this. It also corresponds quite directly to a Perl solution: by design and inspiration, Perl programs can be written in a very awk-like manner.
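
    To make the correspondence concrete, here is a rough sketch of what an awk-like Perl program could look like for this kind of data. Everything in it is an assumption made for illustration: the rule table, the sub names emit, gather and run_rules, and the choice to treat a line of dashes as the end of a record.

```perl
use strict;
use warnings;

my (@record, @done);   # cells of the record in progress; finished records

# Action: the current record is complete, so store it as one line.
sub emit {
    push @done, join ' ', @record if @record;
    @record = ();
}

# Action: collect the non-empty cells of a data line.
sub gather {
    my ($line) = @_;
    for my $cell (split /\|/, $line) {
        $cell =~ s/^\s+|\s+$//g;                 # trim the cell
        push @record, $cell if length $cell;
    }
}

# pattern => action pairs, checked in order for every input line.
my @rules = (
    [ qr/^\s*-+\s*$/ => \&emit   ],              # separator line
    [ qr/\|/         => \&gather ],              # data line
);

sub run_rules {
    my @lines = @_;
    @record = ();
    @done   = ();
    for my $line (@lines) {
        for my $rule (@rules) {
            my ($pattern, $action) = @$rule;
            $action->($line) if $line =~ $pattern;   # every matching rule runs
        }
    }
    emit();                                      # plays the role of an END block
    return @done;
}

my $sample = <<'END';
--------------------------------------------------------------------------------
| Vendor | Vendor | Sku Number     | Status     | Status Detail                |
| ID     | Name   |                |            |                              |
--------------------------------------------------------------------------------
|        |        | SNDYN000036000 | Unloadable | The SKU is not valid for the |
|        |        |                |            | vendor with vendor number    |
|        |        |                |            | 875323.                      |
END

print "$_\n" for run_rules(split /\n/, $sample);
```

    The patterns do the recognizing and the actions do the gathering, so handling another kind of line (a page header to skip, say) should only mean adding one more rule to the table.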

    There are, as it turns out, a great many practical situations, very much like this one, where awk, or an implementation in another language of the strategy it champions, is “just what the doctor ordered.” You will find yourself using it very frequently.

      very good suggestion, thanks to everyone who responded. I have been reading these pages for a while and am constantly impressed by the level of wizardry!