zodell has asked for the wisdom of the Perl Monks concerning the following question:

First off I want to say that I have been reading posts from PerlMonks for a while and they have always been extremely useful and informative, so I just want to say thank you for that.

I have been trying to get my code for this project working for the last few days and have tried several different solutions that should work, but none of them do, and I am currently at a loss.

Here is the code that I am currently using. What I am trying to do is remove the "L1," "L2," and so on from an input .TXT or .CSV file, whichever is being sent in to our servers. The input file would be run through this script, and the L1, L2 and similar prefixes would be removed in the output file that we would then use.

Or if there is a better way of removing those characters directly before the print statement of the output file, that would work as well.

Sample of the output file as it currently looks:

L1,830 HORIZON PKG RELEASE
L2,
L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
L4,SCH QTY TYPE:,A,
L5,
L6,HORIZON START:,20140915
L7,END:,20150913
L8,GENERATION DATE:,20140915
L9,
L10,SHIP TO NAME:,,
L16,SHIP TO CODE:,US08,
L15,
DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,

The code currently being used at this moment is the following:

#!/usr/bin/perl -w
# txtRemoval.pl --in=%in% --out=%out%
use strict;
use warnings;
use Text::CSV;
#my ( $infile, $outfile) = @ARGV;
use Getopt::Long;

my @ARGS;
my $wholefile = @ARGS;
my $csv = Text::CSV->new() or die "Can't use CSV: ".Text::CSV->error_diag();

## options
my $opt = {};
GetOptions ($opt, 'in=s', 'out=s', );

## make sure we have the right options
unless ( defined($opt->{'in'}) and defined($opt->{'out'}) ) {
    die "Usage: $0 --in=INFILE.TXT --out=OUTFILE.TXT\n";
}

## open file handles
open INFILE, $opt->{'in'} or die "Cannot open input file: $!";
open OUTFILE, '>', $opt->{'out'} or die "Cannot open output file: $!";

#my @elements = ["L1,", "L2,", "L3,", "L4,", "L5,", "L6,", "L7,", "L8,", "L9,", "L10,", "L11,", "L12,", "L14,", "L15,", "L16,", "DETAIL,", "SPACE,", "SUMMARY,"]
my @file = <INFILE>;
my $reg = s/[^,]*\.(\S*)//;
while (my $line = <INFILE>){
    chomp $line;
    my $wholefile = $line.$_ foreach(@file);
    print OUTFILE $wholefile;
}

## spit out entire file
#print OUTFILE $wholefile;

## close file handles
close OUTFILE;

Here is one other snippet of the code that I have tried; all the other portions of the code are the same:

while (my $wholefile = <INFILE>){
    my $reg = s/.*?,//;
    my $wholefile = $wholefile.$reg;
}

## spit out entire file
print OUTFILE $wholefile;

This is what the output file is supposed to look like

830 HORIZON PKG RELEASE

PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
SCH QTY TYPE:,A,

HORIZON START:,20140915
END:,20150913
GENERATION DATE:,20140915

SHIP TO NAME:,,
SHIP TO CODE:,US08,

BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,

I want to thank you in advance for your help on this and I have a feeling that I have just been overthinking the whole thing and that most of you will probably just laugh at me over missing something simple, but I have been beating my head over this for too long at this point so I figure it is best to finally get some help.

I want to give thanks to AnonymousMonk for the regex statement he provided. I was not able to get the code to work as a one liner, but I was able to adapt it into the code that I already had.

BELOW IS THE FINAL WORKING CODE

#!/usr/bin/perl -w
# txtRemoval.pl --in=%in% --out=%out%
# This script removes specific text from a file for the Mars 830 Forecast
use strict;
use warnings;
use Getopt::Long;

my $opt = {};
GetOptions ($opt, 'in=s', 'out=s', );
my $infile = $opt->{'in'};
my $outfile = $opt->{'out'};

## make sure we have the right options
unless ( defined($opt->{'in'}) and defined($opt->{'out'}) ) {
    die "Usage: $0 --in=INFILE.TXT --out=OUTFILE.TXT\n";
}

open my $in, "<", $infile or die $!;
open my $out, ">", $outfile or die $!;
while (<$in>){
    s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    print $out $_;
}
close $in;

## close file handles
close $out;

Again thank you to all of you for your help and contributions

Re: Removing everything before the first comma separator on each line of a text file
by johngg (Canon) on Sep 15, 2014 at 22:18 UTC

    If you want to remove any text up to and including the first comma you could do this.

    $ perl -Mstrict -Mwarnings -e '
    open my $inFH, q{<}, \ <<EOD or die $!;
    L1,830 HORIZON PKG RELEASE
    L2,
    L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
    L4,SCH QTY TYPE:,A,
    L5,
    L6,HORIZON START:,20140915
    L7,END:,20150913
    L8,GENERATION DATE:,20140915
    L9,
    L10,SHIP TO NAME:,,
    L16,SHIP TO CODE:,US08,
    L15,
    DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
    DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
    DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
    EOD
    while ( <$inFH> ) {
        s{^[^,]*,}{};
        print
    }
    close $inFH or die $!;'
    830 HORIZON PKG RELEASE

    PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
    SCH QTY TYPE:,A,

    HORIZON START:,20140915
    END:,20150913
    GENERATION DATE:,20140915

    SHIP TO NAME:,,
    SHIP TO CODE:,US08,

    BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
    1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
    1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
    $

    If you only want to remove the particular fields you mention in your code then you can construct a regex with alternation.

    $ perl -Mstrict -Mwarnings -e '
    open my $inFH, q{<}, \ <<EOD or die $!;
    L1,830 HORIZON PKG RELEASE
    L2,
    L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
    L4,SCH QTY TYPE:,A,
    L5,
    L6,HORIZON START:,20140915
    L7,END:,20150913
    L8,GENERATION DATE:,20140915
    L9,
    L10,SHIP TO NAME:,,
    L16,SHIP TO CODE:,US08,
    L15,
    DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
    DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
    DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
    EOD
    my @removes = map { q{L} . $_ } 1 .. 12, 14 .. 16;
    push @removes, qw{ DETAIL SPACE SUMMARY };
    my $qrRemove = do {
        local $" = q{|};
        qr{^(?:@removes),};
    };
    while ( <$inFH> ) {
        s{$qrRemove}{};
        print
    }
    close $inFH or die $!;'
    830 HORIZON PKG RELEASE

    PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
    SCH QTY TYPE:,A,

    HORIZON START:,20140915
    END:,20150913
    GENERATION DATE:,20140915

    SHIP TO NAME:,,
    SHIP TO CODE:,US08,

    BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
    1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
    1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
    $

    I hope this is helpful.

    Cheers,

    JohnGG

      Quick question..in your reply, you have the actual text from the file within the code? Or is that to show what portion of the data that code is to be affecting?

        johngg is using a here-document to construct a string consisting of the input data, and is using open with a reference to access it as an in-memory file. Also, the output of the script is shown in-line (as it would look when run from a terminal).
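The in-memory file technique described above can be tried on its own. The following is only a minimal sketch, assuming a shortened three-line sample and the made-up names $data, $in_fh and @cleaned; it is not johngg's exact code.

```perl
use strict;
use warnings;

# Sample data held in a string; a here-document keeps it readable.
# (Shortened, hypothetical excerpt of the data posted in this thread.)
my $data = <<'EOD';
L1,830 HORIZON PKG RELEASE
L2,
L3,PURPOSE:,00,
EOD

# Opening a *reference* to a scalar gives a read handle on the string,
# so the loop below behaves as if it were reading a real file.
open my $in_fh, '<', \$data or die "Cannot open in-memory file: $!";

my @cleaned;
while ( my $line = <$in_fh> ) {
    $line =~ s/^[^,]*,//;    # drop everything up to and including the first comma
    push @cleaned, $line;
}
close $in_fh or die $!;

print @cleaned;
```

This makes a test case self-contained: the data and the code travel together, and nothing has to exist on disk.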

Re: Removing everything before the first comma separator on each line of a text file
by roboticus (Chancellor) on Sep 15, 2014 at 22:40 UTC

    zodell:

    If you just want to unconditionally remove everything before the first comma, I'd suggest using split, like this:

    while (my $line = ..datasource..) {
        my ($first, $rest) = split /,/, $line, 2;
        print $OFH $rest;
    }

    Update: Removed erroneous comma after $OFH, thanks to Tux.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      That will remove all the commas from the rest of the line.

      It will also break if one of the fields is quoted and contains a newline.

      Bad advice

      update: The comma after $OFH will also cause havoc :P


      Enjoy, Have FUN! H.Merijn

        nope, split leaves the delimiters in after the LIMIT (just tested with perl 5.14.2).

        Admittedly, the documentation says "substrings (called "fields") that do not include the separator", so one could expect an implementation of split which filters out the delimiters from the "rest field", but I have heard the "P" in "Perl" stands for "practical" :-)

        your update about the comma after print$OFH is correct, of course...
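The behaviour of split with a LIMIT that is discussed above is easy to check. This is a small sketch, assuming a hypothetical helper named drop_first_field; returning a comma-free line unchanged is a design choice made here, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line without its first comma-separated field.
sub drop_first_field {
    my ($line) = @_;

    # LIMIT 2 splits on the first comma only; the second element keeps
    # every remaining comma (and the trailing newline) untouched.
    my ( undef, $rest ) = split /,/, $line, 2;

    # No comma at all: $rest is undefined, so hand the line back as it was.
    return defined $rest ? $rest : $line;
}

print drop_first_field("L3,PURPOSE:,00,RELEASE #\n");    # PURPOSE:,00,RELEASE #
print drop_first_field("L2,\n");                         # (empty line)
```

Note that this is still plain string splitting. It does not understand quoted fields, so the caveat about quoted fields containing commas or newlines stands.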

Re: Removing everything before the first comma separator on each line of a text file
by Anonymous Monk on Sep 15, 2014 at 22:08 UTC

    The idea to use Text::CSV is a good one, since that's the "correct" way to handle CSV.

    In this particular case, a one-liner seems to do the trick:

    perl -pe 's/^(?:L\d+|DETAIL|SPACE|SUMMARY),//' INPUT.TXT >OUTPUT.TXT

    (Note that this removes any "L" followed by digits, not just "L1" through "L16" as in your first example code.)
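To see what the regex in the one-liner does and does not touch, here is a short sketch. The wrapper name strip_prefix and the test strings "L99," and "LABEL," are invented for illustration; only the regex comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from the one-liner.
sub strip_prefix {
    my ($line) = @_;

    # ^ anchors at the start of the line; (?:...) groups the alternatives
    # without capturing; the trailing comma is removed with the label.
    $line =~ s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    return $line;
}

print strip_prefix("L99,any L followed by digits is removed\n");
print strip_prefix("DETAIL,1070954,6000\n");
print strip_prefix("LABEL,left alone: no digits after the L\n");
```

Because of the anchor, a label that appears later in a line (for example "X,DETAIL,1") is left untouched.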

      This may be a stupid question, but if I am running this script on a server rather than in a command prompt and the script is to be executed autonomously...how would I implement a one liner such as this?

        In quite a few cases it's possible to execute one-liners autonomously, for example from a crontab(5). But I'm assuming you mean you'd rather have a script file, which is easier to version/distribute/install/etc.

        To see what the one-liner is doing, have a look at perlrun to see what the -p switch does, or you could even add -MO=Deparse to the perl arguments to see what B::Deparse makes of the one-liner:

        $ perl -MO=Deparse -pe 's/^(?:L\d+|DETAIL|SPACE|SUMMARY),//'
        LINE: while (defined($_ = <ARGV>)) {
            s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
        }
        continue {
            die "-p destination: $!\n" unless print $_;
        }

        Simplifying that:

        while (<>) {
            s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
            print;
        }

        While that hopefully helps explain a little more what the one-liner is doing, and the code is even something you could put in a file and have work, it's still not really "better" than just a one-liner, as it still lacks warnings and error checking, and it still writes to STDOUT. Here's how one might write the same functionality as a simple script with command-line arguments and a bit more error-checking:

        #!/usr/bin/env perl
        use warnings;
        use strict;

        die "Usage: $0 INFILE OUTFILE\n" unless @ARGV==2;
        my ($infile,$outfile) = @ARGV;
        open my $ifh, '<', $infile or die $!;
        open my $ofh, '>', $outfile or die $!;
        while (my $line=<$ifh>) {
            $line=~s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
            print $ofh $line;
        }
        close $ofh;
        close $ifh;

        Note that the core functionality, the regex, remains the same. Also, this script does not make use of the magic ARGV handle (the <> operator) like the one-liner; it makes the input less magic and more explicit, although sometimes the more magic and also more flexible ARGV is preferable when writing UNIX-style scripts. TIMTOWTDI.

      I got it working but not as a one liner, thank you for your help. Great Advice

      while (<$in>){
          s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
          print $out $_;
      }

Re: Removing everything before the first comma separator on each line of a text file
by Tux (Canon) on Sep 16, 2014 at 06:19 UTC

    You declare a Text::CSV object but you don't use it. Did you read its docs?

    If you safely want to drop the first column, here's a simple alternative using RFC7111:

    use Text::CSV_XS qw( csv );
    csv (in => csv (in => $opt->{in}, fragment => "col=2-*"), out => $opt->{out});

    Enjoy, Have FUN! H.Merijn

      Yes the Text::CSV declaration is part of the code that has been left in there from past attempts to get this script working. I forgot it was still there before I posted

Re: Removing everything before the first comma separator on each line of a text file
by Laurent_R (Canon) on Sep 16, 2014 at 06:30 UTC
    You may just apply one of the following regexes:
    s/^[^,]+,//;
    or
    s/^.+?,//;
    to each line of input.

      Not that his example has empty first fields, but your regexes will keep the comma of leading empty fields, which IMHO is not correct.


      Enjoy, Have FUN! H.Merijn
        Well, yes, you are right, this would happen, but we don't have such lines in the data sample. A rule of thumb for data munging is to know the data properly, which we cannot do when we are just presented with a short sample in a forum post. There could be many other irregularities in the input data, which would lead to other regexes or other methods; we just don't know.
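The difference only shows up on a line whose first field is empty. A quick sketch, using an invented line (nothing like it appears in the posted sample), compares the two regexes above with the `[^,]*` variant from johngg's reply:

```perl
use strict;
use warnings;

# Hypothetical line with an empty first field.
my $line = ",second,third\n";

# [^,]+ needs at least one non-comma character, so nothing matches here.
( my $plus = $line ) =~ s/^[^,]+,//;

# [^,]* also accepts an empty first field and removes just its comma.
( my $star = $line ) =~ s/^[^,]*,//;

# .+? must consume at least one character, and "." also matches a comma,
# so the match runs on to the second comma and takes a field too many.
( my $lazy = $line ) =~ s/^.+?,//;

print "plus: $plus";    # plus: ,second,third
print "star: $star";    # star: second,third
print "lazy: $lazy";    # lazy: third
```

On the lines in the posted sample, which all have a non-empty first field, the three regexes give the same result.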

      I would also have to write a whole new script or section to be able to parse through the input file and have code that would insert the regex statement to each line, and also would not serve the purpose that I am trying to accomplish.

        This part of your code:
        my @file = <INFILE>;
        my $reg = s/[^,]*\.(\S*)//;
        while (my $line = <INFILE>){
            chomp $line;
            my $wholefile = $line.$_ foreach(@file);
            print OUTFILE $wholefile;
        }
        is most probably wrong anyway, so you might as well rewrite it completely. Either slurp the file into an array and then process the array elements, or read the file line by line and process each line in turn, but don't try to do both. Here, you read the whole file into the @file array and then try to read from that file again line by line, which is not going to work. In addition, in the while loop that is supposed to read the file, you loop over the array again, which is faulty logic. You are "saved" from that silly process only because the while loop will in fact never iterate over the file handle, because the file handle is exhausted at this point.
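The exhausted file handle is easy to demonstrate. This is a minimal sketch that uses an in-memory file with two made-up lines in place of the real input; the names $data, $fh and $count are illustrative only.

```perl
use strict;
use warnings;

# Two made-up lines standing in for the real input file.
my $data = "L1,first\nL2,second\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# List context: this reads every line and leaves the handle at end-of-file.
my @file = <$fh>;

# Nothing is left to read, so the loop body never runs.
my $count = 0;
while ( my $line = <$fh> ) {
    $count++;
}
close $fh;

print scalar(@file), " lines slurped, $count lines read afterwards\n";

# Having slurped, process the array itself rather than the handle.
s/^[^,]*,// for @file;
print @file;
```

The alternative is to skip the array entirely and apply the substitution inside a single while loop over the handle, which is what the final working code at the top of the thread does.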