Removing everything before the first comma separator on each line of a text file

zodell has asked for the wisdom of the Perl Monks concerning the following question:

First off I want to say that I have been reading posts from PerlMonks for a while and they have always been extremely useful and informative, so I just want to say thank you for that.

I have been trying to get my code for this project working for the last few days. I have tried several solutions that should work but do not, and I am currently at a loss.

Below is the code I am currently using. What I am trying to do is remove the "L1,", "L2,", and so on from an input .TXT or .CSV file (whichever is sent in to our servers). The input file would be run through this script, and the L1, L2, etc. would be removed from the output file that we then use.

Or if there is a better way of removing those characters directly before the print statement of the output file, that would work as well.

Sample of the file as it currently looks (with the prefixes still present):

L1,830 HORIZON PKG RELEASE
L2,
L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
L4,SCH QTY TYPE:,A,
L5,
L6,HORIZON START:,20140915
L7,END:,20150913
L8,GENERATION DATE:,20140915
L9,
L10,SHIP TO NAME:,,
L16,SHIP TO CODE:,US08,
L15,
DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,   
DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,

The code currently being used at this moment is the following:

#!/usr/bin/perl -w

# txtRemoval.pl --in=%in% --out=%out%

use strict;
use warnings;
use Text::CSV;

#my ( $infile, $outfile) = @ARGV;
use Getopt::Long;
my @ARGS;
my $wholefile = @ARGS;

my $csv = Text::CSV->new() or die "Can't use CSV: ".Text::CSV->error_diag();
## options
my $opt = {};
GetOptions ($opt,
    'in=s',
    'out=s',
);

## make sure we have the right options
unless ( defined($opt->{'in'}) and defined($opt->{'out'}) ) {
    die "Usage: $0 --in=INFILE.TXT --out=OUTFILE.TXT\n";
}

## open file handles
open INFILE, $opt->{'in'} or die "Cannot open input file: $!";
open OUTFILE, '>', $opt->{'out'} or die "Cannot open output file: $!";

#my @elements = ["L1,", "L2,", "L3,", "L4,", "L5,", "L6,", "L7,", "L8,", "L9,", "L10,", "L11,", "L12,", "L14,", "L15,", "L16,", "DETAIL,", "SPACE,", "SUMMARY,"]

my @file = <INFILE>;
my $reg = s/[^,]*\.(\S*)//;
while (my $line = <INFILE>){
    chomp $line;
    my $wholefile = $line.$_ foreach(@file);
    print OUTFILE $wholefile;
    }

## spit out entire file
#print OUTFILE $wholefile;

## close file handles
close OUTFILE;

Here is another snippet that I have tried; the rest of the code is the same:

while (my $wholefile = <INFILE>){
    my $reg = s/.*?,//;
    my $wholefile = $wholefile.$reg;
    }

## spit out entire file
print OUTFILE $wholefile;

This is what the output file is supposed to look like

830 HORIZON PKG RELEASE

PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
SCH QTY TYPE:,A,

HORIZON START:,20140915
END:,20150913
GENERATION DATE:,20140915

SHIP TO NAME:,,
SHIP TO CODE:,US08,

BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,   
1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,

Thank you in advance for your help. I have a feeling that I have been overthinking the whole thing and that most of you will laugh at me for missing something simple, but I have been beating my head against this for too long, so I figure it is best to finally get some help.

I want to give thanks to AnonymousMonk for the regex statement he provided. I was not able to get the code to work as a one liner, but I was able to adapt it into the code that I already had.

BELOW IS THE FINAL WORKING CODE

#!/usr/bin/perl -w

# txtRemoval.pl --in=%in% --out=%out%

# This script removes specific text from a file for the Mars 830 Forecast

use strict;
use warnings;

use Getopt::Long;

my $opt = {};
GetOptions ($opt,
    'in=s',
    'out=s',
);

my $infile = $opt->{'in'};
my $outfile = $opt->{'out'};

## make sure we have the right options
unless ( defined($opt->{'in'}) and defined($opt->{'out'}) ) {
    die "Usage: $0 --in=INFILE.TXT --out=OUTFILE.TXT\n";
}

open my $in, "<", $infile or die $!;
open my $out, ">", $outfile or die $!;

while (<$in>) {
    s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    print $out $_;
}


close $in;

## close file handles
close $out;

Again thank you to all of you for your help and contributions


Replies are listed 'Best First'.
Re: Removing everything before the first comma separator on each line of a text file by johngg (Canon) on Sep 15, 2014 at 22:18 UTC
If you want to remove any text up to and including the first comma you could do this.

$ perl -Mstrict -Mwarnings -e '
open my $inFH, q{<}, \ <<EOD or die $!;
L1,830 HORIZON PKG RELEASE
L2,
L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
L4,SCH QTY TYPE:,A,
L5,
L6,HORIZON START:,20140915
L7,END:,20150913
L8,GENERATION DATE:,20140915
L9,
L10,SHIP TO NAME:,,
L16,SHIP TO CODE:,US08,
L15,
DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
EOD

while ( <$inFH> )
{
    s{^[^,]*,}{};
    print
}

close $inFH or die $!;'
830 HORIZON PKG RELEASE

PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
SCH QTY TYPE:,A,

HORIZON START:,20140915
END:,20150913
GENERATION DATE:,20140915

SHIP TO NAME:,,
SHIP TO CODE:,US08,

BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
$

If you only want to remove the particular fields you mention in your code then you can construct a regex with alternation.

$ perl -Mstrict -Mwarnings -e '
open my $inFH, q{<}, \ <<EOD or die $!;
L1,830 HORIZON PKG RELEASE
L2,
L3,PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
L4,SCH QTY TYPE:,A,
L5,
L6,HORIZON START:,20140915
L7,END:,20150913
L8,GENERATION DATE:,20140915
L9,
L10,SHIP TO NAME:,,
L16,SHIP TO CODE:,US08,
L15,
DETAIL,BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
DETAIL,1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
DETAIL,1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
EOD

my @removes = map { q{L} . $_ } 1 .. 12, 14 .. 16;
push @removes, qw{ DETAIL SPACE SUMMARY };
my $qrRemove = do {
    local $" = q{|};
    qr{^(?:@removes),};
};

while ( <$inFH> )
{
    s{$qrRemove}{};
    print
}

close $inFH or die $!;'
830 HORIZON PKG RELEASE

PURPOSE:,00,RELEASE #,746962464,SCHEDULE TYPE:,PR,
SCH QTY TYPE:,A,

HORIZON START:,20140915
END:,20150913
GENERATION DATE:,20140915

SHIP TO NAME:,,
SHIP TO CODE:,US08,

BUYERS PART #,QUANTITY,FCST TYPE,FCST TIMING,DATE,
1070954,6000,PLANNING(D),DISCREET(D),20140925SMS,10,
1070954,10000,PLANNING(D),DISCREET(D),20140926SMS,10,
$

I hope this is helpful.

Cheers,

JohnGG
Re^2: Removing everything before the first comma separator on each line of a text file by zodell (Initiate) on Sep 16, 2014 at 19:58 UTC
Quick question: in your reply, do you have the actual text from the file within the code? Or is that there to show which portion of the data the code is meant to affect?
Re^3: Removing everything before the first comma separator on each line of a text file by Anonymous Monk on Sep 16, 2014 at 21:06 UTC
johngg is using a here-document to construct a string consisting of the input data, and is using open with a reference to access it as an in-memory file. Also, the output of the script is shown in-line (as it would look when run from a terminal).
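As a minimal sketch of the technique described above (not johngg's exact code): a here-document builds the sample string, and `open` on a reference to that string reads it like a file. The variable names and the shortened three-line sample are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample data held in a here-document instead of a file on disk
# (a shortened, illustrative subset of the data in the question).
my $data = <<'EOD';
L1,830 HORIZON PKG RELEASE
L2,
DETAIL,1070954,6000
EOD

# Opening a reference to a scalar yields a filehandle that reads from the string.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @stripped;
while ( my $line = <$fh> ) {
    $line =~ s{^[^,]*,}{};    # drop everything up to and including the first comma
    push @stripped, $line;
}
close $fh or die $!;

print @stripped;
```

In a real script the only change would be to open the actual input file instead of `\$data`; the loop stays the same.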
Re: Removing everything before the first comma separator on each line of a text file by roboticus (Chancellor) on Sep 15, 2014 at 22:40 UTC
zodell:

If you just want to unconditionally remove everything before the first comma, I'd suggest using split, like this:

while (my $line = ..datasource..) {
    my ($first, $rest) = split /,/, $line, 2;
    print $OFH $rest;
}

Update: Removed erroneous comma after $OFH, thanks to Tux.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
Re^2: Removing everything before the first comma separator on each line of a text file by Tux (Canon) on Sep 16, 2014 at 06:21 UTC
That will remove all the commas from the rest of the line. It will also break if one of the fields is quoted and contains a newline. Bad advice.

Update: The comma after `$OFH` will also cause havoc :P

Enjoy, Have FUN! H.Merijn
Re^3: Removing everything before the first comma separator on each line of a text file by soonix (Chancellor) on Sep 16, 2014 at 07:37 UTC
Nope, split leaves the delimiters in after the LIMIT (just tested with perl 5.14.2). Admittedly, the documentation says "substrings (called "fields") that do not include the separator", so one could expect an implementation of `split` which filters out the delimiters from the "rest field", but I have heard the "P" in "Perl" stands for "practical" :-)

Your update about the comma after `print $OFH` is correct, of course...
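A small sketch that checks this behavior of `split` with a LIMIT of 2. The sample strings and variable names are assumptions for illustration; the second case shows what happens when a line contains no comma at all.

```perl
use strict;
use warnings;

# With LIMIT 2, split produces at most two fields; the separators inside
# the second field are left untouched.
my $line = "L3,PURPOSE:,00,RELEASE #\n";
my ( $first, $rest ) = split /,/, $line, 2;

print "first: $first\n";
print "rest:  $rest";

# A line without any comma yields only one field, so the second variable
# stays undefined and should be checked before printing.
my ( $only, $none ) = split /,/, "no comma here", 2;
print "second field is undefined\n" unless defined $none;
```

The undefined case matters for input such as blank lines, where printing `$rest` unchecked would trigger an "uninitialized value" warning.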
Re: Removing everything before the first comma separator on each line of a text file by Anonymous Monk on Sep 15, 2014 at 22:08 UTC
The idea to use Text::CSV is a good one, since that's the "correct" way to handle CSV. In this particular case, a one-liner seems to do the trick:

perl -pe 's/^(?:L\d+|DETAIL|SPACE|SUMMARY),//' INPUT.TXT >OUTPUT.TXT

(Note that this removes any "L" followed by digits, not just "L1" through "L16" as in your first example code.)
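To make that note concrete, here is a hedged sketch that applies the same regex to a few lines. The `L99` and `LABEL` lines are hypothetical and do not appear in the original data; they only show what the pattern does and does not match.

```perl
use strict;
use warnings;

# Hypothetical sample lines; "L99" and "LABEL" are not in the original data.
my @samples = (
    "L16,SHIP TO CODE:,US08,",
    "DETAIL,1070954,6000,",
    "L99,not in the L1..L16 range",
    "LABEL,keeps its first field",
);

my @result = map {
    my $copy = $_;                                  # work on a copy, leave @samples intact
    $copy =~ s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;   # anchored: only a leading prefix is removed
    $copy;
} @samples;

print "$_\n" for @result;
```

Because `L\d+` requires at least one digit after the "L", a first field such as `LABEL` is left alone, while any `L<number>` prefix is stripped regardless of the number.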
Re^2: Removing everything before the first comma separator on each line of a text file by zodell (Initiate) on Sep 16, 2014 at 19:56 UTC
This may be a stupid question, but if I am running this script on a server rather than in a command prompt and the script is to be executed autonomously...how would I implement a one liner such as this?
Re^3: Removing everything before the first comma separator on each line of a text file by Anonymous Monk on Sep 16, 2014 at 20:41 UTC
In quite a few cases it's possible to execute one-liners autonomously, for example from a crontab(5). But I'm assuming you mean you'd rather have a script file, which is easier to version/distribute/install/etc.

To see what the one-liner is doing, have a look at perlrun to see what the `-p` switch does, or you could even add `-MO=Deparse` to the `perl` arguments to see what B::Deparse makes of the one-liner:

$ perl -MO=Deparse -pe 's/^(?:L\d+|DETAIL|SPACE|SUMMARY),//'
LINE: while (defined($_ = <ARGV>)) {
    s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
}
continue {
    die "-p destination: $!\n" unless print $_;
}

Simplifying that:

while (<>) {
    s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    print;
}

While it hopefully helps explain a little more what the one-liner is doing, and the code is even something you could put in a file and have work, it's still not really "better" than just a one-liner, as it still lacks warnings and error checking, and it still writes to `STDOUT`.

Here's how one might write the same functionality as a simple script with command-line arguments and a bit more error-checking:

#!/usr/bin/env perl
use warnings;
use strict;

die "Usage: $0 INFILE OUTFILE\n" unless @ARGV==2;
my ($infile,$outfile) = @ARGV;

open my $ifh, '<', $infile or die $!;
open my $ofh, '>', $outfile or die $!;
while (my $line=<$ifh>) {
    $line=~s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    print $ofh $line;
}
close $ofh;
close $ifh;

Note that the core functionality, the regex, remains the same. Also, this script does not make use of the magic ARGV handle (the `<>` operator) like the one-liner; it makes the input less magic and more explicit, although sometimes the more magic and also more flexible `ARGV` is preferable when writing UNIX-style scripts. TIMTOWTDI.
Re^2: Removing everything before the first comma separator on each line of a text file by zodell (Initiate) on Sep 16, 2014 at 20:32 UTC
I got it working but not as a one liner, thank you for your help. Great advice.

while (<$in>) {
    s/^(?:L\d+|DETAIL|SPACE|SUMMARY),//;
    print $out $_;
}
Re: Removing everything before the first comma separator on each line of a text file by Tux (Canon) on Sep 16, 2014 at 06:19 UTC
You declare a Text::CSV object but you don't use it. Did you read its docs?

If you safely want to drop the first column, here's a simple alternative using RFC7111:

use Text::CSV_XS qw( csv );
csv (in => csv (in => $opt->{in}, fragment => "col=2-*"), out => $opt->{out});

Enjoy, Have FUN! H.Merijn
Re^2: Removing everything before the first comma separator on each line of a text file by zodell (Initiate) on Sep 16, 2014 at 12:55 UTC
Yes, the Text::CSV declaration is left over from past attempts to get this script working. I forgot it was still there before I posted.
Re: Removing everything before the first comma separator on each line of a text file by Laurent_R (Canon) on Sep 16, 2014 at 06:30 UTC
You may just apply one of the following regexes:

s/^[^,]+,//;

or

s/^.+?,//;

to each line of input.
Re^2: Removing everything before the first comma separator on each line of a text file by Tux (Canon) on Sep 16, 2014 at 07:12 UTC
Not that his example has empty first fields, but your regexes will keep the comma of leading empty fields, which IMHO is not correct.

Enjoy, Have FUN! H.Merijn
Re^3: Removing everything before the first comma separator on each line of a text file by Laurent_R (Canon) on Sep 16, 2014 at 17:45 UTC
Well, yes, you are right, this would happen, but we don't have such lines in the data sample. A rule of thumb for data munging is to know the data properly, which we cannot do when we are just presented a short sample on a forum post. There could be many other irregularities in the input data, which would lead to other regexes or other methods, we just don't know.
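For readers who want to see the empty-first-field case in action, the following sketch runs the regexes discussed in this thread against a hypothetical line that starts with a comma. The sample line and variable names are assumptions; no such line appears in the posted data.

```perl
use strict;
use warnings;

# Hypothetical line whose first field is empty (not present in the posted sample).
my $line = ",second,third";

# [^,]+ needs at least one non-comma character, so nothing matches here
# and the leading comma stays in place.
( my $plus = $line ) =~ s/^[^,]+,//;

# .+? must consume at least one character, so it swallows the leading comma
# and then everything up to the next comma.
( my $lazy = $line ) =~ s/^.+?,//;

# [^,]* also accepts an empty first field and removes just its comma.
( my $star = $line ) =~ s/^[^,]*,//;

print "plus: $plus\n";
print "lazy: $lazy\n";
print "star: $star\n";
```

Under these assumptions only the `[^,]*` form drops exactly one field when that field is empty, which is why it is the safer default when the data is not fully known.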
Re^4: Removing everything before the first comma separator on each line of a text file by Tux (Canon) on Sep 17, 2014 at 06:34 UTC
Re^5: Removing everything before the first comma separator on each line of a text file by Laurent_R (Canon) on Sep 17, 2014 at 06:48 UTC
Re^2: Removing everything before the first comma separator on each line of a text file by zodell (Initiate) on Sep 16, 2014 at 16:44 UTC
I would also have to write a whole new script or section to be able to parse through the input file and have code that would insert the regex statement to each line, and also would not serve the purpose that I am trying to accomplish.
Re^3: Removing everything before the first comma separator on each line of a text file by Laurent_R (Canon) on Sep 16, 2014 at 17:40 UTC
This part of your code:

my @file = <INFILE>;
my $reg = s/[^,]*\.(\S*)//;
while (my $line = <INFILE>){
    chomp $line;
    my $wholefile = $line.$_ foreach(@file);
    print OUTFILE $wholefile;
    }

is most probably wrong anyway, so you might as well rewrite it completely. Either slurp the file into an array and then process the array elements, or read the file line by line and process each line in turn, but don't try to do both.

Here, you read the whole file into the @file array and then try to read from that file again line by line; this is not going to work. In addition, in the while loop that is supposed to read the file, you loop on the array again, which is faulty logic. You are "saved" from that silly process only because the while loop will in fact not loop on the filehandle, because the filehandle is exhausted at this point.
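The two approaches can be sketched side by side as follows. This is an illustrative example, not the poster's code: it reads from an in-memory string so it is self-contained, and the variable names and two-line sample are assumptions.

```perl
use strict;
use warnings;

my $data = "L1,first\nL2,second\n";    # illustrative stand-in for the input file

# Approach 1: slurp everything into an array, then process the array.
open my $fh, '<', \$data or die "Cannot open: $!";
my @lines = <$fh>;
my $after_slurp = <$fh>;               # handle is at end of file, so this is undef
close $fh;
s/^[^,]*,// for @lines;

# Approach 2: read and process one line at a time.
open $fh, '<', \$data or die "Cannot open: $!";
my @streamed;
while ( my $line = <$fh> ) {
    $line =~ s/^[^,]*,//;
    push @streamed, $line;
}
close $fh;

print "filehandle exhausted after slurp\n" unless defined $after_slurp;
print @lines;
```

Both approaches produce the same lines; reading line by line keeps memory use flat for large files, while slurping is convenient when the whole file is needed at once.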