mlux has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm quite new to perl so this is probably a simple fix. What I am trying to do is read in data from a .csv file and create a new folder for each different value in the file (the values often repeat). What I am having trouble with is that the code is generating separate folders for cells at the end of a row. For example, if I had a row of "1,1,2,2" the code is creating 3 folders: "1", "2", and "2\n". The obvious and easy solution should be the chomp command, but I am having no luck with it. I have tried putting it before and after each operation (and even before and after all operations) in case split, shift, or push was initiating the newline somehow, but the newlines persist. I should note that in the code below I refer to $_, but in other attempts I use the relevant array (@data or @row).
#!/usr/bin/perl

$filename="p1";
#input file into array
open(plate_file,$filename.".csv") or die "Can't open: $!";
$firstline=<plate_file>; #remove first line
while(<plate_file>) {
    chomp(); #remove newlines???
    @row=split(/,/); #split row into array elements
    shift(@row); #remove first column
    push(@data,@row); #append current row to data array
}
close(plate_file);

#create file structure
mkdir($filename, 0777) || print "$!\n";
foreach(@data){
    if(-d "$filename/$_"){
    } else {
        mkdir("$filename/$_",0777) || print "$!\n";
    }
}
I also tried a regular expression, but again no success. There seem to be many ways to remove newlines with a regex, but the code I tried was:
while(<plate_file>) {
    if(/(.*)\n/){
        @row=split(/,/,$1);
        shift(@row);
        push(@data,@row);
    }
}
As a side question, there must be a more direct way to use the $1 value from the regular expression than the if statement I used. What would be the advised way? Thanks.

Re: Removing Newline Characters from File Data (chomp not working)
by Utilitarian (Vicar) on Aug 27, 2009 at 14:48 UTC
    Was the CSV file created on the same system?
    It could be that you need to persuade chomp to use the line separator of the other system. This can be achieved directly by setting the input record separator ($/) to that of the source system, e.g. if the source is Windows (\r\n) and the parsing is done on Unix (\n):
    ...
    while (<$FH>){
        {
            local $/="\r\n";
            chomp;
        }
        ...
    }
    ...
    Note that local is in its own block so as not to mess with anything else.
      That was it, thanks.
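
The approach in the reply above can be sketched as a small helper, next to an alternative that is not from the thread. Both sub names (`chomp_crlf`, `strip_eol`) are hypothetical and chosen only for illustration. The second sub is an assumption on my part: a substitution that removes either a "\n" or a "\r\n" ending, which may be handy when the origin of the file is not known in advance.

```perl
use strict;
use warnings;

# Approach from the reply: chomp with $/ temporarily set to CRLF.
# (Hypothetical helper name.)
sub chomp_crlf {
    my ($line) = @_;
    local $/ = "\r\n";    # restored automatically when the sub returns
    chomp $line;          # removes a trailing "\r\n" only
    return $line;
}

# Assumed alternative, not from the thread: strip either LF or CRLF.
# (Hypothetical helper name.)
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}
```

With $/ set to "\r\n", chomp leaves a line that ends in a bare "\n" untouched, so `chomp_crlf` only suits files that are known to use Windows line endings, while `strip_eol` accepts both.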
Re: Removing Newline Characters from File Data (chomp not working)
by ikegami (Patriarch) on Aug 27, 2009 at 14:50 UTC

    Your string probably ends in "\r\n". You're chomping "\n", then complaining about the "\r" you didn't remove.

    To verify, you can use:

    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    print(Dumper($_));

    Do a Super Search for solutions. This comes up all the time, once quite recently.
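
As a rough illustration of the check suggested above, the following sketch dumps a line after a plain chomp. The helper name `show_raw` and the sample line are assumptions made for the example; `$Data::Dumper::Terse` is set only to keep the output short.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: return a dump of a string in which control
# characters are shown as escapes such as \r and \n.
sub show_raw {
    my ($value) = @_;
    local $Data::Dumper::Useqq = 1;    # double-quoted output with escapes
    local $Data::Dumper::Terse = 1;    # omit the leading '$VAR1 = '
    return Dumper($value);
}

my $line = "1,1,2,2\r\n";        # sample line with a Windows line ending
chomp(my $chomped = $line);      # default $/ is "\n", so only "\n" is removed
print show_raw($chomped);        # the leftover carriage return appears as \r
```

If the dump ends in \r, the file uses "\r\n" line endings and the plain chomp is leaving the carriage return behind.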

Re: Removing Newline Characters from File Data (chomp not working)
by kennethk (Abbot) on Aug 27, 2009 at 15:13 UTC
    Assuming that the previous suggestions of modifying the line terminator (see $/) fix your immediate issue, there are some style issues you may want to consider. These are generally regarded as good practices, which you are free to ignore at your discretion:

    1. Starting each script and module with use strict;use warnings; can save you a bunch of headaches. The virtues of these pragmas have been extolled extensively on this site. For example, Use strict warnings and diagnostics or die.
    2. You should consider using the 3-argument form of open instead of the 2-argument form. Keeping the mode separate from the file name protects against a file name that is, maliciously or by accident, interpreted as a mode or a pipe. In your case, line 5 would be changed to

      open(plate_file,'<',$filename.".csv") or die "Can't open: $!";

      See open and perlopentut for some more details.

    3. Rather than building your own paths, you might consider using File::Spec to do it for you, in a platform independent way.
    4. In fact, rather than rolling your own CSV parser, consider using Text::CSV.
    5. Regarding your second question, in list context a regular expression returns the captured expressions. So you could write something like:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @data;
      while(<DATA>) {
          my @row = split /,/,(/(.*)\n?/)[0]; # <-- nice and clear, right?
          shift(@row);
          push(@data,@row);
      }
      print join "\n", @data;

      __DATA__
      1,2,3,4
      2,3,4,5
      3,4,5,6

    I could go on, but that should give you plenty to chew on.
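
Suggestions 1 to 3 above could be combined roughly as follows. This is a sketch under stated assumptions, not the original poster's final script: the sub names `read_values` and `make_dirs` are hypothetical, duplicates are filtered with a hash (which the original code handled with the -d test instead), and the line ending is stripped with a substitution rather than by changing $/. Text::CSV (suggestion 4) is left out so that the example uses core modules only.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: read a CSV file, skip the header row and the first
# column, and return the distinct remaining values in first-seen order.
sub read_values {
    my ($csv_path) = @_;
    open(my $fh, '<', $csv_path) or die "Can't open $csv_path: $!";
    my $header = <$fh>;                  # discard the first line
    my (%seen, @values);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip "\n" or "\r\n"
        my @row = split /,/, $line;
        shift @row;                      # drop the first column
        push @values, grep { length($_) && !$seen{$_}++ } @row;
    }
    close($fh);
    return @values;
}

# Hypothetical helper: create $base and one directory per value beneath it.
sub make_dirs {
    my ($base, @values) = @_;
    for my $dir ($base, map { File::Spec->catdir($base, $_) } @values) {
        next if -d $dir;
        mkdir($dir, 0777) or warn "Can't create $dir: $!\n";
    }
    return;
}

# Example call, assuming p1.csv exists in the current directory:
# make_dirs('p1', read_values('p1.csv'));
```

The naive split on commas still breaks on quoted fields that contain commas, which is the case a dedicated CSV parser is meant to handle.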

      Thanks, these suggestions are helpful. Regarding #5, that makes sense now - I had thought that would simply store true or false in the array. I do have a couple of questions about the expression you used in #5. I get that /(.*)\n?/ matches all characters before the newline character, but I don't understand the outer parentheses or the [0]. Also, is the ? necessary? Each line should end with \n, so is the ? there because of good practice or for some other reason?
        The purpose of the ? on the newline is to make it optional, so that the line does not have to end with a newline. This handles the edge case where the file does not end with a newline character - otherwise, the last line wouldn't match the expression and would be silently dropped. For some reference on regex construction, see perlre and perlretut.

        The purpose of the parentheses is to put the regular expression in list context, and then the [0] index takes the first and only element, which is the text captured by the inner parentheses. Without that, the regular expression would be in scalar context because of the split function, and would return only true/false. Perl's ability to modify the behavior of an expression based upon context is one of its great strengths, though it can also cause significant confusion to new acolytes.
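
The context rules described above can be seen in a short sketch. The variable names and sample strings are made up for the example; only the two patterns come from the thread.

```perl
use strict;
use warnings;

my $line = "1,2,3,4\n";

# Scalar context: the match only reports whether it succeeded.
my $matched = $line =~ /(.*)\n?/;

# List context: the match returns the captured groups.
my ($captured) = $line =~ /(.*)\n?/;

# A list slice also supplies list context and picks the first capture.
my $first = ($line =~ /(.*)\n?/)[0];

# A final line without a trailing newline: the mandatory \n fails to
# match, while the optional \n? still matches.
my $last_line      = "3,4,5,6";
my $required_match = ($last_line =~ /(.*)\n/)  ? 1 : 0;
my $optional_match = ($last_line =~ /(.*)\n?/) ? 1 : 0;

print "scalar context: $matched\n";
print "list context:   $captured\n";
print "list slice:     $first\n";
print "required newline matched: $required_match\n";
print "optional newline matched: $optional_match\n";
```

Because "." does not match a newline by default, the capture stops just before the "\n", which is why the captured text carries no line ending.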

Re: Removing Newline Characters from File Data (chomp not working)
by biohisham (Priest) on Aug 27, 2009 at 15:29 UTC
    Correct me if I am wrong, but it looks to me as though you are reading the entire file into a string rather than one line at a time, so chomp would only remove the newline at the end of the file and not at the end of every line as you intend. I haven't tested it, though. This is an initial look at it.

    (Update): After ack's clarification (I have always had trouble with scalar and list contexts), I retract my observation: I had thought that $firstline behaved the same as @firstline. Thanks ack, and excuse my ignorance, for I am still learning too.


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

      From the OP's code, $firstline is assigned in scalar context, not list context (which would use the sigil '@' rather than the OP's sigil '$'). In scalar context, the input operator '< >' returns only the next line; in list context (e.g., @firstline), it reads all the lines in the file, one line per list element in @firstline. Hence, that part of the OP's code looks OK to me.

      I believe that the other responders' observations (i.e., that the OP has failed to account for what is actually at the end of each input line in terms of line separators) point to the likely source of the OP's difficulties.

      ack Albuquerque, NM
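
The difference between the two contexts described in this reply can be checked with a minimal sketch. It reads from an in-memory string instead of a real file, and the sample text and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "header\nrow1\nrow2\n";

# Open the string as an in-memory file so the example needs no file on disk.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my $firstline = <$fh>;    # scalar context: reads exactly one line
my @rest      = <$fh>;    # list context: reads all remaining lines

close($fh);

print "first line: $firstline";
print "remaining lines: ", scalar(@rest), "\n";
```

Assigning to $firstline therefore consumes only the header, and the while loop in the original script still sees every following row.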