perlmad has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I have a CSV file. I need to parse its data and write it to a separate file, filtered by type.

input_parse_and_output("Bill");
input_parse_and_output("Bond");
input_parse_and_output("Note");

# Subroutine has two arguments 1.Input file name 2. Output file directory 3. Output file name all of these are scalar
sub input_parse_and_output{
    foreach my $input_data_each_line(@input_data_in_array){
        chomp($input_data_each_line);
        # if the current line does not contain cusip then program will move to next line
        next unless $input_data_each_line=~ /cusip/;
        next unless $input_data_each_line=~ /"securityType":"$_[0]"/;
        if(defined $input_data_each_line){
            foreach my $output_header_names_temp(@output_header_names){
                chomp($output_header_names_temp);
                if(defined $output_header_names_temp){
                    # regex to replace the double quotes to blank string
                    #input_data_each_line=~ s/"//g;
                    $input_data_each_line=~ /$output_header_names_temp:([\w\d\-\$\%\:\!\@\&\*\.]+)/;
                    if(defined $1){
                        my $temp=$1;
                        if($temp=~ /\:/){
                            # regex to get exactly the first ten characters, which can be digits or hyphens
                            $temp=~ /^([\d\-]{10})/;
                            print OUTPUT_FILE_WRITE "$1\t";
                        }
                        else {
                            print OUTPUT_FILE_WRITE "$1\t";
                        }
                    }
                    else {
                        print OUTPUT_FILE_WRITE "NULL\t";
                    }
                }
            }
            print OUTPUT_FILE_WRITE "\n";
        }
    }
}

In the code above, @input_data_in_array contains 100 lines, and the subroutine is called 3 times so that the output is ordered by type: "Bill", "Bond", "Note".

A sample of the input file data is given below

912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000

In this program the array contents are read 3 times, so it takes more time to complete. I need it to be done in a single pass over the data, with the output filtered by "Bill", "Bond", "Note".

Expected output

912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000

Your suggestions are much appreciated...

Replies are listed 'Best First'.
Re: File content sorting based on type
by Corion (Patriarch) on May 31, 2016 at 11:12 UTC
    input_parse_and_output("Note");
    # Subroutine has two arguments 1.Input file name 2. Output file directory 3. Output file name all of these are scalar
    sub input_parse_and_output{
    foreach my $input_data_each_line(@input_data_in_array){

    Please explain how these four lines fit together. The documentation does not fit the call of the function. Passed parameters do not fit the documentation. The passed parameters never get used in the code.

    You seem to be highly confused about what goes where in Perl code. Please post a complete, runnable example of your program which has the problem you have but which is shorter than 20 lines.

Re: File content sorting based on type
by Corion (Patriarch) on May 31, 2016 at 11:01 UTC

    The last time with this problem, your file was something JSON-like and not a CSV file.

    I think you will be better off in the long run to learn how to read in a file in its original format instead of converting it using Excel or whatever to a CSV format.

    If you have received the new input data from a different source or in a different format, I recommend removing the old parsing code completely as you don't seem to know which parts of the old code cannot apply to the new data.

      I posted a .json file last time, but the original code works on a .csv file. I forgot to mention that, sorry. The input file should have the securityType string; I posted only the sorted files.

Re: File content sorting based on type
by Marshall (Canon) on May 31, 2016 at 12:08 UTC
    Your code is incomprehensible to me. This looks like a simple sort, and here is how to do it. It only uses column 2; using other columns as part of the sort is not that hard, but it does not seem to be needed here. Why don't you use Excel or another spreadsheet and not worry about Perl?
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @lines;
    while (<DATA>)
    {
        push @lines,$_;
    }
    @lines = sort by_col2 @lines;
    print @lines;

    # compare two lines by their second whitespace-separated column
    sub by_col2
    {
        my ($A_col2) = (split ' ', $a)[1];
        my ($B_col2) = (split ' ', $b)[1];
        $A_col2 cmp $B_col2;
    }

    =prints
    912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
    912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
    912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
    912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
    912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
    912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000
    =cut

    __DATA__
    912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
    912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
    912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
    912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
    912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
    912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000
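Since the original question asks for a single read of the data with the output grouped as "Bill", "Bond", "Note", one alternative to sorting is to bucket each line by its second column as it is read and then print the buckets in the wanted order. The following is only a minimal sketch under that assumption: `group_by_type` is a hypothetical helper name, not something from the thread, and the sample rows are the ones posted in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): group lines by the value in
# column 2 in one pass, then return them in the requested type order.
sub group_by_type {
    my ($order, @lines) = @_;
    my %bucket;
    for my $line (@lines) {
        my $type = (split ' ', $line)[1];
        next unless defined $type;          # skip blank or one-column lines
        push @{ $bucket{$type} }, $line;    # input order is kept within a type
    }
    # Types that are not listed in @$order are dropped, which acts as the filter.
    return map { @{ $bucket{$_} || [] } } @$order;
}

my @sample = (
    "912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL\n",
    "912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL\n",
    "912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000\n",
    "912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000\n",
    "912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL\n",
    "912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000\n",
);

my @grouped = group_by_type([qw(Bill Bond Note)], @sample);
print @grouped;
```

Because the lines are only appended to per-type arrays, the data is walked once and no comparison sort is needed. The order list passed as the first argument decides both which types are kept and the order in which they appear.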

      Your

      my @lines; while (<DATA>) { push @lines,$_; }

      could more simply be written as

      my @lines = <DATA>;

      Rather than reading the file into an array, sorting it using a routine then printing it out, you could sort and print directly using a GRT.

      $ perl -Mstrict -Mwarnings -E '
      open my $inFH, q{<}, \ <<EOD or die $!;
      912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
      912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
      912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
      912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
      912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
      912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000
      EOD

      print for
          map { unpack q{x4a*}, $_ }
          sort
          map { pack q{a4a*}, ( split )[ 1 ], $_ }
          <$inFH>;'
      912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
      912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL
      912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000
      912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000
      912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000
      912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL
      $

      Note that this code is a little naive in that it relies on the words in the sorting column all being the same length. More robust would be to use unpack, or perhaps substr, rather than split to extract the entire column for sorting including any trailing spaces so that the code would cope if "Invoice" cropped up.
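One way to relax the same-length assumption, sketched here as a variation rather than as the unpack/substr approach described above, is to keep using split but pad the sort key to a fixed width with pack's `A` template. The width of 8, the name `grt_sort_by_type`, and the sample rows (including the "Invoice" row) are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Assumed key width: must be at least as long as the longest type name.
my $KEY_WIDTH = 8;

# Hypothetical helper: GRT-style sort on column 2 with a space-padded key.
sub grt_sort_by_type {
    my @lines = @_;
    return
        map  { substr($_, $KEY_WIDTH) }                          # strip the key again
        sort                                                     # plain string sort
        map  { pack("A$KEY_WIDTH", (split ' ', $_)[1]) . $_ }    # prefix padded key
        @lines;
}

# Made-up rows, only to show type names of different lengths.
my @rows = (
    "AAA Invoice row\n",
    "BBB Bill row\n",
    "CCC Note row\n",
    "DDD Bond row\n",
);

my @sorted = grt_sort_by_type(@rows);
print @sorted;
```

As with the one-liner above, lines that share a type are ordered by the rest of the line, because the whole line follows the key in the packed string. A type name longer than the chosen width would be truncated in the key, so the width has to be picked with the data in mind.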

      I hope this is of interest.

      Cheers,

      JohnGG

        Yes, you are quite correct about the input while loop!
        I was thinking that there is some filtering or other actions that are not apparent from the OP's posted data. Whatever that other stuff is, it can be put inside the while loop. So this is kinda like a "place holder". But you are completely correct. The code that I wrote could be more compact.

        And yes, a well-written GRT will outperform other sorting options. True. However, the GRT (Guttman-Rosler Transform) and the ST (Schwartzian Transform) are advanced techniques that come after mastering basic sorting, which I don't think the OP has a solid handle on yet. In addition, not every sort has to be optimized to the nth degree.

        We were both trying to be helpful. Whether or not this helped the OP remains to be seen. However, some posts have "teachable" moments past the current problem. Your post re: GRT may activate some other brain cells out there.
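For readers who have not met the Schwartzian Transform mentioned above, here is a small sketch of it applied to the same column-2 sort. The sub name `st_sort_by_type` is a hypothetical choice, and the rows are the sample data from the question. The idea is to extract the sort key once per line, sort on the cached key, then throw the key away.

```perl
use strict;
use warnings;

# Hypothetical helper: Schwartzian Transform sort on column 2.
sub st_sort_by_type {
    my @lines = @_;
    return
        map  { $_->[1] }                        # 3. recover the original line
        sort { $a->[0] cmp $b->[0] }            # 2. compare the cached keys
        map  { [ (split ' ', $_)[1], $_ ] }     # 1. pair each line with its key
        @lines;
}

my @input = (
    "912828Q86 Note 1-Year 2016-05-25 2016-05-27 2018-04-30 100.003850 NULL\n",
    "912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL\n",
    "912810RS9 Bond 30-Year 2016-05-12 2016-05-16 2046-05-15 97.619462 2.500000\n",
    "912810RQ3 Bond 29-Year 2016-04-14 2016-04-15 2046-02-15 98.011430 2.500000\n",
    "912796HD4 Bill 4-Week 2016-01-26 2016-01-28 2016-02-25 99.977056 NULL\n",
    "912828N71 Note 9-Year 2016-05-19 2016-05-31 2026-01-15 103.533587 0.625000\n",
);

my @by_type = st_sort_by_type(@input);
print @by_type;
```

Compared with a comparison routine that splits both lines on every call, the split here runs once per line. For six lines the difference is negligible, which fits the point that not every sort needs this treatment.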

Re: File content sorting based on type
by Anonymous Monk on May 31, 2016 at 10:45 UTC

    Please post runnable code with appropriate input data.

    next unless $input_data_each_line=~ /"securityType":"$_[0]"/;

    The sample input never contains the string "securityType".