in reply to Re: Building a dynamic array or some other method?
in thread Building a dynamic array or some other method?

The_DJ, thank you for this. I've learned a lot by slowly poring through it to understand how it works, and I realized that making unique keys out of SEVERAL fields together simplifies things greatly! But I've run into a problem I'm not sure how to solve. The entire file gets slurped in with a single line of code, but on my full data set I discovered that someone actually created filenames containing commas, so the output isn't correct.

Reading through the Text::CSV documentation (which uses Text::CSV_XS under the hood when it is available), I found something about quote_char, but it defaults to a double quote anyway. Specifying it explicitly

my $aoa = csv( in => $filename, quote_char => "\""  );
also proves ineffective, or rather makes no difference. (I think that's the default anyway.)

The input data shows it as a quoted field, but I'm not sure what I'm doing wrong.

Input data looks like this:
10.15.106.71,"/ifs/PH01/PH01SUB/ENTNASIS02/PH02/SMB/Share/Share6/Employee-Share/Contracts/Privacy and Disclosure/Disclosure Unit/Active Contracts/County/Sample County/E00526, E00595 Sample County DA/2017-2020 M4385566/Emails & Correspondence/Welcome Letter.doc",FMRWX,Creator Owner,\ifs\PH01\PH01SUB\ENTNASIS02\PH02\SMB\Share\Share6\Employee-Share\Contracts,This folder only,Abstract\Creator Owner,"US PII (1/1),Document Passwords - 2.0 (1/1),US Social Security Number (1/1),GLBA (Gramm-Leach Bliley Act) (1/1)","Credentials (1),Financial (1),PII (2)",4


After focusing a bit on the first part of the data (before the magic field), I noticed that the quoted-comma issue had been there all along in every line containing more than one sensitive data type (after the magic field).

So somehow I need to read anything that is quoted and contains commas as a single field, while the line as a whole is still comma-separated.

Am I missing something obvious in the docs? Text::CSV#quote_char

Re^3: Building a dynamic array or some other method?
by hippo (Archbishop) on May 02, 2024 at 20:48 UTC

    I'm not sure what you're doing that results in the abnormal behaviour, but here it is as an SSCCE.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new;
    my $line = '10.15.106.71,"/ifs/PH01/PH01SUB/ENTNASIS02/PH02/SMB/Share/Share6/Employee-Share/Contracts/Privacy and Disclosure/Disclosure Unit/Active Contracts/County/Sample County/E00526, E00595 Sample County DA/2017-2020 M4385566/Emails & Correspondence/Welcome Letter.doc",FMRWX,Creator Owner,\ifs\PH01\PH01SUB\ENTNASIS02\PH02\SMB\Share\Share6\Employee-Share\Contracts,This folder only,Abstract\Creator Owner,"US PII (1/1),Document Passwords - 2.0 (1/1),US Social Security Number (1/1),GLBA (Gramm-Leach Bliley Act) (1/1)","Credentials (1),Financial (1),PII (2)",4';
    $csv->parse ($line);
    my @fields = $csv->fields;
    print "Input:\n$line\n\nOutput:\n" . join "\n\n", @fields;

    This outputs the parsed fields separated by empty lines, so it should be trivial to see what is contained in each field. The quote characters are honoured as expected. HTH.


    🦛

      In an example I posted earlier, I had a large file with many lines, and the goal was to condense the lines that grant the same access privileges on a file into one.

      The_DJ posted a way to slurp an entire file into an array of arrays.

      When I applied that example to my file, it didn't handle quoted fields in the original file that had commas inside them.

      Perhaps by showing only one line of data in my example, I made the problem sound as though it could be solved one line at a time.

      I thought that when slurping in an entire file, there was a mode I could switch to for handling quoted fields (with embedded commas). I wasn't able to figure it out, and was asking whether I was misunderstanding the documentation.
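A minimal sketch of the point hippo's reply makes, applied to a whole multi-line input instead of a single line: no special mode is needed, because csv() already treats a double-quoted field as one field even when it contains commas. It assumes Text::CSV is installed; the two-record sample data (hosts, paths, column names) is made up for illustration, and for a real file you would pass the filename to in.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV qw( csv );    # uses Text::CSV_XS underneath when it is installed

# Hypothetical, shortened sample: a header plus two records; the first
# record has two quoted fields that contain commas.
my $data = <<'END';
Host,Path,Perms,Results
10.0.0.1,"/share/E00526, E00595 County/Letter.doc",FMRWX,"US PII (1/1),GLBA (1/1)"
10.0.0.2,/share/plain.doc,RX,PII (1)
END

# Slurp every record into an array of arrays. With the default
# quote_char ("), a quoted field stays one field despite its commas.
# For a real file, use: csv( in => $filename )
my $aoa = csv( in => \$data ) or die Text::CSV->error_diag;

for my $row (@$aoa) {
    print scalar(@$row), " fields: ", join( ' | ', @$row ), "\n";
}
```

Every row, including the one with embedded commas, comes back with four fields; if the output later looks wrong, the problem is more likely in how the rows are written back out than in how they are read.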

Re^3: Building a dynamic array or some other method?
by tybalt89 (Monsignor) on May 03, 2024 at 23:29 UTC

    It looks like The_DJ's post was slurping the data in correctly, but was not using csv to write out a correct CSV file.

    Try this:

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11159049
    use warnings;
    use Text::CSV qw( csv );

    my $data = <<'';
    File Server,Access Path,Current Permissions,Logon Name,Inherited From Folders,Flags,User/Group,Classification Results,Classification Results by Category (Including Nested),Total Hit Count
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,@FOO NOW Onsite Support,\Common,This folder only,Pathway12.My.Corp.com\@FOO NOW Onsite Support,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,Administrators,\Common,This folder only,10.15.106.71\Administrators,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,Creator Owner,\Common,This folder only,Abstract\Creator Owner,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,FP NOW BMG FSE NTFS Admins,\Common,This folder only,Pathway12.My.Corp.com\FP NOW BMG FSE NTFS Admins,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,ClusterSvcDIR,\Common,This folder only,Pathway12.My.Corp.com\ClusterSvcDIR,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,SYSTEM,\Common,This folder only,Abstract\SYSTEM,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,MiJim,<not inherited>,This folder only,"Pathway12.My.Corp.com\Michaels, Jim@My",IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,MRWX,@FP DIR BMG,\Common,This folder only,Pathway12.My.Corp.com\@FP DIR BMG,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,&CDAdmin,\Common,This folder only,Pathway12.My.Corp.com\&CDAdmin,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,@FOO DSMS Admins,\Common,This folder only,Pathway12.My.Corp.com\@FOO DSMS Admins,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,FOO BMG FS Support,\Common,This folder only,Pathway12.My.Corp.com\FOO BMG FS Support,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,DPeterso,\Common,This folder only,"Pathway12.My.Corp.com\Peterson, Dan@My",IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,FP BMG IMG Read Access,\Common,This folder only,Pathway12.My.Corp.com\FP BMG IMG Read Access,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,@FOO NOW Onsite Support,\Common,This folder only,Pathway12.My.Corp.com\@FOO NOW Onsite Support,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,Administrators,\Common,This folder only,10.15.106.71\Administrators,IRS Data (1/1),PII (1),1
    10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,Creator Owner,\Common,This folder only,Abstract\Creator Owner,IRS Data (1/1),PII (1),1

    my %database;
    my $aoa = csv( in => \$data ); # FIXME change to filename
    my @output = shift @$aoa; # the header
    for my $fields ( @$aoa )
      {
      my $ref = \%database;
      $ref = $ref->{$_} //= {} for @$fields;
      }
    combine( \%database ); # combine lines with common beginning
    csv( in => \@output, out => *STDOUT ); # FIXME change to filename

    sub tail
      {
      my $ref = shift;
      my ($key) = sort keys %$ref;
      $key ? ( $key, tail( $ref->{$key} ) ) : ();
      }

    sub combine
      {
      my ($ref, @lines) = @_;
      my @keys = sort keys %$ref;
      if( @keys > 1 and @lines >= 3 )
        {
        my $group = join ',', @keys;
        push @output, [ @lines, $group, tail $ref->{$keys[0]} ];
        }
      else
        {
        combine( $ref->{$_}, @lines, $_ ) for @keys;
        @keys or push @output, \@lines;
        }
      }

    which outputs:

    "File Server","Access Path","Current Permissions","Logon Name","Inherited From Folders",Flags,User/Group,"Classification Results","Classification Results by Category (Including Nested)","Total Hit Count"
    10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",FMRWX,"@FOO NOW Onsite Support,Administrators,ClusterSvcDIR,Creator Owner,FP NOW BMG FSE NTFS Admins,MiJim,SYSTEM",\Common,"This folder only","Pathway12.My.Corp.com\@FOO NOW Onsite Support","IRS Data (1/1)","PII (1)",1
    10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",MRWX,"@FP DIR BMG",\Common,"This folder only","Pathway12.My.Corp.com\@FP DIR BMG","IRS Data (1/1)","PII (1)",1
    10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",RX,"&CDAdmin,@FOO DSMS Admins,DPeterso,FOO BMG FS Support,FP BMG IMG Read Access",\Common,"This folder only",Pathway12.My.Corp.com\&CDAdmin,"IRS Data (1/1)","PII (1)",1
    10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf",FMRWX,"@FOO NOW Onsite Support,Administrators,Creator Owner",\Common,"This folder only","Pathway12.My.Corp.com\@FOO NOW Onsite Support","IRS Data (1/1)","PII (1)",1

    This looks like it correctly quotes unchanged fields that contain commas. If it doesn't for you, please post the failed lines and the code you ran to get the failed lines.
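As a sketch of why writing the rows back out with csv() matters (assuming Text::CSV is installed; the single record below is hypothetical): joining fields with a plain comma throws away the quotes that protected an embedded comma, while csv( out => ... ) adds them back. Parsing each string again shows that the joined version now splits into one field too many.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV qw( csv );

# One hypothetical record whose second field contains a comma.
my @rows = ( [ '10.0.0.1', 'Peterson, Dan@My', 'RX' ] );

# Naive output: join(',') loses the quoting, so the embedded comma
# now looks like a field separator.
my $naive = join( ',', @{ $rows[0] } ) . "\n";

# csv() output: fields containing the separator are quoted again.
csv( in => \@rows, out => \my $quoted );

print "naive:  $naive";
print "quoted: $quoted";

# Parse both strings again to compare field counts.
my $from_naive  = csv( in => \$naive );
my $from_quoted = csv( in => \$quoted );
print scalar @{ $from_naive->[0] },  " fields from the joined line\n";    # 4
print scalar @{ $from_quoted->[0] }, " fields from the csv() line\n";     # 3
```

The same applies to the condensed rows: a grouped Logon Name field such as "@FOO NOW Onsite Support,Administrators,..." deliberately contains commas, so it only survives a round trip if the writer quotes it.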

      Hi.

      I was aiming to replicate the output as provided by the OP, which, you'll note, also doesn't use quoted strings "properly".

      I should have highlighted that issue.
      My bad.

        No worries. Thank you! I took some time and studied your code, and understood it!
      Okay... I'm REALLY struggling to understand what's going on here; few others used similar syntax. I've tried running it through a debugger, and I think that because there's SOOO much going on in individual lines, I'm missing a lot.

      my $ref = \%database;
      $ref = $ref->{$_} //= {} for @$fields;
      is puzzling. I tried to find info on //= and found only a clumping of similar nomenclature vaguely saying it transfers info into arrays. But I couldn't look at intermediate values from what I think is a for loop built into the end of the second line. The $_ is vaguely familiar as the current value within a for loop (for instance), but the $ref->{$_} part, along with the //= {} (empty set?), leaves me lost as to what's happening here.

      (Try searching for "//=" and you get back a lot of nothing!! Yeesh...) I was (somewhat) happy to have finally found the vague reference. It just didn't help me much. LOL

      I'm admittedly struggling with references and dereferences, and how it's being used or addressed.

      So how %database gets filled isn't intuitively obvious to me, nor is everything going on in combine and tail.

      I'll study this some more. I've had to deal with my mother-in-law's recent stroke and her husband (both 90), plus competing priorities at work. It's hard to stay focused and dig slowly through this.

      I did want to say thank you tybalt89!
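For anyone else puzzling over the same code, here is a small, self-contained sketch using only core Perl. It builds the nested hash exactly as the loop in tybalt89's post does, dumps it with Data::Dumper so each level is visible, and then runs tail and combine on it. The three-field rows are hypothetical, and the grouping threshold in combine is lowered from the thread's @lines >= 3 (group at the fourth column, Logon Name) to @lines >= 2 so that it groups at the third column of these shorter rows; otherwise the two subs follow the posted code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;    # print hash keys in a stable order
$Data::Dumper::Indent   = 1;    # simpler, fixed-width indentation

# Hypothetical, already-parsed rows (three fields instead of ten).
my @rows = (
    [ 'host1', 'a.pdf', 'Admins' ],
    [ 'host1', 'a.pdf', 'SYSTEM' ],
    [ 'host1', 'b.pdf', 'Admins' ],
);

# Build the tree: each field becomes a hash key one level below the
# previous field, so rows sharing a prefix share the same upper levels.
my %database;
for my $fields (@rows) {
    my $ref = \%database;
    $ref = $ref->{$_} //= {} for @$fields;
}
print Dumper( \%database );

# tail(): follow the alphabetically first key at each level and return
# the keys along that path (used to fill the columns after a group).
sub tail {
    my $ref = shift;
    my ($key) = sort keys %$ref;
    $key ? ( $key, tail( $ref->{$key} ) ) : ();
}
print "tail: ", join( ' ', tail( \%database ) ), "\n";

# combine(): walk the tree; once the prefix is long enough and a node has
# several children, emit one row whose next field lists all those keys.
my @output;
sub combine {
    my ( $ref, @lines ) = @_;
    my @keys = sort keys %$ref;
    if ( @keys > 1 and @lines >= 2 ) {    # the thread uses >= 3
        push @output, [ @lines, join( ',', @keys ), tail( $ref->{ $keys[0] } ) ];
    }
    else {
        combine( $ref->{$_}, @lines, $_ ) for @keys;
        @keys or push @output, \@lines;    # leaf: emit the full path
    }
}

combine( \%database );
print join( ' | ', @$_ ), "\n" for @output;
```

Here the two a.pdf rows collapse into one row whose third field is "Admins,SYSTEM", while b.pdf, which has only one child, is emitted unchanged.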
        I tried to find info on //= and found only a clumping of similar nomenclature vaguely saying it transfers info into arrays.

        //= combines the // (defined-or) operator and the = (assignment) operator into one, exactly like ||= (or + assign), += (add + assign), -= (subtract + assign), and many others.

        The // (defined-or) operator is relatively new; it was introduced in Perl 5.10.0, and you can find a short intro in perl5100delta:

        Defined-or operator

        A new operator // (defined-or) has been implemented. The following expression:

        $a // $b

        is merely equivalent to

        defined $a ? $a : $b

        and the statement

        $c //= $d;

        can now be used instead of

        $c = $d unless defined $c;

        The // operator has the same precedence and associativity as ||. Special care has been taken to ensure that this operator Do What You Mean while not breaking old code, but some edge cases involving the empty regular expression may now parse differently. See perlop for details.
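A short core-Perl sketch of the difference between // and ||, and of the //= {} idiom used in the thread (the variable names are made up for illustration):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# || falls back whenever the left side is false (undef, 0, '' or '0');
# // falls back only when the left side is undef.
my $count;              # undef
$count //= 10;          # assigned: $count was undef
my $zero = 0;
$zero //= 10;           # unchanged: 0 is defined
my $zero_or = 0;
$zero_or ||= 10;        # overwritten: 0 is false
print "$count $zero $zero_or\n";    # prints "10 0 10"

# The thread's idiom: create an empty hash only if the slot is undefined,
# otherwise keep (and return) the hash that is already there.
my %h;
my $first  = ( $h{k} //= {} );
my $second = ( $h{k} //= {} );
print $first == $second ? "same hash\n" : "new hash\n";    # prints "same hash"
```

The second //= returns the existing hash reference instead of creating a new one, which is what lets rows that share leading fields end up in the same branch of %database.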


        my $ref = \%database;
        $ref = $ref->{$_} //= {} for @$fields;

        is puzzling.

        Yes, it is. Partly because it is optimized for minimal keystrokes, not for readability. A typical beginner's mistake.

        But it does not take much to decode. You should now know //=. $ref is initially a reference to the hash %database. (If you know C, a reference can be thought of as a pointer.) $ref->{$_} //= {} assigns a reference to a new, empty anonymous hash ({}) to $ref->{$_} unless $ref->{$_} is already defined. Then $ref is assigned whatever is now stored in $ref->{$_}, i.e. it moves one level down. Finally, the entire $ref = $ref->{$_} //= {} is executed for each element of the array referenced by $fields, with $_ set to each element in turn. (Technically, $_ is aliased, but that does not matter here.)

        The same code, but much more verbose, looks like this:

        my $ref = \%database;
        foreach my $field (@$fields) {
            unless (defined $ref->{$field}) {
                $ref->{$field} = {};
            }
            $ref = $ref->{$field};
        }

        (Seven lines instead of just two. What a waste of disk space and screen real estate! And all the wear on the keyboard for those unnecessary characters! A single line would have been sufficient, no whitespace needed at all: my$r=\%d;$r=$r->{$_}//={}for@$f; - SCNR)
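To see the intermediate values that are hard to follow in a debugger, here is a minimal core-Perl sketch (with hypothetical input rows) that runs the terse one-liner and the verbose loop side by side, prints which keys the hash under $ref holds after each step, and checks that both forms build the same structure.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;    # same key order for both dumps

# Hypothetical input rows standing in for the parsed CSV records.
my @rows = ( [ 'a', 'b', 'c' ], [ 'a', 'b', 'd' ], [ 'a', 'e' ] );

# Terse form, as posted in the thread.
my %terse;
for my $fields (@rows) {
    my $ref = \%terse;
    $ref = $ref->{$_} //= {} for @$fields;
}

# Verbose form, printing the keys of the hash $ref points to after each step.
my %verbose;
for my $fields (@rows) {
    my $ref = \%verbose;
    foreach my $field (@$fields) {
        unless ( defined $ref->{$field} ) {
            $ref->{$field} = {};    # first time this path is seen
        }
        $ref = $ref->{$field};      # step one level down
        print "after '$field': { ", join( ', ', sort keys %$ref ), " }\n";
    }
    print "--\n";
}

# Both loops build the same nested structure.
my $same = Dumper( \%terse ) eq Dumper( \%verbose );
print $same ? "identical\n" : "different\n";
```

On the second and third rows, the trace shows $ref landing on hashes that already contain keys from earlier rows: the existing branch is reused rather than replaced.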

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)