in reply to Building a dynamic array or some other method?

I wrote this, which reproduces the output in your example almost exactly:
#!perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $magic_column = 3;    #0-indexed
my $aoa  = csv( in => $ARGV[0] );
my %uniq = ();
my $head = shift @$aoa;

foreach my $line (@$aoa) {
    # key: the columns before the magic column
    my $key = join( ',', @{$line}[ 0 .. $magic_column - 1 ] );
    # suffix: the columns after it (the first row's values are kept)
    my $suffix = join( ',', @{$line}[ $magic_column + 1 .. $#$line ] );
    $uniq{$key} //= [ [], $suffix ];
    push @{ $uniq{$key}[0] }, $line->[$magic_column];
}

print join( ',', @$head ), "\n";
foreach my $line ( sort { lc $a cmp lc $b } keys %uniq ) {
    my $magic = join( ',', sort { lc $a cmp lc $b } @{ $uniq{$line}[0] } );
    # quote the combined field when it holds more than one name
    if ( @{ $uniq{$line}[0] } > 1 ) {
        $magic = '"' . $magic . '"';
    }
    print $line, ',', $magic, ',', $uniq{$line}[1], "\n";
}
There are two differences:
First, your output includes the string MiJim(*), but the (*) doesn't appear anywhere in your source data, and nothing in your question explains where it would come from.

Second, I don't know what order you use for the 'combined' column 3. It's neither the original order nor alphabetical order 🤷‍♂️.

Re^2: Building a dynamic array or some other method?
by CAdood (Acolyte) on Apr 24, 2024 at 03:30 UTC
    The (*) notation was going to be a quick-and-dirty marker for a non-inherited permission on a file. I didn't mention it in the initial posting, but I did mention it in the post where I show it between the CSV input and output samples.

    The columns show:
    File Server,Access Path,Current Permissions,Logon Name,Inherited From Folders,Flags,User/Group,Classification Results,Classification Results by Category (Including Nested),Total Hit Count

    In the sample input CSV, there are a few unique files shown (column 2), and column 3 shows what permissions each user or group has on that file. Those get collapsed to show one line per file, and column 4 is where I start packing together all the users and groups that have the same permissions seen in column 3.

    There's no particular order to the combined list of users/groups. As I parse the file (sorted by server, filename, and permissions), I intend to check whether those are all the same, and if a user is already associated with that permission for that file, I append the current line's user onto the list of users that already have the same permissions. So the list of users is just in "whatever comes next" order.
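A minimal sketch of that "whatever comes next" grouping in plain Perl, assuming the rows are already split into fields (the sample rows and the variable names %users_for and @key_order are hypothetical). It reuses the idea of a composite key built from several columns; users are appended in arrival order, and a separate array remembers the order in which keys were first seen, so nothing is re-sorted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, already-split rows: server, path, permissions, user.
my @rows = (
    [ '10.15.106.71', '/a.pdf', 'FMRWX', 'Administrators' ],
    [ '10.15.106.71', '/a.pdf', 'FMRWX', 'SYSTEM' ],
    [ '10.15.106.71', '/a.pdf', 'RX',    'DPeterso' ],
    [ '10.15.106.71', '/a.pdf', 'FMRWX', 'MiJim' ],
);

my %users_for;    # composite key => [ users, in arrival order ]
my @key_order;    # composite keys, in the order first seen

for my $row (@rows) {
    # One composite key from server, path and permissions, joined with
    # "\0", a separator assumed not to occur in the data.
    my $key = join "\0", @{$row}[ 0 .. 2 ];
    push @key_order, $key unless exists $users_for{$key};
    push @{ $users_for{$key} }, $row->[3];    # autovivifies the array
}

# Rebuild one output row per key: the key's fields plus the joined users.
my @grouped = map {
    my $key = $_;
    [ split( /\0/, $key ), join( ',', @{ $users_for{$key} } ) ]
} @key_order;

print join( ' | ', @$_ ), "\n" for @grouped;
```

Running it prints `10.15.106.71 | /a.pdf | FMRWX | Administrators,SYSTEM,MiJim` followed by `10.15.106.71 | /a.pdf | RX | DPeterso`: MiJim lands after SYSTEM simply because that is the order the rows arrived in.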

    I'm sorry I didn't make that clearer.

    I'm going to pore over what people have submitted because I have a lot to learn from those techniques. I do want to thank everyone who has provided sample code! I had training all day today, and will have another day tomorrow, and I'll look more closely at it.

    I saw someone used map above. I had tried using it myself and failed, and couldn't figure out why, so I want to look at how it's being used there. For some reason I could only use it on @ARGV, not on other arrays. (I had a comment in my code that mentioned the failure.)
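The post doesn't show what the original failure was, so this is only a guess at one common stumbling block: handing map an array reference instead of a list. The sketch below, with hypothetical data, shows that map works on any named array exactly as it does on @ARGV, while a reference such as the $aoa returned by csv() has to be dereferenced first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @users = ( 'SYSTEM', 'Administrators', 'MiJim' );

# map runs the block once per element, with $_ aliased to that element,
# and returns the list of results; any array works, not just @ARGV.
my @lower = map { lc($_) } @users;

# An array of arrays, shaped like the reference csv( in => ... ) returns.
my $aoa = [
    [ 'server1', '/a.pdf', 'RX' ],
    [ 'server1', '/b.pdf', 'FMRWX' ],
];

# Dereference first: map over @$aoa (the rows), not over $aoa itself.
# "map { ... } $aoa" would run the block just once, on the reference.
my @paths = map { $_->[1] } @$aoa;

print "@lower\n";
print "@paths\n";
```

This prints `system administrators mijim` and then `/a.pdf /b.pdf`.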

    Thank you again! I'll be sure to ask some questions if I can't figure out how some of the code functionality works.
Re^2: Building a dynamic array or some other method?
by CAdood (Acolyte) on May 02, 2024 at 16:44 UTC
    The_DJ, thank you for this. I've learned a lot by slowly poring over it to see how it works, and I realized that making unique keys out of SEVERAL fields together simplifies things greatly! But I've run into a problem I'm not sure how to solve. The entire file gets slurped in with a single line of code, but I discovered on my full data set that someone actually created filenames with commas in them, which causes the output to come out wrong.

    Reading through the Text::CSV documentation (which apparently uses Text::CSV_XS under the hood), I found something about quote_char, but it defaults to a double quote anyway.
    Setting it explicitly:

    my $aoa = csv( in => $filename, quote_char => "\""  );
    makes no difference either. (I think that's the default anyway.)

    The input data shows it as a quoted field, but I'm not sure what I'm doing wrong.

    Input data looks like this:
    10.15.106.71,"/ifs/PH01/PH01SUB/ENTNASIS02/PH02/SMB/Share/Share6/Employee-Share/Contracts/Privacy and Disclosure/Disclosure Unit/Active Contracts/County/Sample County/E00526, E00595 Sample County DA/2017-2020 M4385566/Emails & Correspondence/Welcome Letter.doc",FMRWX,Creator Owner,\ifs\PH01\PH01SUB\ENTNASIS02\PH02\SMB\Share\Share6\Employee-Share\Contracts,This folder only,Abstract\Creator Owner,"US PII (1/1),Document Passwords - 2.0 (1/1),US Social Security Number (1/1),GLBA (Gramm-Leach Bliley Act) (1/1)","Credentials (1),Financial (1),PII (2)",4


    After focusing a bit on the first part of the data (before the magic field), I noticed that the quoted-comma issue had been there the entire time, in all the other lines containing more than one sensitive data type (after the magic field).

    So I need to figure out how to slurp in anything that is quoted and contains commas as one field, while still treating the other commas as separators.

    Am I missing something obvious in the docs? Text::CSV#quote_char

      I'm not sure what you're doing that results in abnormal operation, but here it is as an SSCCE.

#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv  = Text::CSV_XS->new;
my $line = '10.15.106.71,"/ifs/PH01/PH01SUB/ENTNASIS02/PH02/SMB/Share/Share6/Employee-Share/Contracts/Privacy and Disclosure/Disclosure Unit/Active Contracts/County/Sample County/E00526, E00595 Sample County DA/2017-2020 M4385566/Emails & Correspondence/Welcome Letter.doc",FMRWX,Creator Owner,\ifs\PH01\PH01SUB\ENTNASIS02\PH02\SMB\Share\Share6\Employee-Share\Contracts,This folder only,Abstract\Creator Owner,"US PII (1/1),Document Passwords - 2.0 (1/1),US Social Security Number (1/1),GLBA (Gramm-Leach Bliley Act) (1/1)","Credentials (1),Financial (1),PII (2)",4';
$csv->parse ($line);
my @fields = $csv->fields;
print "Input:\n$line\n\nOutput:\n" . join "\n\n", @fields;

      This outputs the parsed fields separated by empty lines so it should be trivial to see what is contained in each field. The quote characters are honoured as expected. HTH.


      🦛

        In the example I submitted earlier, I had a large file with many lines, and lines with the same access privileges on a file were being condensed into one.

        The_DJ posted an option to slurp an entire file into an array of arrays.

        When I used that example to process my file, it didn't handle quoted fields in the original file that had commas within them.

        Perhaps by showing only one line of data, I made it sound like something easy to solve within a single line.

        I thought that when slurping in an entire file, there was a mode I could switch to that would properly handle quoted fields (with embedded commas). I wasn't able to figure it out, and was asking whether I was misunderstanding the documentation.

      It looks like The_DJ's post was slurping in correctly, but was not using csv to write a correct CSV file.

      Try this:

#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11159049
use warnings;
use Text::CSV qw( csv );

my $data = <<'';
File Server,Access Path,Current Permissions,Logon Name,Inherited From Folders,Flags,User/Group,Classification Results,Classification Results by Category (Including Nested),Total Hit Count
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,@FOO NOW Onsite Support,\Common,This folder only,Pathway12.My.Corp.com\@FOO NOW Onsite Support,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,Administrators,\Common,This folder only,10.15.106.71\Administrators,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,Creator Owner,\Common,This folder only,Abstract\Creator Owner,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,FP NOW BMG FSE NTFS Admins,\Common,This folder only,Pathway12.My.Corp.com\FP NOW BMG FSE NTFS Admins,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,ClusterSvcDIR,\Common,This folder only,Pathway12.My.Corp.com\ClusterSvcDIR,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,SYSTEM,\Common,This folder only,Abstract\SYSTEM,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,FMRWX,MiJim,<not inherited>,This folder only,"Pathway12.My.Corp.com\Michaels, Jim@My",IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,MRWX,@FP DIR BMG,\Common,This folder only,Pathway12.My.Corp.com\@FP DIR BMG,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,&CDAdmin,\Common,This folder only,Pathway12.My.Corp.com\&CDAdmin,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,@FOO DSMS Admins,\Common,This folder only,Pathway12.My.Corp.com\@FOO DSMS Admins,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,FOO BMG FS Support,\Common,This folder only,Pathway12.My.Corp.com\FOO BMG FS Support,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,DPeterso,\Common,This folder only,"Pathway12.My.Corp.com\Peterson, Dan@My",IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf,RX,FP BMG IMG Read Access,\Common,This folder only,Pathway12.My.Corp.com\FP BMG IMG Read Access,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,@FOO NOW Onsite Support,\Common,This folder only,Pathway12.My.Corp.com\@FOO NOW Onsite Support,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,Administrators,\Common,This folder only,10.15.106.71\Administrators,IRS Data (1/1),PII (1),1
10.15.106.71,/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf,FMRWX,Creator Owner,\Common,This folder only,Abstract\Creator Owner,IRS Data (1/1),PII (1),1

my %database;
my $aoa = csv( in => \$data ); # FIXME change to filename
my @output = shift @$aoa; # the header

for my $fields ( @$aoa ) {
    my $ref = \%database;
    $ref = $ref->{$_} //= {} for @$fields;
}

combine( \%database ); # combine lines with common beginning

csv( in => \@output, out => *STDOUT ); # FIXME change to filename

sub tail {
    my $ref = shift;
    my ($key) = sort keys %$ref;
    $key ? ( $key, tail( $ref->{$key} ) ) : ();
}

sub combine {
    my ($ref, @lines) = @_;
    my @keys = sort keys %$ref;
    if( @keys > 1 and @lines >= 3 ) {
        my $group = join ',', @keys;
        push @output, [ @lines, $group, tail $ref->{$keys[0]} ];
    }
    else {
        combine( $ref->{$_}, @lines, $_ ) for @keys;
        @keys or push @output, \@lines;
    }
}

      which outputs:

"File Server","Access Path","Current Permissions","Logon Name","Inherited From Folders",Flags,User/Group,"Classification Results","Classification Results by Category (Including Nested)","Total Hit Count"
10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",FMRWX,"@FOO NOW Onsite Support,Administrators,ClusterSvcDIR,Creator Owner,FP NOW BMG FSE NTFS Admins,MiJim,SYSTEM",\Common,"This folder only","Pathway12.My.Corp.com\@FOO NOW Onsite Support","IRS Data (1/1)","PII (1)",1
10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",MRWX,"@FP DIR BMG",\Common,"This folder only","Pathway12.My.Corp.com\@FP DIR BMG","IRS Data (1/1)","PII (1)",1
10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/Axidome Quote My Corp 2020211 KnowBe4 2-yr Renewal FINAL.pdf",RX,"&CDAdmin,@FOO DSMS Admins,DPeterso,FOO BMG FS Support,FP BMG IMG Read Access",\Common,"This folder only",Pathway12.My.Corp.com\&CDAdmin,"IRS Data (1/1)","PII (1)",1
10.15.106.71,"/Common/Awareness and Training/KnowBe4/2020- KnowBe4 Subscription Renewal Docs/My-B8245.pdf",FMRWX,"@FOO NOW Onsite Support,Administrators,Creator Owner",\Common,"This folder only","Pathway12.My.Corp.com\@FOO NOW Onsite Support","IRS Data (1/1)","PII (1)",1

      This looks like it correctly quotes unchanged fields that contain commas. If it doesn't for you, please post the failed lines and the code you ran to get the failed lines.
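The difference between reading correctly and writing correctly can be seen in a few lines of core Perl. The csv_field helper below is hypothetical and deliberately simplified (it wraps a field in double quotes when it contains a comma, double quote, or line break, and doubles any embedded quotes); it only illustrates why printing with join(',', ...) loses the quoting. For real output, Text::CSV's csv( out => ... ) is still the tool to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified CSV field quoting, for illustration only.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;
    ( my $escaped = $field ) =~ s/"/""/g;    # double any embedded quotes
    return qq{"$escaped"};
}

# Hypothetical fields; the path contains a comma.
my @fields = ( '10.15.106.71', '/Sample County/E00526, E00595 DA/Letter.doc', 'FMRWX' );

# Naive output: the comma inside the path turns into an extra separator.
my $naive = join ',', @fields;

# Quoted output: the path stays one field when read back in.
my $quoted = join ',', map { csv_field($_) } @fields;

print "$naive\n";
print "$quoted\n";
```

The first line printed has three commas for three fields, so any CSV reader will see four; the second keeps the path wrapped in double quotes.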

        Hi.

        I was aiming to replicate the output as provided by OP, which you'll note also doesn't have "proper" use of quoted strings.

        I should have highlighted that issue.
        My bad

        Okay... I'm REALLY struggling to understand what's going on here; few others used similar syntax. I've tried running it through a debugger, but I think that because there's SOOO much going on in individual lines, I'm missing a lot.

        my $ref = \%database;
        $ref = $ref->{$_} //= {} for @$fields;
        is puzzling. I tried to find information on //= and found only a cluster of similar operators, with a vague description saying it transfers info into arrays. I also couldn't look at intermediate values in what I think is a for loop built into the end of the second line. $_ is vaguely familiar as the current value within a for loop (for instance), but the $ref->{$_} part, along with the //= {} (empty set?), is throwing me as to what's happening here.
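An attempt at unpacking that line, as a sketch with hypothetical sample rows. `$x //= $y` is the defined-or assignment (documented in perlop next to `||=`): it assigns `$y` only when `$x` is undefined, and returns `$x` either way. `{}` is a reference to a new, empty anonymous hash rather than a set. `STATEMENT for LIST` is the statement-modifier form of foreach, running the statement once per element with `$_` aliased to it. Because assignment operators group right to left, the line means `$ref = ( $ref->{$_} //= {} )`: make sure a child hash exists under this field's value, then step down into it. Written out as an explicit loop and dumped:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical rows: server, path, permissions, user.
my @rows = (
    [ 'srv', '/a.pdf', 'RX', 'DPeterso' ],
    [ 'srv', '/a.pdf', 'RX', 'SYSTEM' ],
);

my %database;
for my $fields (@rows) {
    my $ref = \%database;    # start at the top level for every row
    for my $value (@$fields) {
        # Create an empty hash under this key if there isn't one yet...
        $ref->{$value} = {} unless defined $ref->{$value};
        # ...then step down into it for the next field.
        $ref = $ref->{$value};
    }
}

$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
print Dumper( \%database );
```

Each row becomes a path of nested keys, and rows that start with the same fields share the same hashes, so both users end up side by side under srv, /a.pdf, RX. That shared prefix is what the combine step later looks for.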

        (Try searching for "//=" and you get back a lot of nothing!!) Yeesh... I was (somewhat) happy to have finally found the vague reference. It just didn't help me much. LOL

        I'm admittedly struggling with references and dereferencing, and how they're being used or addressed.
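A short refresher on the reference syntax used throughout this thread, with hypothetical data: `\` takes a reference to an existing variable, `[ ]` and `{ }` build anonymous array and hash references, `->` reaches through a reference to one element, and `@{ ... }` / `%{ ... }` (or the shorthand `@$x` / `%$x`) dereference the whole thing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @row  = ( 'srv', '/a.pdf', 'RX' );
my $rref = \@row;                                   # reference to an existing array
my $aoa  = [ [ 'srv', '/b.pdf', 'FMRWX' ], $rref ]; # anonymous array of array refs
my %h    = ( RX => [ 'DPeterso', 'SYSTEM' ] );      # hash whose value is an array ref
my $href = \%h;                                     # reference to an existing hash

my $path  = $rref->[1];          # one element through a reference
my $count = @$rref;              # the whole array, in scalar context (its length)
my $first = $aoa->[0][1];        # nested element; -> is optional between subscripts
my @users = @{ $href->{RX} };    # copy out an array stored inside a hash

# Autovivification: pushing onto a missing entry creates the array first.
push @{ $href->{MRWX} }, '@FP DIR BMG';

print "$path $count $first @users\n";
print join( ',', sort keys %$href ), "\n";
```

This prints `/a.pdf 3 /b.pdf DPeterso SYSTEM` and then `MRWX,RX`. Since `$href` points at `%h`, the push also shows up in `%h` itself.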

        So how %database gets filled isn't intuitively obvious to me, and neither is everything going on within combine and tail.
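A small worked example of how combine and tail behave, assuming hypothetical five-column rows (server, path, permissions, user, one trailing field). The tree-building loop and the bodies of tail and combine follow the code above; only the sample data and the print statements are new, and rows are printed with join instead of csv() so the sketch needs no CSV module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows: server, path, permissions, user, trailing field.
my @rows = (
    [ 'srv', '/a.pdf', 'FMRWX', 'SYSTEM',         'x' ],
    [ 'srv', '/a.pdf', 'FMRWX', 'Administrators', 'y' ],
    [ 'srv', '/a.pdf', 'RX',    'DPeterso',       'z' ],
);

# Build the nested hash the same way as the code above.
my %database;
for my $fields (@rows) {
    my $ref = \%database;
    $ref = $ref->{$_} //= {} for @$fields;
}

my @output;

# tail: follow the alphabetically first key at each level down to a leaf,
# returning the keys along the way (the "rest of the line").
sub tail {
    my $ref = shift;
    my ($key) = sort keys %$ref;
    $key ? ( $key, tail( $ref->{$key} ) ) : ();
}

# combine: walk the tree, carrying the fields seen so far in @lines.
# Once at least three fields (server, path, permissions) are fixed and the
# node has several children, emit one row that joins those children and
# takes the remaining fields from the first child via tail().
sub combine {
    my ( $ref, @lines ) = @_;
    my @keys = sort keys %$ref;
    if ( @keys > 1 and @lines >= 3 ) {
        push @output, [ @lines, join( ',', @keys ), tail( $ref->{ $keys[0] } ) ];
    }
    else {
        combine( $ref->{$_}, @lines, $_ ) for @keys;
        @keys or push @output, \@lines;    # leaf: the path is a complete row
    }
}

print "tail from the top: ", join( '|', tail( \%database ) ), "\n";
combine( \%database );
print join( '|', @$_ ), "\n" for @output;
```

It should print `tail from the top: srv|/a.pdf|FMRWX|Administrators|y`, then `srv|/a.pdf|FMRWX|Administrators,SYSTEM|y` and `srv|/a.pdf|RX|DPeterso|z`. SYSTEM's own trailing field (x) does not appear: a combined row takes its remaining fields from whichever user sorts first, which is also why MiJim's `<not inherited>` value is absent from the FMRWX line in the output above.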

        I'll study this some more. I've had to deal with my mother-in-law's recent stroke and her husband (both 90), as well as competing priorities at work, so it's hard to stay focused and work slowly through this.

        I did want to say thank you, tybalt89!