in reply to Removing doubles and printing only unique values

Using split, map and grep inside a do block.

johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
open my $inFH, q{<}, \ <<__EOD__ or die $!;
C000004923;VK11070778;Delta;;16/08/2017;15/09/2017;Prepayment;Yes;Additional note
C000004923;VK11070778;Rounding;;16/08/2017;15/09/2017;Prepayment;Yes;Additional note
C000004924;VK11070778;Delta Gas;;16/08/2017;15/09/2017;Prepayment;Yes;Additional note
C000858948;VK11070783;Delta;;3/01/2017;2/02/2017;Prepayment;Yes;Additional note
C001028127;VK11070844;Delta;;1/07/2017;31/07/2017;Prepayment;Yes;Additional note
C000863388;VK11070869;Delta;;14/03/2016;13/04/2016;Prepayment;Yes;Additional note
C000863388;VK11070869;Rounding;;14/03/2016;13/04/2016;Prepayment;Yes;Additional note
C000863389;VK11070869;Delta Gas;;14/03/2016;13/04/2016;Prepayment;Yes;Additional note
C001041275;VK11070873;Delta;;14/04/2017;13/05/2017;Prepayment;Yes;Additional note
C000457921;VK11070913;Delta;;11/12/2014;10/01/2015;Prepayment;Yes;Additional note
C000457922;VK11070913;Delta Gas;;11/12/2014;10/01/2015;Prepayment;Yes;Additional note
C000354278;VK11070920;Delta;;21/09/2015;20/10/2015;Prepayment;Yes;Additional note
C000354278;VK11070920;Rounding;;21/09/2015;20/10/2015;Prepayment;Yes;Additional note
C001139698;VK11070923;Delta;;12/08/2017;11/09/2017;Prepayment;Yes;Additional note
C001139698;VK11070923;Rounding;;12/08/2017;11/09/2017;Prepayment;Yes;Additional note
C001072986;VK11070933;Delta;;14/03/2017;15/05/2017;Prepayment;Yes;Additional note
C001072986;VK11070933;Rounding;;14/03/2017;15/05/2017;Prepayment;Yes;Additional note
C000833421;VK11074400;Delta;;1/05/2017;31/05/2017;Prepayment;Yes;Additional note
C000833422;VK11074400;Delta Gas;;1/05/2017;31/05/2017;Prepayment;Yes;Additional note
C000833422;VK11074400;Rounding;;1/05/2017;31/05/2017;Prepayment;Yes;Additional note
C000147059;VK11074404;Delta;;20/06/2017;19/07/2017;Prepayment;Yes;Additional note
C000147062;VK11074404;Delta Gas;;20/06/2017;19/07/2017;Prepayment;Yes;Additional note
C001109215;VK11074415;Delta;;24/08/2017;23/09/2017;Prepayment;Yes;Additional note
C000313157;VK11074418;Delta;;15/11/2016;14/12/2016;Prepayment;Yes;Additional note
C000313157;VK11074418;Rounding;;15/11/2016;14/12/2016;Prepayment;Yes;Additional note
C000313158;VK11074418;Delta Gas;;11/11/2016;10/12/2016;Prepayment;Yes;Additional note
C001099002;VK11074430;Delta;;1/08/2017;31/08/2017;Prepayment;Yes;Additional note
C001117234;VK11074441;Delta Gas;;15/06/2017;14/07/2017;Prepayment;Yes;Additional note
C001009800;VK11074443;Delta;;16/11/2016;15/12/2016;Prepayment;Yes;Additional note
C000679686;VK11074451;Delta;;20/06/2016;19/07/2016;Prepayment;Yes;Additional note
C000679687;VK11074451;Delta Gas;;20/06/2016;19/07/2016;Prepayment;Yes;Additional note
C001242987;VK11074454;Delta Gas;;15/06/2017;14/07/2017;Prepayment;Yes;Additional note
C001080282;VK11074470;Delta;;2/03/2017;1/04/2017;Prepayment;Yes;Additional note
C001080283;VK11074470;Delta Gas;;2/03/2017;1/04/2017;Prepayment;Yes;Additional note
C001192414;VK11074473;Delta;;14/07/2017;13/08/2017;Prepayment;Yes;Additional note
C001192414;VK11074473;Rounding;;14/07/2017;13/08/2017;Prepayment;Yes;Additional note
C001192415;VK11074473;Delta Gas;;14/07/2017;13/08/2017;Prepayment;Yes;Additional note
C001192415;VK11074473;Rounding;;14/07/2017;13/08/2017;Prepayment;Yes;Additional note
C000268914;VK11074478;Delta;;9/10/2016;8/11/2016;Prepayment;Yes;Additional note
C000268914;VK11074478;Rounding;;9/10/2016;8/11/2016;Prepayment;Yes;Additional note
__EOD__

say for do {
    my %seen;
    grep { not $seen{ $_ } ++ }
    map { ( split m{;} )[ 1 ] }
    <$inFH>;
    };'
VK11070778
VK11070783
VK11070844
VK11070869
VK11070873
VK11070913
VK11070920
VK11070923
VK11070933
VK11074400
VK11074404
VK11074415
VK11074418
VK11074430
VK11074441
VK11074443
VK11074451
VK11074454
VK11074470
VK11074473
VK11074478

I hope this is useful.

Cheers,

JohnGG

Re^2: Removing doubles and printing only unique values
by holli (Abbot) on Oct 31, 2017 at 14:01 UTC
    Sub-optimal.
    say for do {                     # <-- mixed paradigms --------
        my %seen;                    # <-- reinventing the wheel  |
        grep { not $seen{ $_ } ++ }  # <---------------------------
        map { ( split m{;} )[ 1 ] }  # too much logic, but that's nitpicking
        <$inFH>;
        };'
    Better? I think so:
    use List::Util qw(uniqstr);
    say join "\n", uniqstr map { $_->[0] } map { [ split m{;} ] } <$inFH>;
    You could of course roll your own implementation of uniq if you want to.
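
    A minimal sketch of what rolling your own might look like. The name my_uniq is hypothetical and not part of any module; it assumes plain defined strings and keeps the first occurrence of each, in input order.

```perl
use strict;
use warnings;

# Hypothetical helper (not from List::Util): returns the first
# occurrence of each string, preserving the order of the input.
sub my_uniq {
    my %seen;    # lives only for the duration of this call
    return grep { not $seen{ $_ } ++ } @_;
}

my @ids = my_uniq( qw{ VK11070778 VK11070778 VK11070783 VK11070778 } );
print "$_\n" for @ids;
```

    Wrapping the hash in a sub keeps it out of the caller's scope, much as a do block would.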


    holli

    You can lead your users to water, but alas, you cannot drown them.

      Coding styles are somewhat subjective so the question of better or worse is hard to answer. I will set out why I have coded the solution that way and, incidentally, point out that your code should be operating on the second field and not the first.

      • say for do { ... }: I use a do block so that the %seen hash is not left hanging around but goes out of scope at the end of the block. That is not necessary here, as it is a one-liner, but it is a good habit to get into, I think. I'm not sure what you mean by "mixed paradigms"; I'm a bit old school, and paradigms hadn't been invented when I started programming. To me it is just a piece of code that DWIMs.

      • The use of my %seen; grep { not $seen{ $_ } ++ } ... is so simple that it hardly seems worth loading a module, especially as in this case the module uses pretty much the same wheel. I'm all for modules when a task is more complex but not when there is nothing to gain.

      • I find it a bit puzzling that you consider map  { ( split m{;} )[ 1 ] } to be too much logic yet suggest as an alternative the use of two maps, the first to split and pass on an anonymous array, the second to pull out an element of that array (which actually should be element [1] not [0]). To me that appears to add more complication.
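
      To make the comparison in the points above concrete, here is a sketch that runs both forms over three sample records. The second form uses index 1, as suggested above, rather than the 0 that was posted. The @lines array and the variable names are assumptions for illustration; the records are shortened to three fields.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniqstr);    # uniqstr needs List::Util 1.45 or later

# Shortened sample records; the second field is the one we want.
my @lines = (
    "C000004923;VK11070778;Delta\n",
    "C000004923;VK11070778;Rounding\n",
    "C000858948;VK11070783;Delta\n",
);

# One map: split and take a list slice in a single step.
my @hand_rolled = do {
    my %seen;                       # goes out of scope with the do block
    grep { not $seen{ $_ } ++ }
    map  { ( split m{;} )[ 1 ] }
    @lines;
};

# Two maps plus the module, with the index corrected to 1.
my @with_module =
    uniqstr
    map { $_->[ 1 ] }
    map { [ split m{;} ] }
    @lines;

print "@hand_rolled\n";
print "@with_module\n";
```

      Both should yield the same list of IDs for this input; which reads better is the subjective part.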

      Thank you for making those suggestions (++). It was interesting to look again at my post in the light of your comments and to question whether it was optimal or not. On balance I don't think I would change anything, as I can justify to myself the reasons for coding it that way. I would be interested to know if others feel that my reasoning is flawed.

      Cheers,

      JohnGG

        Maybe I was overly critical there. It's just the purist in me who says "Don't mix iterative and functional style".
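
        For what it's worth, here is one reading of that distinction as a sketch: the same job done purely with a loop and purely as a pipeline. This is an interpretation of "iterative" versus "functional", not something spelled out in the thread, and the sub names and sample data are hypothetical.

```perl
use strict;
use warnings;

my @lines = ( "C1;VK1;Delta\n", "C1;VK1;Rounding\n", "C2;VK2;Delta\n" );

# Iterative style: an explicit loop with explicit state.
sub unique_ids_iterative {
    my ( %seen, @ids );
    for my $line ( @_ ) {
        my $id = ( split m{;}, $line )[ 1 ];
        next if $seen{ $id } ++;    # skip IDs already collected
        push @ids, $id;
    }
    return @ids;
}

# Functional style: one expression, data flowing right to left.
sub unique_ids_pipeline {
    my %seen;
    return grep { not $seen{ $_ } ++ } map { ( split m{;} )[ 1 ] } @_;
}

print join( "\n", unique_ids_iterative( @lines ) ), "\n";
print join( "\n", unique_ids_pipeline( @lines ) ), "\n";
```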


        holli
