in reply to Re: how to avoid full scan in file.
in thread how to avoid full scan in file.

I tried this, but the result is not the same as the output of my first script. Analysing the code, I noticed that it does not cover all the combinations. My first result had 6382 lines and this script produced 928. Every line of the second result file is also in the first result file, but some lines are still missing.

Re^3: how to avoid full scan in file.
by poj (Abbot) on May 25, 2019 at 14:38 UTC

    Looks like your script creates multiple output records, so if

    l100107,bbbbbbb,c_0300,loc,12,6
    in FileA matches
    l100107,bbbbbbb,c_0300,389
    in FileB, the output is 6 lines (the value of $qtd, the last column)
    l100107,bbbbbbb,loc,12,389
    l100107,bbbbbbb,loc,12,389
    l100107,bbbbbbb,loc,12,389
    l100107,bbbbbbb,loc,12,389
    l100107,bbbbbbb,loc,12,389
    l100107,bbbbbbb,loc,12,389
    

    Is that what you want?

    Also, can you please explain what this line of code does?

    last if $count == $max;
    poj
      Here is the exact example. I match file A with file B on the keys "l100107,bbbbbb,a_0100", then decrement the $qtd values from file A (16 and 24 in this example) down to 0. Notice that these two values sum to the $max of file B, 40. It is not possible for my process to end up with different values from the two files. I put $tot in the result file to explain the flow.

      File A
      l100107,bbbbbb,a_0100,loc,13,16
      l100107,bbbbbb,a_0100,loc,14,24

      File B
      l100107,bbbbbb,a_0100,40

      Result File
      l100107,bbbbbb,loc,13,40,15
      l100107,bbbbbb,loc,14,40,23
      l100107,bbbbbb,loc,13,40,14
      l100107,bbbbbb,loc,14,40,22
      l100107,bbbbbb,loc,13,40,13
      l100107,bbbbbb,loc,14,40,21
      l100107,bbbbbb,loc,13,40,12
      l100107,bbbbbb,loc,14,40,20
      l100107,bbbbbb,loc,13,40,11
      l100107,bbbbbb,loc,14,40,19
      l100107,bbbbbb,loc,13,40,10
      l100107,bbbbbb,loc,14,40,18
      l100107,bbbbbb,loc,13,40,9
      l100107,bbbbbb,loc,14,40,17
      l100107,bbbbbb,loc,13,40,8
      l100107,bbbbbb,loc,14,40,16
      l100107,bbbbbb,loc,13,40,7
      l100107,bbbbbb,loc,14,40,15
      l100107,bbbbbb,loc,13,40,6
      l100107,bbbbbb,loc,14,40,14
      l100107,bbbbbb,loc,13,40,5
      l100107,bbbbbb,loc,14,40,13
      l100107,bbbbbb,loc,13,40,4
      l100107,bbbbbb,loc,14,40,12
      l100107,bbbbbb,loc,13,40,3
      l100107,bbbbbb,loc,14,40,11
      l100107,bbbbbb,loc,13,40,2
      l100107,bbbbbb,loc,14,40,10
      l100107,bbbbbb,loc,13,40,1
      l100107,bbbbbb,loc,14,40,9
      l100107,bbbbbb,loc,13,40,0
      l100107,bbbbbb,loc,14,40,8
      l100107,bbbbbb,loc,14,40,7
      l100107,bbbbbb,loc,14,40,6
      l100107,bbbbbb,loc,14,40,5
      l100107,bbbbbb,loc,14,40,4
      l100107,bbbbbb,loc,14,40,3
      l100107,bbbbbb,loc,14,40,2
      l100107,bbbbbb,loc,14,40,1
      l100107,bbbbbb,loc,14,40,0

        Is it OK if the records are sorted like this?

        l100107,bbbbbb,loc,13,40,15
        l100107,bbbbbb,loc,13,40,14
        .. etc
        l100107,bbbbbb,loc,13,40,2
        l100107,bbbbbb,loc,13,40,1
        l100107,bbbbbb,loc,13,40,0
        l100107,bbbbbb,loc,14,40,23
        l100107,bbbbbb,loc,14,40,22
        .. etc
        l100107,bbbbbb,loc,14,40,2
        l100107,bbbbbb,loc,14,40,1
        l100107,bbbbbb,loc,14,40,0
        
        poj
      At this point in my process, I cannot sort on the $idx column. The distribution I am using works like dealing playing cards: on each pass I remove 1 item from each $idx, and I keep looping until every $idx reaches 0.
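
For readers following along, the card-dealing distribution described here could be sketched roughly as below. This is only an illustration and not the original script: the sub name `deal_round_robin` is hypothetical, and the field order for File A (two key fields, a code, `loc`, $idx, $qtd) is assumed from the sample data in this thread. It takes the $max from File B and the matching File A lines, and on each pass emits one line per record that still has quantity left.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): deal one output line
# per pass from each File A record until every remaining quantity is 0.
sub deal_round_robin {
    my ($max, @file_a_lines) = @_;

    # Assumed field order: key1, key2, code, loc, idx, qtd
    my @recs = map { [ split /,/ ] } @file_a_lines;

    my @out;
    # Keep making passes while any record still has quantity left.
    while (grep { $_->[5] > 0 } @recs) {
        for my $r (@recs) {
            next if $r->[5] <= 0;    # this record is already used up
            $r->[5]--;               # take one item, like dealing one card
            # Output: key1, key2, loc, idx, max from File B, remaining qtd
            push @out, join ',', @{$r}[0, 1, 3, 4], $max, $r->[5];
        }
    }
    return @out;
}

my @lines = deal_round_robin(
    40,
    'l100107,bbbbbb,a_0100,loc,13,16',
    'l100107,bbbbbb,a_0100,loc,14,24',
);
print "$_\n" for @lines;
```

With the sample records from this thread, the lines alternate between 13 and 14 until the quantity for 13 runs out, after which only 14 is emitted. Because every File A record for one key is held in memory at once, each key needs only one lookup in File B, assuming the File A lines have already been grouped by key (for example in a hash).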