Re^3: Fastest way to merge (and de-dup) two large arrays?

Replies are listed 'Best First'.


Re^4: Fastest way to merge (and de-dup) two large arrays?
by perldigious (Priest) on Aug 12, 2016 at 13:41 UTC

I think I get it: you are trying to ensure a one-to-one correspondence between the ultimately combined array and the lookup hash of its lines, in other words keeping the two consistent with each other. I was actually thinking of treating the lookup hash as a temporary throwaway.

my @rows = <$file_for_rows>;
{
    my %seen = map {$_ => 1} @rows;
    
    while (my $rawData = <$file_for_data>)
    {
        push @rows, $rawData if (!$seen{$rawData});
    }
}

I figured this way the larger array is read into @rows right away (assuming there is no need to check for duplicates within itself), and then the lines that would have gone into @data are read and checked one at a time before being added to @rows if they aren't already in it. It may be faster to read them all into an array first, but I figured this saves a little temporary memory. I added the extra bare block to limit the scope of everything except @rows, assuming that once the block is done @rows will have all the lines from both arrays with no duplicates, and that the memory taken up by the lookup hash is freed up again (I'm under the impression that's an advantage of the added scoping, anyway).

Again, I was assuming there was no need to check for duplicates inside of each individual array, and that the lookup hash is just a temporarily created throwaway that isn't needed after the merge is finished.
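
For what it's worth, here is a minimal, self-contained sketch of the same idea that can be run as-is. The sub name append_unseen and the sample lines are made up for illustration, and in-memory filehandles stand in for the two real files. It swaps the bare block for a sub (so %seen goes out of scope when the sub returns) and fills the hash with a hash slice instead of map; both are just one possible way to write it, not necessarily the fastest.

```perl
use strict;
use warnings;

# Hypothetical helper: append to @$rows every line read from $fh that is
# not already present in @$rows. The lookup hash is a throwaway that only
# exists while the sub is running.
sub append_unseen {
    my ($rows, $fh) = @_;

    my %seen;
    @seen{@$rows} = ();    # hash slice: creates the keys, stores no values

    while (my $line = <$fh>) {
        # As in the original post, %seen is never updated here, so
        # duplicates inside the second file itself are NOT caught.
        push @$rows, $line unless exists $seen{$line};
    }
    return $rows;
}

# Made-up sample data; in-memory filehandles stand in for the real files.
my $rows_text = "alpha\nbeta\ngamma\n";
my $data_text = "beta\ndelta\nalpha\nepsilon\n";

open my $file_for_rows, '<', \$rows_text or die "Cannot open in-memory file: $!";
open my $file_for_data, '<', \$data_text or die "Cannot open in-memory file: $!";

my @rows = <$file_for_rows>;
append_unseen(\@rows, $file_for_data);

print @rows;
```

With the sample data above, @rows ends up holding alpha, beta, gamma, delta and epsilon, in that order. Lines are compared with their trailing newlines intact, so a final line that lacks a newline would not match an otherwise identical line that has one.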

Sorry, I like asking little questions and debating minutiae like this because I don't have nearly as much experience with Perl or coding in general as a lot of the Monks on here (definitely nowhere near as much as you). Asking nit-picky little questions like these helps me learn, and hopefully helps others like me learn too when they read it. It's why I've quickly grown to like this place so much.

I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
