in reply to How can I keep the first occurrence from duplicated strings?

You could just reverse the order of the lines in the file...

use strict;
use warnings;

my @lines = reverse(<DATA>);
my %test;
foreach (@lines) {
    my ($name, $number) = split / /;
    $test{$name} = $number;
}
print %test;

__DATA__
nick 5
nick 10
nick 20
john 78
erik 9
erik 12

This gives the result...

nick5
john78
erik9

Re^2: How can I keep the first occurrence from duplicated strings?
by afoken (Chancellor) on Aug 30, 2023 at 07:58 UTC

    You could just reverse the order of the lines in the file...

    use strict;
    use warnings;
    my @lines = reverse(<DATA>);

    This will read the entire file into RAM. No problem for 100 kBytes, big trouble for big files (larger than free RAM). The solutions from choroba, Grandfather, eyepopslikeamosquito, and the second solution from tybalt89 do not suffer from that problem, because they all read only one line at a time.
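
    For contrast, a minimal one-pass sketch of the line-at-a-time idea (my own illustration, not any particular monk's code; variable names are made up):

        use strict;
        use warnings;

        # Keep only the first number seen for each name, reading one line at a time,
        # and remember the order in which names were first seen.
        my (%first, @order);
        while (my $line = <DATA>) {
            my ($name, $number) = split ' ', $line;
            next if exists $first{$name};
            $first{$name} = $number;
            push @order, $name;
        }
        print "$_ $first{$_}\n" for @order;

        __DATA__
        nick 5
        nick 10
        nick 20
        john 78
        erik 9
        erik 12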

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      See also File::ReadBackwards (however, its documentation doesn't mention file encodings, so it might blow up on UTF-8).
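
      A quick sketch of how that module would slot in here (my own, untested; the file name is made up):

          use strict;
          use warnings;
          use File::ReadBackwards;

          # Read the file last-line-first so that the last assignment wins,
          # i.e. the first occurrence in file order ends up in the hash.
          my $bw = File::ReadBackwards->new('names.txt')
              or die "can't open names.txt: $!";
          my %test;
          while (defined(my $line = $bw->readline)) {
              my ($name, $number) = split ' ', $line;
              $test{$name} = $number;
          }
          print "$_ $test{$_}\n" for keys %test;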

      This will read the entire file into RAM. No problem for 100 kBytes, big trouble for big files

      Yes of course that's true.

      But given the apparent nature of the data, I feel it's safe to assume that the file size will be small relative to available RAM and paging files. If it were more than a few lines then processing it using Perl is almost certainly the wrong approach. For a file large enough to be a problem, Perl should be reading in one line at a time and loading it into a database, at which point getting the first occurrence (the desired result) becomes trivial.

      So for a big huge file, the question needs asking on SQLMonks (wishful thinking...)
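
      For what it's worth, a hypothetical sketch of that load-into-a-database idea using DBI with SQLite (table and column names are mine, and it assumes DBD::SQLite is installed):

          use strict;
          use warnings;
          use DBI;

          # The PRIMARY KEY plus INSERT OR IGNORE means only the first row
          # seen for each name is kept.
          my $dbh = DBI->connect('dbi:SQLite:dbname=first.db', '', '',
                                 { RaiseError => 1, AutoCommit => 0 });
          $dbh->do('CREATE TABLE IF NOT EXISTS first_seen (name TEXT PRIMARY KEY, number INTEGER)');
          my $ins = $dbh->prepare('INSERT OR IGNORE INTO first_seen (name, number) VALUES (?, ?)');

          while (my $line = <STDIN>) {
              my ($name, $number) = split ' ', $line;
              $ins->execute($name, $number);
          }
          $dbh->commit;

          print "@$_\n" for @{ $dbh->selectall_arrayref('SELECT name, number FROM first_seen') };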

        For a file large enough to be a problem, Perl should be reading in one line at a time and loading it into a database

        IME you are seriously underestimating the time it would take to perform those insertions. I would not take this approach but rather would use the all-in-Perl approach as proposed by other respondents. It will be more robust, quicker to develop and faster to run to completion.

        There are of course different tasks where the time penalty of loading into a database will be outweighed by other advantages but a single pass through the data while discarding a majority of rows like this isn't one of them.


        🦛

        For a file large enough to be a problem, Perl should be reading in one line at a time and loading it into a database

        As usual, you'd need to benchmark the specific application to know which is faster.

        In this thread, where some suggested using an external database instead of a gigantic Perl hash, I remember this quote from BrowserUk:

        I've run Perl's hashes up to 30 billion keys/2 terabytes (ram) and they are 1 to 2 orders of magnitude faster, and ~1/3rd the size of storing the same data (64-bit integers) in an sqlite memory-based DB. And the performance difference increases as the size grows. Part of the difference is that however fast the C/C++ DB code is, calling into it from Perl, adds a layer of unavoidable overhead that Perl's built-in hashes do not have.

        In Re: Fastest way to lookup a point in a set, when asked if he tried a database, erix replied: "I did. It was so spectacularly much slower that I didn't bother posting it".

        In Re: Memory efficient way to deal with really large arrays? by Tux, Perl benchmarked way faster than every database tried (SQLite, Pg, mysql, MariaDB).

        With memory relentlessly getting bigger and cheaper (a DDR4 DIMM can hold up to 64 GB, while DDR5 octuples that to 512 GB), doing everything in memory with huge arrays and hashes is becoming more practical over time.
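
        A rough cmpthese sketch of the kind of comparison meant above (my own; the test data is made up, results will vary wildly by machine, and it assumes DBD::SQLite is available):

            use strict;
            use warnings;
            use Benchmark qw(cmpthese);
            use DBI;

            # Made-up test data: 50_000 rows over 1_000 distinct names.
            my @rows = map { [ 'name' . ($_ % 1000), $_ ] } 1 .. 50_000;

            cmpthese(5, {
                perl_hash => sub {
                    # //= keeps only the first value seen per name
                    my %first;
                    $first{ $_->[0] } //= $_->[1] for @rows;
                },
                sqlite => sub {
                    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                                           { RaiseError => 1, AutoCommit => 0 });
                    $dbh->do('CREATE TABLE t (name TEXT PRIMARY KEY, number INTEGER)');
                    my $ins = $dbh->prepare('INSERT OR IGNORE INTO t VALUES (?, ?)');
                    $ins->execute(@$_) for @rows;
                    $dbh->commit;
                },
            });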

        "If it were more than a few lines then processing it using Perl is almost certainly the wrong approach."

        Why?

        "For a file large enough to be a problem, Perl should be reading in one line at a time and loading it into a database when the desired result of getting the first occurrence becomes trivial."

        False. For this sort of task, Perl is capable of reading a line at a time and generating the required output without first inserting each line into a database and then querying it.