The filesystem caches pages, not files. So while
perl is reading line by line, it often reads repeatedly
from the same page, and each of those reads is served
from the cache. That works quite efficiently for sequential reads.
The DB is quite easy to use: it has a tied interface (i.e.
you can treat it just like an ordinary hash).
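A minimal sketch of what that looks like, assuming the DB in question is DB_File (the filename `data.db` and the key/value are just illustration):

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Tie a Berkeley DB file to a plain hash; reads and writes
# go through the DB on disk instead of living in memory.
tie my %db, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie data.db: $!";

$db{alpha} = 42;          # stored in the DB file
print $db{alpha}, "\n";   # read back through the tie

untie %db;
```

After the tie, every hash operation (store, fetch, exists, delete, each) is translated to a DB operation, so the rest of your code doesn't need to know a database is involved.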
Indeed, regular line by line. That way you can shrink
the data as it streams past to cut memory usage. For example,
you can collapse double spaces, drop unneeded fields,
or pack numbers as bytes, etc., without ever having to hold
everything in memory at once.
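A sketch of that streaming approach; the input data, the two-field record layout, and the output filename `packed.bin` are my own illustration (here the input is an in-memory filehandle just to keep the example self-contained):

```perl
use strict;
use warnings;

my $sample = "1  100\n2    200\n";
open my $in,  '<', \$sample          or die "input: $!";
open my $out, '>:raw', 'packed.bin'  or die "packed.bin: $!";

# Only one line is ever held in memory at a time.
while (my $line = <$in>) {
    chomp $line;
    $line =~ s/ {2,}/ /g;                 # collapse runs of spaces
    my ($id, $count) = split ' ', $line;  # keep only the fields we need
    print {$out} pack 'NN', $id, $count;  # two 32-bit ints: 8 bytes/line
}

close $in;
close $out;
```

Each text line shrinks from however long it was to a fixed 8 bytes, and the whole file is never in memory, only the current line.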