The benefit of using a database, whether an RDBMS or a DBM, is that the question "which files use this word?" can be answered in a relatively small number of disk reads compared with scanning a large flat file.
With a flat file, you have to read until you find the word (or determine it isn't there); on average you'll read half the file, and in the worst case all of it. Assuming a 400KB file read in 8KB chunks (by the underlying OS buffering), that's 25 reads on average, 50 in the worst case.
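For concreteness, here's a minimal sketch of that linear scan; the one-record-per-line "word: file-numbers" format and the file name are assumptions for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Scan a flat index file line by line until the word turns up.
    sub lookup_word {
        my ($index_file, $word) = @_;
        open my $fh, '<', $index_file or die "Can't open $index_file: $!";
        while (my $line = <$fh>) {           # stdio reads the file in ~8K chunks
            chomp $line;
            my ($key, $files) = split /:\s*/, $line, 2;
            return $files if $key eq $word;  # found: on average, halfway through
        }
        return;                              # not found: we read the whole file
    }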
With a database, you'll incur a small number of disk reads to traverse the index, and then a single read to pick up the rest of the record (the file numbers, in your example).
If the search expression contains a number of words, at some point it might be faster to scan a flat file (assuming you're going to search for all terms in a single pass). You may need to determine the break-even point by experimenting.
Another issue to consider in choosing between a flat-file and a database-based implementation is ease of update, particularly if you're going to attempt incremental updates. It's faster to make a single pass over a flat file (writing an updated file to replace it) than it is to scan the database record by record.
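A single-pass update might look like the following sketch, which merges new postings into the flat index while copying it, then swaps the new file in. The "word: file-numbers" format and all the names are, again, assumptions:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %new_postings = (perl => '7 9', monk => '9');   # word => new file numbers

    open my $in,  '<', 'index.txt'     or die "Can't read index.txt: $!";
    open my $out, '>', 'index.txt.new' or die "Can't write index.txt.new: $!";

    while (my $line = <$in>) {
        chomp $line;
        my ($word, $files) = split /:\s*/, $line, 2;
        if (defined(my $extra = delete $new_postings{$word})) {
            $files .= " $extra";                       # append the new postings
        }
        print $out "$word: $files\n";
    }
    # Words that weren't in the old index go at the end.
    print $out "$_: $new_postings{$_}\n" for sort keys %new_postings;

    close $in;
    close $out;
    rename 'index.txt.new', 'index.txt' or die "Can't rename: $!";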
With that as background: let's consider your questions.
Q: Is there something inherently better about using a database?
A: "It depends", but the answer is probably YES.
Q: What's the DBM format, and how does it compare to SQL?
A: Both separate indexes from record data, allowing a "does this key exist" search to be done with a small number of disk accesses.
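For example, here's a minimal DBM lookup using DB_File (one of several DBM modules; GDBM_File and SDBM_File are similar). The index is assumed to map each word to a space-separated list of file numbers:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl;      # for O_RDONLY
    use DB_File;

    tie my %index, 'DB_File', 'words.db', O_RDONLY, 0644, $DB_BTREE
        or die "Can't open words.db: $!";

    my $word = 'perl';
    if (exists $index{$word}) {                    # a few reads walk the B-tree
        print "found in files: $index{$word}\n";   # one more picks up the record
    } else {
        print "'$word' is not indexed\n";
    }
    untie %index;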
Q: Is DBM common?
A: Yes, but some (older, poor) implementations limit the record size to 1K. Check this on your system.
Q: Where can I go to learn about Perl talking to SQL?
A: There are a number of good books and on-line articles that talk about using Perl's DBI interface (DBI.pm) to talk to databases. Take a look at O'Reilly's new ONLAMP site, or search for "Perl DBD" on Google (http://www.google.com/).
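To give a flavor of DBI, here's a minimal sketch; the DSN, table, and column names are assumptions, so substitute whatever driver and schema you end up with (DBD::mysql, DBD::Pg, DBD::SQLite, ...):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect, run a parameterized query, fetch the matching file numbers.
    my $dbh = DBI->connect('dbi:SQLite:dbname=index.db', '', '',
                           { RaiseError => 1 })
        or die $DBI::errstr;

    my $sth = $dbh->prepare('SELECT file_no FROM word_index WHERE word = ?');
    $sth->execute('perl');
    while (my ($file_no) = $sth->fetchrow_array) {
        print "word appears in file $file_no\n";
    }
    $dbh->disconnect;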
Q: Should I be opening my 400KB file for every search term?
A: No. Write some extra code so that you can do the search in one pass.
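One way to do that: collect all the terms up front, then cross each one off as it turns up in a single pass (the record format is assumed, as before):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @terms   = @ARGV;
    my %pending = map { $_ => 1 } @terms;
    my %results;

    open my $fh, '<', 'index.txt' or die "Can't open index.txt: $!";
    while (my $line = <$fh>) {
        last unless %pending;              # stop early once every term is found
        chomp $line;
        my ($word, $files) = split /:\s*/, $line, 2;
        next unless defined $word;
        $results{$word} = [ split ' ', $files ]
            if delete $pending{$word};
    }
    close $fh;

    for my $term (@terms) {
        print $results{$term} ? "$term: @{ $results{$term} }\n"
                              : "$term: not found\n";
    }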
Q: What about RAM?
A: If you're scanning the file line by line, RAM shouldn't be an issue for you. Perl buffers a disk page at a time and extracts lines from it; each line is read into $_, which is overwritten every time you read the next line.
Q: What about my data structures?
A: At first blush, they seem O.K. Having URLs and document titles available as arrays isn't going to significantly increase your script's startup time, and having your indexing process emit a .pl file you can require is an elegant way to load those arrays.
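For illustration, the emitting step might look like this sketch (the file and array names are assumptions); Data::Dumper handles the quoting for you:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @urls   = ('http://example.com/a.html', 'http://example.com/b.html');
    my @titles = ('Page A', 'Page B');

    open my $out, '>', 'titles.pl' or die "Can't write titles.pl: $!";
    print $out Data::Dumper->Dump([\@urls, \@titles], ['*urls', '*titles']);
    print $out "1;\n";                 # a require'd file must return true
    close $out;

    # The search script then loads both arrays at startup with:
    #     require './titles.pl';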
Let's pretend you also asked...
Q: Is there anything else I can do to save space or time?
A: Consider using a set of "stop words" -- words that are so common that they're liable to occur in nearly every document (e.g., "a an are in it is the this") and don't index those words. In the unlikely event that someone searches only on one of those terms, you can either verbally abuse them or do a linear search through your documents. You can probably cut your index size in half.
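A sketch of that filtering step (the stop list is a small sample, and index_word() is a hypothetical stand-in for however you add a word to your index):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %stop = map { $_ => 1 }
               qw(a an and are in is it of on the this to);

    while (my $line = <STDIN>) {
        for my $word (split ' ', lc $line) {
            $word =~ s/[^a-z0-9]//g;       # crude normalization, for illustration
            next if $word eq '' or exists $stop{$word};
            index_word($word);
        }
    }

    sub index_word { my ($w) = @_; print "indexing: $w\n" }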
P.S. Good first post. Welcome to the Monastery.