Re^2: Matching and combining two text files

by GrandFather (Saint)
on Jan 23, 2012 at 20:27 UTC (#949519)

in reply to Re: Matching and combining two text files
in thread Matching and combining two text files

Bad koolgirl. I gave you nice clean code and you nastised it. Here are a few guidelines to help make it nice again:

  1. define variables where they are first required
  2. use the three argument version of open
  3. use lexical file handles (ones declared with my)
  4. avoid nesting if
  5. don't slurp files and such (a for loop over a file handle implicitly slurps the whole file into a list before it starts)
  6. show the name for failed opens along with the OS's error message
  7. don't comment block ends. If you avoid nesting, keep blocks short and indent nicely there is no need and it removes clutter

and the niceised code:

    use strict;
    use warnings;

    my $dir      = 'F:\project_files';
    my $fInName  = 'carson_county_abstract.txt';
    my $fOutName = 'cars_abstract.txt';
    my %docs;

    opendir my $dirScan, $dir or die "Failed to open $dir: $!\n";

    # Map every document number found in a docs* file to that file's parcel
    while (defined(my $entry = readdir $dirScan)) {
        next if $entry !~ /docs(.*)/;

        my $currParcel = $1;
        my $filePath   = "$dir\\$entry";

        open my $inFile, '<', $filePath or die "Can't open $filePath: $!\n";

        while (defined(my $line = <$inFile>)) {
            chomp $line;
            $docs{$line} = $currParcel;
        }

        close $inFile;
    }

    open my $fIn,  '<', $fInName  or die "Failed to open $fInName: $!";
    open my $fOut, '>', $fOutName or die "Failed to create $fOutName: $!";

    # Emit one output line per "Document #:" line in the abstract
    while (defined(my $line = <$fIn>)) {
        next if $line !~ /Document #: ([0-9]*)(.*)/;

        print $docs{$1} . "\n";
        print $fOut "parcel# $docs{$1} doc num $1 $2\n";
    }

    close $fOut or die "Error closing $fOutName: $!\n";

To fix your duplication problem you might want to use another hash to check for duplicate document/parcel pairs.
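
A minimal sketch of that idea, not tested against your data: the sample pairs and the sub name unique_pairs are made up for illustration, and it assumes neither field contains a tab. The %seen hash is keyed on the document/parcel pair, and anything already seen gets skipped.

```perl
use strict;
use warnings;

# Hypothetical sample data: [document number, parcel] pairs with one repeat.
my @pairs = (
    ['1001', 'A12'],
    ['1002', 'B07'],
    ['1001', 'A12'],    # exact repeat of the first pair
    ['1001', 'C33'],    # same document, different parcel: not a duplicate
);

# Return only the first occurrence of each document/parcel pair.
sub unique_pairs {
    my @in = @_;
    my %seen;
    my @out;

    for my $pair (@in) {
        my ($doc, $parcel) = @$pair;

        # Post-increment returns the old count: 0 (false) the first time only.
        next if $seen{"$doc\t$parcel"}++;
        push @out, $pair;
    }

    return @out;
}

for my $pair (unique_pairs(@pairs)) {
    print "parcel# $pair->[1] doc num $pair->[0]\n";
}
```

In the script above the same test would sit just before the print to $fOut, keyed on $1 and $docs{$1}.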

Ok, and if you don't understand stuff find a friendly web site dedicated to the area and ask for clarification ;).

True laziness is hard work

Re^3: Matching and combining two text files
by koolgirl (Hermit) on Jan 23, 2012 at 22:53 UTC

    hahahaha, <--- that's real laughter, not nervous, I look like a slump and need to play it off laughter. ;-) OK, so, I hardly ever use any of the types of syntax you do. I'm finding this a lot. I guess it's because I learned from the first edition books, and never have learned much about new ways, for instance,

    open(IN, $file) || die $!

    is the only way I know how to open a file, and have never bothered using a different method, just as I have never used

    next if

    and I've always been taught that nice clean code, always has all variables declared before hand, never in the middle of the program. So, basically, my "nastiness" of the code you gave me :p, was to figure out what you were doing, where I didn't understand. I have sort of been thrown into the big leagues whether I belong there or not, so I guess I better pick up a bat...

    So, if I may, please help me understand why you're using the next if, instead of regular if's nested as I did, why nesting if statements is a bad idea, and what do you mean about slurping up files? I know I'm probably going to lose a million XP and get slammed for these questions, but oh well, I tried to avoid it and GrandFather threw me out in the open, so I guess I might as well ask now. ;-)

      Asking about things you don't understand is smart and good. Not asking is the dumbest thing you can do because it wastes everyone's time and you learn nothing and keep repeating the same mistakes.

      I've always been taught that nice clean code, always has all variables declared before hand

      I'd get a new teacher if I were you! Code with all the declarations lumped together may look pretty to some eyes, but you gain no advantage whatsoever from declaring variables like that. Always declare variables in the smallest sensible scope and initialise them at the same time. That helps avoid a whole slew of bugs including reusing a variable name and ending up with subtle heisenbugs as a result.
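
      A small sketch of the difference, using made-up sample lines rather than your real file. The first loop uses a variable declared up front, which is still hanging around holding the last value after the loop ends; the second declares it where it is needed, so it can't leak.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the abstract file.
my @lines = ("Document #: 100 deed\n", "Document #: 200 lien\n");

# Declared up front: $number outlives the loop and keeps a stale value
# that later code can pick up by accident.
my $number;
for my $line (@lines) {
    ($number) = $line =~ /Document #: ([0-9]+)/;
}
print "after the loop \$number is still $number\n";

# Declared where first needed: each pass gets a fresh $num, and strict
# rejects any use of $num outside the loop at compile time.
for my $line (@lines) {
    my ($num) = $line =~ /Document #: ([0-9]+)/;
    print "found $num\n";
}
```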

      First, to answer a couple of questions you didn't ask. Use the three parameter form of open (open handle, mode, target) because providing an explicit mode ('>' or '<' for example) is both clearer and safer. Use lexical file handles because they are clearer and safer - safer because their scope is limited to the current block and using strict you are more likely to catch typos.
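
      Roughly like this. The scratch file from File::Temp is only there so the sketch runs anywhere; the file name is an assumption, not anything from your project.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file in a directory that is removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/example.txt";

# The explicit '>' mode means "create or truncate" no matter what
# characters $name happens to contain.
open my $out, '>', $name or die "Failed to create $name: $!\n";
print $out "first line\nsecond line\n";
close $out or die "Error closing $name: $!\n";

# $in is a lexical: strict catches a misspelled handle name, and the
# handle is closed automatically if it goes out of scope.
my $count = 0;
open my $in, '<', $name or die "Failed to open $name: $!\n";
while (defined(my $line = <$in>)) {
    ++$count;
}
close $in;

print "read $count lines\n";
```

      With the old two argument form the mode is parsed out of the file name string itself, so a name that starts with '>' would be opened for writing.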

      The point about avoiding nesting is that the deeper the nesting goes the harder it is to figure out what the code does. If you have a simple test and can bail (as in the next if line) you don't have to worry about that case any more, it's all done and dusted.
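
      Here are both shapes side by side as a sketch. The sample lines, the %docs contents and the sub names nested and guarded are invented for the example. Both subs return the same list; the second just never gets more than one level deep.

```perl
use strict;
use warnings;

# Hypothetical sample input and document => parcel lookup.
my @lines = (
    "header junk\n",
    "Document #: 100 deed\n",
    "Document #: 300 lien\n",
);
my %docs = (100 => 'A12');

# Nested ifs: the interesting line ends up two levels deep.
sub nested {
    my @found;
    for my $line (@_) {
        if ($line =~ /Document #: ([0-9]+)/) {
            if (exists $docs{$1}) {
                push @found, "parcel# $docs{$1} doc num $1";
            }
        }
    }
    return @found;
}

# Guard clauses: bail out early, then the main work sits at one level.
sub guarded {
    my @found;
    for my $line (@_) {
        next if $line !~ /Document #: ([0-9]+)/;
        next if !exists $docs{$1};
        push @found, "parcel# $docs{$1} doc num $1";
    }
    return @found;
}

print "$_\n" for guarded(@lines);
```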

      Slurping is where you read everything into an array. If that is followed by looping over the array using a for loop then very likely you can remove the array and use a while loop instead. That has two advantages: 1) you can see from the code that you loop while there is stuff in the file (which gets sorta obscured by the array), and 2) you only read one line of the file into memory at a time. Most often the second point isn't all that important, but if the file is huge it can be a killer. Neither reason is absolutely compelling, but slurping seldom has an advantage over iteration using a while loop, so you might as well go with clearer and use the while loop (iteration) form.
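
      To make that concrete, here is a sketch of the two forms. It reads from an in-memory string (a stand-in for a real file, so it runs without any files on disk); the variable names are just for illustration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";    # stands in for a file's contents

# Slurp: every line is read into @all before the loop starts.
open my $slurpIn, '<', \$text or die "Can't open in-memory file: $!\n";
my @all = <$slurpIn>;
close $slurpIn;

my $slurpCount = 0;
for my $line (@all) {
    ++$slurpCount;
}

# Iterate: only the current line is held in memory at any time.
open my $iterIn, '<', \$text or die "Can't open in-memory file: $!\n";
my $iterCount = 0;
while (defined(my $line = <$iterIn>)) {
    chomp $line;
    ++$iterCount;
}
close $iterIn;

print "slurped $slurpCount lines, iterated over $iterCount lines\n";
```

      Note that for my $line (<$fh>) slurps too, because the for loop builds its whole list before the first pass.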

      True laziness is hard work
