Re: Parsing/manipulating CSV files

There are plenty of options. Maybe too many!

Regular expressions are one. For example - assuming your line has address, state & postcode:

my ($addr,$state,$pc) = $line =~ /([^,]+),([^,]+),([^,]+)/;
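To see what that match actually captures, here is a minimal sketch run against a single made-up record. The sample line is hypothetical, not taken from your data. Bear in mind that a pattern like this splits on every comma, so it will misread a quoted field that contains a comma.

```perl
use strict;
use warnings;

# Hypothetical sample record: address, state, postcode.
my $line = "12 High Street,NSW,2000\n";
chomp $line;    # drop the newline so it doesn't end up in the last field

# In list context the match returns the three captured groups.
my ($addr, $state, $pc) = $line =~ /([^,]+),([^,]+),([^,]+)/;

print "addr=$addr state=$state pc=$pc\n";
```

If the line doesn't match, the list assignment leaves all three variables undefined, which is what the `defined $pc or next` check in the loop relies on.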

Stick that in a loop as you iterate over your file:

open my $fh, "<", "path-to-file" or die "Cannot open file: $!";
my @pcodes;
while (my $line = <$fh>) {
    chomp $line;    ## Otherwise the newline ends up in $pc
    my ($addr,$state,$pc) = $line =~ /([^,]+),([^,]+),([^,]+)/;

    ## Not a valid data line? Then move on.
    defined $pc or next;

    push @pcodes, $pc;
}
## Do stuff with @pcodes

close $fh;

Now you can do what you like with the postcodes. Note that this doesn't read the whole file in one go; it reads one line at a time.
The @pcodes array still grows with every record, though, so if memory is a problem you could pause every 10k records or so and print what you have collected out to a file.
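A rough sketch of that "pause and print" idea, assuming the same three-field layout. The sub name `copy_postcodes`, the batch size and the sample records are all made up for illustration, and the demo uses in-memory filehandles so it runs without any files on disk.

```perl
use strict;
use warnings;

# Read records from $in and write postcodes to $out, emptying the
# buffer every $batch_size records so memory use stays bounded.
sub copy_postcodes {
    my ($in, $out, $batch_size) = @_;
    my @pcodes;
    my $total = 0;
    while (my $line = <$in>) {
        chomp $line;
        my ($addr, $state, $pc) = $line =~ /([^,]+),([^,]+),([^,]+)/;
        defined $pc or next;    # not a valid data line
        push @pcodes, $pc;
        $total++;
        if (@pcodes >= $batch_size) {
            print {$out} "$_\n" for @pcodes;
            @pcodes = ();       # start a fresh batch
        }
    }
    print {$out} "$_\n" for @pcodes;    # write whatever is left over
    return $total;
}

# Demo with in-memory filehandles and hypothetical data;
# real code would open the input and output files instead.
my $input  = "1 First St,AA,1111\nnot a record\n2 Second St,BB,2222\n3 Third St,CC,3333\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
my $count = copy_postcodes($in, $out, 2);
close $in;
close $out;
print "Wrote $count postcodes:\n$output";
```

With a real file you would call it with something like a batch size of 10_000; the small value here is only so the flush happens during the demo.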

Is that any use to you?

Re^2: Parsing/manipulating CSV files by Ansi (Initiate) on Oct 21, 2011 at 18:07 UTC
Very much so; thank you. Of course you're the other side of the pond, so any regexes aren't likely to work for me, but it points me in the right direction. As you point out, there are likely many ways of doing the task and I don't want to waste too much time exploring them. I would rather know a way that will work and learn what I need to of a relatively small area of the language. I am going to have to iterate over the fields in each record to find the postcode (it's not in a regular place), using a comparison to a known format, which is what I did last time. I'll get there!
Re^3: Parsing/manipulating CSV files by mrstlee (Beadle) on Oct 22, 2011 at 12:32 UTC
I'm not clear what you mean about the regexes. They tend to be portable across locales (well, within reason ...). For example:

`\d` - matches a digit, 0 .. 9
`(\d+)` - means match at least 1 digit, could be more, and capture it
`[^ ]` - means match any character that isn't a space
`\S` - another way to match something that isn't a space (any non-whitespace character)

These and others like them are universal. It is a vast area and easy to get lost in, but absolutely invaluable for data parsing. Baby steps and you will indeed get there!

OK, so the postcode isn't in a predictable place. I assume it is in a predictable format that won't match anything else in a record? E.g. 2 upper case chars, followed by 3 digits and 2 further upper case chars. Then your regex:

`$line =~ /,?([A-Z]{2}\d{3}[A-Z]{2}),/;`

The ',?' bit means maybe match a comma ahead of the postcode. This is to catch the case that the pc is at the start of the record. $1 will contain your postcode. Note that the pattern still expects a comma after the postcode, so a pc in the last field of a record would need `(?:,|$)` in place of that final comma.

If you pass me the format I'd be happy to provide a suitable regex.
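Since the plan is to walk the fields and compare each one to a known format, here is one way that could look. It is only a sketch: the pattern uses the made-up format from the example (2 upper case letters, 3 digits, 2 upper case letters), and the sub name `find_postcode` and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Assumed format, for illustration only: 2 upper-case letters,
# 3 digits, 2 upper-case letters. Swap in the real pattern once known.
my $PC_RE = qr/^[A-Z]{2}\d{3}[A-Z]{2}$/;

# Return the first field that looks like a postcode, or undef if none does.
sub find_postcode {
    my ($line) = @_;
    chomp $line;
    for my $field (split /,/, $line) {
        $field =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        return $field if $field =~ $PC_RE;
    }
    return;
}

# Hypothetical records with the postcode in different positions.
my $first  = find_postcode("AB123CD,1 First St,Sometown\n");
my $second = find_postcode("2 Second St,Othertown,XY987ZZ\n");
my $third  = find_postcode("no postcode here\n");

print "first=$first second=$second\n";
print defined($third) ? "third=$third\n" : "third record has no postcode\n";
```

Anchoring the pattern with ^ and $ and testing one field at a time means the position in the record no longer matters, including the first and last fields.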