in reply to Re^2: Cout & parsing
It looks like a job for unpack. Inspect the format in detail: if every length prefix is a single byte, and the fields are only email addresses and numbers (as they are in your example), then unpacking the binary format directly makes more sense than casting it to a string and regexing your way through.
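```perl
use strict;
use warnings;

my $str = <<'EOT';
^Pbishop@yahho.com^H17769025^D3352^Vblueangel@acc
essmo.com^H17769714^D3352^Oboe@stooges.com^H17773
126^D3352^Mbirk@joke.com^H17773968^D3352^Rbobfitz
@mcione.com^H17768877^D3352^Nbob@yohaoo.com^H1776
9806^D3352^R
EOT

# Stitch the wrapped lines together and recast the caret notation
# (^P, ^H, ...) as real control characters: clearing bit 0x40 maps
# 'P' (0x50) to 0x10, i.e. a length byte of 16.
# In reality, run the unpack on the original binary data instead.
$str =~ s/\s+//g;
$str =~ s/\^([A-Z])/chr( ord($1) & 0xBF )/ge;

# Each field is a one-byte length followed by that many bytes.
# 'C' (unsigned) rather than 'c' so lengths 128-255 also decode.
my @emails = grep m/@/, unpack "(C/a)*", $str;
print "$_\n" for @emails;
```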
If the data really is formatted the way it appears to be, this avoids all sorts of unpleasantness once an email is 32 characters or longer and its length byte is no longer in the control-character range (0x00 to 0x1F).
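To see that in action, here is a small round-trip sketch. It assumes the layout really is a one-byte length before every field and that fields come in (email, id, code) triples as in your sample; the records themselves are made up for illustration. `pack` builds the assumed binary format and `unpack` with the same template takes it apart again.

```perl
use strict;
use warnings;

# Hypothetical records: (email, id, code). The second address is 45
# characters long, so its length byte is not a control character.
my @records = (
    [ 'bishop@yahho.com', '17769025', '3352' ],
    [ 'a.rather.long.mailbox.name@example-domain.com', '17769714', '3352' ],
);

# Build the assumed binary layout: each field is a 1-byte unsigned
# length followed by that many bytes.
my $bin = pack '(C/a)*', map @$_, @records;

# Decode every length-prefixed field, in order.
my @fields = unpack '(C/a)*', $bin;

# Regroup the flat field list into (email, id, code) triples.
my @decoded;
push @decoded, [ splice @fields, 0, 3 ] while @fields >= 3;

print join("\t", @$_), "\n" for @decoded;
```

Because the length is read as a number rather than matched as a control character, the long address decodes exactly like the short one; anything up to 255 bytes fits in a single `C` length byte.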
Re^4: Cout & parsing
by ikegami (Patriarch) on Dec 06, 2005 at 00:58 UTC