
^P (for example) is not the character ^ followed by the character P. It's the single character Ctrl-P (chr 16). The OP just posted the output of a program that represents chr 16 as ^P. When the strings were stored, each was prefixed with its length, and 16 is the length of bishop@yahho.com.
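To make that concrete, here is a minimal Perl sketch, assuming a single record laid out as one length byte (chr 16, shown as ^P) followed by the address. The `C/a` template reads an unsigned byte and then uses it as the count for the `a` (string) field.

```perl
use strict;
use warnings;

# One record as it would be stored: the length byte chr(16),
# displayed by the OP's program as ^P, followed by 16 bytes of data.
my $record = chr(16) . 'bishop@yahho.com';

# "C/a": read an unsigned char, then that many bytes as a string.
my ($email) = unpack 'C/a', $record;

print length($record), "\n";   # 17: one length byte plus 16 characters
print "$email\n";              # bishop@yahho.com
```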

The length might not always be a single byte. In some libraries (such as the C++ STL, I think), the size of the length field varies to accommodate strings longer than 255 characters.
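Perl's own `pack` offers one example of a variable-width length field: the `w` template writes a BER compressed integer, which takes one byte for values below 128 and more bytes for larger ones. This is only an illustration of the idea; nothing in the thread says the OP's data uses BER, and whatever the C++ side writes may differ.

```perl
use strict;
use warnings;

my $short = 'bishop@yahho.com';    # 16 characters
my $long  = 'x' x 300;             # too long for a one-byte length

# "w/a*" prefixes each string with its length as a BER integer.
my $packed_short = pack 'w/a*', $short;
my $packed_long  = pack 'w/a*', $long;

# Size of the length prefix alone = total size minus the payload.
my $prefix_short = length($packed_short) - length($short);
my $prefix_long  = length($packed_long)  - length($long);

# Unpacking with the same length template recovers the strings.
my ($back_short) = unpack 'w/a', $packed_short;
my ($back_long)  = unpack 'w/a', $packed_long;

print "$prefix_short $prefix_long\n";   # 1 2
```

The reader has to know the length encoding in advance: `C/a` on the long record would read only the first prefix byte as the count and go wrong from there.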

Also, email addresses are not as simple as you assume. They may contain multiple "@", for starters.

Re^3: Cout & parsing
by fishbot_v2 (Chaplain) on Dec 06, 2005 at 00:25 UTC

    It looks like a job for unpack. Inspect the format in detail: if every length prefix is a single byte, and the records contain only email addresses and numbers (as in your example), then unpacking the binary format makes more sense than casting it to a string and regexing your way through.

```perl
my $str = <<'EOT';
^Pbishop@yahho.com^H17769025^D3352
^Vblueangel@accessmo.com^H17769714^D3352
^Oboe@stooges.com^H17773126^D3352
^Mbirk@joke.com^H17773968^D3352
^Rbobfitz@mcione.com^H17768877^D3352
^Nbob@yohaoo.com^H17769806^D3352
^R
EOT

# stitch and recast control characters
# (use the original binary format in reality)
$str =~ s/\s+//g;
$str =~ s/\^([A-Z])/chr( ord($1) & 0xBF )/ge;

# each field: one unsigned length byte, then that many bytes
my @emails = grep m/@/, unpack "(C/a)*", $str;
print "$_\n" for @emails;
```

    If the data really is formatted the way it appears to be, this avoids all sorts of unpleasantness once addresses reach 32 characters or more and their length bytes are no longer in the control-character range.
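A quick round trip under the same assumption (one unsigned length byte per field) shows the point: a 52-character address gets the length byte chr(52), which is the digit "4" rather than a control character, yet unpack still splits the buffer correctly. The sketch uses `C` (unsigned) for the length byte; a signed `c` would turn lengths of 128 and up negative. The field values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: the first address is 40 + 12 = 52 characters long.
my $long_email = ('x' x 40) . '@example.com';
my @fields = ( $long_email, '17769025', '3352',
               'bob@yohaoo.com', '17769806', '3352' );

# Build the binary buffer: each field preceded by an unsigned length byte.
my $buf = join '', map { pack 'C/a*', $_ } @fields;

# Split it back apart and keep only the address fields.
my @emails = grep { /@/ } unpack '(C/a)*', $buf;

print "$_\n" for @emails;
```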

      Nice, I didn't know about .../a. Unfortunately, the (...) group syntax requires Perl 5.8.0+.
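On perls older than 5.8.0, one way around the missing group syntax is to peel off one field at a time with `C/a a*`, which relies only on the `/` construct (which predates groups) and not on `(...)`. A minimal sketch, assuming one unsigned length byte per field; `split_counted` is a hypothetical helper name, and whether it behaves identically on a 5.6 perl is an assumption worth checking.

```perl
use strict;
use warnings;

# Hypothetical helper: split a buffer of length-prefixed fields
# without the (...) group syntax that needs Perl 5.8.0.
sub split_counted {
    my ($buf) = @_;
    my @fields;
    while (length $buf) {
        # Take one counted field; "a*" returns whatever is left over.
        my ($field, $rest) = unpack 'C/a a*', $buf;
        push @fields, $field;
        $buf = $rest;
    }
    return @fields;
}

# Same layout as the thread's first record: ^P, ^H and ^D length bytes.
my $buf    = "\x10bishop\@yahho.com\x0817769025\x043352";
my @fields = split_counted($buf);
print "$_\n" for @fields;
```

Each pass consumes at least the length byte, so the loop always terminates, even on a truncated trailing record.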