andyw has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I've got about 1,000 files with various text in them, including name:value pairs split by a colon. How do I skip over every line that doesn't have a : in it?
i.e. I just want the name:value pairs
Example...

This is a text file full of stuff
name:andy
phone:123-456-7896
address: 1 street ave
city:my town
This is the end of the file


thank you thank you

Re: Pulling out pairs
by dws (Chancellor) on Dec 13, 2001 at 03:55 UTC
    How do I skip over everything that doesn't have : between it.

    Assuming you're passing filenames on the command line, try

    while ( <> ) { next if ! /:/; ... }
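A minimal sketch of the same colon test, applied to the sample lines from the question instead of files named on the command line. The helper name lines_with_colon is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that contain a colon.
sub lines_with_colon {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        next if $line !~ /:/;    # same test as "next if ! /:/"
        push @kept, $line;
    }
    return @kept;
}

# Sample data copied from the question.
my @sample = (
    'This is a text file full of stuff',
    'name:andy',
    'phone:123-456-7896',
    'address: 1 street ave',
    'city:my town',
    'This is the end of the file',
);

# Prints only the four name:value lines.
print "$_\n" for lines_with_colon(@sample);
```

Note that this test keeps any line with a colon anywhere in it, so ordinary prose that happens to contain a colon would also pass.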
Re: Pulling out pairs
by chip (Curate) on Dec 13, 2001 at 04:09 UTC
    while (<>) { chomp; (my ($key, $val) = split /:/) == 2 or next; }

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      That's rather clever. There is a subtle issue with extrapolating that particular idiom, though. Suppose we wanted to match lines of exactly four fields, but were only interested in the first two fields of such lines:
      dog:cat:pig      <== skip, only has three fields
      Amy:Ann          <== skip, only has two fields
      ape:ant:bug:car  <== match this line (it has four fields) but we only need to keep 'ape' and 'ant'

      Tweaking the code above, we might think that this will do the trick for us:

      while (<DATA>) {
          chomp;
          (my ($first, $second) = split /:/) == 4 or next;
          print "$first $second\n";
      }
      __DATA__
      dog:cat:pig
      Amy:Ann
      ape:ant:bug:car

      However, that doesn't work: nothing gets printed. In fact, try as we might, we can't construct any line at all that passes the IDIOM==4 test.

      So, does anyone want to guess why IDIOM==4 is so different from IDIOM==2?

      Update: As an added hint, testing for IDIOM==3 will match the 'dog:cat:pig' line....

      -Blake

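        The problem is that the optimiser will see:
        ($a, $b) = split /:/
        and know that things can go faster by turning that into:
        ($a, $b) = split /:/, $_, 3
        The implicit limit is one more than the number of variables being assigned, so the count can never exceed 3. To test for four fields, you should actually write:
        (my ($a, $b) = split /:/, $_, 5) == 4 or next;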
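The effect of the implicit limit can be checked directly. The following is a small sketch, assuming the behaviour described in perldoc -f split for list assignment. The helper names count_implicit and count_explicit are hypothetical.

```perl
use strict;
use warnings;

# Field count seen when split is assigned to two variables with no LIMIT.
# Perl supplies an implicit LIMIT of 3 here (number of variables + 1).
sub count_implicit {
    my ($line) = @_;
    my $count = (my ($first, $second) = split /:/, $line);
    return $count;
}

# Same assignment, but with an explicit LIMIT passed by the caller.
sub count_explicit {
    my ($line, $limit) = @_;
    my $count = (my ($first, $second) = split /:/, $line, $limit);
    return $count;
}

for my $line ('dog:cat:pig', 'Amy:Ann', 'ape:ant:bug:car') {
    printf "%-16s implicit=%d explicit=%d\n",
        $line, count_implicit($line), count_explicit($line, 5);
}
# The four-field line reports 3 with the implicit limit and 4 with LIMIT 5.
```

A limit of 5 rather than 4 is what makes the ==4 test exact: with a limit of 4, a line with five or more fields would also come back as four fields, the last one holding the unsplit remainder.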
Re: Pulling out pairs
by runrig (Abbot) on Dec 13, 2001 at 04:02 UTC
    while (<>) { next unless /^([^:]+):(.*)/; my ($name, $value) = ($1, $2); ...
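Unlike the split test, this regex keeps everything after the first colon as the value, so values that themselves contain colons survive. A sketch under stated assumptions: the helper name parse_pair is hypothetical, and stripping leading spaces from the value is an addition not present in the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: return (name, value) for a "name:value" line,
# or an empty list when the line does not start with a non-empty name
# followed by a colon.
sub parse_pair {
    my ($line) = @_;
    return unless $line =~ /^([^:]+):(.*)/;
    my ($name, $value) = ($1, $2);
    $value =~ s/^\s+//;    # assumption: leading spaces in the value are unwanted
    return ($name, $value);
}

for my $line ('address: 1 street ave', 'time:12:30', 'no pair here') {
    my ($name, $value) = parse_pair($line) or next;
    print "$name => $value\n";
}
# Prints "address => 1 street ave" and "time => 12:30".
```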
Re: Pulling out pairs
by how do i know if the string is regular expression (Initiate) on Dec 13, 2001 at 04:57 UTC
    @pairs = grep(/^([^:]+):(.*)/, <>);
    Or if you want to only find the lines that have only one colon...
    @pairs = grep(/^([^:]+):([^:]+)$/, <>);
    /me loves grep

    - FrankG
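The difference between the two grep patterns shows up on a value that contains a colon. A minimal sketch on in-memory lines. The sub names pairs_any and pairs_one and the 'time:12:30' sample line are made up for illustration.

```perl
use strict;
use warnings;

# Lines with a non-empty name followed by a colon; the value may hold more colons.
sub pairs_any {
    return grep { /^([^:]+):(.*)/ } @_;
}

# Lines with exactly one colon and non-empty text on both sides of it.
sub pairs_one {
    return grep { /^([^:]+):([^:]+)$/ } @_;
}

my @lines = (
    'This is a text file full of stuff',
    'name:andy',
    'time:12:30',    # made-up line whose value contains a colon
    'city:my town',
);

print "any colon: $_\n" for pairs_any(@lines);    # name, time and city lines
print "one colon: $_\n" for pairs_one(@lines);    # name and city lines only
```

One thing to keep in mind with the original form: <> in list context reads every line of every input file into memory before grep sees any of them, which may matter across 1,000 files.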