vishi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I am trying to match all occurrences of a particular pattern in a file and put all the matched strings into an array. I have the code below, but I'm not sure why it's not working.

The file has about 2-3 lines, and the pattern I am looking for looks like this: HOST=<something>-<domain.com> PORT=<num>, HOST=<something1>-<domain.com> PORT=<num1>, etc.

It looks something like this:
HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999 HOST=machine3-garage.xyz.com PORT=5555

I want to put all hosts in one array and the ports in another array. So I open the file, read one line at a time, and push all matches into the arrays. However, this is not working!

open (SOURCETNS, "/home/$User/Work/PROJ/$sourceTnsFileName");
while ($record = <SOURCETNS>) {
    chomp ($record);
    if ($record =~ /[\w]*\-[\w]*/) {
        push (@sourceDBHostsFromTnsEntry, $&);
    }
    if ($record =~ /[\d]*/) {
        push (@sourceDBPortsFromTnsEntry, $&);
    }
}
close(SOURCETNS);

What am I doing wrong? Is this something like greedy matching? The problem is that these patterns all occur on the same line, so how should I match multiple occurrences of a pattern in the same line and push them all into the array?

Thanks!
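A quick way to see what the two patterns in the question actually match is to run them once against a sample line. This is a minimal sketch: the sample string is copied from the question, and capture groups stand in for $& (same result here).

```perl
use strict;
use warnings;

# One sample line from the question (no file needed for this check).
my $record = 'HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999';

# Without /g, a match stops at the FIRST place the pattern succeeds,
# so only one host per line can ever be found.
my ($first_host) = $record =~ /(\w*-\w*)/;

# \d* may match zero digits, so it succeeds immediately at position 0
# with an empty string and never reaches "1234".
my ($first_port) = $record =~ /(\d*)/;

print "host: '$first_host'\n";   # host: 'machine1-basement'
print "port: '$first_port'\n";   # port: ''
```

So the host pattern finds only the first host on each line, and the port pattern "succeeds" with an empty string; neither problem is about greediness.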

Re: Am I doing Greedy Matching?
by jwkrahn (Abbot) on Nov 10, 2011 at 11:34 UTC

    You say the data looks like machine1-basement.xyz.com but the pattern /[\w]*\-[\w]*/ (or more simply /\w*-\w*/) will only match the string machine1-basement, not the whole domain name.

    You probably need something like this:

    open SOURCETNS, '<', "/home/$User/Work/PROJ/$sourceTnsFileName"
        or die "Cannot open '/home/$User/Work/PROJ/$sourceTnsFileName' because: $!";
    while ( my $record = <SOURCETNS> ) {
        push @sourceDBHostsFromTnsEntry, $record =~ /HOST=(\S+-\S+)/g;
        push @sourceDBPortsFromTnsEntry, $record =~ /PORT=(\d+)/g;
    }
    close SOURCETNS;
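The key point in this reply is that m//g in list context returns every captured group from every match on the line. Below is a sketch of the same loop run on the sample data; the in-memory filehandle (Perl 5.8+) and the short array names are assumptions made only so it runs without the real TNS file.

```perl
use strict;
use warnings;

# Stand-in for the TNS file: an in-memory filehandle.
my $text = "HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999 HOST=machine3-garage.xyz.com PORT=5555\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my (@hosts, @ports);
while ( my $record = <$fh> ) {
    # List context: each m//g returns all captures found on this line.
    push @hosts, $record =~ /HOST=(\S+-\S+)/g;
    push @ports, $record =~ /PORT=(\d+)/g;
}
close $fh;

print "@hosts\n";   # machine1-basement.xyz.com machine2-attic.xyz.com machine3-garage.xyz.com
print "@ports\n";   # 1234 9999 5555
```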

      I do not want the domain name. I just need the machine1-basement string in my array, hence the regex. I just discovered "/g" and will try it out and let you know. Thanks!

        OK, so change:

        push @sourceDBHostsFromTnsEntry, $record =~ /HOST=(\S+-\S+)/g;

        To:

        push @sourceDBHostsFromTnsEntry, $record =~ /HOST=(\w+-\w+)/g;
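The change works because \w matches letters, digits and underscore but not '.', so the capture ends at the first dot. A minimal check on the sample line from the question:

```perl
use strict;
use warnings;

my $record = 'HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999 HOST=machine3-garage.xyz.com PORT=5555';

# \w+ stops at '-' and again at '.', so ".xyz.com" is left out.
my @hosts = $record =~ /HOST=(\w+-\w+)/g;

print join(', ', @hosts), "\n";   # machine1-basement, machine2-attic, machine3-garage
```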
Re: Am I doing Greedy Matching?
by cavac (Prior) on Nov 10, 2011 at 11:53 UTC

    As far as I can see, yes, you are greedy-matching.

    Before I start: you don't seem to use strict and warnings (use strict; use warnings;), nor the 3-argument version of open, and you don't check whether open actually worked. Filehandles should also be lexical scalars (e.g. $hello) rather than barewords.

    I'm a bit old-style; I would do it like this:

    my @sourceDBHostsFromTnsEntry;
    my @sourceDBPortsFromTnsEntry;
    open(my $SOURCETNS, "<", "/home/$User/Work/PROJ/$sourceTnsFileName") or die($!);
    while((my $record = <$SOURCETNS>)) {
        chomp ($record);
        my @parts = split /\ /, $record;
        foreach my $part (@parts) {
            if($part =~ /HOST=(.*)/o) {
                push @sourceDBHostsFromTnsEntry, $1;
            }
            if($part =~ /PORT=(.*)/o) {
                push @sourceDBPortsFromTnsEntry, $1;
            }
        }
    }
    close($SOURCETNS);

    If I wrote the program, I would also have chosen shorter variable names (i.e. names you can type from memory instead of having to copy and paste them ;-)

    I didn't test this specific code, and there are probably "nicer" ways to do it, but it should get the job done (although there may be the odd typo).

    Explanation: In the first step, I read in a line and chop it up on the "space" delimiter into @parts. Then, for each $part in @parts, I match against the two tags "HOST=" and "PORT=". If one matches, I put the remainder into the respective array.
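The split-then-match approach described above can be tried on a single sample line. This sketch assumes the line has no trailing newline (so no chomp) and leaves out the /o flag, which makes no difference for a pattern without variables.

```perl
use strict;
use warnings;

my $record = 'HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999';

my (@hosts, @ports);
# Split the line on single spaces, then test each "TAG=value" piece.
foreach my $part (split / /, $record) {
    push @hosts, $1 if $part =~ /HOST=(.*)/;
    push @ports, $1 if $part =~ /PORT=(.*)/;
}

print "@hosts\n";   # machine1-basement.xyz.com machine2-attic.xyz.com
print "@ports\n";   # 1234 9999
```

Note that (.*) keeps the full domain name; the original poster later says only the machine1-basement part is wanted.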

    Don't use '#ff0000':
    use Acme::AutoColor; my $redcolor = RED();
    All colors subject to change without notice.
      In the first step, I read in a line and chop it up on the "space" delimiter into @parts.

      You also chomp the line, which isn't required because (.*) will not match a newline.
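A small check of that point: without the /s modifier, '.' does not match "\n", so the capture stops just before the newline of an unchomped line. The /s variant is included only for contrast.

```perl
use strict;
use warnings;

my $line = "PORT=5555\n";    # unchomped line, as read from a file

# Default: '.' does not match "\n", so the capture is just the digits.
my ($port)   = $line =~ /PORT=(.*)/;
# With /s, '.' also matches "\n", and the newline ends up in the capture.
my ($port_s) = $line =~ /PORT=(.*)/s;

printf "%d|%d\n", length $port, length $port_s;   # 4|5
```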

      And you are using the /o option which is superfluous because there are no variables in the pattern to interpolate.
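What /o actually does: when a pattern interpolates a variable, /o compiles it once, using the variable's value at the first execution, and ignores later changes. A pattern with no variables is compiled at compile time anyway, so /o changes nothing for it. The loop and tag names below are made up purely to show the difference.

```perl
use strict;
use warnings;

my $input = 'PORT=1234';
my (@with_o, @without_o);

for my $tag (qw(HOST PORT)) {
    # With /o the pattern is frozen as /^HOST=/ on the first pass.
    push @with_o,    ($input =~ /^$tag=/o) ? 1 : 0;
    # Without /o the pattern is rebuilt from $tag on every pass.
    push @without_o, ($input =~ /^$tag=/)  ? 1 : 0;
}

print "with /o:    @with_o\n";      # with /o:    0 0
print "without /o: @without_o\n";   # without /o: 0 1
```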

        You also chomp the line which isn't required

        Quite right, it isn't required (and could hurt performance slightly on a big file). Over time, it has become part of my coding style to always use chomp when reading from a filehandle. More than once I changed the regex or some other part of the code and then wondered where the newline came from, so I started using chomp unless something requires otherwise.

        And you are using the /o option which is superfluous because there are no variables in the pattern to interpolate.

        Again, you are quite right. Does this actually hurt performance? I find it rather helpful when coding, even in static expressions, since I can just glance at the end and see "OK, this isn't a dynamic regex" instead of searching through the whole line noise for variables.

        But that's probably just me. You know, old dogs, new tricks...

Re: Am I doing Greedy Matching?
by reisinge (Hermit) on Nov 10, 2011 at 12:45 UTC
    I would do it like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    my(@hosts, @ports);
    while (<DATA>) {
        push @hosts, $1 while /HOST=(\S*?)\./g;   # nongreedy quantifier *?
        push @ports, $1 while /PORT=(\S*)/g;
    }
    print "Hosts: ", join(", ", sort @hosts), "\n";
    print "Ports: ", join(", ", sort @ports), "\n";
    __DATA__
    HOST=machine1-basement.xyz.com PORT=1234 HOST=machine2-attic.xyz.com PORT=9999 HOST=machine3-garage.xyz.com PORT=5555

    Have a nice day, j
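Since the thread title asks about greedy matching, here is the difference the *? in reisinge's host pattern makes, shown on one sample line. The greedy version is added only for comparison.

```perl
use strict;
use warnings;

my $line = 'HOST=machine1-basement.xyz.com PORT=1234';

# Greedy \S* runs to the space, then backtracks to the LAST '.' it can use.
my ($greedy)     = $line =~ /HOST=(\S*)\./;
# Non-greedy \S*? stops at the FIRST '.' that lets the match succeed.
my ($non_greedy) = $line =~ /HOST=(\S*?)\./;

print "greedy:     $greedy\n";       # greedy:     machine1-basement.xyz
print "non-greedy: $non_greedy\n";   # non-greedy: machine1-basement
```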