burningredmoon has asked for the wisdom of the Perl Monks concerning the following question:

I'm reading lines from a file, and the information is currently stored in an array, one line per element.

Now I would like to take a line from the file and separate the two things on it, which are divided by a comma, so that one goes into one variable and the other into another. Kind of like so:

DomainName, IPAddress
DomainName2, IPAddress2

If I'm storing this information in a file that is being read, what is the best way to separate them so I may match IPs I capture in my program to the IPAddresses in my file and then print the DomainName (from my file) that goes with said IP?

I've been told split is a great way to do this but I'm not sure how and just looking at the generic way to use it from Google results isn't going so well for me. Thank you for reading in advance!

Replies are listed 'Best First'.
Re: To Split or Not to Split
by FunkyMonk (Bishop) on Apr 24, 2011 at 23:31 UTC
    Your question is so vague that I don't know whether this will help you or not:
    Use split when you know what to throw away and a capturing regex when you know what to keep.
    I would love to attribute that quote to someone, but I've known it for so long I forgot who wrote it :(

    update:

    Following tchrist's and Corion's comments, I googled and found that I had got the quote wrong, so I've added a word (s/regex/capturing regex/).

    I also found this link where Dominus attributes the quote to merlyn.

    I agree with all of tchrist's comments. Perhaps I should also add that it's a rule of thumb and not the law. It certainly helped me choose between using splits and captures when I was a Perl noob.
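To make the rule of thumb concrete, here is a minimal sketch that pulls the same two fields out of one line both ways. The sample line and its values are hypothetical; they only follow the "DomainName, IPAddress" layout from the question.

```perl
use strict;
use warnings;

# Hypothetical sample line in the "DomainName, IPAddress" layout from the question.
my $line = "example.com, 192.0.2.10";

# split: describe the separator, i.e. the part to throw away
# (the comma plus any whitespace around it).
my ($domain_a, $ip_a) = split /\s*,\s*/, $line;

# Capturing match: describe the fields, i.e. the parts to keep.
my ($domain_b, $ip_b) = $line =~ /^\s*([^,\s]+)\s*,\s*(\S+)\s*$/;

print "split:   $domain_a -> $ip_a\n";
print "capture: $domain_b -> $ip_b\n";
```

For a simple two-field line like this, both approaches should give the same pair, so split is probably the less fussy choice here.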

      “Use split when you know what to throw away and a regex when you know what to keep.”

      Except that that doesn’t really work, considering that:

      1. Because split takes a regex as its own argument, you cannot oppose “split” with “regex”: wherever you have split, there too do you also have a regex. It’s like some sort of logic error.

      2. Sometimes split does not necessarily throw (all of) what it matches away.
        $str = "this here and that or those there and his or hers nor thee";
        @words = split /\h*\b(and|nor|or)\b\h*/, $str;
      3. A regex doesn’t always return (all of) what it matches.
        $str = "fee=1 fie=2 foe=3 fum=4";
        my %settings = $str =~ /\b(\w+)=(\S*)/g;
        and also, in a completely different way:
        % perl -pe 's//IoException/ if ?import java\.io\.\KFile?'
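The first two snippets above can be run as a small standalone script to see what each construct actually returns. This is only a sketch: the variable $pairs and the print statements are additions for illustration, and \h needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Point 2: capture groups in the split pattern are returned along with the
# fields, so the separators "and", "or" and "nor" are kept in the list.
my $str   = "this here and that or those there and his or hers nor thee";
my @words = split /\h*\b(and|nor|or)\b\h*/, $str;
print join('|', @words), "\n";

# Point 3: a /g match in list context returns only the captured groups,
# not the "=" between them, which is convenient for filling a hash.
my $pairs    = "fee=1 fie=2 foe=3 fum=4";
my %settings = $pairs =~ /\b(\w+)=(\S*)/g;
print "$_ => $settings{$_}\n" for sort keys %settings;
```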

      So I would be careful with passing along that particular phrase. It’s catchy, but it isn’t really all that correct. Perhaps it was originally said about some other language than Perl, since it doesn’t seem to make sense for Perl when looked at closely.

      At most I might point out that m//g and split are often used in complementary senses, with one looking for the parts you’re interested in and the other for the parts you’re not. Even so, in many contexts I’d hasten to add that while they might kinda work that way in common cases, that’s much too simplistic a description for what you can — and quite often do — do with both of these constructs in Perl.

      Let’s just say that the refrain presents a rather simplified version of reality. :)

        Maybe "match" is a better term than "regex", but still, I also found that most of the time when I despaired trying to make split do my bidding, a match would easily collect the information I wanted (and knew how) to keep.

      lol. It's okay but thanks for replying!
Re: To Split or Not to Split
by Anonymous Monk on Apr 24, 2011 at 22:02 UTC
      Thank you for replying! I assume that I would be needing to use this part in the link you sent me:
      open(PASSWD, '/etc/passwd');
      while (<PASSWD>) {
          chomp;
          ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/);
          #...
      }

      What exactly is the first line doing? The while is reading through the file at the directory /etc/passwd ?

      For my purposes would I write

      open READINGFILE, "data.txt" or die $!;
      open LOGFILE, ">>logfile.txt" or die $!;
      while (<READINGFILE>) {
          chomp;
          ($domain, $IP) = split(/,/);
          if ($ip_obj->{src_ip} eq $IP){
              print LOGFILE $domain;
              print LOGFILE " has been found";
          }
      }

      Right now I'm getting errors:

      Global symbol "$domain" requires explicit package name at test.pl line 113.
      Global symbol "$IP" requires explicit package name at test.pl line 113.
      Global symbol "$IP" requires explicit package name at test.pl line 114.
      Global symbol "$domain" requires explicit package name at test.pl line 115.
      BEGIN not safe after errors--compilation aborted at test.pl line 235 (#1)
          (F) You've said "use strict" or "use strict vars", which indicates that all variables must either be lexically scoped (using "my" or "state"), declared beforehand using "our", or explicitly qualified to say which package the global variable is in (using "::").

      Uncaught exception from user code:
          Global symbol "$domain" requires explicit package name at test.pl line 113.
          Global symbol "$IP" requires explicit package name at test.pl line 113.
          Global symbol "$IP" requires explicit package name at test.pl line 114.
          Global symbol "$domain" requires explicit package name at test.pl line 115.
          BEGIN not safe after errors--compilation aborted at test.pl line 235.
       at test.pl line 235

      What am I missing?

      And yes, I have several files at various steps in the process to make sure I don't take steps backwards. Thanks for the advice, though. The link is a good reference, but I need it a little more... dumbed down for me to understand. lol.

        Here's the list of tips

        • 1) Lexical file handles: my $fh
        • 2) 3 parameter form of open
        • 3) Validate your data file to make sure each line has data.
        • 4) Declare your variables inside the loop with my
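        open my $data_fh, '<', 'data.txt' or die $!;          # tip 2: 3-parameter open for reading
        open my $log_fh, '>>', 'logfile.txt' or die $!;
        while (<$data_fh>) {
            chomp;
            next if /^\s*$/;                                  # tip 3: skip blank lines
            my ($domain, $IP) = split /\s*,\s*/;              # tip 4: declare with my; also drops spaces around the comma
            if ($ip_obj->{src_ip} eq $IP) {
                print $log_fh "$domain has been found\n";
            }
        }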
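If there are many captured IPs to check, one possible variation is to read the file once into a hash keyed by IP address and then look each captured IP up in it. The sketch below is an assumption about how that could look, not something from the thread: the sample data, the %domain_for hash and the list of captured addresses are all hypothetical, and an in-memory filehandle stands in for data.txt so the script runs as-is.

```perl
use strict;
use warnings;

# Hypothetical stand-in for data.txt, one "DomainName, IPAddress" pair per line.
my $data = <<'END';
example.com, 192.0.2.10

example.org, 198.51.100.7
END

# Read from the string through an in-memory filehandle so the sketch is self-contained.
open my $data_fh, '<', \$data or die "Cannot open in-memory data: $!";

my %domain_for;    # IP address => domain name
while (my $line = <$data_fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;                      # skip blank lines
    my ($domain, $ip) = split /\s*,\s*/, $line;    # comma plus surrounding spaces
    next unless defined $ip;                       # skip lines without a comma
    $domain_for{$ip} = $domain;
}
close $data_fh;

# Hypothetical captured addresses; in the thread these would come from $ip_obj->{src_ip}.
for my $seen_ip ('198.51.100.7', '203.0.113.5') {
    if (exists $domain_for{$seen_ip}) {
        print "$domain_for{$seen_ip} has been found\n";
    }
}
```

With a real file, the open line would name data.txt instead of \$data; the rest should stay the same.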

        It is opening the file /etc/passwd for reading, that is, a file called 'passwd' (no extension) in the /etc directory.

        "'X' requires explicit package name" means you haven't declared those variables (or you typoed them). That's what the text between the "(F)" and the "uncaught exception" is basically saying.

        Use: my ($domain, $IP) = split(/,/); to declare them while you assign the values. Perl variable names are case-sensitive, so the names must match the ones used later in the loop.
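To see the difference that my makes under use strict, here is a small sketch that compiles two one-line snippets with a string eval, so the compile-time error can be caught and printed instead of aborting the script. The snippet contents and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Compile a tiny snippet that assigns to an undeclared variable.
# Under strict this fails at compile time, and the message lands in $@.
my $ok    = eval q{ use strict; $domain = 'example.com'; 1 };
my $error = $@;
print "undeclared: $error" unless $ok;

# The same assignment with "my" declares the variable, so it compiles cleanly.
my $ok_declared = eval q{ use strict; my $domain = 'example.com'; 1 };
print "declared: ok\n" if $ok_declared;
```

The first print should show the same "requires explicit package name" message quoted above, only with a different file name and line number.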

        Note: the test.pl isn't for backups. It is for trying out new things to see what they will do. Use it to play around with split or regexes or whatever you're not sure about. It's all about having a tiny, clean sandbox to try things out in, where you know there is no other code to confuse you.