krujos has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to feed in a string that is read from <>. I want the variables to take on the values that come in, but I can't figure out what I am doing wrong here: they are always blank. This is a homework assignment, and I am not asking for someone to do it for me, but rather to make a suggestion or tell me if I am using regular expressions totally wrong. Thanks.
while (defined(<>)) {
    chomp $_;
    ($name,$status,$class,$major,$po,$phone,$advisor,$email)=
        / +(\w+).*(\w+)\W\W*(\w+)\W\W*(\w+)\W\W*(\w+)\W\W*(\w+)\W\W*(\w+)\W\W+(\ +w+)\W+/;
    print "$name\n";
    write OUTPUT;
}
close (OUTPUT) || die "cant close $!";
This is my input file (well, part of it). It all appears on one line in the file. And yes, the person in the file is me; I am not putting some poor guy's deal online....

Kruck, Joshua David Registered JR Psychology 1065 555-555-5555 Johnson, Andy Jay joshua-kruck@XXYZ.edu

Re: How do i get the variables to actually get in here....
by wog (Curate) on Nov 07, 2001 at 02:40 UTC
    At least part of your problem seems to be with while (defined(<>)). The <> takes a line from the filehandle, and then you test it for being defined, and (continue to) loop if it is. Unfortunately, unlike while(<>), it doesn't actually assign the value to $_. To do that you can use while (defined($_=<>)), but that's exactly equivalent to while(<>).
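A minimal sketch of the difference described above. To keep it self-contained it reads from an in-memory filehandle instead of <>; the sub names and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# The form from the question: each line is read and tested for definedness,
# but never stored anywhere, so $_ stays undefined inside the loop.
sub read_with_defined_only {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my @seen;
    local $_;
    while (defined(<$in>)) {
        push @seen, defined $_ ? $_ : '(undef)';
    }
    close $in;
    return @seen;
}

# The plain form: Perl treats this as while (defined($_ = <$in>)).
sub read_with_plain_while {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my @seen;
    local $_;
    while (<$in>) {
        chomp;
        push @seen, $_;
    }
    close $in;
    return @seen;
}

print join('|', read_with_defined_only("a\nb\n")), "\n";
print join('|', read_with_plain_while("a\nb\n")), "\n";
```

Both loops run once per line, but only the second one ever sees the line's contents.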

    I would advise that you look at using split, instead of regex for this task. It looks like it probably would be better suited for this data.

      Split would be nice and easy, but alas my prof has said it's "off limits" for this project.
Re: How do i get the variables to actually get in here....
by dragonchild (Archbishop) on Nov 07, 2001 at 02:34 UTC
    .* sucks. Don't use it unless you have to. Better would be to do something like:
    my @colNames = qw(name status class major po phone advisor email);
    while (<>) {
        chomp;
        my %hash;
        # Hash slice: one assignment fills every column from the captures
        @hash{@colNames} = /^(\w+, \w+ (?:\w+)?)\s*(\w+)\s* .../;
        print "$hash{name}\n";
        print OUTPUT "$hash{name}\n";
    }
    close(OUTPUT) || die "Cannot close OUTPUT: $!\n";
    I didn't complete the regex ... that's left as an exercise for the reader. :-)
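For readers who want to see the hash-slice idea run end to end, here is one hypothetical way the pattern could be finished. It is only a sketch under stated assumptions: names look like "Last, First" with at most one middle name, every other field is a single token, and the sub name parse_record is invented for this example. It is not the pattern the poster had in mind.

```perl
use strict;
use warnings;

my @col_names = qw(name status class major po phone advisor email);

# Hypothetical completed pattern (assumption, not from the thread).
my $record_re = qr/
    ^\s*
    (\w+,\s+\w+(?:\s+\w+)?) \s+   # name: Last, First [Middle]
    (\w+)                   \s+   # status
    (\w+)                   \s+   # class
    (\w+)                   \s+   # major (assumed to be one word)
    (\d+)                   \s+   # po box number
    ([\d-]+)                \s+   # phone: digits and dashes only
    (\w+,\s+\w+(?:\s+\w+)?) \s+   # advisor: Last, First [Middle]
    (\S+)                         # email
    \s*$
/x;

# Returns a hash reference keyed by column name, or nothing on no match.
sub parse_record {
    my ($line) = @_;
    my @values = $line =~ $record_re or return;
    my %record;
    @record{@col_names} = @values;   # hash slice, as in the post above
    return \%record;
}

my $sample = 'Kruck, Joshua David Registered JR Psychology 1065 '
           . '555-555-5555 Johnson, Andy Jay joshua-kruck@XXYZ.edu';
my $record = parse_record($sample);
print "$record->{name}\n" if $record;
```

Typing the po and phone fields as digits is what lets the optional middle name backtrack correctly when a record has none.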

    Update: Fixed while (defined <>) as per wog's comment.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: How do i get the variables to actually get in here....
by runrig (Abbot) on Nov 07, 2001 at 02:33 UTC
    Your 'name' section is only '\w+' when the name in your data contains more than just '\w' characters. The regex contains '\W\W*' which is equivalent to '\W+', and if you're looking for whitespace between fields you're better off using '\s+' anyway. In fact, you'd do well to use better matching parameters on all of the fields: you're capturing all your fields with '\w+' when, e.g., the phone number contains more than just '\w' characters ('-' for instance) and won't contain any alphabetic '\w' characters.

    See perldoc perlre for explanations of what '\w' and '\s' are and hints on what you ought to be using instead.
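A small illustration of the point about '\w', using values from the sample line. The sub names are invented for this sketch.

```perl
use strict;
use warnings;

# Capture only the leading run of \w characters (letters, digits, underscore).
sub leading_word_chars {
    my ($text) = @_;
    my ($match) = $text =~ /^(\w+)/;
    return $match;
}

# Capture a field made only of digits and dashes, which fits a phone number.
sub digits_and_dashes {
    my ($text) = @_;
    my ($match) = $text =~ /^([\d-]+)$/;
    return $match;
}

print leading_word_chars('555-555-5555'), "\n";        # stops at the first '-'
print leading_word_chars('Kruck, Joshua David'), "\n"; # stops at the ','
print digits_and_dashes('555-555-5555'), "\n";         # whole phone number
```

In both of the first two calls '\w+' stops at the first character that is not a word character, so the capture holds only part of the field.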

variable number of fields
by pike (Monk) on Nov 07, 2001 at 13:58 UTC
    Actually, part of the problem seems to be your input data. Both 'name' and 'advisor' can consist of more than one name according to your example. Which raises the question: are the names always of the form 'lastname, firstname middlename'? What if a person has no middle name or more than one? In any case I don't see that your regex covers that.

    Since this is a homework assignment, you probably can't change the input data (which is what I would normally do - use a separator like ':' to delimit the fields), so the way to go is probably:

    - split at blanks (use  @a = /([\w,\@.-]+)/g)

    - check which fields terminate in ',' - these are the last names of the person and his advisor

    - check how many fields are between the person's last name and the advisor's last name and how many after the advisor's last name to find out how many first / middle names were given in each case
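The three steps above can be sketched roughly as follows. Two things here are assumptions rather than part of the suggestion: that exactly five single-token fields (status, class, major, po, phone) sit between the two names, and that the e-mail address is the last token. The sub name name_layout is made up.

```perl
use strict;
use warnings;

# Returns (given names of the person, given names of the advisor),
# or an empty list if the line does not contain exactly two last names.
sub name_layout {
    my ($line) = @_;

    # Step 1: tokenize on blanks without using split.
    my @tokens = $line =~ /([\w,\@.-]+)/g;

    # Step 2: tokens ending in ',' are the two last names.
    my @last_name_at = grep { $tokens[$_] =~ /,$/ } 0 .. $#tokens;
    return unless @last_name_at == 2;
    my ($person_at, $advisor_at) = @last_name_at;

    # Step 3: count what lies between and after the last names.
    my $fixed_fields  = 5;   # assumption: status class major po phone
    my $person_given  = $advisor_at - $person_at - 1 - $fixed_fields;
    my $advisor_given = $#tokens - $advisor_at - 1;   # final token is the email
    return ($person_given, $advisor_given);
}

my ($person, $advisor) = name_layout(
    'Kruck, Joshua David Registered JR Psychology 1065 '
  . '555-555-5555 Johnson, Andy Jay joshua-kruck@XXYZ.edu'
);
print "person: $person given names, advisor: $advisor given names\n";
```

A major written as two words would break the fixed count, which is exactly the kind of ambiguity in the input data that the post warns about.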

    Hope this helps you,

    pike

Re: How do i get the variables to actually get in here....
by krujos (Curate) on Nov 07, 2001 at 02:30 UTC
    There is more than one space between the fields in the input file. Sorry about that.
Re: How do i get the variables to actually get in here....
by blackmateria (Chaplain) on Nov 07, 2001 at 02:40 UTC
    You're using = instead of =~ on the line with the regex. That alone will make the script not work. Look in perlop and perlre for more information on =~ and regexes, if you're interested.

    That regex looks awfully complicated to me. I think you should look into using split instead (in perlfunc). I can't say for sure whether this will work without seeing your input data, but it seems pretty likely given that all your regex contains is permutations of \W and \w.

    One additional point: don't use while (defined(<>)). <> is always defined until end-of-file, and I don't think that form assigns the result of <> to $_. Use while (<>) instead.

    Update: turns out = will do the right thing in this case, at least according to dragonchild (see next post). Oh well, it was probably the while (defined(<>)) bit anyway...

      Actually, he wants to be using =, not =~. =~ says to use the variable on the left as the thing to perform the regex on. If there is no source, $_ =~ is automatically put in at the left of the regex. As he wants to assign the results of the regex, performed on $_, to the lvalues, = is correct.

      This is actually a very common thing to do when parsing something that split doesn't work nicely on. Again, that's his example data. So, = and a regex are exactly the right calls here. (I spent 3 hours Friday working with similar data that was almost fixed-delimited, but wasn't, so I had to move away from unpack to a regex and it worked perfectly. Yes, they're complicated, but they work.)
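A short sketch of the distinction, with invented sub names and a two-field pattern chosen only to keep the example small. A bare match operates on $_, =~ binds the match to another variable, and in both cases = in list context assigns the captures.

```perl
use strict;
use warnings;

# Bare match: the regex runs against $_, and = assigns the captures.
sub match_on_topic {
    my ($line) = @_;
    local $_ = $line;
    my ($major, $po) = /^(\w+)\s+(\d+)$/;
    return ($major, $po);
}

# Bound match: =~ names the string to match; = still assigns the captures.
sub match_with_binding {
    my ($line) = @_;
    my ($major, $po) = $line =~ /^(\w+)\s+(\d+)$/;
    return ($major, $po);
}

# Scalar context: the same match yields only a true or false value.
sub match_in_scalar_context {
    my ($line) = @_;
    local $_ = $line;
    my $matched = /^(\w+)\s+(\d+)$/;
    return $matched;
}

my @from_topic   = match_on_topic('Psychology 1065');
my @from_binding = match_with_binding('Psychology 1065');
print "@from_topic\n";
print "@from_binding\n";
```

The parentheses on the left-hand side are what give the match list context, so dropping them would assign a success flag instead of the captured fields.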

      (I didn't realize this at the time, but this is my 500th node!)
