Put another way, are you unsure how to split the line into fields? Once the line is split into fields, you just take the second element of the array, i.e. $aFields[1], and check whether it is in the hash.
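
To make the lookup step concrete, here is a minimal sketch. The hash name %hWanted and the sample values are made up for illustration; substitute whatever hash you already built.

```perl
use strict;
use warnings;

# Hypothetical lookup table: the keys are the column-2 values of interest.
my %hWanted = map { $_ => 1 } qw(beta delta);

# Pretend this line has already been split into fields.
my @aFields = ('alpha', 'beta', 'gamma');

# Perl arrays are zero-based, so the second column is index 1.
# The defined() check guards against lines with fewer than two fields.
my $found = (defined $aFields[1] && exists $hWanted{ $aFields[1] }) ? 1 : 0;

print "column 2 ('$aFields[1]') is ", ($found ? '' : 'not '), "in the hash\n";
```

The lookup itself is the easy part; everything below is about getting @aFields right in the first place.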

How you split up the lines depends heavily on the syntax/grammar of your file. Do you know what that is? You seem a bit uncertain. Are these "fields" from a row of a database or a generated report? Does the file have a documented format? Or are these actually just words in a line?

If these are just words, you could safely split the line on whitespace, like this:

    $line =~ s/^\s+//;                # strip leading whitespace from the line
    $line =~ s/\s+\z//;               # strip trailing whitespace from the line
    my @aFields = split(/\s+/,$line); # extract words

However, the above won't work if you have whitespace inside a field because it will split the field in two. If on some line of the file, the first column contains three words, then column 2 would end up in the fourth array element and you'd never know.

What are the rules that actually govern the organization of this file into rows and fields? To know whether or not you need regexes, you first need to know the file's rules. Regexes aren't always the best solution. thezip has pointed out that unpack would be a better way to split the line into fields if you are dealing with fixed-width columns (each column/field has a number of characters that is known in advance).
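
For the fixed-width case, a rough sketch might look like this. The column widths (10, 8, then the rest of the line) and the sample data are assumptions for illustration only; your file's real widths would go in the template.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): 10 characters, 8 characters, rest of line.
my $line = 'John Smith' . 'NY      ' . 'some remaining text';

# 'A' extracts a space-padded text field and strips its trailing spaces;
# 'A*' takes whatever is left on the line.
my @aFields = unpack('A10 A8 A*', $line);

print join('|', @aFields), "\n";
```

Note that the first column contains a space, which a whitespace split would have broken in two. With unpack the space is harmless because the field boundaries come from character positions, not from the content.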

On the other hand, if fields are separated from one another by a separator string or character, regexes are often good for splitting such lines into fields. However, you won't know what regex to use without knowing the format. Rules for separator-delimited fields can be very simple or quite complex.

A simple rule might be "a tab always means column separator". If that were your rule, you could just use split("\t",$line) to break up the line into fields.
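
One thing to watch with the tab rule: by default split drops trailing empty fields, so a line whose last columns are empty comes back shorter than you expect. A small sketch, with a made-up sample line, showing the difference a negative limit makes:

```perl
use strict;
use warnings;

# Hypothetical sample: four tab-separated columns, the last two empty.
my $line = "first column\tsecond\t\t\n";
chomp $line;

# Default behaviour: trailing empty fields are discarded.
my @aDefault = split(/\t/, $line);

# A negative limit tells split to keep trailing empty fields.
my @aFields = split(/\t/, $line, -1);

print scalar(@aDefault), " fields by default, ",
      scalar(@aFields),  " fields with a limit of -1\n";
```

Whether you want the empty trailing columns depends on your file; for the column 2 lookup it makes no difference, but it matters if you ever count columns to validate a line.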

Or it could be more complicated: columns are separated by whitespace except where the whitespace is quoted or escaped. Or it could be even more complicated: the first character of each line determines the field separator for the rest of the line, plus there is an escaping/quoting mechanism. The possibilities are endless. It would be hard to advise you without knowing the intended rules of the file.
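
If your file turns out to follow the "whitespace separates columns unless it is quoted" rule, you may not need to write the regex yourself. Here is a sketch using the core module Text::ParseWords; the sample line is invented, and I'm assuming double quotes are your quoting mechanism, which may not match your file.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample line: the first and third columns contain spaces.
my $line = q{"New York City" NY  "pop. 8 million"};

# Arguments: delimiter pattern, keep-quotes flag (0 = strip them), the line.
my @aFields = quotewords('\s+', 0, $line);

print "column 2 is '$aFields[1]'\n";
```

Be aware that quotewords also treats single quotes as quote characters, so a stray apostrophe in the data can confuse it. That is one more reason to pin down the file's actual rules first.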

Update: various clarifications and rewordings.


In reply to Re^3: comparing columns using regular expression by ELISHEVA
in thread comparing columns using regular expression by rocky13
