Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a problem -- I have a text file like this:

This is a line.
Another line is here.
  Here's a line.

Line five is this one.

What I need to do is get the first column of each of these lines (columns being separated by spaces), so that my output looks like this:

This
Another


Line

As you can see, if the line begins with spaces or is empty, I just want to print out a blank line so I can tell that there was something there. I would do this on my own, but this is just a little part of a much bigger project of mine, and I'm still just learning Perl and I'm really bad with regular expressions. Any help would be much appreciated (and I promise to learn from your answers before I return!). Thanks!

Re: Grabbing first column of text
by japhy (Canon) on Aug 02, 2001 at 20:15 UTC
    You simply want to use ($first) = $line =~ /(\S*)/;
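    Why the unanchored \S* is enough: it may match zero characters, so the match always succeeds at the very start of the string, and $first ends up as an empty string (not undef) when the line starts with whitespace or is empty. A minimal sketch using the sample lines from the question; the helper name first_column is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping japhy's match.
sub first_column {
    my ($line) = @_;
    # \S* can match zero characters, so the match succeeds at position 0
    # and captures '' when the line begins with whitespace or is empty.
    my ($first) = $line =~ /(\S*)/;
    return $first;
}

my @lines = (
    "This is a line.",
    "Another line is here.",
    "  Here's a line.",
    "",
    "Line five is this one.",
);

print first_column($_), "\n" for @lines;
```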

    _____________________________________________________
    Jeff japhy Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      You simply want to use ($first) = $line =~ /(\S*)/;

      I believe that the original poster also wanted to capture a series of blank spaces, if that was what occupied the first element of a line.

      \S matches non-whitespace characters only. . .

      . . . the split solutions seem much better suited.

      --twerq

        No, if you'll read the entire post, you'll notice that I said "... if the line begins with spaces or is empty, I just want to print out a blank line..." Anyway, the first two examples here do just what I'm looking for. Thanks, guys!

        No, the original post says:
        As you can see, if the line begins with spaces or is empty, I just want to print out a blank line so I can tell that there was something there.
        He either wants the column of text, or nothing. That is what I give him.

Re: Grabbing first column of text
by CheeseLord (Deacon) on Aug 02, 2001 at 20:13 UTC

    Try one of these one-liners (they have the same output, but I thought the second might be a little easier to understand):

        perl -ple '($_) = /^\S+/g' filename
        perl -ple '$_ = (/^\S+/g)[0]' filename

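    Basically, it reads a line, sets $_ to the first group of non-whitespace characters at the beginning of the line, and then prints the changed line out. If the line begins with whitespace or is empty, the match fails and $_ becomes undef, so an empty line is printed. Hope this helps!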
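    For readers new to the switches: per perlrun, -p wraps the code in a while (<>) loop that prints $_ after each pass, and -l chomps each input line and adds "\n" when printing. A rough, testable sketch of what the first one-liner does to a single line (the sub name ple_first_column is made up, and it works on a copy of the line rather than on $_):

```perl
use strict;
use warnings;

# Hypothetical sub approximating perl -ple '($_) = /^\S+/g' for one line.
sub ple_first_column {
    my ($line) = @_;
    chomp $line;                      # -l removes the trailing newline
    my ($first) = $line =~ /^\S+/g;   # list-context //g returns the matches
    # No match leaves $first undef; -p would print that as an empty line.
    return defined $first ? $first : '';
}

for my $line ("Another line is here.\n", "  Here's a line.\n", "\n") {
    print ple_first_column($line), "\n";
}
```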

    His Royal Cheeziness

Re: Grabbing first column of text
by tachyon (Chancellor) on Aug 02, 2001 at 20:37 UTC

    Here is a nice simple example. We read from the DATA filehandle, but it could just as well be a filehandle you open with open FILE, "<file.txt" or die "Oops $!\n";

        while (<DATA>) {
            ($first) = $_ =~ m/^(\S+)/;
            print "$first\n";
        }
        __DATA__
        This is a line.
        Another line is here.
          Here's a line.

        Line five is this one.

    The while loop iterates over the filehandle, assigning each line to $_. The next line is a standard Perl idiom for capturing a regex match: we match all the non-whitespace at the beginning of each line. The ^ anchors the match to the beginning of the line, \S matches a non-whitespace character, and the + after \S means "one or more, as many as possible". If a line begins with whitespace or is empty, the match fails, $first is set to undef, and an empty line is printed.
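    Under use strict and use warnings, the loop above needs a declared variable, and an undef $first triggers an "uninitialized value" warning. A sketch of a strict-clean variant with a lexical filehandle and three-argument open, assuming Perl 5.8 or later for the in-memory filehandle; the sub name first_columns is made up, and an in-memory string stands in for a real file so the example runs as-is:

```perl
use strict;
use warnings;

# Hypothetical sub: return the first column of every line read from $fh,
# using '' for lines that begin with whitespace or are empty.
sub first_columns {
    my ($fh) = @_;
    my @cols;
    while (my $line = <$fh>) {
        my ($first) = $line =~ m/^(\S+)/;
        push @cols, defined $first ? $first : '';
    }
    return @cols;
}

my $text = "This is a line.\nAnother line is here.\n  Here's a line.\n\nLine five is this one.\n";

# Opening a reference to a scalar reads from the string in memory;
# to read a real file, pass a file name such as 'file.txt' instead.
open my $in, '<', \$text or die "Can't open input: $!\n";
print "$_\n" for first_columns($in);
close $in;
```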

    You can also do this with split. Split returns a list of fields, so (split /\s/, $_)[0] is the first field when we split $_ on whitespace. A line that begins with a space produces an empty first field, which again gives a blank line. Note that in the example I don't bother to specify $_, as it is the default string that split works on.
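    The choice of /\s/ matters here. A small comparison on the indented line from the question: split /\s/ keeps an empty leading field, while the special pattern ' ' (awk-style splitting) skips leading whitespace and would give "Here's" instead of the blank line the original poster asked for.

```perl
use strict;
use warnings;

my $indented = "  Here's a line.\n";

# A separator at the very start yields an empty leading field: ''
my $with_regex = (split /\s/, $indented)[0];

# ' ' skips leading whitespace first, so the first field is "Here's"
my $awk_style = (split ' ', $indented)[0];

print "[$with_regex] [$awk_style]\n";   # prints "[] [Here's]"
```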

        while (<DATA>) {
            $first = (split /\s/)[0];
            print "$first\n";
        }

    Getting completely carried away, you can also do it with substr and index: index gets you the position of the first space, and substr extracts the string up to it.

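        while (<DATA>) {
            # index finds the position of the first space; substr keeps
            # everything before it. If there is no space, index returns -1
            # and substr drops only the last character (usually the newline).
            $first = substr $_, 0, index $_, " ";
            print "$first\n";
        }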
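    One caveat with the index/substr version: when a line contains no space, index returns -1 and substr drops the last character, which is only harmless when that character is the newline. A sketch that handles the no-space case explicitly (the sub name first_column_substr is made up; like the original, it looks for a literal space, not tabs):

```perl
use strict;
use warnings;

# Hypothetical sub: first space-delimited column via index and substr.
sub first_column_substr {
    my ($line) = @_;
    chomp $line;
    my $pos = index $line, ' ';
    # index returns -1 when there is no space; then the whole line is the column.
    return $pos >= 0 ? substr($line, 0, $pos) : $line;
}

print first_column_substr($_), "\n"
    for "This is a line.\n", "  Here's a line.\n", "\n", "word";
```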

    Hope this helps.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Grabbing first column of text
by Hofmator (Curate) on Aug 02, 2001 at 20:47 UTC

    Your question has already been answered. I just want to give you some hints for the more general case (you need the n-th column, or more than one column):

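    • Don't try to use a regex here; use split:
      my @columns = split /\s+/, $line; # or my $col3 = (split /\s+/, $line)[2]
    • Depending on what you would like to get, split on / /, /\s+/ or ' ', or on whatever your column delimiter might be. / / treats every single space as a separator, so consecutive spaces produce empty fields; /\s+/ treats a run of whitespace as one separator but gives an empty first field when the line starts with whitespace; ' ' works like /\s+/ except that leading whitespace is skipped. (Avoid /\s*/: because it can match the empty string, split breaks the line into single characters.)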
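    To make the difference between the delimiters concrete, here is a sketch on a made-up line with leading and doubled spaces, followed by pulling out the n-th column and a slice of several columns:

```perl
use strict;
use warnings;

my $line = "  alpha  beta gamma";

my @by_space = split / /,   $line;  # each single space separates: ('', '', 'alpha', '', 'beta', 'gamma')
my @by_ws    = split /\s+/, $line;  # runs of whitespace: ('', 'alpha', 'beta', 'gamma')
my @awk      = split ' ',   $line;  # leading whitespace skipped: ('alpha', 'beta', 'gamma')

# n-th column (0-based index 2) and a slice of columns 1 and 3
my $col3 = (split ' ', $line)[2];
my ($c1, $c3) = (split ' ', $line)[0, 2];

print scalar(@by_space), ' ', scalar(@by_ws), ' ', scalar(@awk), "\n";  # prints "6 4 3"
print "$col3 $c1 $c3\n";   # prints "gamma alpha gamma"
```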

    -- Hofmator