Your seemingly simple question actually brings up a number of fine points. When extracting tokens from a line, there are two basic ways: (1) split and (2) a global regex match. The mantra is: "use split when you know what to throw away and use match global when you know what to keep". More in a moment...
To backtrack a bit, "\s" in Perl lingo means any whitespace character: <FF><LF><CR><TAB><SPACE>. If you split upon "\s+", each run of consecutive whitespace characters is treated as one separator and thrown away. Your code splits upon a single space, not a potential sequence of spaces. I suspect that [:.,\s]+ would be closer to what you really want, although perhaps not exactly what you want (make the suggested change in the code below and run it for yourself).
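To make the difference concrete, here is a minimal sketch. The sample line and the variable names are my own, and the `/ /` pattern is only my assumption of what "splitting upon a single space" looks like in your code.

```perl
use strict;
use warnings;

# Hypothetical sample line with two spaces after the colon.
my $line = "a comma:  list,a,b";

# Split on exactly one literal space: two spaces in a row yield an empty field.
my @single = split / /, $line;

# Split on runs of whitespace: no empty field, but punctuation stays attached.
my @runs = split /\s+/, $line;

# Split on runs of whitespace and/or the characters : . ,
my @punct = split /[:.,\s]+/, $line;

print join('|', @single), "\n";   # a|comma:||list,a,b
print join('|', @runs),   "\n";   # a|comma:|list,a,b
print join('|', @punct),  "\n";   # a|comma|list|a|b
```

The '|' separators are there only to make the empty field in the first result visible.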
Note: As you see below, I print each word wrapped in single quotes. In my experience this is a better way to go than separating tokens with "-", because stray or empty tokens are easy to spot. Mileage varies.
In Perl you will see (a) split ' ',$line and (b) split /\s+/,$line. That ' ', like many things in Perl, is a short-cut that essentially means "do a split on /\s+/, but throw away blank spaces at the beginning of the line". It does not mean to split upon a single literal ' ' character. Splitting lines upon whitespace is the most common form of split, which is why Perl provides this special case.
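A short sketch of the difference between the two forms, using a made-up line that starts with spaces (the variable names are mine):

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace.
my $line = "   leading spaces here\n";

# The special ' ' form skips leading whitespace before splitting.
my @awk_style = split ' ', $line;

# The plain regex form keeps an empty field in front of the first word.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style),   "\n";   # 3 fields: leading, spaces, here
print scalar(@regex_style), "\n";   # 4 fields: '', leading, spaces, here
```

In both cases the trailing newline produces no extra field, because split drops trailing empty fields by default.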
In this particular case, I decided to use 'match global' instead of 'split'. This avoids the problem of having to get rid of leading spaces after the split.
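As a rough illustration of that leading-space problem, compare the two approaches on a made-up indented line. The character class is the one discussed above; the sample data and variable names are my own.

```perl
use strict;
use warnings;

# Hypothetical indented input line.
my $line = "   a comma: list,a,b\n";

# Split: the separator matches at the very start, so an empty first field appears.
my @by_split = split /[:.,\s]+/, $line;

# Match global: only the runs of "keep" characters are returned.
my @by_match = $line =~ /([^:.,\s]+)/g;

print join('|', @by_split), "\n";   # |a|comma|list|a|b
print join('|', @by_match), "\n";   # a|comma|list|a|b
```

With split you would still have to shift off the empty first element; with the global match there is nothing to clean up.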
Many of the files that I process have the possibility of a user interaction that may add one or more blank lines at the end of file. So I almost always skip lines "which have no data". Here is my code. Play with it. Break it. See what changes are necessary for your specific application.
My textual description above may have some errors in it. This is tricky stuff. Run this code and see what it does.
use strict;
use warnings;

while (my $line = <DATA>)
{
    (my @words) = $line =~ /([^:.,\s]+)/g;
    # (my @words) = split /[:.,\s]+/, $line;   #TRY THIS LINE INSTEAD
    next unless @words;   # skip input lines that have no "words"
    print "\'$_\' " foreach @words;
    print "\n";
}

=prints:  Note: that the first data line with only ':' is skipped.

'this' 'is' 'a' 'simple' 'space' 'separated' 'line'
'this' 'is' 'a' 'line' 'with' 'spaces' 'at' 'the' 'beginning'
'this' 'line' 'has' 'multiple' 'spaces' 'embedded' 'in' 'it'
'a' 'comma' 'list' 'a' 'b'
'unconsidered' 'are' '(1)' 'item' 'lists' 'or' '(comments' 'like' 'this)'
'$this_is_a_program_variable'
'this' 'shows' '"a' 'quote"'

=cut

__DATA__
:
this is a simple space separated line
   this is a line with spaces at the beginning
this line has    multiple   spaces embedded in it
a comma: list,a,b
unconsidered are: (1) item lists or (comments like this)
$this_is_a_program_variable
this shows "a quote"
In reply to Re: Can read one txt file and not another?
by Marshall
in thread Can read one txt file and not another?
by Anonymous Monk