
First, thank you for leaving your original post alone once you figured out that my $in=<STDIN>; was the problem. It is OK to update your post to note the change, perhaps like this: #my $in=<STDIN>; #UPDATE: removing this line solved problem.

Your seemingly simple question actually brings up a number of fine points. There are two basic ways to extract tokens from a line: (1) split and (2) regex match global. The mantra is: "use split when you know what to throw away, and use match global when you know what to keep". More in a moment...

To backtrack a bit, "\s" in Perl lingo means any whitespace character: <FF><LF><CR><TAB><SPACE>. If you split on "\s+", each run of consecutive whitespace characters is treated as a single separator and thrown away. Your code splits on a single space, not on a possible run of spaces. I suspect that [:.,\s]+ would be closer to what you really want, though probably still not exactly what you want (make the suggested change in the code below and run it for yourself).
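As a minimal sketch of the difference between splitting on one space and splitting on a run of whitespace (the sample string and variable names here are made up for illustration, not taken from your code):

```perl
use strict;
use warnings;

# Hypothetical sample: three spaces between the two words.
my $line = "alpha   beta";

# One literal space as the separator: every extra space
# produces an empty field between the words.
my @single = split / /, $line;

# A run of whitespace as the separator: the whole run counts once.
my @run = split /\s+/, $line;

print scalar(@single), " fields when splitting on / /\n";     # 4
print scalar(@run),    " fields when splitting on /\\s+/\n";  # 2
```

The two extra fields in the first result are empty strings, which is usually not what you want when the goal is a list of words.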

Note: As you see below, I print each word of @words inside single quotes. In my experience this is a better way to go than separating tokens with "-". Mileage varies.

In Perl you will see both (a) split ' ',$line and (b) split /\s+/,$line. The ' ' form, like many things in Perl, is a short-cut that essentially means "do a split on /\s+/, but first throw away any whitespace at the beginning of the line". It does not mean "split on a single literal ' ' character". Splitting lines on whitespace is the most common form of split and Perl is optimized for that.
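A small sketch of forms (a) and (b) side by side, using a made-up line that starts with spaces (the variable names are mine, chosen only for this illustration):

```perl
use strict;
use warnings;

# Hypothetical sample: leading spaces, a double space, trailing newline.
my $line = "   alpha  beta\n";

# (a) The special ' ' form skips leading whitespace first.
my @special = split ' ', $line;     # ('alpha', 'beta')

# (b) The plain pattern form keeps an empty field in front,
#     because the line begins with a separator.
my @pattern = split /\s+/, $line;   # ('', 'alpha', 'beta')

print "split ' '   : ", scalar(@special), " fields\n";   # 2
print "split /\\s+/ : ", scalar(@pattern), " fields\n";  # 3
```

In both cases the trailing newline does not add a field, because split drops trailing empty fields by default.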

In this particular case, I decided to use 'match global' instead of 'split'. This avoids the problem of having to get rid of the empty leading field that split produces when a line starts with spaces.
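To illustrate that choice, here is a hedged sketch comparing the two approaches on one made-up line with leading spaces. The grep cleanup at the end is just one possible way to repair the split result, not something from the original code:

```perl
use strict;
use warnings;

# Hypothetical sample line with leading spaces.
my $line = "  a comma: list,a,b\n";

# Match global: say what to keep (runs of non-separator characters).
my @kept = $line =~ /([^:.,\s]+)/g;

# Split: say what to throw away. The leading spaces count as a
# separator, so the first field comes back as an empty string.
my @pieces = split /[:.,\s]+/, $line;

# One possible cleanup: drop zero-length fields after the split.
my @cleaned = grep { length } @pieces;

print scalar(@kept),    " words from match global\n";    # 5
print scalar(@pieces),  " fields from split\n";          # 6
print scalar(@cleaned), " fields after the cleanup\n";   # 5
```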

Many of the files that I process may have been touched by a user, which can add one or more blank lines at the end of the file. So I almost always skip lines "which have no data". Here is my code. Play with it. Break it. See what changes are necessary for your specific application.

My textual description above may have some errors in it. This is tricky stuff. Run this code and see what it does.

use strict;
use warnings;

while (my $line = <DATA>)
{      
   (my @words) = $line =~ /([^:.,\s]+)/g;
   # (my @words) = split /[:.,\s]+/, $line; #TRY THIS LINE INSTEAD
   next unless @words; # skip input lines that have no "words"
   
   print "\'$_\' " foreach @words;
   print "\n";
   
}

=prints:
Note that the first data line, which contains only ':', is skipped.
'this' 'is' 'a' 'simple' 'space' 'separated' 'line' 
'this' 'is' 'a' 'line' 'with' 'spaces' 'at' 'the' 'beginning' 
'this' 'line' 'has' 'multiple' 'spaces' 'embedded' 'in' 'it' 
'a' 'comma' 'list' 'a' 'b' 
'unconsidered' 'are' '(1)' 'item' 'lists' 'or' '(comments' 'like' 'this)' 
'$this_is_a_program_variable' 
'this' 'shows' '"a' 'quote"' 
=cut


__DATA__
:
this is a simple space separated line
     this is a line with spaces at the beginning
  this line has    multiple spaces embedded in     it
a comma: list,a,b
unconsidered are: (1) item lists  or (comments like this)
  $this_is_a_program_variable
this shows "a quote"

In reply to Re: Can read one txt file and not another? by Marshall
in thread Can read one txt file and not another? by Anonymous Monk
