Re: regular expressions query
by shemp (Deacon) on Jun 30, 2004 at 17:54 UTC
|
In a regex, whitespace is \s.
A plain 's' matches the literal character 's'. So one way to do it would be:
($thing1, $thing2) = ($1, $2) if /^\s{9}(\S+)\s{10}(\S+)/;
You need to include the things you're trying to capture, i.e. the (\S+)
\S means anything except whitespace.
BUT, this is much better suited to using split:
($thing1, $thing2) = split;
Now using split without any args is a special case that splits $_ on whitespace (like /\s+/, except that leading whitespace is skipped).
You should look into how split works; I think your post the other day would have worked better with split also.
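A minimal sketch comparing the two approaches on one line. The sample line (nine spaces, a word, ten spaces, a word) is an assumption based on the pattern above, and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line: 9 spaces, a field, 10 spaces, a field.
my $line = (' ' x 9) . 'none' . (' ' x 10) . 'lt2dpmnt';

# Approach 1: anchored regex with explicit whitespace counts.
my ($re1, $re2);
($re1, $re2) = ($1, $2) if $line =~ /^\s{9}(\S+)\s{10}(\S+)/;

# Approach 2: split on runs of whitespace, skipping leading whitespace.
my ($sp1, $sp2) = split ' ', $line;

print "regex: '$re1' '$re2'\n";
print "split: '$sp1' '$sp2'\n";
```

The regex version only matches when the spacing is exactly 9 and 10; the split version does not care how wide the gaps are.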
Re: regular expressions query
by Anonymous Monk on Jun 30, 2004 at 17:54 UTC
|
($meanH1, $meanH2) = /^\s{9}(.*?)\s{10}(.*?)$/;
# Or you may be able to generalize it a bit more with:
($meanH1, $meanH2) = /^\s*(\S+)\s+(\S+)\s*$/;
# Or, if the second option works for your data, you could even use:
($meanH1, $meanH2) = split;
# Which is shorthand for
($meanH1, $meanH2) = split ' ', $_;
All of the above are rather basic examples of regex and are well documented in perlre (perldoc or perldoc.com)
Ted
Re: regular expressions query
by hmerrill (Friar) on Jun 30, 2004 at 18:29 UTC
|
Like most things in Perl, there are usually many different ways to accomplish the same thing. Others have given good suggestions using regular expressions, split, etc. But I don't think anyone has mentioned unpack yet.
If your situation involves fixed length records where each field occupies the same columns on each record, then unpack will work for you.
The Perl Cookbook p.297 has recipe 8.15 titled "Reading Fixed-Length Records" which describes using unpack:
# $RECORDSIZE is the length of a record, in bytes.
# $TEMPLATE is the unpack template for the record
# FILE is the file to read from
# @FIELDS is an array, one element per field
until ( eof(FILE) ) {
    read(FILE, $record, $RECORDSIZE) == $RECORDSIZE
        or die "short read\n";
    @FIELDS = unpack($TEMPLATE, $record);
}
Now to relate that to your example (I'm on Windows XP):
#!perl -w
use strict;
my $record = "         none          lt2dpmnt";
print "\$record = [$record]\n";
my @FIELDS = unpack('a9a4a10a8', $record);
foreach (@FIELDS) {
    print "field=[$_]\n";
}
Produces this output:
C:\DOCUME~1\hmerrill.000\TEST_P~1>test_unpack.pl
$record = [         none          lt2dpmnt]
field=[         ]
field=[none]
field=[          ]
field=[lt2dpmnt]
Again, this only works if you know that every record is the same length and each field in the record occupies the same columns. See "perldoc -f pack" and "perldoc -f unpack" for more information.
HTH.
|
|
Greetings all,
Just an FYI: you can use an 'x' in your unpack template to skip over the spaces (in pack, 'x' is 'A null byte'; in unpack, it skips a byte forward), unless you want the spaces.
so
my @FIELDS = unpack('a9a4a10a8', $record);
becomes
my @FIELDS = unpack('x9a4x10a8', $record);
Given your example code above the output would be:
$record = [         none          lt2dpmnt]
field=[none]
field=[lt2dpmnt]
-injunjoel
"I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
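A related sketch, assuming the field values can be shorter than their columns: 'a' returns the column as-is, while 'A' strips trailing spaces on unpack. The 31-byte record and the length guard here are illustrative assumptions, not taken from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 31-byte record: 9 spaces, a 4-column field,
# 10 spaces, an 8-column field. Both values are padded with spaces.
my $record = (' ' x 9) . 'any ' . (' ' x 10) . 'bang    ';

# Guard against short records before unpacking fixed columns.
die "short record\n" if length($record) < 31;

my @raw     = unpack('x9a4x10a8', $record);   # 'a' keeps trailing spaces
my @trimmed = unpack('x9A4x10A8', $record);   # 'A' strips trailing spaces

print "a: [$raw[0]] [$raw[1]]\n";
print "A: [$trimmed[0]] [$trimmed[1]]\n";
```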
Re: regular expressions query
by sweetblood (Prior) on Jun 30, 2004 at 17:55 UTC
|
|
|
This will not get rid of the leading whitespace on those lines (it returns a null field as the first field). But if you use ' ' instead it should work fine. That is:
my ($meanH1,$meanH2) = split ' ';
as per the documentation:
A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field.
-enlil
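A small sketch of that difference, using a made-up line with leading whitespace (the sample string and array names are assumptions for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line with leading whitespace.
my $line = "   none   bing";

my @by_regex = split /\s+/, $line;   # leading whitespace yields an empty first field
my @by_space = split ' ',   $line;   # leading whitespace is skipped

printf "split /\\s+/ gives %d fields, first is '%s'\n", scalar @by_regex, $by_regex[0];
printf "split ' '   gives %d fields, first is '%s'\n", scalar @by_space, $by_space[0];
```

With a two-variable list assignment, the /\s+/ form would therefore put the empty string in the first variable and 'none' in the second.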
Re: regular expressions query
by Enlil (Parson) on Jun 30, 2004 at 18:00 UTC
|
You might want to look over perlretut and perlre. In order to use the $1, $2, $3 ... variables you have to have a matching regular expression, and you need capturing parens. Anyhow, if all lines are in that format you can use:
($var1, $var2) = ($1,$2) if /(\S+)\s+(\S+)/;
If the lines are not the same throughout the file and you need to be more specific:
($var1,$var2) = ($1,$2) if /^\s{9}(\S+)\s{10}(\S+)/;
-enlil
Re: regular expressions query
by davido (Cardinal) on Jun 30, 2004 at 18:02 UTC
|
my ( $meanH1, $meanH2 );
( $meanH1, $meanH2 ) = ( $1, $2 )
if $line =~ m/^\s{9}(\S+)\s{10}(\S+)/;
You're correct to be checking the success of your matching. I don't like solutions that skip past this important step.
The preceding example will look for (and skip past) the leading nine whitespace characters. It will then capture all contiguous non-whitespace. It will then look for and skip past the next ten whitespace characters. It will then capture all remaining contiguous non-whitespace. If there's anything else on the line (like a trailing newline) it will be ignored.
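A minimal sketch of checking match success inside a loop, so that lines which do not fit the layout are set aside rather than silently reusing stale values. The sample lines and the @parsed/@skipped names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one line in the expected layout, one that is not.
my @lines = (
    (' ' x 9) . 'none' . (' ' x 10) . "lt2dpmnt\n",
    "VALUES FOR\n",
);

my (@parsed, @skipped);
for my $line (@lines) {
    # A list assignment in boolean context is true only if the match succeeded.
    if ( my ($h1, $h2) = $line =~ m/^\s{9}(\S+)\s{10}(\S+)/ ) {
        push @parsed, [ $h1, $h2 ];
    }
    else {
        chomp( my $copy = $line );
        push @skipped, $copy;
    }
}

print "parsed:  @$_\n" for @parsed;
print "skipped: $_\n"  for @skipped;
```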
Re: regular expressions query
by apocalyptica (Acolyte) on Jun 30, 2004 at 20:15 UTC
|
Hmmm... These are all excellent ideas, but none of them seem to be quite working for me. Another way I was thinking about doing this is to look at the end of the line before this one in the data file: each line before the one where I want to cull data from ends with the text "VALUES FOR". I tried this:
($meanH1, $meanH2) = ($1, $2) if /VALUES FOR$\s+(\S+)\s+(\S+)/;
But it doesn't seem to work. From my understanding, the \s+ should also match for newline feeds in addition to whitespace, correct? Any suggestions?
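One possible reason, assuming the file is read one line at a time (the post does not show the read loop): $_ then holds only a single line, so a pattern that spans the newline never sees the following line. A sketch of a flag-based alternative follows; the sample lines and the $expect_values name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: a marker line ending in "VALUES FOR",
# followed by the line that carries the two values.
my @lines = (
    "MEAN VALUES FOR\n",
    (' ' x 9) . 'none' . (' ' x 10) . "lt2dpmnt\n",
    "something else\n",
);

my ($meanH1, $meanH2);
my $expect_values = 0;    # true when the previous line was the marker

for my $line (@lines) {
    if ($expect_values) {
        ($meanH1, $meanH2) = split ' ', $line;
        $expect_values = 0;
        next;
    }
    $expect_values = 1 if $line =~ /VALUES FOR\s*$/;
}

print "meanH1='$meanH1' meanH2='$meanH2'\n";
```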
|
|
I just tried the following:
#!/usr/local/perl
$test = " foo bar";
($var1, $var2) = ($1, $2) if ($test =~ /\s+(\S+)\s+(\S+)/);
print "Var1: $var1\nVar2: $var2";
exit;
...and it grabbed the text out and printed fine, so I'm not sure what you mean when you say none of the suggestions are working for you. Can you be more specific?
If you're ever lost and need directions, ask the guy on the motorcycle.
|
|
#!/usr/bin/perl
while ( <DATA> ) {
    ($a,$b) = split ' ';
    print "split '$a','$b'\n";
    my ($a,$b) = $_ =~ /(\S+)\s+(\S+)/;
    print "match '$a','$b'\n";
}
__DATA__
none bing
some bong
any bang
output:
split 'none','bing'
match 'none','bing'
split 'some','bong'
match 'some','bong'
split 'any','bang'
match 'any','bang'
|
|
#!/usr/bin/perl
my $txt = '
         none          bing
         some          bong
         any          bang ';
while ( $txt =~ /^ {9}(\S+) {10}(\S+)\s*$/mg ) {
    print "'$1' '$2'\n";
}
qq
Re: regular expressions query
by ercparker (Hermit) on Jun 30, 2004 at 22:48 UTC
|
apocalyptica wrote: "But it doesn't seem to work. From my understanding, the \s+ should also match for newline feeds in addition to whitespace, correct? Any suggestions?"
Regarding your question as to what \s will match:
it will match whitespace, including spaces, tabs, carriage returns, newlines and form feeds.
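A quick way to check this for yourself; the hash of sample characters and its labels are just an illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate characters to test against \s (labels are illustrative).
my %chars = (
    'space'           => " ",
    'tab'             => "\t",
    'newline'         => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
);

my $matched = 0;
for my $name ( sort keys %chars ) {
    # \A and \z anchor to the true start and end of the string.
    my $ok = $chars{$name} =~ /\A\s\z/ ? 1 : 0;
    $matched += $ok;
    printf "%-16s %s\n", $name, $ok ? 'matches \s' : 'does not match \s';
}
```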
Re: regular expressions query
by rupesh (Hermit) on Jul 01, 2004 at 06:16 UTC
|