regular expressions query

apocalyptica has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regular expressions query by shemp (Deacon) on Jun 30, 2004 at 17:54 UTC
In a regex, whitespace is \s; a bare 's' matches the literal character 's'. So one way to do it would be:
`($thing1, $thing2) = ($1, $2) if /^\s{9}(\S+)\s{10}(\S+)/;`
You need to include the things you're trying to capture, i.e. the (\S+). \S means anything except whitespace. BUT, this is much better suited to using split:
`($thing1, $thing2) = split;`
Using split without any args is a special case: it splits $_ on whitespace the way split ' ' does, which also skips any leading whitespace. You should look into how split works; I think your post the other day would have worked better with split also.
Re: regular expressions query by Anonymous Monk on Jun 30, 2004 at 17:54 UTC
Let us assume that this string is in $_:
`($meanH1, $meanH2) = /^\s{9}(.*?)\s{10}(.*?)$/;`
`# Or you may be able to generalize it a bit more with:`
`($meanH1, $meanH2) = /^\s*(.*?)\s+(.*?)\s*$/;`
`# Or, if the second option works for your data, you could even use:`
`($meanH1, $meanH2) = split;`
`# Which is a shorthand version of`
`($meanH1, $meanH2) = split ' ', $_;`
All of the above are rather basic examples of regex and are well documented in perlre (perldoc or perldoc.com).
Ted
Re: regular expressions query by hmerrill (Friar) on Jun 30, 2004 at 18:29 UTC
Like most things in Perl, there are usually many different ways to accomplish the same thing. Others have given good suggestions using regular expressions, split, etc. But I don't think anyone has mentioned unpack yet. If your situation involves fixed-length records where each field occupies the same columns on each record, then unpack will work for you. The Perl Cookbook p.297 has recipe 8.15 titled "Reading Fixed-Length Records" which describes using unpack:
`# $RECORDSIZE is the length of a record, in bytes.`
`# $TEMPLATE is the unpack template for the record`
`# FILE is the file to read from`
`# @FIELDS is an array, one element per field`
`until ( eof(FILE) ) {`
`    read(FILE, $record, $RECORDSIZE) == $RECORDSIZE`
`        or die "short read\n";`
`    @FIELDS = unpack($TEMPLATE, $record);`
`}`
Now to relate that to your example (I'm on Windows XP):
`#!perl -w`
`use strict;`
`my $record = "         none          lt2dpmnt";`
`print "\$record = [$record]\n";`
`my @FIELDS = unpack('a9a4a10a8', $record);`
`foreach (@FIELDS) {`
`    print "field=[$_]\n";`
`}`
Produces this output:
`C:\DOCUME~1\hmerrill.000\TEST_P~1>test_unpack.pl`
`$record = [         none          lt2dpmnt]`
`field=[         ]`
`field=[none]`
`field=[          ]`
`field=[lt2dpmnt]`
Again, this only works if you know that every record is the same length, and each field in the record occupies the same columns. See "perldoc -f pack" and "perldoc -f unpack" for more information. HTH.
Re^2: regular expressions query by injunjoel (Priest) on Jul 02, 2004 at 18:06 UTC
Greetings all,
Just an FYI: you can use an 'x' in your unpack template to drop the spaces (perldoc -f pack describes 'x' as "A null byte"; in unpack it skips a byte), unless you actually want the spaces. So
`my @FIELDS = unpack('a9a4a10a8', $record);`
becomes
`my @FIELDS = unpack('x9a4x10a8', $record);`
Given your example code above, the output would be:
`$record = [         none          lt2dpmnt]`
`field=[none]`
`field=[lt2dpmnt]`
-injunjoel
"I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
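For anyone who wants to try the two templates side by side, here is a small self-contained sketch. The sample record is an assumption (nine spaces, `none`, ten spaces, `lt2dpmnt`), chosen to match the column widths discussed in this thread, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumed sample record: 9 spaces, a 4-char field, 10 spaces, an 8-char field.
my $record = (' ' x 9) . 'none' . (' ' x 10) . 'lt2dpmnt';

# 'a' returns each column range as-is, so the padding comes back as fields.
my @with_padding = unpack('a9 a4 a10 a8', $record);

# 'x' skips bytes instead of returning them, so only the data fields remain.
my @data_only = unpack('x9 a4 x10 a8', $record);

print scalar(@with_padding), " fields with 'a' only\n";
print "field=[$_]\n" for @data_only;
```

Whitespace inside the template is ignored by unpack, so it is only there to make the column groups easier to read.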
Re: regular expressions query by sweetblood (Prior) on Jun 30, 2004 at 17:55 UTC
`my ($meanH1, $meanH2) = split /\s+/;`
Check out perldoc -f split.
HTH
Sweetblood
Re^2: regular expressions query by Enlil (Parson) on Jun 30, 2004 at 18:05 UTC
This will not get rid of the leading whitespace on those lines (it returns a null field as the first field). But if you use ' ' instead it should work fine. That is:
`my ($meanH1,$meanH2) = split ' ';`
as per the documentation: A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field.
-enlil
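The difference is easy to see by running both forms on the same line. This is only a sketch; the sample line is an assumption modelled on the data described in this thread.

```perl
use strict;
use warnings;

# Assumed sample line with leading whitespace and a trailing newline.
my $line = "         none          lt2dpmnt\n";

# With /\s+/ the leading whitespace counts as a separator,
# so the first returned field is the empty string in front of it.
my @by_regex = split /\s+/, $line;

# The special pattern ' ' skips leading whitespace before splitting.
my @by_space = split ' ', $line;

print scalar(@by_regex), " fields from /\\s+/: [", join('][', @by_regex), "]\n";
print scalar(@by_space), " fields from ' ': [", join('][', @by_space), "]\n";
```

In both cases the trailing newline does not produce an extra field, because split drops trailing empty fields by default.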
Re: regular expressions query by Enlil (Parson) on Jun 30, 2004 at 18:00 UTC
You might want to look over perlretut and perlre. In order to use the $1, $2, $3 ... variables you have to have a matching regular expression and you need capturing parens. Anyhow, if all lines are in that format you can use:
`($var1, $var2) = ($1,$2) if /(\S+)\s+(\S+)/;`
If the lines are not the same throughout the file and you need to be more specific:
`($var1,$var2) = ($1,$2) if /^\s{9}(\S+)\s{10}(\S+)/;`
-enlil
Re: regular expressions query by davido (Cardinal) on Jun 30, 2004 at 18:02 UTC
`my ( $meanH1, $meanH2 );`
`( $meanH1, $meanH2 ) = ( $1, $2 ) if $line =~ m/^\s{9}(\S+)\s{10}(\S+)/;`
You're correct to be checking the success of your matching. I don't like solutions that skip past this important step. The preceding example will look for (and skip past) the leading nine whitespace characters. It will then capture all contiguous non-whitespace. It will then look for and skip past the next ten whitespace characters. It will then capture all remaining contiguous non-whitespace. If there's anything else on the line (like a trailing newline) it will be ignored.
Dave
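One way to make the success check hard to forget is to wrap the match in a small helper that returns an empty list on failure. This is a sketch only: `parse_means` is a hypothetical name, and the two sample lines are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two fields, or an empty list on no match.
sub parse_means {
    my ($line) = @_;
    return $line =~ m/^\s{9}(\S+)\s{10}(\S+)/ ? ($1, $2) : ();
}

# Assumed sample input: one data line and one line that should not match.
for my $line ("         none          lt2dpmnt\n", "VALUES FOR\n") {
    # A list assignment in boolean context is true only if the
    # right-hand side returned at least one element.
    if (my ($h1, $h2) = parse_means($line)) {
        print "matched: $h1, $h2\n";
    }
    else {
        print "no match, line skipped\n";
    }
}
```

Because $1 and $2 are only read when the match succeeded, stale captures from an earlier match cannot leak into the result.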
Re: regular expressions query by apocalyptica (Acolyte) on Jun 30, 2004 at 20:15 UTC
Hmmm... These are all excellent ideas, but none of them seem to be quite working for me. Another way I was thinking about doing this is to look at the end of the line before this one in the data file: each line before the one I want to cull data from ends with the text "VALUES FOR". I tried this:
`($meanH1, $meanH2) = ($1, $2) if /VALUES FOR$\s+(\S+)\s+(\S+)/;`
But it doesn't seem to work. From my understanding, \s+ should also match newlines in addition to other whitespace, correct? Any suggestions?
Re^2: regular expressions query by heroin_bob (Sexton) on Jun 30, 2004 at 21:12 UTC
I just tried the following:
`#!/usr/local/perl`
`$test = " foo bar";`
`($var1, $var2) = ($1, $2) if ($test =~ /\s+(\S+)\s+(\S+)/);`
`print "Var1: $var1\nVar2: $var2";`
`exit;`
...and it grabbed the text out and printed fine, so I'm not sure what you mean when you say none of the suggestions are working for you. Can you be more specific?
If you're ever lost and need directions, ask the guy on the motorcycle.
Re^2: regular expressions query by qq (Hermit) on Jun 30, 2004 at 23:24 UTC
Post your code and a couple of lines of data. The examples given _do_ work. Perhaps we are missing a part of the problem?
`#!/usr/bin/perl`
`while ( <DATA> ) {`
`    ($a,$b) = split ' ';`
`    print "split '$a','$b'\n";`
`    my ($a,$b) = $_ =~ /(\S+)\s+(\S+)/;`
`    print "match '$a','$b'\n";`
`}`
`__DATA__`
`none bing`
`some bong`
`any bang`
output:
`split 'none','bing'`
`match 'none','bing'`
`split 'some','bong'`
`match 'some','bong'`
`split 'any','bang'`
`match 'any','bang'`
Re^2: regular expressions query by qq (Hermit) on Jun 30, 2004 at 23:35 UTC
If you are matching across newlines, are you not reading line by line? If the data is in one big string, perhaps you want something more like:
`#!/usr/bin/perl`
`my $txt = '`
`         none          bing`
`         some          bong`
`         any          bang`
`';`
`while ( $txt =~ /^ {9}(\S+) {10}(\S+)\s*$/mg ) {`
`    print "'$1' '$2'\n";`
`}`
qq
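If the file is being read one line at a time, a single pattern can never see both the "VALUES FOR" line and the line after it, which would explain why the attempt above fails. A minimal sketch of the line-by-line alternative follows: remember that the previous line was the marker, then parse the current one. The sample lines and the names `@pairs` and `$expect_data` are assumptions, since the real file layout is not shown in this thread.

```perl
use strict;
use warnings;

# Assumed sample input, standing in for lines read from the data file.
my @lines = (
    "SOME HEADER VALUES FOR\n",
    "         none          lt2dpmnt\n",
    "unrelated line\n",
);

my @pairs;            # collected [field1, field2] pairs
my $expect_data = 0;  # true when the previous line ended with "VALUES FOR"

for my $line (@lines) {
    # Only parse a line that directly follows a marker line.
    if ($expect_data && $line =~ /^\s*(\S+)\s+(\S+)/) {
        push @pairs, [$1, $2];
    }
    # Remember whether this line is the marker, for the next iteration.
    $expect_data = $line =~ /VALUES FOR\s*$/ ? 1 : 0;
}

print "$_->[0] $_->[1]\n" for @pairs;
```

With a real file the loop would be `while (my $line = <$fh>)` instead of iterating over an array; the flag logic stays the same.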
Re: regular expressions query by ercparker (Hermit) on Jun 30, 2004 at 22:48 UTC
apocalyptica wrote: "But it doesn't seem to work. From my understanding, the \s+ should also match for newline feeds in addition to whitespace, correct? Any suggestions?"
Regarding your question as to what \s will match: it matches whitespace, including spaces, tabs, carriage returns, newlines and form feeds.
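This is quick to confirm with a short script. The sketch below tests each character on its own against \s; the hash names are only illustrative.

```perl
use strict;
use warnings;

# Candidate characters, keyed by a printable name.
my %chars = (
    'space'           => " ",
    'tab'             => "\t",
    'newline'         => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
    'letter s'        => "s",
);

# Record which candidates consist of exactly one \s character.
my %is_space;
$is_space{$_} = $chars{$_} =~ /\A\s\z/ ? 1 : 0 for keys %chars;

printf "%-16s %s\n", $_, $is_space{$_} ? 'matches \s' : 'does not match \s'
    for sort keys %chars;
```

The 'letter s' entry is included as a control: it is the only one that should fail, since an unescaped s in a pattern is just a literal character.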
Re: regular expressions query by rupesh (Hermit) on Jul 01, 2004 at 06:16 UTC
chomp?