Help for awk/regex/newbie

cmenser has asked for the wisdom of the Perl Monks concerning the following question:

Re: Help for awk/regex/newbie by merlyn (Sage) on Aug 14, 2000 at 15:08 UTC
To do it literally as `awk` does it, use: `@tokens = split " ", $line;` That enables "awk emulation mode", causing leading whitespace to be ignored. Without that, leading whitespace generates an empty first element, and the first non-whitespace text becomes the second element. -- Randal L. Schwartz, Perl hacker
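To illustrate the difference described above, here is a minimal sketch that compares `split " "` with a regex pattern on the same input. The sample line and variable names are assumptions made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading whitespace.
my $line = "   alpha beta  gamma\n";

# Special case: a single-space string splits on runs of whitespace
# and skips leading whitespace, as awk does by default.
my @awk_style = split " ", $line;

# A regex pattern keeps the empty field produced by leading whitespace.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style),   " fields, first: '$awk_style[0]'\n";    # 3 fields, first: 'alpha'
print scalar(@regex_style), " fields, first: '$regex_style[0]'\n";  # 4 fields, first: ''
```

Trailing empty fields are dropped in both cases, so the newline at the end of the line does not add a field.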
RE: Help for awk/regex/newbie by Yaakov (Novice) on Aug 14, 2000 at 18:03 UTC
As you use pipes on the command line, I guess you will like the command-line switches -e, -p and -n: they allow you to write a short Perl program on the command line, right in your pipes. Here are three short solutions to your question. They differ in that the first one prints a trailing space at the end of the results while the other two don't: `xcommand | perl -wne '/(\S+)/ and print "$1 "'` `xcommand | perl -we 'print join " ", map{/(\S+)/; $1} <>'` `xcommand | perl -we 'print join " ", map{(split" ", $_)[0]}<>'` Let's explain the tools used in these three solutions. The -w switch does the bulk of the work: it tells me when I did something wrong. The -n switch in the first example wraps a loop around our main program to process the input line by line: `while(<>){ ... the code goes here ... }`. The -e switch takes the next command-line argument and executes it as the Perl program. In the first solution, the program simply says `/(\S+)/`: "Search {//} for one or more {+} non-whitespace characters {\S} and remember them all {()}". `print "$1 "`: print what you have remembered, followed by a space. Remember, this is done for every line of the input. The second solution does almost the same thing. Instead of the -n switch, we `map` over the list of all input lines `<>` and join the results with spaces. Thus, we do not print an extra space after the last field. The third solution uses the `split` function instead of a regular expression. Note: in a DOS window (UUUHHHH-OOOOHHH-EEEEEKS), the examples will not work as given because there the "shell" messes up the quotation marks. You have to use double quotes (") around your code (and you can't use them inside)!
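The `map`/`join` approach can also be written as a small script. This is only a sketch: the name `first_words` and the sample input are hypothetical, and the conditional is an addition that skips lines with no match, where `$1` would not hold a value from that line.

```perl
use strict;
use warnings;

# Return the first whitespace-delimited word of each line, joined by spaces.
# Lines without any non-whitespace character are skipped.
sub first_words {
    my @lines = @_;
    return join " ", map { /(\S+)/ ? $1 : () } @lines;
}

# Hypothetical sample input standing in for the output of xcommand.
my @sample = ("foo 1 2\n", "  bar 3\n", "\n", "baz\n");
print first_words(@sample), "\n";   # foo bar baz
```

Because `join` only puts separators between elements, there is no trailing space to clean up.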
RE: Help for awk/regex/newbie by t0mas (Priest) on Aug 14, 2000 at 15:12 UTC
`split(' ')` can be used to emulate awk's default behavior (I think...). You can try something like: `my @cleanarray; open(PH,"xcommand |") or die "Can't open xcommand: $!"; while (<PH>) {push @cleanarray, (split(' '))[0];} close (PH);` /brother t0mas
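The same piped-open idea can be sketched with a lexical filehandle and the list form of `open`, which bypasses the shell. The helper name `first_fields_of` is hypothetical, and the running Perl interpreter (`$^X`) stands in for `xcommand` so the example is self-contained. It assumes a Unix-like system or a Perl recent enough to support list-form pipe opens on the platform in use.

```perl
use strict;
use warnings;

# Run a command and collect the first field of each output line.
sub first_fields_of {
    my @command = @_;
    open(my $ph, '-|', @command) or die "Can't run $command[0]: $!";
    my @fields;
    while (my $line = <$ph>) {
        my ($first) = split ' ', $line;           # awk-style split
        push @fields, $first if defined $first;   # skip blank lines
    }
    close($ph) or warn "$command[0] exited with status $?\n";
    return @fields;
}

# Stand-in for xcommand: the running perl prints two sample lines.
my @clean = first_fields_of($^X, '-e', 'print "  one 1\n", "two 2\n"');
print "@clean\n";   # one two
```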
Re: Help for awk/regex/newbie by ColtsFoot (Chaplain) on Aug 14, 2000 at 14:51 UTC
The following snippet will print out the first token of each line in the file try.asc. The first parameter to split contains the token separators, in this case just a space (ASCII 040 octal): `$file = "try.asc"; open(MYFILE, $file) or die qq(Cannot open $file\n); $line = <MYFILE>; while ($line ne "") { @tokens = split / /, $line; print qq($tokens[0]\n); $line = <MYFILE> }` Hope this helps
RE: Re: Help for awk/regex/newbie by cmenser (Initiate) on Aug 14, 2000 at 15:10 UTC
First off, thank you for your quick response, but this is what I am currently using. Is there a way to pull the matching string out of a regex?
RE: RE: Re: Help for awk/regex/newbie by ZZamboni (Curate) on Aug 14, 2000 at 17:49 UTC
You can use the special variable $& to get the whole string that matched. Like this: `if (/^\S+/) { print "$&\n"; }` Or you can put parentheses around the part that you want to extract, and refer to the captures as $1, $2, etc. Like this: `if (/^(\S+)\s+/) { print "$1\n"; }` Both of the cases above extract any non-whitespace characters at the beginning of the line. Although if that's all you want to do, you are probably better off using split as suggested by others in this thread. It's simpler and probably more efficient. Use regular expressions if you only want to match lines that satisfy certain conditions. So to strictly emulate the shell/awk command line you gave, you can use this: `xcommand | perl -nae 'print $F[0],"\n"'` The -a flag causes it to automatically split each line into whitespace-separated fields, leaving the result in the @F array. The -n option puts a `while(<>) { ... }` loop around the code. The code itself is specified by the -e option. From your question, it seems as if you want to provide the command to execute as input to the program. In that case you could do something like this (or whatever else you need inside the loop): `my $command="xcommand"; open(CMD, "$command |") or die "Error: $!\n"; while(<CMD>) { print((split(" "))[0], "\n"); } close(CMD);` --ZZamboni
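As a complement to `$&` and `$1`, a match in list context returns its captures directly, which avoids reading the special variables after the fact. A minimal sketch, with a made-up sample line and variable names:

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = "  first second third\n";

# In list context a match returns the captured groups,
# and an empty list when the match fails.
if (my ($word) = $line =~ /(\S+)/) {
    print "first word: $word\n";      # first word: first
}

# Several groups can be captured at once.
my ($one, $two) = $line =~ /(\S+)\s+(\S+)/;
print "first two: $one, $two\n";      # first two: first, second
```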
RE: RE: Re: Help for awk/regex/newbie by DrManhattan (Chaplain) on Aug 14, 2000 at 17:33 UTC
`#!/usr/bin/perl -w use strict; my @cleanarray = <>; foreach (@cleanarray) { print "$1\n" if m/(\S+)/; }` Update: Fixed the loop to only print if the match succeeds, per merlyn. Now that I think about it though, a mock awk really ought to look like this: `#!/usr/bin/perl -w use strict; my @cleanarray = <>; foreach (@cleanarray) { m/^\s*(\S*)/; print "$1\n"; }` The regex always matches, so there's no need for a conditional. That matches awk's behavior on blank lines better than my first stab. -Matt
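The blank-line behavior is easier to see side by side. The sketch below uses made-up input and compares three patterns: a conditional `\S+`, a pattern anchored with `^\s*`, and an unanchored `\S*`. The unanchored form is worth watching, since it can match the empty string at the very start of an indented line.

```perl
use strict;
use warnings;

# Hypothetical input: a normal line, a blank line, an indented line.
my @lines = ("alpha 1\n", "\n", "  beta 2\n");

# Conditional \S+: blank lines produce no output at all.
my @skip_blank = map { /(\S+)/ ? $1 : () } @lines;

# Anchored pattern always matches; a blank line yields an empty string,
# which is what awk '{print $1}' prints for it.
my @awk_like = map { /^\s*(\S*)/; $1 } @lines;

# Unanchored \S* matches the empty string at position 0 of an indented line.
my @unanchored = map { /(\S*)/; $1 } @lines;

print join("|", @skip_blank), "\n";   # alpha|beta
print join("|", @awk_like),   "\n";   # alpha||beta
print join("|", @unanchored), "\n";   # alpha||
```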
RE: RE: RE: Re: Help for awk/regex/newbie by merlyn (Sage) on Aug 14, 2000 at 17:38 UTC
Re: Help for awk/regex/newbie by lindex (Friar) on Aug 14, 2000 at 18:05 UTC
Just my 2 cents (ignore if you like), but this is how I did it. `#!/usr/bin/perl -wn use strict; print((split(' '))[0],"\n") unless($_ =~ /^\s+$/);` lindex `/**************************/ jason@gost.net, wh@ckz.org http://jason.gost.net /***************************/`