da97mld has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I need some help with regular expressions. I have a file in which certain lines start with something like "[[106535 abc]]", and I want a regex that gives me only the lines that begin with that pattern. The number can be any number, and the characters after it can be any characters, including words. How do I do this?

regards, Mathias

Edit by castaway - added html/code tags

Re: regular expressions help
by EvdB (Deacon) on Nov 06, 2003 at 09:28 UTC
    The following will come in handy for you in the regex:
    \d - a digit
    \w - a word character (letter, digit or underscore)
    \s - whitespace
    This will allow you to put together things such as:
    foreach my $line ( @lines ) {
        next unless $line =~ m/^\d{6} \w{3}/;  # Will match '111111 aaa' where 1 is any
                                               # digit and a is any word character.
        next unless $line =~ m/^\d+\s+\w+/;    # Will match any number followed by any word.
                                               # The '^' anchors the search to the beginning
                                               # of the string.
        # Lines that reach this point matched both patterns.
    }
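    A small runnable sketch of the two patterns above, using made-up sample strings (the contents of @lines are hypothetical), shows which lines each one accepts. Note that neither pattern deals with the square brackets yet:

```perl
use strict;
use warnings;

# Hypothetical sample lines, made up for illustration only.
my @lines = ('111111 aaa', '42   hello world', 'abc 123', '12345 ab');

# Exactly six digits, one space, then three word characters.
my @six_three = grep { /^\d{6} \w{3}/ } @lines;

# One or more digits, some whitespace, then one or more word characters.
my @num_word = grep { /^\d+\s+\w+/ } @lines;

print "six/three: @six_three\n";
print "num/word:  @num_word\n";
```

    Only '111111 aaa' has six digits, so it is the sole match for the first pattern; the second accepts every line that starts with a number followed by a word.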

    --tidiness is the memory loss of environmental mnemonics

        Is that supposed to be one or two square brackets? In any case, you can match a literal square bracket in a regex by escaping it with a backslash: \[ (and \] for the closing one).

        Now you should be able to build it. Have a go.
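        Putting those pieces together, a minimal sketch might look like the following. The sub name starts_with_tag and the test strings are hypothetical, and the pattern assumes exactly two brackets around a number, a single space and one word:

```perl
use strict;
use warnings;

# Return 1 if the line begins with two literal '[', one or more digits,
# a space, one or more word characters and two literal ']'; else 0.
sub starts_with_tag {
    my ($line) = @_;
    return $line =~ /^\[\[\d+ \w+\]\]/ ? 1 : 0;
}

for my $line ('[[106535 abc]] text', 'text [[106535 abc]]', '[106535 abc] text') {
    printf "%-22s => %s\n", $line, starts_with_tag($line) ? 'match' : 'no match';
}
```

        Only the first string matches: the second has the tag in the middle (the ^ anchor rejects it), and the third has single brackets.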

        C.

        /[[]/ matches a single [, and /[]]/ matches a single ].
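        This can be checked directly. The sketch below (the names $open, $close and $tag are illustrative only) rebuilds the "starts with [[digits word]]" test using character classes instead of backslashes:

```perl
use strict;
use warnings;

# Inside a character class '[' needs no escape, and ']' is taken
# literally when it is the first character of the class.
my $open  = qr/[[]/;
my $close = qr/[]]/;

# Same test as /^\[\[\d+ \w+\]\]/, written with the classes above.
my $tag = qr/^$open$open\d+ \w+$close$close/;

print "open:  ", ('[' =~ $open  ? "yes" : "no"), "\n";
print "close: ", (']' =~ $close ? "yes" : "no"), "\n";
print "tag:   ", ('[[106535 abc]] rest' =~ $tag ? "yes" : "no"), "\n";
```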

        Abigail

Re: regular expressions help
by Art_XIV (Hermit) on Nov 06, 2003 at 16:05 UTC

    Here's a little script that might illustrate some of the concepts presented by the other monks:

    use strict;
    use warnings;

    my @data = <DATA>;

    print "1:\n";
    # 1: print lines that contain something like '[[106535 abc]]'
    foreach my $line (@data) {
        print $line if $line =~ /\[\[\d+ \w+\]\]/;
    }

    print '-' x 20, "\n2:\n";
    # 2: print lines that start with something like '[[106535 abc]]', but
    #    with an indeterminate number of brackets
    foreach my $line (@data) {
        print $line if $line =~ /^\[+\d+ \w+\]+/;
    }

    print '-' x 20, "\n3:\n";
    # 3: print lines that start with something like '[[106535 abc]]'
    foreach my $line (@data) {
        print $line if $line =~ /^\[\[\d+ \w+\]\]/;
    }

    print '-' x 20, "\n4:\n";
    # 4: print lines that start with two brackets, six digits,
    #    a space, three word characters and two brackets
    foreach my $line (@data) {
        print $line if $line =~ /^\[\[\d{6} \w{3}\]\]/;
    }

    print '-' x 20, "\n5:\n";
    # 5: print the value inside the brackets and then
    #    the rest of the line
    foreach my $line (@data) {
        print "$1 - $2\n" if $line =~ /^\[\[(\d+ \w+)\]\]\s(.+)/;
    }

    __DATA__
    [[106535 abc]] blah blah blah...
    [[298727 xyz etc etc etc etc etc
    [[093 hij]] so on and so on and so on
    [[459313 def] more more more more
    459313 yadda yadda yadda yadda yadda
    [[349581 wxy]] yack yack yack yack yack
    they're coming to [[549412 qqq]] me away
    ipsum plurum ipsum plurum ipsum plurum
    ... ... ... ...
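    The original question mentions reading from a file and says the text after the number can be "any character including words". One possible variation (an assumption about what was meant, not taken from the replies above) reads line by line and allows anything except a closing bracket inside the tag. The sub name tagged_lines and the sample text are hypothetical, and an in-memory filehandle stands in for the real file:

```perl
use strict;
use warnings;

# Return the lines from $fh that begin with "[[<digits> <text>]]", where
# <text> is assumed to be anything except ']' (so it may hold several
# words or punctuation).
sub tagged_lines {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        push @found, $line if $line =~ /^\[\[\d+ [^\]]+\]\]/;
    }
    return @found;
}

# Hypothetical input; with a real file you would instead write
#   open my $in, '<', 'yourfile.txt' or die "Cannot open: $!";
# where 'yourfile.txt' is a placeholder name.
my $text = "[[106535 abc]] one\n[[7 two words]] two\nno [[1 x]] three\n[[12 abc] four\n";
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print for tagged_lines($in);
close $in;
```

    With this sample text only the first two lines are printed: the third does not start with the tag, and the fourth has only one closing bracket.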
    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"