Miraculix has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm relatively new to Perl and I'm struggling with this regex problem:

Assume I have an ASCII file containing these lines:


c114: 0245, 0456, 1545

2555, 2444, 0344
0444, 3434, 1434


r145: 0544, 0688, 2988

1332, 0221, 0867
0655, 4548, 7463


c12: 2322, 0556, 3998

3545, 2002, 4500
5650, 0830, 3324
3433, 7070, 3404


It's easy to find the expression "r145:", but how do I delete the complete paragraph starting with "r145:", including all subsequent lines in that paragraph that start with a tab? In my example the paragraph has 3 lines, but it could just as well have 2 lines, or 6, or any other number.

Thanks

  • Comment on Regex: how to consider an unknown number of tabs?

Re: Regex: how to consider an unknown number of tabs?
by toolic (Bishop) on Apr 04, 2010 at 14:51 UTC
    The Easter Bunny left you some code...
    use warnings;
    use strict;

    my $flag = 0;
    while (<DATA>) {
        if (/^[^\t]/) {
            if (/^r145:/) {
                $flag = 1;
            }
            else {
                $flag = 0;
            }
        }
        print unless $flag;
    }

    __DATA__
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    r145: 0544, 0688, 2988
    	1332, 0221, 0867
    	0655, 4548, 7463

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404

    Here is the output:

    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434


    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404
Re: Regex: how to consider an unknown number of tabs?
by BrowserUk (Patriarch) on Apr 04, 2010 at 15:04 UTC

    See perlvar for 'paragraph mode' (setting $/ = '' or -000):

    C:\test>type junk.dat
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    r145: 0544, 0688, 2988
    	1332, 0221, 0867
    	0655, 4548, 7463

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404

    C:\test>perl -000nle"/^r145/ or print" junk.dat
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Regex: how to consider an unknown number of tabs?
by bichonfrise74 (Vicar) on Apr 04, 2010 at 16:54 UTC
    Consider checking out $/, the input record separator, in perlvar.
    #!/usr/bin/perl

    use strict;

    local $/ = "\n\n";

    while (<DATA>) {
        print if ( ! /^r145/ );
    }

    __DATA__
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    r145: 0544, 0688, 2988
    	1332, 0221, 0867
    	0655, 4548, 7463

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404
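Two settings of $/ appear in this thread, and per perlvar they are not quite the same: `$/ = ''` (paragraph mode) treats two or more consecutive empty lines as a single terminator, while `$/ = "\n\n"` ends a record after exactly two newlines, so any extra newline becomes the start of the next record. A minimal sketch of that difference follows; the helper name `read_records` and the sample text are made up for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records, using $sep as $/.
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two paragraphs separated by TWO empty lines (three newlines in a row).
my $text = "a: 1\n\t2\n\n\nr145: 3\n\t4\n";

my @para    = read_records($text, '');       # paragraph mode
my @literal = read_records($text, "\n\n");   # literal two-newline separator

# With "\n\n" the leftover newline begins the second record,
# so an anchored /^r145/ no longer matches it.
for my $set ([ paragraph => \@para ], [ literal => \@literal ]) {
    my ($name, $records) = @$set;
    my $kept = grep { !/^r145/ } @$records;
    print "$name: kept $kept of ", scalar(@$records), " records\n";
}
```

If the input may contain runs of more than one empty line between paragraphs, paragraph mode is the more forgiving choice; with exactly one empty line between paragraphs, both settings filter the same records.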
Re: Regex: how to consider an unknown number of tabs?
by balakrishnan (Monk) on Apr 04, 2010 at 16:05 UTC
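    You can also try the code below.
    use strict;
    use warnings;
    use IO::File;

    # "input_file" is the name of the data file
    my $fh = IO::File->new("input_file", "r")
        or die "Cannot open input_file: $!";
    my @lines = <$fh>;

    my $pattern = "r145";
    my ($para, @output);

    foreach my $line (@lines) {
        if ($line =~ /^(\S+)/) {          # header line: starts with non-whitespace
            if ($line =~ /^$pattern/) {
                $para = undef;            # entering the paragraph to delete
            }
            else {
                $para = $1;               # entering a paragraph to keep
                push @output, $line;
            }
        }
        elsif (defined $para) {           # continuation line of a kept paragraph
            push @output, $line;
        }
    }
    print @output;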
Re: Regex: how to consider an unknown number of tabs?
by Gangabass (Vicar) on Apr 04, 2010 at 14:30 UTC
    perldoc split

    But can you explain what (and why) you need?

Re: Regex: how to consider an unknown number of tabs?
by PeterPeiGuo (Hermit) on Apr 04, 2010 at 16:38 UTC

    Since you already know how to use a regexp to find the rows that start with r, you can find the c rows in the same way, and that is where the deletion stops.

    That's essentially what a couple of the replies above do.
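The "deletion stops at the next header row" idea can also be written as a single substitution over the whole file content, which is closer to what the original question asks for. This is only a sketch under assumptions: the helper name `delete_paragraph` is hypothetical, the file is assumed to be small enough to hold in one string, and continuation lines are assumed to follow the header directly and to start with a tab.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the paragraph whose header is "$label:".
# Assumes continuation lines directly follow the header and start with a tab.
sub delete_paragraph {
    my ($text, $label) = @_;
    $text =~ s{
        ^\Q$label\E: .* \n      # the header line
        (?: \t .* \n )*         # any number of lines starting with a tab
    }{}xm;
    return $text;
}

my $text = join '',
    "c114: 0245, 0456, 1545\n", "\t2555, 2444, 0344\n",
    "r145: 0544, 0688, 2988\n", "\t1332, 0221, 0867\n", "\t0655, 4548, 7463\n",
    "c12: 2322, 0556, 3998\n",  "\t3545, 2002, 4500\n";

print delete_paragraph($text, 'r145');
```

The `(?: \t .* \n )*` group is what handles the unknown number of tab lines: it matches zero or more of them and stops at the first line that does not begin with a tab.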

Re: Regex: how to consider an unknown number of tabs?
by biohisham (Priest) on Apr 05, 2010 at 08:50 UTC
    Change "$/", which defines what counts as the end of a record. Set to the empty string, it makes the empty lines the record terminators. Since nothing is chomped, each record keeps its line structure and its trailing newlines; use chomp (rather than chop, which removes only a single character) if you want to strip the trailing newline characters from each record.
    use strict;
    use warnings;

    local $/ = '';    # Empty lines are the record terminators
    while (<DATA>) {
        # print if !/^r145/;
        print unless /^r145/;
    }

    __DATA__
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    r145: 0544, 0688, 2988
    	1332, 0221, 0867
    	0655, 4548, 7463

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404
    OUTPUTS:
    c114: 0245, 0456, 1545
    	2555, 2444, 0344
    	0444, 3434, 1434

    c12: 2322, 0556, 3998
    	3545, 2002, 4500
    	5650, 0830, 3324
    	3433, 7070, 3404
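To illustrate stripping the trailing newlines: in paragraph mode, chomp removes all trailing newlines from a record, not just one. The sketch below uses a made-up helper name (`chomped_paragraphs`), made-up sample text, and an in-memory filehandle in place of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text in paragraph mode and chomp each record.
sub chomped_paragraphs {
    my ($text) = @_;
    local $/ = '';                    # paragraph mode
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    chomp(my @records = <$fh>);       # with $/ = '', strips every trailing newline
    close $fh;
    return @records;
}

my @records = chomped_paragraphs(
    "c114: 0245\n\t2555\n\nr145: 0544\n\t1332\n\nc12: 2322\n"
);

# Put exactly one empty line back between the records that are kept.
print join("\n\n", grep { !/^r145/ } @records), "\n";
```

Chomping first and re-joining with "\n\n" gives control over the spacing of the output, whatever the spacing of the input was.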


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.