stallion has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract some tags that appear between start and end markers in a file. For example, the file format is:

    [start]
    bbc_arc_001
    bbc_arc_002
    abc_arc_001
    [end]
    bbc_arc_001
    bbc_arc_002
    bbc_arc_003
    bbc_arc_004

I want bbc_arc_001 and bbc_arc_002 extracted from the file, i.e. the tags that are present between start and end.

I have the regex for extracting the tags from the whole file, but how do I restrict my search to the region between start and end? Thanks, Monks.

The snippet is:

    if ($_ =~ /$bbc_Prefix[a-zA-Z]*[0-9]*_*\d+/i) {

I have opened the file and read its whole contents, and $_ contains the data.

Replies are listed 'Best First'.
Re: Extract Tags between Two strings
by NetWallah (Canon) on Jun 11, 2012 at 14:35 UTC
    Look at the "Range" or "flip-flop" operator in perlop.

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

Re: Extract Tags between Two strings
by fishmonger (Chaplain) on Jun 11, 2012 at 15:27 UTC
    Here's an example using the flip-flop (range) operator.

    #!/usr/bin/perl
    use 5.10.0;
    use strict;
    use warnings;

    my %wanted;

    while (my $line = <DATA>) {
        if ( $line =~ /\[start\]/ .. $line =~ /\[end\]/) {
            next if ($line =~ /\[start\]/ or $line =~ /\[end\]/);
            chomp $line;
            $wanted{$line}++;
        }
    }

    say $_ for sort keys %wanted;

    __DATA__
    [start]
    bbc_arc_001
    bbc_arc_002
    abc_arc_001
    [end]
    bbc_arc_001
    bbc_arc_002
    bbc_arc_003
    bbc_arc_004

Re: Extract Tags between Two strings
by ww (Archbishop) on Jun 11, 2012 at 13:57 UTC
Re: Extract Tags between Two strings
by Athanasius (Archbishop) on Jun 11, 2012 at 14:58 UTC

    The following makes a single pass over the input file (I have used in-file DATA for convenience; you will, of course, have to change this to read from your data file):

    use strict;
    use warnings;
    use feature 'say';    # needed for say() below

    my (%tags_to_match, @extracted_tags);
    my $in_matching = 0;
    my $tag_prefix  = 'bbc_';
    my $tag_regex   = qr{ ( $tag_prefix \w+ _ \d+ ) }x;

    while (my $line = <DATA>) {
        if ($in_matching) {
            # Inside the block: either leave it or record a tag to look for
            if ($line =~ / ^ \s* \[ end \] \s* $ /x) {
                $in_matching = 0;
            }
            elsif ($line =~ $tag_regex) {
                $tags_to_match{ $1 }++;
            }
        }
        elsif ($line =~ / ^ \s* \[ start \] \s* $ /x) {
            $in_matching = 1;
        }
        elsif ($line =~ $tag_regex) {
            # Outside the block: keep the tag only if the block listed it
            my $tag = $1;
            foreach (keys %tags_to_match) {
                if ($tag eq $_) {
                    push @extracted_tags, $tag;
                    last;
                }
            }
        }
    }

    say "\@extracted_tags = ", join(', ', @extracted_tags);

    __DATA__
    [start]
    bbc_arc_001
    bbc_arc_002
    abc_arc_001
    [end]
    bbc_arc_001
    bbc_arc_002
    bbc_arc_003
    bbc_arc_004

    This should work provided the tags to be extracted always appear after the start/end block in which they are specified. If this is not the case for your input file, you will need to make two passes over the file: the first to read the contents of the start/end block(s), the second to extract the specified tags.
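
    The two-pass variant described above might look roughly like the sketch below. It is only an illustration under some assumptions: the helper name extract_listed_tags is hypothetical, the markers are assumed to sit on lines of their own, and the lines are held in an array rather than re-read from the file, which a real script could replace with a second read of the data file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a reference to an array
# of lines plus a tag prefix, and returns the tags found outside the
# [start]/[end] blocks that are also listed inside such a block,
# regardless of which appears first in the input.
sub extract_listed_tags {
    my ($lines, $tag_prefix) = @_;
    my $tag_regex = qr{ ( \Q$tag_prefix\E \w+ _ \d+ ) }x;

    # Pass 1: collect the tags listed inside [start] ... [end] blocks.
    my %listed;
    my $in_block = 0;
    for my $line (@$lines) {
        if    ($line =~ /^\s*\[start\]\s*$/)     { $in_block = 1 }
        elsif ($line =~ /^\s*\[end\]\s*$/)       { $in_block = 0 }
        elsif ($in_block && $line =~ $tag_regex) { $listed{$1} = 1 }
    }

    # Pass 2: keep tags outside the blocks that were listed in pass 1.
    my @extracted;
    $in_block = 0;
    for my $line (@$lines) {
        if    ($line =~ /^\s*\[start\]\s*$/) { $in_block = 1 }
        elsif ($line =~ /^\s*\[end\]\s*$/)   { $in_block = 0 }
        elsif (!$in_block && $line =~ $tag_regex) {
            push @extracted, $1 if $listed{$1};
        }
    }
    return @extracted;
}

# Made-up sample input in which the block comes after some of the tags.
my @lines = (
    "bbc_arc_001\n", "bbc_arc_003\n",
    "[start]\n", "bbc_arc_001\n", "bbc_arc_002\n", "abc_arc_001\n", "[end]\n",
    "bbc_arc_002\n",
);
my @result = extract_listed_tags(\@lines, 'bbc_');
print join(', ', @result), "\n";
```

    With this sample input the tag bbc_arc_001 is picked up even though it appears before the block, which the single-pass version would miss.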

    Also note that your regex may not be doing what you wanted. [a-zA-Z]*[0-9]*_* means: zero or more letters, followed by zero or more digits, followed by zero or more underscores. In my code I use a regex which is a guess at what was intended.
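
    To make the difference concrete, here is a small comparison of the two patterns. It is a sketch, not a statement of what the original poster needs: the value of $bbc_Prefix is assumed (the question does not show it), and the candidate strings are made up. The braces in ${bbc_Prefix} force Perl to read the following [a-zA-Z] as a character class rather than as a subscript of an array @bbc_Prefix.

```perl
use strict;
use warnings;

my $bbc_Prefix = 'bbc_';    # assumed value; not shown in the question

# The pattern from the question, and the guess used in the reply above.
my $loose = qr/${bbc_Prefix}[a-zA-Z]*[0-9]*_*\d+/i;
my $guess = qr/${bbc_Prefix}\w+_\d+/;

# Record which made-up candidate strings each pattern accepts.
my %matches;
for my $candidate (qw(bbc_arc_001 bbc_7 bbc_arc_x_001)) {
    $matches{$candidate} = {
        loose => ($candidate =~ $loose ? 1 : 0),
        guess => ($candidate =~ $guess ? 1 : 0),
    };
    printf "%-14s loose=%d guess=%d\n",
        $candidate, @{ $matches{$candidate} }{qw(loose guess)};
}
```

    Both patterns accept bbc_arc_001. The pattern from the question also accepts bbc_7, because every part except the final \d+ is optional, while it rejects bbc_arc_x_001, which the guessed pattern accepts.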

    HTH,

    Athanasius <°(((><contra mundum

Re: Extract Tags between Two strings
by ansh batra (Friar) on Jun 11, 2012 at 15:04 UTC

    Iterate over the file line by line.
    Use a regular expression to detect start; when it matches, set the $found variable to 1.
    Use a regular expression to detect end; when it matches, set the $found variable to 0.
    If $found is 1, save the line.
    After the iteration is complete, use this to get the unique result.
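
    The steps above could be sketched as follows. This is an illustration only: the helper name lines_between_markers and the sample input are made up, and the %seen hash is one common way to drop duplicates, assumed here because the technique the reply pointed to is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread) that follows the steps above
# for a list of input lines.
sub lines_between_markers {
    my @lines = @_;
    my $found = 0;    # 1 while we are between [start] and [end]
    my %seen;         # lines already saved (assumed de-duplication method)
    my @saved;

    for my $line (@lines) {
        chomp(my $text = $line);
        if    ($text =~ /\[start\]/) { $found = 1 }
        elsif ($text =~ /\[end\]/)   { $found = 0 }
        elsif ($found) {
            # Save each line only the first time it is seen.
            push @saved, $text unless $seen{$text}++;
        }
    }
    return @saved;
}

# Made-up sample input with a repeated tag inside the block.
my @input = map { "$_\n" }
    qw([start] bbc_arc_001 bbc_arc_002 abc_arc_001 bbc_arc_001 [end] bbc_arc_003);
my @unique = lines_between_markers(@input);
print "$_\n" for @unique;
```

    Note that this saves every line between the markers, including abc_arc_001; to keep only the bbc_ tags, the saved lines would still need to be filtered with the tag regex.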