shylaja has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have some data that looks something like this:
-----------------------------------------------------------
ISSUE ABCDE00078945
-----------------------------------------------------------

Summary : summary about the issue

Component : Component desscription
Product : Product name
Version : Version number
Date : 2013-10-15

-----------------------------------------------------------
ISSUE ABCDE00012345
-----------------------------------------------------------

Summary : summary about the issue

Component : Component description
Product : Product name
Version : Version number
Date : 2014-10-15

The data above may contain special characters.
Each "ISSUE" block, i.e.,
-----------------------------------------------------------
ISSUE ABCDE00078945
-----------------------------------------------------------

Summary : summary about the issue

Component : Component desscription
Product : Product name
Version : Version number
Date : 2013-10-15
should be parsed in one iteration. I am using this pattern:

$text =~ /(ISSUE\s+\w{5}\d+)\n-*\n\s*([\S\s]*?)---*/g;

where $text holds the data shown above. But the script does not return the set of issues. Is there something wrong with the pattern?

Replies are listed 'Best First'.
Re: matching characters
by Athanasius (Archbishop) on Sep 03, 2014 at 07:16 UTC

    Hello shylaja,

    As McA has shown, you can use the /g modifier in this way if you first read in the whole data file. But if the file is large, it may be necessary to process it line by line. For that, you need a different strategy; for example:

    #! perl
    use strict;
    use warnings;
    use Data::Dump;

    my @fields = qw( Summary Component Product Version Date );
    my @issues;

    while (<DATA>)
    {
        chomp;
        if (/^ISSUE\s+(\w{5}\d+)$/)
        {
            push @issues, { Id => $1 };
        }
        else
        {
            for my $field (@fields)
            {
                if (/^$field\s*:\s+(.+)$/)
                {
                    $issues[-1]->{$field} = $1;
                    last;
                }
            }
        }
    }

    dd \@issues;

    __DATA__
    -----------------------------------------------------------
    ISSUE ABCDE00078945
    -----------------------------------------------------------

    Summary : summary about the issue

    Component : Component desscription
    Product : Product name
    Version : Version number
    Date : 2013-10-15

    -----------------------------------------------------------
    ISSUE ABCDE00012345
    -----------------------------------------------------------

    Summary : summary about the issue

    Component : Component description
    Product : Product name
    Version : Version number
    Date : 2014-10-15

    Output:

    17:02 >perl 996_SoPW.pl
    [
      {
        Component => "Component desscription ",
        Date      => "2013-10-15",
        Id        => "ABCDE00078945",
        Product   => "Product name",
        Summary   => "summary about the issue",
        Version   => "Version number",
      },
      {
        Component => "Component description ",
        Date      => "2014-10-15",
        Id        => "ABCDE00012345",
        Product   => "Product name",
        Summary   => "summary about the issue",
        Version   => "Version number",
      },
    ]

    17:02 >
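The trailing blank in the Component values above most likely comes from the input line itself: `(.+)` is greedy and keeps any whitespace before the end of the line. The following is a minimal sketch of one way to drop it, assuming the data can carry trailing blanks. `parse_issues` is a hypothetical helper name, not part of the original script, and it reads from any filehandle rather than from DATA. It also skips field lines that appear before the first ISSUE header, since assigning through `$issues[-1]` on an empty array would die.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): parse issue records
# from an open filehandle, one line at a time.
sub parse_issues {
    my ($fh) = @_;
    my @fields = qw( Summary Component Product Version Date );
    my @issues;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ISSUE\s+(\w{5}\d+)\s*$/) {
            push @issues, { Id => $1 };
        }
        elsif (@issues) {    # ignore field lines before the first ISSUE header
            for my $field (@fields) {
                # (.*\S) must end on a non-space character,
                # so trailing blanks stay out of the capture
                if ($line =~ /^\Q$field\E\s*:\s+(.*\S)\s*$/) {
                    $issues[-1]{$field} = $1;
                    last;
                }
            }
        }
    }
    return \@issues;
}

# Usage with an in-memory filehandle; the Component line ends in a blank.
my $data = "-----\nISSUE ABCDE00012345\n-----\n\n"
         . "Summary : summary about the issue\n\n"
         . "Component : Component description \n"
         . "Date : 2014-10-15\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $issues = parse_issues($fh);
close $fh;

print "[$issues->[0]{Component}]\n";
```

For a real file, the same helper can be called with a handle from `open my $fh, '<', $path`.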

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      If one drops the check of the field names, this can also be written more concisely. TIMTOWTDI!

      my @issues;
      while (<DATA>) {
          push @issues, { Id => $1 } if /^ISSUE\s+(\w{5}\d+)$/;
          $issues[-1]->{$1} = $2 if /^(\w+)\s*:\s+(.+)$/
      }
      dd \@issues;

        Thanks all for your replies.

Re: matching characters
by McA (Priest) on Sep 03, 2014 at 06:59 UTC

    Hi,

    Try the matching modifier s, and loop over the matches:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string = <<EOF;
    -----------------------------------------------------------
    ISSUE ABCDE00078945
    -----------------------------------------------------------

    Summary : summary about the issue

    Component : Component desscription
    Product : Product name
    Version : Version number
    Date : 2013-10-15

    -----------------------------------------------------------
    ISSUE ABCDE00012345
    -----------------------------------------------------------

    Summary : summary about the issue

    Component : Component description
    Product : Product name
    Version : Version number
    Date : 2014-10-15

    The above pattern will be containing special characters.
    Each set of "ISSUE" i e.,
    -----------------------------------------------------------
    ISSUE ABCDE00078945
    -----------------------------------------------------------

    Summary : summary about the issue

    Component : Component desscription
    Product : Product name
    Version : Version number
    Date : 2013-10-15
    EOF

    while ($string =~ /(ISSUE\s+\w{5}\d+)\n-*\n\s*([\S\s]*?)---*/gs) {
        print "$1\n";
    }
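A note on the pattern itself: /s only changes what `.` matches, and this pattern contains no `.`, so what makes the difference in the code above is most likely the `while` loop, which steps through the /g matches one at a time. Also, a record that is not followed by a dashed line (such as the last one in the data) is not matched, because the pattern requires `---*` after the body. Below is a minimal sketch of one way to pick up that last record as well, assuming every record starts with an `ISSUE` line followed by a dashed line. The lookahead, the `@records` layout and the trimmed sample data are illustrative choices, not part of the original post.

```perl
use strict;
use warnings;

# Trimmed sample data (assumption: same layout as in the question).
my $text = <<'EOF';
-----------------------------------------------------------
ISSUE ABCDE00078945
-----------------------------------------------------------

Summary : summary about the issue
Date : 2013-10-15

-----------------------------------------------------------
ISSUE ABCDE00012345
-----------------------------------------------------------

Summary : summary about the issue
Date : 2014-10-15
EOF

my @records;
# The body is matched lazily up to the next dashed line or the end of the text.
# /m lets ^ and $ match at line boundaries, /s lets . match newlines,
# and the lookahead leaves the next dashed line unconsumed.
while ($text =~ /^(ISSUE\s+\w{5}\d+)\n-+\n\s*(.*?)\s*(?=^-{2,}$|\z)/gms) {
    push @records, { header => $1, body => $2 };
}

print "$_->{header}\n" for @records;
```

Here /s does matter, because the body is written as `.*?` rather than `[\S\s]*?`. Anchoring the lookahead with `^` keeps the single hyphens inside the dates from ending a record early.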

    Regards
    McA