
Team

I need help fixing the problem below; I have not been able to find a solution.

I am trying to write a program that extracts every piece of text enclosed in <BIB> tags.

The problem is this: when my match loop is

while ($data1 =~ m{(<BIB>.*</BIB>)}gx)

the output comes as

<BIB>Falco (2012)</BIB> today Louise is hardly isolated. More than 5 million babies have been born using the procedure, which has become almost routine. And at the age of 28, Louise became a mother herself, giving birth to a baby boy name Cameron—conceived, by the way, in the old-fashioned way (<BIB>Falco, 2012</BIB>; <BIB>ICMRT, 2012</BIB>
Total occurrences of <BIB> is 1

which is not what I want.
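The single long match above is what a greedy `.*` produces: it runs to the end of the line and backtracks only as far as the last `</BIB>`. A minimal sketch of the difference, assuming an abridged copy of the input line (the variable names `$line`, `@greedy`, and `@lazy` are made up for illustration), compares it with the non-greedy `.*?`:

```perl
use strict;
use warnings;

# Abridged version of the input line from this post (assumption: shortened for the demo).
my $line = 'In fact, <BIB>Falco (2012)</BIB> today Louise is hardly isolated '
         . '(<BIB>Falco, 2012</BIB>; <BIB>ICMRT, 2012</BIB>).';

# In list context, m//g returns every captured group at once.
my @greedy = $line =~ m{(<BIB>.*</BIB>)}g;    # .* grabs up to the LAST </BIB>
my @lazy   = $line =~ m{(<BIB>.*?</BIB>)}g;   # .*? stops at the NEAREST </BIB>

printf "greedy: %d match(es)\n", scalar @greedy;
printf "lazy:   %d match(es)\n", scalar @lazy;
print "$_\n" for @lazy;
```

If the tags can never nest, `[^<]*` in place of `.*?` should behave the same way on this input.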

When I change the pattern to this

while ($data1 =~ m{(<BIB>)}gx)

I get something closer: the match count at least equals the number of <BIB> tags in the input, but only the opening tag itself is captured.

What I want is each entry saved as a separate array element:

<BIB>Falco (2012)</BIB>

<BIB>Falco, 2012</BIB>

<BIB>ICMRT, 2012</BIB>

use strict;
use 5.14.2;

my $bib_count = 0;
my $INPUT_REF_FH;
my @text_found;
open $INPUT_REF_FH, "<:utf8", "ch01.txt" or die "Cannot open ch01.txt: $!";
binmode STDOUT, ':utf8';
while(<$INPUT_REF_FH>){
    my $data1 = $_;
    while ($data1 =~ m{(<BIB>.*</BIB>)}gx){
        $bib_count += 1;
#        print "$&\n";
        push @text_found, ${^MATCH}; 
    };
};
foreach (@text_found){
    print "$_\n";
};
print "Total occurrences of <BIB> is $bib_count";
close $INPUT_REF_FH;
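One possible adjustment to the loop above, offered as a sketch rather than the definitive fix: make the quantifier non-greedy and push `$1` instead of `${^MATCH}` (before Perl 5.20, `${^MATCH}` is only guaranteed to be set when the match uses the `/p` modifier). To keep the example self-contained it reads from an in-memory filehandle over an abridged, slightly reworded copy of the input; `$sample` and `$fh` are illustrative names, and the real script would keep opening `ch01.txt`.

```perl
use strict;
use warnings;

# Abridged stand-in for ch01.txt (assumption: shortened and split over two lines).
my $sample = <<'END_TEXT';
In fact, <BIB>Falco (2012)</BIB> today Louise is hardly isolated.
Conceived in the old-fashioned way (<BIB>Falco, 2012</BIB>; <BIB>ICMRT, 2012</BIB>).
END_TEXT

# Open a read handle on the string instead of a file on disk.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";

my @text_found;
while (my $line = <$fh>) {
    # .*? stops at the nearest </BIB>; $1 holds the captured entry.
    while ($line =~ m{(<BIB>.*?</BIB>)}g) {
        push @text_found, $1;
    }
}
close $fh;

print "$_\n" for @text_found;
print "Total occurrences of <BIB> is ", scalar(@text_found), "\n";
```

Because the file is read line by line, this only finds entries whose opening and closing tags sit on the same line, which holds for the sample input.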

INPUT TEXT:

In fact, <BIB>Falco (2012)</BIB> today Louise is hardly isolated. More than 5 million babies have been born using the procedure, which has become almost routine. And at the age of 28, Louise became a mother herself, giving birth to a baby boy name Cameron—conceived, by the way, in the old-fashioned way (<BIB>Falco, 2012</BIB>; <BIB>ICMRT, 2012</BIB>).

In reply to Extract Data between Tags by ppremkumar
