PerlMonks  

Re^2: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf

by VladP (Novice)
on Oct 18, 2021 at 16:19 UTC [id://11137688]


in reply to Re: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf
in thread read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf

Thanks. It worked, but now I need to change the regex to search for <endnote id= in the input.txt file. I changed the if statement to reflect the change, but now I am getting the "Use of uninitialized value" error again.

input.txt

<endnote id=(1)>Text...</endnote>
<endnote id=(2)>Text...</endnote>

if ( $tag =~ m/<endnote id=/ ) {
    $tags{ sprintf("%04d%6s", $1 || 999, $2) } = $tag;
}
else {
    warn "Failed to match: $tag";
}
Re^3: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf
by choroba (Cardinal) on Oct 18, 2021 at 16:28 UTC
    The parentheses in regular expressions define so-called "capture groups". Originally, you had two capture groups:
    m/<tagid=(\d*)([[:alpha:]]*)>/
             <~~~><~~~~~~~~~~~~>
              #1        #2
    The variables $1 and $2 contain the parts matched by the respective capture groups. \d* means "zero or more digits" and [[:alpha:]]* means "zero or more alphabetic characters". In your new regex, you don't have any parentheses:
    m/<endnote id=/
    So $1 and $2 aren't populated.
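A minimal sketch of the difference, using one of the sample tags from the question: after a successful match against a pattern with no parentheses, $1 is undef, whereas a pattern with capture groups fills both groups. The second pattern (with \( and \) matching literal parentheses) is only an illustration chosen for this one sample, not a fix for every id format discussed in the thread.

```perl
use strict;
use warnings;

my $tag = '<endnote id=(1)>Text...</endnote>';

# No parentheses in the pattern: the match succeeds but captures nothing,
# so $1 is undef. Passing it on to sprintf is what triggers the warning.
my $plain_matched = $tag =~ m/<endnote id=/;
my $first_capture = $1;
print defined $first_capture ? "\$1 = $first_capture\n" : "\$1 is undef\n";

# Two capture groups: \( and \) match literal parentheses around the id,
# while the unescaped ( ) pairs capture the digits and any letters after them.
# List context returns the captures directly instead of going through $1/$2.
my ($num, $suffix) = $tag =~ m/<endnote id=\(?(\d*)([[:alpha:]]*)\)?>/;
print "first group: '$num', second group: '$suffix'\n";
```

Assigning the match in list context, as in the last statement, also makes it obvious at the point of use how many values the pattern is expected to capture.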


      OK, now I understand. I'm thinking this is going to get a little complicated given the possible differences in the @ID values: an ID may or may not have parentheses, and it could have only a closing parenthesis, so the regex has to match all of these variants.

      I tried adding opening/closing parentheses and it still won't match. I'm not a regex expert.

      if ( $tag =~ m/<endnote id=((\d*)([[:alpha:]]*))>/ ) {

      Input could be:

      <endnote id=(1)>Text...</endnote>
      <endnote id=(2)>Text...</endnote>
      
      <endnote id=1)>Text...</endnote>
      <endnote id=2)>Text...</endnote>
      
      <endnote id=1.>Text...</endnote>
      <endnote id=2.>Text...</endnote>
      
      <endnote id=1a>Text...</endnote>
      <endnote id=2cb>Text...</endnote>
      
      <endnote id=a.1>Text...</endnote>
      <endnote id=a.2>Text...</endnote>
      
      etc...
      
      

        That's a pretty strange format. It might be easier to work with if you do it in two steps, first matching everything that isn't a > with [^>]+ and then cleaning up the value:

        use warnings;
        use strict;
        use Data::Dump qw/dd pp/;

        while ( my $tag = <DATA> ) {
            chomp($tag);
            next unless $tag =~ /\S/;  # skip blank lines
            if ( my ($id) = $tag =~ /<endnote id=([^>]+)>/ ) {
                $id =~ s/\W+//g;  # remove all non-word characters
                print pp($tag), " -> ", pp($id), "\n";
            }
            else { warn "Couldn't match ".pp($tag) }
        }

        __DATA__
        <endnote id=(1)>Text...</endnote>
        <endnote id=(2)>Text...</endnote>
        <endnote id=1)>Text...</endnote>
        <endnote id=2)>Text...</endnote>
        <endnote id=1.>Text...</endnote>
        <endnote id=2.>Text...</endnote>
        <endnote id=1a>Text...</endnote>
        <endnote id=2cb>Text...</endnote>
        <endnote id=a.1>Text...</endnote>
        <endnote id=a.2>Text...</endnote>

        Note that your examples aren't very consistent: your regex so far only matches digits followed by [[:alpha:]] characters, so it's unclear what you expect for id=a.1. You'll have to provide some representative sample input, along with the expected output for that input, if you want answers that actually address your problem fully.

        Btw, you'll probably want to have a look at perlretut and perlrequick.
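As a sketch of how a cleaned id could feed back into the original sprintf("%04d%6s", ...) key: the hypothetical helper sort_key below splits the cleaned id into its leading digits and the remainder, and substitutes 999 when there are no leading digits, so neither argument to sprintf is ever undef. The key format is copied from the original code; whether it sorts ids such as a1 where you want them is an assumption you would need to check against your expected output.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "%04d%6s" key from the original code out of
# an id that has already been cleaned (e.g. "1", "2cb", "a1"). Both sprintf
# arguments are always defined, so there is no "uninitialized value" warning.
sub sort_key {
    my ($id) = @_;
    my ($num, $rest) = $id =~ /^(\d*)(.*)$/;   # matches any single-line id
    $num = 999 unless length $num;             # fallback similar to "$1 || 999"
    return sprintf( "%04d%6s", $num, $rest );
}

my %tags;
for my $tag ( '<endnote id=(2)>Text...</endnote>',
              '<endnote id=1a>Text...</endnote>',
              '<endnote id=(2)>Text...</endnote>' ) {   # deliberate duplicate
    my ($id) = $tag =~ /<endnote id=([^>]+)>/ or next;
    $id =~ s/\W+//g;                  # same clean-up step as in the code above
    $tags{ sort_key($id) } = $tag;    # equal keys collapse duplicates
}
print "$tags{$_}\n" for sort keys %tags;
```

Because the hash is keyed on the formatted id, the duplicate (2) entry simply overwrites the first one, which is how the original code removed duplicates as well.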

        I suspect you want to escape the outer parentheses, because you want them to match literal parentheses in your input rather than denote a capture group in the regex. Likewise, you will have to escape the dot (.) if you need it to match a literal dot (escapes: \( \) \.). So something like:

        if ( $tag =~ m/<endnote id=\(?(\d*)([[:alpha:].]*)\)?>/ ) {

        (note that the dot needs no escaping inside []). But this regex will fail on your last two cases, such as <endnote id=a.1>Text...</endnote>, so why not something like:

        if ( $tag =~ m/<endnote id=\(?([0-9[:alpha:].]*)\)?>/ ) {
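To check which sample ids each of the two patterns above accepts, a small harness can run both over the inputs listed earlier in the thread. This is a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

# The two patterns suggested above.
my $re_two_groups = qr/<endnote id=\(?(\d*)([[:alpha:].]*)\)?>/;
my $re_one_group  = qr/<endnote id=\(?([0-9[:alpha:].]*)\)?>/;

# Sample ids taken from the question.
my @samples = map { "<endnote id=$_>Text...</endnote>" }
    ( '(1)', '(2)', '1)', '2)', '1.', '2.', '1a', '2cb', 'a.1', 'a.2' );

my ( %ok_two, %ok_one );
for my $tag (@samples) {
    $ok_two{$tag} = $tag =~ $re_two_groups ? 1 : 0;
    $ok_one{$tag} = $tag =~ $re_one_group  ? 1 : 0;
    printf "%-40s two-group: %-3s one-group: %s\n", $tag,
        $ok_two{$tag} ? 'yes' : 'no', $ok_one{$tag} ? 'yes' : 'no';
}
```

Keep in mind that the second pattern has only one capture group, so code that still reads $2 (as the original sprintf key did) would get undef again.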

        But if I were you, I would split the code into two subs: (1) one that cleans the input and removes unwanted characters, and (2) one that parses only properly formatted input. If your input is badly formed XML, then perhaps invest in doing a proper (1) and then let a proper XML parser do (2). It depends on your use case and how complex it may become in the future. To paraphrase a great writer: all good data are alike; all bad data are bad in their own way.
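A minimal sketch of that two-step split, assuming the goal is to turn each messy id into a quoted attribute such as id="1" before parsing. clean_tag and parse_tag are hypothetical names, and the set of characters kept (letters, digits and dots) is an assumption based on the samples in this thread. Once step (1) produces well-formed XML, a real parser such as XML::LibXML could take over step (2); that is not shown here.

```perl
use strict;
use warnings;

# Step (1), hypothetical: rewrite the messy id as a quoted attribute,
# e.g. <endnote id=(1)> becomes <endnote id="1">. The tr///cd deletes
# every character that is not a letter, digit or dot.
sub clean_tag {
    my ($tag) = @_;
    $tag =~ s{<endnote id=([^>]+)>}{
        my $id = $1;
        $id =~ tr/0-9A-Za-z.//cd;
        qq{<endnote id="$id">};
    }e;
    return $tag;
}

# Step (2): accept only the well-formed shape produced by step (1);
# returns an empty list for anything else.
sub parse_tag {
    my ($tag) = @_;
    my ($id, $text) = $tag =~ m{^<endnote id="([0-9A-Za-z.]+)">(.*)</endnote>\z}
        or return;
    return ($id, $text);
}

for my $raw ( '<endnote id=(1)>Text...</endnote>',
              '<endnote id=1)>Text...</endnote>',
              '<endnote id=a.1>Text...</endnote>' ) {
    my ($id, $text) = parse_tag( clean_tag($raw) );
    if ( defined $id ) { print "$raw -> id '$id'\n" }
    else               { warn "Could not parse: $raw\n" }
}
```

Keeping the parser strict means any new kind of bad input shows up as a warning from step (2), and only step (1) needs to change to handle it.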

        bw, bliako
