PerlMonks  

Re^4: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf

by VladP (Novice)
on Oct 18, 2021 at 17:58 UTC ( #11137694=note )


in reply to Re^3: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf
in thread read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf

ok - now I understand. I think this is going to get a little complicated because of the possible differences in the @ID values. An ID may or may not have parentheses, or it could have only a closing parenthesis, so the regex has to be able to match all the different variants.

I tried adding opening/closing parentheses and it still won't match. I'm not a regex expert.

if ( $tag =~ m/<endnote id=((\d*)([[:alpha:]]*))>/ ) {

Input could be:

<endnote id=(1)>Text...</endnote>
<endnote id=(2)>Text...</endnote>

<endnote id=1)>Text...</endnote>
<endnote id=2)>Text...</endnote>

<endnote id=1.>Text...</endnote>
<endnote id=2.>Text...</endnote>

<endnote id=1a>Text...</endnote>
<endnote id=2cb>Text...</endnote>

<endnote id=a.1>Text...</endnote>
<endnote id=a.2>Text...</endnote>

etc...


Replies are listed 'Best First'.
Re^5: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf
by haukex (Bishop) on Oct 18, 2021 at 18:10 UTC

    That's a pretty strange format. It might be easier to work with if you do it in two steps, first matching everything that isn't a > with [^>]+ and then cleaning up the value:

        use warnings;
        use strict;
        use Data::Dump qw/dd pp/;

        while ( my $tag = <DATA> ) {
            chomp($tag);
            next unless $tag =~ /\S/;    # skip blank lines
            if ( my ($id) = $tag =~ /<endnote id=([^>]+)>/ ) {
                $id =~ s/\W+//g;         # strip everything that isn't a word character
                print pp($tag)," -> ",pp($id),"\n";
            }
            else { warn "Couldn't match ".pp($tag) }
        }
        __DATA__
        <endnote id=(1)>Text...</endnote>
        <endnote id=(2)>Text...</endnote>

        <endnote id=1)>Text...</endnote>
        <endnote id=2)>Text...</endnote>

        <endnote id=1.>Text...</endnote>
        <endnote id=2.>Text...</endnote>

        <endnote id=1a>Text...</endnote>
        <endnote id=2cb>Text...</endnote>

        <endnote id=a.1>Text...</endnote>
        <endnote id=a.2>Text...</endnote>

    Note your examples aren't very consistent: your regex so far only matches digits followed by [[:alpha:]], so it's unclear what you expect for id=a.1. You'll have to provide some representative sample input along with the expected output for that input if you want answers that actually address your problem fully.

    Btw, you'll probably want to have a look at perlretut and perlrequick.

      That's a pretty strange format.

      With quotes around the ID attribute, it would be at least a valid XML fragment, and in that case, using a proper parser would be recommended. Without the quotes, it looks more like tagsoup HTML.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^5: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf
by bliako (Monsignor) on Oct 18, 2021 at 18:45 UTC

    I suspect you want to escape the outer parentheses, because you want them to match literal parentheses in your input rather than denote a capture group in the regex. Likewise, you will have to escape the dot (.) if you need its literal value (escape: \( \) \.). So something like:

        if ( $tag =~ m/<endnote id=\(?(\d*)([[:alpha:].]*)\)?>/ ) {

    (note that the dot needs no escaping inside []). But your regex will fail on your last two cases, such as <endnote id=a.1>Text...</endnote>, so why not something like:

        if ( $tag =~ m/<endnote id=\(?([0-9[:alpha:].]*)\)?>/ )
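    To see what the single-capture pattern actually returns, here is a minimal sketch that runs it over a few of the sample tags from the question. The @tags list and the %id_for hash are names made up for this illustration only.

```perl
use strict;
use warnings;

# Sample tags copied from the question; the selection is arbitrary.
my @tags = (
    '<endnote id=(1)>Text...</endnote>',
    '<endnote id=1)>Text...</endnote>',
    '<endnote id=1.>Text...</endnote>',
    '<endnote id=2cb>Text...</endnote>',
    '<endnote id=a.1>Text...</endnote>',
);

# Optional literal "(" and ")" around one captured run of digits,
# letters and dots. The dot needs no escaping inside [].
my $re = qr/<endnote id=\(?([0-9[:alpha:].]*)\)?>/;

my %id_for;    # hypothetical lookup: tag => captured id
for my $tag (@tags) {
    if ( $tag =~ $re ) {
        $id_for{$tag} = $1;
        print "$tag -> '$1'\n";
    }
    else {
        print "$tag -> no match\n";
    }
}
```

    Note that the trailing dot in id=1. ends up inside the capture (the id comes back as "1."), so some clean-up after matching may still be needed, depending on what output is expected.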

    But if I were you I would split the code into two subs: 1) one to clean the input and remove unwanted characters, and 2) one to parse only properly formatted input. If your input is badly formed XML, then perhaps invest in doing a proper (1) and then let a proper XML parser do (2). It depends on your use case and how complex it can become in the future. To paraphrase a grand writer: All good data are alike; each bad data is bad in its own way.
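    A rough sketch of that two-sub split might look like the following. The sub names clean_tag and parse_tag are invented here, and the clean-up rules (keep word characters and interior dots, drop a trailing dot, quote the attribute) are only an assumption about the desired output, which the question does not spell out.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): normalise one raw tag so that the id is a
# quoted attribute. Returns undef if the line holds no endnote tag.
sub clean_tag {
    my ($raw) = @_;
    return undef unless $raw =~ /<endnote id=([^>]+)>/;
    my $id = $1;
    $id =~ s/[^\w.]+//g;    # assumption: drop parentheses etc., keep dots
    $id =~ s/\.+\z//;       # assumption: a trailing dot as in "1." is noise
    $raw =~ s/<endnote id=[^>]+>/<endnote id="$id">/;
    return $raw;
}

# Step 2 (hypothetical helper): accept only the cleaned, well-formed shape.
# Returns a hash ref with id and text, or undef for anything else.
sub parse_tag {
    my ($clean) = @_;
    return undef
        unless $clean =~ m{\A<endnote id="([\w.]+)">(.*)</endnote>\z}s;
    return { id => $1, text => $2 };
}

for my $raw ( '<endnote id=(1)>Text...</endnote>',
              '<endnote id=a.1>Text...</endnote>' ) {
    my $clean = clean_tag($raw);
    my $note  = defined $clean ? parse_tag($clean) : undef;
    print defined $note ? "$raw -> id '$note->{id}'\n" : "$raw -> rejected\n";
}
```

    Because parse_tag refuses anything that has not been through the cleaning step, new oddities in the input surface as rejections in one place instead of silently producing wrong ids. Once the attribute is quoted, step 2 could also be handed to a real XML parser instead of a regex.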

    bw, bliako
