Re^2: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf

Re^3: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf by choroba (Cardinal) on Oct 18, 2021 at 16:28 UTC
The parentheses in a regular expression define so-called "capture groups". Originally, you had two capture groups:

    m/<tagid=(\d*)([[:alpha:]]*)>/
             <~~~><~~~~~~~~~~~~>
              #1        #2

The variables `$1` and `$2` contain the parts matched by the respective capture groups. `\d*` means "zero or more digits", `[[:alpha:]]*` means "zero or more alphabetic characters". In your new regex, you don't have any parentheses:

    m/<endnote id=/

So `$1` and `$2` aren't populated.
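To make the difference concrete, here is a minimal sketch. The helper names `with_groups` and `without_groups` and the sample tag are made up for illustration and are not from the original script. Both patterns match the same tag, but only the one with parentheses has anything to return in `$1` and `$2`.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern has two capture groups,
# so a successful match populates $1 (digits) and $2 (letters).
sub with_groups {
    my ($tag) = @_;
    return ( $1, $2 ) if $tag =~ m/<endnote id=(\d*)([[:alpha:]]*)>/;
    return;
}

# Hypothetical helper: the pattern has no capture groups, so $1 and $2
# are undef even though the match itself succeeds.
sub without_groups {
    my ($tag) = @_;
    return ( $1, $2 ) if $tag =~ m/<endnote id=/;
    return;
}

# Assumed sample input, for illustration only.
my $sample = '<endnote id=12ab>';

for my $result ( [ with_groups($sample) ], [ without_groups($sample) ] ) {
    print join( ', ', map { defined $_ ? "'$_'" : 'undef' } @$result ), "\n";
}
```

Passing an undefined `$1` or `$2` on to `sprintf` is what produces the "Use of uninitialized value" warning under `use warnings`.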
Re^4: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf by VladP (Novice) on Oct 18, 2021 at 17:58 UTC
OK, now I understand. I think this is going to get a little complicated because of the possible differences in the @ID values. An ID may or may not have parentheses, or it could have only a closing parenthesis, so the regex has to match all the different forms. I tried adding opening/closing parentheses and it still won't match. I'm not a regex expert.

    if ( $tag =~ m/<endnote id=((\d*)([[:alpha:]]*))>/ ) {

Input could be:

    <endnote id=(1)>Text...</endnote>
    <endnote id=(2)>Text...</endnote>
    <endnote id=1)>Text...</endnote>
    <endnote id=2)>Text...</endnote>
    <endnote id=1.>Text...</endnote>
    <endnote id=2.>Text...</endnote>
    <endnote id=1a>Text...</endnote>
    <endnote id=2cb>Text...</endnote>
    <endnote id=a.1>Text...</endnote>
    <endnote id=a.2>Text...</endnote>
    etc...
Re^5: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf by haukex (Archbishop) on Oct 18, 2021 at 18:10 UTC
That's a pretty strange format. It might be easier to work with if you do it in two steps, first matching everything that isn't a `>` with `[^>]+` and then cleaning up the value:

    use warnings;
    use strict;
    use Data::Dump qw/dd pp/;

    while ( my $tag = <DATA> ) {
        chomp($tag);
        next unless $tag =~ /\S/;  # skip blank lines
        if ( my ($id) = $tag =~ /<endnote id=([^>]+)>/ ) {
            $id =~ s/\W+//g;  # strip non-word characters such as ( ) .
            print pp($tag)," -> ",pp($id),"\n";
        }
        else { warn "Couldn't match ".pp($tag) }
    }
    __DATA__
    <endnote id=(1)>Text...</endnote>
    <endnote id=(2)>Text...</endnote>
    <endnote id=1)>Text...</endnote>
    <endnote id=2)>Text...</endnote>
    <endnote id=1.>Text...</endnote>
    <endnote id=2.>Text...</endnote>
    <endnote id=1a>Text...</endnote>
    <endnote id=2cb>Text...</endnote>
    <endnote id=a.1>Text...</endnote>
    <endnote id=a.2>Text...</endnote>

Note that your examples aren't very consistent: your regex so far only matches digits followed by `[[:alpha:]]`, so it's unclear what you expect for `id=a.1`. You'll have to provide some representative sample input along with the expected output for that input if you want answers that actually address your problem fully.

Btw, you'll probably want to have a look at perlretut and perlrequick.
Re^6: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf by afoken (Chancellor) on Oct 18, 2021 at 18:27 UTC
Re^5: read/write delete duplicates/sort PROBLEM! - Use of uninitialized value in sprintf by bliako (Monsignor) on Oct 18, 2021 at 18:45 UTC
I suspect you want to escape the outer parentheses, because you want them to match literal parentheses in your input and not to denote a capture group in the regex. You will also have to escape the dot (`.`) if you need its literal value (escape: `\.`). So something like:

    if ( $tag =~ m/<endnote id=\(?(\d*)([[:alpha:].]*)\)?>/ ) {

(note that the dot needs no escaping inside `[]`). But your regex will fail on your last two cases, e.g. `<endnote id=a.1>Text...</endnote>`, so why not something like:

    if ( $tag =~ m/<endnote id=\(?([0-9[:alpha:].]*)\)?>/ )

But if I were you I would split the code into two subs: 1) one to clean the input and remove unwanted characters, and 2) one to parse only properly formatted input. If your input is badly formed XML, then perhaps invest in doing a proper (1) and let a proper XML parser do (2). It depends on your use case and how complex it can become in the future.

To paraphrase a grand writer: All good data are alike; each bad data is bad in its own way.

bw, bliako
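As a rough sketch of the escaping idea, the snippet below runs a single-group pattern over the sample tags posted earlier in the thread. The helper name `parse_id` is hypothetical, and the pattern is one possible reading of the suggestion, assuming an optional literal `(` before the id and an optional literal `)` after it. The escaped `\(?` and `\)?` match literal parentheses, while the unescaped pair in between is the capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: return the id of an <endnote id=...> tag,
# or undef if the tag does not match.
sub parse_id {
    my ($tag) = @_;
    # \(? and \)? : optional literal parentheses (escaped, so not a group)
    # ( ... )     : the capture group holding digits, letters and dots
    my ($id) = $tag =~ m/<endnote id=\(?([0-9[:alpha:].]*)\)?>/;
    return $id;
}

# Sample tags taken from the input listed earlier in the thread.
my @tags = (
    '<endnote id=(1)>Text...</endnote>',
    '<endnote id=1)>Text...</endnote>',
    '<endnote id=1.>Text...</endnote>',
    '<endnote id=2cb>Text...</endnote>',
    '<endnote id=a.1>Text...</endnote>',
);

for my $tag (@tags) {
    my $id = parse_id($tag);
    printf "%-36s -> %s\n", $tag, defined $id ? "'$id'" : 'no match';
}
```

Because the dot is part of the character class, an id such as `1.` keeps its trailing dot. Whether that is wanted depends on the expected output, which is where a separate cleaning step would come in.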


	PerlMonks