I'm trying to parse a text file and convert it to XML. The .txt file consists of a list of entries, separated by line breaks. So, two sample entries look like this:
Leyson, Captain Burr. "With or Without Gadgets." Boys' Life. Nov 1949. p. 6. "An old-timer knew what he had to do in a jam. He didn't need hundreds of those gadgets to guide him to safety." @gauge %trivial
"The Battle Against Baldness." Kiplinger's Personal Finance. Feb 1949. "A little home hair-cutter gadget--a comb with a razor attached-- has zipped its way into fame in recent months. Barbers pooh-pooh it as a threat, but sales are going strong." @tool %american
Each entry sits on a single line and contains bibliographic data, a quotation from that source, and two kinds of classification tags: a primary one written as an @tag and a secondary one written as a %tag.
The most important information for me to extract from each entry is year and tags. So, I came up with the following script:
#!/usr/bin/perl -w
my $year = "";
while (<>) {
    chomp;
    if ($_ eq "") {next;}
    elsif ($_ =~ /^\d\d\d\d$/) {
        $_ = $year;
    }
    else {
        s/\@(\w*)/ <keyword> $1 <\/keyword>/g;
        s/\%(\w*)/ <tag> $1 <\/tag>/g;
        print "<entry>$_ <year> $year </year> </entry>\n";
    }
}
The @tags and %tags are recognized just fine. Problem is, entries and years are not located. My program doesn't differentiate between entries: I get <entry> at the very beginning of the output and </entry> at the very end. Similarly, there's only a single, blank <year></year> right before </entry>.
I realize there's probably a very simple solution to this, but I'm still at the circumference of a circle, knock-knock-joke stage of perl programming, so your expertise would be very much appreciated. Thanks!
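For reference, here is a minimal sketch of one possible fix, not a definitive answer. It rests on two assumptions that the post does not confirm. First, that the file contains lines holding nothing but a four-digit year, as the regex in the script implies, so the assignment should read `$year = $_` rather than `$_ = $year`. Second, that the single giant `<entry>` comes from bare carriage-return line endings, which would make `<>` read the whole file as one line. The helper name `entries_to_xml` is hypothetical, and the XML escaping step is an addition that the original script does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes the raw
# file contents as one string and returns a list of <entry> strings.
sub entries_to_xml {
    my ($text) = @_;
    my $year = "";
    my @out;

    # Split on \r\n, \n, or a bare \r, so a file with old Mac-style
    # line endings is not treated as one giant line (an assumption
    # about what is going wrong).
    for my $line (split /\r\n|\n|\r/, $text) {
        next if $line =~ /^\s*$/;       # skip blank lines

        if ($line =~ /^(\d{4})$/) {     # a line holding only a year
            $year = $1;                 # remember it for later entries
            next;
        }

        # Escape XML special characters before adding any markup.
        $line =~ s/&/&amp;/g;
        $line =~ s/</&lt;/g;
        $line =~ s/>/&gt;/g;

        # Turn @word and %word into elements.
        $line =~ s/\@(\w+)/<keyword>$1<\/keyword>/g;
        $line =~ s/%(\w+)/<tag>$1<\/tag>/g;

        push @out, "<entry>$line <year>$year</year></entry>";
    }
    return @out;
}

# Only run when file names are given on the command line.
if (@ARGV) {
    local $/;                           # slurp mode: read everything at once
    my $text = <>;
    print "$_\n" for entries_to_xml($text // "");
}
```

Saved under a name of your choosing, it would be run as `perl script.pl entries.txt > entries.xml`. Slurping the file and splitting it by hand sidesteps whatever value `$/` has on the platform, at the cost of holding the whole file in memory.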
In reply to Converting a Text file to XML by monk8148n038