tagging question

bagerson has asked for the wisdom of the Perl Monks concerning the following question:

Re: tagging question by LassiLantar (Monk) on Jul 23, 2004 at 23:20 UTC
If I understand your question properly... `$string = "il asdfasdfasdf"; $string =~ s/([\S]{2})//; $tag = "<$1>"; $string = $tag . "$string ". $tag` Would do what you're asking for. I'm sure a true perl master could condense that into 1-2 lines, but I'm just a little perl footsoldier right now... Peace, LassiLantar
Re^2: tagging question by Ovid (Cardinal) on Jul 23, 2004 at 23:30 UTC
"I'm sure a true perl master could condense that into 1-2 lines." But a true Perl master would go for clarity over conciseness unless the circumstances dictate otherwise. Cheers, Ovid New address of my CGI Course.
Re^3: tagging question by LassiLantar (Monk) on Jul 23, 2004 at 23:33 UTC
True, true. Again, I am outclassed =) Peace, LassiLantar
Re: tagging question by b10m (Vicar) on Jul 23, 2004 at 23:58 UTC
I'm pretty sure you can make this code shorter if you really want to, but I am curious why you chose to substitute all lines. That, to me, looks like a lot of useless hassle ;) Anyways, for the OP, my €0,02:
```perl
while(<DATA>) {
    print "<$1>$1$2<$1>" if $_ =~ m|(\w{2})(.*)|;
}
__DATA__
il yadayadayada
df yadayadayada
```
-- b10m All code is usually tested, but rarely trusted.
Re^2: tagging question by LassiLantar (Monk) on Jul 24, 2004 at 04:03 UTC
<snappy comeback> Well, he said each line had a tag on the beginning of it, so I figured it would be extra to deal with checking whether it did. </snappy comeback> <real excuse> Didn't think of it =) </real excuse> Peace, LA
Re^2: tagging question by beable (Friar) on Jul 24, 2004 at 01:16 UTC
I must not be a true perl master, because I hacked on your program and it got BIGGER!
```perl
#!/usr/bin/perl

# you have to use strict and warnings unless you
# have a really good reason not to.
use strict;
use warnings;

my $string = "il asdfasdfasdf";
my $tag = "";

# use matching here instead of substitution
# all of the string should appear in the output
# also, don't need square brackets in match
if ($string =~ m/(\S{2})/) {
    $tag = "<$1>";
}

# you don't need to concatenate, just interpolate the lot
$string = "$tag $string $tag";

print "string = $string\n";

__END__
```
Re^3: tagging question by LassiLantar (Monk) on Jul 24, 2004 at 04:12 UTC
"I must not be a true perl master, because I hacked on your program and it got BIGGER!" Gwuahaha! I am superior! (read: I am too lazy to write in use strict/use warnings on PM). I agree with you, use strict and warnings are totally necessary. I'm so lazy I even sometimes try to circumvent use strict by redeclaring my variables in random places, but really they're improving the way I write code. (As is sparring with the monks). Peace, LassiLantar
Re: tagging question by graff (Chancellor) on Jul 24, 2004 at 04:17 UTC
"I'm working on tagging a large linguistic corpus"

Been there, done that. (Still there, doing it, in fact...)

"What I need to do is add a tag around each line (<il> or <df> in the above cases) where the contents of the tag match the two character string at the head of each line: `<il> il yadayada <il>`"

Might you happen to be somewhat new to the area of markup languages (i.e. XML) also? You may want to double-check what the goal is supposed to be. Many people doing linguistic-related research would prefer to use real XML in their corpus data, and what you proposed is not real XML, despite having something in common with it (using angle brackets). There are two things you should consider (maybe ask others in your group/research community to get their suggestions):

1. The tags you add should be paired like this: `<tag> text content ... </tag>` Note the slash character in the second tag that marks the end of the region -- that's required.

2. If the initial "token" on each line is really a classifier (i.e. an annotation that someone has added to the corpus data, rather than being part of the original spoken or written corpus content), then the XML tags ought to replace the classifier, rather than simply being placed around it.

On the second point, I could see wanting to leave the 2-letter code in the line, just to make sure you put the tags in the right way, but there are better ways to validate your process. If I'm guessing right about what you really should be doing, your regex should just put angle brackets around the initial 2-character token, then make a copy of it at the end of the line with a slash added as needed. Something like this: `s{^(\w{2})(.*)}{<$1>$2 </$1>};` (I chose to use curlies around the regex and replacement, just so I wouldn't have to use a backslash-escape for the slash in the closing tag.)

(P.S.: Welcome to the Monastery!)
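A minimal sketch of that substitution applied line by line, assuming the two-character code is always the first thing on the line. The helper name `tag_line` and the sample lines are made up for illustration; they are not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "il yadayada" into "<il> yadayada </il>".
# Lines that do not start with two word characters are returned unchanged.
sub tag_line {
    my ($line) = @_;
    chomp $line;
    # Curly delimiters avoid having to escape the slash in the closing tag.
    $line =~ s{^(\w{2})(.*)}{<$1>$2 </$1>};
    return $line;
}

# Made-up sample lines in the shape used elsewhere in the thread.
my @samples = ("il yadayadayada\n", "df yadayadayada\n", "\n");
print tag_line($_), "\n" for @samples;
```

Note that `^(\w{2})` matches the first two word characters of any line, so a line that starts with an ordinary word would be tagged too. If the corpus has such lines, the pattern probably needs to check for the space after the code as well.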
Re: tagging question by Ovid (Cardinal) on Jul 23, 2004 at 23:49 UTC
If you just want this on the command line to read from one file and write to STDOUT (great for seeing that it works): `perl -pe '/^(\w{2})(.*)/;$_ = "<$1>$1$2<$1>\n"' data.txt` Cheers, Ovid New address of my CGI Course.
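The `-p` switch wraps the code in a loop that reads each line and prints it afterwards. Spelled out as a script, a rough sketch might look like the following. Two assumptions here: an in-memory string stands in for `data.txt` so the example is self-contained, and the sample lines are made up. It also swaps the match-then-assign for a substitution, since the one-liner assigns to `$_` whether or not the match succeeded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for data.txt (made-up sample lines), so the sketch is self-contained.
my $data = "il yadayadayada\ndf yadayadayada\n\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my @out;
while (my $line = <$in>) {
    # s/// only rewrites lines that match; others pass through unchanged.
    # "." does not match the newline, so the line ending is preserved.
    $line =~ s/^(\w{2})(.*)/<$1>$1$2<$1>/;
    push @out, $line;
}
close $in;

print @out;
```

With a real file, the loop would read from `<>` instead, and the substitution by itself should also work as the body of a `perl -pe` one-liner.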
Re: tagging question by beable (Friar) on Jul 23, 2004 at 23:14 UTC
```perl
#!/usr/bin/perl

use strict;
use warnings;

# read in the data line by line
while (my $line = <DATA>) {
    # chomp off the newline
    chomp $line;

    # see if we have a match of two letters at the start
    # of the line
    if ($line =~ m|^(\w{2})|) {
        # if it matched, add tags
        my $tag = $1;
        print "<$tag> $line <$tag>\n";
    } else {
        # if it didn't match, just print the line
        print "$line\n";
    }
}

__DATA__
il yadayadayada
df yadayadayada
```
Re: tagging question by NetWallah (Canon) on Jul 23, 2004 at 23:09 UTC
Here is a snippet:
```
my $x='il yadayadayada';
$x=~s/^(\w{2})(.*)/<$1> $1$2 <$1>/;
print $x;
-- output --
<il> il yadayadayada <il>
```
Update: beable's (++) nit noted and picked.
Earth first!* (We'll rob the other planets later)
Re^2: tagging question by beable (Friar) on Jul 23, 2004 at 23:17 UTC
Dude, the output is supposed to be: `<il> il yadayada <il>`. Therefore, you should have written this: `$x=~s/^(\w{2})(.*)/<$1> $1$2 <$1>/;` </nitpick>
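The nitpick comes down to `(.)` versus `(.*)` in the second capture group. A small sketch of the difference, using the sample string from the snippet (the variable names `$one` and `$rest` are made up here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $one  = 'il yadayadayada';
my $rest = 'il yadayadayada';

# (.) captures only the single character after the code (here, the space),
# so the closing tag lands in the middle of the line.
$one =~ s/^(\w{2})(.)/<$1> $1$2 <$1>/;

# (.*) captures everything up to the end of the line,
# so the closing tag lands at the end.
$rest =~ s/^(\w{2})(.*)/<$1> $1$2 <$1>/;

print "$one\n";    # <il> il  <il>yadayadayada
print "$rest\n";   # <il> il yadayadayada <il>
```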
Re: tagging question by murugu (Curate) on Jul 24, 2004 at 07:56 UTC
My code is,
```perl
while (<DATA>){
    s#^(\w{2}).*#<$1>$&<\/$1># && print
}
__DATA__
lg alkjslkjs
sl slksjlkjslkjs slkjslkjs
```