Hi folks,
I've just ported the Keyword Search superdoc over from the test site (it is currently only visible to pmdev). For those who can't see it, it's very similar to the Perl Monks User Search, with a dropdown list of all currently used keywords to select from.
Anyway, the purpose of this discussion is to outline a FAQ suggesting how people should go about tagging nodes, so that useful searching is possible. I've been doing some tagging of my own already, but for global use this needs to be documented.
My suggestion for this would be something like:
- If a node mentions a module, or its question can be solved with a module, tag it with the module name, e.g. "IO::Socket".
- Always use common capitali(s|z)ations/spellings of nouns, e.g. "Perl6", "Perl/Tk", "SOAP" (though I hope the search is/will be case-insensitive anyway).
- .. ?
The doc also needs to mention that keyword deletion can be done by editors^Wjanitors, and a consideration/editor request/msg can be used to accomplish such.
Also, I'm wondering whether we should have a "tagging group" who put official tags on things, so the search can cover either the "official" tags or "everything else". (This could be accomplished by adding a user_id column to the keyword table, where the id is that of the tagging group for tags added by members of that group.)
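To make the user_id column idea concrete, here is a minimal sketch using an in-memory SQLite database. The table name, column names, ids and sample rows are all invented for illustration and are not taken from the real PerlMonks schema; the case-insensitive match simply reflects the hope expressed in the guidelines above.

```python
import sqlite3

# Hypothetical schema: every name and id below is an assumption made for
# this sketch, not the actual PerlMonks keyword table.
TAGGING_GROUP_ID = 1  # assumed id standing in for the "tagging group"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (node_id INTEGER, word TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO keyword VALUES (?, ?, ?)",
    [
        (101, "IO::Socket", TAGGING_GROUP_ID),  # "official" tag
        (101, "sockets", 4242),                 # tag from an ordinary user
        (102, "io::socket", 4243),              # same keyword, different case
    ],
)


def search(word, official_only=False):
    """Return ids of nodes tagged with `word`, matched case-insensitively."""
    sql = "SELECT DISTINCT node_id FROM keyword WHERE LOWER(word) = LOWER(?)"
    params = [word]
    if official_only:
        # Restrict the search to tags stored under the tagging group's id.
        sql += " AND user_id = ?"
        params.append(TAGGING_GROUP_ID)
    sql += " ORDER BY node_id"
    return [row[0] for row in conn.execute(sql, params)]
```

With this layout, `search("io::socket")` looks at every tag, while `search("IO::Socket", official_only=True)` only sees tags stored under the group id.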
Please add your ideas about the documentation to this discussion, thanks.
C.
Re: Keyword Nodelet / Tagging documentation
by Tanktalus (Canon) on Sep 09, 2005 at 21:38 UTC
I'm very curious as to how the tags were done, are being done, will be done. Because when I select a tag and go to the found node, I can't figure out where that tag is supposed to be. ;-)
I assume a node can have more than one tag. What is the planned interface for this? A select box where one can use the ctrl-click interface to add additional seen-before tags, and a label box for typing arbitrary (new) tags?
Also, will there be a publicly viewable "Keywords: a, b, c" that everyone will (eventually) be able to see for a given node?
I'm not trying to change the design, just trying to find out what it was/is :-) If these questions change the design, that was by accident not by, um, er, design. :-)
Thanks,
| [reply] |
Re: Keyword Nodelet / Tagging documentation (vote > privilege)
by tye (Sage) on Sep 10, 2005 at 04:10 UTC
My experience with PM leads me to believe that voting has a much better chance of resulting in a useful categorization system than privilege (the tag adders, tag deleters, and considerers) does.
My bet is that privilege won't result in a very useful tag system and, even if it starts to, the effect won't last.
I think you need to allow anyone (or nearly so, maybe about level 3) to add keywords and nearly anyone to vote for/against keywords. But tracking votes is O( nodes * users ) and is the biggest part of our database. Keyword votes could be O( nodes * users * words ). But not tracking keyword votes just affords voting abuse... Maybe an alternative of only tracking recent votes and limiting number of votes per day would not balloon out of control but would strongly discourage voting abuse...
But I don't think you've designed a keyword system that will work yet. And I don't think that will be an easy design. But I think such could be very useful.
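The "track only recent votes and limit votes per day" alternative could look roughly like the sketch below. It is only an illustration: the class, the limit of 5 votes per day and the window of 1000 remembered votes are assumptions, not anything that exists on the site. Storage stays bounded by the window size plus one small record per user, instead of growing with nodes * users * words.

```python
from collections import deque

VOTES_PER_DAY = 5      # assumed daily limit, not a figure from this thread
RECENT_WINDOW = 1000   # assumed number of recent votes remembered


class KeywordVotes:
    """Hypothetical vote tracker that only remembers the most recent votes."""

    def __init__(self):
        self.recent = deque()    # (user, node, word) in arrival order
        self.recent_set = set()  # same keys, for fast duplicate checks
        self.used_today = {}     # user -> (day, votes cast on that day)
        self.score = {}          # (node, word) -> net votes

    def vote(self, user, node, word, delta, day):
        key = (user, node, word)
        if key in self.recent_set:
            return False  # repeat vote that is still inside the window
        last_day, count = self.used_today.get(user, (day, 0))
        if last_day != day:
            count = 0  # a new day resets the user's allowance
        if count >= VOTES_PER_DAY:
            return False
        self.used_today[user] = (day, count + 1)
        self.score[(node, word)] = self.score.get((node, word), 0) + delta
        self.recent.append(key)
        self.recent_set.add(key)
        if len(self.recent) > RECENT_WINDOW:
            # Forget the oldest vote so the history does not balloon.
            self.recent_set.discard(self.recent.popleft())
        return True
```

The trade-off is that a vote which has aged out of the window can be repeated, which is why the daily limit is still needed to make that kind of abuse slow.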
| [reply] |
I think you have a good point re the usergroup: it may start off well, but would probably not be kept up to date for long.
Voting as such is a nice idea, though as you say it will produce a lot of data. Which reminds me that I added, and then removed again, the "Rating" field, and you've reminded me what that was - we already keep track of how many times a keyword was added to a particular node, which is sort of like voting, only doesn't store who added which keyword, and allows users to add the same keyword multiple times themselves.
Having this as a level power sounds like a good idea.
I wasn't really out to *design* a new keyword system, so much as to make the existing one more usable. With a search, at least *I* can find stuff I've keyworded the same way, and with documentation there's a slightly increased chance that others will use the same or similar keywords.
If the concern is the size the table will reach, how about attaching the keywords to the nodes themselves somehow? (In the node table, since I'd like to be able to tag everything and anything.) That wouldn't solve the "who tagged it" problem, though (if there is one).
I'm not entirely sure I understand why anyone would bother abusing the vote-on-keyword thing at all: the search will/should show rating (relevance), but can be sorted by many other criteria. Also, since keywords can be removed, all the effort would go to waste fairly quickly. Adding keywords to someone's nodes should *not*, IMO, give XP of any kind, either to the adder or the node owner. Limiting votes will just get us fewer keywords.
Does this mean you don't approve of documentation at all currently, or just that you'd like a better system in the future? (This solution will mostly scratch my itch, at least, even if I'm the only one using it.)
C.
| [reply] |
The abuse I predict is someone tagging every node by merlyn as "bull..." and other rude, abusive, and obscene tags being thrown in because that's what many children are prone to do when given an anonymous way to scribble on the walls.
Adding a keyword is so trivially easy, while finding the offense, considering it, and getting a privileged user to remove it is several times more work. So I bet that increased visibility of the keyword system will eventually lead to an annoying amount of abuse.
Which reminds me that part of the value of voting is abuser correction, not just abuse correction.
One idea would be a non-XP point system whereby adding keywords that get downvoted costs you points, such that you can't add keywords as frequently...
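As a rough illustration of such a non-XP point system, the sketch below gives each user a small tagging budget. Every number and name in it (starting points, cost per keyword, penalty per downvote, the refill step) is an assumption picked for the example; nothing like it exists in the current keyword code.

```python
START_POINTS = 10      # assumed starting budget per user
ADD_COST = 1           # assumed cost of adding one keyword
DOWNVOTE_PENALTY = 2   # assumed extra cost when one of your keywords is downvoted


class TagBudget:
    """Hypothetical per-user tagging budget, kept separate from XP."""

    def __init__(self):
        self.points = {}  # user -> remaining points

    def balance(self, user):
        return self.points.get(user, START_POINTS)

    def try_add(self, user):
        """Spend points for one keyword; refuse if the user cannot afford it."""
        if self.balance(user) < ADD_COST:
            return False
        self.points[user] = self.balance(user) - ADD_COST
        return True

    def downvoted(self, user):
        """Charge the user who added a keyword that was just downvoted."""
        self.points[user] = self.balance(user) - DOWNVOTE_PENALTY

    def refill(self, user, amount=1):
        """Periodic top-up (e.g. daily), capped at the starting budget."""
        self.points[user] = min(START_POINTS, self.balance(user) + amount)
```

Someone whose keywords keep getting downvoted ends up with a negative balance and has to wait for several refills before tagging again, which targets the abuser rather than just the individual abusive keyword.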
I'm not saying your patch shouldn't be applied. But I personally wouldn't spend time implementing a privileged keywording group and would be prepared for the keyword system needing to be disabled until a major overhaul happens.
| [reply] |
Re: Keyword Nodelet / Tagging documentation
by planetscape (Chancellor) on Sep 12, 2005 at 15:08 UTC
Here is the list I have been formulating while Keywording nodes to which I have replied (which hopefully equates to "nodes which I know something about" ;-)):
Updated: 2005-09-16
HTH,
| [reply] [d/l] |
Re: Keyword Nodelet / Tagging documentation
by castaway (Parson) on Sep 10, 2005 at 14:07 UTC
theorbtwo just suggested a couple of ideas to enforce the documentation, such as having a list of stop words not allowed as keywords ("and" "or" etc). Also a list of aliases such as "Tk" -> "Perl/Tk"..
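A small sketch of how a stop-word list and an alias table could be applied when a keyword is entered. Only "and", "or" and the "Tk" -> "Perl/Tk" alias come from the suggestion above; the function name and the rest of the setup are assumptions for illustration.

```python
# "and"/"or" and the Tk alias are from the suggestion above; a real list
# would be longer and maintained by whoever looks after the keyword system.
STOP_WORDS = {"and", "or"}
ALIASES = {"tk": "Perl/Tk"}  # lowercased alias -> preferred keyword


def canonical_keyword(raw):
    """Return the keyword to store, or None if the input is not allowed."""
    word = raw.strip()
    if not word or word.lower() in STOP_WORDS:
        return None
    # Aliases are looked up case-insensitively; anything else is kept as typed.
    return ALIASES.get(word.lower(), word)
```

Here `canonical_keyword("Tk")` gives "Perl/Tk", a stop word such as " and " is rejected, and "SOAP" passes through unchanged.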
C. | [reply] |
Re: Keyword Nodelet / Tagging documentation
by eric256 (Parson) on Sep 12, 2005 at 16:02 UTC
One way to limit abuse would be to require a keyword to be added a certain number of times before it becomes visible. Then you do need to remember who adds what keyword so that they can't do it twice, but that should be fairly trivial. I also think some way of either matching entered keywords to existing ones or listing existing keywords that you can add would be good. That would help with consistency in keywording. I would also think that normalizing keywords as they are entered is important. Removing "and", "a", "with", "the", etc. and lowercasing everything would be a good first step.
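The visibility threshold and the "can't add it twice" rule could be sketched as follows. The threshold of 3 distinct users, the class name and the method names are assumptions for the example; the normalization is just the first step described above (drop a few filler words, lowercase everything).

```python
MIN_ADDERS = 3  # assumed number of distinct users needed before a keyword shows
STOP_WORDS = {"and", "a", "with", "the"}  # filler words named above


def normalize(raw):
    """Lowercase the input and drop filler words."""
    return " ".join(w for w in raw.lower().split() if w not in STOP_WORDS)


class PendingKeywords:
    """Hypothetical store that hides a keyword until enough users have added it."""

    def __init__(self):
        self.adders = {}  # (node, keyword) -> set of users who added it

    def add(self, user, node, raw):
        word = normalize(raw)
        if not word:
            return False  # nothing left after normalization
        users = self.adders.setdefault((node, word), set())
        if user in users:
            return False  # the same user cannot add the same keyword twice
        users.add(user)
        return True

    def visible(self, node):
        """Keywords on `node` that have reached the threshold, sorted."""
        return sorted(
            word
            for (n, word), users in self.adders.items()
            if n == node and len(users) >= MIN_ADDERS
        )
```

Because the set of adders is stored per node and keyword, this is the "remember who adds what" cost mentioned above, but only for keywords that someone has actually added.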
Update: It would also be nice to be able to keyword and vote all in one swipe. So if we are going to use keywords more it would be nice if there was just a field below the entry to enter them in. Then I could keyword and vote on an entire thread all at once. BTW Could keywords use up a *vote*? That should certainly limit the amount of damage an abuser could do.
| [reply] |
Re: Keyword Nodelet / Tagging documentation
by planetscape (Chancellor) on Sep 20, 2005 at 19:03 UTC
Per castaway's request:
General Guidelines for Picking Keywords
Since I have been starting with my writeups - nodes whose roots I've authored
or to which I've replied - I have tried to take into consideration both the root
node and its replies when picking keywords. (As opposed to keywording
nodes which have no replies yet.) So...
- Modules mentioned in the root node or in replies (those that are either
the problem or a possible solution, such as "XML::Simple", for example);
problems with modules (or installing modules) in a more general sense get
tagged with "modules" and/or "installing".
- "Automation" is what MS calls it when you use one app to drive another, so
that's what I'm calling using Win32::OLE to drive an app such as Excel from
Perl (since people more familiar with MS stuff than Perl are more likely to
type that into a keyword search than the module name); plus I mention the
module, Win32::OLE, for the more Perly types. In other words, I try to use
keywords that people coming from "outside" the Monastery might try first.
- If I know of a term(s) that is synonymous, I try to get that in, too...
Like "Ngram" and "Markov Chain," or "course," "class," and "training"...
- If there is a common abbreviation for something mentioned in the offered
solutions (such as "LCS" for "Longest Common Substring", or "KWIC" for "KeyWord
in Context"), then I include that... especially since "Longest Common
Substring" won't fit in the input area for the Keyword Nodelet. ;-)
- When the question (and/or solution) depends heavily on a particular
method, option, or hash key (Text::ExtractWords minwordlen and maxwordlen re: "minlen/minwordlen - maxlen/maxwordlen"
and Re: XML::Simple "transforming data" re: "NoAttr"), I try to get those in.
HTH,
| [reply] |
Re: Keyword Nodelet / Tagging documentation
by dimar (Curate) on Jan 14, 2006 at 17:45 UTC
Hmmm ... this thread looks a little 'mature'...
I cannot help but ask, however, why not just use
a site like del.icio.us for this kind of functionality?
I am a big fan of tagging and 'folksonomy' and it seems
like a lot of the effort required for 'infrastructure' is
already in place, and the real difficult part is
the actual cognitive effort required to link 'human operational terms' into 'perl equivalent terms'.
The benefit of del.icio.us is also that folksonomies get
associated with a specific *user*, which means tag pollution and tag bigotry would have an automatic 'credibility filter'. If you happen to like a given perlmonks user, and someone has maliciously tagged that user's content as *junk*, you can ignore the person who did the malicious tagging and assume they are a low-credibility source.
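The 'credibility filter' described above amounts to keeping the tagger's name with every tag and letting each reader drop the taggers they don't trust. A minimal sketch, with made-up user names and data and no relation to how del.icio.us actually stores things:

```python
def visible_tags(taggings, ignored_users):
    """Group tags by node, skipping tags added by users the reader ignores.

    `taggings` is an iterable of (tagger, node, tag) triples; this shape is
    an assumption made for the sketch.
    """
    result = {}
    for tagger, node, tag in taggings:
        if tagger in ignored_users:
            continue  # low-credibility source as far as this reader is concerned
        result.setdefault(node, set()).add(tag)
    return result


# Invented sample data for illustration only.
taggings = [
    ("helpful_monk", 42, "regex"),
    ("vandal", 42, "junk"),
    ("helpful_monk", 43, "Perl/Tk"),
]
```

Calling `visible_tags(taggings, {"vandal"})` leaves node 42 with only the "regex" tag, while a reader who ignores nobody still sees both tags.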
UPDATE: This was added late because I thought these points merited consideration, not to downplay the work already done here on perlmonks. Overall I think the functionality sounds great, and I hope it takes off.
=oQDlNWYsBHI5JXZ2VGIulGIlJXYgQkUPxEIlhGdgY2bgMXZ5VGIlhGV
| [reply] |