This is a gripe on something that can only get worse over time, as new nodes are added all the time...

Lately I've noticed that if you type in an approximate title, this site brings up a list of possible matches, but that list isn't necessarily complete, even though it presents itself as if it were (there is no "search more" option). It also doesn't appear to sort the results so that the most recent nodes are listed first. As a result, it often becomes impossible to find recent nodes quickly this way, which (IMO) reduces the value of this lookup system as a whole.

Two examples:

Yet I'm fairly sure the titles I typed were pretty close matches to the actual ones.

I strongly feel that listing the most recent threads first would be more useful overall. I don't think actually showing more matches, or having a "search more" link, would really be necessary, as long as the most recent nodes were all listed: at least those still fresh in memory, up to a few weeks old, and at least the root nodes. If necessary, nodes whose titles match their parent's could be dropped first.
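A minimal sketch of the lookup proposed above, written in Python rather than the site's Perl. The `Node` record, the `lookup_by_title` name, and the assumption that a reply's title repeats its parent's title with optional "Re: " prefixes are all hypothetical. The point is only the order of operations: match, drop replies that merely echo their parent's title, sort by descending node_id (newest first, assuming IDs grow over time), and only then truncate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    title: str
    parent_id: Optional[int] = None  # None for root nodes (hypothetical schema)


def base_title(title: str) -> str:
    # Strip leading "Re: " prefixes (assumed reply-title convention).
    while title.startswith("Re: "):
        title = title[len("Re: "):]
    return title


def lookup_by_title(nodes: List[Node], query: str, limit: int = 50) -> List[Node]:
    by_id = {n.node_id: n for n in nodes}
    q = query.lower()

    def echoes_parent(n: Node) -> bool:
        # True for a reply whose title merely repeats its parent's title.
        parent = by_id.get(n.parent_id)
        return parent is not None and base_title(n.title) == base_title(parent.title)

    matches = [n for n in nodes if q in n.title.lower() and not echoes_parent(n)]
    # Sort BEFORE truncating, so the newest matches are never the ones cut off.
    matches.sort(key=lambda n: n.node_id, reverse=True)
    return matches[:limit]
```

For example, with hypothetical nodes 10 "Regex question", 11 "Re: Regex question" (a reply to 10) and 12 "Another regex question", a lookup for "regex" returns 12 and then 10; the echoing reply is dropped.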

So... what do you think? Would this actually seem better to anybody else? Would it be difficult to patch?

Re: Name clashes in lookup by title
by davido (Cardinal) on May 06, 2004 at 15:24 UTC
    I understand the aggravation. This is a particularly annoying problem when people pick usernames that match parts of the Perl lexicon, so that if you think you're highlighting "sort" with [sort], you're really linking to a user.

    It's an ongoing problem bound to get worse. As it does, the search button at the top of the page is likely to become less useful over time, and Super Search will become more of a vital component to finding anything around here.

    Note that the Search button at the top of every page searches only titles. That means that if you search for "Newbie question" or "Regex Question", there will be thousands of hits, limited to 50 and probably sorted by node id. So in many ways, Super Search is already a better option.

    The very problem you've mentioned (title clashes) prompted the SiteDocClan to adopt a policy "way back when" of always linking within site documentation by node id instead of by node title.

    When presenting links in your nodes, the safest links are and will always be specific links, rather than best guess links. Try to favor the following types:

      [id://........] (Node ID)
      [cpan://......] (CPAN search)
      [perldoc://...] (perldoc.com search)
      [doc://.......] (link to actual function POD or actual POD documents)
      [pad://.......] .....

    There are others, but you get the idea: it's better to be specific about what you're trying to link to than to rely on the "lazy" [ ..... ] links.

    For the record, you can actually search for specific nodes by id, just by typing the node ID into the search box.


    Dave

      This is a particularly annoying problem when people pick usernames that match parts of the Perl lexicon, so that if you think you're highlighting "sort" with sort you're really linking to a user.

      I don't think so. Where does grep ([grep]) take you? It doesn't take you to grep ([id://133383], a user) but to perlfunc:grep. Hence the need for [user://grep] which we'll have one day RSN.

      The "Search" at the top needs to be renamed "Find", as it isn't very good at searching, only at finding things that you already suspect are there and that you know exactly, or nearly exactly, how to reach.

      And we shouldn't have many (or any important) nodes whose titles consist only of digits (for example, it is no longer possible to register a username consisting only of digits), but searching by node_id should be fast. So at some point, node=1234 will be translated directly to node_id=1234 (because only digits are given) instead of the current behavior of first looking for matching titles and then falling back to looking up the node ID. So, although a link like 65535 ([65535]) currently takes you to an inactive user's home node, in the future it will take you to Re: Submit Button (node ID 65535). Before that happens, you will also be given the ability to link via [title://65535] if you really want to find title(s) containing numbers.
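The difference between the current and the planned handling of an all-digit node= parameter can be sketched as follows. The in-memory `titles` dict (exact title to node IDs) and the set of existing IDs stand in for the real database; the function names and example IDs are hypothetical, and real title matching is approximate rather than an exact dict lookup.

```python
import re
from typing import Dict, List, Set

ALL_DIGITS = re.compile(r"[0-9]+")  # ASCII digits only


def resolve_current(param: str, titles: Dict[str, List[int]], ids: Set[int]) -> List[int]:
    # Current behavior as described: try titles first, fall back to node ID.
    hits = titles.get(param, [])
    if hits:
        return hits
    if ALL_DIGITS.fullmatch(param) and int(param) in ids:
        return [int(param)]
    return []


def resolve_planned(param: str, titles: Dict[str, List[int]], ids: Set[int]) -> List[int]:
    # Planned behavior: digits only means a node_id, with no title search at all.
    if ALL_DIGITS.fullmatch(param):
        node_id = int(param)
        return [node_id] if node_id in ids else []
    return titles.get(param, [])
```

With a hypothetical user home node 4242 titled "65535", `resolve_current("65535", ...)` returns [4242] while `resolve_planned("65535", ...)` returns [65535], which mirrors the change described above.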

      Some of the DWIM in "simple search" needs to go away because it is simply too inefficient. Currently, if you give it 9 words, it will try to find nodes with titles that contain all 9 of those words. If there are none, then it will find nodes having titles that each contain 8 of the 9 words, then try 7 of 9, etc. It does this rather efficiently (it doesn't take 9 passes over the set of node titles; just a single pass is done, perhaps in several steps), but not nearly as efficiently as Super Search's simple method of just requiring that matches contain all 9 words (or else no matches are found).
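To make the two strategies concrete, here is a sketch of "best k of n words" matching done in a single pass, next to Super Search's all-words rule. Matching by case-insensitive substring and the function names are assumptions for illustration; the site does this in SQL, not in Python.

```python
from typing import List


def simple_search(titles: List[str], query: str) -> List[str]:
    # "Simple search" DWIM: keep the titles containing the most query words
    # (all n if any exist, else n-1, and so on), found in a single pass.
    words = query.lower().split()
    best, hits = 0, []
    for title in titles:
        t = title.lower()
        score = sum(w in t for w in words)
        if score > best:
            best, hits = score, [title]
        elif score == best and score > 0:
            hits.append(title)
    return hits


def super_search(titles: List[str], query: str) -> List[str]:
    # Super Search's rule: a title must contain every word, else no match.
    words = query.lower().split()
    return [title for title in titles if all(w in title.lower() for w in words)]
```

For the query "sort hash values", a title such as "sort hash by value" matches two of the three words, so `simple_search` still returns it, while `super_search` returns nothing. The fallback is what costs the extra work.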

      Super Search also needs to support simple search's trick of adding a space to the front and back of titles, so that you can search for titles containing " foo " and not find titles containing "food", but still find titles where "foo" is the first or last word. (It won't find ":foo:", though, simply because SQL doesn't provide an efficient way to support that without reindexing on "words", which we defer to Google because they do it better than we could.)
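The space-padding trick can be sketched in SQL. This uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite concatenates strings with `||`, where MySQL would use `CONCAT`), and the `node` table with `node_id` and `title` columns is a hypothetical schema, not the site's. Note that `%` or `_` in the search word would still act as LIKE wildcards in this sketch.

```python
import sqlite3
from typing import List


def find_word_in_titles(conn: sqlite3.Connection, word: str) -> List[int]:
    # Pad the title with a space on each side and require the word to appear
    # surrounded by spaces: "foo" then matches as a first, middle or last word,
    # but not inside "food" or ":foo:". Newest (highest node_id) first.
    rows = conn.execute(
        "SELECT node_id FROM node "
        "WHERE ' ' || title || ' ' LIKE '% ' || ? || ' %' "
        "ORDER BY node_id DESC",
        (word,),
    )
    return [node_id for (node_id,) in rows]
```

With titles "foo bar", "food fight", "bar foo" and "x :foo: y", only the first and third match "foo".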

      Update: Oh, and as to the original question, "newest first" would be better. Since Super Search has proven that MySQL can handle it, despite the bugs that can make such ordering very inefficient if you aren't careful, I'll add that to my plans. And I may make a search that has no exact matches take you straight to Super Search with the form prepopulated...

      - tye        

      For the record, you can actually search for specific nodes by id, just by typing the node ID into the search box.
      Just as a counterexample, I'd like you to look up 1. It returns 2 results, but neither has node_id 1.

      But in general, you are correct.

      Except I don't think that searching for nodes by node_id is all that useful.

      p.s. Nice post.

        You're correct: putting a node ID in the search box subjects it to name clashes as well. Good thing that very few nodes are named 31523... and exactly one carries that ID. There will be times when you simply can't search even by ID. You will always be able to link by ID, though. (Just wanted to clarify.)

        You may also always get to a particular node ID by hand-crafting the URL. This is not terribly user-friendly, but works fine:

        http://www.perlmonks.org/index.pl?node_id=xxxx (where xxxx is the actual node id)

        Careful.


        Dave

Re: Name clashes in lookup by title
by theorbtwo (Prior) on May 06, 2004 at 10:30 UTC

    Um, they seem to be sorted by node_id, highest (newest) first, just like you wanted.

    If I'm wrong, could you point out a counterexample?

      They appear to be listed sorted by reverse node_id, but I seriously doubt they are searched that way. In SQL terms, I think it's one of
      SELECT ... LIMIT 50
      SELECT ... ORDER BY node_id LIMIT 50
      with perhaps a sort done in Perl, but not
      SELECT ... ORDER BY node_id DESC LIMIT 50
      which would likely fix it. (I'm not pmdev, so I have no access to the actual source; I'm just guessing.)
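The difference between the guessed queries is easy to demonstrate with an in-memory SQLite database standing in for the site's MySQL; the `node` table and the function names are hypothetical. Sorting after `LIMIT` only reorders the rows that survived the cut, while `ORDER BY node_id DESC LIMIT n` makes the database pick the newest rows. (The plain `SELECT ... LIMIT 50` guess is left out because, without `ORDER BY`, which rows come back is unspecified.)

```python
import sqlite3
from typing import List


def limit_then_reverse(conn: sqlite3.Connection, pattern: str, limit: int) -> List[int]:
    # Guessed current behavior: oldest-first LIMIT, then reversed afterwards
    # (the "sort done in Perl" step), so the newest matches can be cut off.
    rows = conn.execute(
        "SELECT node_id FROM node WHERE title LIKE ? ORDER BY node_id LIMIT ?",
        (pattern, limit),
    ).fetchall()
    return sorted((node_id for (node_id,) in rows), reverse=True)


def newest_first(conn: sqlite3.Connection, pattern: str, limit: int) -> List[int]:
    # Proposed fix: let the database order newest first before applying LIMIT.
    rows = conn.execute(
        "SELECT node_id FROM node WHERE title LIKE ? ORDER BY node_id DESC LIMIT ?",
        (pattern, limit),
    ).fetchall()
    return [node_id for (node_id,) in rows]
```

With ten matching nodes (IDs 1 to 10) and a limit of 3, the first function returns [3, 2, 1], which looks newest first but holds the oldest matches, while the second returns [10, 9, 8].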

      For example, a search for List of Lists shows 128632 at the top, and 17802 at the bottom. That's nowhere near the 350865 for the node (list of lists (LOL)) I was actually looking for.

      As an aside, note that my search term is actually part of the node title, which is not the case for any of the results shown.

Re: Name clashes in lookup by title
by artist (Parson) on May 06, 2004 at 14:30 UTC
    We may never implement it, but voting on titles would be a good idea. That way, we could see the best-voted titles, and posters would be encouraged to view them. If we could monitor titles and single out good ones (a big job), that would also help us. It would also be useful to get suggestions for forming good titles while writing the title of your post.
Re: Name clashes in lookup by title
by BUU (Prior) on May 06, 2004 at 09:10 UTC
      I feel that Super Search is too much of a hassle to quickly find something by approximate title.

      Perhaps if that link to Super Search at the top of the result page included enough parameters to prefill the form with a search for the same title, I might like it a lot better as a fallback. As it is, you just have to start a fresh search.

      Oh, if only the results were searched for in reverse order of age or node id, I'd already be very pleased.