Re: make a web site searchable
by merlyn (Sage) on Aug 20, 2003 at 19:05 UTC
If you don't mind bouncing them out to Google and back to your site, you can add "search this site with Google" without any programming whatsoever. I added the following HTML to the template for the bottom of my pages:
<form action="http://www.google.com/search" method=GET>
<INPUT TYPE=hidden name=site value=swr>
<INPUT TYPE=hidden name=q value="site:stonehenge.com">
<INPUT TYPE=text name=as_q size=31 maxlength=256 value="">
<INPUT TYPE=submit name=btnG VALUE="Search stonehenge.com with Google">
</form>
Just replace the two occurrences of "stonehenge.com" with your
domain name, and be sure that Google is hitting your site regularly.
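To see what the form actually sends, here is a rough Python sketch that builds the same GET request by hand. The field names (site, q, as_q, btnG) are copied from the form; the helper name google_site_search_url and the example.org domain are made up for illustration, and nothing in this thread says whether Google still honors these parameters.

```python
from urllib.parse import urlencode

def google_site_search_url(domain: str, terms: str) -> str:
    """Build the GET URL that the HTML form would submit (hypothetical helper)."""
    params = [
        ("site", "swr"),                            # hidden field, copied from the form
        ("q", f"site:{domain}"),                    # restricts results to one domain
        ("as_q", terms),                            # what the visitor typed in the text box
        ("btnG", f"Search {domain} with Google"),   # the submit button's value
    ]
    # urlencode percent-escapes each value and joins the pairs with "&"
    return "http://www.google.com/search?" + urlencode(params)

# example.org is a placeholder domain, not one from the thread
url = google_site_search_url("example.org", "perl regex")
print(url)
```

The only piece that changes per site is the domain inside the q field and the button label, which matches the "replace the two occurrences" advice.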
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
I think it only works if your site's pages have already been indexed by Google. Otherwise, nothing is returned.
Re: make a web site searchable
by tcf22 (Priest) on Aug 20, 2003 at 18:54 UTC
Take a look at HTML::Index. This should do what you want.
Re: make a web site searchable
by cfreak (Chaplain) on Aug 20, 2003 at 19:25 UTC
| [reply] |
We use perlfect at work, and it seems pretty good. On the other hand, I didn't set it up, and when I tried setting it up on my own site, I found it to be a pain in the arse. So I stuck with the one that I'd written a couple of years before and which does a sufficiently good job that I am disinclined to try any harder with perlfect.
Re: make a web site searchable
by halley (Prior) on Aug 20, 2003 at 19:21 UTC
Re: make a web site searchable
by bean (Monk) on Aug 21, 2003 at 00:15 UTC
This particular wheel has already been invented. Use ht://Dig if you want to accomplish this with a minimal amount of effort. That is, unless this is a programming assignment or for your own personal edification, in which case you'll need to exclude common words, research, choose, and implement ranking algorithms, and maybe even look into clustering techniques to find related or similar documents.
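For anyone taking the do-it-yourself route, here is a minimal Python sketch of the first two chores mentioned above: dropping common words and ranking. The stop-word list is a tiny illustrative sample, and the TF-IDF style score is just one common choice that I am assuming here, not something the post prescribes. The names tokenize, rank, and sample_docs are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative sample only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"}

def tokenize(text):
    """Lowercase, split into alphanumeric words, and drop common words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def rank(query, docs):
    """Return document names ordered by a simple TF-IDF style score, best first."""
    term_counts = {name: Counter(tokenize(body)) for name, body in docs.items()}
    n_docs = len(docs)
    scores = {}
    for name, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            # document frequency: how many documents contain the term
            df = sum(1 for c in term_counts.values() if term in c)
            if df == 0:
                continue
            # rarer terms weigh more; frequent use within a document weighs more
            score += counts[term] * math.log(1 + n_docs / df)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

sample_docs = {
    "a.html": "The camel is the Perl mascot",
    "b.html": "Perl Perl Perl and more Perl",
    "c.html": "Nothing relevant here",
}
results = rank("the perl", sample_docs)
print(results)
```

Clustering for related documents is a separate and larger job, so it is left out of this sketch.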
Re: make a web site searchable
by hmerrill (Friar) on Aug 20, 2003 at 20:15 UTC
I've never used it myself, but it might have some value here. I don't know what database you're using, but MySQL has a Full Text Search capability, and I think Oracle has something similar. I'm not sure about PostgreSQL.
Here's a snippet from a doc I found on www.mysql.com when I did a search (upper right) for 'full text search':
As of Version 3.23.23, MySQL has support for full-text indexing and searching. Full-text indexes in MySQL are an index of type FULLTEXT. FULLTEXT indexes are used with MyISAM tables only and can be created from CHAR, VARCHAR, or TEXT columns at CREATE TABLE time or added later with ALTER TABLE or CREATE INDEX. For large datasets, it will be much faster to load your data into a table that has no FULLTEXT index, then create the index with ALTER TABLE (or CREATE INDEX). Loading data into a table that already has a FULLTEXT index will be slower.
Full-text searching is performed with the MATCH() function.
mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title,body)
    -> );
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO articles VALUES
    -> (NULL,'MySQL Tutorial', 'DBMS stands for DataBase ...'),
    -> (NULL,'How To Use MySQL Efficiently', 'After you went through a ...'),
    -> (NULL,'Optimising MySQL','In this tutorial we will show ...'),
    -> (NULL,'1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    -> (NULL,'MySQL vs. YourSQL', 'In the following database comparison ...'),
    -> (NULL,'MySQL Security', 'When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)
and it continues on...
Here's the link to this page:
http://www.mysql.com/doc/en/Fulltext_Search.html
HTH.
Re: make a web site searchable
by trs80 (Priest) on Aug 20, 2003 at 20:14 UTC
Re: make a web site searchable
by johndageek (Hermit) on Aug 20, 2003 at 20:50 UTC
Depending on the size of the site and the search capability needed, a more scalable solution may be:
Scan the pages.
Strip non-words and tags.
Parse the words and create a table with word and page URL as columns.
Index on word.
Please note that you can normalize the data if your site is large enough.
Your search page can then parse the keywords and look up the words individually in the database with a join or a union (depending on the type of result you want), as well as search on partial words.
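The steps above can be sketched roughly as follows in Python, using the standard-library sqlite3 and html.parser modules so it runs without a database server. The table name word_page, the helper names, and the sample pages are all made up for illustration. For the "all words must match" case this sketch uses GROUP BY/HAVING in place of the join mentioned above; dropping the HAVING clause gives the union-style "any word" result.

```python
import re
import sqlite3
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text outside of tags, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def words_in(html):
    """Strip tags and return the set of lowercase words on a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))

def build_index(pages):
    """Create the word/url table, index it on word, and load every page."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE word_page (word TEXT NOT NULL, url TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_word ON word_page (word)")
    for url, html in pages.items():
        rows = [(w, url) for w in sorted(words_in(html))]
        db.executemany("INSERT INTO word_page VALUES (?, ?)", rows)
    return db

def search_all(db, query):
    """URLs that contain every word in the query."""
    words = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    if not words:
        return []
    marks = ",".join("?" * len(words))   # one placeholder per word
    rows = db.execute(
        f"SELECT url FROM word_page WHERE word IN ({marks}) "
        "GROUP BY url HAVING COUNT(DISTINCT word) = ? ORDER BY url",
        (*words, len(words)),
    )
    return [url for (url,) in rows]

def search_prefix(db, prefix):
    """URLs that contain any word starting with the given partial word."""
    clean = "".join(re.findall(r"[a-z0-9]+", prefix.lower()))
    rows = db.execute(
        "SELECT DISTINCT url FROM word_page WHERE word LIKE ? ORDER BY url",
        (clean + "%",),
    )
    return [url for (url,) in rows]

# Hypothetical sample pages
pages = {
    "/index.html": "<html><body><h1>Perl search</h1><p>Searching a web site</p></body></html>",
    "/about.html": "<html><body><script>var perl = 1;</script><p>About this site</p></body></html>",
}
db = build_index(pages)
print(search_all(db, "perl site"))
print(search_prefix(db, "search"))
```

Normalizing, as suggested for larger sites, would mean moving words and URLs into their own tables and storing only integer ids in word_page.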
Enjoy
-John
Re: make a web site searchable
by dont_you (Hermit) on Aug 20, 2003 at 22:35 UTC
Take a look at mnoGoSearch, they have done all the work for you already. I've got very good results with it.
From their site: "mnoGoSearch (formerly known as UdmSearch) is a full-featured web search engine software for intranet and internet servers. mnoGoSearch for UNIX is a free software covered by the GNU General Public License"
It's C based code, and the compiled CGI frontend works far better than the provided Perl script. Maybe some day someone will write an XS interface...
Re: make a web site searchable
by bear0053 (Hermit) on Aug 20, 2003 at 19:18 UTC
The Google suggestion won't work because I need this search to be in-house and not bounce off Google. Thanks for the suggestion, though; it would be nice if I could use this feature.
Re: make a web site searchable
by Anonymous Monk on Aug 21, 2003 at 13:26 UTC
You might give swish-e a try, depending on the size of your project. It can handle lots of data, indexes pretty fast, searches quite speedily as well, and it's not too memory-hungry. It is used on apache.org, for example, and in our company (a daily newspaper) we index about 250,000 documents with it. Downside: no incremental updates are possible, but indexing is pretty fast (about 40 minutes in our case), and it can handle multiple indices.
cheers m
Re: make a web site searchable
by bugsbunny (Scribe) on Aug 21, 2003 at 11:22 UTC
Try this one:
http://perlfect.com/
Hi there
I am about to launch a Perl Application which will do mostly what you need. It will be available for demo soon at http://www.minigoogle.co.uk
It searches msql databases, entire web directories and contains an engine which allows the admin to specify a url on another website to be spidered and indexed in the database. So you could group websites together under a topic and search these sites by keyword.
I am hoping the concept will take off and I can "chain" the search engines together to make a much bigger searchable repository (similar to the way filesharing works but with information).
The intention is to create a service which allows a more focused search with more accurate results. But it is early days yet.
I also have a Lite version of the script which searches only flat file databases. It has been a team effort with several programmers working on it from all over the world.
Pretty soon there will also be a version which will work like a Yahoo style directory engine but the plans for this have only just been drawn up.
Does anyone have any comments, thoughts or ideas?
cheers
Dataferret
Re: make a web site searchable
by Anonymous Monk on Aug 22, 2003 at 04:37 UTC
Hey,
I am working on exactly what you are looking for (I am planning to eventually make it open source).
Note that my approach is probably overkill in your situation; it is meant for large data sets (tested on 250MB of text, and it works surprisingly well).
I found that the best way is to set up an inverted index of all the terms, as well as an index which shows the position of each word within each document.
I then use an algorithm which gives a bonus if the words that are being searched appear close to each other in a document. This proximity-search algorithm is described at http://citeseer.nj.nec.com/cachedpage/550719/1 .
Also, to improve the inverted index, words are indexed by their stem (a stemming algorithm can be found here: http://www.ldc.usb.ve/~vdaniel/porter.pm ).
As well, I have implemented an algorithm similar to Google's PageRank (a good description of it is at http://citeseer.nj.nec.com/cachedpage/368196/1 ); the popularity of a page is taken into account when returning results.
I use MySQL for all the storage and indexes.
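To make the inverted index with word positions and the proximity bonus a bit more concrete, here is a small in-memory Python sketch. It is not the algorithm from the linked paper and not the poster's code: the scoring formula (1 plus the reciprocal of the smallest gap) is a placeholder I am assuming purely for illustration, and stemming, the PageRank-like popularity score, and MySQL storage are all left out. The names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def min_gap(positions_a, positions_b):
    """Smallest distance between any occurrence of one term and any of the other."""
    return min(abs(a - b) for a in positions_a for b in positions_b)

def proximity_search(index, term_a, term_b):
    """Rank documents containing both terms; closer occurrences score higher."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    scores = {}
    # only documents present in both posting lists can match
    for doc_id in postings_a.keys() & postings_b.keys():
        gap = min_gap(postings_a[doc_id], postings_b[doc_id])
        # placeholder bonus (an assumption): base score 1, plus more for small gaps
        scores[doc_id] = 1.0 + 1.0 / max(gap, 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sample documents
docs = {
    "d1": "full text search in perl",
    "d2": "perl is great and text search is hard to do in full",
    "d3": "nothing to see here",
}
index = build_positional_index(docs)
ranked = proximity_search(index, "full", "search")
print(ranked)
```

Comparing every pair of positions is quadratic per document; since both position lists are already sorted, a merge-style walk would be the usual refinement for large data sets.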
Re: make a web site searchable
by richardX (Pilgrim) on Aug 22, 2003 at 09:14 UTC
I have used both Perlfect and Swish, but for my small sites I use the FREE service from Atomz.com.
Atomz does all the work for you and you don't have to install anything. KISS
Richard
There are three types of people in this world, those that can count and those that cannot. Anon