
Tech:Lucene search

From Wikitravel Shared
Revision as of 17:18, 12 June 2007 by Evan (talk | contribs) (Experiences: phrases in quotes)

Wikimedia projects use an extended search tool based on the Java Lucene search tool. It would be nice to incorporate that into the Wikitravel server, too. It's supposedly faster and more flexible. --Evan 22:00, 28 September 2006 (EDT)

This is implemented on review. You can try the search on . Kevin Sours, the main travel site developer for Internet Brands, did the work to integrate this for Wikitravel. This is the first step in using Lucene. In the future, hopefully, we'll be able to offer more targeted searches, like "Find UNESCO World Heritage sites near Cologne", "Find a Mexican restaurant within 10 miles of my hotel with a price range of $8-15 per entree" or "Find all the salsa dance clubs in Connecticut". As we move to more structured listings, this will become more feasible. --Evan 13:41, 1 June 2007 (EDT)
We want to roll this out in production soon, so please test out the review version. --Evan 13:41, 1 June 2007 (EDT)
One thing I like about this: the default search engine doesn't work well with short words, and this one handles them. For example, comes up with no results. works correctly. --Evan 16:45, 11 June 2007 (EDT)

I created a new article, Apples, and searched for it. Nothing came up. I then added the word Foobar and searched for that. Still nothing. I checked back about 20 minutes later and now get results for both the text and article search. I'll do another test and see if I can pin down how long the delay is... Maj 16:54, 11 June 2007 (EDT)

Ok, it looks like it's about 15 minutes between updates. That's probably a little too long, but not the end of the world. Under 10 would be better... Maj 16:54, 11 June 2007 (EDT)

UTF-8 characters are sometimes problematic in Lucene, but a search for "東京" (Tokyo) pulled up the appropriate results, which is impressive. I'd suggest doing a bit of testing with Thai and some other non-Latin scripts just to be sure, but it looks good to me and is better than the current search. -- Ryan 12:48, 12 June 2007 (EDT)

Thai and Arabic both seem to work. المغرب pulls up Morocco... Maj 13:02, 12 June 2007 (EDT)
Korean checks out as well... Maj 13:05, 12 June 2007 (EDT)


I just tested this feature out and I'm a little perplexed by the results it gave me, but then again I'm sometimes perplexed by the results the current search tool gives too. In one instance I searched for "Clubs in Warsaw" and these were the results I got. I got hits for Serbia and some Polish cities. The results also highlighted odd words like "termini", "changing", "industry", and "independent". My suspicion is that the search tool highlighted any word with the letters i and n next to each other, and picked articles containing words like "independent" as long as they had a link to Warsaw.

I also searched for "Purple Bridge" expecting only the guide to Newport (Kentucky) to show up, since I imagine there'd be only one purple bridge in the world that's worth mentioning, but some place called "Wulai" precedes Newport, despite no mention of a purple bridge in Wulai. The words "bridge" and "purple" do show up on the same line there, though in separate sentences.

I did, however, get what I was looking for when I searched for "gothic church buddhist temple". See result [1]. I'm not trying to knock the feature, but it doesn't seem optimal, at least for the time being. I do see potential for it, however. With this, will we be able to search for those tags included in an article, and when will the tag="" attributes be working within the coded listings? -- Sapphire(Talk) • 13:03, 12 June 2007 (EDT)

Don't jump too far ahead there, cowboy! Let's get the text search working and then move forward on the tag stuff ;-). That said, it looks like there's a problem with the stop list (or lack thereof) and with word boundaries or partial-word search. I searched for "in the a" and got [2], which isn't quite right. While we do want to be able to search on short words (like "San"), it should probably have a basic stop list... and not default to partial-word search unless folks do something fancy like "*a*"... Maj 13:10, 12 June 2007 (EDT)
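To make the stop-list and partial-word points concrete, here is a minimal Python sketch. It is not how Lucene or the Wikitravel integration works internally; the stop list, the function names, and the sample sentence are all made up for illustration. It only shows why matching "in" as a substring would highlight words like "termini" and "independent", and how a small stop list can drop "in" while still keeping short words like "San".

```python
import re

# Hypothetical stop list, for illustration only; not the list any real engine uses.
STOP_WORDS = {"a", "an", "in", "of", "the"}

def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return re.findall(r"\w+", text.lower())

def query_terms(query, stop_words=STOP_WORDS):
    """Drop stop words from a query, keeping short but meaningful words."""
    return [t for t in tokenize(query) if t not in stop_words]

def substring_hits(term, text):
    """Partial-word matching: every word that merely contains the term."""
    return [w for w in tokenize(text) if term in w]

def whole_word_hits(term, text):
    """Whole-word matching: only words equal to the term."""
    return [w for w in tokenize(text) if w == term]

# Made-up sample sentence using the words reported as highlighted above.
text = "Termini station is changing; the independent club industry in Warsaw"

print(substring_hits("in", text))   # partial-word search on "in"
print(whole_word_hits("in", text))  # whole-word search on "in"
print(query_terms("Clubs in Warsaw"))
print(query_terms("in the a"))
print(query_terms("San Francisco"))
```

In this toy model, the substring search on "in" matches "termini", "changing", "independent" and "industry" as well as "in" itself, while the whole-word search matches only "in". With the stop list applied, "in the a" reduces to an empty query and "San" survives.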
So, with this new search engine you can do phrase searches. You can search for 'purple bridge' (which I think gives hits for 'purple' or 'bridge'), or for '"purple bridge"', which gives hits just for that exact phrase. I'm not sure how to get an exact hit on "Purple People Bridge" without getting all things that mention purple close to bridge, though.
I got better hits with '"clubs in Warsaw"' and '"gothic church"' when I enclosed them in quotes, too. --Evan 13:14, 12 June 2007 (EDT)
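The difference between the unquoted and quoted searches can be sketched with a toy positional index in Python. This is only an illustration under stated assumptions: it assumes an unquoted query matches any of its words (as guessed above), it is not Lucene's actual implementation, and the three sample texts and the "Sample" page are invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def search_any(index, query):
    """Unquoted query: documents containing at least one of the words."""
    hits = set()
    for word in tokenize(query):
        hits |= set(index.get(word, {}))
    return hits

def search_phrase(index, phrase):
    """Quoted query: documents where the words appear consecutively, in order."""
    words = tokenize(phrase)
    if not words:
        return set()
    hits = set()
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word i of the phrase must sit exactly i positions after the first word.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

# Invented sample texts; they are not the real article contents.
docs = {
    "Newport": "Walk across the Purple People Bridge to Cincinnati",
    "Wulai": "Purple flowers line the trail. A bridge crosses the river",
    "Sample": "A purple bridge at dusk",
}
index = build_index(docs)

print(search_any(index, "purple bridge"))
print(search_phrase(index, "purple bridge"))
print(search_phrase(index, "purple people bridge"))
```

In this toy model the unquoted query returns all three pages, the quoted two-word phrase returns only "Sample", and quoting the full three-word name returns only "Newport". Whether the real engine behaves the same way for the three-word phrase would need to be checked on the review site.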