I've written a script that scrapes the websites of some major hotel chains to gather all the info needed to create a Wikitravel accommodation listing.
I have big, unrealistic, and totally unrealized plans for this tool. Right now it does the following:
- Scrapes major chains
- Emits a wiki-formatted list of entries
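As a rough illustration of the "emit" step, here is a minimal Python sketch. The script's actual language and output format aren't given here, so the `Hotel` fields, the function names, and the bullet layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    # Hypothetical record for one scraped hotel; the field names are assumptions.
    name: str
    address: str
    phone: str
    url: str


def format_entry(hotel: Hotel) -> str:
    """Render one hotel as a single wiki bullet line."""
    parts = [f"'''{hotel.name}'''", hotel.address, hotel.phone]
    # Skip empty fields so missing data does not leave stray commas.
    text = ", ".join(p for p in parts if p)
    if hotel.url:
        text += f", [{hotel.url}]"
    return f"* {text}."


def format_listing(hotels: list[Hotel]) -> str:
    """Render all hotels, sorted by name, one bullet per line."""
    ordered = sorted(hotels, key=lambda h: h.name)
    return "\n".join(format_entry(h) for h in ordered)
```

Sorting by name keeps the output stable from run to run, which would make it easier to compare a fresh scrape against an earlier one.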
I eventually hope to do the following too:
- Scrape Wikitravel to figure out the correct name for a city. For example, is it Newark or Newark (California)?
- Keep a database of entries.
- Keep track of which entries have been inserted into articles so as to avoid re-adding them later if someone deletes them (see Wikitravel:Avoid negative reviews).
- Be able to update entries if the chain's web site changes.
- If someone adds descriptive text, be able to merge changes in and respect the human-added text.
- Be able to add a new hotel chain, and emit just the entries for that chain.
- If the number of entries gets out of hand (Las Vegas), be able to randomly select a few automatically.
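Two of these wishes, not re-adding entries that were inserted before and trimming an oversized list at random, could be sketched together like this. None of it exists yet; `select_entries`, the `already_inserted` set, and the `limit` of 5 are hypothetical names and values chosen for the example.

```python
import random


def select_entries(entries, already_inserted, limit=5, seed=None):
    """Pick the entries to add to one city's article.

    entries: list of entry strings scraped for the city.
    already_inserted: set of entries added in the past, even if since deleted.
    limit: assumed cap on how many entries to emit per city.
    seed: optional seed so a selection can be reproduced.
    """
    # Never re-add something that was inserted before; a human may have
    # removed it on purpose.
    fresh = [e for e in entries if e not in already_inserted]
    if len(fresh) <= limit:
        return fresh
    # Too many candidates (the Las Vegas case): take a random subset.
    rng = random.Random(seed)
    return rng.sample(fresh, limit)
```

Comparing entries as exact strings only works if the entries in the articles still match what the script emitted.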
If anyone wants access to the current list for a particular state, please ask and I'll upload it. But for the sake of my future dreams for the hotel program, I ask that you please copy entries verbatim. If the list for a particular city is large, it's fine to copy only some of the entries; just don't change them if you can help it.
Here are the data for a few states: