Making old PC messages available for searching
Between 1998 and November 2005, the "early generation" PlanetChristmas chatroom generated 68,505 individual messages, each on a separate HTML page. They are the raw messages and not linked to the senders. Excellent!
I got to scanning some of the old stuff last night. What a hoot! It's really amazing how much has changed over the years.
After reading a bunch of these messages, I need to figure out a way to make them available to the PlanetChristmas community. Maybe some sort of on-line search capability of just these messages... I'm not sure. Any suggestions?
Chuck
____________________ Chuck Smith at http://www.PlanetChristmas.com
Don't forget our three PlanetChristmas rules: Positive, family friendly and Christmas centric
Chuck, wasn't that run under Webboard 7? If you still have a license for it, perhaps it could run in "read only" mode -- not accept any new members and not have any old members either. It could be linked to the new PC forum so that any guest could browse it and messages could easily be linked to from WowBB.
I cannot stress enough how valuable old messages are, especially if they come with a search function. I have moved my cartoon site three times now (twice from 'free' services), and each time it was heart-wrenching to lose all those old messages.
Not only do they give a historical look at things, they are the basis on which NEW things should be done (i.e., no sense in re-inventing the wheel!)
Do your best to get those messages up and available.
tsmith35 wrote: Chuck, wasn't that run under Webboard 7? If you still have a license for it, perhaps it could run in "read only" mode -- not accept any new members and not have any old members either. It could be linked to the new PC forum so that any guest could browse it and messages could easily be linked to from WowBB.
We actually started with WebBoard 3 way back in the old days. The reason we're no longer using WebBoard is because it couldn't handle the number of messages or our volume. Bringing up the old WebBoard is not an option.
All the messages... 68,505 individual HTML pages consume about 65MB of disk space but zip up to a hair over 43MB.
My goal is to park the files (assuming my ISP will let me have 68,505 of them) in a subdomain of PlanetChristmas.com and include some sort of targeted search engine. Anyone ever heard of http://www.mtopsoft.com/sitesearch/index.htm ?
____________________ Chuck Smith at http://www.PlanetChristmas.com
It might be possible to load the old posts from the HTML into the WowBB database as closed topics, either in new "archive" forums or in the existing ones. WowBB isn't terribly complex on the backend; I've already looked at it. Let me know if you're interested in going down this route and I'll look into it further.
--Ethan
____________________ A noble spirit embiggens even the smallest man.
-- Jebediah Springfield
The big negative is that most of the search utilities are very expensive at the 68,000+ page level, and most provide their searches as a service, with monthly charges forever. Ack. But there's always Google. They offer their "Google Free" search for free. This actually looks very interesting, and it should help get more PlanetChristmas pages available to everyone.
Take a look at Google's offering and see what you think.
Of course, they always have their search appliances available for those folks with lots of cash.
Tom
Last edited on Wednesday August 2nd, 2006 03:19 am by tsmith35
Kinda depends how the files are laid out - if the filenames or something in the content allows you to put the threads back together, you're in good shape.
If so, write a script to take each file, strip off the HTML outside the content of the actual message (if there's a threadID, keep it). You can take what's left and start dumping it to a new file.
Once the script is done running, you'll have a file that has thread IDs and the message content. I think you'll find that you're significantly under 43 MB (I'd wager less than 1MB...)
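Here's a rough sketch of what that script could look like in Python, using only the standard library. A few things are guesses on my part, since I haven't seen the actual files: that each page keeps the message inside a `<body>` tag, that the files end in `.html`, and that a thread ID (if there is one) shows up somewhere in the page as something like `threadid=123`. The `THREAD_RE` pattern and the function names are made up for illustration and would need adjusting to match the real markup.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the text found inside <body>, skipping script/style blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


# Hypothetical pattern: assumes the thread ID appears as "threadid=123" or similar.
THREAD_RE = re.compile(r"thread[_-]?id\s*[=:]\s*(\d+)", re.IGNORECASE)


def extract_message(html):
    """Return (thread_id, text); thread_id is '' when no ID is found."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace so each message fits on one line.
    text = " ".join(" ".join(parser.chunks).split())
    match = THREAD_RE.search(html)
    return (match.group(1) if match else ""), text


def dump_archive(src_dir, out_path):
    """Write one tab-separated line per page: thread ID, file name, message text."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for page in sorted(Path(src_dir).glob("*.html")):
            html = page.read_text(encoding="utf-8", errors="replace")
            thread_id, text = extract_message(html)
            out.write(f"{thread_id}\t{page.name}\t{text}\n")
            count += 1
    return count
```

Calling `dump_archive("old_messages", "messages.tsv")` (both names are placeholders) would produce a single text file with one message per line. Keeping the original file name in each line means you can still link back to the raw page later.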
If you have access to your database that runs PC, you can either create a new table to put all the legacy messages in, or import them into the existing PC forum tables.
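As a sketch of the "new table" option: the example below uses Python's built-in `sqlite3` module purely as a stand-in, since I don't know which database the forum actually runs on. The table name `legacy_messages`, its columns, and the helper functions are all hypothetical. The search is a plain substring match with `LIKE`, which is the simplest thing that could work and not a proper full-text index.

```python
import sqlite3


def create_archive(conn):
    """Create a separate table for the legacy messages (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS legacy_messages (
               id INTEGER PRIMARY KEY,
               thread_id TEXT,
               source_file TEXT,
               body TEXT NOT NULL
           )"""
    )


def load_messages(conn, rows):
    """Insert (thread_id, source_file, body) tuples in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO legacy_messages (thread_id, source_file, body) "
            "VALUES (?, ?, ?)",
            rows,
        )


def search(conn, term, limit=20):
    """Return (source_file, body) for messages whose body contains the term."""
    cur = conn.execute(
        "SELECT source_file, body FROM legacy_messages "
        "WHERE body LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_archive(conn)
    load_messages(conn, [
        ("1", "a.html", "Blinking lights on the roof"),
        ("2", "b.html", "Inflatable snowman"),
    ])
    print(search(conn, "lights"))
```

A separate table keeps the old messages out of the live forum data, so nothing in the current forum can break if the import goes wrong. Importing into the existing forum tables would need the real schema, which I haven't seen.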
One of our very own is in the process of taking the old messages (with people's names stripped out) and providing us a way to sort through them. Stay tuned!
____________________ Chuck Smith at http://www.PlanetChristmas.com