General Discussion Board \ Site \ Archival Backup...

Posted: 21 Dec 2006
23:13 GMT
Total Posts: 1051
To some this would be considered unethical, but I'm not aware of anyone else who is doing it, so I thought I might as well start. Now, I'm not talking about our website's archive, but's. Even if they block me from doing this in the future, I'll still have a decent backup. I recently came across an app that will let me download virtually any site. If the webpages are generated via scripts, I get the resulting HTML files rather than the scripts, but that's a minor discrepancy at this time. As of earlier today I have a complete backup of's archives. I'm in the process of updating it now as well and will soon hit the 17000 file mark. I'm considering making this a regular regimen (I might be able to turn it into a cron task) that once a week or once a month will scan the archive again and add any missing files.

I can do it for this site as well, but then again, I can just download it via the FTP access, and backups are regularly performed, so no worries anyhow. Well, food and 24+ hours of awakeness are calling me. Night all.
Posted: 22 Dec 2006
15:45 GMT
Total Posts: 1189
I asked Michael Vincent if I could do this and he told me it was not allowed.

Someone call for an exterminator?
Posted: 22 Dec 2006
18:08 GMT
Total Posts: 1005
*keeps his mouth shut on that topic...*

Hmmm... how exactly would that be considered unethical? There are already websites that do that ( comes to mind...), so where's the wrong in that, especially considering that the files are free?

Of course, ticalc could protest the archival of their site, but seeing as each page contains their copyright at the bottom, and the fact that it's solely for archival purposes, and any other use of said pages would (in an ethical world) warrant credit where it is due, I see no problem with this at all.

Then again, ticalc might, and prolly does, see this matter differently.
Posted: 22 Dec 2006
21:19 GMT
Total Posts: 714
I don't see why it would be a problem. As long as you are only backing up and archiving, there shouldn't be one.

It is much easier to suggest solutions when you know nothing about the problem.
Posted: 22 Dec 2006
21:42 GMT
Total Posts: 1051
Threefingeredguy, it's Michael Vincent. That's why DarkSideProgramming is keeping his trap shut. I think I viewed the conversation in IRC when it happened, hence me questioning its ethics.

Since I'm sure that they've seen a single IP make up ~15% of their weekly traffic in 1 day, Magnus would have blocked me. Yes, there are other sites that do this, but at the same time one technically would have to download that site if we were to put it back online again.

But yeah, it's just archival and backup. Nothing to do with the site as a whole, though that is a simple URL change if I so decided.
Posted: 23 Dec 2006
20:11 GMT
Total Posts: 2486
Yeah, I'd say as long as only the user-submitted content is saved it shouldn't be a problem, for personal backup use only. Some hosts may not like something like that due to excessive bandwidth use, since many hosts will not count internal backups towards bandwidth quota. As with, its crawler follows .htaccess rules, so if stuff is disabled in the .htaccess it won't archive or crawl it. Though, that reminds me to get a fresh backup of CG and dump it onto DVD, it's been a few months since I backed it up last...

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Posted: 23 Dec 2006
21:31 GMT
Total Posts: 1005
Prolly a good idea, Z... we wouldn't want a crash of some sort or another to cause the site to go down :)
Posted: 24 Dec 2006
10:59 GMT
Total Posts: 1892
I tend to download CG backups rather haphazardly--I actually tried to snag one yesterday, but it stalled at ~450MB for whatever reason.
I'll do it again right now. :)
Posted: 25 Dec 2006
16:42 GMT
Total Posts: 1005
And the verdict is...
Posted: 26 Dec 2006
09:26 GMT
Total Posts: 939
Their reasoning for not allowing it is, if I recall correctly, bandwidth waste. I would highly recommend you stop, lest you find yourself DROPped via iptables.
Posted: 26 Dec 2006
15:57 GMT
Total Posts: 2486
Yeah, Andy pretty much confirmed my educated guess. That's a pretty big bandwidth strain considering how many files get archived, so unless site owners are cool with that, being persistent may indeed get you banned. If you want to keep a mirror you definitely need to contact the site owner to see their standpoint or to justify your idea. Like with CG, we wouldn't be able to afford someone downloading all the files or crawling the site daily, due to bandwidth and unnecessary server strain. However, I'm cool with periodic backups via FTP since that doesn't count towards our bandwidth. As long as the people who run the site are aware of and agree with what's going on, you'll be able to mirror with a clean conscience. That's my 2 cents, guys.

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Posted: 27 Dec 2006
20:13 GMT
Total Posts: 1051
That's also why I think I said (and if I didn't, I will now) that once the first one is done, anything I do after that is all delta updates. Welcome to comma splice haven, btw. As for .htaccess, the program I use doesn't touch that to my knowledge, though after looking through the archive it's nothing more than what you can browse online. It does, though, respect the robots.txt file on sites, which is only about a few logs on. I definitely wouldn't be doing this if it were full copies every time.
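
For anyone curious what a "delta update that respects robots.txt" might look like, here is a rough sketch in Python. It is not the app mentioned above (that one isn't named in this thread). The function name `allowed_missing`, the `example.org` URLs and the mirror layout are all made up for illustration. It only checks whether a file already exists locally, so it picks up missing files but would not notice files that changed on the server.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def allowed_missing(urls, mirror_root, robots_txt, user_agent="*"):
    """Return the URLs that robots.txt allows and that have no local copy yet.

    Hypothetical helper: `urls` is a list of remote file URLs, `mirror_root`
    is the local backup directory, `robots_txt` is the text of the site's
    robots.txt (fetched separately).
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = Path(mirror_root)
    todo = []
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # the site asked crawlers to stay out of this path
        rel = urlparse(url).path.lstrip("/")
        if not rel or rel.endswith("/"):
            rel += "index.html"  # assumed local name for directory pages
        if not (root / rel).exists():
            todo.append(url)  # only files missing from the mirror
    return todo
```

After the first full copy, a weekly or monthly run would then only need to fetch whatever this returns, which is the whole point as far as bandwidth goes. It still doesn't replace asking the site owner first.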