[00:16] Okay so far I got it from 8033 domains to 3735
[00:16] I expect a few hundred more to fall off before I am done
[00:45] alard, I am going to look into setting up a warrior server myself at a later date. I want to collect stats data that we can publish as CC0 so there is more research out there
[00:46] omf_: Excellent.
[00:48] I have been a web developer for 17 years now and we only recently got large-scale open data
[00:48] google, yahoo, craigslist
[00:48] they all started giving bits out
[00:49] that led to others and more community projects. I just see the next logical step being stats from millions of pages at a time
[00:49] so more of the higher-scalability end, so beginners have something to learn from
[10:51] omf_: Let's continue here.
[10:51] The links between all these sites and inside these sites are a mess
[10:52] a full premap would make this process easy, but with no way of knowing when shit is turned off, we do not have that kind of time
[10:52] The universal-tracker system works best if you can split your task into small, but not too small, subtasks.
[10:53] But you have a small number of very large sites, is that correct?
[10:53] looks that way
[10:54] planetquake, the forums and a few others
[10:54] I am still trying to figure out why the few attempts made completed without getting nearly everything
[10:55] take this url for instance
[10:55] http://planetquake.gamespy.com/View.php?view=POTD.Detail&id=4222
[10:55] now all I have to do is subtract one from the number on the end and I get the previous page
[10:56] there is also a list page with 167 pages of results
[10:56] and yet wget got none of it
[10:56] part of that I know is the cross-domain image fetching bs
[10:56] How do we bake that in?
[10:57] all images have this kind of url http://pnmedia.gamespy.com/planetquake.gamespy.com/fms/images/potd/4199/1323262539_fullres.jpg
[11:00] That depends on your Wget parameters, probably.
[11:00] Yeah, I'll figure out what they need to be
[11:00] I will have a few people test it
[11:01] Is there a list of the sites you want to save on the wiki?
[11:02] I don't have permission to create a wiki page to post it.
[11:02] You don't have an account?
[11:03] I have an account, it just cannot create pages
[11:03] just update and edit
[11:03] never really had the need
[11:03] That's weird. Should I create a page that you can then edit?
[11:03] Since I have the impression that you're trying to save a lot of very different sites.
[11:04] basically everything under 1up, gamespy, ugo and ign
[11:04] it is all getting turned off
[11:04] What do I call the page?
[11:05] http://archiveteam.org/index.php?title=IGN
[11:06] we called the chat room ispygames
[11:06] Ah.
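A minimal sketch of the kind of Wget invocation the cross-domain image problem calls for, assuming the goal is to grab a View.php page together with its images hosted on pnmedia.gamespy.com; the flags the project actually settled on are not recorded in this log, so this is illustrative only:

    # Hedged sketch, not the project's final command: fetch one POTD page
    # plus its page requisites, letting Wget span onto the image host.
    wget --page-requisites \
         --span-hosts \
         --domains=planetquake.gamespy.com,pnmedia.gamespy.com \
         --warc-file=planetquake-potd-4222 \
         "http://planetquake.gamespy.com/View.php?view=POTD.Detail&id=4222"

Without --span-hosts and a --domains whitelist, Wget will not follow onto pnmedia.gamespy.com, so the images would be skipped even when the pages themselves are fetched.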
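On the universal-tracker point about small-but-not-too-small subtasks: the sequential id scheme in the View.php URLs suggests one natural split, cutting the id range into fixed-size chunks and handing each chunk out as one work item. A rough sketch, where the ceiling of 4222 comes from the example URL above and the chunk size of 100 is purely illustrative:

    # Hedged sketch: emit one "start-end" work item per 100 ids; a client
    # would then fetch View.php?view=POTD.Detail&id=N for each id in its range.
    max_id=4222   # highest id seen in the log; the real ceiling is unknown
    chunk=100     # illustrative chunk size
    for start in $(seq 1 $chunk $max_id); do
      end=$((start + chunk - 1))
      [ "$end" -gt "$max_id" ] && end=$max_id
      echo "planetquake-potd:${start}-${end}"
    done

The item names and chunk size here are made up for the example; the real tracker items would depend on how the sites are eventually carved up.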