[00:34] red_beard: Thanks
[00:34] Damn, we have a lot of work.
[01:10] SketchCow: i may try to fix some of the bad scan issues of Electronic Gaming Monthly i have found
[01:11] it's mostly just trimming the white edges
[01:16] Thanks.
[01:17] OK, so.
[01:17] We have this big pile of crap happening.
[01:17] We probably need to update the wiki to handle all of them.
[01:59] http://archive.org/details/2013-02-22-posterous-hostname-list
[02:52] SketchCow, Ocean Quigley looks almost exactly like you to me http://media.pcgamer.com/files/2012/11/ocean_v2-610x320.jpg
[02:53] (wrong chan, oops)
[05:41] Not sure who-all has access to edit the front page of the wiki. The "Ended Projects" section can be updated though. See
[05:41] http://archiveteam.org/index.php?title=Talk:Main_Page#Updating_.22Ended_Projects.22_Section
[06:18] aggrosk: Thanks for this. I'll work on it shortly.
[07:12] looks like trailers on g4tv.com started to move to hd in summer of 2008
[08:08] Any plans on salvaging GameSpy?
[08:12] As SketchCow has pointed out this month suddenly became real fucking busy
[08:13] I think there are 4-5 sites we are trying to save right now
[08:20] heh that's pretty gnarly
[08:21] I wonder if IGN will somehow import GameSpy's content into its own site
[08:30] who knows
[08:34] i'm also the one trying to save g4
[08:34] I personally am working on the g4 reviews
[08:35] speaking of that godane the first test worked
[08:35] It looks like I can pull all the reviews down with warc data no problem
[08:35] anyways i found the first HD video
[08:35] can you give your scripts?
[08:35] was it when you expected
[08:40] Been clicking around on GameSpy, getting occasional "Error rendering template: gamespy_review_grid.jsp = com.ign.di.dao.DynamicIndexDAOException:"
[08:40] Would be annoying if that happens when downloading the site
[08:41] a refresh of the page solved it, perhaps it's getting hammered by more leechers
[08:43] That is the one variable that is impossible to control.
How many other people are trying to download it while you download it
[08:44] omf_: can you find the trick to get this file: vids.g4tv.com/videoDB/008/125/video8125/vc14627_ss040114glf_165_0.flv
[08:45] there are other videos like this that may be lost
[08:46] also know that video ids are the same based on what waybackmachine has
[08:48] you using this list: http://wreb.archive.org/stream/g4tv.com-video-url-list-1/video-file-urls.txt
[08:50] they are 404d
[08:52] yes
[11:26] http://visitors2cash.com/ref.php?refId=13702
[11:26] open this link and register now to win money freeeeeeeeeeeee
[12:27] YAY, FREE MONEY!
[12:41] chronomex, any luck on that url list for the project repos? I have to build up that list as a skip list to double check that I downloaded the whole site correctly
[12:42] not yet, thanks for reminding me, stay tuned
[12:46] Could someone create an src.opensolaris page on the wiki?
[12:46] I got some data to put up and finishing this site might not be so bad
[13:22] * db48x2 sighs
[13:22] http://sitenomore.wordpress.com/2013/02/22/memolane-gives-hours-of-notice-before-shutting-down-service-forever/
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:36] bottom_interfere
[15:37] I agree totally
[15:52] ah ze ops.
[18:04] Is there a project to archive 1UP, UGO, and Gamespy?
[18:06] They are all closing: http://www.joystiq.com/2013/02/21/ign-layoffs/
[18:14] We know they are going
[18:14] hiker1, seriously there are like 9 major site closings right fucking now. It is a mess
[18:15] What other sites?
[18:15] 1up, UGO, Gamespy, g4, posterous, opensolaris, memolane,
[18:15] and a few others
[18:15] that I cannot remember off the top of my head
[18:16] posterous is shutting down?
wow
[18:16] yeah that was the first one we attacked
[18:16] mapped out 9+ million accounts
[18:17] opensolaris appears to be pretty small, just needs a little special work
[18:24] We have a deathwatch and projects page. Why not have an Active Projects page instead of just the blurb on the front page? Not everyone has front page edit access and frankly shit is starting to pop off this month
[18:25] Can normal users even create new pages themselves?
[18:25] We could have an active projects page, and then the main page could embed it
[18:26] sounds good. You got access to make that happen?
[18:42] Nope.
[19:23] The Deathwatch has been updated.
[19:34] I started working over gamespy to see what we are going to need to do
[19:38] has anyone mirrored the forums.g4tv.com shit? Says it is closing on March 18th
[19:38] which is a bit ahead of the esquire launch
[19:40] i'm mirroring it
[19:40] i have also mirrored thefeed articles and images
[19:40] but will have to update that
[19:41] it was around new years eve when i did that
[19:41] godane, so you got the forums and the videos
[19:41] i don't have all videos yet
[19:41] but very close
[19:41] I know, you are working through them
[19:41] hd video is what i will never get all of
[19:42] I am worried most about gamespy
[19:42] 17 years of gaming history is on there.
I am already spidering to build the url list
[19:44] its 14 years in the goodbye post
[19:44] they should check the copyright statement
[19:44] gamespy has been bought and sold so many times the owners probably do not even know what they own
[19:45] plus the dozens of subdomains
[19:46] better spider: http://www.gamespy.com/index/news.html?constraint.month.article.initial_publish_time=12&constraint.year.article.initial_publish_time=2003&constraint.return_all=is_true
[19:46] change the month and year
[19:48] I got 19,000 links so far
[19:49] this is going to take hours of work to mirror
[19:49] and I don't mean set it and forget it
[19:52] gamespy has some files hosted locally and others on mediafire. Talk about dropping a toolbox of problems on my head
[19:53] mediafile
[19:53] *fire
[19:53] what the hell?
[19:54] yeah this is going to be one of the hard sites I have done
[19:54] shit is everywhere
[19:54] and we do not know when it is going down
[19:55] i was lucky the g4tv stuff was not like that
[19:55] but still a lot of g4techtv videos are just gone
[19:57] The CEO of ziff davis needs to die in a fire
[19:58] i was the guy to save all the dl.tv and crankygeeks
[19:58] we got lucky with dl.tv
[19:59] their videos were hosted on meovid or something
[19:59] not just on their site
[19:59] problem is i lost dl.tv episode 6
[19:59] but i think that's all i have lost
[19:59] sounds like a pretty complete collection to me
[20:00] the first 7 episodes of dl.tv are not on meovid
[20:01] It's mevio.com
[20:12] I have so much rage and the flesh is weak
[20:15] omf_: what's happening with src.opensolaris.org btw? you said the source code can be accessed in a better non-web-interfacey way?
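The "better spider" link pasted at 19:46 takes a month and a year, so the whole GameSpy news archive can be seeded by looping over both values. A minimal Python sketch, assuming the two `constraint.*` parameters behave the way that single link suggests; the function name `monthly_index_urls` and the 1999 to 2013 range (inferred from the "14 years" goodbye post) are illustrative assumptions, not something from the log:

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the 19:46 link.
# The year range a caller passes in is an assumption.
BASE = "http://www.gamespy.com/index/news.html"


def monthly_index_urls(start_year, end_year):
    """Yield one GameSpy news-index URL per month, both years inclusive."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            query = {
                "constraint.month.article.initial_publish_time": month,
                "constraint.year.article.initial_publish_time": year,
                "constraint.return_all": "is_true",
            }
            # urlencode leaves '.' and '_' unescaped, so month=12, year=2003
            # reproduces the pasted link exactly.
            yield f"{BASE}?{urlencode(query)}"
```

Writing `monthly_index_urls(1999, 2013)` out one per line gives 180 seed URLs that could be fed to wget with `--input-file`.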
[20:15] yep chronomex is already working it out
[20:15] basically we just pull the repos down with hg
[20:15] ^
[20:16] then compare against the mirror I already made to make sure everything got backed up
[20:16] the site is small
[20:16] few hundred repos only
[20:17] yeah I did berlios in 2 days and that was thousands of repos
[20:17] So if any wiki admins see this ping me, we need to get some new project pages created
[20:19] i think i found more gamepro site stuff
[20:20] kick ass
[20:20] search site:cdn1.gamepro.com
[20:20] looks to be pdfs
[20:22] wayback machine has only 66 links
[20:22] no pdfs
[20:24] fuck yes: http://cdn1.gamepro.com/video/podcast/roleplayersrealm/podcast.xml
[20:24] i'm grabbing that
[20:25] we have another collection saved by random searching
[20:26] I get tons from random searching too
[20:26] you want to hear a real fucker
[20:26] another podcast: http://cdn1.gamepro.com/video/podcast/cultureclub/podcast.xml
[20:27] 1up.com blocks IA
[20:27] via robots.txt
[20:27] fuck yes again: http://cdn1.gamepro.com/video/podcast/videoreviews/podcast.xml
[20:28] gamepro video reviews
[20:41] going to bed for now
[20:41] will download more gamepro stuff later
[20:44] I found 43 subdomains that work for gamespy so far
[20:44] I expect more than that for ign
[20:55] Right now I think we are seeing the biggest purge of video game history online ever
[20:56] over 100 separate sites so far
[20:58] :/
[20:59] guys i have a list of all the gamespy subdomains
[20:59] i can do it for ign too
[20:59] I need that
[20:59] :O
[20:59] http://doinkdoink.us/gamespy/domain_list-abc.txt
[20:59] how did you get it
[21:00] http://doinkdoink.us/gamespy/domain_list.txt
[21:00] passive dns
[21:00] there are some dupes in the non abc but they are in order of subdomain
[21:00] good that will save me some serious time
[21:01] how come we couldn't use that with posterous?
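The GamePro `podcast.xml` links are feeds, so the episode files themselves have to be pulled out of them before anything can be grabbed. A small sketch, assuming these are ordinary RSS 2.0 feeds with the media URL in each item's `<enclosure url="...">` attribute (not checked against these particular feeds); `enclosure_urls` and `feed_episode_urls` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def enclosure_urls(rss):
    """Return media URLs from every <enclosure> in an RSS 2.0 feed (str or bytes)."""
    root = ET.fromstring(rss)
    # In RSS 2.0 each episode is a channel/item and the file is enclosure@url.
    return [e.get("url") for e in root.iter("enclosure") if e.get("url")]


def feed_episode_urls(feed_url, timeout=30):
    """Fetch a feed over HTTP and list its episode URLs."""
    with urlopen(feed_url, timeout=timeout) as resp:
        return enclosure_urls(resp.read())
```

The returned list can be saved one URL per line and handed to wget as an input file.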
[21:01] you know..i could i didn't think of that
[21:01] let me pull it and i'll have closure compare
[21:01] i mentioned the above list in -bs last night figured i prob should have here, sorry
[21:01] never mind.
[21:02] Just made life a lot easier
[21:02] * Smiley notes to always bug S[h]O[r]T about dns
[21:02] yeah bro you just saved me hours
[21:02] no worries
[21:02] okay so my estimate was right, literally hundreds of fucking domains
[21:02] i can use *some* bandwidth from work, up to about.... 40Mbit currently
[21:03] and I have quite a bit of storage, if someone can give me.... *something* to grab?
[21:04] I can do that once I figure out the map of this
[21:04] There are redirect loops and down sites all over the place
[21:04] think geocities
[21:05] S[h]O[r]T, how long would it take to make an IGN list
[21:06] Fri Feb 22 21:06:43 2013 /sbin/ifconfig tun0 192.168.213.86 pointopoint 192.168.213.85 mtu 1500
[21:06] Fri Feb 22 21:06:43 2013 Linux ifconfig failed: could not execute external program
[21:06] whoops
[21:08] working on ign now
[21:08] it's *.gamespy.com and *.ign.com if there's some other main domain you want me to pull let me know. will do posterous after
[21:09] can you do a * in the main domain name
[21:09] like planet*.com
[21:09] ign has a lot of cname records compared to A
[21:11] no, but all the planets are in there afaik. there isn't a lot. i can lookup all records by their /20 though too
[21:11] well what I found is there are planet sites that redirect to gamespy or ign subdomains
[21:12] and then some that do not
[21:12] right but are there any missing planet* in my list?
[21:12] http://whois.arin.net/rest/org/IGNEN-1/nets
[21:12] this was found via a few levels of mapping/scraping I did
[21:12] I am comparing
[21:12] my gamespy is just A records, i can pull CNAMES as well
[21:13] planetctf.com
[21:13] * Smiley is thinking jolt.
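Checking a passive-DNS dump against a list built by crawling ("are there any missing planet* in my list?") is a set difference once both lists are normalized for case, trailing dots, and the duplicates mentioned for `domain_list.txt`. A sketch under those assumptions; the function names and the idea of reading each list from a local text file are illustrative:

```python
def normalize(lines):
    """Lowercase hostnames and drop trailing dots, blank lines, and duplicates."""
    hosts = set()
    for line in lines:
        host = line.strip().lower().rstrip(".")
        if host:
            hosts.add(host)
    return hosts


def compare_lists(passive_dns, crawled):
    """Return (crawled but missing from the DNS list, in the DNS list but never crawled)."""
    dns, seen = normalize(passive_dns), normalize(crawled)
    return sorted(seen - dns), sorted(dns - seen)
```

Either argument can be an open file, since iterating a file yields its lines; filtering the first result with `h.startswith("planet")` answers the planet* question directly.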
[21:13] planetcallofcthulhu
[21:13] lol
[21:14] mapspy.com
[21:14] they got sites going back to 1996
[21:16] S[h]O[r]T: Can you do that dns trick for any site you want?
[21:16] S[h]O[r]T, I almost forgot, could you do 1up.com before posterous. It is one of the 3 gaming sites going down
[21:17] posterous uses wildcard dns. You cannot enumerate their domains
[21:18] thispoweroussitedoesnotexist.posterous.com has address 184.106.20.99
[21:19] yes soultcer
[21:19] closure passive dns goes off of what people have done dns lookups for
[21:20] so as long as someone visited a site .... fairly recently...
[21:21] oic
[21:21] Sweet, this will make archiving much easier. By passive dns, you mean that ISC db that logs DNS queries?
[21:22] yes
[21:22] and if we all use S[h]O[r]T's DNS ;)
[21:22] there are some free passive dns sites out there http://www.bfk.de/bfk_dnslogger.html is a decent one
[21:22] but i'm going off of the isc one
[21:25] Right ok, hmmm
[21:25] when can we start grabbing :D
[21:25] I thought only members of the ISC get access
[21:25] * Smiley wants to help and feels unhelpful right now :(
[21:26] Smiley, you want a target
[21:26] omf_: sadly I'm still more newb than even that
[21:26] I basically need a wget command.
[21:26] else I'm gonna be asking lots more questions :<
[21:28] someone help with posterous :(
[21:28] hmmm, how ? ;/
[21:28] got banned, multiple times :D
[21:28] you know much about warc files?
[21:28] not spectacularly
[21:29] kind of super hyper atm
[21:29] with some spare bandwidth going.... ooooooo sites
[21:29] we have a somewhat working script for the tracker but i'm not 100% sure the warc files it's outputting or the manual wget ones are right
[21:29] anything i do doesn't seem to work right in alard's python warc-proxy. but warc files from postfork load and display fine
[21:30] isn't there a firefox plug-in now?
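The lookup at 21:18 is the usual way to spot wildcard DNS: resolve a name nobody would have registered, and if it still gets an address (184.106.20.99 for posterous), every name under that domain will. Resolving candidate names then says nothing about which accounts exist, which is why only passive DNS history helps there. A sketch with Python's standard resolver; the random-label probe and the injectable `resolve` parameter are illustrative choices, not the method used in the channel:

```python
import socket
import uuid


def has_wildcard_dns(domain, resolve=socket.gethostbyname):
    """Return True if a random label under `domain` resolves (a wildcard record)."""
    # 32 hex characters: well under the 63-character DNS label limit and
    # practically guaranteed not to be a real subdomain.
    probe = f"{uuid.uuid4().hex}.{domain}"
    try:
        resolve(probe)
    except socket.gaierror:
        # NXDOMAIN surfaces as gaierror. So does having no network at all,
        # so a False result from an offline machine proves nothing.
        return False
    return True
```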
[21:31] okay Posterous goes down April 30th, that means I have to set aside the screenshot bot because these gaming sites could be gone tomorrow
[21:31] never enough time
[21:32] screenshot bot?
[21:32] SketchCow wants a screenshot of each posterous domain
[21:32] So I have some code that runs a threaded headless browser to screenshot pages
[21:33] is your code done?
[21:33] once you have the domain list it is pretty straightforward
[21:33] ....
[21:33] i think i'm still banned from posterous
[21:33] * Smiley checks
[21:33] I already did one test run and the output looks good
[21:33] :<
[21:33] still banned.
[21:34] yeah I got unbanned
[21:34] could just screenshot off of what we grab in warcs..but ok :p
[21:34] no rhyme or reason to it
[21:34] warc-proxy+screenshot bot?
[21:34] btw why do we want screenshots?
[21:34] wget does not guarantee it fetches all js and css
[21:34] omf_: I could screenshot all of ign? :D
[21:34] it does not get web fonts either
[21:35] Smiley, not until we at least have a mirror
[21:35] ahhh ok
[21:35] also conditional IE shit is ignored too
[21:35] grabbing a *full* website is not as straightforward as it was 10 years ago
[21:35] plus wget cannot process js
[21:36] yah, hence me asking someone to spell out a command for me :/
[21:36] it's about how far my skills stretch
[21:37] Many of us are working towards a more prepackaged mirroring solution
[21:37] nod.
[21:37] So more people can get stuff and need less technical skill to do it
[21:38] i guess i'm crying because I feel kind of useless and no matter how much i try, I don't seem to be able to get some things right :D
[21:38] wait a sec Smiley
[21:38] I'm good with hardware, but that's no use here :D
[21:38] godane do you have a list of all the HD urls not done yet?
[21:38] OOOO yeah I can do that :)
[21:38] especially if we limit wget a bit.
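The screenshot bot described at 21:32 (a threaded headless browser walking the domain list) reduces to a worker pool once the domain list exists. This sketch covers only the threading part. `capture` is a hypothetical stand-in for whatever headless-browser call actually renders a page and returns, say, an image path; the worker count and the `http://` scheme are also assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def screenshot_all(domains, capture, workers=8):
    """Call capture(url) for each domain on a thread pool.

    Returns (results, failures): domain -> capture() return value, and
    domain -> the exception it raised.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(capture, f"http://{d}/"): d for d in domains}
        for fut in as_completed(futures):
            domain = futures[fut]
            try:
                results[domain] = fut.result()
            except Exception as exc:
                # One dead or looping site should not stop the whole run.
                failures[domain] = exc
    return results, failures
```

Collecting failures instead of aborting lets a second pass retry just those domains, which matters given the redirect loops and dead hosts mentioned above.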
[21:39] yeah, please "dumb down the tools" for people who want to help but can't be bothered to learn to be Unix admins / learn all CLI and scripting etc.
[21:39] thanks
[21:39] turnkit: hehe i *am* a unix admin :/
[21:39] lol
[21:39] however I'm not a coder, and... while I can wget and do stuff, I hate repeating what someone else has already done faster and better.
[21:40] I'll screw it up 3 or 4 times before getting it right, and in the case of posterous that means getting banned, possibly perm.
[21:40] I can type 'make' but prefer a Windows GUI. #abomination for sure. But yes, please work on tools for simple archiving helpers.
[21:40] The heart of it is the logic of the scraper
[21:40] if someone says to me "Smiley, run : for x in blah do wget blahblah with options xxxxx" then I will happily :)
[21:41] it needs to be able to make decisions about how to proceed without human feedback
[21:41] maybe one day I'll understand crazy things like lua, but I'm doubtful :)
[21:41] like rotating ips, user agents, time outs, caching, file fetching
[21:41] Testing it takes 10-20x more literal time than writing the software
[21:41] and this will probably only run in the warrior
[21:41] nod
[21:42] hmmm, I have a warrior instance here, but my net here is kind of limited
[21:42] unless you are comfortable with the Linux or MacOS X command lines
[21:42] I run gentoo ;D
[21:42] just going to keep all the lists in http://doinkdoink.us/gamespy/; a preliminary ign list of A records is there
[21:42] <----gentoo :D
[21:43] <--- former gentoo for desktop, now I use it for custom servers
[21:43] you can pack it real small for pxe boot
[21:43] thanks S[h]O[r]T
[21:44] are you actually trying to grab gamespy?
[21:45] ffs why did the warrior just fsck /dev/sda? XD
[21:47] Smiley, try using this command http://pastebin.com/aH8WtsRe to download the site I keyed in
[21:47] i grabbed a wget archive last night
[21:47] it is for a warc wget
[21:48] off of...someone's command.
i think schbirid
[21:48] who is not in this channel under that nick :p
[21:48] it's going
[21:48] errrr
[21:48] how do I rate limit it at least... slightly?
[21:48] oops that line wasn't pasted
[21:48] :D
[21:49] just add this onto the end
[21:49] --wait
[21:49] and then the number of seconds
[21:49] start with 3
[21:49] or 2
[21:49] that's not rate limit, but it'll do :P
[21:49] if it's all gonna go at 11kb/s then it's not a problem :P
[21:50] how big might this get?
[21:50] No idea
[21:50] :/
[21:50] They host files
[21:50] it'll crash if it runs out of space tho :?
[21:50] but that shit is everywhere
[21:50] it just stops
[21:50] If I randomly stop it at some point?
[21:50] everything up to that point should be good
[21:51] ok cool, that's fine then ;D
[21:51] my desktop shouldn't crash from running outta space
[21:51] it's limited to /home/ anyway
[21:51] just might be a bit crunchy come monday morning.... 200ish gb space
[21:52] sadly I removed the 1Tb drives from my external connector due to cleaning up :(
[21:52] I doubt the site is that big but I have no way of knowing
[21:52] I am still mapping the whole thing out
[21:52] D:
[21:52] there's lots of videos i haven't looked at grabbing them yet
[21:52] they got files on fileplanet, mediafire, local and other
[21:52] lol how long have you been mapping for o_O
[21:52] and will it grab stuff like videos?
[21:53] surely they are all huge?
[21:53] --2013-02-22 21:51:57-- http://planetquake.gamespy.com/View.php?view=POTD.List
[21:53] Reusing existing connection to planetquake.gamespy.com:80.
[21:53] HTTP request sent, awaiting response... Read error (Connection timed out) in headers.
[21:53] Retrying.
[21:53] random page that doesn't work I hope.
[21:54] ok and finally, where to upload this once it's done? :D :S
[21:55] We should probably create a new channel for these gaming mirrors
[21:56] anyone have a witty project name?
[21:56] ign and gamespy are linked right?
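The pastebin command from 21:47 is not reproduced in the log, so what follows is not that command. It is a hypothetical WARC-writing GNU Wget invocation (Wget 1.14 or later), assembled in Python, that shows where the suggested `--wait` fits. As pointed out at 21:49, `--wait` only pauses between requests; `--limit-rate` is Wget's actual bandwidth cap. The function name, flag selection, and defaults are assumptions:

```python
def warc_wget_command(url, warc_name, wait=3, limit_rate=None):
    """Build an argument list for a polite, WARC-writing GNU Wget mirror."""
    cmd = [
        "wget",
        "--mirror",                  # recursive, infinite depth, timestamping
        "--page-requisites",         # also fetch images/CSS each page references
        f"--warc-file={warc_name}",  # Wget writes <warc_name>.warc.gz
        f"--wait={wait}",            # seconds to pause between requests
    ]
    if limit_rate:
        # Real bandwidth cap, e.g. "200k"; --wait alone does not limit speed.
        cmd.append(f"--limit-rate={limit_rate}")
    cmd.append(url)
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` runs it without any shell-quoting issues.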
[21:56] #ispygames
[21:56] yes
[21:56] ign, gamespy, ugo and 1up
[21:57] get in there all plz,
[21:57] and i'll op y'all
[21:57] etc etc
[21:59] or not :D
[22:00] Anyone asks me about these sites in here I am just going to point them over there
[22:00] good
[22:00] you wanna join tho? :/
[22:01] I'm the only one there
[22:34] i'm here
[22:34] i guess
[22:36] o/
[22:36] long time no see. come chat in -bs, etc etc
[22:41] I joined. My time has been really split lately, but I heard so many sites were giving up the ghost so I got worried :S
[22:42] ign is dying.
[22:42] :/
[22:42] well, it's "reforming".
[22:42] come to #ispygames if you wanna help on that, there's no warrior stuff etc yet.
[22:43] As long as companies still have the same employees, restructuring isn't always a bad thing. It might mean we just go to IGN to get all the content, and it might even mean they have more staff to handle even more topics. But I think in this case they laid off staff and it is a bad thing.
[22:43] but then they never leave up archives of the old...
[22:47] ziff davis has done this to literally hundreds of sites they own
[22:47] IGN was corrupt as fuck before they got bought out the first time
[22:47] It has been a hot mess since
[22:47] omf_: does ziffdavis usually delete the old site content?
[22:47] always
[22:47] they never keep anything
[22:54] so classy
[23:24] I backed up the bulk of memolane's blog and whatever content was still linked in