[00:32] http://tracker.archiveteam.org/ looks amazing
[00:41] Soo... what's this I see about the Posterous project?
[00:42] Getting "No Item Received" from Punchfork, so figured I'd switch over, but it says to ask here first.
[00:42] Posterous is getting geared up.
[00:43] http://tracker.archiveteam.org/posterous/
[00:44] @omf: Yup, I see that. Just don't want to get banned.
[00:58] I see that.
[00:58] (I'm running it, to get banned.)
[01:19] i need no tracker for my uploads
[01:19] of course g4tv.com is maybe the last time i do a very big project
[01:20] it feels like backing up geocities (to me) all by yourself
[01:30] hi
[01:31] What do I need to know before working on the Posterous project in Warrior?
[01:33] you are going to get banned
[01:33] everyone has
[01:33] we are trying to figure it out
[01:34] I don't personally mind getting banned, but should I wait until it's solved to start?
[01:36] godane: i know i mentioned it before and never really pursued it, but if you have scripts and lists that just need to be run for the HD videos on g4tv i can do that. i've got bw and storage
[01:39] i have the list uploaded
[01:39] https://archive.org/details/g4tv.com-video-url-list-1
[01:40] just need sed -i 's|_flv.flv|_flvhd.flv|' list
[01:40] just know that i have seen some hd videos where i needed to edit the name more than that
[01:41] one hd video (i think one of the g4films) needed fix2 removed to download it
[01:41] cause just changing _flv to _flvhd was not enough
[01:43] i also have all the descs - i custom sed'd an xml warc so the uploads have the right descs
[01:46] just know you will have to start at about 26000 for hd stuff
[01:47] you will not get it until 26419 (i think) and not every video will have hd
[01:48] a recent video doesn't even have hd
[01:49] this one: http://www.g4tv.com/videos/61896/metal-gear-rising-revengeance-launch-trailer/
[01:50] some part of me thinks i should have checked for the flvhd file first and, if it doesn't exist, gone for the flv
[02:46] nomoon: everyone always gets banned
[02:47] you'll have to get a new ip address every hour
[03:49] so my mouse couldn't left click
[03:50] luckily i have an old mouse that i can use
[04:25] ok
[04:25] woah
[04:25] hi guys
[04:25] long time no see
[04:25] What've I missed?
[04:27] also, holy shit is it good to be back
[04:27] hey underscor
[04:27] i'm just trying to save g4tv.com
[04:29] underscor: everything is shutting down
[04:29] Yeah, posterous, gamespot (iirc?)
[04:29] anything else?
[04:30] not gamespot, ign/gamespy/1up
[04:30] which is a shitload of sites
[04:32] opensolaris
[04:32] punchfork
[04:34] damn
[04:34] Couldn't they spread themselves out? Jeez...
[05:10] *yawn*
[05:10] So, about posterous... dynamic ip is the way to go I guess then :)
[05:14] For the moment, yes.
[05:46] is it performed automagically, or are they hunting manually?
[05:47] might be fun to do this over tor, if one can verify that a new endpoint would be chosen each time you reconnect.
[06:01] ewook: both
[06:02] ewook: we surmise that there is a cron job that runs 10 minutes before the hour which automatically bans based on some log info or other
[06:03] but it looks like there is occasionally human intervention as well, because some of the bans happen at other times, and the duration is variable
[06:21] db48x: geebus. Why in the world would one do that - bandwidth-capping sure, but simply toss the connection away? Does it look like one ban/one ip, or have you seen subnets being blocked in the manual part as well? (perhaps too early to tell?)
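A minimal sketch of the flvhd-first fallback godane describes at [01:50], assuming GNU bash and wget and a file named "list" holding one ..._flv.flv URL per line; the file name and flags are assumptions, not his actual tooling, and it doesn't handle the odd cases he mentions (e.g. the g4films entry that also needed "fix2" removed from the name).

    #!/bin/bash
    # Try the HD variant of each g4tv video URL first; fall back to the SD flv.
    while read -r url; do
      hd="${url/_flv.flv/_flvhd.flv}"   # same rename as the sed command above
      if wget -q --spider "$hd"; then   # does the _flvhd.flv file exist on the server?
        wget -nc "$hd"
      else
        wget -nc "$url"                 # no HD version for this video, keep the SD flv
      fi
    done < list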
[06:23] one solution could be to ssh-tunnel and use cron to circulate between IPs, and that might go under the radar if I jump often enough.
[06:26] does the warrior still grab a full user and then move on, or would alternating between users' data perhaps work better?
[06:27] Maybe misidentify the user agent so it's not wget?
[06:28] If I were watching my site, I'd check for IPs going through a user's data in the same manner, and with "dedication". Sure, that would be one way to do it, and much easier.
[06:30] oh, is punchfork done now? my warriors are just sitting pretty :).
[06:31] can't really say what was going on in their head when they wrote it
[06:31] sadly no :(.
[08:44] Hello,
[08:44] I have a question to ask about retrieving photos that were posted by a
[08:44] professional photographer to a Mobile Me gallery. I downloaded a few
[08:44] photos, but never the entire gallery. I have never used the MobileMe
[08:44] gallery for anything else, and when I just went to look at my photos,
[08:44] I saw the shocking news that the site is shut down. I am sick to
[08:44] think that these priceless photos of my toddler may be gone. Is there
[08:44] a way that your company could retrieve these photos?
[08:44] Thanks for your help,
[08:44] Kimberly
[08:45] SketchCow: <3, managed to help btw?
[08:46] Asking her for the username
[08:52] Do you have a list of these .... rescues anywhere?
[08:53] Ask people if they mind being added to a website listing those who've contacted you/the group and been helped.
[09:01] Yeah, testimonials work just like good reviews in driving more people to the cause
[09:02] In one of the videos, at defcon I think. The site for that child who had died.
[09:02] That convinced me.
[09:02] The one on geocities
[09:03] I saw that clip in the Open Source Bridge talk
[09:03] I watched that talk and was happy to learn people were coming together to be more proactive about saving the internet
[09:04] tbh I'd never really thought about it before.
[09:04] I had heard about geocities but at the time I thought it was a one-time gig
[09:04] I still haven't checked to see if my site is in the geocities grab.
[09:04] To me the geocities clip shows the impressive value of stored culture.
[09:04] All my old ones are
[09:04] like 7 of them
[09:05] I did not even have backups of those
[09:05] I had erased them a long time ago
[09:05] All this data saved has immense value
[09:05] everything else is just trying to explain that point to people
[09:05] I don't know how to check tbh.
[09:05] as it's not a warc?
[09:10] http://gallery.me.com/stephaniefay1/101561
[09:10] Anyone want to take a shot at finding this?
[09:12] alard: Let me know if we can find this outside the index
[09:12] Or if there's a list somewhere of what all we grabbed
[11:31] hi everyone!
[11:32] ty.
[11:32] funny how his kick reason is the same as his nick, yet appropriate
[11:32] ;)
[11:33] I would have done "offended" ;)
[11:34] Like so
[11:34] wtf where did my reasoning go
[11:34] :D
[12:12] Bleh, missed an @ - which made things a little silly.
[12:21] ersi: wut r u doin
[12:22] oh wait you fixed it
[12:28] D:
[12:28] ersi: :D
[12:28] i just did it for speed.
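A rough sketch of the cron-driven "circulate between IPs" idea floated at [06:23], not anything Archive Team actually ran; the host names and the SOCKS port are made-up placeholders, and whether it stays under the radar depends on the ban behaviour speculated about at [06:02]-[06:03].

    #!/bin/bash
    # Re-establish a SOCKS tunnel through a different ssh endpoint each hour,
    # so outbound requests exit from a new address. Run from cron, e.g. hourly.
    hosts=(hop1.example.org hop2.example.org hop3.example.org)  # placeholder hosts
    pick="${hosts[$(( $(date +%s) / 3600 % ${#hosts[@]} ))]}"   # rotate by hour

    pkill -f 'ssh -N -D 1080'          # tear down the previous tunnel, if any
    ssh -N -D 1080 -f "$pick"          # new SOCKS proxy on localhost:1080
    # a downloader would then be pointed at the proxy, e.g.:
    #   curl --socks5-hostname localhost:1080 http://example.posterous.com/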
[15:28] moved some channels which I think are finished to the idle section on http://www.archiveteam.org/index.php?title=IRC - if I missed some or moved incorrectly pls fix kthxbye
[15:35] DFJustin: Good job.
[15:40] other than username and concurrent items, what configuration do i have to do for the warrior?
[15:40] I click "Available projects" but nothing happens
[15:43] sep332: it should load a page that lets you choose which project to help with
[15:43] that's what i expected but nothing happens
[15:43] i tried chrome and firefox
[15:44] (my firefox is full of extensions that could cause problems but chrome is clean)
[15:51] is this the right place to carp about the warrior? :)
[15:51] it's as good as any
[15:52] I'm trying to load the api page that the project list comes from
[15:53] hmm, got an http error
[15:53] that would cause it trouble
[15:56] I think there might be issues with the back-end tracking servers - which I think is why you're having problems with the "Project listing" page, sep332
[15:56] ah, ok
[15:56] is http://tracker.archiveteam.org/ supposed to load in a normal browser?
[15:57] it's throwing 502s right now
[15:57] yes
[15:57] that's the problem
[15:57] (although the warrior loads a slightly different url for this sort of thing)
[15:57] gotcha, thanks a lot db48x
[15:57] I'd just sit tight - it'll be fixed :-)
[15:57] you're welcome
[15:58] this looked pretty simple and i was feeling dumb for not even being able to turn it on :)
[15:59] yea, it's our fault for not having a warning or something
[15:59] it just assumes that the request didn't fail
[16:00] will i have to restart it later, or can i just keep clicking the button to see if it works? :)
[16:01] you'll have to restart it
[16:01] ok
[16:07] sep332: Feel free to hang around or come around from time to time, by the way :-)
[16:08] thanks! i should hang out here more often I think
[16:09] hmm
[16:09] I just realized that I don't have any way to diff two warcs
[16:11] What are you looking to accomplish? Compare each record with each other? De-duplicate? See how some Target-URI has changed on different fetches?
[16:11] yes
[16:12] comparing the results of two runs with different user agent strings
[19:10] alard: wtf is tracker:/var/www/rsync for?
[19:10] it's killing the box
[19:16] doing what I can to free up other space in the meantime
[19:40] what am I supposed to ask about before running the Posterous project in the warrior?
[19:47] sep332: Posterous will ban you sooner or later
[19:47] ok...
[19:47] So if you need to visit any posterous blogs, you should not run it in the warrior
[19:47] is that per-ip?
[19:47] oh ok thanks
[19:49] Yes, per IP
[20:59] chronomex: Ah, grrr. /var/www/rsync is the debugging space that I use as a temporary upload area if I want to inspect a few warcs. I should remember to switch it back.
[21:02] Do you have a space where you can put the data? Fos?
[21:03] Yes, I've now switched the upload target back to fos. (There's an option in the tracker admin panel.)
[21:03] I'm rsyncing-and-deleting the files to fos.
[21:03] thanx
[21:06] FOS is ready for it.
[21:08] SketchCow: That's very good, since most of it was already going that way. It's just that FOS is missing an easy just-let-me-look-in-that-warc-file-to-see-what's-wrong option.
[21:08] posterous bans for 7-10 days
[21:09] SketchCow: If stephaniefay1 is not in the index, I wouldn't know how to find her, I'm afraid.
[21:10] That's fine.
[21:10] We tried.
[21:11] She's also not in the tracker log.
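One quick way to approach the warc-diffing question at [16:09]-[16:12] is to compare which WARC-Target-URIs each run captured. This is only a sketch with placeholder file names: it diffs the URI lists, not the record payloads, so a record-by-record comparison of the two user-agent runs would still need a WARC-aware tool.

    # List the Target-URIs in each warc and diff the sorted lists.
    # run-a.warc.gz / run-b.warc.gz stand in for the two crawls made
    # with different user agent strings.
    diff <(zcat run-a.warc.gz | grep -a '^WARC-Target-URI:' | sort) \
         <(zcat run-b.warc.gz | grep -a '^WARC-Target-URI:' | sort)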
[21:11] So are we doing EC2 to get around this ban?
[21:11] The ban pisses me off.
[21:12] I am going to be testing some vpn services in a little while
[21:13] rotating IPs
[21:13] I thought most VPN services have a static IP that is shared by many customers
[21:13] depends on who you go with
[21:14] Do we have ANY idea how big posterous will be?
[21:14] Some have multiple class C subnets you can switch between, and since IPs are pooled you rotate on connection
[21:14] We know # of accounts
[21:15] over 9 million
[21:15] Heroku could also be an option, if anyone would want to go that way: https://github.com/ArchiveTeam/heroku-buildpack-archiveteam-warrior
[21:26] Is that Heroku pack functional on 1 dyno?
[21:26] I see that Heroku's AUP says "please don't use more than 2TB/month" so I guess bandwidth isn't really an issue.
[21:28] nomoon: Yes, I think so.
[22:51] ADD moment: looks like E3 2005 G4 coverage was a co-production with ign.com
[22:52] so in some weird way i'm helping save stuff from ign
[22:52] but not really
[23:55] AT .
[23:55] Our company Millenniata makes a 1,000-year data storage disc called the M-Disc, which comes in Blu-ray (25 GB) and DVD (4.7 GB). We would love to share with your team how these discs can help you in your quest to archive history for a long period of time. Check out our website at www.mdisc.com. We'd be happy to answer any questions you have and explain how this might benefit you in your quest.
[23:56] Thank you,
[23:56] Josh Krall
[23:56] Director of Marketing
[23:58] I had heard of them. I might order a 10-pack to try out
[23:58] unfortunately I don't have a time machine to transport these back to 1500 or so and test them out
[23:59] wait, never mind, I thought they had blu-ray disks
[23:59] There are heat tests
[23:59] will they send out lots of them as test discs?
[23:59] soak them in water