[00:28] but we are all stuck with their bullshit from now into eternity
[00:39] chronomex: we always are
[00:39] chronomex: geocities had a novel but ultimately silly way of organizing users
[00:40] gotta deal with that every time you deal with geocities in any way
[00:40] and they tried to foist it upon the world as a web standard
[00:40] unlike geocities
[00:40] true :)
[01:21] HI
[01:21] Better run, faster than my bullet
[01:21] DFJustin: Congrats, you're now an admin.
[01:22] yay
[01:22] already been tweaking some stuff
[01:23] also you can throw these in the archiveteam collection or w/e http://www.archive.org/details/konachan-siterip-2010 http://www.archive.org/details/konachan-siterip-2011
[01:28] who do I bother about bugs in the isoview script
[01:35] Gear them up and mail them to me.
[01:35] Ultimately I'd like someone to rewrite it to be pretty and HTML compliant.
[01:42] Is this some php (or other script) code that reads data directly from an ISO on request?
[01:43] yes
[01:43] Maybe there's something to be won from putting that data in a database
[01:43] Not sure
[01:43] And my not sure is not sarcasm
[01:44] or dump a 'find -ls' from every iso into its metadata
[01:44] same for zip, tar, etc
[01:44] Or that
[01:45] put into metadata, database is step #17
[01:46] Another project someone might want to dump at some point. 8bitcollective, if/when the site comes back
[01:47] About 100 GB of chiptune IIRC
[01:47] yow, that's a lot of chiptune
[01:47] Yup
[01:48] I may be wrong on the exact number, and how it's developed over the last years
[01:48] sure
[01:48] But on the order of 50-150 GB
[01:49] Complication: The guy who owns the place is known not to pay his bills on time. And his father is a trigger-happy lawyer
[01:49] If his bandwidth bill doubled over a month, he might wtf a bit
[01:50] heh
[01:50] aha excellent
[01:50] i'm sure he'd be willing to make a donation
[01:51] If you liked it you should have put metadata on it.
[01:51] Complication whaaaat
[01:51] How about this
[01:51] How about I talk to Nullsleep about talking to the guy about mirroring it on archive.org
[01:52] The owner is Jose Torres, hated by most of the old chip world
[01:52] Most people, including nullsleep, wouldn't want to have anything to do with him if they could help it
[01:54] haa
[01:54] Well, that's charming.
[01:57] yay computers
[01:57] bringing out the dickheads in all of us
[02:01] I'm in the mood for some internet http://welcometointernet.org/
[02:01] iiiiinternetttt
[02:01] we love us some iiiiinternettttt
[02:04] Also, I need to learn wget/set up a way to mirror pages. Far too often shit just disappears on me
[02:04] Not cool, internet
[02:04] Today, devrs.com
[02:04] blogs.starwars.com shutting down
[02:04] http://www.theforce.net/latestnews/story/BlogsStarWarscom_Is_Shutting_Down_144246.asp
[02:05] following cancellation of hyperspace fan club
[02:05] Speaking of Twitter, is Posterous at-risk now?
[02:05] YES
[02:05] it was an acqui-hire
[02:05] so I don't think they're interested in the posterous tech at all
[02:06] Luckily, someone had a mirror of it from a couple of years back (including zip files) and with some addition of GCache and IA material I should be able to create a pretty up-to-date copy
[02:06] I think they're providing migration tools for tumblr and google blogs
[02:06] er to
[07:24] Well, that was a great show
[07:24] I think a dude was hitting on me
[07:24] But no boldness.
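A concrete sketch of the "dump a 'find -ls' from every iso into its metadata" idea floated at [01:44], assuming the isoinfo tool from cdrtools/genisoimage is available; the listings/ directory is just a stand-in for wherever the metadata would actually be attached:

```sh
#!/bin/sh
# Sketch only: dump per-archive file listings, as suggested at [01:44].
mkdir -p listings
for iso in *.iso; do
    # -l gives an ls-style listing; -R picks up Rock Ridge names
    isoinfo -l -R -i "$iso" > "listings/$iso.filelist.txt"
done
# Same idea for zip and tar, per the follow-up remark:
for z in *.zip; do unzip -l "$z" > "listings/$z.filelist.txt"; done
for t in *.tar; do tar -tvf "$t" > "listings/$t.filelist.txt"; done
```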
[07:24] Danced, but no bear hug
[07:24] Go for the bear hug, jesus
[07:24] What's to lose, it's 1am
[07:38] for serious
[07:40] you can't get laid if you don't ask
[07:40] why'd you let him get away, man
[07:40] I'm sure that would have made an awesome story
[07:50] hi :)
[07:51] y0
[07:55] Cool story, bro
[07:55] Tie a handkerchief to his angle
[07:55] ankle
[07:59] what's up, jonas__?
[08:02] OK, next. http://www.archiveteam.org/index.php?title=Main_Page
[08:02] Are people seeing the new site?
[08:02] With the broken thumbnail?
[08:02] I see we are going to rescue your shit
[08:03] This page was last modified on 25 January 2012, at 20:50.
[08:07] I see it didn't 'take'.
[08:18] OK, it now 'took'.
[08:18] Let's hope it works.
[08:19] f5, f5, f5, not yet
[08:24] what's your current take on me.com? wondering whether to join and put my nick in the "highscore" at memac.heroku
[08:26] that'd be excellent
[08:26] me.com is chugging along slowly
[08:26] jesus shit I didn't realize we had 50T already
[08:36] is archiveteam still only allowed to store at archive.org, not run the scripts there? ^_^
[08:37] and how were the 200k usernames collected, actually? is that list complete?
[08:38] as usual we have no good way to know for sure
[08:39] Chaos is our fuel
[08:40] I'd be willing to bet it isn't a complete list of usernames, but you know what's greater than 0? 200k!
[08:42] :)
[08:43] just wondering how it's made
[08:43] google + wordlists
[08:43] most likely yeah
[08:44] no, like, that's how archiveteam does this shit.
[08:44] google + wordlists are source number one
[08:46] didn't google increase restrictions on that?
[08:47] the whole world hates us
[08:47] ;)
[08:48] *shakehands*
[08:49] so you need a big set of IPs, which aren't commonly known proxies, to get >99% of the google results with the word lists
[08:49] ?
[08:51] I dunno, I haven't done that side of things in a while
[08:52] jonas__: IPv6 man, IPv6
[08:55] Good luck, I'm behind seven dead hookers
[09:16] jonas__: you'd have to ask alard, I think he was the one that came up with the user list
[09:16] also, it's fairly easy to add in new users if you can find them :)
[09:20] Hi.
[09:20] ipv6 is indeed very helpful.
[09:50] Can people visit www.tapedocumentary.com
[09:52] SketchCow: yes
[09:52] Thanks
[09:52] your is after your
[09:52] should be the other way around
[09:52] and you have no doctype
[09:53] no html5 for you
[10:05] SketchCow: what did you change on the front page?
[10:06] I don't see anything in the history...
[10:06] I do see a new spammer though :)
[10:08] Do you.
[10:08] alas, I do
[10:09] Extra bonus points for telling me before I fall asleep
[10:10] http://www.archiveteam.org/index.php?title=User:DugaldHurst3751
[10:11] Congrats on misdirecting me.
[10:12] oh?
[10:13] Spammer will temporarily win
[10:13] We were discussing tapedocumentary.com
[10:13] then you go 'the front page'
[10:13] So I wasted, oh, 2-3 minutes on "find spammer on front page of tapedocumentary.com"
[10:14] Going to bed now, will deal tomorrow
[10:14] oh, my bad
[10:15] sleep well
[10:15] I am finally hunkering down and fixing all 10 sites that have spam or other issues.
[10:15] And are hacked a la dreamhost.
[10:16] This is taking a little time.
[10:17] yea, I imagine
[10:20] New archiveteam.org has fixed image.
[10:20] I found a weird bug there.
[12:13] What was the last reason I quit?
[12:13] I seem to have a very broken IPv6 connection.
[12:14] PepsiMax: just a ping timeout
[13:19] db48x: darnit, then something is wrong with mah IPv6.
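For illustration only, a guess at what the "google + wordlists" harvesting described around [08:43] might look like. The web.me.com/username URL pattern is a real MobileMe host, but the query format, parsing, and delay here are assumptions, not alard's actual tooling; as noted at [08:46], Google throttles this kind of scraping hard:

```sh
# Hypothetical sketch of wordlist-driven username discovery via Google.
# NOT the real Archive Team script -- query URL and parsing are guesses.
while read -r word; do
    curl -s -A "Mozilla/5.0" \
        "http://www.google.com/search?q=site:web.me.com+$word" |
        grep -oE 'web\.me\.com/[A-Za-z0-9_.-]+' |
        cut -d/ -f2
    sleep 15   # be gentle, or the results dry up fast
done < wordlist.txt | sort -u >> usernames.txt
```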
[13:20] alas
[13:20] is it a tunnel?
[14:47] db48x: no, my router supports native IPv6.
[14:48] I told Irssi to stop preferring IPv6, so I should be on IPv4 now.
[14:57] kennethre: Hi. The MobileMe break is over, apparently. So if you want to start again... (But maybe your virtual bill is already large enough.)
[14:57] alard: excellent! I'll spin up a few
[14:58] alard: I need to think about the virtual bill though :)
[14:58] Sure. But everything helps!
[15:21] MORNING
[15:21] OK, there go the spammers on archive.org
[15:28] I haven't tried to get a range of content from archive.org in a while. How would I go about downloading all of project gutenberg?
[15:32] I mean archiveteam.org
[15:32] Now adding the Cite and Captcha extensions back, now that the rest is working.
[15:36] I guess I can just wget http://gutenberg.readingroo.ms/
[15:39] isforinse: Or use rsync. http://www.gutenberg.org/wiki/Gutenberg:Mirroring_How-To
[15:48] OK, I got the Cite and Blacklist extensions in
[15:48] And UserMerge
[15:55] underscor wrote an ia_grab script but I haven't seen it posted anywhere, he got distracted by doing it with git-annex instead
[15:56] which is cool but not really a grab-n-go solution for people
[16:10] yea, git-annex is shiny
[16:12] is there a way to bypass a 302 with wget
[16:12] i'm trying to get the whitepapers from the register
[16:13] well, depends what triggers it
[16:13] wait what
[16:13] it goes to a login screen
[16:13] there are 2404 whitepapers on the register
[16:14] freely available?
[16:14] you have to login to get them
[16:15] i have a login account but i don't want to manually download every one through firefox
[16:15] then login, save the cookie data, use the cookie with wget
[16:15] might wor
[16:15] k
[16:27] you need to figure out the login form data you need to send, as well as the url to send it to. then you can tell wget to save cookies (including session cookies) to a file while sending the form data to the login handler. then, on your download attempts with wget, you just tell it to load the cookies from the file
[16:27] wget --load-cookies=cookies.txt --keep-session-cookies http://.../download
[16:27] wget --save-cookies=cookies.txt --keep-session-cookies --post-data="username=X&password=Y" http://.../login
[16:27] (provided they are using cookies to track logins and not http auth)
[16:28] pretty much what alard said as well. I don't think he provided the form data you need, however. you still need to figure that out.
[16:28] :)
[16:33] my cookie is always empty
[16:33] godane: link to site?
[16:33] http://account.theregister.co.uk/login/
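Putting Coderjoe's and alard's advice above together in the order you would actually run it. The "username"/"password" field names are placeholders; as Coderjoe says at [16:27], the real field names and the form's action URL have to be read out of the login page's HTML:

```sh
# Step 1: log in and save the session cookie (field names are guesses).
wget --save-cookies=cookies.txt --keep-session-cookies \
     --post-data="username=X&password=Y" \
     "http://account.theregister.co.uk/login/"

# Step 2: reuse the saved cookie for the actual downloads.
wget --load-cookies=cookies.txt --keep-session-cookies \
     "http://whitepapers.theregister.co.uk/paper/download/0003"
```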
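And, back to the Project Gutenberg question at [15:39]: a minimal rsync invocation of the kind that Mirroring How-To describes. The host and module name below are from memory of that page and should be checked against it, since mirrors change:

```sh
# Sketch of a full PG mirror via rsync, per the How-To linked at [15:39].
# Expect hundreds of GB; --del removes local files dropped upstream.
rsync -av --del aleph.gutenberg.org::gutenberg /path/to/local/mirror/
```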
[16:35] hrrm, nvm
[16:37] looks like you may need to make something to give the urls to wget, since I am pretty sure wget will not spider forms
[17:42] http://raganwald.posterous.com/dear-landlord
[18:05] how do you redirect wget to get the right filename?
[18:06] i do this: wget --keep-session-cookies --load-cookies=cookies.txt http://whitepapers.theregister.co.uk/paper/download/0003
[18:07] it redirects to the filename in a browser but not with wget
[18:07] use -O
[18:08] i want their file name
[18:08] i don't want to have to type it in for everyone
[18:09] *every time
[18:11] godane: does the file provide the filename?
[18:11] ...download/0003 probably saves with a useless filename, yes?
[18:12] I imagine they're giving the filename in one of the headers
[18:12] Content-Disposition or whatever it is
[18:14] I think you need --trust-server-names for that.
[18:31] WELL TODAY JASON LEARNED A LITTLE LESSON ABOUT NPR'S "ON THE MEDIA"
[18:31] I think the best part was when he compared me to OJ Simpson
[18:32] Also, they kept gearing the conversation to Ghostbusters and "we are like Ghostbusters" and "aren't we like Ghostbusters" and at the end the host said "Cue the Ghostbusters theme" so I think we know where this is going.
[18:32] this deserves an image macro
[18:33] archiveteam: ghostbusters or oj simpson? BOTH!
[18:33] Link?
[18:34] host has yet to edit himself into seeming less hung over
[18:34] which he totally is, clearly
[18:34] (I listened in)
[18:36] heh
[18:36] yes, it was meant to be 'come, see how professionals conduct themselves' and it was 'come, see a professional trip over his own dick so many times you want to call 911 and make him recite the alphabet'
[18:37] He said "how's the law" and I said this and that
[18:37] And he said "Well, OJ Simpson is in jail for using guns to get his own property"
[18:37] And I said "Archive Team does not employ weapons or firepower to acquire websites."
[18:37] I hope that goes in
[18:37] Singularly awful
[18:38] a singularity of fuckery
[18:39] "This is a robbery. Hand over your web site or I'll shoot your fucking CPU out!"
[18:40] haxxxx
[18:41] watchout we haax the gibsonnn
[18:44] i think the show comes out on fridays
[19:02] oink shutting down http://techcrunch.com/2012/03/14/kevin-roses-oink-shuts-down/
[19:03] they have an export tool, they say
[19:24] Oh, the other Oink
[19:24] I was about to say; that's been gone for years...
[19:43] http://lifehacker.com/5893278/how-to-protect-your-data-in-the-event-of-a-webapp-shutdown-and-prevent-the-problem-in-the-future
[19:46] doh
[19:47] joe's barbershop to be shut down for a few months
[19:47] fire in basement
[19:47] http://sfist.com/2012/03/14/cyclists_terrorized_on_the_wiggle_s.php
[19:48] sry wrong chan
[19:49] heh
[20:53] hahaha
[20:53] http://holistic.xkcd.com/
[20:54] yea
[21:10] alard: --trust-server-names doesn't work
[21:10] But you do get a file? Or not even that?
[21:11] i get the file as 0003
[21:12] but i don't want to use -O cause there are over 2000 urls like this
[21:16] godane: use web-sniffer.net or something to get a look at the response headers
[21:19] Who is it who does the wiki archiving?
[21:19] godane: Try --content-disposition (if your wget is recent enough).
[21:19] --content-disposition
[21:19] and i'd been using it for a few years. I'd be surprised if the version in use is too old
[21:19] that worked
[21:20] Coderjoe: The manual says 'experimental (not fully-functional)', but yeah, who knows how long it has been like that.
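With --content-disposition confirmed working at [21:19], a loop over all ~2404 whitepapers might look like the sketch below. The zero-padded, sequential IDs are a guess extrapolated from the single URL seen at [18:06]:

```sh
# Sketch: batch-fetch the Register whitepapers, keeping the server's
# Content-Disposition filenames. ID scheme is an assumption.
for i in $(seq -f "%04g" 1 2404); do
    wget --load-cookies=cookies.txt --keep-session-cookies \
         --content-disposition \
         "http://whitepapers.theregister.co.uk/paper/download/$i"
    sleep 2   # don't hammer the site
done
```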
[21:21] isforinse: emijrp is a driving force at that
[21:21] isforinse: I'd recommend popping into #wikiteam
[21:21] That's the name, thanks ersi
[21:21] he's not there (or here) right now though
[21:22] yeah :) np, you're welcome
[21:22] date on the earliest script I have with --content-disposition: Apr 18 2009
[21:31] looks like whitepapers 4 thru 8 don't exist
[21:33] ugghhhh im getting sick
[21:33] good thing im coming home RIGHT NOW
[21:35] More data deletion http://0000free.com/e/403.html
[21:35] (A site I liked was hosted there.)
[21:51] looks like some of the whitepapers need more information
[21:52] so i end up doing those manually
[22:00] already linked here? http://www.ipetitions.com/petition/save-cbc-music-archives
[22:02] Yes
[22:02] You know what I hate? Petitions.
[22:02] Know what I like? Action.
[22:02] Over two months ago I called the archivists in charge of the archives.
[22:02] They're going to good homes, and archive.org is there if they can't be found.
[22:02] +1
[22:10] oh goodie
[22:11] i'm getting a 34mb flash video
[22:12] So it's not true that they're throwing away things which have not been digitized?
[22:13] Of course I didn't link it to ask for signatures. :p
[22:15] ah http://calgary.openfile.ca/blog/curator-blog/curated-news/2012/calgary-music-store-buys-entire-music-archive-cbc-calgary
[22:38] of interest: githubarchive.org
[22:38] code: https://github.com/igrigorik/githubarchive.org/
[22:56] good way to eventually enumerate all github repos
[22:56] * closure removes that from his todo list
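One hedged guess at what "enumerate all github repos" could mean in practice, using the public v3 API's /repositories endpoint rather than githubarchive.org's approach (which consumes the public events timeline). The Link-header cursor handling is how that endpoint paginates; unauthenticated requests are heavily rate-limited:

```sh
# Sketch: walk GitHub's /repositories listing of all public repos by id.
since=0
while :; do
    curl -s -D headers.txt "https://api.github.com/repositories?since=$since" |
        grep -oE '"full_name": *"[^"]+"' | cut -d'"' -f4
    # the next cursor comes back in the Link: ...; rel="next" header
    since=$(grep -oE 'since=[0-9]+' headers.txt | head -n1 | cut -d= -f2)
    [ -z "$since" ] && break
    sleep 1   # stay under the rate limit
done > all-repos.txt
```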