[00:07] is anyone backing up coursera? someone I know just asked for 'introduction to sociology' which got removed like many courses
[00:08] well, not removed, but archive hidden
[00:15] why hidden?
[00:15] the course providers decide whether the archive stays up after the class ends
[01:03] sounds like it could be an ip issue
[01:04] if the course providers explicitly don't want it up
[03:42] HEY UNDERSCOR
[03:42] I just like poking him on principle.
[03:43] >:c
[03:43] Aw, fuck
[03:43] Everything is filled up and turbofucked
[03:44] I hate it when that happens
[03:44] Wow, I've had formspring uploading for a solid day +
[03:51] I'm tempted to pause tracker jobs while I pick up this mess
[03:52] Or at least temporarily shift uploads to fos
[03:52] Because these machines need to be manually drained/maneuvered onto other disks
[03:53] FOS can handle a little more stuff incoming
[03:53] But don't throw the whole shebang at it.
[03:54] K
[03:54] I'll add fos back in to the rotation and just set up rate limits on my boxes
[04:03] I see stuff coming in.
[04:31] Poor machine, it had so much formspring to deal with
[04:34] I came across your uploads for TOSEC on the Internet Archive while just screwing around one day, and I'm very confused about what's stored in this archive. It says images, but does that mean ROM images, or does it mean something along the line of JPEG images? Also, I've noticed that the Internet Archive's archive of TOSEC material is much, much smaller than is on the official TOSEC webpage. Are you planning on updating the collection constantly, or ar
[04:34] And, I also came across your Archiveteam upload of the website Friendster. I noticed that you were only able to grab about 20% of the accounts on this site before it was shuttered. Have you considered taking snapshots or entire grabs of other social networks before they are shuttered as a premature safety measure?
[04:34] Thanks a million for your time!
[04:34] ..
[04:34] See this, this is the opposite of a helpful letter.
[04:34] This is not how you write a letter to me.
[04:34] "Explain to me tons of stuff I could suss out for myself, while also, I think you missed a bunch of stuff but I don't feel like giving examples. Big fan!"
[04:36] These dipshits tend to annoy me too
[04:36] You never get mail like this.
[04:36] http://www.youtube.com/watch?v=L8onlB0F1_A
[04:37] I run a YouTube gaming channel and people leave comments asking questions that they could find their own answer to with 5 minutes in Google
[04:40] It tends to piss me off, especially this one guy who left 20 comments on 20 videos over one month saying "are ps2 games on vita"
[05:59] An "archive your flickr" tool would be hugely popular these days http://www.flickr.com/help/forum/en-us/72157633650721234/72157633651235708/
[05:59] We have tools for that
[06:00] I should polish them up sometime soon
[06:05] underscor: also for Windows users?
[06:05] eww, windows users
[06:05] * Nemo_bis hides
[06:06] Uh, no, they're in ruby and/or bash, iirc
[06:06] I should port them to python
[06:06] well, think of linux users first :p
[06:06] I mean
[06:06] ideally
[06:06] :)
[06:06] in a few days, people will be too angry to bother about archiving
[06:06] someone could set up a box that is like "gimme your username, and I'll spit this massive fucking zip at you"
[06:07] yep
[06:07] like some of our past trackers where you could allow usernames
[06:07] *add
[06:07] http://pi.pe/ this is a neat thing
[06:13] Ah. And what happens if destination is Google Drive? Let's try.
[06:13] Also, everyone update http://archiveteam.org/index.php?title=Flickr please
[06:15] Does it *really* manage to archive Facebook streams?
Those are really horrible
[06:26] doesn't seem to do anything
[09:01] SketchCow: re: examples of stuff that you're missing, you don't have most of the tosec iso sets like IBM http://www.pleasuredome.org.uk/details.php?id=137667e22983694b2f81e8eb141b9ecc38f9741f http://www.pleasuredome.org.uk/details.php?id=03d445619790d251655e69427282aa9fab22a243
[09:01] sega saturn http://www.pleasuredome.org.uk/details.php?id=6d4119213ca6d49c7467c5c0857bf17247b152e5
[09:01] etc. just search for tosec iso
[09:01] I assume that's what this fellow was referring to
[09:04] yeah SketchCow doesn't like Pleasuredome DFJustin
[09:07] And that's fine. No ISO yet.
[09:07] But what about the regular rom sets.
[09:07] I'm STILL making that thing presentable into a 1.0
[09:08] I wonder what rom sets I'm missing.
[09:08] I
[09:09] I've got a script I'm running right now, which can take a pile of a directory and make it into a fully formed collection
[09:09] I'm running it on the TOSEC-PIX I download
[09:09] THAT is taking forever too.
[09:09] But I'm jamming dozens of magazines, manuals and newsletters up every minute.
[09:15] I don't think there's much missing in the way of rom sets other than the last year's worth of updates
[09:15] ^
[09:16] I would check the TOSEC website for you but it's erroring out on me
[09:16] pleasuredome is not that hard to work with, if you leave that tosec-pix thing going for a while on a box with a fat pipe you'll have loads of credit in no time
[09:19] I wish I had a fat pipe to up my own credit on it.
[09:20] Main reason I love Underground Gamer is that a $25 donation makes you immune to being shitlisted / banned automatically
[09:21] There we go - full derp on the uploads
[09:21] Wack it up, full-derp!
[09:21] DURPMAX
[09:21] DerpMax!
[09:22] oh found a typo, https://archive.org/details/Front_Fareast_Magic_Drive_TOSEC_2012_04_23 "Farest" -> "Fareast"
[09:25] full derp?
[09:28] Far East
[09:30] Ha, I'm obliterating the incoming queue
[09:30] I should go sleep for a while
[09:32] gnight...also if you could seed that Picpx torrent I sent you that'd be nice :P
[09:33] oh there are some missing rom sets, will e-mail details tomorrow
[09:33] Good.
[09:34] I was going to do that search down the line, but I figured I'd ask.
[09:34] Like, just tracking down decent descriptions of systems and putting those into the entry and adding a photo, that's been eating days
[09:34] So many of them
[09:34] For updates, of course, I'll just transfer the photos.
[09:34] and the descs.
[09:37] Is there anything I can help with related to the TOSEC sets?
[12:24] Did Yahoo buy Tumblr just to close it later? :p
[13:03] The command line output in "Current project" (AT Warrior) is so... dead. Why not show detailed information?
[13:05] http://www.reddit.com/r/technology/comments/1f1m9x/googles_schmidt_teens_mistakes_will_never_go_away/ca61ahq
[13:07] "There's a group called "Archive Team" who hates that archive.org obeys robots.txt, and will be downloading all of reddit and making it searchable so that your stupid posts here will be forever available."
[13:10] Are we supposed to give a fuck what some redditor thinks? I have been archiving reddit for over a year and see no reason to stop
[13:11] no, I found the comment amusing
[13:11] http://www.reddit.com/r/privacy/comments/1emh4r/urgent_delete_any_old_reddit_posts_you_dont_want/
[13:11] and hey, top comment is someone sensible
[13:14] "Disallow: /my_shiny_metal_ass" Reddit doesn't want its shiny metal ass to be crawled.
[13:15] Wow what fucking morons. Let me tell you a short story. When I was growing up we were reminded not to break the law etc.. because of school records and criminal records etc.. Things you did have consequences over the course of the rest of your life.
[13:19] Now after posting something to the internet for everyone to see they are crying foul and want it erased?
Too fucking bad, you were stupid enough to embarrass yourself online live with it
[13:45] I like watching uninformed people flip their shit over archiving.
[13:46] And how they insult Archive Team as a whole.
[14:17] Redditors can kiss my ass
[14:21] * Smiley wonders how offtopic this might go.
[14:42] "subcontractors" lol
[14:42] Now where did I keep this contract again?
[16:16] Morning.
[16:16] omf, how goes the warc gallery
[17:40] Couple quick questions.
[17:40] If a warc has a lot of images do you want it broken down into smaller collages
[17:41] re: earlier discussion, it's easy to say it's your own fault as a teenager for posting dumb shit online, but the thing is there is a lot of science showing that the "responsible" parts of the brain don't fully develop until later
[17:42] and that's one of the reasons most countries don't give youths long-term punishments for crimes
[17:43] that said I have no idea what can be done to stop it now that we have an internet
[17:44] I think it's a side effect of how fast the technology fell on society.
[17:44] I was arrested for shoplifting once, got community service. Got a mugshot, too.
[17:44] None of that's online, etc.
[17:44] DFJustin, You hit the nail on the head. The internet is changing things and people need to accept that instead of ignore or deny it
[17:45] I'd probably be sad if my little sad sack face was on the net.
[17:45] I have turned the process of adding a directory of the same item (one magazine run, one set of console manuals) into one command followed by one keypress.
[17:45] basically social attitudes / behaviour of hr interviewers / etc. are going to have to change
[17:45] Can't get faster than that.
[17:49] One keypress - http://stream1.gifsoup.com/view4/2239283/homer-bird-o.gif
[17:52] Pretty much.
[17:52] Basically, I made the default "make a collection for this set of items", but I needed a way to say "nah, don't do that"
[18:28] hi, is there a script to save reddit threads to pdf or epub ?
[18:28] maybe even mht
[18:48] Not that I know of.
[18:48] There should be.
[18:48] I've got three separate windows slamming newsletters, magazines and manuals into the archive.
[18:48] New one being added every second. Every. Second.
[18:54] SketchCow, building whole libraries 3 items a second
[18:54] fuck academia
[19:00] Obviously, it takes longer than it should to derive the resulting items, and I do have to go back and link the collections.
[19:00] WiK, how many TB are you up to now?
[19:00] But the magazines are going in very fast now, three windows.
[19:02] It's very hard to tell my uploading.
[19:02] But bear in mind, these magazines and newsletters are pretty small PDFs.
[19:02] Like, anywhere from 500k to 20mb
[19:02] So in total, this TOSEC-PIX I'm incorporating is maybe 375gb
[19:06] I forget the meta manager way to say "what have I added today"
[19:06] I'm sure it's something.
[19:07] I'm keeping the three windows very busy, no downtime.
[19:08] Amiga Joker Magazine (German), Computer News 80 (TRS-80 Magazine) and One for ST Games (Atari ST Magazine)
[19:28] awesome to see names that i remember reading (amiga joker)
[19:29] i have a vps.. with 50-70GB free .. anything i could run on it?
[19:29] SketchCow: publidate 20130526*
[19:29] publicdate*
[19:29] you'll have to turn on the field
[19:29] i don't need anything personally but i'd figure i'd try to help for a short while
[19:30] underscor: I can tack it onto the end manually
[19:32] zenguy_pc, which version of Linux is on it
[19:34] underscor: Actually, it's publicdate, and the form is in 2013-05-26*
[19:34] But other than every single aspect of the help being wrong, thanks
[19:34] * SketchCow gatorade dump
[19:34] Anyway. I've added 1,498 discrete texts to archive.org today.
[19:35] 149 yesterday.
[19:36] 396 day before that, 1,306 day before that, 959 day before that, 343 day before that.
[19:36] So a good busy week.
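(Editorial aside on the "what have I added today" exchange above: a minimal sketch of turning the `publicdate:2013-05-26*` form from the chat into a search URL. The endpoint `https://archive.org/advancedsearch.php` and its `q`, `fl[]`, `rows` and `output` parameters are assumptions on my part, not something stated in the log, and the helper names are made up for illustration.)

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the archive.org
# search documentation before relying on them.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def added_on_query(day):
    """Return a query for items whose publicdate starts with `day` (YYYY-MM-DD).

    Uses the prefix-wildcard form quoted in the chat: publicdate:2013-05-26*
    """
    return f"publicdate:{day}*"


def search_url(query, rows=50):
    """Build a search URL asking for item identifiers as JSON (hypothetical helper)."""
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # only return the identifier field
        ("rows", rows),
        ("output", "json"),
    ]
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url(added_on_query("2013-05-26")))
```

(This only builds the URL and does not make a request; restricting the results to a single uploader would need an extra query term that the chat does not name.)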
[19:37] starting to look like in the missing ids for 10000s area only the man show clips are alive
[19:47] debian 7.0
[19:47] sorry for the late response
[19:52] zenguy_pc, well the baseline perl looks new enough that it shouldn't be too complicated to get the screenshot application on there. I am taking screens of the front of posterous blogs
[19:53] debian 6 is just way too out of date
[19:57] i just reinstalled debian since my upgrade last week failed.. i successfully upgraded it on the second try and I shut it down until i could follow some guides to secure it
[19:57] what will i need to run? .
[20:04] is anyone backing up coursera? someone I know just asked for 'introduction to sociology' which got removed like many courses
[20:04] I have an old-ish (few months) dump of their course metadqta
[20:04] metadata *
[20:04] not sure to what extent courses are hidden
[20:04] because it really only is the metadata
[21:04] DFJustin: Guy was referring to my not referencing the newest datasets
[21:04] k
[22:04] well, ive got a few hundred gb left on this last drive, once thats full ill be at 10tb
[22:06] omf_: 10tb in a day or two
[22:07] nice
[22:08] then i think ill have to stop downloading for awhile
[22:08] ill be out of drive space, and ill need to finish sorting/grepping all the results for my defcon talk, ill find out mid next month if its accepted or not
[22:13] we have some clips of action blast
[22:23] later tonight i maybe showing one of Jason Scott talks to my brother
[22:48] heh, got two opml files out of that reddit post
[22:54] I don't think anyone there realizes how Reader works
[22:55] what reddit post?
[22:56] http://www.reddit.com/r/privacy/comments/1emh4r/urgent_delete_any_old_reddit_posts_you_dont_want/
[22:56] reddit rss?
[22:56] oh i saw that earlier
[22:57] i don't mind if my account is archived.. i just have to be extra careful of linking it to my offline identity
[22:58] some people are more paranoid creating accounts weekly ..
i don't have much need for that
[22:58] i just vote save and comment intermittently..
[23:28] it's astonishing how people have any expectation of privacy in a public forum
[23:28] don't want it public? don't say it in public
[23:30] the problem is for many years there was privacy by obscurity- sure, the info was public, but you weren't going to know about it unless someone told you or you already knew
[23:31] now that it's so easy to find things and connect them, a lot of that is going away