[00:04] ivan`: the whole hard drive?
[00:04] i grep just on the swap
[00:42] this movie on kickstarter about the theft of a stuxnet-like cyber weapon looked interesting, so I backed it: https://www.kickstarter.com/projects/1747096622/crow-hill-the-feature-film
[02:22] http://www.retrothing.com/2009/04/10-year-old-sony-comic-book-phone-already-forgotten.html
[03:45] hey SketchCow
[03:46] i think i know of a way around the semi-admin problem in computerandtechvideos
[03:46] i think the way around that is to put rev3 stuff in a revision3 collection
[03:47] and the twit stuff into a twit collection
[03:47] then i can have full admin of those collections
[03:47] and subcollection
[04:28] It'll head towards that way, yes.
[04:46] just know This week in startups is NOT a Revision 3 show
[04:50] it's most likely to go that way just so we can get the computerandtechvideos neater
[04:50] that way there aren't 87 collections in it
[04:51] there will just be something close to 20 or 30
[04:55] anyways i'm going to upload Engadget Distro pdfs
[08:35] An enormous amount of items dumping into the archive from my various attached hard drives. :)
[15:16] Random Blogspot blogger: "Derp de durr... I made a PDF that I want to share with my audience. Oh wow, look how easy it is to host it on Dropbox!"
[15:16] SadDM (clicking link 6 months later): "Noooooooo!"
[15:18] More like Derpbox
[15:18] Or Burpbox
[15:19] I mean, I get it. All of a sudden folks are able to host files... it's miraculous to them.
[15:19] But man, talk about transient.
[15:26] Yeah
[15:35] ah, derpbox, it's like devnull-as-a-service.com
[15:41] ugh, I'm also going to put google docs in the same boat... I haven't been burned yet, but I brace myself every time I click a link.
[15:42] rapidshare etc. etc. are worse than either of those
[15:44] google doesn't actively expire files after x months
[15:45] although none of them are waybackable (annoyingly, dropbox used to be)
[15:48] right. I forgot all about those (though they seem to come up less frequently in the parts of the web that I trawl through).
[15:49] this is the current robots.txt of dropbox http://pastebin.com/jdA0xKcc
[16:11] a lot of the MAME scene folks seem to have latched onto sendspace, which deletes everything after just a couple months
[16:15] want the photos of this rare arcade game from a 3 month old post? too bad! http://www.mameworld.info/ubbthreads/showflat.php?Cat=&Number=318306&page=4&view=expanded&sb=5&o=&fpart=1&vc=1&new=#Post318306
[16:16] oh... that's downright criminal :-P
[16:16] just this weekend I did a grab of a forum where there were a TON of images hosted on photobucket
[16:17] I actually grepped through the warc.gz to find them and download them too
[16:17] then I rolled both of the warcs together
[16:18] It was kind of a pain, but given that the photos were the focus of the threads they were in, I figured it was worth it.
[16:44] dropbox doesn't actively delete files after x months, either. The user did that because they ran out of space and made the poor decision that those files were no longer important
[16:46] Yeah, and that almost makes it worse. Clearly it was important enough to share with the world at one point, but now the new episode of Dancing With the Stars takes priority.
[20:34] Sorry about all of that crap I just pasted into #archiveteam guys. Mental note... don't keep a huge, full paste buffer :-(
[20:35] I'm sure you can rejoin (and apologise!) there, if your client is quite finished pasting!
[20:36] (also, uh, perhaps use a client with accidental paste detection... it's saved my bacon a couple of times)
[20:36] SadDM: all is well, join again :p
[20:36] Yeah, I thought irssi had paste protection... maybe it's not set up out of the box
[20:38] that looked like it was getting pretty steamy
[20:38] fwiw, disconnecting your IRC client after spamming will stop the spam
[20:39] a proper IRC client not running in your terminal, like hexchat, will also let you paste newlines into the input box without sending them
[20:40] SadDM: at least it was not some more graphic fanfic ;)
[20:40] SadDM: ah, I'm on irssi too - so it does have it, just needs configuring I guess!
[20:42] it's weird, I accidentally tap the right button in a putty window... it immediately fills up with the story, and then irssi says: "Pasting 7 lines to #archiveteam. Press Ctrl-K if you wish to do this or Ctrl-C to cancel."
[20:42] thanks irssi... what about the other 30 lines I didn't mean to paste?
[20:42] so I wanted to know if there was more about Kran and Jendara
[20:42] do you really?
[20:47] that's what putty does
[20:47] right click always pastes
[20:47] whatever is in the buffer
[20:47] left click highlights/copies
[20:47] oh I know, I must have grabbed my mouse funny :-P
[20:48] where was that story from anyway?
[20:50] here maybe: http://paizo.com/paizo/blog/v5748dyo5lfwm?Skinwalkers-Sample-Chapter
[20:50] Leo_TCK: it's a sample chapter posted at http://paizo.com/paizo/blog/v5748dyo5lfwm?Skinwalkers-Sample-Chapter
[20:50] ^5 SadDM
[20:50] ha... ninja'd while I triple-checked my paste buffer
[20:58] hmm
[21:07] Is it possible to somehow get a list of all .nl websites or .zw websites?
[21:10] crawl the homepages of .nl websites and look for links to other .nl websites
[21:11] if you have that, let me know and I'll make a huge seed list for you based on my URLs
[21:12] arkiver: I would contact SIDN(.nl) and say something like "I'm doing a research paper on the structure of the web in the Netherlands. Could I possibly get a list of all currently registered .NL domains? Thanks"
[21:13] If you'd like to not lie, you could just replace "a research paper" with "research" and bam, it's not a lie anymore.
[21:14] hmm thank you ivan` and ersi
[21:15] looks like Meet John Doe (1941 film) is going to be played on The Blaze
[21:15] and .zw seems to be handled by http://www.zispa.org.zw/
[21:15] I'll try both ways, will send an email tomorrow to SIDN and start crawling a list of .nl websites (first need to find out what the best way is to do that... )
[21:15] only reason is because it's in the public domain
[21:16] arkiver: I bet there isn't a best way, but any way you start is a good start :)
[21:17] ersi: haha, yes! I'll start it tomorrow and tell you how it goes
[21:17] I'd crawl through Common Crawl's past crawls for .NL domains and crawl those pages first. Maybe take a looksie at Alexa and stuff
[21:18] Then maybe some web directories, like https://en.wikipedia.org/wiki/List_of_web_directories
[21:18] and/or if you know any ".nl portals" where users have hosted their stuff/had homepages and stuff previously, like prior to Facebook and what not
[21:48] ersi: thank you for your help ersi! Will try some things out tomorrow... :)
[21:48] np :)
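
The 21:10 suggestion (crawl .nl homepages and look for links to other .nl websites) could be sketched roughly like this. It is only a minimal illustration, not anything the channel actually ran: `LinkCollector` and `hosts_under_tld` are made-up names, the page HTML is assumed to be fetched already, and only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def hosts_under_tld(html, base_url, tld="nl"):
    """Return a sorted list of hostnames under `tld` that the page links to."""
    parser = LinkCollector()
    parser.feed(html)
    suffix = "." + tld.lower()
    hosts = set()
    for href in parser.hrefs:
        try:
            # Resolve relative links against the page URL, then take the host.
            host = urlsplit(urljoin(base_url, href)).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host and host.endswith(suffix):
            hosts.add(host)
    return sorted(hosts)


sample = (
    '<a href="http://www.example.nl/page">a</a> '
    '<a href="/local">b</a> '
    '<a href="https://other.com/">c</a>'
)
print(hosts_under_tld(sample, "http://start.nl/"))
```

Feeding each newly found hostname back in as a page to fetch would grow the seed list step by step; the same function works for .zw by passing `tld="zw"`.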