[00:09] Today I found out Random House publishing was meant to be a reference to actually publishing random books
[01:00] now this is weird
[01:00] a file for computer science spring 10 has timestamp of march 30 2011
[01:00] *spring 2010
[01:01] and the spring 2011 has timestamp of may 17 2010
[01:02] it looks to be the same file
[01:02] http://crypto.stanford.edu/cs155old/cs155-spring11/hw_and_proj/proj3/traces/
[01:02] http://crypto.stanford.edu/cs155old/cs155-spring10/hw_and_proj/proj3/traces/
[01:02] there is lots of dedup that could have happened on this website warc grab
[01:03] maybe useful when you have a 4gb warc file limit
[01:46] Someone on Slashdot linked this glorious textfile: http://www.textfiles.com/food/newcoke.txt
[01:47] I love how, aside from a couple cultural references, that could be an angry internet rant from today
[01:54] "In conclusion, I understand that Burger King plans to change the recipe on the Whopper soon."
[01:58] http://www.youtube.com/watch?feature=player_embedded&v=6pDy-CSFsPs
[02:28] does anyone know of a custom dirbuster to get a list of folders/files from a website?
[02:28] i'm asking this cause i think computerpoweruser.com has a lot more pdf files
[02:29] the folder index can't be accessed so the only way is to guess the file names
[02:29] hmm, no, but you could probably just bang something together with curl and your favorite scripting language
[02:51] godane: are you checking http://wayback-beta.archive.org/ when determining if there have been crawls or not
[02:55] i know there are crawls for this DFJustin
[02:56] but i have found other folders that have not been crawled at all
[02:57] I know, just wondering if you're using the up to date beta information
[02:58] a lot of these captures happened back in 2006
[05:01] i found something interesting
[05:01] looks like mininova.org forums still has stuff going back to 2005
[05:01] so that may have some worth
[06:45] so i got crypto.stanford.edu
[06:46] the warc max size was set to 1gb
[06:46] i got about 3.7gb of the site in warc.gz :-D
[06:46] it's for 4gb uncompressed
[07:50] some stuff for the manuals collection https://archive.org/post/441568/
[09:29] uploaded: http://archive.org/details/www.urinal.net-20121128-mirror
[09:30] it has 100s of images of different urinals in it from around the world
[09:40] * BlueMax salutes City of Heroes
[10:41] so i found a 14.5mb pdf on computerpoweruser.com
[10:41] turns out it's just 3 pages
[10:41] it's funny cause i have 5mb full magazine pdfs
[10:42] pdfs sometimes are just bundled JPEG files or worse
[10:46] SketchCow: if you pass through hamburg by any chance on your deutschland trip, i would have about 3TB of jamendo ogg vorbis for drive-through copying. dating back to 2009 or whenever i started grabbing them
[13:35] That is not going to happen. :)
[13:35] (spoiler alert)
[21:33] aww :D
[22:19] it's been publicly announced: http://blog.archive.org/2012/11/30/3-for-1-match/
[22:46] so i got more urls that archive.org has for computerpoweruser.com/articles/2001/
[22:46] i have dozens of pdfs from there
[23:49] uploaded finally: http://archive.org/details/crypto.stanford.edu-20121130-mirror