#archiveteam-bs 2012-12-01,Sat


Time	Nickname	Message
00:09 ^🔗	SketchCow	Today I found out Random House publishing was meant to be a reference to actually publishing random books
01:00 ^🔗	godane	now this is weird
01:00 ^🔗	godane	a file for computer science spring 10 has timestamp of march 30 2011
01:00 ^🔗	godane	*spring 2010
01:01 ^🔗	godane	and the spring 2011 has timestamp of may 17 2010
01:02 ^🔗	godane	it looks to be same file
01:02 ^🔗	godane	http://crypto.stanford.edu/cs155old/cs155-spring11/hw_and_proj/proj3/traces/
01:02 ^🔗	godane	http://crypto.stanford.edu/cs155old/cs155-spring10/hw_and_proj/proj3/traces/
01:02 ^🔗	godane	there is a lot of dedup that could have happened on this website's warc grab
01:03 ^🔗	godane	maybe useful when you have a 4gb warc file limit
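
[Editor's note: the deduplication idea above can be sketched as grouping fetched payloads by a content hash, so that identical files served from different URLs (such as the two traces directories) are noticed. This is only an illustration under assumptions: the function names and the in-memory list of (url, bytes) pairs are hypothetical, and it is not how any particular WARC tool handles duplicates.]

```python
import hashlib
from collections import defaultdict


def group_by_digest(payloads):
    """Map SHA-1 hex digest -> list of URLs whose payload bytes are identical.

    `payloads` is an iterable of (url, body_bytes) pairs (hypothetical input shape).
    """
    groups = defaultdict(list)
    for url, body in payloads:
        groups[hashlib.sha1(body).hexdigest()].append(url)
    return dict(groups)


def duplicate_sets(payloads):
    """Return only the URL groups that share a payload (two or more URLs)."""
    return [urls for urls in group_by_digest(payloads).values() if len(urls) > 1]


if __name__ == "__main__":
    # Made-up sample data: two URLs with the same bytes, one with different bytes.
    sample = [
        ("http://example.org/spring10/trace1", b"same bytes"),
        ("http://example.org/spring11/trace1", b"same bytes"),
        ("http://example.org/spring11/trace2", b"other bytes"),
    ]
    for urls in duplicate_sets(sample):
        print("identical payload:", ", ".join(urls))
```

[Each group with more than one URL is a candidate for storing the payload once and pointing the other captures at it.]
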
01:46 ^🔗	mistym	Someone on Slashdot linked this glorious textfile: http://www.textfiles.com/food/newcoke.txt
01:47 ^🔗	mistym	I love how, aside from a couple cultural references, that could be an angry internet rant from today
01:54 ^🔗	chronomex	"In conclusion, I understand that Burger King plans to change the recipe on the Whopper soon."
01:58 ^🔗	balrog_	http://www.youtube.com/watch?feature=player_embedded&v=6pDy-CSFsPs
02:28 ^🔗	godane	does anyone know of a custom dirbuster to get a list of folders/files from a website?
02:28 ^🔗	godane	i'm asking this because i think computerpoweruser.com has a lot more pdf files
02:29 ^🔗	godane	the folder indexes can't be accessed, so the only way is to guess the file names
02:29 ^🔗	chronomex	hmm, no, but you could probably just bang something together with curl and your favorite scripting language
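
[Editor's note: a minimal sketch of the "bang something together" approach, in Python rather than curl, assuming the file names follow some guessable pattern. The base URL, name stems, zero-padded numbering and `.pdf` extension below are all hypothetical placeholders, not the real layout of any site mentioned in this log.]

```python
import itertools
import urllib.error
import urllib.request


def candidate_urls(base, stems, numbers, ext=".pdf"):
    """Yield guessed URLs such as <base><stem><NN><ext> (pattern is an assumption)."""
    for stem, n in itertools.product(stems, numbers):
        yield f"{base}{stem}{n:02d}{ext}"


def exists(url, timeout=10):
    """Return True if a HEAD request for `url` answers with HTTP 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # 404s, refused connections and timeouts all count as "not found".
        return False


if __name__ == "__main__":
    # Hypothetical guesses; replace with patterns seen in known-good URLs.
    guesses = candidate_urls("http://example.com/pdfs/", ["issue"], range(1, 13))
    for url in guesses:
        if exists(url):
            print(url)
```

[Keeping the guessing separate from the probing makes it easy to review the candidate list before sending any requests, and to add a delay between requests so the server is not hammered.]
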
02:51 ^🔗	DFJustin	godane: are you checking http://wayback-beta.archive.org/ when determining if there have been crawls or not
02:55 ^🔗	godane	i know there are crawls for this, DFJustin
02:56 ^🔗	godane	but i have found other folders that have not been crawled at all
02:57 ^🔗	DFJustin	I know, just wondering if you're using the up to date beta information
02:58 ^🔗	godane	a lot of these captures happened back in 2006
05:01 ^🔗	godane	i found something interesting
05:01 ^🔗	godane	looks like mininova.org forums still has stuff going back to 2005
05:01 ^🔗	godane	so that may have some worth
06:45 ^🔗	godane	so i got crypto.stanford.edu
06:46 ^🔗	godane	the warc max size was set to 1gb
06:46 ^🔗	godane	i got about 3.7gb of the site in warc.gz :-D
06:46 ^🔗	godane	it's 4gb uncompressed
07:50 ^🔗	DFJustin	some stuff for the manuals collection https://archive.org/post/441568/
09:29 ^🔗	godane	uploaded: http://archive.org/details/www.urinal.net-20121128-mirror
09:30 ^🔗	godane	it has 100s of images of different urinals in it from around the world
09:40 ^🔗	*	BlueMax salutes City of Heroes
10:41 ^🔗	godane	so i found a 14.5mb pdf on computerpoweruser.com
10:41 ^🔗	godane	turns out it's just 3 pages
10:41 ^🔗	godane	it's funny cause i have 5mb full magazine pdfs
10:42 ^🔗	schbiridi	pdfs sometimes are just bundled JPEG files or worse
10:46 ^🔗	schbiridi	SketchCow: if you pass through hamburg by any chance on your deutschland trip, i would have about 3TB of jamendo ogg vorbis for drive-through copying. dating back to 2009 or whenever i started grabbing them
13:35 ^🔗	SketchCow	That is not going to happen. :)
13:35 ^🔗	SketchCow	(spoiler alert)
21:33 ^🔗	schbiridi	aww :D
22:19 ^🔗	Coderjoe	it's been publicly announced: http://blog.archive.org/2012/11/30/3-for-1-match/
22:46 ^🔗	godane	so i got more urls that archive.org has for computerpoweruser.com/articles/2001/
22:46 ^🔗	godane	i have dozens of pdfs from there
23:49 ^🔗	godane	uploaded finally: http://archive.org/details/crypto.stanford.edu-20121130-mirror
