Time |
Nickname |
Message |
00:09
🔗
|
SketchCow |
Today I found out Random House publishing was meant to be a reference to actually publishing random books |
01:00
🔗
|
godane |
now this is weird |
01:00
🔗
|
godane |
a file for computer science spring 10 has timestamp of march 30 2011 |
01:00
🔗
|
godane |
*spring 2010 |
01:01
🔗
|
godane |
and the spring 2011 has timestamp of may 17 2010 |
01:02
🔗
|
godane |
it looks to be the same file |
01:02
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring11/hw_and_proj/proj3/traces/ |
01:02
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring10/hw_and_proj/proj3/traces/ |
01:02
🔗
|
godane |
there is lots of dedup that could have happened on this website warc grab |
01:03
🔗
|
godane |
maybe useful when you have a 4gb warc file limit |
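One way to confirm that two copies with different timestamps really are the same file is to compare content hashes rather than dates. The following is a minimal sketch, assuming the file bodies are already available as bytes; `find_duplicate_payloads` is a hypothetical helper and the file names in `sample` are made-up stand-ins, not actual paths from the site.

```python
import hashlib
from collections import defaultdict


def find_duplicate_payloads(payloads):
    """Group names by the SHA-1 digest of their bytes.

    `payloads` maps a name (hypothetical path or URL) to its raw bytes.
    Returns a list of sorted name groups that share identical content.
    """
    by_digest = defaultdict(list)
    for name, data in payloads.items():
        by_digest[hashlib.sha1(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up stand-ins for two course directories holding the same trace file.
sample = {
    "cs155-spring10/traces/a.pcap": b"same bytes",
    "cs155-spring11/traces/a.pcap": b"same bytes",
    "cs155-spring11/traces/b.pcap": b"different bytes",
}
duplicates = find_duplicate_payloads(sample)
```

Here `duplicates` holds one group containing the two `a.pcap` names, since only their bytes match. Timestamps play no part in the comparison, which is why it still works when the dates disagree.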
01:46
🔗
|
mistym |
Someone on Slashdot linked this glorious textfile: http://www.textfiles.com/food/newcoke.txt |
01:47
🔗
|
mistym |
I love how, aside from a couple cultural references, that could be an angry internet rant from today |
01:54
🔗
|
chronomex |
"In conclusion, I understand that Burger King plans to change the recipe on the Whopper soon." |
01:58
🔗
|
balrog_ |
http://www.youtube.com/watch?feature=player_embedded&v=6pDy-CSFsPs |
02:28
🔗
|
godane |
does anyone know of a custom dirbuster to get a list of folders/files from a website? |
02:28
🔗
|
godane |
i'm asking this cause i think computerpoweruser.com has a lot more pdf files |
02:29
🔗
|
godane |
the folder index can't be accessed so the only way is to guess the file names |
02:29
🔗
|
chronomex |
hmm, no, but you could probably just bang something together with curl and your favorite scripting language |
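A rough idea of what "banging something together" could look like, using Python's standard library in place of curl. This is only a sketch under stated assumptions: the base URL, the name list, and the helper names (`candidate_urls`, `head_status`, `probe`) are all hypothetical, and nothing here reflects how computerpoweruser.com actually names its files.

```python
import urllib.error
import urllib.request


def candidate_urls(base, names, extensions):
    """Build guessed URLs from a base folder, name stems, and extensions."""
    base = base.rstrip("/")
    return [f"{base}/{name}{ext}" for name in names for ext in extensions]


def head_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a wrong guess
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, ...


def probe(urls, fetch_status=head_status):
    """Return the guessed URLs that answer with status 200."""
    return [url for url in urls if fetch_status(url) == 200]


# Hypothetical guesses; example.com stands in for the real site.
guesses = candidate_urls("http://example.com/pdfs/", ["issue01", "issue02"], [".pdf"])
```

Calling `probe(guesses)` would then issue one HEAD request per guess. The `fetch_status` parameter exists so the checker can be swapped out, for instance to add a delay between requests or to test the logic without touching the network.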
02:51
🔗
|
DFJustin |
godane: are you checking http://wayback-beta.archive.org/ when determining if there have been crawls or not |
02:55
🔗
|
godane |
i know there are crawls for this DFJustin |
02:56
🔗
|
godane |
but i have found other folders that have not been crawled at all |
02:57
🔗
|
DFJustin |
I know, just wondering if you're using the up to date beta information |
02:58
🔗
|
godane |
a lot of these captures happened back in 2006 |
05:01
🔗
|
godane |
i found something interesting |
05:01
🔗
|
godane |
looks like mininova.org forums still has stuff going back to 2005 |
05:01
🔗
|
godane |
so that may have some worth |
06:45
🔗
|
godane |
so i got crypto.stanford.edu |
06:46
🔗
|
godane |
the warc max size was set to 1gb |
06:46
🔗
|
godane |
i got about 3.7gb of the site in warc.gz :-D |
06:46
🔗
|
godane |
it's about 4gb uncompressed |
07:50
🔗
|
DFJustin |
some stuff for the manuals collection https://archive.org/post/441568/ |
09:29
🔗
|
godane |
uploaded: http://archive.org/details/www.urinal.net-20121128-mirror |
09:30
🔗
|
godane |
it has 100s of images of different urinals in it from around the world |
09:40
🔗
|
* |
BlueMax salutes City of Heroes |
10:41
🔗
|
godane |
so i found a 14.5mb pdf on computerpoweruser.com |
10:41
🔗
|
godane |
turns out it's just 3 pages |
10:41
🔗
|
godane |
it's funny cause i have 5mb full magazine pdfs |
10:42
🔗
|
schbiridi |
pdfs sometimes are just bundled JPEG files or worse |
10:46
🔗
|
schbiridi |
SketchCow: if you pass through hamburg by any chance on your deutschland trip, i would have about 3TB of jamendo ogg vorbis for drive-through copying. dating back to 2009 or whenever i started grabbing them |
13:35
🔗
|
SketchCow |
That is not going to happen. :) |
13:35
🔗
|
SketchCow |
(spoiler alert) |
21:33
🔗
|
schbiridi |
aww :D |
22:19
🔗
|
Coderjoe |
it's been publicly announced: http://blog.archive.org/2012/11/30/3-for-1-match/ |
22:46
🔗
|
godane |
so i got more urls that archive.org has for computerpoweruser.com/articles/2001/ |
22:46
🔗
|
godane |
i have dozens of pdfs from there |
23:49
🔗
|
godane |
uploaded finally: http://archive.org/details/crypto.stanford.edu-20121130-mirror |