Time |
Nickname |
Message |
00:02
🔗
|
dashcloud |
Yottabyte is old news now- Hellabyte will be replacing it as a measure of really big data amounts: https://twitter.com/timoreilly/status/274198934409338880 |
00:19
🔗
|
godane |
joepie91: i grabbed that site |
00:19
🔗
|
godane |
i had to put lines in an index for a good mirror but i think i got it all |
00:33
🔗
|
godane |
also my arstechnica.com grab is going good |
00:34
🔗
|
godane |
i have 2001 thru 2006 |
00:34
🔗
|
godane |
grabbing 2007 |
00:35
🔗
|
godane |
i will do the image dumps once i'm up to 2011 grabs |
00:36
🔗
|
godane |
i will most likely only look for images in cdn.arstechnica.com |
03:33
🔗
|
ivan` |
underscor: are you still on my UT torrent |
07:35
🔗
|
godane |
i'm checking url links on computerpoweruser.com with dirbuster |
09:02
🔗
|
SmileyG |
organics... |
09:10
🔗
|
norbert79 |
Female organs |
12:16
🔗
|
joepie91 |
godane: raided4tor or the RCT thingie? |
20:32
🔗
|
chronomex |
"DailyBooth, a social network that lets users share photos in real-time, has raised $6 million in a first round of funding." |
20:32
🔗
|
chronomex |
(from march 2011) |
20:32
🔗
|
chronomex |
ummm, ... don't all social networks do that? |
20:34
🔗
|
norbert79 |
But they are more... HIP |
20:37
🔗
|
chronomex |
"The company says that to date 13 million photos have been shared via its service" |
20:37
🔗
|
* |
chronomex does math |
20:37
🔗
|
chronomex |
hmmm |
20:38
🔗
|
chronomex |
$0.45/photo ... that's more expensive than a photo lab |
21:52
🔗
|
SketchCow |
http://2.bp.blogspot.com/-zQsjB4cj1cQ/UF3eLl-749I/AAAAAAAAgEE/jbiuy95wOQY/s1600/Christina+Aguilera+Cleavage+-+on+%27The+Voice%27+1.jpg |
22:43
🔗
|
swebb |
Is there an #archiveteam-nsfw channel? :) |
22:44
🔗
|
chronomex |
is there an #archiveteam-sfw channel? |
22:50
🔗
|
BlueMax |
this is the NSFW channel, the other one is the SFW channel |
23:12
🔗
|
SketchCow |
There is no safe for work channel. |
23:27
🔗
|
godane |
SketchCow: i think we need some sort of dedup check with warc |
23:27
🔗
|
godane |
so it doesn't add an 88mb file like 3 times that's the same size and checksum |
23:28
🔗
|
SketchCow |
In what context? |
23:29
🔗
|
godane |
the crypto.stanford.edu has boxes-040405.tar.bz2 file in like 4 different urls |
23:29
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring07/boxes-040405.tar.bz2 |
23:29
🔗
|
SketchCow |
I'd not be worried that much. |
23:30
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring06/boxes-040405.tar.bz2 |
23:30
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring05/boxes-040405.tar.bz2 |
23:30
🔗
|
godane |
http://crypto.stanford.edu/cs155old/cs155-spring04/boxes-040405.tar.bz2 |
23:31
🔗
|
alard |
There must be easier ways to save much more space? |
23:31
🔗
|
SketchCow |
This is an interesting debate. |
23:32
🔗
|
SketchCow |
Also, this is quite an outlier thing you're going after, Godane |
23:32
🔗
|
godane |
i know |
23:32
🔗
|
SketchCow |
You're downloading coursework and example VMs for Stanford Crypto CS courses? |
23:32
🔗
|
godane |
crypto.stanford.edu was not mirrored much on archive.org |
23:33
🔗
|
godane |
on the wayback machine |
23:33
🔗
|
godane |
i was just thinking there were pdfs and docs there |
23:34
🔗
|
godane |
but at least you can say you have a full mirror of it |
23:45
🔗
|
godane |
my thought on how the dedup would work before being added to warc is to check if there is a file of same byte size in warc |
23:46
🔗
|
godane |
then check if the checksum is the same before adding to warc or referring to the older file in the warc for the current url |
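The two-step check godane describes (compare byte size first, then checksum, then either store the payload or point at the earlier copy) could be sketched roughly as below. This is only an illustration under assumptions: the `DedupIndex` class, its `check` method, and the `"store"`/`"refer"` labels are hypothetical names, SHA-1 is an arbitrary choice of checksum, and nothing here reads or writes real WARC files. The WARC format does define a `revisit` record type for pointing at an earlier identical payload, which is probably where a `"refer"` result would end up.

```python
import hashlib
from collections import defaultdict


class DedupIndex:
    """Hypothetical in-memory index of payloads already written to an archive."""

    def __init__(self):
        # byte size -> {sha1 hex digest -> URL of the first copy stored}
        self._by_size = defaultdict(dict)

    def check(self, url, payload):
        """Return ("store", url) for a new payload, ("refer", first_url) for a duplicate."""
        size = len(payload)
        # Step 1: cheap pre-filter, is anything of the same byte size already stored?
        candidates = self._by_size.get(size)
        # Step 2: compare checksums. In this sketch the digest is computed for
        # every payload anyway, so that later copies can be compared against it.
        digest = hashlib.sha1(payload).hexdigest()
        if candidates and digest in candidates:
            return ("refer", candidates[digest])
        self._by_size[size][digest] = url
        return ("store", url)


# Usage with the URLs from the log; the payload bytes are made-up stand-ins.
index = DedupIndex()
base = "http://crypto.stanford.edu/cs155old/"
payload = b"stand-in for boxes-040405.tar.bz2"
first = index.check(base + "cs155-spring07/boxes-040405.tar.bz2", payload)
second = index.check(base + "cs155-spring06/boxes-040405.tar.bz2", payload)
print(first[0], second[0])
```

The second call returns a reference to the spring07 URL instead of asking for the same bytes to be stored again. Whether the saved space is worth the extra bookkeeping is the open question raised in the channel.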
23:47
🔗
|
dashcloud |
from twitter, here's an example of a spectacular metadata fail: http://digitalnz.org/records/23254060 |