| Time |
Nickname |
Message |
|
01:16
|
dashcloud |
so, apparently you do need to have word: word for the warc header command |
|
01:18
|
dashcloud |
holy crap- Angelfire is still live and changing site, and not just a repository of 90's websites |
|
01:19
|
Famicoman |
My old angelfire is still up, kinda |
|
01:21
|
dashcloud |
there's a split between the old style and new style (which you can still get- they sell hosting and websites still) |
|
01:22
|
dashcloud |
apparently at some point angelfire and tripod became linked? (corporately at least) |
|
01:26
|
Smiley |
dashcloud: they all lync sites arnt they?? |
|
01:26
|
Smiley |
i think thats how it was spekt, at work, cant check. |
|
01:27
|
dashcloud |
that's certainly my hope for my archive job |
|
01:28
|
dashcloud |
hopefully with a small number of sites, I can cover a huge amount of angelfire just by visiting every angelfire link on the page |
|
07:51
|
garyrh |
"balrog> does anyone know if Cameron Kaiser (of tenfourfox/classilla) is on twitter?" |
|
07:52
|
garyrh |
https://twitter.com/doctorlinguist , but it's protected. |
|
07:54
|
garyrh |
his last tweet was in 2012, so I guess he doesn't use twitter anymore |
|
12:14
|
balrog |
garyrh: :/ ok |
|
12:14
|
balrog |
I do follow him |
|
12:14
|
balrog |
oh, he's active on ADN |
|
12:50
|
Nemo_bis |
Please add to archivebot, I'm told it's going offline http://137.204.24.205/cis13b/bsco3/Default.asp |
|
13:02
|
ivan` |
Nemo_bis: the whole domain? |
|
13:04
|
Nemo_bis |
ivan`: I'm not sure, at least that cis13b/ directory which has some irreplaceable stuff |
|
14:36
|
joepie91 |
ivan`, Nemo_bis: is that already being taken care of? |
|
14:59
|
ivan` |
only partially |
|
15:29
|
joepie91 |
ivan`: how partially? as in, is there something that needs to be done still :P |
|
15:52
|
godane |
so i think i fucked my cbsradio collection some how |
|
15:53
|
godane |
https://archive.org/details/cbsradio-hourly-2009-07-30 |
|
15:53
|
godane |
no mp3 at all |
|
15:53
|
godane |
all cause i was trying to fix a typo |
|
15:53
|
godane |
:'( |
|
15:54
|
joepie91 |
that reminds me |
|
15:54
|
joepie91 |
godane: |
|
15:54
|
joepie91 |
I have a -lot- of podcasts still |
|
15:54
|
joepie91 |
from nhk |
|
15:54
|
joepie91 |
did you ever end up fetching those? |
|
15:55
|
godane |
i don't think so |
|
15:55
|
joepie91 |
maybe you should :P |
|
15:55
|
godane |
there on your remote sever right |
|
15:56
|
godane |
cause other wise i will only be able to get the last 7 days |
|
15:57
|
godane |
SketchCo1: are you moving my cbsradio items right now? |
|
15:58
|
godane |
cause i'm finding items that 0 files in them |
|
15:58
|
godane |
but others have files |
|
16:01
|
godane |
joepie91: maybe you should upload your collection of nhk mp3s |
|
16:01
|
godane |
they way its one less thing for me to do |
|
16:03
|
godane |
i know remember why that hourly one doesn't have mp3s |
|
16:03
|
joepie91 |
godane: ah, you're short on time? |
|
16:04
|
godane |
i don't remember the rsync |
|
16:04
|
joepie91 |
and yes, they're on a server of mine |
|
16:04
|
joepie91 |
rsync://croissant.cryto.net/nhk |
|
16:04
|
joepie91 |
they need deduplication though |
|
16:04
|
joepie91 |
(you can tell from the last modified timestamp) |
|
16:04
|
joepie91 |
if you don't have the time, let me know and I'll put it on my todo |
|
16:08
|
exmic |
Nemo_bis, ivan`, joepie91: that directory http://137.204.24.205/cis13b/ is it being grabbed or what? |
|
16:19
|
joepie91 |
exmic: that's what I was trying to establish |
|
16:19
|
joepie91 |
oh |
|
16:19
|
exmic |
right |
|
16:19
|
joepie91 |
[18:12] <+ATGoKart> balrog: Your job for http://cis.alma.unibo.it/cis13b/bsco3/Default.asp has finished. |
|
16:19
|
exmic |
that's /cis13b/bsco3 |
|
16:19
|
joepie91 |
that's why my ctrl+F didn't find it then |
|
16:19
|
joepie91 |
yes |
|
16:19
|
joepie91 |
the /cis13b/ dir doesn't have a listing |
|
16:19
|
exmic |
not /cis13b as requested |
|
16:19
|
exmic |
ah ok |
|
16:19
|
joepie91 |
so unless you have a db of URLs handy... |
|
16:20
|
exmic |
nope |
|
16:20
|
joepie91 |
coming to think of it |
|
16:20
|
joepie91 |
we could ask IA |
|
16:20
|
yipdw |
I should add a !ia command to archivebot |
|
16:20
|
yipdw |
all it does is check the URL and tell you whether or not wayback has it |
|
16:20
|
yipdw |
and/or is blocked by robots txt etc |
|
16:20
|
joepie91 |
exmic: https://web.archive.org/web/*/http://cis.alma.unibo.it/cis13b/* |
|
16:20
|
yipdw |
(I should get back to working on it, period) |
|
16:20
|
joepie91 |
yipdw: godo it! :P |
|
16:21
|
joepie91 |
go do * |
|
16:21
|
exmic |
I don't have time to supervise an archivebot job this week |
|
16:21
|
joepie91 |
and be sure to make it return the last archival date |
|
16:21
|
yipdw |
yeah, I'll get back to it once I have less crap to do |
|
16:22
|
Nemo_bis |
The most interesting stuff is in that subdir AFAIK |
|
16:22
|
Nemo_bis |
Sorry for confusion |
|
18:13
|
SketchCow |
Hi |
|
18:16
|
joepie91 |
ohai |
|
19:01
|
closure |
guy claims to have 8tb of geocities http://www.reddit.com/r/DataHoarder/comments/27y8ux/standing_up_40tbs_of_data_for_fun_times/ |
|
19:47
|
Nemo_bis |
And 80 % of it is in multiple copies of the stock geocities gifs? :P |
|
19:49
|
Nemo_bis |
«We don't the dedup the content in any way.» So might be. |
|
20:23
|
SN4T14 |
They DON'T the dedup? :p |
|
20:44
|
Nemo_bis |
dededup? |
|
20:45
|
Nemo_bis |
do not-the-dedup? |
|
20:45
|
SN4T14 |
I dedededup all my files. |
|
21:33
|
closure |
they warc and get all the dups same as archiveteam does these days |
|
21:33
|
closure |
seems like it would be a very nice dataset to pull into wayback |
|
21:44
|
DFJustin |
"We got the archive from the archive team in the first case, so I would hope its the same" |
|
21:46
|
closure |
huh, I didn't think the geocities rip was anywhere near 8tb |
|
22:53
|
midas |
this could be me being daft, but the internetarchive python script for uploading doesnt let you specify a certain catagory? like video, web etc etc? |
|
22:59
|
DFJustin |
--metadata="mediatype:movies" --metadata="collection:opensource_movies" |
|
22:59
|
DFJustin |
I would assume |
|
23:11
|
midas |
yep, me being daft most likely. and sleepdeprived |
|
23:15
|
midas |
goodnight custodis pro datus, or keepers of data |
|
23:27
|
DFJustin |
custodes pro datis |
|
23:31
|
is4 |
https://maps.google.com/locationhistory/b/0 |
|
23:31
|
is4 |
I am horrified by what google knows about my comings and goings |
|
23:57
|
dashcloud |
so, my laptop froze and I had to power it off, killing my ongoing wget-warc grab. If I re-run the command, will it overwrite the existing warc or create a new one? |
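
Editor's note on the 16:20 exchange: the `!ia` check yipdw describes (look up a URL and report whether Wayback has it, plus the archival date joepie91 asks for) could be sketched roughly as below. This is not ArchiveBot code. It assumes the public Wayback availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots.closest` JSON shape. The function names `build_query`, `summarize`, and `check` are hypothetical, and the robots.txt check yipdw mentions is not covered.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

# Assumed endpoint: the public Wayback availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url):
    """Return the availability-API request URL for the page being checked."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def summarize(response):
    """Turn a decoded availability response into a one-line chat reply."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return "not in Wayback"
    # Snapshot timestamps are 14 digits: YYYYMMDDhhmmss.
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return "in Wayback, snapshot from {} at {}".format(
        captured.date().isoformat(), closest["url"]
    )


def check(url, timeout=30):
    """Query the API over the network and summarize the answer."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Calling `check("http://cis.alma.unibo.it/cis13b/bsco3/Default.asp")` would need network access. The snapshot reported is whichever one the API returns as "closest", which may not be the most recent capture in every case.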