Time |
Nickname |
Message |
01:16 |
dashcloud |
so, apparently you do need to have word: word for the warc header command |
01:18 |
dashcloud |
holy crap- Angelfire is still a live and changing site, and not just a repository of 90's websites |
01:19 |
Famicoman |
My old angelfire is still up, kinda |
01:21 |
dashcloud |
there's a split between the old style and new style (which you can still get- they sell hosting and websites still) |
01:22 |
dashcloud |
apparently at some point angelfire and tripod became linked? (corporately at least) |
01:26 |
Smiley |
dashcloud: they all lync sites arnt they?? |
01:26 |
Smiley |
i think thats how it was spekt, at work, cant check. |
01:27 |
dashcloud |
that's certainly my hope for my archive job |
01:28 |
dashcloud |
hopefully with a small number of sites, I can cover a huge amount of angelfire just by visiting every angelfire link on the page |
07:51 |
garyrh |
"balrog> does anyone know if Cameron Kaiser (of tenfourfox/classilla) is on twitter?" |
07:52 |
garyrh |
https://twitter.com/doctorlinguist , but it's protected. |
07:54 |
garyrh |
his last tweet was in 2012, so I guess he doesn't use twitter anymore |
12:14 |
balrog |
garyrh: :/ ok |
12:14 |
balrog |
I do follow him |
12:14 |
balrog |
oh, he's active on ADN |
12:50 |
Nemo_bis |
Please add to archivebot, I'm told it's going offline http://137.204.24.205/cis13b/bsco3/Default.asp |
13:02 |
ivan` |
Nemo_bis: the whole domain? |
13:04 |
Nemo_bis |
ivan`: I'm not sure, at least that cis13b/ directory which has some irreplaceable stuff |
14:36 |
joepie91 |
ivan`, Nemo_bis: is that already being taken care of? |
14:59 |
ivan` |
only partially |
15:29 |
joepie91 |
ivan`: how partially? as in, is there something that needs to be done still :P |
15:52 |
godane |
so i think i fucked my cbsradio collection somehow |
15:53 |
godane |
https://archive.org/details/cbsradio-hourly-2009-07-30 |
15:53 |
godane |
no mp3 at all |
15:53 |
godane |
all cause i was trying to fix a typo |
15:53 |
godane |
:'( |
15:54 |
joepie91 |
that reminds me |
15:54 |
joepie91 |
godane: |
15:54 |
joepie91 |
I have a -lot- of podcasts still |
15:54 |
joepie91 |
from nhk |
15:54 |
joepie91 |
did you ever end up fetching those? |
15:55 |
godane |
i don't think so |
15:55 |
joepie91 |
maybe you should :P |
15:55 |
godane |
they're on your remote server right |
15:56 |
godane |
cause otherwise i will only be able to get the last 7 days |
15:57 |
godane |
SketchCo1: are you moving my cbsradio items right now? |
15:58 |
godane |
cause i'm finding items that have 0 files in them |
15:58 |
godane |
but others have files |
16:01 |
godane |
joepie91: maybe you should upload your collection of nhk mp3s |
16:01 |
godane |
that way it's one less thing for me to do |
16:03 |
godane |
i now remember why that hourly one doesn't have mp3s |
16:03 |
joepie91 |
godane: ah, you're short on time? |
16:04 |
godane |
i don't remember the rsync |
16:04 |
joepie91 |
and yes, they're on a server of mine |
16:04 |
joepie91 |
rsync://croissant.cryto.net/nhk |
16:04 |
joepie91 |
they need deduplication though |
16:04 |
joepie91 |
(you can tell from the last modified timestamp) |
16:04 |
joepie91 |
if you don't have the time, let me know and I'll put it on my todo |
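The deduplication mentioned above could be sketched roughly as follows. Note that this swaps in a different technique from the one hinted at in the log: it compares file contents by SHA-256 digest rather than looking at last-modified timestamps. The function names are hypothetical, and the layout of the nhk directory is an assumption.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large mp3s are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that share identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    # Only groups with more than one member are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists files with byte-identical content; which copy to keep (for example the one with the oldest timestamp) is left to the caller.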
16:08 |
exmic |
Nemo_bis, ivan`, joepie91: that directory http://137.204.24.205/cis13b/ is it being grabbed or what? |
16:19 |
joepie91 |
exmic: that's what I was trying to establish |
16:19 |
joepie91 |
oh |
16:19 |
exmic |
right |
16:19 |
joepie91 |
[18:12] <+ATGoKart> balrog: Your job for http://cis.alma.unibo.it/cis13b/bsco3/Default.asp has finished. |
16:19 |
exmic |
that's /cis13b/bsco3 |
16:19 |
joepie91 |
that's why my ctrl+F didn't find it then |
16:19 |
joepie91 |
yes |
16:19 |
joepie91 |
the /cis13b/ dir doesn't have a listing |
16:19 |
exmic |
not /cis13b as requested |
16:19 |
exmic |
ah ok |
16:19 |
joepie91 |
so unless you have a db of URLs handy... |
16:20 |
exmic |
nope |
16:20 |
joepie91 |
coming to think of it |
16:20 |
joepie91 |
we could ask IA |
16:20 |
yipdw |
I should add a !ia command to archivebot |
16:20 |
yipdw |
all it does is check the URL and tell you whether or not wayback has it |
16:20 |
yipdw |
and/or is blocked by robots txt etc |
16:20 |
joepie91 |
exmic: https://web.archive.org/web/*/http://cis.alma.unibo.it/cis13b/* |
16:20 |
yipdw |
(I should get back to working on it, period) |
16:20 |
joepie91 |
yipdw: godo it! :P |
16:21 |
joepie91 |
go do * |
16:21 |
exmic |
I don't have time to supervise an archivebot job this week |
16:21 |
joepie91 |
and be sure to make it return the last archival date |
16:21 |
yipdw |
yeah, I'll get back to it once I have less crap to do |
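The !ia idea discussed above (check a URL, report whether wayback has it, include an archival date) might look something like this minimal sketch. It is not archivebot code. It assumes the public Wayback availability endpoint at archive.org/wayback/available, which reports the closest snapshot rather than necessarily the last one, and it does not cover the robots.txt check. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target):
    """Build the query URL for one target URL."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": target})


def summarize(payload):
    """Turn a decoded availability response into a one-line answer."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return "not in wayback"
    # The timestamp is a string in YYYYMMDDhhmmss form.
    return "in wayback, closest snapshot " + closest["timestamp"]


def check(target, timeout=30):
    """Query the endpoint over the network and summarize the result."""
    with urllib.request.urlopen(availability_url(target), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Calling `check("http://cis.alma.unibo.it/cis13b/")` would perform one HTTP request; the parsing is kept in `summarize` so it can be exercised without network access.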
16:22 |
Nemo_bis |
The most interesting stuff is in that subdir AFAIK |
16:22 |
Nemo_bis |
Sorry for confusion |
18:13 |
SketchCow |
Hi |
18:16 |
joepie91 |
ohai |
19:01 |
closure |
guy claims to have 8tb of geocities http://www.reddit.com/r/DataHoarder/comments/27y8ux/standing_up_40tbs_of_data_for_fun_times/ |
19:47 |
Nemo_bis |
And 80 % of it is in multiple copies of the stock geocities gifs? :P |
19:49 |
Nemo_bis |
«We don't the dedup the content in any way.» So might be. |
20:23 |
SN4T14 |
They DON'T the dedup? :p |
20:44 |
Nemo_bis |
dededup? |
20:45 |
Nemo_bis |
do not-the-dedup? |
20:45 |
SN4T14 |
I dedededup all my files. |
21:33 |
closure |
they warc and get all the dups same as archiveteam does these days |
21:33 |
closure |
seems like it would be a very nice dataset to pull into wayback |
21:44 |
DFJustin |
"We got the archive from the archive team in the first case, so I would hope its the same" |
21:46 |
closure |
huh, I didn't think the geocities rip was anywhere near 8tb |
22:53 |
midas |
this could be me being daft, but the internetarchive python script for uploading doesn't let you specify a certain category? like video, web etc etc? |
22:59 |
DFJustin |
--metadata="mediatype:movies" --metadata="collection:opensource_movies" |
22:59 |
DFJustin |
I would assume |
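As a rough sketch of how the flags suggested above would slot into a full command: the helper below only assembles an argument list for the `ia upload` command of the internetarchive tool and does not run anything. The identifier and filename in the usage note are made-up placeholders, and the helper name is hypothetical.

```python
def ia_upload_args(identifier, files, metadata):
    """Build an `ia upload` argument list with one --metadata flag per key."""
    args = ["ia", "upload", identifier, *files]
    for key, value in metadata.items():
        # Each flag uses the key:value form quoted in the log above.
        args.append(f"--metadata={key}:{value}")
    return args
```

For example, `ia_upload_args("my-test-item", ["clip.mp4"], {"mediatype": "movies", "collection": "opensource_movies"})` yields a list that could be handed to `subprocess.run` once the tool is installed and configured.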
23:11 |
midas |
yep, me being daft most likely. and sleepdeprived |
23:15 |
midas |
goodnight custodis pro datus, or keepers of data |
23:27 |
DFJustin |
custodes pro datis |
23:31 |
is4 |
https://maps.google.com/locationhistory/b/0 |
23:31 |
is4 |
I am horrified by what google knows about my comings and goings |
23:57 |
dashcloud |
so, my laptop froze and I had to power it off, killing my ongoing wget-warc grab. If I re-run the command, will it overwrite the existing warc or create a new one? |
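The question above goes unanswered in this log, and nothing here settles what wget does with an existing WARC. One way to sidestep it is to pick a WARC name that is not yet in use before re-running the grab. The sketch below assumes the grab writes a single `<prefix>.warc` or `<prefix>.warc.gz` file in one directory; the function name and the `-N` suffix scheme are hypothetical.

```python
from pathlib import Path


def unused_warc_prefix(directory, base):
    """Return base, or base-1, base-2, ..., whichever has no WARC file yet."""
    directory = Path(directory)
    n = 0
    while True:
        prefix = base if n == 0 else f"{base}-{n}"
        # Matches both prefix.warc and prefix.warc.gz.
        if not any(directory.glob(prefix + ".warc*")):
            return prefix
        n += 1
```

The returned prefix can then be passed as the WARC file name for the new run, leaving the interrupted file untouched for later inspection.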