#archiveteam 2014-06-12,Thu

Time Nickname Message
01:16 dashcloud so, apparently you do need to have word: word for the warc header command
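dashcloud's finding is that wget's `--warc-header` option only worked when given a value shaped like `name: value`. Below is a minimal Python sketch that builds and checks those arguments before wget is launched. It assumes that shape is the only requirement. The helper name, the `angelfire` WARC prefix and the `operator` field are hypothetical.

```python
import re
import shlex

# Hypothetical check for a named field of the form "name: value".
# The name pattern (letters, digits, hyphens) is a simplifying assumption.
FIELD_PATTERN = re.compile(r"^[A-Za-z0-9-]+: .+$")


def warc_header_args(headers: dict[str, str]) -> list[str]:
    """Turn {"operator": "dashcloud"} into ["--warc-header=operator: dashcloud"]."""
    args = []
    for name, value in headers.items():
        field = f"{name}: {value}"
        if not FIELD_PATTERN.match(field):
            raise ValueError(f"not a 'name: value' field: {field!r}")
        args.append(f"--warc-header={field}")
    return args


if __name__ == "__main__":
    # Hypothetical command line; it is only printed, wget is not run.
    cmd = [
        "wget",
        "--warc-file=angelfire",
        *warc_header_args({"operator": "dashcloud"}),
        "http://www.angelfire.com/",
    ]
    print(shlex.join(cmd))
```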
01:18 dashcloud holy crap- Angelfire is still live and changing site, and not just a repository of 90's websites
01:19 Famicoman My old angelfire is still up, kinda
01:21 dashcloud there's a split between the old style and new style (which you can still get- they sell hosting and websites still)
01:22 dashcloud apparently at some point angelfire and tripod became linked? (corporately at least)
01:26 Smiley dashcloud: they all lync sites aren't they??
01:26 Smiley i think thats how it was spelt, at work, cant check.
01:27 dashcloud that's certainly my hope for my archive job
01:28 dashcloud hopefully with a small number of sites, I can cover a huge amount of angelfire just by visiting every angelfire link on the page
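dashcloud's plan is to start from a few pages and follow every angelfire link on them. Below is a minimal sketch of the link-collection step using only the standard library's `HTMLParser`. It looks only at `<a href>` tags, treats `angelfire.com` and its subdomains as in scope (an assumption), and does no fetching. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AngelfireLinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> links that point at angelfire.com hosts."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        url = urljoin(self.base_url, href)
        host = urlparse(url).hostname or ""
        if host == "angelfire.com" or host.endswith(".angelfire.com"):
            self.links.add(url)


def angelfire_links(html: str, base_url: str) -> list[str]:
    """Return the sorted, de-duplicated angelfire links found in one page."""
    parser = AngelfireLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.links)
```

The resulting list could be written to a file and handed to wget's `-i` option as the next round of URLs.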
07:51 garyrh "balrog> does anyone know if Cameron Kaiser (of tenfourfox/classilla) is on twitter?"
07:52 garyrh https://twitter.com/doctorlinguist , but it's protected.
07:54 garyrh his last tweet was in 2012, so I guess he doesn't use twitter anymore
12:14 balrog garyrh: :/ ok
12:14 balrog I do follow him
12:14 balrog oh, he's active on ADN
12:50 Nemo_bis Please add to archivebot, I'm told it's going offline http://137.204.24.205/cis13b/bsco3/Default.asp
13:02 ivan` Nemo_bis: the whole domain?
13:04 Nemo_bis ivan`: I'm not sure, at least that cis13b/ directory which has some irreplaceable stuff
14:36 joepie91 ivan`, Nemo_bis: is that already being taken care of?
14:59 ivan` only partially
15:29 joepie91 ivan`: how partially? as in, is there something that needs to be done still :P
15:52 godane so i think i fucked my cbsradio collection somehow
15:53 godane https://archive.org/details/cbsradio-hourly-2009-07-30
15:53 godane no mp3 at all
15:53 godane all cause i was trying to fix a typo
15:53 godane :'(
15:54 joepie91 that reminds me
15:54 joepie91 godane:
15:54 joepie91 I have a -lot- of podcasts still
15:54 joepie91 from nhk
15:54 joepie91 did you ever end up fetching those?
15:55 godane i don't think so
15:55 joepie91 maybe you should :P
15:55 godane they're on your remote server right
15:56 godane cause otherwise i will only be able to get the last 7 days
15:57 godane SketchCo1: are you moving my cbsradio items right now?
15:58 godane cause i'm finding items that have 0 files in them
15:58 godane but others have files
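godane is finding cbsradio items that have no MP3s, or no files at all. The Internet Archive metadata API (`https://archive.org/metadata/<identifier>`) returns an item's JSON record, including a `files` list. Below is a minimal sketch that reports the MP3 names in that list. It assumes each file entry carries a `name` key. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

METADATA_URL = "https://archive.org/metadata/{identifier}"


def mp3_files(metadata: dict) -> list[str]:
    """Names of .mp3 files listed under the item's "files" key (case-insensitive)."""
    return [
        entry["name"]
        for entry in metadata.get("files", [])
        if entry.get("name", "").lower().endswith(".mp3")
    ]


def fetch_metadata(identifier: str) -> dict:
    """Download one item's metadata record as parsed JSON."""
    with urlopen(METADATA_URL.format(identifier=identifier), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    ident = "cbsradio-hourly-2009-07-30"
    names = mp3_files(fetch_metadata(ident))
    print(ident, "has mp3s" if names else "has NO mp3s", names)
```

Looping this over every identifier in the collection would flag the empty items in one pass instead of checking them by hand.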
16:01 godane joepie91: maybe you should upload your collection of nhk mp3s
16:01 godane that way it's one less thing for me to do
16:03 godane i now remember why that hourly one doesn't have mp3s
16:03 joepie91 godane: ah, you're short on time?
16:04 godane i don't remember the rsync
16:04 joepie91 and yes, they're on a server of mine
16:04 joepie91 rsync://croissant.cryto.net/nhk
16:04 joepie91 they need deduplication though
16:04 joepie91 (you can tell from the last modified timestamp)
16:04 joepie91 if you don't have the time, let me know and I'll put it on my todo
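joepie91 notes that the NHK mirror contains duplicates, and that the last-modified timestamp gives them away. After a local copy exists (for example `rsync -av rsync://croissant.cryto.net/nhk/ nhk/`), a minimal sketch for finding them is shown below. Note that it swaps in a different technique: it groups files by SHA-256 of their contents instead of by timestamp, and uses mtime only to pick the earliest copy as the one to keep. It only reports. The local `nhk` directory and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large MP3s are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Map content hash -> paths sharing it (groups of 2+), earliest mtime first."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {
        digest: sorted(paths, key=lambda p: p.stat().st_mtime)
        for digest, paths in groups.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    # Report only; deleting anything is left to a human.
    for digest, paths in find_duplicates("nhk").items():
        keep, *extra = paths
        print(f"keep {keep}")
        for path in extra:
            print(f"  duplicate {path}")
```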
16:08 exmic Nemo_bis, ivan`, joepie91: that directory http://137.204.24.205/cis13b/ is it being grabbed or what?
16:19 joepie91 exmic: that's what I was trying to establish
16:19 joepie91 oh
16:19 exmic right
16:19 joepie91 [18:12] <+ATGoKart> balrog: Your job for http://cis.alma.unibo.it/cis13b/bsco3/Default.asp has finished.
16:19 exmic that's /cis13b/bsco3
16:19 joepie91 that's why my ctrl+F didn't find it then
16:19 joepie91 yes
16:19 joepie91 the /cis13b/ dir doesn't have a listing
16:19 exmic not /cis13b as requested
16:19 exmic ah ok
16:19 joepie91 so unless you have a db of URLs handy...
16:20 exmic nope
16:20 joepie91 coming to think of it
16:20 joepie91 we could ask IA
16:20 yipdw I should add a !ia command to archivebot
16:20 yipdw all it does is check the URL and tell you whether or not wayback has it
16:20 yipdw and/or is blocked by robots txt etc
16:20 joepie91 exmic: https://web.archive.org/web/*/http://cis.alma.unibo.it/cis13b/*
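joepie91's wildcard link is the Wayback web interface's URL listing. The same list can be pulled from the Wayback CDX server and used as the "db of URLs" for the unlisted /cis13b/ directory. Below is a minimal sketch, assuming the CDX parameters `url`, `matchType`, `output`, `fl` and `collapse`. It only lists URLs Wayback has already seen, not everything on the live server. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str) -> str:
    """Ask for one row per distinct URL captured under `prefix`."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows: list[list[str]]) -> list[str]:
    """With output=json the first row names the fields; later rows are captures."""
    if not rows:
        return []
    header, *captures = rows
    column = header.index("original")
    return sorted({row[column] for row in captures})


def wayback_urls(prefix: str) -> list[str]:
    with urlopen(cdx_query_url(prefix), timeout=60) as resp:
        body = resp.read().decode("utf-8")
    # Treat an empty body as "no captures" rather than assuming its exact shape.
    return parse_cdx_rows(json.loads(body)) if body.strip() else []


if __name__ == "__main__":
    for url in wayback_urls("cis.alma.unibo.it/cis13b/"):
        print(url)
```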
16:20 yipdw (I should get back to working on it, period)
16:20 joepie91 yipdw: godo it! :P
16:21 joepie91 go do *
16:21 exmic I don't have time to supervise an archivebot job this week
16:21 joepie91 and be sure to make it return the last archival date
16:21 yipdw yeah, I'll get back to it once I have less crap to do
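yipdw's proposed `!ia` command, together with joepie91's request that it include the last archival date, maps fairly directly onto the Wayback Availability API. That API returns the closest snapshot for a URL (when no timestamp is given, this should be a recent capture). Below is a minimal sketch of the reply logic. It is not archivebot's actual code, and the function names are hypothetical. The API's response does not say *why* nothing was found, so this sketch cannot tell robots.txt exclusion apart from "never captured".

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def summarize_availability(payload: dict) -> str:
    """Turn an availability-API response into a one-line bot reply."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return "no capture reported (not archived, excluded, or blocked)"
    # Wayback timestamps are 14-digit UTC values: YYYYMMDDhhmmss.
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return f"archived {captured:%Y-%m-%d %H:%M} UTC: {closest['url']}"


def check_wayback(url: str) -> str:
    query = f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"
    with urlopen(query, timeout=30) as resp:
        return summarize_availability(json.load(resp))


if __name__ == "__main__":
    print(check_wayback("http://cis.alma.unibo.it/cis13b/"))
```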
16:22 Nemo_bis The most interesting stuff is in that subdir AFAIK
16:22 Nemo_bis Sorry for confusion
18:13 SketchCow Hi
18:16 joepie91 ohai
19:01 closure guy claims to have 8tb of geocities http://www.reddit.com/r/DataHoarder/comments/27y8ux/standing_up_40tbs_of_data_for_fun_times/
19:47 Nemo_bis And 80 % of it is in multiple copies of the stock geocities gifs? :P
19:49 Nemo_bis «We don't the dedup the content in any way.» So might be.
20:23 SN4T14 They DON'T the dedup? :p
20:44 Nemo_bis dededup?
20:45 Nemo_bis do not-the-dedup?
20:45 SN4T14 I dedededup all my files.
21:33 closure they warc and get all the dups same as archiveteam does these days
21:33 closure seems like it would be a very nice dataset to pull into wayback
21:44 DFJustin "We got the archive from the archive team in the first case, so I would hope its the same"
21:46 closure huh, I didn't think the geocities rip was anywhere near 8tb
22:53 midas this could be me being daft, but the internetarchive python script for uploading doesn't let you specify a certain category? like video, web etc etc?
22:59 DFJustin --metadata="mediatype:movies" --metadata="collection:opensource_movies"
22:59 DFJustin I would assume
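DFJustin's answer uses the `ia` command-line tool's `--metadata` flag. In the `internetarchive` Python library, the same fields go into a `metadata` dict passed to `upload()`. Below is a minimal sketch that converts CLI-style `key:value` strings into that dict. The `movies` and `opensource_movies` values are taken from the log and have not been checked as correct for any particular upload. The identifier and file name are hypothetical, and an actual upload needs configured IA credentials.

```python
def parse_metadata_flags(flags: list[str]) -> dict[str, str]:
    """Turn CLI-style "key:value" strings into a metadata dict.

    Splits on the first colon only, so values such as URLs survive.
    In this sketch a repeated key overwrites the earlier value.
    """
    metadata: dict[str, str] = {}
    for flag in flags:
        key, sep, value = flag.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {flag!r}")
        metadata[key.strip()] = value.strip()
    return metadata


def upload_item(identifier: str, paths: list[str], metadata: dict[str, str]):
    # Imported here so parse_metadata_flags() works without the package installed.
    from internetarchive import upload

    return upload(identifier, files=paths, metadata=metadata)


if __name__ == "__main__":
    md = parse_metadata_flags(["mediatype:movies", "collection:opensource_movies"])
    # Hypothetical identifier and file name.
    upload_item("example-video-upload", ["video.mp4"], md)
```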
23:11 midas yep, me being daft most likely. and sleepdeprived
23:15 midas goodnight custodis pro datus, or keepers of data
23:27 DFJustin custodes pro datis
23:31 is4 https://maps.google.com/locationhistory/b/0
23:31 is4 I am horrified by what google knows about my comings and goings
23:57 dashcloud so, my laptop froze and I had to power it off, killing my ongoing wget-warc grab. If I re-run the command, will it overwrite the existing warc or create a new one?
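The log leaves dashcloud's question unanswered. One way to avoid depending on wget's behaviour is to give the re-run a `--warc-file` prefix that nothing on disk already uses, and then keep both the partial WARC and the new one. Below is a minimal sketch of that approach. The `grab` prefix is hypothetical, and the naming pattern assumed for split WARCs (`prefix-*.warc*`) is also an assumption.

```python
from pathlib import Path


def _prefix_in_use(folder: Path, prefix: str) -> bool:
    # Covers "prefix.warc", "prefix.warc.gz" and assumed split names like "prefix-00000.warc.gz".
    return any(folder.glob(f"{prefix}.warc*")) or any(folder.glob(f"{prefix}-*.warc*"))


def fresh_warc_prefix(directory: str, prefix: str) -> str:
    """Return prefix, or prefix-2, prefix-3, ... so no existing WARC shares it."""
    folder = Path(directory)
    candidate, n = prefix, 2
    while _prefix_in_use(folder, candidate):
        candidate = f"{prefix}-{n}"
        n += 1
    return candidate


if __name__ == "__main__":
    print(f"--warc-file={fresh_warc_prefix('.', 'grab')}")
```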
