[00:00] <nickname_> I'm surprised that this 11-year-old file is still on NPR.org and still works in RealPlayer
[00:01] <dashcloud> hmm- let me try to track down an mplayer install and the codec pack- that should handle the file then
[00:02] <nickname_> Here is the one with the weird file format on archive.org: https://archive.org/details/MorningEditionForDecember22200520051222Me14
[00:03] <nickname_> (it's not in the player, you have to go to show all)
[00:09] <godane> nickname_: i will look at npr
[00:10] <godane> i have been doing it: https://archive.org/details/npr-morning-edition
[00:10] <nickname_> okay, i've been doing it manually with firefox using FlashGot and DownThemAll...
[00:11] <nickname_> I saw that collection and decided to start from January 2005, but when you click on some of them they are unavailable for some reason
[00:11] <nickname_> (copyright?)
[00:12] <godane> i brute force the paths using wget
[00:12] <godane> so sometimes there is no mp3 but there is real media
[00:13] <godane> or sometimes it's real media but no mp3
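A minimal sketch of the brute-force idea, assuming segment files are named like the `20051222_me_13.rm` file mentioned later in this log. The zero-padded segment number, the segment range, and `BASE_URL` are all assumptions for illustration; `BASE_URL` is a placeholder and not NPR's real media path.

```python
from datetime import date

def candidate_filenames(day, show="me", max_segment=20, extensions=("mp3", "rm")):
    """Yield guessed names such as 20051222_me_13.rm for one broadcast day."""
    stamp = day.strftime("%Y%m%d")
    for segment in range(1, max_segment + 1):
        for ext in extensions:
            # Try both formats, since a segment may exist in only one of them.
            yield f"{stamp}_{show}_{segment:02d}.{ext}"

# Placeholder host: the real directory layout is not given in the chat.
BASE_URL = "http://media.example.invalid/me/"

names = list(candidate_filenames(date(2005, 12, 22), max_segment=14))
urls = [BASE_URL + name for name in names]
```

A list like `urls` could be saved one per line and handed to `wget -i`, which would simply fail on the guesses that do not exist.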
[00:13] <nickname_> on each page there is a download button for usually mp3 that is a direct link
[00:13] <godane> for Morning Edition yes
[00:13] <nickname_> this page (http://www.npr.org/programs/morning-edition/2005/12/22/12927340/?showDate=2005-12-22) is one of those annoying real media ones
[00:13] <godane> for shows like Talk of the Nation there was real media for a long time
[00:15] <nickname_> I tried wget, but there was no easy way to do it quickly
[00:15] <godane> https://archive.org/details/npr-talk-of-the-nation
[00:16] <godane> talk of the nation real media files after august 20, 2001 don't encode into mp3 well
[00:16] <nickname_> some of those are unavailable as well
[00:17] <dashcloud> holy shit- mplayer plays that first one just fine- 20051222_me_13.rm
[00:17] <nickname_> what!
[00:17] <godane> mplayer plays the real media files fine
[00:18] <godane> it just doesn't re-encode on archive.org right
[00:18] <nickname_> hmm
[00:20] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[00:28] *** ris has quit IRC ()
[00:57] *** VADemon has quit IRC (Quit: left4dead)
[00:57] <nickname_> godane: have you tried converting the *.rm file with mencoder to mp3?
[01:01] <dashcloud> maybe it would be better to convert to a .wav file, then upload that so IA can make derivatives?
[01:09] <godane> i upload the original files
[01:10] <godane> the derive problem started around august 20 2001
[01:10] <godane> after that IA couldn't re-encode the real media files
[01:10] <joepie91> dashcloud: better to do it as FLAC then
[01:11] <joepie91> space-waste-wise
[01:11] <godane> i prefer just to upload original files
[01:11] <joepie91> sure, but you can upload both the original and a FLAC version
[01:11] <joepie91> and let it derive from the FLAC
[01:12] <godane> then IA can fix this later
[01:12] <godane> i figure it's an IA derive process problem
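The "upload the original plus a FLAC" suggestion above could be sketched as follows. This swaps in ffmpeg rather than the mplayer/mencoder tools named in the chat, and it assumes the local ffmpeg build can decode the RealAudio codec in question, which is not guaranteed. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def flac_command(source):
    """Build an ffmpeg argument list that writes <name>.flac next to the source."""
    src = Path(source)
    return [
        "ffmpeg",
        "-n",            # never overwrite an existing output file
        "-i", str(src),  # the original RealMedia file, left untouched
        "-vn",           # ignore any video stream
        "-c:a", "flac",  # lossless audio, smaller than WAV
        str(src.with_suffix(".flac")),
    ]

def convert(source):
    """Run the conversion if ffmpeg is installed; return True on success."""
    if shutil.which("ffmpeg") is None:
        return False
    return subprocess.run(flac_command(source)).returncode == 0
```

Calling `convert("20051222_me_13.rm")` would leave the `.rm` in place, so both files could be uploaded to the same item.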
[01:15] *** tomwsmf-a has joined #archiveteam-bs
[01:16] <godane> the one that worked before they stopped deriving : https://archive.org/details/npr-talk-of-the-nation-08-20-2001
[01:19] *** JesseW has joined #archiveteam-bs
[01:59] *** j08nY has quit IRC (Quit: Leaving)
[02:00] <nickname_> that one says there are issues with this items content
[02:02] <SketchCow> Safe at home.
[02:05] <godane> nickname_: i can see the item
[02:05] <godane> also some of my stuff i upload goes to podcast collection
[02:05] <godane> that collection makes things disappear publicly for some reason
[02:07] <nickname_> you can see it, but i can't (https://i.imgur.com/fQLJU5F.png)
[02:07] <godane> i know
[02:07] <godane> it works when i'm logged in
[02:08] <godane> looks like i can't even download it on wget
[02:08] <godane> *using wget
[02:08] <godane> SketchCow: we need to fix that podcast collection
[02:09] <godane> even if we have to move the collections out of the podcast collection so they're viewable/downloadable
[02:12] <SketchCow> Whut
[02:14] <godane> it's a bug from a long time ago
[02:15] <godane> the podcast collection i think blocks non-logged-in users from downloading the items in it
[02:15] <godane> SketchCow: you noticed this bug too at one point
[02:21] <godane> fun fact: i was around 617k around jan 22 2016
[02:21] <godane> i'm at 718k so i got up about 101k in about 5 months
[02:23] <godane> so i figure the goal would be to get around to 850k+ before the end of the year
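The arithmetic behind that goal can be checked with a simple linear projection. The counts and the "about 5 months" span are from the chat; `months_left = 6.3` is an assumption about how much of the year remained, and the function name is hypothetical.

```python
def project_count(start, current, months_elapsed, months_left):
    """Linear projection: keep adding at the average monthly rate so far."""
    monthly_rate = (current - start) / months_elapsed
    return monthly_rate, current + monthly_rate * months_left

# 617k and 718k are from the chat; months_left is an assumed value.
rate, projected = project_count(617_000, 718_000, months_elapsed=5, months_left=6.3)
needed_rate = (850_000 - 718_000) / 6.3  # monthly pace required to reach 850k

print(f"average so far: {rate:,.0f} per month")
print(f"projected year-end total: {projected:,.0f}")
print(f"pace needed for 850k: {needed_rate:,.0f} per month")
```

Under these assumptions the projection lands at roughly 845k, a little short of 850k, so the monthly pace would need to rise slightly.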
[02:26] *** Start_ has joined #archiveteam-bs
[02:26] *** Start has quit IRC (Ping timeout: 260 seconds)
[02:37] *** acridAxid has quit IRC (marauder)
[02:38] *** acridAxid has joined #archiveteam-bs
[02:49] <godane> so i got a book called The official AT&T Worldnet Web Discovery Guide
[02:49] <godane> it came with a cd with a free month of at&t worldnet
[02:51] <DoomTay> Now I'm thinking of http://motherboard.vice.com/read/this-dude-is-collecting-your-old-aol-cds
[02:51] <DoomTay> Wait...
[03:01] <dashcloud> glad to hear you made it home safely SketchCow
[03:12] <godane> i'm uploading the iso now
[03:14] <godane> i'm listening to the All Things Considered -- A review of the World Wide Web and RealAudio!
[03:14] <godane> 1995-06-06 episode
[03:23] <godane> https://archive.org/details/The_Official_ATT_Worldnet_Web_Discovery_Guide_CD
[03:24] <DoomTay> You should scan the CD cover as well
[03:27] *** nickname_ has quit IRC (Ping timeout: 492 seconds)
[03:29] *** xXx_ndidd has joined #archiveteam-bs
[03:30] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:32] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[04:01] *** vitzli has joined #archiveteam-bs
[04:06] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:12] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[04:13] *** FalconK has joined #archiveteam-bs
[04:15] *** Sk1d has joined #archiveteam-bs
[04:22] *** tomwsmf-a has joined #archiveteam-bs
[04:31] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[04:52] *** DoomTay has quit IRC (Quit: Page closed)
[04:52] <godane> https://archive.org/details/Signing_For_Dummies_CD
[05:07] *** BlueMaxim has joined #archiveteam-bs
[05:16] <godane> so i'm grabbing Analytical Chemistry journals
[05:28] <godane> someone else can go after it: https://thepiratebay.org/torrent/13670809/Analytical_Chemistry_(1929-2015)
[05:28] <godane> this guy has a lot of interesting stuff: https://thepiratebay.org/user/clouderone/0/5/0
[05:36] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:09] *** dashcloud has joined #archiveteam-bs
[06:11] <godane> starting to upload NASA docs again: https://archive.org/details/NASA_NTRS_Archive_19630002631
[06:24] <godane> i'm also starting to upload my deadspin.com sitemap urls
[06:24] <godane> i'm starting to download 2010 urls
[06:48] <ivan`> when you notice that your ubuntu 16.04 tmuxes are locking up, grep this log and read what I wrote here :-) https://github.com/ludios/grab-site/commit/ca7bc71045784b1cfda2a6d1d6dbc44a4000aa13
[07:39] *** is- has quit IRC (Ping timeout: 362 seconds)
[07:46] *** schbirid has joined #archiveteam-bs
[08:00] *** vitzli has quit IRC (Quit: Leaving)
[08:31] *** is- has joined #archiveteam-bs
[08:53] *** Stilett0 has joined #archiveteam-bs
[08:53] *** Stiletto has quit IRC (Read error: Operation timed out)
[09:26] *** Zebranky has quit IRC (Read error: Operation timed out)
[09:36] *** Zebranky has joined #archiveteam-bs
[09:48] *** vitzli has joined #archiveteam-bs
[11:51] <midas> ivan`: ubuntu 16.04, systemd?
[11:54] *** jut has joined #archiveteam-bs
[12:38] *** xXx_ndidd has quit IRC (Read error: Connection reset by peer)
[12:41] *** j08nY has joined #archiveteam-bs
[12:47] *** VADemon has joined #archiveteam-bs
[13:36] *** nickname_ has joined #archiveteam-bs
[13:43] *** VADemon has quit IRC (Quit: left4dead)
[13:44] *** nickname_ has quit IRC (Read error: Operation timed out)
[13:50] *** sep332 has quit IRC (Quit: Konversation terminated!)
[14:00] *** jut has quit IRC (Read error: Connection reset by peer)
[14:25] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:25] *** aschmitz has quit IRC (Read error: Operation timed out)
[15:18] *** VADemon has joined #archiveteam-bs
[15:30] *** DoomTay has joined #archiveteam-bs
[15:36] <luckcolor> guys any good tool to download an entire archive.org collection?
[15:38] <DoomTay> So we're clear, you mean a single collection, not the entire archive.org database?
[15:38] <luckcolor> ofc a single collection :P
[15:39] <luckcolor>  entire archive.org collection == entire collection?
[15:39] <luckcolor> :P
[15:39] <JW_work> luckcolor: Use the internetarchive python module, and some scripting around it
[15:39] <JW_work> MacNN is going away — http://www.macnn.com/articles/16/06/20/long.time.staff.winding.up.two.decades.of.service.at.the.end.of.june.134716/
[15:39] * luckcolor doesn't use python
[15:40] <luckcolor> but i will check out how that module is, though i would prefer other stuff
[15:46] <JW_work> well, you're going to need to do *some* scripting, afaik
[15:46] <JW_work> what language do you prefer?
[15:47] <luckcolor> What languages do you know? maybe it's easier :)
[15:48] <luckcolor> well found one https://github.com/ghalfacree/bash-scripts/blob/master/archivedownload.sh
[15:49] <JW_work> add it to the wiki if it isn't already there
[15:49] <luckcolor> sigh wiki editing incoming....
[15:52] <luckcolor> actually this script requires you to tell it which file type to download
[15:53] *** j08nY has quit IRC (Quit: Leaving)
[15:53] <luckcolor>  ia download --search 'collection:terroroftinytown'
[15:54] <JW_work> You want to download the zip files
[15:59] <luckcolor> ia download --search 'subject:terroroftinytown'
[15:59] <luckcolor> error: the query "'subject:terroroftinytown'" returned no results
[16:00] <JW_work> I think you want 'collection:urlteam'
[16:00] <luckcolor> trying
[16:02] <luckcolor> ia download --search subject:terroroftinytown
[16:02] <luckcolor> urlteam_2014-11-06-20-33-23 (1/512): dd
[16:02] <luckcolor> had to remove the unix like quotation marks
[16:02] <JW_work> ha
[16:02] <luckcolor> apparently windows or the script doesn't like them
[16:02] <luckcolor> :P
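The quoting trouble above is consistent with the Windows command prompt not treating single quotes as quoting characters, so they reach the program as part of the query. A minimal sketch that sidesteps shell quoting by passing the arguments as a list; it assumes the `ia` command from the internetarchive package is installed and on PATH, and the helper names are hypothetical.

```python
import shutil
import subprocess

def ia_download_args(query):
    """Argument list for `ia download --search <query>`, with no shell quoting."""
    return ["ia", "download", "--search", query]

def download_collection(identifier):
    """Download every item in one collection; returns the ia exit code."""
    if shutil.which("ia") is None:
        raise RuntimeError("the ia command was not found on PATH")
    # Passing a list means the query arrives as one argument on any platform.
    return subprocess.run(ia_download_args(f"collection:{identifier}")).returncode
```

For the collection suggested in the chat this would be `download_collection("urlteam")`, which needs enough free disk space for the whole collection.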
[16:03] <luckcolor> side note: that's the worst looking chracter for a progress bar
[16:03] <luckcolor> *character
[16:03] <luckcolor> oh well
[16:04] <JW_work> heh, feel free to make a PR
[16:27] *** vitzli has quit IRC (Quit: Leaving)
[16:29] *** Rickster has quit IRC (Quit: ZNC 1.6.1 - http://znc.in)
[16:30] *** Rickster has joined #archiveteam-bs
[17:02] <luckcolor> Also it's ultra slow don't know if it's my connection but will run it on my server
[17:02] <luckcolor> 200mbit/s go
[17:02] <luckcolor> !
[17:06] *** metalcamp has joined #archiveteam-bs
[17:07] <luckcolor> still incredibly slow i will leave it running overnight
[17:07] <luckcolor> JW_work do you perhaps know how big the collection is?
[17:11] <luckcolor> ok on my server it says it had errors
[17:12] * luckcolor decides to change tool as this one seems to be stupid
[17:15] <JW_work> less than a terabyte
[17:16] <JW_work> I'm seeding the whole collection, also — so you could get it that way if you prefer
[17:16] <JW_work> but you'll still need to grab the torrents (or magnet links)
[17:17] <luckcolor> JW_work good news then because i don't have enough space anywhere 
[17:17] <DoomTay> 200mbits is better than what I have
[17:17] <luckcolor> it's on the server
[17:17] <luckcolor> and apparently it says error
[17:18] <luckcolor> SIGH python
[17:18] <JW_work> luckcolor: how much space do you have?
[17:18] <luckcolor> not enough 250 gb here
[17:18] <luckcolor> 36 on the server
[17:18] <JW_work> 36gb? yeah, you'll need more than that :-)
[17:19] <luckcolor> well ssd volumes aren't cheap
[17:19] <JW_work> why does it need to be an SSD?
[17:20] <luckcolor> Scaleway
[17:20] <luckcolor> only ssds there
[17:20] <JW_work> ah, makes sense
[17:32] <yipdw> must be the program, can't be the network or infrastructure or anything else, oh no
[17:34] <luckcolor> i mean i don't have the space anyway
[17:48] *** Start_ is now known as Start
[17:57] <espes__> true? https://twitter.com/marcan42/status/745183129283919872
[17:57] <espes__> 'Why does RT get to retroactively edit YouTube videos, keeping the URL, even though YT says that is "not possible"?'
[17:58] <espes__> https://meduza.io/en/news/2016/06/20/russian-state-television-accidentally-broadcasts-evidence-that-moscow-uses-cluster-bombs-in-syria
[18:14] <arkiver> I still see the big barrels here https://www.youtube.com/watch?v=dNbIRD8Cq48
[18:15] <espes__> see the note
[18:16] <espes__> guess youtube is mutable -_-
[18:18] <schbirid> i thought partners can replace videos
[18:19] <yipdw> if RT didn't get to rewrite history in their own image it wouldn't really be RT
[18:20] <xmc> harsh
[18:20] <yipdw> wabam
[18:32] *** tomwsmf-a has joined #archiveteam-bs
[18:43] <DoomTay> The sad thing about sites going down is that even if they were perfectly saved, they would still be harder to search
[18:43] <xmc> ya
[18:55] <JW_work> eh, not always — once they are converted into downloadable dumps, it can be *easier* to search them
[18:55] <JW_work> just because we don't *currently* have fabulous WARC searching tools doesn't mean that such things can't be written
[18:56] <xmc> yeah you could build a thing that pulls unusual words out of the warc'd html and adds it to a cdx-style index
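A rough sketch of that idea, assuming the HTML has already been extracted from the WARC records into a URL-to-HTML mapping (reading the WARC itself is left out). The "unusual word" rule used here, keeping words that appear on at most half of the pages, and every name in the code are assumptions for illustration; the resulting dictionary is only a simplified stand-in for a cdx-style index.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def words_in(html):
    """Return the set of lowercase words in the visible text of one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))

def build_rare_word_index(pages, max_fraction=0.5):
    """Map each word found on at most max_fraction of pages to its sorted URLs."""
    postings = defaultdict(set)
    for url, html in pages.items():
        for word in words_in(html):
            postings[word].add(url)
    limit = max_fraction * len(pages)
    return {w: sorted(urls) for w, urls in postings.items() if len(urls) <= limit}

pages = {
    "http://a.example/": "<p>The calculator archive</p><script>var the = 1;</script>",
    "http://b.example/": "<p>The modem archive</p>",
    "http://c.example/": "<p>The realaudio stream</p>",
}
index = build_rare_word_index(pages)
```

Looking up `index.get("calculator")` then gives the pages worth opening, while common words never enter the index.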
[19:16] *** j08nY has joined #archiveteam-bs
[19:26] *** nickname_ has joined #archiveteam-bs
[19:26] *** antomati_ has joined #archiveteam-bs
[19:26] *** swebb sets mode: +o antomati_
[19:26] *** RichardG_ has joined #archiveteam-bs
[19:28] *** godane has quit IRC (Read error: Operation timed out)
[19:29] *** godane has joined #archiveteam-bs
[19:29] *** antomatic has quit IRC (Read error: Operation timed out)
[19:29] *** Igloo has quit IRC (Read error: Operation timed out)
[19:29] *** whopper has quit IRC (Read error: Operation timed out)
[19:29] *** Igloo has joined #archiveteam-bs
[19:31] *** RichardG has quit IRC (Read error: Operation timed out)
[19:33] *** Whopper has joined #archiveteam-bs
[20:20] *** nickname_ has quit IRC (Read error: Operation timed out)
[20:35] *** anjacks0n has joined #archiveteam-bs
[20:49] *** schbirid has quit IRC (Quit: Leaving)
[21:04] *** ris has joined #archiveteam-bs
[21:36] *** metalcamp has quit IRC (Quit: Bye)
[21:44] *** ndiddy has joined #archiveteam-bs
[21:53] <xmc> ticalc.org turned 20 today and we're not closing the site http://www.ticalc.org/archives/news/articles/14/148/148917.html
[21:53] *** Simpbrain has quit IRC (Read error: Connection reset by peer)
[21:53] *** anjacks0n has quit IRC (anjacks0n)
[22:05] *** Sue_ has quit IRC (Ping timeout: 260 seconds)
[22:27] <JW_work> we should probably drop it in archivebot anyway :-P
[22:28] <JW_work> (but likely not with high priority)
[22:28] <DoomTay> And to think two jobs "failed" on me recently
[22:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:43] *** dashcloud has joined #archiveteam-bs
[23:21] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[23:35] *** Whopper has quit IRC (Read error: Operation timed out)
[23:40] *** tomwsmf-a has joined #archiveteam-bs
[23:57] *** Sue_ has joined #archiveteam-bs