[00:00] I'm surprised that this 11 year old file is still on NPR.org and still works in RealPlayer
[00:01] hmm- let me try to track down an mplayer install and the codec pack- that should handle the file then
[00:02] Here is the one with the weird file format on archive.org: https://archive.org/details/MorningEditionForDecember22200520051222Me14
[00:03] (it's not in the player, you have to go to show all)
[00:09] nickname_: i will look at npr
[00:10] i have been doing it: https://archive.org/details/npr-morning-edition
[00:10] okay, i've been doing it manually with firefox using FlashGot and DownThemAll...
[00:11] I saw that collection and decided to start from January 2005, but when you click on some of them they are unavailable for some reason
[00:11] (copyright?)
[00:12] i brute force the paths using wget
[00:12] so sometimes there is no mp3 but there is real media
[00:13] or sometimes it's real media but no mp3
[00:13] on each page there is a download button, usually for the mp3, that is a direct link
[00:13] for Morning Edition yes
[00:13] this page (http://www.npr.org/programs/morning-edition/2005/12/22/12927340/?showDate=2005-12-22) is one of those annoying real media ones
[00:13] for shows like Talk of the Nation there was real media for a long time
[00:15] I tried wget, but there was no easy way to do it quickly
[00:15] https://archive.org/details/npr-talk-of-the-nation
[00:16] talk of the nation real media files after august 20, 2001 don't encode into mp3 well
[00:16] some of those are unavailable as well
[00:17] holy shit- mplayer plays that first one just fine- 20051222_me_13.rm
[00:17] what!
[00:17] mplayer plays the real media files fine
[00:18] it just doesn't re-encode on archive.org right
[00:18] hmm
[00:20] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[00:28] *** ris has quit IRC ()
[00:57] *** VADemon has quit IRC (Quit: left4dead)
[00:57] godane: have you tried converting the *.rm file with mencoder to mp3?
[01:01] maybe it would be better to convert to a .wav file, then upload that so IA can make derivatives?
[01:09] i upload the original files
[01:10] the derive problem started around august 20 2001
[01:10] after that IA couldn't re-encode the real media files
[01:10] dashcloud: better to do it as FLAC then
[01:11] space-waste-wise
[01:11] i prefer just to upload original files
[01:11] sure, but you can upload both the original and a FLAC version
[01:11] and let it derive from the FLAC
[01:12] then IA can fix this later
[01:12] i figure it's an IA derive process problem
[01:15] *** tomwsmf-a has joined #archiveteam-bs
[01:16] the one that works before they stop deriving: https://archive.org/details/npr-talk-of-the-nation-08-20-2001
[01:19] *** JesseW has joined #archiveteam-bs
[01:59] *** j08nY has quit IRC (Quit: Leaving)
[02:00] that one says there are issues with this item's content
[02:02] Safe at home.
[02:05] nickname_: i can see the item
[02:05] also some of my stuff i upload goes to podcast collection
[02:05] that collection makes things disappear publicly for some reason
[02:07] you can see it, but i can't (https://i.imgur.com/fQLJU5F.png)
[02:07] i know
[02:07] it works when i'm logged in
[02:08] looks like i can't even download it on wget
[02:08] *using wget
[02:08] SketchCow: we need to fix that podcast collection
[02:09] even if we have to move the collections out of podcast collection so they're viewable/downloadable
[02:12] Whut
[02:14] it's a bug from a long time ago
[02:15] the podcast collection i think blocks non-logged-in users from downloading the items in it
[02:15] SketchCow: you noticed this bug too at one point
[02:21] fun fact: i was around 617k around jan 22 2016
[02:21] i'm at 718k so i got up about 101k in about 5 months
[02:23] so i figure the goal would be to get around to 850k+ before the end of the year
[02:26] *** Start_ has joined #archiveteam-bs
[02:26] *** Start has quit IRC (Ping timeout: 260 seconds)
[02:37] *** acridAxid has quit IRC (marauder)
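Editor's note: the .rm to WAV to FLAC idea from the 01:01-01:11 exchange could be scripted roughly as below. This is only a sketch, assuming an `mplayer` build with the RealAudio codecs and the `flac` command-line encoder are both installed; the function names are hypothetical, and the example file name is the one mentioned at 00:17.

```python
import subprocess
from pathlib import Path


def decode_commands(rm_path):
    """Build two command lines: RealMedia -> WAV (mplayer), then WAV -> FLAC (flac)."""
    src = Path(rm_path)
    wav = src.with_suffix(".wav")
    flac = src.with_suffix(".flac")
    to_wav = [
        "mplayer", "-really-quiet",
        "-novideo", "-vo", "null",        # audio only, no video output window
        "-ao", f"pcm:fast:file={wav}",    # write the decoded audio to a WAV file
        str(src),
    ]
    to_flac = ["flac", "--best", "-o", str(flac), str(wav)]
    return to_wav, to_flac


def convert(rm_path):
    """Run both steps in order, stopping at the first failure."""
    for cmd in decode_commands(rm_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in decode_commands("20051222_me_13.rm"):
        print(" ".join(cmd))
```

The original .rm would still be uploaded next to the FLAC, as suggested in the chat, so the lossless copy is only there to give the derive process something it can read.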
[02:38] *** acridAxid has joined #archiveteam-bs
[02:49] so i got a book called The official AT&T Worldnet Web Discovery Guide
[02:49] it came with a cd with a free month of at&t worldnet
[02:51] Now I'm thinking of http://motherboard.vice.com/read/this-dude-is-collecting-your-old-aol-cds
[02:51] Wait...
[03:01] glad to hear you made it home safely SketchCow
[03:12] i'm uploading the iso now
[03:14] i'm listening to the All Things Considered -- A review of the World Wide Web and RealAudio!
[03:14] 1995-06-06 episode
[03:23] https://archive.org/details/The_Official_ATT_Worldnet_Web_Discovery_Guide_CD
[03:24] You should scan the CD cover as well
[03:27] *** nickname_ has quit IRC (Ping timeout: 492 seconds)
[03:29] *** xXx_ndidd has joined #archiveteam-bs
[03:30] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:32] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[04:01] *** vitzli has joined #archiveteam-bs
[04:06] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:12] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[04:13] *** FalconK has joined #archiveteam-bs
[04:15] *** Sk1d has joined #archiveteam-bs
[04:22] *** tomwsmf-a has joined #archiveteam-bs
[04:31] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[04:52] *** DoomTay has quit IRC (Quit: Page closed)
[04:52] https://archive.org/details/Signing_For_Dummies_CD
[05:07] *** BlueMaxim has joined #archiveteam-bs
[05:16] so i'm grabbing Analytical Chemistry journals
[05:28] someone else can go after it: https://thepiratebay.org/torrent/13670809/Analytical_Chemistry_(1929-2015)
[05:28] this guy has a lot of interesting stuff: https://thepiratebay.org/user/clouderone/0/5/0
[05:36] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:09] *** dashcloud has joined #archiveteam-bs
[06:11] starting to upload NASA docs again: https://archive.org/details/NASA_NTRS_Archive_19630002631
[06:24] i'm also starting to upload my deadspin.com sitemap urls
[06:24] i'm starting to download 2010 urls
[06:48] when you notice that your ubuntu 16.04 tmuxes are locking up, grep this log and read what I wrote here :-) https://github.com/ludios/grab-site/commit/ca7bc71045784b1cfda2a6d1d6dbc44a4000aa13
[07:39] *** is- has quit IRC (Ping timeout: 362 seconds)
[07:46] *** schbirid has joined #archiveteam-bs
[08:00] *** vitzli has quit IRC (Quit: Leaving)
[08:31] *** is- has joined #archiveteam-bs
[08:53] *** Stilett0 has joined #archiveteam-bs
[08:53] *** Stiletto has quit IRC (Read error: Operation timed out)
[09:26] *** Zebranky has quit IRC (Read error: Operation timed out)
[09:36] *** Zebranky has joined #archiveteam-bs
[09:48] *** vitzli has joined #archiveteam-bs
[11:51] ivan`: ubuntu 16.04, systemd?
[11:54] *** jut has joined #archiveteam-bs
[12:38] *** xXx_ndidd has quit IRC (Read error: Connection reset by peer)
[12:41] *** j08nY has joined #archiveteam-bs
[12:47] *** VADemon has joined #archiveteam-bs
[13:36] *** nickname_ has joined #archiveteam-bs
[13:43] *** VADemon has quit IRC (Quit: left4dead)
[13:44] *** nickname_ has quit IRC (Read error: Operation timed out)
[13:50] *** sep332 has quit IRC (Quit: Konversation terminated!)
[14:00] *** jut has quit IRC (Read error: Connection reset by peer)
[14:25] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:25] *** aschmitz has quit IRC (Read error: Operation timed out)
[15:18] *** VADemon has joined #archiveteam-bs
[15:30] *** DoomTay has joined #archiveteam-bs
[15:36] guys any good tool to download an entire archive.org collection?
[15:38] So we're clear, you mean a single collection, not the entire archive.org database?
[15:38] ofc a single collection :P
[15:39] entire archive.org collection == entire collection?
[15:39] :P
[15:39] luckcolor: Use the internetarchive python module, and some scripting around it
[15:39] MacNN is going away — http://www.macnn.com/articles/16/06/20/long.time.staff.winding.up.two.decades.of.service.at.the.end.of.june.134716/
[15:39] * luckcolor doesn't use python
[15:40] but i will check out how that module is, would prefer other stuff
[15:46] well, you're going to need to do *some* scripting, afaik
[15:46] what language do you prefer?
[15:47] What languages do you know? maybe it's easier :)
[15:48] well found one https://github.com/ghalfacree/bash-scripts/blob/master/archivedownload.sh
[15:49] add it to the wiki if it isn't already there
[15:49] sigh wiki editing incoming....
[15:52] actually this script requires you to tell it which file type to download
[15:53] *** j08nY has quit IRC (Quit: Leaving)
[15:53] ia download --search 'collection:terroroftinytown'
[15:54] You want to download the zip files
[15:59] ia download --search 'subject:terroroftinytown'
[15:59] error: the query "'subject:terroroftinytown'" returned no results
[16:00] I think you want 'collection:urlteam'
[16:00] trying
[16:02] ia download --search subject:terroroftinytown
[16:02] urlteam_2014-11-06-20-33-23 (1/512): dd
[16:02] had to remove the unix like quotation marks
[16:02] ha
[16:02] apparently windows or the script doesn't like them
[16:02] :P
[16:03] side note: that's the worst looking chracter for a progress bar
[16:03] *character
[16:03] oh well
[16:04] heh, feel free to make a PR
[16:27] *** vitzli has quit IRC (Quit: Leaving)
[16:29] *** Rickster has quit IRC (Quit: ZNC 1.6.1 - http://znc.in)
[16:30] *** Rickster has joined #archiveteam-bs
[17:02] Also it's ultra slow don't know if it's my connection but will run it on my server
[17:02] 200mbit/s go
[17:02] !
[17:06] *** metalcamp has joined #archiveteam-bs
[17:07] still incredibly slow i will leave it running overnight
[17:07] JW_work perhaps do you know how big is the collection?
[17:11] ok on my server it says it had errors
[17:12] * luckcolor decides to change tool as this one seems to be stupid
[17:15] less than a terabyte
[17:16] I'm seeding the whole collection, also — so you could get it that way if you prefer
[17:16] but you'll still need to grab the torrents (or magnet links)
[17:17] JW_work good news then because i don't have enough space anywhere
[17:17] 200mbits is better than what I have
[17:17] it's on the server
[17:17] and apparently it says error
[17:18] SIGH python
[17:18] luckcolor: how much space do you have?
[17:18] not enough 250 gb here
[17:18] 36 on the server
[17:18] 36gb? yeah, you'll need more than that :-)
[17:19] well ssd volumes aren't cheap
[17:19] why does it need to be an SSD?
[17:20] Scaleway
[17:20] only ssds there
[17:20] ah, makes sense
[17:32] must be the program, can't be the network or infrastructure or anything else, oh no
[17:34] i mean i don't have the space anyway
[17:48] *** Start_ is now known as Start
[17:57] true? https://twitter.com/marcan42/status/745183129283919872
[17:57] 'Why does RT get to retroactively edit YouTube videos, keeping the URL, even though YT says that is "not possible"?'
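Editor's note: the "internetarchive python module, and some scripting around it" suggestion from 15:39 might look roughly like the sketch below. It assumes the third-party `internetarchive` package is installed (`pip install internetarchive`); `download_collection` and its `search` / `fetch` parameters are hypothetical names introduced here so the loop can be tried without network access. The `*.zip` default follows the "You want to download the zip files" remark.

```python
def download_collection(collection, destdir=".", glob_pattern="*.zip",
                        search=None, fetch=None):
    """Download matching files from every item in a collection.

    Returns a list of (identifier, error message) pairs for items that failed,
    so one bad item does not abort a long overnight run.
    """
    if search is None or fetch is None:
        import internetarchive  # third-party package, imported only when needed
        search = search or internetarchive.search_items
        fetch = fetch or internetarchive.download

    failed = []
    for result in search(f"collection:{collection}"):
        identifier = result["identifier"]
        try:
            # ignore_existing lets an interrupted run be restarted cheaply
            fetch(identifier, glob_pattern=glob_pattern, destdir=destdir,
                  ignore_existing=True)
        except Exception as exc:
            failed.append((identifier, str(exc)))
    return failed


if __name__ == "__main__":
    for identifier, error in download_collection("urlteam", destdir="urlteam"):
        print(f"failed: {identifier}: {error}")
```

Building the query string in Python also sidesteps the quoting problem seen at 16:02: the Windows `cmd.exe` shell passes single quotes through to the program as literal characters, which is likely why the quoted query returned no results.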
[17:58] https://meduza.io/en/news/2016/06/20/russian-state-television-accidentally-broadcasts-evidence-that-moscow-uses-cluster-bombs-in-syria
[18:14] I still see the big barrels here https://www.youtube.com/watch?v=dNbIRD8Cq48
[18:15] see the note
[18:16] guess youtube is mutable -_-
[18:18] i thought partners can replace videos
[18:19] if RT didn't get to rewrite history in their own image it wouldn't really be RT
[18:20] harsh
[18:20] wabam
[18:32] *** tomwsmf-a has joined #archiveteam-bs
[18:43] The sad thing about sites going down is that even if they were perfectly saved, they would still be harder to search
[18:43] ya
[18:55] eh, not always — once they are converted into downloadable dumps, it can be *easier* to search them
[18:55] just because we don't *currently* have fabulous WARC searching tools doesn't mean that such things can't be written
[18:56] yeah you could build a thing that pulls unusual words out of the warc'd html and adds it to a cdx-style index
[19:16] *** j08nY has joined #archiveteam-bs
[19:26] *** nickname_ has joined #archiveteam-bs
[19:26] *** antomati_ has joined #archiveteam-bs
[19:26] *** swebb sets mode: +o antomati_
[19:26] *** RichardG_ has joined #archiveteam-bs
[19:28] *** godane has quit IRC (Read error: Operation timed out)
[19:29] *** godane has joined #archiveteam-bs
[19:29] *** antomatic has quit IRC (Read error: Operation timed out)
[19:29] *** Igloo has quit IRC (Read error: Operation timed out)
[19:29] *** whopper has quit IRC (Read error: Operation timed out)
[19:29] *** Igloo has joined #archiveteam-bs
[19:31] *** RichardG has quit IRC (Read error: Operation timed out)
[19:33] *** Whopper has joined #archiveteam-bs
[20:20] *** nickname_ has quit IRC (Read error: Operation timed out)
[20:35] *** anjacks0n has joined #archiveteam-bs
[20:49] *** schbirid has quit IRC (Quit: Leaving)
[21:04] *** ris has joined #archiveteam-bs
[21:36] *** metalcamp has quit IRC (Quit: Bye)
[21:44] *** ndiddy has joined #archiveteam-bs
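Editor's note: the 18:56 idea of pulling unusual words out of archived HTML into an index could be prototyped along these lines. This is a sketch only: it assumes the HTML payloads have already been extracted from the WARC as (url, html) pairs (reading the WARC itself would need a separate library), it treats "unusual" as "appears in at most half of the pages", and it maps words to URLs rather than reproducing the real CDX format. All names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_words(html):
    """Return the set of lowercase words (3+ letters) in a page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z]{3,}", " ".join(parser.chunks).lower()))


def rare_word_index(pages, max_doc_fraction=0.5):
    """Map each unusual word to the sorted list of URLs whose pages contain it."""
    words_by_url = {url: page_words(html) for url, html in pages}
    doc_freq = Counter(w for words in words_by_url.values() for w in words)
    limit = max_doc_fraction * len(words_by_url)
    index = defaultdict(list)
    for url, words in words_by_url.items():
        for word in words:
            if doc_freq[word] <= limit:  # drop words common to most pages
                index[word].append(url)
    return {word: sorted(urls) for word, urls in index.items()}
```

A real tool would probably store WARC file names and record offsets next to each URL, the way a CDX line does, so a hit can be pulled straight out of the archive.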
[21:53] ticalc.org turned 20 today and we're not closing the site http://www.ticalc.org/archives/news/articles/14/148/148917.html
[21:53] *** Simpbrain has quit IRC (Read error: Connection reset by peer)
[21:53] *** anjacks0n has quit IRC (anjacks0n)
[22:05] *** Sue_ has quit IRC (Ping timeout: 260 seconds)
[22:27] we should probably drop it in archivebot anyway :-P
[22:28] (but likely not with high priority)
[22:28] And to think two jobs "failed" on me recently
[22:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:43] *** dashcloud has joined #archiveteam-bs
[23:21] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[23:35] *** Whopper has quit IRC (Read error: Operation timed out)
[23:40] *** tomwsmf-a has joined #archiveteam-bs
[23:57] *** Sue_ has joined #archiveteam-bs