#archiveteam-bs 2016-06-21,Tue


Time Nickname Message
00:00 🔗 nickname_ I'm surprised that this 11-year-old file is still on NPR.org and still works in RealPlayer
00:01 🔗 dashcloud hmm- let me try to track down an mplayer install and the codec pack- that should handle the file then
00:02 🔗 nickname_ Here is the one with the weird file format on archive.org: https://archive.org/details/MorningEditionForDecember22200520051222Me14
00:03 🔗 nickname_ (it's not in the player, you have to go to show all)
00:09 🔗 godane nickname_: i will look at npr
00:10 🔗 godane i have been doing it: https://archive.org/details/npr-morning-edition
00:10 🔗 nickname_ okay, i've been doing it manually with firefox using FlashGot and DownThemAll...
00:11 🔗 nickname_ I saw that collection and decided to start from January 2005, but when you click on some of them they are unavailable for some reason
00:11 🔗 nickname_ (copyright?)
00:12 🔗 godane i brute force the paths using wget
00:12 🔗 godane so sometimes there is not mp3 but there is real media
00:13 🔗 godane or sometimes its real media but no mp3
00:13 🔗 nickname_ on each page there is a download button for usually mp3 that is a direct link
00:13 🔗 godane for Morning Edition yes
00:13 🔗 nickname_ this page (http://www.npr.org/programs/morning-edition/2005/12/22/12927340/?showDate=2005-12-22) is one of those annoying real media ones
00:13 🔗 godane for show like Talk of the Nation there was real media for a long time
00:15 🔗 nickname_ I tried wget, but there was no easy way to do it quickly
00:15 🔗 godane https://archive.org/details/npr-talk-of-the-nation
00:16 🔗 godane talk of the nation real media files after august 20, 2001 don't encode into mp3 well
00:16 🔗 nickname_ some of those are unavailable as well
00:17 🔗 dashcloud holy shit- mplayer plays that first one just fine- 20051222_me_13.rm
00:17 🔗 nickname_ what!
00:17 🔗 godane mplayer plays the real media files fine
00:18 🔗 godane it just doesn't re-encode on archive.org right
00:18 🔗 nickname_ hmm
00:20 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
00:28 🔗 ris has quit IRC ()
00:57 🔗 VADemon has quit IRC (Quit: left4dead)
00:57 🔗 nickname_ godane: have you tried converting the *.rm file with mencoder to mp3?
01:01 🔗 dashcloud maybe it would be better to convert to a .wav file, then upload that so IA can make derivatives?
01:09 🔗 godane i upload the original files
01:10 🔗 godane the derive problem started around august 20 2001
01:10 🔗 godane after that IA couldn't re-encode the real media files
01:10 🔗 joepie91 dashcloud: better to do it as FLAC then
01:11 🔗 joepie91 space-waste-wise
01:11 🔗 godane i prefer just to upload original files
01:11 🔗 joepie91 sure, but you can upload both the original and a FLAC version
01:11 🔗 joepie91 and let it derive from the FLAC
01:12 🔗 godane then IA can fix this later
01:12 🔗 godane i figure its IA derive process problem
01:15 🔗 tomwsmf-a has joined #archiveteam-bs
01:16 🔗 godane the one that works before they stop deriving : https://archive.org/details/npr-talk-of-the-nation-08-20-2001
01:19 🔗 JesseW has joined #archiveteam-bs
01:59 🔗 j08nY has quit IRC (Quit: Leaving)
02:00 🔗 nickname_ that one says there are issues with this items content
02:02 🔗 SketchCow Safe at home.
02:05 🔗 godane nickname_: i can see the item
02:05 🔗 godane also some of my stuff i upload goes to podcast collection
02:05 🔗 godane that collection makes things disappear publicly for some reason
02:07 🔗 nickname_ you can see it, but i can't (https://i.imgur.com/fQLJU5F.png)
02:07 🔗 godane i know
02:07 🔗 godane it works when i'm logged in
02:08 🔗 godane looks like i can't even download it on wget
02:08 🔗 godane *using wget
02:08 🔗 godane SketchCow: we need to fix that podcast collection
02:09 🔗 godane even if we have to move the collections out of the podcast collection so they're viewable/downloadable
02:12 🔗 SketchCow Whut
02:14 🔗 godane it's a bug from a long time ago
02:15 🔗 godane the podcast collection i think blocks non-logged-in users from downloading the items in it
02:15 🔗 godane SketchCow: you noticed this bug too at one point
02:21 🔗 godane fun fact: i was around 617k around jan 22 2016
02:21 🔗 godane i'm at 718k so i got up about 101k in about 5 months
02:23 🔗 godane so i figure the goal would be to get around to 850k+ before the end of the year
02:26 🔗 Start_ has joined #archiveteam-bs
02:26 🔗 Start has quit IRC (Ping timeout: 260 seconds)
02:37 🔗 acridAxid has quit IRC (marauder)
02:38 🔗 acridAxid has joined #archiveteam-bs
02:49 🔗 godane so i got a book called The official AT&T Worldnet Web Discovery Guide
02:49 🔗 godane it came with a cd with a free month of at&t worldnet
02:51 🔗 DoomTay Now I'm thinking of http://motherboard.vice.com/read/this-dude-is-collecting-your-old-aol-cds
02:51 🔗 DoomTay Wait...
03:01 🔗 dashcloud glad to hear you made it home safely SketchCow
03:12 🔗 godane i'm uploading the iso now
03:14 🔗 godane i'm listening to the All Things Considered -- A review of the World Wide Web and RealAudio!
03:14 🔗 godane 1995-06-06 episode
03:23 🔗 godane https://archive.org/details/The_Official_ATT_Worldnet_Web_Discovery_Guide_CD
03:24 🔗 DoomTay You should scan the CD cover as well
03:27 🔗 nickname_ has quit IRC (Ping timeout: 492 seconds)
03:29 🔗 xXx_ndidd has joined #archiveteam-bs
03:30 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
03:32 🔗 ndiddy has quit IRC (Ping timeout: 244 seconds)
04:01 🔗 vitzli has joined #archiveteam-bs
04:06 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:12 🔗 FalconK has quit IRC (Ping timeout: 260 seconds)
04:13 🔗 FalconK has joined #archiveteam-bs
04:15 🔗 Sk1d has joined #archiveteam-bs
04:22 🔗 tomwsmf-a has joined #archiveteam-bs
04:31 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
04:52 🔗 DoomTay has quit IRC (Quit: Page closed)
04:52 🔗 godane https://archive.org/details/Signing_For_Dummies_CD
05:07 🔗 BlueMaxim has joined #archiveteam-bs
05:16 🔗 godane so i'm grabbing Analytical Chemistry journals
05:28 🔗 godane someone else can go after it: https://thepiratebay.org/torrent/13670809/Analytical_Chemistry_(1929-2015)
05:28 🔗 godane this guy has a lot of interesting stuff: https://thepiratebay.org/user/clouderone/0/5/0
05:36 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
06:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:09 🔗 dashcloud has joined #archiveteam-bs
06:11 🔗 godane starting to upload NASA docs again: https://archive.org/details/NASA_NTRS_Archive_19630002631
06:24 🔗 godane i'm also starting to upload my deadspin.com sitemap urls
06:24 🔗 godane i'm starting to download 2010 urls
06:48 🔗 ivan` when you notice that your ubuntu 16.04 tmuxes are locking up, grep this log and read what I wrote here :-) https://github.com/ludios/grab-site/commit/ca7bc71045784b1cfda2a6d1d6dbc44a4000aa13
07:39 🔗 is- has quit IRC (Ping timeout: 362 seconds)
07:46 🔗 schbirid has joined #archiveteam-bs
08:00 🔗 vitzli has quit IRC (Quit: Leaving)
08:31 🔗 is- has joined #archiveteam-bs
08:53 🔗 Stilett0 has joined #archiveteam-bs
08:53 🔗 Stiletto has quit IRC (Read error: Operation timed out)
09:26 🔗 Zebranky has quit IRC (Read error: Operation timed out)
09:36 🔗 Zebranky has joined #archiveteam-bs
09:48 🔗 vitzli has joined #archiveteam-bs
11:51 🔗 midas ivan`: ubuntu 16.04, systemd?
11:54 🔗 jut has joined #archiveteam-bs
12:38 🔗 xXx_ndidd has quit IRC (Read error: Connection reset by peer)
12:41 🔗 j08nY has joined #archiveteam-bs
12:47 🔗 VADemon has joined #archiveteam-bs
13:36 🔗 nickname_ has joined #archiveteam-bs
13:43 🔗 VADemon has quit IRC (Quit: left4dead)
13:44 🔗 nickname_ has quit IRC (Read error: Operation timed out)
13:50 🔗 sep332 has quit IRC (Quit: Konversation terminated!)
14:00 🔗 jut has quit IRC (Read error: Connection reset by peer)
14:25 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:25 🔗 aschmitz has quit IRC (Read error: Operation timed out)
15:18 🔗 VADemon has joined #archiveteam-bs
15:30 🔗 DoomTay has joined #archiveteam-bs
15:36 🔗 luckcolor guys any good tool to download an entire archive.org collection?
15:38 🔗 DoomTay So we're clear, you mean a single collection, not the entire archive.org database?
15:38 🔗 luckcolor ofc a single collection :P
15:39 🔗 luckcolor entire archive.org collection == entire collection?
15:39 🔗 luckcolor :P
15:39 🔗 JW_work luckcolor: Use the internetarchive python module, and some scripting around it
15:39 🔗 JW_work MacNN is going away — http://www.macnn.com/articles/16/06/20/long.time.staff.winding.up.two.decades.of.service.at.the.end.of.june.134716/
15:39 🔗 * luckcolor doesn't use python
15:40 🔗 luckcolor but i will check out how that module is; would prefer other stuff
15:46 🔗 JW_work well, you're going to need to do *some* scripting, afaik
15:46 🔗 JW_work what language do you prefer?
15:47 🔗 luckcolor What languages do you know? maybe it's easier :)
15:48 🔗 luckcolor well found one https://github.com/ghalfacree/bash-scripts/blob/master/archivedownload.sh
15:49 🔗 JW_work add it to the wiki if it isn't already there
15:49 🔗 luckcolor sigh wiki editing incoming....
15:52 🔗 luckcolor actually this script requires you to tell it which file type to download
15:53 🔗 j08nY has quit IRC (Quit: Leaving)
15:53 🔗 luckcolor ia download --search 'collection:terroroftinytown'
15:54 🔗 JW_work You want to download the zip files
15:59 🔗 luckcolor ia download --search 'subject:terroroftinytown'
15:59 🔗 luckcolor error: the query "'subject:terroroftinytown'" returned no results
16:00 🔗 JW_work I think you want 'collection:urlteam'
16:00 🔗 luckcolor trying
16:02 🔗 luckcolor ia download --search subject:terroroftinytown
16:02 🔗 luckcolor urlteam_2014-11-06-20-33-23 (1/512): dd
16:02 🔗 luckcolor had to remove the unix like quotation marks
16:02 🔗 JW_work ha
16:02 🔗 luckcolor apparently windows or the script doesn't like them
16:02 🔗 luckcolor :P
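
[Editor's sketch, not part of the log] The "returned no results" error above shows the query arriving with its single quotes still attached. A likely explanation, assuming the command was run from a Windows shell such as cmd.exe, is that single quotes are not treated as quoting there, so they are passed through literally. The Python snippet below only approximates the two behaviours with the standard `shlex` module (POSIX mode strips the quotes, non-POSIX mode keeps them); it does not call the `ia` tool itself.

```python
import shlex

command = "ia download --search 'collection:urlteam'"

# POSIX-style splitting: the shell consumes the single quotes,
# so the tool receives the bare query string.
posix_args = shlex.split(command, posix=True)

# Non-POSIX splitting (used here only as a rough stand-in for a shell
# that does not treat single quotes specially): the quotes stay in the token.
literal_args = shlex.split(command, posix=False)

print(posix_args[-1])    # query as a Unix shell would pass it
print(literal_args[-1])  # query with the quote characters kept
```

Dropping the quotes, as was done in the log, sidesteps the difference as long as the query contains no spaces.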
16:03 🔗 luckcolor side note: that's the worst looking chracter for a progress bar
16:03 🔗 luckcolor *character
16:03 🔗 luckcolor oh well
16:04 🔗 JW_work heh, feel free to make a PR
16:27 🔗 vitzli has quit IRC (Quit: Leaving)
16:29 🔗 Rickster has quit IRC (Quit: ZNC 1.6.1 - http://znc.in)
16:30 🔗 Rickster has joined #archiveteam-bs
17:02 🔗 luckcolor Also it's ultra slow don't know if it's my connection but will run it on my server
17:02 🔗 luckcolor 200mbit/s go
17:02 🔗 luckcolor !
17:06 🔗 metalcamp has joined #archiveteam-bs
17:07 🔗 luckcolor still incredibly slow i will leave it running overnight
17:07 🔗 luckcolor JW_work perhaps do you know how big is the collection?
17:11 🔗 luckcolor ok on my server it says it had errors
17:12 🔗 * luckcolor decides to change tool as this one seems to be stupid
17:15 🔗 JW_work less than a terabyte
17:16 🔗 JW_work I'm seeding the whole collection, also — so you could get it that way if you prefer
17:16 🔗 JW_work but you'll still need to grab the torrents (or magnet links)
17:17 🔗 luckcolor JW_work good news then because i don't have enough space anywhere
17:17 🔗 DoomTay 200mbits is better than what I have
17:17 🔗 luckcolor it's on the server
17:17 🔗 luckcolor and apparently it says error
17:18 🔗 luckcolor SIGH python
17:18 🔗 JW_work luckcolor: how much space do you have?
17:18 🔗 luckcolor not enough 250 gb here
17:18 🔗 luckcolor 36 on the server
17:18 🔗 JW_work 36gb? yeah, you'll need more than that :-)
17:19 🔗 luckcolor well ssd volumes aren't cheap
17:19 🔗 JW_work why does it need to be an SSD?
17:20 🔗 luckcolor Scaleway
17:20 🔗 luckcolor only ssds there
17:20 🔗 JW_work ah, makes sense
17:32 🔗 yipdw must be the program, can't be the network or infrastructure or anything else, oh no
17:34 🔗 luckcolor i mean i don't have the space anyway
17:48 🔗 Start_ is now known as Start
17:57 🔗 espes__ true? https://twitter.com/marcan42/status/745183129283919872
17:57 🔗 espes__ 'Why does RT get to retroactively edit YouTube videos, keeping the URL, even though YT says that is "not possible"?'
17:58 🔗 espes__ https://meduza.io/en/news/2016/06/20/russian-state-television-accidentally-broadcasts-evidence-that-moscow-uses-cluster-bombs-in-syria
18:14 🔗 arkiver I still see the big barrels here https://www.youtube.com/watch?v=dNbIRD8Cq48
18:15 🔗 espes__ see the note
18:16 🔗 espes__ guess youtube is mutable -_-
18:18 🔗 schbirid i thought partners can replace videos
18:19 🔗 yipdw if RT didn't get to rewrite history in their own image it wouldn't really be RT
18:20 🔗 xmc harsh
18:20 🔗 yipdw wabam
18:32 🔗 tomwsmf-a has joined #archiveteam-bs
18:43 🔗 DoomTay The sad thing about sites going down is that even if they were perfectly saved, they would still be harder to search
18:43 🔗 xmc ya
18:55 🔗 JW_work eh, not always — once they are converted into downloadable dumps, it can be *easier* to search them
18:55 🔗 JW_work just because we don't *currently* have fabulous WARC searching tools doesn't mean that such things can't be written
18:56 🔗 xmc yeah you could build a thing that pulls unusual words out of the warc'd html and adds it to a cdx-style index
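
[Editor's sketch, not part of the log] The idea of pulling unusual words out of archived HTML into an index could look roughly like the following. Everything here is an assumption for illustration: pages are given as a plain dict of URL to HTML (reading real WARC records is left out), tags are stripped with a crude regex, and "unusual" simply means "appears in at most half of the pages". The function names and the `max_doc_fraction` parameter are hypothetical.

```python
import re
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")      # crude tag stripper, fine for a sketch
WORD_RE = re.compile(r"[a-z0-9]+")

def words_in(html):
    """Return the set of lowercased words left after stripping tags."""
    return set(WORD_RE.findall(TAG_RE.sub(" ", html).lower()))

def build_rare_word_index(pages, max_doc_fraction=0.5):
    """Map each unusual word to the sorted list of URLs that contain it."""
    page_words = {url: words_in(html) for url, html in pages.items()}
    # Document frequency: in how many pages does each word occur?
    doc_freq = Counter(w for ws in page_words.values() for w in ws)
    limit = max_doc_fraction * len(pages)
    index = defaultdict(list)
    for url in sorted(page_words):
        for word in page_words[url]:
            if doc_freq[word] <= limit:  # keep only words rare across pages
                index[word].append(url)
    return dict(index)

# Made-up example pages standing in for HTML pulled out of a WARC.
pages = {
    "http://example.com/a": "<p>the ti83 calculator archive</p>",
    "http://example.com/b": "<p>the realaudio archive</p>",
    "http://example.com/c": "<p>the archive of mac news</p>",
}
index = build_rare_word_index(pages)
print(index["ti83"])
```

A real version would store the record offset alongside each URL, the way a CDX line does, so a hit can be pulled straight out of the WARC file.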
19:16 🔗 j08nY has joined #archiveteam-bs
19:26 🔗 nickname_ has joined #archiveteam-bs
19:26 🔗 antomati_ has joined #archiveteam-bs
19:26 🔗 swebb sets mode: +o antomati_
19:26 🔗 RichardG_ has joined #archiveteam-bs
19:28 🔗 godane has quit IRC (Read error: Operation timed out)
19:29 🔗 godane has joined #archiveteam-bs
19:29 🔗 antomatic has quit IRC (Read error: Operation timed out)
19:29 🔗 Igloo has quit IRC (Read error: Operation timed out)
19:29 🔗 whopper has quit IRC (Read error: Operation timed out)
19:29 🔗 Igloo has joined #archiveteam-bs
19:31 🔗 RichardG has quit IRC (Read error: Operation timed out)
19:33 🔗 Whopper has joined #archiveteam-bs
20:20 🔗 nickname_ has quit IRC (Read error: Operation timed out)
20:35 🔗 anjacks0n has joined #archiveteam-bs
20:49 🔗 schbirid has quit IRC (Quit: Leaving)
21:04 🔗 ris has joined #archiveteam-bs
21:36 🔗 metalcamp has quit IRC (Quit: Bye)
21:44 🔗 ndiddy has joined #archiveteam-bs
21:53 🔗 xmc ticalc.org turned 20 today and we're not closing the site http://www.ticalc.org/archives/news/articles/14/148/148917.html
21:53 🔗 Simpbrain has quit IRC (Read error: Connection reset by peer)
21:53 🔗 anjacks0n has quit IRC (anjacks0n)
22:05 🔗 Sue_ has quit IRC (Ping timeout: 260 seconds)
22:27 🔗 JW_work we should probably drop it in archivebot anyway :-P
22:28 🔗 JW_work (but likely not with high priority)
22:28 🔗 DoomTay And to think two jobs "failed" on me recently
22:39 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:43 🔗 dashcloud has joined #archiveteam-bs
23:21 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
23:35 🔗 Whopper has quit IRC (Read error: Operation timed out)
23:40 🔗 tomwsmf-a has joined #archiveteam-bs
23:57 🔗 Sue_ has joined #archiveteam-bs
