#archiveteam-bs 2016-04-29,Fri

Time Nickname Message
00:03 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:06 🔗 dashcloud has joined #archiveteam-bs
00:31 🔗 _Crocatow has quit IRC (Read error: Connection reset by peer)
00:31 🔗 _Crocatow has joined #archiveteam-bs
00:41 🔗 JesseW has joined #archiveteam-bs
00:56 🔗 fmope has quit IRC (Remote host closed the connection)
00:56 🔗 fmope has joined #archiveteam-bs
00:59 🔗 Stiletto has quit IRC (Read error: Operation timed out)
02:12 🔗 ranma has joined #archiveteam-bs
02:15 🔗 ranma the IA makes a LITTLE bit of an effort to back up zips and exes etc on websites, right?
02:15 🔗 ranma or not usually
02:15 🔗 ranma ?
02:17 🔗 JesseW I don't think IA's general crawls make a distinction between such files and any other ones.
02:18 🔗 JesseW What's the URL of the fork on github?
02:18 🔗 ranma https://github.com/OldSparkyMI/minishowcase
02:18 🔗 Stiletto has joined #archiveteam-bs
02:19 🔗 VADemon has quit IRC (Quit: left4dead)
02:19 🔗 JesseW ranma: probably worth grabbing a copy of that yourself, if you haven't already
02:19 🔗 ranma yep. and threw it on google drive
02:20 🔗 JesseW I've just put the zip, https://github.com/OldSparkyMI/minishowcase/archive/master.zip into #archivebot, too
02:20 🔗 JesseW so the thing is pretty well saved at this point
02:22 🔗 ranma i'm assuming thisamericanlife, radiolab, other npr podcasts are backed up?
02:24 🔗 JesseW ranma: check with godane
02:24 🔗 JesseW what shows up on Wayback?
02:25 🔗 ranma checking
02:26 🔗 godane i have this american life on my drive
02:26 🔗 godane i have not uploaded it yet
02:27 🔗 ranma https://web.archive.org/web/20160127230614/http://www.thisamericanlife.org/radio-archives/episode/549/amateur-hour
02:27 🔗 ranma download link doesn't work
02:28 🔗 ranma not butthurt, just more curious as to the archive bot-thingie's capabilities
02:34 🔗 MrRadar Huh, it looks like they used to have direct download links but now redirect you to buy episodes from iTunes and Amazon
02:34 🔗 MrRadar E.g. the download link for this episode works (though it was also archived through ArchiveBot and not the IA's normal crawler) https://web.archive.org/web/20140929042520/http://www.thisamericanlife.org/radio-archives/episode/536/the-secret-recordings-of-carmen-segarra
02:36 🔗 MrRadar Hmm... it looks like episodes are only available for direct download for a short time before the downloads are paywalled
02:37 🔗 MrRadar E.g. last week's episode has a direct download http://www.thisamericanlife.org/radio-archives/episode/585/in-defense-of-ignorance
02:37 🔗 MrRadar But this one from January is pay to download http://www.thisamericanlife.org/radio-archives/episode/577/something-only-i-can-see
02:38 🔗 MrRadar If it doesn't get crawled while it's a direct download, probably neither our bot nor the IA's can extract the audio from the stream
02:38 🔗 godane there is an m3u8 file that you can use
02:39 🔗 MrRadar It looks like youtube-dl can also scrape their streams
02:39 🔗 MrRadar (This American Life at least)
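The m3u8 file godane mentions is a plain-text playlist: lines starting with `#` are tags, and the remaining lines are media URIs, often relative to the playlist's own URL. A minimal sketch of listing those URIs, assuming a simple media playlist; the function name `playlist_entries` and the example.com URLs are hypothetical, and nothing here reflects the actual layout of thisamericanlife.org streams:

```python
from urllib.parse import urljoin


def playlist_entries(playlist_text, playlist_url):
    """Return the absolute URLs of the media entries in an m3u8 playlist.

    Assumes a simple playlist: every non-empty line that does not start
    with '#' is treated as a URI, resolved against the playlist's URL.
    """
    entries = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and tags such as #EXTINF
        entries.append(urljoin(playlist_url, line))
    return entries


# Hypothetical playlist content, for illustration only.
sample = "#EXTM3U\n#EXTINF:10,\nseg0.ts\n#EXTINF:10,\nseg1.ts\n#EXT-X-ENDLIST\n"
segments = playlist_entries(sample, "http://example.com/audio/585.m3u8")
```

In practice a tool such as youtube-dl, which MrRadar notes can scrape these streams, takes care of fetching and joining the segments.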
03:28 🔗 bwn_ has joined #archiveteam-bs
03:29 🔗 Medowar has quit IRC (Quit: Connection closed for inactivity)
03:34 🔗 bwn has quit IRC (Read error: Operation timed out)
03:39 🔗 bwn_ has quit IRC (Ping timeout: 633 seconds)
03:47 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
03:52 🔗 bwn has joined #archiveteam-bs
03:54 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
03:56 🔗 beardicus has quit IRC (Read error: Operation timed out)
03:57 🔗 chazchaz has quit IRC (Read error: Operation timed out)
03:57 🔗 chazchaz has joined #archiveteam-bs
03:58 🔗 RichardG has quit IRC (Ping timeout: 272 seconds)
04:00 🔗 RichardG has joined #archiveteam-bs
04:00 🔗 Frogging has quit IRC (Read error: Operation timed out)
04:03 🔗 achip has quit IRC (Ping timeout: 258 seconds)
04:04 🔗 bwn has quit IRC (Ping timeout: 258 seconds)
04:05 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
04:05 🔗 godane has quit IRC (Ping timeout: 258 seconds)
04:05 🔗 Kaz has quit IRC (Read error: Operation timed out)
04:05 🔗 Infreq has quit IRC (Ping timeout: 258 seconds)
04:05 🔗 logchfoo1 has quit IRC (Ping timeout: 258 seconds)
04:10 🔗 logchfoo4 starts logging #archiveteam-bs at Fri Apr 29 04:10:22 2016
04:10 🔗 logchfoo4 has joined #archiveteam-bs
04:10 🔗 fie_ has quit IRC (Read error: Connection reset by peer)
04:10 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:11 🔗 balrog has joined #archiveteam-bs
04:11 🔗 swebb sets mode: +o balrog
04:11 🔗 ring has joined #archiveteam-bs
04:12 🔗 achip has joined #archiveteam-bs
04:13 🔗 mr-b has joined #archiveteam-bs
04:13 🔗 zenguy has joined #archiveteam-bs
04:15 🔗 acridAxid has joined #archiveteam-bs
04:18 🔗 joepie91 has quit IRC (Read error: Operation timed out)
04:20 🔗 joepie91 has joined #archiveteam-bs
04:21 🔗 wyatt8740 has joined #archiveteam-bs
04:22 🔗 godane has joined #archiveteam-bs
04:22 🔗 Kaz has joined #archiveteam-bs
04:24 🔗 dashcloud has joined #archiveteam-bs
04:45 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:52 🔗 Sk1d has joined #archiveteam-bs
04:58 🔗 bwn has joined #archiveteam-bs
05:08 🔗 bwn has quit IRC (Quit: Quit)
05:20 🔗 beardicus has joined #archiveteam-bs
05:23 🔗 Honno has joined #archiveteam-bs
05:37 🔗 godane SketchCow: i'm grabbing 2600 off the wall wusb
05:46 🔗 Mayonaise has joined #archiveteam-bs
05:52 🔗 JesseW godane: That sentence could make much less sense if we didn't know the context.
05:55 🔗 godane http://www.2600.com/otw-broadband.xml
05:59 🔗 xmc godane: speaking of loveline -- 22:53 <supersat> last night of loveline tonight
06:12 🔗 godane so i may be able to get the Bill O'Reilly radio program
06:31 🔗 bwn has joined #archiveteam-bs
06:49 🔗 bwn has quit IRC (Read error: Operation timed out)
06:57 🔗 godane we are also getting a brute force xml of billoreilly.com
07:03 🔗 Honno has quit IRC (Read error: Operation timed out)
07:23 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
07:59 🔗 bwn has joined #archiveteam-bs
08:06 🔗 schbirid has joined #archiveteam-bs
08:18 🔗 Medowar has joined #archiveteam-bs
08:35 🔗 metalcamp has joined #archiveteam-bs
10:06 🔗 SketchCo1 has joined #archiveteam-bs
10:06 🔗 swebb sets mode: +o SketchCo1
10:07 🔗 SketchCow has quit IRC (Read error: Connection reset by peer)
10:08 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 joepie91 has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 SN4T14 has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 zerkalo has quit IRC (Ping timeout: 244 seconds)
10:09 🔗 BnA-Rob1n has joined #archiveteam-bs
10:10 🔗 zerkalo has joined #archiveteam-bs
10:12 🔗 joepie91 has joined #archiveteam-bs
10:12 🔗 SN4T14 has joined #archiveteam-bs
10:53 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:59 🔗 weslord has joined #archiveteam-bs
13:19 🔗 godane i'm grabbing the old but official Classic Love Line mp3s from 1996
13:32 🔗 weslord has quit IRC (Quit: Lost terminal)
13:43 🔗 VADemon has joined #archiveteam-bs
14:08 🔗 ranma http://www.bloomberg.com/news/articles/2016-04-29/unmasking-the-men-behind-zero-hedge-wall-street-s-renegade-blog
14:17 🔗 Start has quit IRC (Quit: Disconnected.)
14:21 🔗 bwn_ has joined #archiveteam-bs
14:34 🔗 bwn has quit IRC (Read error: Operation timed out)
15:01 🔗 Honno has joined #archiveteam-bs
15:16 🔗 Start has joined #archiveteam-bs
15:18 🔗 SketchCo1 is now known as SketchCow
15:31 🔗 Yoshimura has joined #archiveteam-bs
15:31 🔗 Yoshimura yipdw: Hey. Was there anything wrong with the pipeline?
15:43 🔗 JesseW has joined #archiveteam-bs
15:46 🔗 SketchCow Ha
16:00 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
16:06 🔗 Start has quit IRC (Quit: Disconnected.)
16:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:14 🔗 dashcloud has joined #archiveteam-bs
16:47 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:54 🔗 dashcloud has joined #archiveteam-bs
17:11 🔗 metalcamp has joined #archiveteam-bs
17:27 🔗 godane SketchCow: The Savage Nation is getting uploaded: https://archive.org/details/godaneinbox?and[]=subject%3A%22The+Savage+Nation%22&sort=-publicdate
17:32 🔗 yakfish has joined #archiveteam-bs
17:32 🔗 matthusby has joined #archiveteam-bs
17:34 🔗 SadDM has joined #archiveteam-bs
17:34 🔗 swebb sets mode: +o SadDM
17:37 🔗 jspiros has joined #archiveteam-bs
17:41 🔗 godane SketchCow: just found out that Love Line ended their run last month
17:41 🔗 SketchCow Yep
17:42 🔗 SketchCow I figured that's what inspired you!
17:42 🔗 godane i didn't even know
17:43 🔗 godane the list is an incomplete one but it's based on the official flashplayer date xml pages
17:43 🔗 godane close to 3000 mp3s
18:19 🔗 Start has joined #archiveteam-bs
18:43 🔗 remsen has quit IRC (ircd.choopa.net irc2.choopa.net)
18:43 🔗 remsen1 has joined #archiveteam-bs
18:50 🔗 bwn_ has quit IRC (Read error: Operation timed out)
19:10 🔗 bwn_ has joined #archiveteam-bs
19:33 🔗 atrocity going to attempt to set up httrack, lol
19:44 🔗 Start has quit IRC (Quit: Disconnected.)
19:52 🔗 phuzion atrocity: https://www.youtube.com/watch?v=PEZWYXPvmS8
19:55 🔗 atrocity so i can point firefox to it to tell it to archive a site for me
19:55 🔗 atrocity and it seems like it was the eternal debate of it vs. wget for windows, lol
20:25 🔗 Yoshimura atrocity: httrack sucks, at least it did every time I tried; I stopped trying in past years.
20:25 🔗 Yoshimura Wget is consistent and works well.
20:28 🔗 remsen1 has quit IRC (ZNC 1.6.2 - http://znc.in)
20:28 🔗 remsen has joined #archiveteam-bs
20:32 🔗 atrocity hmm, kk
20:48 🔗 tomwsmf-a has joined #archiveteam-bs
20:53 🔗 VADemon Aaaand I've just run into the issue of Cloudflare not letting wget access anything backed by their CDN
20:54 🔗 VADemon So much for the consistency. But does anyone have an easy solution for this maybe?
20:54 🔗 r3c0d3x try and set a common user-agent and headers and try again
20:56 🔗 VADemon I did that, but it's not enough. Wget needs to load a URL that is set as a "Refresh" HTTP header and wait before or after downloading the set URL
20:56 🔗 MrRadar I haven't had any trouble using wpull to scrape sites I know use CloudFlare as their CDN
20:56 🔗 VADemon e.g. "Refresh: 8;URL=/cdn-cgi/l/chk_jschl?pass=1461960032.464-KSKrRC0DC7"
20:56 🔗 MrRadar Have you considered using wget-lua with a custom script to get through that check?
20:58 🔗 VADemon Hm, that's a better idea than what I was going to pull off, MrRadar
21:32 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
21:42 🔗 VADemon wpull doesn't work, because it doesn't do what a browser would do
21:44 🔗 MrRadar The sites I've scraped must not have had their bot-protection turned up
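The "Refresh" header VADemon quotes carries a delay in seconds and a target URL. A minimal sketch of splitting such a value into those two parts, using the header from the log as input; the function name `parse_refresh` and the example.com base URL are hypothetical. This only illustrates the header format. It does not get past the check by itself, since (as noted above) a client also has to behave the way a browser would:

```python
import re
from urllib.parse import urljoin

# Matches "<seconds>" optionally followed by ";URL=<target>" (case-insensitive).
_REFRESH_RE = re.compile(
    r"\s*(\d+(?:\.\d+)?)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$",
    re.IGNORECASE,
)


def parse_refresh(header_value, base_url):
    """Split a Refresh header value into (delay_seconds, absolute_url).

    Returns None if the value does not start with a numeric delay.
    A missing target means "reload the current page", so base_url is returned.
    """
    match = _REFRESH_RE.match(header_value)
    if match is None:
        return None
    delay = float(match.group(1))
    target = (match.group(2) or "").strip().strip("'\"")
    url = urljoin(base_url, target) if target else base_url
    return delay, url


# Header value taken from the log; the base URL is a placeholder.
delay, url = parse_refresh(
    "8;URL=/cdn-cgi/l/chk_jschl?pass=1461960032.464-KSKrRC0DC7",
    "http://example.com/page",
)
```

A wget-lua script, as MrRadar suggests, could apply the same logic: read the delay, wait that long, then request the target URL.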
22:25 🔗 Start has joined #archiveteam-bs
22:57 🔗 Honno has quit IRC (Read error: Operation timed out)
23:29 🔗 schbirid has quit IRC (Remote host closed the connection)
23:44 🔗 Stiletto has quit IRC ()