[00:24] *** antomati_ has joined #archiveteam-bs
[00:24] *** swebb sets mode: +o antomati_
[00:25] *** antomatic has quit IRC (Read error: Operation timed out)
[01:53] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[02:04] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[02:08] *** wp494 has joined #archiveteam-bs
[02:48] so i added some better code for grabbing Arirang streams
[02:49] i had to change it cause index_3_av.m3u8 was not always the 850x480 stream
[02:49] as i just curl -s "$masterurl" | grep -A1 850x480 | grep m3u8
[02:58] *** wp494_ has joined #archiveteam-bs
[03:03] *** wp494 has quit IRC (Read error: Operation timed out)
[03:03] *** wp494_ is now known as wp494
[03:12] *** ravetcofx has joined #archiveteam-bs
[03:47] *** Mayeau has joined #archiveteam-bs
[03:56] *** Mayonaise has quit IRC (Ping timeout: 864 seconds)
[03:56] *** Mayeau is now known as Mayonaise
[04:08] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[05:09] *** Stilett0 has joined #archiveteam-bs
[05:12] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:52] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:59] *** Sk1d has joined #archiveteam-bs
[06:49] *** Start has joined #archiveteam-bs
[07:37] *** GE has joined #archiveteam-bs
[10:47] FYI: arkiver Sketchcow: bayimg is now done trackerside, I am currently syncing everything from my target over to fos.
[11:09] *** GE has quit IRC (Remote host closed the connection)
[11:19] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:48] *** ravetcofx has joined #archiveteam-bs
[12:15] *** ravetcofx has quit IRC (Ping timeout: 260 seconds)
[12:35] *** GE has joined #archiveteam-bs
[12:54] *** vitzli has joined #archiveteam-bs
[12:58] Feel like shit, 4 years late to archive the wiki :[ wayback machine luckily has it, but still. (it's a small udev rule, nothing super-important)
[12:59] what wiki?
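[Editor's note] The [02:49] messages describe selecting an HLS variant by its RESOLUTION attribute instead of trusting the index_3_av.m3u8 filename. A minimal sketch of that selection, with a made-up sample playlist (not Arirang's real one) so it runs offline; in real use the playlist text would come from `curl -s "$masterurl"`:

```shell
#!/bin/sh
# Pick the variant URL for a given resolution from an HLS master playlist
# read on stdin. The RESOLUTION attribute on #EXT-X-STREAM-INF, not the
# index_N filename, identifies the stream -- which is why hardcoding
# index_3_av.m3u8 broke.
pick_variant() {
    res="$1"
    grep -A1 "RESOLUTION=$res" | grep -v '^#' | grep 'm3u8' | head -n1
}

# Hypothetical sample playlist for demonstration:
sample='#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
index_1_av.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=850x480
index_2_av.m3u8'

printf '%s\n' "$sample" | pick_variant 850x480
# -> index_2_av.m3u8
```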
[12:59] wiki.countercaster.com
[13:00] last update and online in 2012
[13:01] :(
[13:01] that sucks
[13:06] After I first met the wikiteam tools I began grabbing any small-ish or remotely interesting/useful wikis. It's awesome, but sometimes it's like reading one book, liking it, reading another, liking it, I WANT MORE! HA! NO! fuck you! Author Existence Failure.
[13:09] and that is all for drunk Friday confessions, sorry about that
[13:37] Kaz: similar thing happened in Australia with metadata. It was originally to 'fight terrorists' and then we have https://www.crikey.com.au/2016/01/18/over-60-agencies-apply-to-snoop-into-your-metadata/ . The majority might have a legitimate use for the information but that's not the point. Race fixing, polluting, work health safety violations / fraud, mislabelling fruit? etc. ≠ terrorism
[13:40] Medowar: :D nice!
[13:53] *** vitzli has quit IRC (Quit: Leaving)
[15:28] *** superkuh has quit IRC (Remote host closed the connection)
[15:29] *** Shakespea has joined #archiveteam-bs
[15:30] FYI- http://www.panoramio.com/maps-faq
[15:30] *** superkuh has joined #archiveteam-bs
[15:30] *** Shakespea has left
[15:44] *** Start has quit IRC (Quit: Disconnected.)
[15:52] *** atrocity has quit IRC (Ping timeout: 260 seconds)
[17:28] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[18:03] *** Stiletto has joined #archiveteam-bs
[18:06] *** ndiddy has joined #archiveteam-bs
[19:33] *** RichardG_ has joined #archiveteam-bs
[19:36] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[19:49] *** RichardG has joined #archiveteam-bs
[19:50] *** RichardG_ has quit IRC (Ping timeout: 250 seconds)
[19:51] I want to download a site (for a personal archive at the moment, so not with archivebot) but I'm not quite sure what options to use with wget. I want to download all pages under a specific domain, and all the resources (css, images, javascript, etc.) which will be under different domains.
[19:51] I also want to download all linked pages - so if the main site links to external.com/foo.html then I want that page and all its requisites downloaded
[19:51] too. Is this possible with wget, or am I going to have to write something custom for this?
[20:02] *** ndiddy has quit IRC (Ping timeout: 633 seconds)
[20:04] *** ndiddy has joined #archiveteam-bs
[20:09] yes
[20:10] one sec
[20:10] tapedrive: http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[20:11] or you can use https://github.com/ludios/grab-site
[20:11] I've read that (the wiki one), but I'm confused as to what options I should use for downloading all page dependencies from other domains.
[20:13] Ah, that grab-site one looks perfect. Thanks!
[21:32] *** VADemon has joined #archiveteam-bs
[21:35] *** sep332_ has quit IRC (Konversation terminated!)
[21:35] *** jrwr has joined #archiveteam-bs
[21:50] *** kristian_ has joined #archiveteam-bs
[22:02] Okay, I'm using grab-site, but there's an issue. The site I'm archiving has imgur images linked, but imgur's doing some weird redirect thing.
[22:02] From the logs: 302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg
[22:02] 200 OK http://imgur.com/LrowbFM
[22:03] So it's not archiving the image, rather a stupid imgur page.
[22:03] Any ideas to get round this?
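[Editor's note] For the [19:51] wget question, a sketch of the kind of invocation involved, using only documented wget options; example.com and cdn.example.com are placeholder domains. Note that wget has no clean way to follow offsite links exactly one hop while also staying within the main domain, which is part of why grab-site was the better answer here:

```shell
# Mirror a site, fetching page requisites (css, images, js) even when they
# live on other hosts, and record everything to a WARC.
#   --mirror           recursive download with timestamping
#   --page-requisites  grab everything needed to render each page
#   --span-hosts       allow requisites/links on other hosts
#   --domains=...      restrict recursion to these hosts (list your CDN hosts too)
#   --convert-links    rewrite links for local viewing
#   --adjust-extension add .html where appropriate
wget --mirror --page-requisites --span-hosts \
     --domains=example.com,cdn.example.com \
     --convert-links --adjust-extension \
     --warc-file=example-site --wait=1 \
     http://example.com/
```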
[22:22] *** Stilett0 has joined #archiveteam-bs
[22:28] *** Stilett0 has quit IRC (Read error: Operation timed out)
[22:28] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:29] yeah, imgur needs you to follow the redirect
[22:29] i don't know exactly what weird magic they're using atm
[22:29] but if you essentially do the request twice (not with the same url, with the one it redirects you to) you'll have a page that has the "real" source image
[22:39] second pitfall (since we're in -bs anyway and people might fall into that trap): if you upload an image nowadays, you can not copy the image link from the image it presents because they use a base64 blob (IIRC) in there
[22:39] you have to open the "sharing link" on the right...and _there_, the image source is as usual
[22:40] So any way to add that rule into grab-site? Or will I just have to manually get them afterwards?
[22:47] < too noob to know
[22:50] *** Start has joined #archiveteam-bs
[22:54] It's not really a problem, as I can go through the log after it's complete, getting all the i.imgur.com images.
[23:03] *** BlueMaxim has joined #archiveteam-bs
[23:32] *** ravetcofx has joined #archiveteam-bs
[23:34] *** ravetcofx has quit IRC (Remote host closed the connection)
[23:54] *** Yoshimura has quit IRC (Remote host closed the connection)
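[Editor's note] The [22:54] plan — going through the crawl log afterwards to grab the i.imgur.com images directly — can be sketched as below. The log line format here is invented to match the lines quoted at [22:02]; adjust the pattern to whatever grab-site actually logs. Re-fetching with `curl -L` follows the 302 that was tripping up the crawl:

```shell
#!/bin/sh
# Pull direct-image imgur URLs out of a crawl log (read on stdin) so they
# can be re-fetched with redirects followed.
extract_imgur() {
    grep -o 'http://i\.imgur\.com/[A-Za-z0-9]*\.[a-z]*' | sort -u
}

# Made-up sample log matching the lines quoted in the channel:
sample_log='302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg
200 OK http://imgur.com/LrowbFM'

printf '%s\n' "$sample_log" | extract_imgur
# -> http://i.imgur.com/LrowbFM.jpg
# Each extracted URL can then be re-fetched, following the redirect:
#   curl -L -O "$url"
```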