#archiveteam-bs 2016-11-25,Fri

Time Nickname Message
00:24 🔗 antomati_ has joined #archiveteam-bs
00:24 🔗 swebb sets mode: +o antomati_
00:25 🔗 antomatic has quit IRC (Read error: Operation timed out)
01:53 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
02:04 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
02:08 🔗 wp494 has joined #archiveteam-bs
02:48 🔗 godane so i added some better code for grabbing Arirang streams
02:49 🔗 godane i had to change it cause index_3_av.m3u8 was not always the 850x480 stream
02:49 🔗 godane as i just curl -s "$masterurl" | grep -A1 850x480 | grep m3u8
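godane's fix selects the media playlist by its advertised resolution rather than by a hard-coded filename, since `index_3_av.m3u8` was not always the 850x480 variant. A minimal sketch of that selection, run here against a made-up master playlist (the variant names and bandwidths are hypothetical; godane's real command fed `curl -s "$masterurl"` into the same pipe):

```shell
#!/bin/sh
# Hypothetical HLS master playlist: each #EXT-X-STREAM-INF line
# describes a variant, and the following line is its media playlist URL.
master='#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
index_1_av.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=850x480
index_3_av.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
index_4_av.m3u8'

# Match the variant line advertising 850x480, take the line after it
# (-A1), and keep only the playlist URL. With a live stream you would
# replace the printf with: curl -s "$masterurl"
streamurl=$(printf '%s\n' "$master" | grep -A1 850x480 | grep m3u8)
echo "$streamurl"   # index_3_av.m3u8
```

This keeps working even if the server reshuffles which `index_N` file carries which resolution.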
02:58 🔗 wp494_ has joined #archiveteam-bs
03:03 🔗 wp494 has quit IRC (Read error: Operation timed out)
03:03 🔗 wp494_ is now known as wp494
03:12 🔗 ravetcofx has joined #archiveteam-bs
03:47 🔗 Mayeau has joined #archiveteam-bs
03:56 🔗 Mayonaise has quit IRC (Ping timeout: 864 seconds)
03:56 🔗 Mayeau is now known as Mayonaise
04:08 🔗 ravetcofx has quit IRC (Ping timeout: 506 seconds)
05:09 🔗 Stilett0 has joined #archiveteam-bs
05:12 🔗 Stiletto has quit IRC (Read error: Operation timed out)
05:52 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:59 🔗 Sk1d has joined #archiveteam-bs
06:49 🔗 Start has joined #archiveteam-bs
07:37 🔗 GE has joined #archiveteam-bs
10:47 🔗 Medowar FYI: arkiver Sketchcow: bayimg is now done trackerside, I am currently syncing everything from my target over to fos.
11:09 🔗 GE has quit IRC (Remote host closed the connection)
11:19 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:48 🔗 ravetcofx has joined #archiveteam-bs
12:15 🔗 ravetcofx has quit IRC (Ping timeout: 260 seconds)
12:35 🔗 GE has joined #archiveteam-bs
12:54 🔗 vitzli has joined #archiveteam-bs
12:58 🔗 vitzli Feel like shit, four years too late to archive the wiki :[ luckily the Wayback Machine has it, but still. (it's a small udev-rules wiki, nothing super-important)
12:59 🔗 arkiver what wiki?
12:59 🔗 vitzli wiki.countercaster.com
13:00 🔗 vitzli last update and online in 2012
13:01 🔗 arkiver :(
13:01 🔗 arkiver that sucks
13:06 🔗 vitzli After I first found the wikiteam tools I began grabbing any small-ish or remotely interesting/useful wiki. It's awesome, but sometimes it's like reading one book, liking it, reading another, liking it, I WANT MORE! HA! NO! fuck you! Author Existence Failure.
13:09 🔗 vitzli and that is all for drunk Friday confessions, sorry about that
13:37 🔗 Whopper Kaz: similar thing happened in Australia with metadata. It was originally to 'fight terrorists' and then we have https://www.crikey.com.au/2016/01/18/over-60-agencies-apply-to-snoop-into-your-metadata/ . The majority might have a legitimate use for the information but that's not the point. Race fixing, polluting, work health safety violations / fraud, mislabelling fruit? etc. ≠ terrorism
13:40 🔗 arkiver Medowar: :D nice!
13:53 🔗 vitzli has quit IRC (Quit: Leaving)
15:28 🔗 superkuh has quit IRC (Remote host closed the connection)
15:29 🔗 Shakespea has joined #archiveteam-bs
15:30 🔗 Shakespea FYI- http://www.panoramio.com/maps-faq
15:30 🔗 superkuh has joined #archiveteam-bs
15:30 🔗 Shakespea has left
15:44 🔗 Start has quit IRC (Quit: Disconnected.)
15:52 🔗 atrocity has quit IRC (Ping timeout: 260 seconds)
17:28 🔗 Stilett0 has quit IRC (Read error: Connection reset by peer)
18:03 🔗 Stiletto has joined #archiveteam-bs
18:06 🔗 ndiddy has joined #archiveteam-bs
19:33 🔗 RichardG_ has joined #archiveteam-bs
19:36 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
19:49 🔗 RichardG has joined #archiveteam-bs
19:50 🔗 RichardG_ has quit IRC (Ping timeout: 250 seconds)
19:51 🔗 tapedrive I want to download a site (for a personal archive at the moment, so not with ArchiveBot) but I'm not quite sure what options to use with wget. I want to download all pages under a specific domain, plus all their resources (CSS, images, JavaScript, etc.), which will be under different domains. I also want to download all linked pages - so if the main site links to external.com/foo.html then I want that page and all its requisites downloaded
19:51 🔗 tapedrive too. Is this possible with wget, or am I going to have to write something custom for this?
20:02 🔗 ndiddy has quit IRC (Ping timeout: 633 seconds)
20:04 🔗 ndiddy has joined #archiveteam-bs
20:09 🔗 Kaz yes
20:10 🔗 Kaz one sec
20:10 🔗 Kaz tapedrive: http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
20:11 🔗 Kaz or you can use https://github.com/ludios/grab-site
20:11 🔗 tapedrive I've read that (the wiki one), but I'm confused as to what options I should use for downloading all page dependencies from other domains.
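For reference, the wget manual's own recipe for one page plus all of its requisites, even off-site ones, is the `-E -H -k -p` combination; the URL below is a placeholder. What wget cannot easily express is tapedrive's full wish — recurse within one domain but fetch only requisites (not everything) from other hosts — which is why grab-site is the simpler answer:

```shell
# Single page with all requisites, even those on other hosts
# (the -E -H -k -p combination from the wget manual), also
# recorded into a WARC file:
wget -E -H -k -p --warc-file=page "http://example.com/post.html"

# Whole-site mirror into a WARC. Adding -H here would make the
# recursion span to *every* linked host, which is usually far too
# much; restricting cross-host fetches to requisites only is the
# part wget has no clean flag for.
wget --mirror -p -E -k --warc-file=site "http://example.com/"
```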
20:13 🔗 tapedrive Ah, that grab-site one looks perfect. Thanks!
21:32 🔗 VADemon has joined #archiveteam-bs
21:35 🔗 sep332_ has quit IRC (Konversation terminated!)
21:35 🔗 jrwr has joined #archiveteam-bs
21:50 🔗 kristian_ has joined #archiveteam-bs
22:02 🔗 tapedrive Okay, I'm using grab-site, but there's an issue. The site I'm archiving has imgur images linked, but imgur's doing some weird redirect thing.
22:02 🔗 tapedrive From the logs: 302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg
22:02 🔗 tapedrive 200 OK http://imgur.com/LrowbFM
22:03 🔗 tapedrive So it's not archiving the image, but rather a stupid imgur landing page.
22:03 🔗 tapedrive Any ideas to get round this?
22:22 🔗 Stilett0 has joined #archiveteam-bs
22:28 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
22:28 🔗 Stiletto has quit IRC (Read error: Operation timed out)
22:29 🔗 ae_g_i_s yeah, imgur needs you to follow the redirect
22:29 🔗 ae_g_i_s i don't know exactly what weird magic they're using atm
22:29 🔗 ae_g_i_s but if you essentially do the request twice (not with the same url, with the one it redirects you to) you'll have a page that has the "real" source image
22:39 🔗 ae_g_i_s second pitfall (since we're in -bs anyway and people might fall into that trap): if you upload an image nowadays, you can not copy the image link from the image it presents because they use a base64 blob (IIRC) in there
22:39 🔗 ae_g_i_s you have to open the "sharing link" on the right...and _there_, the image source is as usual
22:40 🔗 tapedrive So any way to add that rule into grab-site? Or will I just have to manually get them afterwards?
22:47 🔗 ae_g_i_s < too noob to know
22:50 🔗 Start has joined #archiveteam-bs
22:54 🔗 tapedrive It's not really a problem, as I can go through the log after it's complete, getting all the i.imgur.com images.
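tapedrive's post-processing idea can be sketched as a small pipe: pull every `i.imgur.com` URL that the crawl saw redirected out of the log, then refetch each with `curl -L` so the redirect chain is followed to the actual image. The log excerpt below is a hypothetical reconstruction of the `STATUS URL` lines quoted earlier; real grab-site logs carry more columns, so the field positions may need adjusting:

```shell
#!/bin/sh
# Hypothetical excerpt of a crawl log (status, reason, URL per line),
# modelled on the lines tapedrive pasted above.
log='302 Moved Temporarily http://i.imgur.com/LrowbFM.jpg
200 OK http://imgur.com/LrowbFM
200 OK http://example.com/page.html'

# Keep only redirected responses ($1 == 302), take the URL (last
# field), and filter for direct-image hosts.
printf '%s\n' "$log" \
  | awk '$1 == 302 { print $NF }' \
  | grep '^http://i\.imgur\.com/' > redirected.txt

cat redirected.txt   # http://i.imgur.com/LrowbFM.jpg

# Refetch each image, following redirects to the real file
# (commented out here so the sketch stays offline):
# while read -r url; do curl -sL -O "$url"; done < redirected.txt
```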
23:03 🔗 BlueMaxim has joined #archiveteam-bs
23:32 🔗 ravetcofx has joined #archiveteam-bs
23:34 🔗 ravetcofx has quit IRC (Remote host closed the connection)
23:54 🔗 Yoshimura has quit IRC (Remote host closed the connection)
