[01:19] Dear Red Hat, you and your subscriber-only knowledge base can kindly go fuck yourself. Regards, JAA
[01:35] JAA: can you still get to it if you set your user-agent to one of the Google crawler ones?
[01:53] mal: Nope, doesn't look like it.
[02:02] =(
[03:48] *** odemg_ has quit IRC (Ping timeout: 268 seconds)
[04:00] *** odemg_ has joined #archiveteam-ot
[05:50] *** godane has quit IRC (Ping timeout: 506 seconds)
[05:59] *** Stilett0 has joined #archiveteam-ot
[05:59] *** Hecatz has quit IRC (Ping timeout: 268 seconds)
[06:01] *** Stiletto has quit IRC (Ping timeout: 268 seconds)
[06:02] *** Jon has quit IRC (Ping timeout: 268 seconds)
[06:02] *** kiskabak has quit IRC (Ping timeout: 268 seconds)
[06:02] *** Jon- has joined #archiveteam-ot
[06:02] *** Kaz has quit IRC (Ping timeout: 268 seconds)
[06:03] *** Kaz has joined #archiveteam-ot
[06:05] *** svchfoo1 has quit IRC (Ping timeout: 268 seconds)
[06:05] *** Hecatz has joined #archiveteam-ot
[06:16] *** Kaz has quit IRC (se.hub efnet.portlane.se)
[06:16] *** odemg_ has quit IRC (se.hub efnet.portlane.se)
[06:16] *** Mateon1 has quit IRC (se.hub efnet.portlane.se)
[06:22] *** odemg has joined #archiveteam-ot
[06:35] *** Mateon1 has joined #archiveteam-ot
[07:56] *** svchfoo1 has joined #archiveteam-ot
[07:57] *** svchfoo3 sets mode: +o svchfoo1
[08:24] ivan: quick question, sorry to bug you, can you feed `grab-site' URLs from a file?
[08:24] i.e. `$ grab-site --file=/path/to/urls.txt'
[08:25] w0rmhole: -i file
[08:25] oh, that easy? thank you :)
[08:26] yep
[08:26] does it combine it into a single WARC?
[08:26] it's all part of the same crawl
[08:27] note grab-site rolls over to a new WARC after 5 GB by default
[08:27] oh ok, I see
[08:27] thank you :)
[08:34] *** godane has joined #archiveteam-ot
[08:34] *** svchfoo1 sets mode: +o godane
[08:35] *** schbirid has joined #archiveteam-ot
[09:00] *** schbirid has quit IRC (Remote host closed the connection)
[09:01] JAA: readyset finished on my side, 44 GB. YouTube pull so far has done 150 GB. Hearthhead only pulled down 2.3 GB before finishing... rerunning
[09:02] stormone has done 21 GB
[09:02] *stormshield
[09:26] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[09:55] *** schbirid has joined #archiveteam-ot
[10:00] *** kiskabak has joined #archiveteam-ot
[10:01] today I am moving my porn to an external disk, what a proud day
[10:01] sorry to bother you again, ivan. is there a way to both `--1' a webpage and recursively archive websites, all in the same WARC? for example, something like this: `$ grab-site --1="https://websitenottoberecursed.com/" -i /path/to/file/containing/links/to/be/recursed'
[10:05] w0rmhole: you could upload a page somewhere that links to every page you don't want recursed
[10:06] there might also be some way to tamper with the queue in the sqlite database after the crawl has started, but I have nothing to do that
[10:07] the answer to your question is basically "no", unless someone else thinks of something
[10:09] schbirid: why not add it to the Internet Archive?
[10:09] it will only get darked for a while
[10:10] way too risky to my porn interests online :P
[10:13] A dummy account works
[10:15] Why would you download porn?
[10:15] are you new here
[10:15] yes...
[10:17] Unless it's a backup of an endangered porn site
[10:19] you're in a place where people redirect their anxiety at archiving everything they have the slightest attachment to
[10:19] people have that with porn as with everything else
[10:20] ivan: so, for now, I would need to create a webpage/pastebin entry containing all of the `--1' URLs, add that to the local file containing the URLs I want recursed, and feed that into grab-site?
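A sketch of the `-i` usage ivan describes above. The file name and URLs are hypothetical placeholders, and the `grab-site` invocations themselves are commented out since the tool isn't assumed to be installed here:

```shell
# Seed file for grab-site: one URL per line (placeholder URLs).
cat > urls.txt <<'EOF'
https://example.com/
https://example.org/
EOF

# All seeds become part of the same crawl, and grab-site rolls over
# to a new WARC after 5 GB by default, as noted above.
# grab-site -i urls.txt                     # crawl every URL in the file
# grab-site --1 https://example.com/page    # single page only, no recursion
```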
[10:20] sorry, my English is not the best, so I sometimes struggle to understand
[10:20] w0rmhole: yeah, I think that might work
[10:21] thank you, I will give that a try
[10:25] jut_: dude, that certain image or movie might not even be online anymore
[10:26] yeah
[10:42] *** godane has quit IRC (Ping timeout: 260 seconds)
[10:46] *** godane has joined #archiveteam-ot
[10:46] *** svchfoo3 sets mode: +o godane
[10:49] *** godane has quit IRC (Read error: Operation timed out)
[10:54] *** godane has joined #archiveteam-ot
[10:55] *** svchfoo3 sets mode: +o godane
[10:58] *** godane has quit IRC (Read error: Operation timed out)
[11:00] HCross: Yeah, Hearthhead uses JS for everything. That won't be easy to archive fully.
[11:00] yeah... phantomjs just vomited errors at me when I tried to use it, and Brozzler doesn't seem to want to go
[11:03] My tf2outpost.com grab finished around an hour ago, except for those same five broken URLs as before.
[11:05] *** godane has joined #archiveteam-ot
[11:05] *** svchfoo1 sets mode: +o godane
[11:06] This is a bit smaller than dotaoutpost. I wonder why.
[11:08] stormshield seems to be happy to cough data up
[11:09] 188k URLs left
[11:09] Updated the wiki page.
[11:10] Oh, Storm Shield One is still running. Updating again.
[11:13] I'm grabbing a copy anyway
[11:13] readyset is uploading
[11:14] https://archive.org/details/ReadySet22092018
[11:14] Just to avoid a misunderstanding: I'm not grabbing SS1. I just misinterpreted your message earlier as it being complete.
[11:15] Link added
[11:15] Also, we should probably move this to -bs or a dedicated channel.
[11:17] Hitting 200 Mbps single-threaded upload to the IA today
[11:31] *** godane has quit IRC (Ping timeout: 268 seconds)
[12:26] HCross: I see the same thing as yesterday: some files upload at normal speeds, others at <1 MB/s.
[12:26] hm, I'm coming from my own ASN atm - it might be an OVH thing
[12:27] Yeah, could be.
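On the `--1`-plus-recursion question above: as I read ivan's suggestion, a page you host yourself that links to the no-recurse URLs can stand in for `--1`, since grab-site fetches linked offsite pages without recursing into them. A sketch, with every file name and URL hypothetical (the `grab-site` call is commented out, as it isn't assumed to be installed here):

```shell
# Build a page that links every URL that should NOT be recursed.
cat > norecurse.html <<'EOF'
<ul>
  <li><a href="https://websitenottoberecursed.com/">fetch once, no recursion</a></li>
</ul>
EOF

# Host norecurse.html somewhere public, then seed its (hypothetical) URL
# alongside the sites that SHOULD be recursed:
echo 'https://example.com/norecurse.html' >> urls.txt
echo 'https://example.org/' >> urls.txt

# One crawl, one set of WARCs:
# grab-site -i urls.txt
```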
[12:32] I suspect a congested transit somewhere - or a port
[12:33] JAA: can you traceroute 185.186.9.137 please? I've got an idea (from your OVH)
[12:38] HCross: It goes to London on OVH's network, then to 195.66.227.147 > 185.186.9.126 > 185.186.9.137. I see a bit of loss on the 195.x.x.x IP and some ping spikes seemingly originating from there. Worst case ping of almost a second...
[12:38] That's fine, that's my LINX IP
[12:38] and my router is more focused on dealing with prod traffic than some ICMP
[12:39] if you want... I could set up a WireGuard tunnel that you could try
[12:41] Ah, I see. Zero experience with WireGuard, but I fear that it might not be easy to set up since my server's running on a quite old software stack.
[12:42] *** schbirid has quit IRC (Read error: Operation timed out)
[12:42] only thing is, the max you could do is 100 Mbit through me
[12:44] *** schbirid has joined #archiveteam-ot
[12:46] I only have a 200 Mb/s uplink anyway, and I usually don't even push that much either. But unless it slows down even more, I think it's fine for now. Even at the slowest speeds I've seen so far, it should finish in ~24 hours. I still have a few hundred GB of disk free, so that's okay. Thanks for the offer though; if the problem gets worse, I might get back to it.
[12:46] ok
[12:49] You'd also need to write some route rules on your end
[16:35] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[16:36] *** wp494 has joined #archiveteam-ot
[18:00] *** VerifiedJ has joined #archiveteam-ot
[18:11] *** Mateon1 has quit IRC (Ping timeout: 492 seconds)
[18:11] *** Mateon1 has joined #archiveteam-ot
[18:25] *** VerifiedJ has quit IRC (Quit: Leaving)
[18:27] *** VerifiedJ has joined #archiveteam-ot
[18:29] *** schbirid has quit IRC (Read error: Operation timed out)
[18:31] *** schbirid has joined #archiveteam-ot
[19:36] *** Kaz has joined #archiveteam-ot
[20:05] *** schbirid has quit IRC (Remote host closed the connection)
[21:03] *** icedice has joined #archiveteam-ot
[22:15] *** godane has joined #archiveteam-ot
[22:15] *** svchfoo1 sets mode: +o godane
[22:18] *** BlueMax has joined #archiveteam-ot
[23:20] *** icedice has quit IRC (Quit: Leaving)
[23:21] *** icedice has joined #archiveteam-ot
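For reference, the tunnel offered above could be a minimal point-to-point WireGuard setup along these lines. Every address, key name, and port here is a hypothetical placeholder, and as the 12:49 message notes, the uploading side would also need route rules to steer IA-bound traffic into the tunnel:

```ini
# /etc/wireguard/wg0.conf on the uploading server (all values hypothetical)
[Interface]
PrivateKey = <uploader-private-key>
Address = 10.0.0.2/24

[Peer]
PublicKey = <tunnel-endpoint-public-key>
Endpoint = 185.186.9.137:51820
AllowedIPs = 10.0.0.0/24

# Plus route rules on the uploader, e.g. sending traffic for the
# relevant (unspecified) archive.org prefixes out via wg0.
```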