#archiveteam-ot 2018-09-23,Sun


Time Nickname Message
01:19 🔗 JAA Dear Red Hat, you and your subscriber-only knowledge base can kindly go fuck yourself. Regards, JAA
01:35 🔗 mal JAA: can you still get to it if you set your user-agent to one of the google crawler ones?
01:53 🔗 JAA mal: Nope, doesn't look like it.
02:02 🔗 mal =(
03:48 🔗 odemg_ has quit IRC (Ping timeout: 268 seconds)
04:00 🔗 odemg_ has joined #archiveteam-ot
05:50 🔗 godane has quit IRC (Ping timeout: 506 seconds)
05:59 🔗 Stilett0 has joined #archiveteam-ot
05:59 🔗 Hecatz has quit IRC (Ping timeout: 268 seconds)
06:01 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
06:02 🔗 Jon has quit IRC (Ping timeout: 268 seconds)
06:02 🔗 kiskabak has quit IRC (Ping timeout: 268 seconds)
06:02 🔗 Jon- has joined #archiveteam-ot
06:02 🔗 Kaz has quit IRC (Ping timeout: 268 seconds)
06:03 🔗 Kaz has joined #archiveteam-ot
06:05 🔗 svchfoo1 has quit IRC (Ping timeout: 268 seconds)
06:05 🔗 Hecatz has joined #archiveteam-ot
06:16 🔗 Kaz has quit IRC (se.hub efnet.portlane.se)
06:16 🔗 odemg_ has quit IRC (se.hub efnet.portlane.se)
06:16 🔗 Mateon1 has quit IRC (se.hub efnet.portlane.se)
06:22 🔗 odemg has joined #archiveteam-ot
06:35 🔗 Mateon1 has joined #archiveteam-ot
07:56 🔗 svchfoo1 has joined #archiveteam-ot
07:57 🔗 svchfoo3 sets mode: +o svchfoo1
08:24 🔗 w0rmhole ivan: quick question, sorry to bug you, can you feed `grab-site' urls from a file?
08:24 🔗 w0rmhole i.e. `$ grab-site --file=/path/to/urls.txt'
08:25 🔗 ivan w0rmhole: -i file
08:25 🔗 w0rmhole oh that easy? thank you :)
08:26 🔗 ivan yep
08:26 🔗 w0rmhole does it combine it into a single warc?
08:26 🔗 ivan it's all part of the same crawl
08:27 🔗 ivan note grab-site rolls over to a new WARC after 5GB by default
08:27 🔗 w0rmhole oh ok i see
08:27 🔗 w0rmhole thank you :)
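
A minimal sketch of preparing the file that ivan's `-i file' answer refers to. It assumes the input file is plain text with one URL per line; the helper name `write_url_list` and the filename `urls.txt` are made up for illustration and are not part of grab-site.

```python
from urllib.parse import urlsplit


def write_url_list(urls, path):
    """Write unique http(s) URLs to `path`, one per line; return how many were written.

    Hypothetical helper: the one-URL-per-line layout is an assumption about
    what `grab-site -i` expects, not something stated in the log.
    """
    kept = []
    for url in urls:
        url = url.strip()
        # Skip anything that is not an http(s) URL, and skip exact duplicates.
        if urlsplit(url).scheme in ("http", "https") and url not in kept:
            kept.append(url)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)
```

After `write_url_list(my_urls, "urls.txt")`, the crawl would be started as ivan describes, with `grab-site -i urls.txt`. Everything lands in the same crawl, with a new WARC started after 5GB by default.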
08:34 🔗 godane has joined #archiveteam-ot
08:34 🔗 svchfoo1 sets mode: +o godane
08:35 🔗 schbirid has joined #archiveteam-ot
09:00 🔗 schbirid has quit IRC (Remote host closed the connection)
09:01 🔗 HCross JAA: readyset finished on my side, 44GB. YouTube pull so far has done 150GB. Hearthhead only pulled down 2.3GB before finishing.. rerunning
09:02 🔗 HCross stormone has done 21GB
09:02 🔗 HCross *stormshield
09:26 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
09:55 🔗 schbirid has joined #archiveteam-ot
10:00 🔗 kiskabak has joined #archiveteam-ot
10:01 🔗 schbirid today i am moving my porn to an external disk, what a proud day
10:01 🔗 w0rmhole sorry to bother you again ivan. is there a way to both `--1' a webpage, as well as recursively archive websites all in the same WARC? for example, something like this: `$ grab-site --1="https://websitenottoberecursed.com/" -i /path/to/file/containing/links/to/be/recursed'
10:05 🔗 ivan w0rmhole: you could upload a page somewhere that links to every page you don't want recursed
10:06 🔗 ivan there might also be some way to tamper with the queue in the sqlite database after the crawl has started, but I have nothing to do that
10:07 🔗 ivan the answer to your question is basically "no", unless someone else thinks of something
10:09 🔗 Flashfire Schbirid why not add to the internet archive?
10:09 🔗 Flashfire it will only get darked for a while
10:10 🔗 schbirid way too risky to my porn interests online :P
10:13 🔗 Flashfire A dummy account works
10:15 🔗 jut_ Why would you download porn?
10:15 🔗 ivan are you new here
10:15 🔗 jut_ yes...
10:17 🔗 jut_ Unless it's a backup of an endangered porn site
10:19 🔗 ivan you're in a place where people redirect their anxiety at archiving everything they have the slightest attachment to
10:19 🔗 ivan people have that with porn as with everything else
10:20 🔗 w0rmhole ivan: so, for now, i would need to create a webpage/pastebin entry containing all of the `--1' urls, add that to the local file containing the urls i want recursed, and feed that into grab-site?
10:20 🔗 w0rmhole sorry my english is not the best, so i sometimes struggle to understand
10:20 🔗 ivan w0rmhole: yeah, I think that might work
10:21 🔗 w0rmhole thank you, i will give that a try
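
The workaround discussed above (one page that links to every URL that should not be recursed) could be sketched roughly like this. The function name `build_index_page` and the page layout are illustrative assumptions. Whether grab-site then fetches the linked pages without recursing into them is exactly what ivan says "might work", so it should be verified on a small test crawl first.

```python
from html import escape


def build_index_page(urls, title="one-level URLs"):
    """Return a minimal HTML page containing one link per URL.

    Hypothetical helper: upload the result somewhere reachable and add the
    uploaded page's URL to the file passed to `grab-site -i`.
    """
    items = "\n".join(
        # Escape each URL so characters such as & stay valid inside the markup.
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body><ul>\n{items}\n</ul></body></html>\n"
    )
```

For example, `build_index_page(["https://websitenottoberecursed.com/"])` returns a page with a single link; saving it as `index.html`, hosting it, and listing its address alongside the sites to be recursed matches the plan w0rmhole describes.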
10:25 🔗 schbirid jut_: dude, that certain image or movie might not even be online anymore
10:26 🔗 w0rmhole yeah
10:42 🔗 godane has quit IRC (Ping timeout: 260 seconds)
10:46 🔗 godane has joined #archiveteam-ot
10:46 🔗 svchfoo3 sets mode: +o godane
10:49 🔗 godane has quit IRC (Read error: Operation timed out)
10:54 🔗 godane has joined #archiveteam-ot
10:55 🔗 svchfoo3 sets mode: +o godane
10:58 🔗 godane has quit IRC (Read error: Operation timed out)
11:00 🔗 JAA HCross: Yeah, hearthhead uses JS for everything. That won't be easy to archive fully.
11:00 🔗 HCross yea.. phantomjs just vomited errors at me when I tried to use that, and Brozzler doesnt seem to want to go
11:03 🔗 JAA My tf2outpost.com grab finished around an hour ago, except for those same five broken URLs as before.
11:05 🔗 godane has joined #archiveteam-ot
11:05 🔗 svchfoo1 sets mode: +o godane
11:06 🔗 JAA This is a bit smaller than dotaoutpost. I wonder why.
11:08 🔗 HCross stormshield seems to be happy to cough data up
11:09 🔗 HCross 188k URLs left
11:09 🔗 JAA Updated the wiki page.
11:10 🔗 JAA Oh, Storm Shield One is still running. Updating again.
11:13 🔗 HCross im grabbing a copy anyway
11:13 🔗 HCross readyset is uploading
11:14 🔗 HCross https://archive.org/details/ReadySet22092018
11:14 🔗 JAA Just to avoid a misunderstanding: I'm not grabbing SS1. I just misinterpreted your message earlier as it being complete.
11:15 🔗 JAA Link added
11:15 🔗 JAA Also, we should probably move this to -bs or a dedicated channel.
11:17 🔗 HCross Hitting 200Mbps single threaded upload today to the IA
11:31 🔗 godane has quit IRC (Ping timeout: 268 seconds)
12:26 🔗 JAA HCross: I see the same thing as yesterday: some files upload at normal speeds, others at <1 MB/s.
12:26 🔗 HCross hm, im coming from my own ASN atm - it might be an OVH thing
12:27 🔗 JAA Yeah, could be.
12:32 🔗 HCross I suspect a congested transit somewhere - or a port
12:33 🔗 HCross JAA: can you traceroute 185.186.9.137 please, ive got an idea (from your OVH)
12:38 🔗 JAA HCross: It goes to London on OVH's network, then to 195.66.227.147 > 185.186.9.126 > 185.186.9.137. I see a bit of loss on the 195.x.x.x IP and some ping spikes seemingly originating from there. Worst case ping of almost a second...
12:38 🔗 HCross Thats fine, thats my LINX IP
12:38 🔗 HCross and my router is more focused on dealing with prod traffic than some ICMP
12:39 🔗 HCross if you want... I could setup a wireguard tunnel that you could try
12:41 🔗 JAA Ah, I see. Zero experience with WireGuard, but I fear that it might not be easy to set up since my server's running on a quite old software stack.
12:42 🔗 schbirid has quit IRC (Read error: Operation timed out)
12:42 🔗 HCross only thing is, the max you could do is 100Mbit through me
12:44 🔗 schbirid has joined #archiveteam-ot
12:46 🔗 JAA I only have a 200 Mb/s uplink anyway, and I usually don't even push that much either. But unless it slows down even more, I think it's fine for now. Even at the slowest speeds I've seen so far, it should finish in ~24 hours. I still have a few hundred GB disk free, so that's okay. Thanks for the offer though; if the problem gets worse, I might get back to it.
12:46 🔗 HCross ok
12:49 🔗 HCross You'd also need to write some route rules on your end
16:35 🔗 wp494 has quit IRC (Ping timeout: 492 seconds)
16:36 🔗 wp494 has joined #archiveteam-ot
18:00 🔗 VerifiedJ has joined #archiveteam-ot
18:11 🔗 Mateon1 has quit IRC (Ping timeout: 492 seconds)
18:11 🔗 Mateon1 has joined #archiveteam-ot
18:25 🔗 VerifiedJ has quit IRC (Quit: Leaving)
18:27 🔗 VerifiedJ has joined #archiveteam-ot
18:29 🔗 schbirid has quit IRC (Read error: Operation timed out)
18:31 🔗 schbirid has joined #archiveteam-ot
19:36 🔗 Kaz has joined #archiveteam-ot
20:05 🔗 schbirid has quit IRC (Remote host closed the connection)
21:03 🔗 icedice has joined #archiveteam-ot
22:15 🔗 godane has joined #archiveteam-ot
22:15 🔗 svchfoo1 sets mode: +o godane
22:18 🔗 BlueMax has joined #archiveteam-ot
23:20 🔗 icedice has quit IRC (Quit: Leaving)
23:21 🔗 icedice has joined #archiveteam-ot
