Time |
Nickname |
Message |
01:19
🔗
|
JAA |
Dear Red Hat, you and your subscriber-only knowledge base can kindly go fuck yourself. Regards, JAA |
01:35
🔗
|
mal |
JAA: can you still get to it if you set your user-agent to one of the google crawler ones? |
01:53
🔗
|
JAA |
mal: Nope, doesn't look like it. |
02:02
🔗
|
mal |
=( |
03:48
🔗
|
|
odemg_ has quit IRC (Ping timeout: 268 seconds) |
04:00
🔗
|
|
odemg_ has joined #archiveteam-ot |
05:50
🔗
|
|
godane has quit IRC (Ping timeout: 506 seconds) |
05:59
🔗
|
|
Stilett0 has joined #archiveteam-ot |
05:59
🔗
|
|
Hecatz has quit IRC (Ping timeout: 268 seconds) |
06:01
🔗
|
|
Stiletto has quit IRC (Ping timeout: 268 seconds) |
06:02
🔗
|
|
Jon has quit IRC (Ping timeout: 268 seconds) |
06:02
🔗
|
|
kiskabak has quit IRC (Ping timeout: 268 seconds) |
06:02
🔗
|
|
Jon- has joined #archiveteam-ot |
06:02
🔗
|
|
Kaz has quit IRC (Ping timeout: 268 seconds) |
06:03
🔗
|
|
Kaz has joined #archiveteam-ot |
06:05
🔗
|
|
svchfoo1 has quit IRC (Ping timeout: 268 seconds) |
06:05
🔗
|
|
Hecatz has joined #archiveteam-ot |
06:16
🔗
|
|
Kaz has quit IRC (se.hub efnet.portlane.se) |
06:16
🔗
|
|
odemg_ has quit IRC (se.hub efnet.portlane.se) |
06:16
🔗
|
|
Mateon1 has quit IRC (se.hub efnet.portlane.se) |
06:22
🔗
|
|
odemg has joined #archiveteam-ot |
06:35
🔗
|
|
Mateon1 has joined #archiveteam-ot |
07:56
🔗
|
|
svchfoo1 has joined #archiveteam-ot |
07:57
🔗
|
|
svchfoo3 sets mode: +o svchfoo1 |
08:24
🔗
|
w0rmhole |
ivan: quick question, sorry to bug you, can you feed `grab-site' urls from a file? |
08:24
🔗
|
w0rmhole |
i.e. `$ grab-site --file=/path/to/urls.txt' |
08:25
🔗
|
ivan |
w0rmhole: -i file |
08:25
🔗
|
w0rmhole |
oh that easy? thank you :) |
08:26
🔗
|
ivan |
yep |
08:26
🔗
|
w0rmhole |
does it combine it into a single warc? |
08:26
🔗
|
ivan |
it's all part of the same crawl |
08:27
🔗
|
ivan |
note grab-site rolls over to a new WARC after 5GB by default |
08:27
🔗
|
w0rmhole |
oh ok i see |
08:27
🔗
|
w0rmhole |
thank you :) |
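A minimal sketch of the `-i file` workflow ivan describes above, in Python: it writes a URL list to a text file and builds the matching command line. Only the `-i` option comes from the conversation; the one-URL-per-line layout, the file name `urls.txt`, and both helper names are assumptions made for illustration, and the command is built but never executed here.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write URLs to `path`, one per line (assumed layout), skipping blanks and duplicates."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def grab_site_command(path):
    """Build (but do not run) the command line; only `-i` is taken from the chat."""
    return ["grab-site", "-i", str(path)]


if __name__ == "__main__":
    # Hypothetical example URLs; the duplicate is dropped before writing.
    kept = write_url_list(
        ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
        "urls.txt",
    )
    print(kept)
    print(" ".join(grab_site_command("urls.txt")))
```

Deduplicating before writing keeps the list tidy; as ivan notes, everything in the file ends up in the same crawl.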
08:34
🔗
|
|
godane has joined #archiveteam-ot |
08:34
🔗
|
|
svchfoo1 sets mode: +o godane |
08:35
🔗
|
|
schbirid has joined #archiveteam-ot |
09:00
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
09:01
🔗
|
HCross |
JAA: readyset finished on my side, 44GB. YouTube pull so far has done 150GB. Hearthhead only pulled down 2.3GB before finishing.. rerunning |
09:02
🔗
|
HCross |
stormone has done 21GB |
09:02
🔗
|
HCross |
*stormshielf |
09:02
🔗
|
HCross |
d |
09:26
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
09:55
🔗
|
|
schbirid has joined #archiveteam-ot |
10:00
🔗
|
|
kiskabak has joined #archiveteam-ot |
10:01
🔗
|
schbirid |
today i am moving my porn to an external disk, what a proud day |
10:01
🔗
|
w0rmhole |
sorry to bother you again ivan. is there a way to both `--1' a webpage, as well as recursively archive websites all in the same WARC? for example, something like this: `$ grab-site --1="https://websitenottoberecursed.com/" -i /path/to/file/containing/links/to/be/recursed' |
10:05
🔗
|
ivan |
w0rmhole: you could upload a page somewhere that links to every page you don't want recursed |
10:06
🔗
|
ivan |
there might also be some way to tamper with the queue in the sqlite database after the crawl has started, but I have nothing to do that |
10:07
🔗
|
ivan |
the answer to your question is basically "no", unless someone else thinks of something |
10:09
🔗
|
Flashfire |
Schbirid why not add to the internet archive? |
10:09
🔗
|
Flashfire |
it will only get darked for a while |
10:10
🔗
|
schbirid |
way too risky to my porn interests online :P |
10:13
🔗
|
Flashfire |
A dummy account works |
10:15
🔗
|
jut_ |
Why would you download porn? |
10:15
🔗
|
ivan |
are you new here |
10:15
🔗
|
jut_ |
yes... |
10:17
🔗
|
jut_ |
Unless it's a backup of an endangered porn site |
10:19
🔗
|
ivan |
you're in a place where people redirect their anxiety at archiving everything they have the slightest attachment to |
10:19
🔗
|
ivan |
people have that with porn as with everything else |
10:20
🔗
|
w0rmhole |
ivan: so, for now, i would need to create a webpage/pastebin entry containing all of the `--1' urls, add that to the local file containing the urls i want recursed, and feed that into grab-site? |
10:20
🔗
|
w0rmhole |
sorry my english is not the best, so i sometimes struggle to understand |
10:20
🔗
|
ivan |
w0rmhole: yeah, I think that might work |
10:21
🔗
|
w0rmhole |
thank you, i will give that a try |
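The workaround discussed above (a single page that links to every URL that should not be recursed) could be generated with something like the following Python sketch. The function name, the output file name `links.html`, and the example URLs are hypothetical; whether grab-site then treats the linked pages the way `--1` would is not confirmed in the chat, where ivan only says it might work.

```python
from html import escape
from pathlib import Path


def build_link_page(urls, title="Pages to fetch without recursion"):
    """Return a minimal HTML page containing one link per URL."""
    items = "\n".join(
        # Escape both the href and the link text so '&' and quotes stay valid HTML.
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical URLs standing in for the `--1' pages.
    page = build_link_page(
        ["https://example.com/?a=1&b=2", "https://example.org/page"]
    )
    Path("links.html").write_text(page, encoding="utf-8")
    print(page)
```

The resulting file would still need to be uploaded somewhere reachable, and its address added to the local URL list, as w0rmhole outlines.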
10:25
🔗
|
schbirid |
jut_: dude, that certain image or movie might not even be online anymore |
10:26
🔗
|
w0rmhole |
yeah |
10:42
🔗
|
|
godane has quit IRC (Ping timeout: 260 seconds) |
10:46
🔗
|
|
godane has joined #archiveteam-ot |
10:46
🔗
|
|
svchfoo3 sets mode: +o godane |
10:49
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
10:54
🔗
|
|
godane has joined #archiveteam-ot |
10:55
🔗
|
|
svchfoo3 sets mode: +o godane |
10:58
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
11:00
🔗
|
JAA |
HCross: Yeah, hearthhead uses JS for everything. That won't be easy to archive fully. |
11:00
🔗
|
HCross |
yea.. phantomjs just vomited errors at me when I tried to use that, and Brozzler doesnt seem to want to go |
11:03
🔗
|
JAA |
My tf2outpost.com grab finished around an hour ago, except for those same five broken URLs as before. |
11:05
🔗
|
|
godane has joined #archiveteam-ot |
11:05
🔗
|
|
svchfoo1 sets mode: +o godane |
11:06
🔗
|
JAA |
This is a bit smaller than dotaoutpost. I wonder why. |
11:08
🔗
|
HCross |
stormshield seems to be happy to cough data up |
11:09
🔗
|
HCross |
188k URLs left |
11:09
🔗
|
JAA |
Updated the wiki page. |
11:10
🔗
|
JAA |
Oh, Storm Shield One is still running. Updating again. |
11:13
🔗
|
HCross |
im grabbing a copy anyway |
11:13
🔗
|
HCross |
readyset is uploading |
11:14
🔗
|
HCross |
https://archive.org/details/ReadySet22092018 |
11:14
🔗
|
JAA |
Just to avoid a misunderstanding: I'm not grabbing SS1. I just misinterpreted your message earlier as it being complete. |
11:15
🔗
|
JAA |
Link added |
11:15
🔗
|
JAA |
Also, we should probably move this to -bs or a dedicated channel. |
11:17
🔗
|
HCross |
Hitting 200Mbps single threaded upload today to the IA |
11:31
🔗
|
|
godane has quit IRC (Ping timeout: 268 seconds) |
12:26
🔗
|
JAA |
HCross: I see the same thing as yesterday: some files upload at normal speeds, others at <1 MB/s. |
12:26
🔗
|
HCross |
hm, im coming from my own ASN atm - it might be an OVH thing |
12:27
🔗
|
JAA |
Yeah, could be. |
12:32
🔗
|
HCross |
I suspect a congested transit somewhere - or a port |
12:33
🔗
|
HCross |
JAA: can you traceroute 185.186.9.137 please, ive got an idea (from your OVH) |
12:38
🔗
|
JAA |
HCross: It goes to London on OVH's network, then to 195.66.227.147 > 185.186.9.126 > 185.186.9.137. I see a bit of loss on the 195.x.x.x IP and some ping spikes seemingly originating from there. Worst case ping of almost a second... |
12:38
🔗
|
HCross |
Thats fine, thats my LINX IP |
12:38
🔗
|
HCross |
and my router is more focused on dealing with prod traffic than some ICMP |
12:39
🔗
|
HCross |
if you want... I could setup a wireguard tunnel that you could try |
12:41
🔗
|
JAA |
Ah, I see. Zero experience with WireGuard, but I fear that it might not be easy to set up since my server's running on a quite old software stack. |
12:42
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
HCross |
only thing is, the max you could do is 100Mbit through me |
12:44
🔗
|
|
schbirid has joined #archiveteam-ot |
12:46
🔗
|
JAA |
I only have a 200 Mb/s uplink anyway, and I usually don't even push that much either. But unless it slows down even more, I think it's fine for now. Even at the slowest speeds I've seen so far, it should finish in ~24 hours. I still have a few hundred GB disk free, so that's okay. Thanks for the offer though; if the problem gets worse, I might get back to it. |
12:46
🔗
|
HCross |
ok |
12:49
🔗
|
HCross |
You'd also need to write some route rules on your end |
16:35
🔗
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
16:36
🔗
|
|
wp494 has joined #archiveteam-ot |
18:00
🔗
|
|
VerifiedJ has joined #archiveteam-ot |
18:11
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 492 seconds) |
18:11
🔗
|
|
Mateon1 has joined #archiveteam-ot |
18:25
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
18:27
🔗
|
|
VerifiedJ has joined #archiveteam-ot |
18:29
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
18:31
🔗
|
|
schbirid has joined #archiveteam-ot |
19:36
🔗
|
|
Kaz has joined #archiveteam-ot |
20:05
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
21:03
🔗
|
|
icedice has joined #archiveteam-ot |
22:15
🔗
|
|
godane has joined #archiveteam-ot |
22:15
🔗
|
|
svchfoo1 sets mode: +o godane |
22:18
🔗
|
|
BlueMax has joined #archiveteam-ot |
23:20
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
23:21
🔗
|
|
icedice has joined #archiveteam-ot |