Time |
Nickname |
Message |
00:00
🔗
|
|
Rondom has quit IRC (Remote host closed the connection) |
00:01
🔗
|
|
Rondom has joined #archiveteam |
00:03
🔗
|
|
Soni has quit IRC (Read error: Operation timed out) |
00:15
🔗
|
|
dboard2 is now known as dboard |
00:59
🔗
|
|
Laverne has quit IRC (Read error: Operation timed out) |
01:24
🔗
|
|
Laverne has joined #archiveteam |
01:47
🔗
|
|
username1 has joined #archiveteam |
01:50
🔗
|
|
schbirid2 has quit IRC (Read error: Operation timed out) |
01:56
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 255 seconds) |
02:46
🔗
|
|
Stilett0 has joined #archiveteam |
03:10
🔗
|
|
Soni has joined #archiveteam |
03:24
🔗
|
|
qw3rty5 has joined #archiveteam |
03:28
🔗
|
|
qw3rty4 has quit IRC (Read error: Operation timed out) |
03:34
🔗
|
SketchCow |
---------------------------------- |
03:34
🔗
|
SketchCow |
All Archiveteam Programmy Nerds |
03:34
🔗
|
SketchCow |
Your presence is requested |
03:34
🔗
|
SketchCow |
In #last20 |
03:34
🔗
|
SketchCow |
---------------------------------- |
04:03
🔗
|
|
jrwr has quit IRC (Max SendQ exceeded) |
04:16
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:23
🔗
|
|
ZexaronS has joined #archiveteam |
04:23
🔗
|
|
Sk1d has joined #archiveteam |
04:43
🔗
|
|
Stilett0 has quit IRC (Read error: Connection reset by peer) |
04:44
🔗
|
|
Stilett0 has joined #archiveteam |
04:45
🔗
|
|
Stilett0 is now known as Stiletto |
05:08
🔗
|
|
MMovie2 has quit IRC (Ping timeout: 600 seconds) |
05:12
🔗
|
|
MMovie has joined #archiveteam |
05:25
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
05:36
🔗
|
|
ZexaronS has joined #archiveteam |
06:13
🔗
|
Nemo_bis |
SketchCow splendid appearance in http://blog.archive.org/2017/10/13/the-20th-century-time-machine/ :) |
06:15
🔗
|
|
jrwr has joined #archiveteam |
06:17
🔗
|
Nemo_bis |
wow https://archive.org/details/last20 |
06:17
🔗
|
|
Pixi` has quit IRC (Quit: Pixi`) |
06:17
🔗
|
|
Pixi has joined #archiveteam |
06:37
🔗
|
|
K4k has quit IRC (Ping timeout: 255 seconds) |
06:38
🔗
|
|
K4k has joined #archiveteam |
07:01
🔗
|
|
ZexaronS- has joined #archiveteam |
07:02
🔗
|
|
ZexaronS has quit IRC (Ping timeout: 260 seconds) |
07:15
🔗
|
|
Guest has joined #archiveteam |
07:19
🔗
|
|
Guest has quit IRC (Connection closed) |
07:22
🔗
|
|
Soni has quit IRC (Ping timeout: 272 seconds) |
07:39
🔗
|
|
Soni has joined #archiveteam |
07:42
🔗
|
|
atomotic has joined #archiveteam |
07:43
🔗
|
|
hive-mind has quit IRC (Remote host closed the connection) |
07:50
🔗
|
|
hive-mind has joined #archiveteam |
07:50
🔗
|
|
Honno has joined #archiveteam |
09:00
🔗
|
|
Jonison has joined #archiveteam |
09:00
🔗
|
|
pizzaiolo has quit IRC (Quit: pizzaiolo) |
09:32
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
09:37
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
09:40
🔗
|
|
atomotic has joined #archiveteam |
10:01
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
10:02
🔗
|
|
Mateon1 has joined #archiveteam |
10:29
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:47
🔗
|
|
schbirid2 has joined #archiveteam |
10:50
🔗
|
|
username1 has quit IRC (Read error: Operation timed out) |
11:05
🔗
|
|
icedice has joined #archiveteam |
11:06
🔗
|
|
Valentine has joined #archiveteam |
11:14
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
11:51
🔗
|
|
atomotic has joined #archiveteam |
11:58
🔗
|
|
c0mpass has joined #archiveteam |
11:59
🔗
|
c0mpass |
I have a question: if I run the warrior on my dedicated server, nothing illegal is going through it, right? |
12:01
🔗
|
JAA |
Define: "illegal" |
12:02
🔗
|
c0mpass |
Uhhh |
12:02
🔗
|
c0mpass |
You know what I mean |
12:02
🔗
|
JAA |
Strictly speaking, almost everything we archive is protected by copyright, and in some jurisdictions, laws regarding unauthorised access to computer systems might apply. |
12:02
🔗
|
joepie91 |
aside from that, many archived sites are user content sites |
12:02
🔗
|
c0mpass |
Reason why I ask is I have about 40Gbps of available servers |
12:02
🔗
|
joepie91 |
that usually contain technically-illegal content *somewhere* |
12:03
🔗
|
joepie91 |
so the more useful question, I think, is "will I get in trouble for running the warrior" |
12:03
🔗
|
c0mpass |
Yeah |
12:03
🔗
|
c0mpass |
That's basically it |
12:03
🔗
|
joepie91 |
to which my answer would be "usually not, but if you're doing 40gbps, that might change" |
12:03
🔗
|
joepie91 |
mostly because at 40gbps the site owners start going "wtf?" :) |
12:03
🔗
|
c0mpass |
Lmao |
12:03
🔗
|
joepie91 |
c0mpass: depending on the amount of storage space, you may be better off running an rsync target |
12:03
🔗
|
joepie91 |
ie. a collection server where warriors send their archived data to |
12:04
🔗
|
joepie91 |
before it ends up in the Internet Archive |
12:04
🔗
|
c0mpass |
All low-storage NVMe servers. |
12:04
🔗
|
joepie91 |
ah, crap |
12:04
🔗
|
c0mpass |
Work gives me 4 10Gbps servers free as a perk |
12:04
🔗
|
JAA |
It's also worth mentioning that the warrior projects are usually rate limited, so you wouldn't actually fire 40 Gbit/s at the targets. |
12:04
🔗
|
c0mpass |
I just don't want to get fired because of downloading illegal stuff on them |
12:04
🔗
|
joepie91 |
c0mpass: so I'd say that it's probably safe to run the warrior (I don't think anybody's ever gotten in trouble for it? automated IP bans at worst), but I wouldn't try to do so at 40gbps |
12:05
🔗
|
joepie91 |
basically, make it not come across as an attack |
12:05
🔗
|
c0mpass |
I mean I could throttle it to gigabit |
12:05
🔗
|
c0mpass |
even 500 meg |
12:05
🔗
|
joepie91 |
yeah, you'd probably want to throttle to way less |
12:05
🔗
|
joepie91 |
500mbps is probably the upper bound of what you can get away with before site owners start asking questions |
12:05
🔗
|
joepie91 |
(ballpark guess, mind) |
12:05
🔗
|
joepie91 |
also depends whether it's all from the same IP range, etc. |
12:05
🔗
|
c0mpass |
I mean if I just do the yahoo answers thing then I should have no issues at 40 |
12:06
🔗
|
JAA |
Yahoo throttles heavily. |
12:06
🔗
|
c0mpass |
IPs are all in the same block |
12:06
🔗
|
joepie91 |
right. then you'd want to maintain one ratelimit for all of them |
12:06
🔗
|
c0mpass |
Second question. |
12:06
🔗
|
joepie91 |
probably safe to run with tens of threads for most projects, especially partaking in the higher-bandwidth ones like video sites |
12:06
🔗
|
c0mpass |
If I were to do this through a VPN |
12:06
🔗
|
joepie91 |
just not the heavily throttled projects :) |
12:07
🔗
|
joepie91 |
c0mpass: it's generally discouraged to run warriors on anything other than a direct uncensored pipe to the internet, because there are too many factors in between that could corrupt the data |
12:07
🔗
|
joepie91 |
provider cockups, block pages, etc. |
12:07
🔗
|
c0mpass |
that's what I thought |
12:08
🔗
|
joepie91 |
even adding a VPN would basically double the amount of parties that could be messing up the responses :P |
12:08
🔗
|
c0mpass |
Hi BartoCH |
12:08
🔗
|
BartoCH |
hullo |
12:08
🔗
|
c0mpass |
Yeah figured. |
12:08
🔗
|
c0mpass |
BartoCH: yes. |
12:08
🔗
|
BartoCH |
hrhr |
12:08
🔗
|
c0mpass |
Okay well I'll set this up on one server and see how it goes |
12:10
🔗
|
joepie91 |
c0mpass: hm, only just realized we're in #archiveteam. if you have further questions, prefer to switch to #archiveteam-bs as this channel is mostly for low-noise announcements and "oh no this site is dying, did you hear" type messages :) |
12:10
🔗
|
c0mpass |
Ohhh so sorry |
13:00
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
13:24
🔗
|
Valentine |
hi all, can I get the Archive Team's help to save the news site AsiaOne, which might shut down as early as next month? https://sg.news.yahoo.com/sph-news-aggregator-site-asiaone-close-090754309.html |
13:30
🔗
|
JAA |
I'll throw it into ArchiveBot. Because there is a huge queue currently, I'm not sure if it will be grabbed in time, but let's try... |
13:36
🔗
|
JAA |
Note, that won't grab everything, e.g. no videos (I think). |
14:12
🔗
|
Valentine |
that's fine, thanks! |
14:13
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
14:57
🔗
|
|
godane has joined #archiveteam |
15:06
🔗
|
|
klapperst has joined #archiveteam |
15:07
🔗
|
klapperst |
hi |
15:13
🔗
|
|
Jonison has quit IRC (Read error: Connection reset by peer) |
15:20
🔗
|
|
ZexaronS- has quit IRC (Quit: Leaving) |
15:32
🔗
|
|
atomotic has joined #archiveteam |
15:38
🔗
|
|
schbirid2 has quit IRC (Quit: Leaving) |
15:48
🔗
|
|
Xe has quit IRC (Max SendQ exceeded) |
15:52
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
16:02
🔗
|
|
Xe has joined #archiveteam |
16:06
🔗
|
|
icedice has joined #archiveteam |
16:07
🔗
|
|
klapperst has quit IRC (Quit: Page closed) |
16:34
🔗
|
|
schbirid has joined #archiveteam |
16:41
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
16:42
🔗
|
|
ZexaronS has joined #archiveteam |
17:24
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
17:47
🔗
|
|
Starholme has joined #archiveteam |
17:55
🔗
|
|
kepler45 has joined #archiveteam |
17:59
🔗
|
|
bRick5772 has joined #archiveteam |
18:41
🔗
|
|
icedice has quit IRC (Read error: Connection reset by peer) |
18:42
🔗
|
|
icedice has joined #archiveteam |
19:07
🔗
|
|
kris33 has joined #archiveteam |
19:25
🔗
|
|
atrocity has quit IRC (Read error: Operation timed out) |
19:36
🔗
|
|
kris33 has quit IRC (Textual IRC Client: www.textualapp.com) |
20:44
🔗
|
atomicthu |
https://scrapinghub.com/platform this looks extremely useful if a bit expensive |
20:47
🔗
|
joepie91 |
atomicthu: that looks fantastically proprietary :) |
20:47
🔗
|
atomicthu |
yep |
20:47
🔗
|
atomicthu |
not a thing you can download, just a serve |
20:47
🔗
|
atomicthu |
*service |
20:47
🔗
|
atomicthu |
since it's 2017 and Gotta Make Mad VC Cash yo |
20:47
🔗
|
joepie91 |
heh |
20:47
🔗
|
joepie91 |
"NPM package as a service" |
20:48
🔗
|
joepie91 |
more seriously; probably not useful for archiveteam |
20:48
🔗
|
joepie91 |
due to its proprietary nature |
21:01
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
21:09
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
21:29
🔗
|
|
Honno has joined #archiveteam |
21:42
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:47
🔗
|
|
MMovie has joined #archiveteam |
21:58
🔗
|
|
MMovie2 has joined #archiveteam |
22:02
🔗
|
|
Valentin- has joined #archiveteam |
22:02
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
22:03
🔗
|
|
Valentine has quit IRC (Ping timeout: 506 seconds) |
22:06
🔗
|
|
MMovie has joined #archiveteam |
22:10
🔗
|
|
MMovie2 has quit IRC (Read error: Operation timed out) |
22:21
🔗
|
|
underscor has quit IRC (Quit: No Ping reply in 180 seconds.) |
22:22
🔗
|
|
underscor has joined #archiveteam |
22:22
🔗
|
|
swebb sets mode: +o underscor |
22:26
🔗
|
atomicthu |
joepie91: i was more looking at the "crawlera" part since it works as a proxy |
22:26
🔗
|
atomicthu |
might be useful for sites that limit bandwidth per-IP |
23:00
🔗
|
|
Starholme has quit IRC (Quit: Page closed) |
23:12
🔗
|
|
dashcloud has joined #archiveteam |
23:21
🔗
|
|
kepler45 has quit IRC (Quit: Leaving) |
23:27
🔗
|
|
MMovie2 has joined #archiveteam |
23:28
🔗
|
|
Gfy has quit IRC (Read error: Operation timed out) |
23:28
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
23:31
🔗
|
|
BlueMaxim has joined #archiveteam |
23:32
🔗
|
|
Gfy has joined #archiveteam |
23:39
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:55
🔗
|
|
PotcFdk has quit IRC (~'o'/) |
23:59
🔗
|
|
MMovie has joined #archiveteam |