| Time | Nickname | Message |
| --- | --- | --- |
| 00:00 | | Rondom has quit IRC (Remote host closed the connection) |
| 00:01 | | Rondom has joined #archiveteam |
| 00:03 | | Soni has quit IRC (Read error: Operation timed out) |
| 00:15 | | dboard2 is now known as dboard |
| 00:59 | | Laverne has quit IRC (Read error: Operation timed out) |
| 01:24 | | Laverne has joined #archiveteam |
| 01:47 | | username1 has joined #archiveteam |
| 01:50 | | schbirid2 has quit IRC (Read error: Operation timed out) |
| 01:56 | | Stilett0 has quit IRC (Ping timeout: 255 seconds) |
| 02:46 | | Stilett0 has joined #archiveteam |
| 03:10 | | Soni has joined #archiveteam |
| 03:24 | | qw3rty5 has joined #archiveteam |
| 03:28 | | qw3rty4 has quit IRC (Read error: Operation timed out) |
| 03:34 | SketchCow | ---------------------------------- |
| 03:34 | SketchCow | All Archiveteam Programmy Nerds |
| 03:34 | SketchCow | Your presence is requested |
| 03:34 | SketchCow | In #last20 |
| 03:34 | SketchCow | ---------------------------------- |
| 04:03 | | jrwr has quit IRC (Max SendQ exceeded) |
| 04:16 | | Sk1d has quit IRC (Ping timeout: 250 seconds) |
| 04:23 | | ZexaronS has joined #archiveteam |
| 04:23 | | Sk1d has joined #archiveteam |
| 04:43 | | Stilett0 has quit IRC (Read error: Connection reset by peer) |
| 04:44 | | Stilett0 has joined #archiveteam |
| 04:45 | | Stilett0 is now known as Stiletto |
| 05:08 | | MMovie2 has quit IRC (Ping timeout: 600 seconds) |
| 05:12 | | MMovie has joined #archiveteam |
| 05:25 | | ZexaronS has quit IRC (Quit: Leaving) |
| 05:36 | | ZexaronS has joined #archiveteam |
| 06:13 | Nemo_bis | SketchCow splendid appearance in http://blog.archive.org/2017/10/13/the-20th-century-time-machine/ :) |
| 06:15 | | jrwr has joined #archiveteam |
| 06:17 | Nemo_bis | wow https://archive.org/details/last20 |
| 06:17 | | Pixi` has quit IRC (Quit: Pixi`) |
| 06:17 | | Pixi has joined #archiveteam |
| 06:37 | | K4k has quit IRC (Ping timeout: 255 seconds) |
| 06:38 | | K4k has joined #archiveteam |
| 07:01 | | ZexaronS- has joined #archiveteam |
| 07:02 | | ZexaronS has quit IRC (Ping timeout: 260 seconds) |
| 07:15 | | Guest has joined #archiveteam |
| 07:19 | | Guest has quit IRC (Connection closed) |
| 07:22 | | Soni has quit IRC (Ping timeout: 272 seconds) |
| 07:39 | | Soni has joined #archiveteam |
| 07:42 | | atomotic has joined #archiveteam |
| 07:43 | | hive-mind has quit IRC (Remote host closed the connection) |
| 07:50 | | hive-mind has joined #archiveteam |
| 07:50 | | Honno has joined #archiveteam |
| 09:00 | | Jonison has joined #archiveteam |
| 09:00 | | pizzaiolo has quit IRC (Quit: pizzaiolo) |
| 09:32 | | Honno has quit IRC (Read error: Operation timed out) |
| 09:37 | | atomotic has quit IRC (Quit: atomotic) |
| 09:40 | | atomotic has joined #archiveteam |
| 10:01 | | Mateon1 has quit IRC (Read error: Operation timed out) |
| 10:02 | | Mateon1 has joined #archiveteam |
| 10:29 | | BlueMaxim has quit IRC (Quit: Leaving) |
| 10:47 | | schbirid2 has joined #archiveteam |
| 10:50 | | username1 has quit IRC (Read error: Operation timed out) |
| 11:05 | | icedice has joined #archiveteam |
| 11:06 | | Valentine has joined #archiveteam |
| 11:14 | | atomotic has quit IRC (Quit: atomotic) |
| 11:51 | | atomotic has joined #archiveteam |
| 11:58 | | c0mpass has joined #archiveteam |
| 11:59 | c0mpass | I have a question, If I run the warrior on my dedicated, nothing illegal is going through it right? |
| 12:01 | JAA | Define: "illegal" |
| 12:02 | c0mpass | Uhhh |
| 12:02 | c0mpass | You know what I mean |
| 12:02 | JAA | Strictly speaking, almost everything we archive is protected by copyright, and in some jurisdictions, laws regarding unauthorised access to computer systems might apply. |
| 12:02 | joepie91 | aside from that, many archived sites are user content sites |
| 12:02 | c0mpass | Reason why I ask is I have about 40Gbps of available servers |
| 12:02 | joepie91 | that usually contain technically-illegal content *somewhere* |
| 12:03 | joepie91 | so the more useful question, I think, is "will I get in trouble for running the warrior" |
| 12:03 | c0mpass | Yeah |
| 12:03 | c0mpass | That's basically it |
| 12:03 | joepie91 | to which my answer would be "usually not, but if you're doing 40gbps, that might change" |
| 12:03 | joepie91 | mostly because at 40gbps the site owners start going "wtf?" :) |
| 12:03 | c0mpass | Lmao |
| 12:03 | joepie91 | c0mpass: depending on the amount of storage space, you may be better off running an rsync target |
| 12:03 | joepie91 | ie. a collection server where warriors send their archived data to |
| 12:04 | joepie91 | before it ends up in the Internet Archive |
| 12:04 | c0mpass | All low storage, NVME servers. |
| 12:04 | joepie91 | ah, crap |
| 12:04 | c0mpass | Work gives me 4 10Gbps servers free as a perk |
| 12:04 | JAA | It's also worth mentioning that the warrior projects are usually rate limited, so you wouldn't actually fire 40 Gbit/s at the targets. |
| 12:04 | c0mpass | I just don't want to get fired because of downloading illegal stuff on them |
| 12:04 | joepie91 | c0mpass: so I'd say that it's probably safe to run the warrior (I don't think anybody's ever gotten in trouble for it? automated IP bans at worst), but I wouldn't try to do so at 40gbps |
| 12:05 | joepie91 | basically, make it not come across as an attack |
| 12:05 | c0mpass | I mean I could throttle it to gigabit |
| 12:05 | c0mpass | even 500 meg |
| 12:05 | joepie91 | yeah, you'd probably want to throttle to way less |
| 12:05 | joepie91 | 500mbps is probably the upper bound of what you can get away with before site owners start asking questions |
| 12:05 | joepie91 | (ballpark guess, mind) |
| 12:05 | joepie91 | also depends whether it's all from the same IP range, etc. |
| 12:05 | c0mpass | I mean if I just do the yahoo answers thing then I should have no issues at 40 |
| 12:06 | JAA | Yahoo throttles heavily. |
| 12:06 | c0mpass | IPs are all in the same block |
| 12:06 | joepie91 | right. then you'd want to maintain one ratelimit for all of them |
| 12:06 | c0mpass | Second question. |
| 12:06 | joepie91 | probably safe to run with tens of threads for most projects, especially partaking in the higher-bandwidth ones like video sites |
| 12:06 | c0mpass | If I were to do this through a VPN |
| 12:06 | joepie91 | just not the heavily throttled projects :) |
| 12:07 | joepie91 | c0mpass: it's generally discouraged to run warriors on anything other than a direct uncensored pipe to the internet, because there are too many factors in between that could corrupt the data |
| 12:07 | joepie91 | provider cockups, block pages, etc. |
| 12:07 | c0mpass | that's what I thought |
| 12:08 | joepie91 | even adding a VPN would basically double the amount of parties that could be messing up the responses :P |
| 12:08 | c0mpass | Hi BartoCH |
| 12:08 | BartoCH | hullo |
| 12:08 | c0mpass | Yeah figured. |
| 12:08 | c0mpass | BartoCH: yes. |
| 12:08 | BartoCH | hrhr |
| 12:08 | c0mpass | Okay well I'll set this up on one server and see how it goes |
| 12:10 | joepie91 | c0mpass: hm, only just realized we're in #archiveteam. if you have further questions, prefer to switch to #archiveteam-bs as this channel is mostly for low-noise announcements and "oh no this site is dying, did you hear" type messages :) |
| 12:10 | c0mpass | Ohhh so sorry |
| 13:00 | | atomotic has quit IRC (Quit: atomotic) |
| 13:24 | Valentine | hi all, can I get the Archive Team's help to save the news site AsiaOne, which might shut down as early as next month? https://sg.news.yahoo.com/sph-news-aggregator-site-asiaone-close-090754309.html |
| 13:30 | JAA | I'll throw it into ArchiveBot. Because there is a huge queue currently, I'm not sure if it will be grabbed in time, but let's try... |
| 13:36 | JAA | Note, that won't grab everything, e.g. no videos (I think). |
| 14:12 | Valentine | that's fine, thanks! |
| 14:13 | | godane has quit IRC (Quit: Leaving.) |
| 14:57 | | godane has joined #archiveteam |
| 15:06 | | klapperst has joined #archiveteam |
| 15:07 | klapperst | hi |
| 15:13 | | Jonison has quit IRC (Read error: Connection reset by peer) |
| 15:20 | | ZexaronS- has quit IRC (Quit: Leaving) |
| 15:32 | | atomotic has joined #archiveteam |
| 15:38 | | schbirid2 has quit IRC (Quit: Leaving) |
| 15:48 | | Xe has quit IRC (Max SendQ exceeded) |
| 15:52 | | icedice has quit IRC (Quit: Leaving) |
| 16:02 | | Xe has joined #archiveteam |
| 16:06 | | icedice has joined #archiveteam |
| 16:07 | | klapperst has quit IRC (Quit: Page closed) |
| 16:34 | | schbirid has joined #archiveteam |
| 16:41 | | atomotic has quit IRC (Quit: atomotic) |
| 16:42 | | ZexaronS has joined #archiveteam |
| 17:24 | | ZexaronS has quit IRC (Quit: Leaving) |
| 17:47 | | Starholme has joined #archiveteam |
| 17:55 | | kepler45 has joined #archiveteam |
| 17:59 | | bRick5772 has joined #archiveteam |
| 18:41 | | icedice has quit IRC (Read error: Connection reset by peer) |
| 18:42 | | icedice has joined #archiveteam |
| 19:07 | | kris33 has joined #archiveteam |
| 19:25 | | atrocity has quit IRC (Read error: Operation timed out) |
| 19:36 | | kris33 has quit IRC (Textual IRC Client: www.textualapp.com) |
| 20:44 | atomicthu | https://scrapinghub.com/platform this looks extremely useful if a bit expensive |
| 20:47 | joepie91 | atomicthu: that looks fantastically proprietary :) |
| 20:47 | atomicthu | yep |
| 20:47 | atomicthu | not a thing you can download, just a serve |
| 20:47 | atomicthu | *service |
| 20:47 | atomicthu | since it's 2017 and Gotta Make Mad VC Cash yo |
| 20:47 | joepie91 | heh |
| 20:47 | joepie91 | "NPM package as a service" |
| 20:48 | joepie91 | more seriously; probably not useful for archiveteam |
| 20:48 | joepie91 | due to its proprietary nature |
| 21:01 | | MMovie has quit IRC (Read error: Operation timed out) |
| 21:09 | | bRick5772 has quit IRC (Quit: Leaving.) |
| 21:29 | | Honno has joined #archiveteam |
| 21:42 | | schbirid has quit IRC (Quit: Leaving) |
| 21:47 | | MMovie has joined #archiveteam |
| 21:58 | | MMovie2 has joined #archiveteam |
| 22:02 | | Valentin- has joined #archiveteam |
| 22:02 | | MMovie has quit IRC (Read error: Operation timed out) |
| 22:03 | | Valentine has quit IRC (Ping timeout: 506 seconds) |
| 22:06 | | MMovie has joined #archiveteam |
| 22:10 | | MMovie2 has quit IRC (Read error: Operation timed out) |
| 22:21 | | underscor has quit IRC (Quit: No Ping reply in 180 seconds.) |
| 22:22 | | underscor has joined #archiveteam |
| 22:22 | | swebb sets mode: +o underscor |
| 22:26 | atomicthu | joepie91: i was more looking at the "crawlera" part since it works as a proxy |
| 22:26 | atomicthu | might be useful for sites that limit bandwidth per-IP |
| 23:00 | | Starholme has quit IRC (Quit: Page closed) |
| 23:12 | | dashcloud has joined #archiveteam |
| 23:21 | | kepler45 has quit IRC (Quit: Leaving) |
| 23:27 | | MMovie2 has joined #archiveteam |
| 23:28 | | Gfy has quit IRC (Read error: Operation timed out) |
| 23:28 | | MMovie has quit IRC (Read error: Operation timed out) |
| 23:31 | | BlueMaxim has joined #archiveteam |
| 23:32 | | Gfy has joined #archiveteam |
| 23:39 | | Honno has quit IRC (Read error: Operation timed out) |
| 23:55 | | PotcFdk has quit IRC (~'o'/) |
| 23:59 | | MMovie has joined #archiveteam |