[00:00] *** Rondom has quit IRC (Remote host closed the connection)
[00:01] *** Rondom has joined #archiveteam
[00:03] *** Soni has quit IRC (Read error: Operation timed out)
[00:15] *** dboard2 is now known as dboard
[00:59] *** Laverne has quit IRC (Read error: Operation timed out)
[01:24] *** Laverne has joined #archiveteam
[01:47] *** username1 has joined #archiveteam
[01:50] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:56] *** Stilett0 has quit IRC (Ping timeout: 255 seconds)
[02:46] *** Stilett0 has joined #archiveteam
[03:10] *** Soni has joined #archiveteam
[03:24] *** qw3rty5 has joined #archiveteam
[03:28] *** qw3rty4 has quit IRC (Read error: Operation timed out)
[03:34] ----------------------------------
[03:34] All Archiveteam Programmy Nerds
[03:34] Your presence is requested
[03:34] In #last20
[03:34] ----------------------------------
[04:03] *** jrwr has quit IRC (Max SendQ exceeded)
[04:16] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:23] *** ZexaronS has joined #archiveteam
[04:23] *** Sk1d has joined #archiveteam
[04:43] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[04:44] *** Stilett0 has joined #archiveteam
[04:45] *** Stilett0 is now known as Stiletto
[05:08] *** MMovie2 has quit IRC (Ping timeout: 600 seconds)
[05:12] *** MMovie has joined #archiveteam
[05:25] *** ZexaronS has quit IRC (Quit: Leaving)
[05:36] *** ZexaronS has joined #archiveteam
[06:13] SketchCow splendid appearance in http://blog.archive.org/2017/10/13/the-20th-century-time-machine/ :)
[06:15] *** jrwr has joined #archiveteam
[06:17] wow https://archive.org/details/last20
[06:17] *** Pixi` has quit IRC (Quit: Pixi`)
[06:17] *** Pixi has joined #archiveteam
[06:37] *** K4k has quit IRC (Ping timeout: 255 seconds)
[06:38] *** K4k has joined #archiveteam
[07:01] *** ZexaronS- has joined #archiveteam
[07:02] *** ZexaronS has quit IRC (Ping timeout: 260 seconds)
[07:15] *** Guest has joined #archiveteam
[07:19] *** Guest has quit IRC (Connection closed)
[07:22] *** Soni has quit IRC (Ping timeout: 272 seconds)
[07:39] *** Soni has joined #archiveteam
[07:42] *** atomotic has joined #archiveteam
[07:43] *** hive-mind has quit IRC (Remote host closed the connection)
[07:50] *** hive-mind has joined #archiveteam
[07:50] *** Honno has joined #archiveteam
[09:00] *** Jonison has joined #archiveteam
[09:00] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[09:32] *** Honno has quit IRC (Read error: Operation timed out)
[09:37] *** atomotic has quit IRC (Quit: atomotic)
[09:40] *** atomotic has joined #archiveteam
[10:01] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:02] *** Mateon1 has joined #archiveteam
[10:29] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:47] *** schbirid2 has joined #archiveteam
[10:50] *** username1 has quit IRC (Read error: Operation timed out)
[11:05] *** icedice has joined #archiveteam
[11:06] *** Valentine has joined #archiveteam
[11:14] *** atomotic has quit IRC (Quit: atomotic)
[11:51] *** atomotic has joined #archiveteam
[11:58] *** c0mpass has joined #archiveteam
[11:59] I have a question: if I run the warrior on my dedicated, nothing illegal is going through it, right?
[12:01] Define: "illegal"
[12:02] Uhhh
[12:02] You know what I mean
[12:02] Strictly speaking, almost everything we archive is protected by copyright, and in some jurisdictions, laws regarding unauthorised access to computer systems might apply.
[12:02] aside from that, many archived sites are user content sites
[12:02] Reason why I ask is I have about 40Gbps of available servers
[12:02] that usually contain technically-illegal content *somewhere*
[12:03] so the more useful question, I think, is "will I get in trouble for running the warrior"
[12:03] Yeah
[12:03] That's basically it
[12:03] to which my answer would be "usually not, but if you're doing 40gbps, that might change"
[12:03] mostly because at 40gbps the site owners start going "wtf?" :)
[12:03] Lmao
[12:03] c0mpass: depending on the amount of storage space, you may be better off running an rsync target
[12:03] i.e. a collection server where warriors send their archived data to
[12:04] before it ends up in the Internet Archive
[12:04] All low storage, NVMe servers.
[12:04] ah, crap
[12:04] Work gives me 4 10Gbps servers free as a perk
[12:04] It's also worth mentioning that the warrior projects are usually rate limited, so you wouldn't actually fire 40 Gbit/s at the targets.
[12:04] I just don't want to get fired because of downloading illegal stuff on them
[12:04] c0mpass: so I'd say that it's probably safe to run the warrior (I don't think anybody's ever gotten in trouble for it? automated IP bans at worst), but I wouldn't try to do so at 40gbps
[12:05] basically, make it not come across as an attack
[12:05] I mean I could throttle it to gigabit
[12:05] even 500 meg
[12:05] yeah, you'd probably want to throttle to way less
[12:05] 500mbps is probably the upper bound of what you can get away with before site owners start asking questions
[12:05] (ballpark guess, mind)
[12:05] also depends whether it's all from the same IP range, etc.
[12:05] I mean if I just do the yahoo answers thing then I should have no issues at 40
[12:06] Yahoo throttles heavily.
[12:06] IPs are all in the same block
[12:06] right. then you'd want to maintain one ratelimit for all of them
[12:06] Second question.
[12:06] probably safe to run with tens of threads for most projects, especially partaking in the higher-bandwidth ones like video sites
[12:06] If I were to do this through a VPN
[12:06] just not the heavily throttled projects :)
[12:07] c0mpass: it's generally discouraged to run warriors on anything other than a direct uncensored pipe to the internet, because there are too many factors in between that could corrupt the data
[12:07] provider cockups, block pages, etc.
[12:07] that's what I thought
[12:08] even adding a VPN would basically double the amount of parties that could be messing up the responses :P
[12:08] Hi BartoCH
[12:08] hullo
[12:08] Yeah figured.
[12:08] BartoCH: yes.
[12:08] hrhr
[12:08] Okay well I'll set this up on one server and see how it goes
[12:10] c0mpass: hm, only just realized we're in #archiveteam. if you have further questions, please switch to #archiveteam-bs, as this channel is mostly for low-noise announcements and "oh no this site is dying, did you hear" type messages :)
[12:10] Ohhh so sorry
[13:00] *** atomotic has quit IRC (Quit: atomotic)
[13:24] hi all, can I get the Archive Team's help to save the news site AsiaOne, which might shut down as early as next month? https://sg.news.yahoo.com/sph-news-aggregator-site-asiaone-close-090754309.html
[13:30] I'll throw it into ArchiveBot. Because there is a huge queue currently, I'm not sure if it will be grabbed in time, but let's try...
[13:36] Note that it won't grab everything, e.g. no videos (I think).
[14:12] that's fine, thanks!
[14:13] *** godane has quit IRC (Quit: Leaving.)
[14:57] *** godane has joined #archiveteam
[15:06] *** klapperst has joined #archiveteam
[15:07] hi
[15:13] *** Jonison has quit IRC (Read error: Connection reset by peer)
[15:20] *** ZexaronS- has quit IRC (Quit: Leaving)
[15:32] *** atomotic has joined #archiveteam
[15:38] *** schbirid2 has quit IRC (Quit: Leaving)
[15:48] *** Xe has quit IRC (Max SendQ exceeded)
[15:52] *** icedice has quit IRC (Quit: Leaving)
[16:02] *** Xe has joined #archiveteam
[16:06] *** icedice has joined #archiveteam
[16:07] *** klapperst has quit IRC (Quit: Page closed)
[16:34] *** schbirid has joined #archiveteam
[16:41] *** atomotic has quit IRC (Quit: atomotic)
[16:42] *** ZexaronS has joined #archiveteam
[17:24] *** ZexaronS has quit IRC (Quit: Leaving)
[17:47] *** Starholme has joined #archiveteam
[17:55] *** kepler45 has joined #archiveteam
[17:59] *** bRick5772 has joined #archiveteam
[18:41] *** icedice has quit IRC (Read error: Connection reset by peer)
[18:42] *** icedice has joined #archiveteam
[19:07] *** kris33 has joined #archiveteam
[19:25] *** atrocity has quit IRC (Read error: Operation timed out)
[19:36] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[20:44] https://scrapinghub.com/platform this looks extremely useful if a bit expensive
[20:47] atomicthu: that looks fantastically proprietary :)
[20:47] yep
[20:47] not a thing you can download, just a serve
[20:47] *service
[20:47] since it's 2017 and Gotta Make Mad VC Cash yo
[20:47] heh
[20:47] "NPM package as a service"
[20:48] more seriously; probably not useful for archiveteam
[20:48] due to its proprietary nature
[21:01] *** MMovie has quit IRC (Read error: Operation timed out)
[21:09] *** bRick5772 has quit IRC (Quit: Leaving.)
[21:29] *** Honno has joined #archiveteam
[21:42] *** schbirid has quit IRC (Quit: Leaving)
[21:47] *** MMovie has joined #archiveteam
[21:58] *** MMovie2 has joined #archiveteam
[22:02] *** Valentin- has joined #archiveteam
[22:02] *** MMovie has quit IRC (Read error: Operation timed out)
[22:03] *** Valentine has quit IRC (Ping timeout: 506 seconds)
[22:06] *** MMovie has joined #archiveteam
[22:10] *** MMovie2 has quit IRC (Read error: Operation timed out)
[22:21] *** underscor has quit IRC (Quit: No Ping reply in 180 seconds.)
[22:22] *** underscor has joined #archiveteam
[22:22] *** swebb sets mode: +o underscor
[22:26] joepie91: i was more looking at the "crawlera" part since it works as a proxy
[22:26] might be useful for sites that limit bandwidth per-IP
[23:00] *** Starholme has quit IRC (Quit: Page closed)
[23:12] *** dashcloud has joined #archiveteam
[23:21] *** kepler45 has quit IRC (Quit: Leaving)
[23:27] *** MMovie2 has joined #archiveteam
[23:28] *** Gfy has quit IRC (Read error: Operation timed out)
[23:28] *** MMovie has quit IRC (Read error: Operation timed out)
[23:31] *** BlueMaxim has joined #archiveteam
[23:32] *** Gfy has joined #archiveteam
[23:39] *** Honno has quit IRC (Read error: Operation timed out)
[23:55] *** PotcFdk has quit IRC (~'o'/)
[23:59] *** MMovie has joined #archiveteam