[00:02] *** BlueMaxim has joined #archiveteam
[00:54] *** SimpBrain has quit IRC (Read error: Operation timed out)
[01:01] *** xk_id has joined #archiveteam
[01:02] *** godane has quit IRC (Read error: Operation timed out)
[01:23] *** SimpBrain has joined #archiveteam
[01:31] *** Ghost_of_ has quit IRC (Quit: Leaving)
[01:38] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[01:53] *** zenguy_pc has joined #archiveteam
[02:20] *** bwn has quit IRC (Read error: Operation timed out)
[02:34] *** JesseW has joined #archiveteam
[02:35] *** Ravenloft has quit IRC (Remote host closed the connection)
[02:47] *** primus104 has quit IRC (Leaving.)
[03:13] *** JesseW has quit IRC (Leaving.)
[03:41] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[03:52] *** wyatt8750 has quit IRC (Remote host closed the connection)
[03:52] *** zenguy_pc has joined #archiveteam
[03:53] *** wyatt8750 has joined #archiveteam
[04:00] *** bwn has joined #archiveteam
[04:10] *** dashcloud has quit IRC (Ping timeout: 252 seconds)
[04:14] *** dashcloud has joined #archiveteam
[04:38] *** aaaaaaaaa has quit IRC (Leaving)
[04:43] *** JesseW has joined #archiveteam
[05:42] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[05:53] *** zenguy_pc has joined #archiveteam
[06:02] *** WinterFox has joined #archiveteam
[06:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:11] *** dashcloud has joined #archiveteam
[06:42] *** cvb has joined #archiveteam
[07:04] *** godane has joined #archiveteam
[07:12] *** insane_al has quit IRC (Remote host closed the connection)
[07:40] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[07:46] *** primus104 has joined #archiveteam
[07:54] *** zenguy_pc has joined #archiveteam
[07:56] *** zer0c00l has joined #archiveteam
[08:47] *** Cameron_D has joined #archiveteam
[08:52] *** JesseW has quit IRC (Leaving.)
[08:54] *** bzc6p_ has joined #archiveteam
[08:57] *** bzc6p has quit IRC (Read error: Operation timed out)
[08:57] *** xk_id has quit IRC (Remote host closed the connection)
[09:05] *** bwn has quit IRC (Read error: Operation timed out)
[09:30] *** atomotic has joined #archiveteam
[09:31] *** xk_id has joined #archiveteam
[09:32] *** Ghost_of_ has joined #archiveteam
[09:36] *** schbirid has joined #archiveteam
[09:36] New items added to the wiki grab!
[09:37] I'll do my best to get the yuku project running with new items today
[09:39] *** primus104 has quit IRC (Leaving.)
[09:39] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[09:52] *** bwn has joined #archiveteam
[10:00] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[10:47] *** ohhdemgir has quit IRC (Ping timeout: 252 seconds)
[11:25] spinning up some wiki pipelines, been a while since i've thrown some power around
[11:46] *** primus104 has joined #archiveteam
[11:49] *** Ungstein has quit IRC (Quit: Leaving.)
[11:50] *** Ungstein has joined #archiveteam
[12:00] I redirected my juice to wikis-grab too
[12:06] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:06] *** atomotic has joined #archiveteam
[12:28] *** WinterFox has quit IRC (Remote host closed the connection)
[12:48] *** Ghost_of_ has quit IRC (Quit: Leaving)
[13:56] *** Darkstar has quit IRC (Ping timeout: 506 seconds)
[14:05] *** Darkstar has joined #archiveteam
[14:32] *** scyther has joined #archiveteam
[14:35] The new IA search engine is working better than expected, and collections (when I move them) show up within a couple minutes now.
[14:40] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:45] *** Deewiant has quit IRC (Ping timeout: 198 seconds)
[14:46] *** Atluxity has quit IRC (Ping timeout: 370 seconds)
[14:47] *** Deewiant has joined #archiveteam
[14:47] *** Atluxity has joined #archiveteam
[14:49] nice
[14:50] *** nertzy has joined #archiveteam
[14:54] cool
[15:08] woot
[15:08] Yeah, it's blazing. I'm throwing a lot of shit at it.
[15:15] *** Start has quit IRC (Quit: Disconnected.)
[15:15] I've got a wiki that I'm grabbing external items for that's got 540K pages. This should be a fun grab.
[15:16] My entire tmux backlog for that page is "Found and queued 500 URLs, continuing..."
[15:16] wow
[15:18] *** primus104 has quit IRC (Leaving.)
[15:25] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[15:29] phuzion: which wiki is that?
[15:29] or which item?
[15:31] *** xk_id has quit IRC (Remote host closed the connection)
[15:34] arkiver: http://lt.biologija.wikia.com/wiki/
[15:34] 583K pages, rather
[15:34] so many external links.... http://lt.biologija.wikia.com/wiki/Ragys
[15:35] Hooooooooly shit ok maybe I won't be able to handle that item.
[15:36] if you have enough time and enough space, why not?
[15:36] 100GB disk on that box.
[15:36] I could easily see the WARC being 100GB+ for that.
[15:36] We'll see I guess
[15:37] Yeah, I'll keep an eye on my disk usage for a day or two and see how it goes.
[15:37] ok
[15:42] *** vitzli has joined #archiveteam
[15:54] *** Start has joined #archiveteam
[16:05] *** Start has quit IRC (Read error: Operation timed out)
[16:11] https://archive.org/details/pris-the-world
[16:12] godane: nice
[16:29] *** Start has joined #archiveteam
[16:31] *** scyther has quit IRC (Leaving)
[16:38] *** JesseW has joined #archiveteam
[16:40] *** bithippo has joined #archiveteam
[16:41] Hello #archiveteam! Would someone be able to drop http://www.mfat.govt.nz/Treaties-and-International-Law/01-Treaties-for-which-NZ-is-Depositary/0-Trans-Pacific-Partnership-Text.php into archivebot when they have a moment? Thank you!!
[16:42] bithippo: done
[16:43] *** arkiver2 has joined #archiveteam
[16:43] @DFJustin: :bow:
[16:44] Thanks again, ArchiveBot dashboard shows it being done super quick.
[16:45] *** arkiver2 has quit IRC (Client Quit)
[16:57] *** JesseW has quit IRC (Leaving.)
[17:00] *** Start has quit IRC (Quit: Disconnected.)
[17:02] *** bithippo has quit IRC (Quit: Page closed)
[17:14] do we know who "chip" in the wikis-grab is?
[17:14] howdy
[17:14] right
[17:14] you just added a lot of instances to the grab?
[17:15] 20x10
[17:15] 10 concurrent that is
[17:18] *** Start has joined #archiveteam
[17:19] so I am wondering why the items per hour is less now
[17:20] looked like there was a little bit of an rsync bottleneck for a bit
[17:20] ah, ok
[17:29] *** primus104 has joined #archiveteam
[17:29] *** vitzli has quit IRC (Quit: Leaving)
[17:35] yeah sorry, that was caused by me :(
[17:44] Kazzy: how?
[17:44] just trying to understand stuff here
[17:45] rsync host is capped at 25 connections (FOS i think, logs flying too fast for me to check at this minute)
[17:45] sounds right
[17:45] i'm running multiple instances, accidentally had 4 upload slots running for each, ended up eating a ton of the connections
[17:45] aha
[17:49] fos has 4.9 TB free on the gamefront mount, so we might be able to ramp it up again
[17:49] SketchCow's call
[17:49] its load average is also back to "oh that looks ok"
[17:50] *** afics has quit IRC (Read error: Operation timed out)
[17:50] *** Apathy has quit IRC (Read error: Operation timed out)
[17:51] *** cadbury has quit IRC (Read error: Operation timed out)
[17:54] *** matthusby has quit IRC (Ping timeout: 606 seconds)
[17:54] *** matthusby has joined #archiveteam
[17:56] if you need a log file to look at http://54.85.211.51/, and search "rsync error"
[17:56] 25 indeed, thanks achip
[17:58] *** Cameron_D has quit IRC (Ping timeout: 606 seconds)
[18:09] *** Apathy has joined #archiveteam
[18:09] *** afics has joined #archiveteam
[18:10] So, FOS is really getting hammered.
[18:10] Yes, we have 5tb free back on 1
[18:10] We got that by me basically shoving all the gamefront into 0
[18:11] So the thing is really grinding through wikis, two gamefront threads, and a couple ZIP operations for the minecraft drive I took in.
[18:11] *** cadbury has joined #archiveteam
[18:23] *** zenguy_pc has joined #archiveteam
[18:25] *** bwn has quit IRC (Ping timeout: 255 seconds)
[18:27] SketchCow: Do you feel it's best to wait with gamefront, or engage?
[18:33] Does this error mean "FOS has too many open connections" or something else?
[18:33] http://pastebin.com/3Fwpnbze
[18:34] I see it timed out, but is FOS refusing to accept the connection because it's overloaded or is there something else going on with my machine?
[18:38] *** Start has quit IRC (Quit: Disconnected.)
[18:56] Well, I'm mostly trying to bring the thing down to a more reasonable load.
[18:57] Is there something OTHER than gamefront in need of the machine?
[18:57] Otherwise, yeah, turn it on and let's see if it chokes
[18:58] Can someone help me pull http://martyy92.hyves.nl out of our WARCs?
[18:58] SketchCow: everything that is currently going to FOS has no deadline
[18:58] OK, then turn on gamefront
[18:59] Ok, can we also raise the max rsync connections?
[18:59] Just quadrupled it
[19:00] Some projects with a deadline are coming up, like screenr
[19:00] Will pause GameFront for that if needed
[19:00] *** bwn has joined #archiveteam
[19:00] So, for the FTP grab project.
[19:01] If anyone here has some FTPs that need to be saved, please create a list
[19:01] i went after rain wilson twitter account using archivebox
[19:02] also grabbed a google cache of suckington account
[19:05] *** aaaaaaaaa has joined #archiveteam
[19:05] I think we can use filemare and other FTP indexers or lists on the internet to find FTPs for the grab
[19:06] Maybe give a higher priority to edu and government FTPs?
[19:07] yes, who knows how many professors are serving FTPs for their little projects
[19:07] not to mention how much software started as university research projects
[19:08] I have a small zmap scan of the internet on port 21 from some time ago, let me see if I can track that down
[19:08] Basically, I ran zmap on a DO instance until I got banned from DO lol
[19:08] i do too, i got about 60% of ipv4
[19:09] and then i got a phonecall from upstream
[19:09] "Hey, you're generating a shitload of abuse reports" right?
[19:10] yeah
[19:10] He just wanted to see how many blacklists he could appear on
[19:15] i'm starting to upload my www.700wlw.com/media/play urls
[19:16] turns out wayback machine only has 15 urls in that path
[19:18] I'll start moving my guns over to gamefront
[19:29] aaaaaaaaa: yes, and the very large amount of scientific data
[19:29] vk.nl
[19:29] oops, sorry
[19:44] *** MMovie has quit IRC (Ping timeout: 310 seconds)
[19:46] more items added to the wikis project!
[19:49] *** SilSte has quit IRC (Read error: Connection reset by peer)
[19:49] *** Silvan has joined #archiveteam
[19:51] *** MMovie has joined #archiveteam
[20:07] arkiver: nice, thanks.
[20:08] Damn, achip, how many threads are you running right now on wikis?
[20:09] Oh
[20:09] I should read the backlog. 200 threads.
[20:17] http://piratepad.net/effteepees is the ftp list we were working from in #effteepee
[20:26] DFJustin: random stuff from my bookmarks ftp://ftp.cmdl.noaa.gov/ ftp://ftp.fbo.gov/ ftp://de.aminet.net/ ftp://ftp.uni-erlangen.de/ ftp://lvlmirror.mhgaming.com/ ftp://www.artfiles.org/
[20:26] some huge, some small, some fast, some slow, some mirrors
[20:26] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[20:33] *** bzc6p__ has joined #archiveteam
[20:39] *** zenguy_pc has joined #archiveteam
[20:39] *** bzc6p_ has quit IRC (Read error: Operation timed out)
[20:48] *** Start has joined #archiveteam
[20:54] archivebot on http://data.deutschebahn.com/ would be nice, ~30MB total. thanks!
[20:58] it has been done
[21:00] thanks
[21:02] *** scyther has joined #archiveteam
[21:03] I've now spent 5 years cleaning up the godane inbox
[21:03] A force of nature
[21:17] Is he over 500k items yet? I know he was getting near
[21:19] 494k as of yesterday, iirc
[21:22] *** Nemo_bis has joined #archiveteam
[21:22] *** jspiros has quit IRC (Ping timeout: 186 seconds)
[21:22] *** Nemo_bis_ has quit IRC (Ping timeout: 186 seconds)
[21:23] *** jspiros has joined #archiveteam
[21:24] 495k now
[21:26] clue in a newbie, what's that about?
[21:26] godane uploads a lot of things to archive.org
[21:26] *** bzc6p_ has joined #archiveteam
[21:27] oh so this is a file inbox, not email..
[21:27] makes sense now, got it :)
[21:28] A LOT of things
[21:28] I'm not sure when he eats and sleeps
[21:28] i still have to do more crazy stuff later
[21:29] i'm about to eat right now
[21:29] myself: https://archive.org/details/@chris85
[21:30] like a true archivist, his keyboard has a repository of food crumbs going back several years, each tagged with metadata
[21:30] SketchCow: what would be a good size per item for the ftp grab?
[21:31] *** bzc6p__ has quit IRC (Read error: Operation timed out)
[21:40] *** Start has quit IRC (Quit: Disconnected.)
[21:41] *** jleclanch has quit IRC (Remote host closed the connection)
[21:42] godane: have a good appetite!
[21:42] as we say in germany at least
[21:44] *** schbirid has quit IRC (Quit: Leaving)
[21:44] *** Start has joined #archiveteam
[21:45] *** jleclanch has joined #archiveteam
[21:48] arkiver: can you do me a favor real quick and release that wiki we were talking about earlier and re-assign it to the nick phuzion-1 for me?
[21:49] sure
[21:49] get too big?
[21:49] No, I wanna be able to monitor it in its own window.
[21:50] and I think the thread it was running under died anyways
[21:50] may also want to run "df -h" in a loop too.
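The "df -h in a loop" suggestion can also be sketched in Python for anyone who wants the grab box to raise a flag before the disk fills. This is only an illustration: the function names `free_gib` and `watch_disk`, the 10 GiB threshold, and the 60 second interval are assumptions and do not come from the channel.

```python
import shutil
import time


def free_gib(path="."):
    """Return free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def watch_disk(path=".", min_free_gib=10.0, interval_s=60.0, max_checks=None):
    """Poll free space; return False as soon as it drops below the threshold.

    max_checks=None polls forever; a number stops after that many checks
    and returns True if space never ran low. (All defaults are hypothetical.)
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        free = free_gib(path)
        print(f"{time.strftime('%H:%M:%S')} {path}: {free:.1f} GiB free")
        if free < min_free_gib:
            return False
        checks += 1
        # Only sleep when another check is still to come.
        if max_checks is None or checks < max_checks:
            time.sleep(interval_s)
    return True
```

Calling `watch_disk("/data", min_free_gib=10)` would loop until the mount (a made-up path here) has less than 10 GiB free, at which point the grab could be paused by hand.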
[21:50] mediawikieu:lt.biologija.wikia.com/api.php:lt.biologija.wikia.com/wiki/ added to user phuzion-1
[21:50] arkiver: thanks :)
[21:50] ;)
[22:05] *** jleclanch has quit IRC (Remote host closed the connection)
[22:10] *** jleclanch has joined #archiveteam
[22:12] *** bwn has quit IRC (Read error: Operation timed out)
[22:15] *** bzc6p_ is now known as bzc6p
[22:16] *** Start has quit IRC (Quit: Disconnected.)
[22:20] SketchCow: I'm thinking items of 200 MB for the FTP grab
[22:24] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[22:28] *** bwn has joined #archiveteam
[22:30] *** scyther has quit IRC (Read error: Connection reset by peer)
[22:30] I have a file with 23 million IP addresses that responded on port 21, is that useful to us?
[22:30] lol
[22:30] Granted, the data is from August of 2014
[22:31] But maybe we can start working with that?
[22:35] *** BlueMaxim has joined #archiveteam
[22:39] *** zenguy_pc has joined #archiveteam
[22:44] *** rxhivert has joined #archiveteam
[22:44] *** rxhivert has quit IRC (Connection closed)
[22:45] *** rxhivert has joined #archiveteam
[22:45] can somebody clear the queue for rxhivert, rxhivert2 and rxhivert5?
[22:45] had to reboot server
[22:46] (game front grab)
[22:52] *** jleclanch has quit IRC (Remote host closed the connection)
[22:53] *** jleclanch has joined #archiveteam
[22:55] In case anyone wants 23 million IPs that responded on port 21 about 14 months ago, check this file out: http://irc.teh-server.com/files/ftpsites.gz
[23:18] *** Ravenloft has joined #archiveteam
[23:30] I want that.
[23:31] *** rxhivert has quit IRC (Quit: rxhivert)
[23:32] lol
[23:36] SketchCow: Take it and enjoy :)
[23:56] *** Start has joined #archiveteam
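A first pass over a gzipped port-21 scan list like the one offered above might look like the sketch below. The log does not say how the file is laid out, so the one-address-per-line format is an assumption, and `iter_ipv4` and `chunked` are hypothetical helper names. Batching by address count is also just an illustration; the channel only discussed item sizes in megabytes.

```python
import gzip
import ipaddress


def iter_ipv4(path):
    """Yield each valid IPv4 address found in a gzip-compressed text file.

    Assumes one address per line; blank or malformed lines are skipped.
    """
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        for line in fh:
            token = line.strip()
            if not token:
                continue
            try:
                yield ipaddress.IPv4Address(token)
            except ipaddress.AddressValueError:
                continue


def chunked(addresses, size):
    """Group an iterable of addresses into lists of at most `size` items."""
    batch = []
    for addr in addresses:
        batch.append(addr)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Because both helpers are generators, a list of tens of millions of addresses can be streamed without loading it into memory, for example `for batch in chunked(iter_ipv4("ftpsites.gz"), 1000): ...`.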