[00:01] *** xk_id has quit IRC (Remote host closed the connection) [00:01] *** xk_id has joined #archiveteam [00:05] *** dashcloud has quit IRC (Read error: Connection reset by peer) [00:07] *** dashcloud has joined #archiveteam [00:22] *** xk_id has quit IRC (Remote host closed the connection) [00:23] *** xk_id has joined #archiveteam [00:29] *** bwn has joined #archiveteam [00:35] *** maseck has quit IRC (Read error: Operation timed out) [00:35] *** GLaDOS has quit IRC (Ping timeout: 252 seconds) [00:36] *** GLaDOS has joined #archiveteam [00:36] *** zenguy_pc has joined #archiveteam [00:36] *** maseck has joined #archiveteam [00:44] *** bwn_ has joined #archiveteam [00:50] *** Boltsie__ has joined #archiveteam [00:52] *** cvb has quit IRC (Read error: Operation timed out) [00:54] *** bwn has quit IRC (Read error: Operation timed out) [00:58] *** godane has joined #archiveteam [01:01] *** Pythia has joined #archiveteam [01:03] *** xk_id has quit IRC (Remote host closed the connection) [01:28] *** bwn_ has quit IRC (Read error: Operation timed out) [01:36] *** philpem has quit IRC (Ping timeout: 252 seconds) [02:04] *** xk_id has joined #archiveteam [02:07] *** primus104 has quit IRC (Leaving.) [02:15] *** xk_id has quit IRC (Read error: Operation timed out) [02:38] *** maseck has quit IRC (Read error: Operation timed out) [02:39] *** schbirid2 has joined #archiveteam [02:41] *** xk_id has joined #archiveteam [02:42] *** maseck has joined #archiveteam [02:43] *** zenguy_pc has quit IRC (Read error: Operation timed out) [02:43] *** schbirid has quit IRC (Ping timeout: 310 seconds) [02:56] *** xk_id has quit IRC (Read error: Operation timed out) [03:18] *** xk_id has joined #archiveteam [03:31] *** xk_id has quit IRC (Read error: Operation timed out) [03:53] *** nekomune has quit IRC (Quit: Something bad happened.) [03:54] Back [03:55] Turns out it makes more sense for me to be home. 
[03:55] *** nekomune has joined #archiveteam [03:59] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [04:04] *** chfoo has joined #archiveteam [04:05] *** chfoo has quit IRC (Remote host closed the connection) [04:06] *** chfoo has joined #archiveteam [04:14] *** GLaDOS has quit IRC (Read error: Operation timed out) [04:14] *** GLaDOS has joined #archiveteam [04:19] *** GLaDOS has quit IRC (Ping timeout: 252 seconds) [04:19] *** GLaDOS has joined #archiveteam [04:57] *** aaaaaaaaa has quit IRC (Leaving) [05:03] *** Sk1d has quit IRC (Read error: Operation timed out) [05:03] *** bwn_ has joined #archiveteam [05:19] *** redlob has joined #archiveteam [05:22] *** redlob_ has quit IRC (Read error: Operation timed out) [05:26] *** xk_id has joined #archiveteam [05:37] *** xk_id has quit IRC (Ping timeout: 615 seconds) [05:47] *** vOYtEC has quit IRC (Read error: Connection reset by peer) [05:51] *** vOYtEC has joined #archiveteam [06:00] *** WinterFox has joined #archiveteam [06:02] https://leclan.ch/public/rsi-image-sources.txt <- near-exhaustive dump of all images on robertsspaceindustries.com in their original size. they're all on a very speedy cdn, too, but be gentle with it (i grabbed them all in 15 mins from a chicago server). 6.3gb total. [06:02] there's a lot of crap but also an incredible amount of extremely high res concept art [06:06] scraping scripts i used to generate that list: https://github.com/jleclanche/scrape-scripts/tree/master/rsi [06:38] *** melody has quit IRC (Ping timeout: 252 seconds) [06:40] *** melody has joined #archiveteam [06:41] *** remsen has joined #archiveteam [07:03] *** melody has quit IRC (Ping timeout: 258 seconds) [07:08] I paused docstoc [07:09] the website is suddenly having problems [07:17] *** remsen has quit IRC (Leaving) [07:18] *** Sk1d has joined #archiveteam [07:25] aha [07:29] *** godane has quit IRC (Quit: Leaving.)
[07:46] *** godane has joined #archiveteam [08:06] *** atomotic has joined #archiveteam [08:17] !con rgn3zoeo0g9n0r9byr3wmb2n 5 [08:17] ugh [08:18] *** remsen has joined #archiveteam [08:26] *** xk_id has joined #archiveteam [08:34] *** primus104 has joined #archiveteam [08:38] *** cvb has joined #archiveteam [08:39] *** arkiver2 has joined #archiveteam [08:47] *** Ungstein has joined #archiveteam [08:55] *** bwn_ has quit IRC (Read error: Operation timed out) [08:57] *** lbft_ has quit IRC (Bye) [08:58] *** lbft has joined #archiveteam [09:10] *** lbft has quit IRC (Quit: Bye) [09:11] *** lbft has joined #archiveteam [09:12] *** Smiley has quit IRC (Read error: Operation timed out) [09:21] *** lbft has quit IRC (Quit: Bye) [09:23] *** lbft has joined #archiveteam [09:31] *** cvb has quit IRC (Quit: Leaving) [09:33] *** Jordan has quit IRC (WeeChat 1.0.1) [09:55] *** Ungstein has quit IRC (Quit: Leaving.) [09:59] *** Ungstein has joined #archiveteam [09:59] *** bwn_ has joined #archiveteam [10:04] *** xk_id_ has joined #archiveteam [10:06] *** xk_id has quit IRC (Read error: Operation timed out) [10:07] *** xk_id_ has quit IRC (Ping timeout: 183 seconds) [10:12] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [10:19] *** godane has quit IRC (Leaving.) [10:20] *** godane has joined #archiveteam [10:34] *** xk_id has joined #archiveteam [10:34] *** mr-b has quit IRC (Read error: Operation timed out) [10:43] *** mr-b has joined #archiveteam [10:50] *** primus104 has quit IRC (Leaving.) 
[10:52] *** xk_id_ has joined #archiveteam [10:52] *** xk_id_ has quit IRC (Remote host closed the connection) [10:53] *** xk_id_ has joined #archiveteam [10:53] *** xk_id_ has quit IRC (Connection closed) [10:53] *** xk_id_ has joined #archiveteam [10:54] *** vitzli has joined #archiveteam [10:59] *** xk_id_ has quit IRC (Remote host closed the connection) [11:00] *** xk_id_ has joined #archiveteam [11:00] *** xk_id has quit IRC (Read error: Operation timed out) [11:16] *** HCross2 has joined #archiveteam [11:17] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) [11:30] *** HCross2 has quit IRC () [11:32] *** Smiley has joined #archiveteam [12:02] *** arkiver2 has quit IRC (Ping timeout: 252 seconds) [12:04] *** Ungstein has quit IRC (Quit: Leaving.) [12:05] *** w0rp has quit IRC (Read error: Operation timed out) [12:10] *** arkiver2 has joined #archiveteam [12:15] *** Ungstein has joined #archiveteam [12:19] *** maseck_ has joined #archiveteam [12:19] *** maseck has quit IRC (Read error: Operation timed out) [12:32] *** WinterFox has quit IRC (Remote host closed the connection) [12:39] *** Ungstein1 has joined #archiveteam [12:41] *** Ungstein has quit IRC (Ping timeout: 252 seconds) [12:42] *** vitzli has quit IRC (Quit: Leaving) [12:47] *** luckcolor has joined #archiveteam [12:48] hey guys [12:48] arkiver: my warrior hit an infinite loop again [12:48] arkiver: it's still going after 16 hours [12:49] arkiver: and another one has been running for 7 hours [12:54] here's the webpage dump if you need it https://www.dropbox.com/s/qj8atw3xhskj96x/ArchiveTeam%20Warrior%20log%202.7z?dl=0 [13:02] also a lot of items are failing [13:02] wget exit code -6 [13:03] same here [13:03] loads of 503 [13:08] yeah probably they are throttling requests [13:12] *** JSharp___ has quit IRC (Remote host closed the connection) [13:12] *** Boltsie__ has quit IRC (Remote host closed the connection) [13:12] *** zyphlar__ has quit IRC (Read error: Connection reset by peer)
[13:19] *** JSharp___ has joined #archiveteam [13:31] *** arkiver2 has quit IRC (Ping timeout: 252 seconds) [13:39] *** zyphlar__ has joined #archiveteam [13:39] got a looping url again [13:40] this time it is repeating /image/image/image infinitely [13:43] *** melody has joined #archiveteam [13:49] *** xk_id_ has quit IRC (Remote host closed the connection) [14:20] *** primus104 has joined #archiveteam [14:24] *** xk_id has joined #archiveteam [14:28] *** philpem has joined #archiveteam [14:32] *** Atom-- has joined #archiveteam [14:37] *** Atom__ has quit IRC (Read error: Operation timed out) [14:54] *** VADemon has joined #archiveteam [15:03] *** Boltsie__ has joined #archiveteam [15:08] *** nystrom has quit IRC (Leaving) [15:28] *** nertzy has joined #archiveteam [15:38] *** primus104 has quit IRC (Leaving.) [15:44] *** atomotic has joined #archiveteam [16:00] yeah, I think they set a limit on requests [16:00] Oh, it's all ending in .jpg, .gif or .png [16:01] luckcolor: thanks [16:02] No problemo :P [16:04] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [16:18] Loop is fixed.
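The log does not show how the /image/image/image loop was actually fixed in the docstoc scripts. As a rough illustration only, a crawler could refuse to queue URLs whose path repeats the same segment several times in a row. The function name `looks_like_path_loop`, the `max_repeats` threshold and the `example.invalid` URLs below are all hypothetical and not taken from the project code.

```python
from urllib.parse import urlsplit


def looks_like_path_loop(url, max_repeats=3):
    """Return True if any path segment occurs max_repeats or more times in a row.

    Hypothetical helper: the threshold of 3 is an assumption, not a value
    from the docstoc scripts.
    """
    # Split the path into non-empty segments, e.g. /image/image/a.jpg
    # becomes ["image", "image", "a.jpg"].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1
    for prev, cur in zip(segments, segments[1:]):
        # Extend the current run on a repeat, otherwise start a new run.
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False


if __name__ == "__main__":
    print(looks_like_path_loop("http://example.invalid/image/image/image/a.jpg"))
    print(looks_like_path_loop("http://example.invalid/image/a.jpg"))
```

A check like this only catches consecutive repeats; a loop that alternates between two segments would need a different rule.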
[16:18] Please update the docstoc scripts [16:20] *** xk_id has quit IRC (Remote host closed the connection) [16:21] 300000 new items added to docstoc [16:22] *** xk_id has joined #archiveteam [16:27] *** xk_id has quit IRC (Remote host closed the connection) [16:32] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) [17:05] *** superkuh has quit IRC (Read error: Connection reset by peer) [17:05] *** kris33 has joined #archiveteam [17:18] *** will has quit IRC (Quit: Goodbye) [17:21] *** will has joined #archiveteam [17:43] *** xk_id has joined #archiveteam [17:47] *** remsen2 has joined #archiveteam [17:47] *** remsen2 has quit IRC (Client Quit) [17:49] *** xk_id_ has joined #archiveteam [17:49] *** xk_id has quit IRC (Read error: Connection reset by peer) [17:52] *** remsen has quit IRC (Read error: Operation timed out) [17:53] *** schbirid2 has quit IRC (Remote host closed the connection) [17:59] *** primus104 has joined #archiveteam [18:13] *** nightpool has joined #archiveteam [18:15] *** HarryCros is now known as HcROSS [18:15] *** HcROSS is now known as HCross [18:18] *** nightpool has quit IRC (Ping timeout: 310 seconds) [18:22] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com) [18:26] great news arkiver [18:27] it was causing my VMs to crash [18:27] (at least I blame that) [18:29] ALL HAIL BUY NOTHING DAY [18:30] what a great day for catsitting [18:34] *** nightpool has joined #archiveteam [18:39] are we throttling docstoc items still? [18:39] *** w0rp has joined #archiveteam [18:44] *** w0rp has quit IRC (Read error: Connection reset by peer) [18:44] *** xk_id_ has quit IRC (Remote host closed the connection) [18:46] *** w0rp has joined #archiveteam [18:48] *** nightpool has quit IRC (Ping timeout: 258 seconds) [19:01] Atluxity: there's room for more concurrent if that's what you mean [19:01] it also looks like docstoc is limiting our grab...
[19:04] threw 80 concurrent on that i think [19:06] ok [19:06] yeah, we can definitely use more concurrent [19:06] only thing letting me down is the 2gb ram :( [19:09] How do I update it while it's going? [19:09] (one of my jobs with the loops has gotten over 190,000 URLs, lol) [19:11] arkiver: starting up my hose [19:16] kyan [19:16] just stop and reopen the program [19:17] there's no way to resume that i know of [19:17] Oh, ok, thanks! [19:17] Won't it take a while to stop, though, since it's got those big ones there? [19:18] no [19:18] you have to force stop it [19:18] just ctrl+c if you are on a console [19:19] then wait for the normal items to finish then ctrl+c again [19:19] if you are using the warrior then [19:19] stop [19:19] wait for the right items to get completed [19:19] then stop [19:19] Oh, ok, thanks! [19:21] I don't think that worked "Project code is out of date and needs to be upgraded. To remedy this problem immediately, you may reboot your warrior. Retrying after 250 seconds..." [19:21] I'm using scripts, not warrior [19:22] *** remsen has joined #archiveteam [19:24] arkiver: just poke me if you want more concurrents. started with 490 now and we'll let that hum for a little [19:29] kyan: Stop it again and do a `git pull` [19:30] matthusb|, cool, thanks :) [19:30] *** matthusb| is now known as matthusb_ [19:30] Yay, it's working! [19:30] Thanks! [19:33] https://archive.org/details/archiveteam_docstoc [19:35] *** xk_id has joined #archiveteam [19:40] https://archive.org/details/archivebot is still going at it, and man, it's going to be a while [19:40] I have no idea why some of those skipped. [19:40] But it's working, they're going on, that machine is living life, who's to bitch [19:41] but bitching is so easy ... [19:42] SketchCow: nice screenshots!
[19:47] Yeah, there is a secondary set of code I can run to go "get rid of the blanks, make the nicest screenshot the official one" etc [19:48] it would be nice if the circles to pick an image at the bottom on the item preview weren't white on white [19:50] There's just a general situation there where some of the screenshots being made are shitball [19:51] Excuse me, I'm going to go into the shipping container - I have a home for my gaming consoles and gaming magazines because fuck gamers [19:51] (The place they're going is The MADE in Oakland, CA) [19:51] So much of my stuff is gamer crapola [20:08] *** remsen has quit IRC (Read error: Operation timed out) [20:14] *** nightpool has joined #archiveteam [20:24] *** nightpool has quit IRC (Ping timeout: 252 seconds) [20:35] Just came back. [20:35] Oh, how I wish I knew 4-6 people who lived near me [20:36] Are the gaming magazines getting scanned and uploaded? [20:46] Maybe, eventually [20:49] They hope to have funding. [20:49] It has an equivalent chance of being scanned in Oakland as here. [20:49] And here, they're just going to die. [20:49] I will never prioritize a gaming magazine over, say, anything else [20:50] I am not worried about the basements wanting to get their hands on scanned Nintendo Power [21:00] *** PotcFdk has quit IRC (~'o'/) [21:02] *** Start has quit IRC (Read error: Connection reset by peer) [21:02] *** Start has joined #archiveteam [21:02] *** remsen has joined #archiveteam [21:07] arkiver: I want to add 490 more, ok? [21:08] sure [21:08] I'll limit it if the site can't handle it [21:09] *** luckcolor has quit IRC (Ping timeout: 252 seconds) [21:10] SketchCow: nice collection! [21:10] *** BlueMaxim has joined #archiveteam [21:10] We don't have a reply yet from Chris DiBona [21:11] I hope it didn't go into the spam folder [21:11] SketchCow: if we don't get a reply soon, do you think you can try to reach him? 
[21:21] *** PotcFdk has joined #archiveteam [21:38] First WARCs of GameFront are in the Wayback Machine! [21:38] download and everything seems to be working very well https://web.archive.org/web/20151024041948/http://www.gamefront.com/files/3456061/River_Kit_zip [21:49] arkiver: the increased concurrencies did not seem to have any effect [21:49] *** khaoohs_ has joined #archiveteam [21:51] *** khaoohs has quit IRC (Read error: Operation timed out) [21:59] *** jonimus has joined #archiveteam [22:04] [23:03] dunno if anyone here uses (used) Rdio, but I think they disabled ads seeing as they're going bankrupt haha [22:05] *** WinterFox has joined #archiveteam [22:05] how would that be a sign of going bankrupt? [22:05] i think it is already known [22:05] they're closing [22:06] oh, acquired.. w/e [22:08] *** khaoohs_ has quit IRC (Read error: Operation timed out) [22:10] a music streaming site went ad free due to paid subscriptions [22:10] not unheard of going ad free [22:25] *** godane has quit IRC (Quit: Leaving.) [22:34] *** khaoohs has joined #archiveteam [22:36] *** Muad-Dib has quit IRC (Quit: ZNC - http://znc.in) [22:42] *** bwn_ has quit IRC (Read error: Operation timed out) [22:54] *** nightpool has joined #archiveteam [23:04] Warriors will not be able to do the FTP project [23:05] Because of the size some of the items can be [23:26] can't ftp tell you the size of stuff before you try to download it? [23:27] it would be cool if a warrior could see if an item is too big or not [23:32] Script runners can manually set a max size [23:32] separate discovery and downloading maybe?
compile a list of files and file sizes and then distribute evenly sized lists of urls to warriors [23:32] I got the whole discovery working [23:32] It's finding all files and folders and splitting those up in lists [23:32] Then script runners grab a list, download the files from it and rsync it to the target [23:33] https://ia601503.us.archive.org/25/items/ftptestfksdgds/testftp4 [23:33] yeah so just incorporate file size into that so a list can be 1 1gb file or 100 10mb files or etc. [23:33] see last three lines [23:34] *** bwn_ has joined #archiveteam
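The suggestion above, building lists by total size so that one list can hold a single 1 GB file or a hundred 10 MB files, could be sketched roughly as follows. This is not the FTP project's discovery code, which the log does not show. The function name `split_by_size`, the `(url, size)` tuple format and the `example.invalid` URLs are assumptions made for illustration, and the greedy first-fit-decreasing strategy is just one simple way to do it.

```python
def split_by_size(files, max_bytes):
    """Group (url, size_in_bytes) pairs into lists of at most max_bytes each.

    Hypothetical sketch using first-fit decreasing: largest files are placed
    first, each into the first list that still has room. A file larger than
    max_bytes gets a list of its own.
    """
    bins = []  # each entry is [remaining_capacity, [urls]]
    for url, size in sorted(files, key=lambda f: f[1], reverse=True):
        for b in bins:
            if size <= b[0]:
                b[0] -= size
                b[1].append(url)
                break
        else:
            # No existing list has room, so start a new one.
            bins.append([max(max_bytes - size, 0), [url]])
    return [urls for _, urls in bins]


if __name__ == "__main__":
    GB = 1000 ** 3
    MB = 1000 ** 2
    files = [("ftp://example.invalid/big.iso", 1 * GB)]
    files += [("ftp://example.invalid/small%03d.bin" % i, 10 * MB) for i in range(100)]
    for urls in split_by_size(files, 1 * GB):
        print(len(urls), "file(s)")
```

With a 1 GB cap, the example input ends up as two lists: the single large file on its own and the hundred small files together.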