[00:22] *** mistym has quit IRC (Remote host closed the connection)
[00:35] *** mistym has joined #archiveteam
[00:41] *** Froggypwn has quit IRC (Read error: Connection reset by peer)
[00:44] *** Froggypwn has joined #archiveteam
[00:51] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[00:54] *** Spritecla has quit IRC (Quit: [Quit message changed because SOME-body is an asshole.])
[01:10] *** xtr-201 has joined #archiveteam
[01:14] *** JesseW has joined #archiveteam
[01:40] *** wednesday has quit IRC (Quit: Be the change that you wish to see in the world.)
[01:40] *** wednesday has joined #archiveteam
[01:41] *** Spirit has quit IRC (Read error: Operation timed out)
[01:53] *** Spirit has joined #archiveteam
[02:03] *** Start has joined #archiveteam
[02:14] *** primus104 has quit IRC (Leaving.)
[02:25] *** xk_id has quit IRC (Remote host closed the connection)
[02:29] *** Spritecla has joined #archiveteam
[02:34] *** Yiffiel_d has quit IRC (Ping timeout: 252 seconds)
[03:23] *** bsmith096 has joined #archiveteam
[03:26] *** xk_id has joined #archiveteam
[03:32] *** wm_ has quit IRC (Ping timeout: 240 seconds)
[03:34] *** mistym has quit IRC (Remote host closed the connection)
[03:34] *** Start_ has joined #archiveteam
[03:34] *** xk_id has quit IRC (Read error: Operation timed out)
[03:38] *** wm_ has joined #archiveteam
[03:38] *** Start has quit IRC (Ping timeout: 306 seconds)
[03:38] *** Start_ is now known as Start
[03:56] *** xk_id has joined #archiveteam
[04:04] *** Emcy has quit IRC (Ping timeout: 483 seconds)
[04:05] arkiver: in progress
[04:08] *** qwebirc34 has joined #archiveteam
[04:09] *** mistym has joined #archiveteam
[04:11] *** bsmith096 has quit IRC (Ping timeout: 244 seconds)
[04:14] *** xk_id has quit IRC (Read error: Operation timed out)
[04:31] SketchCow: is it appropriate to ask about Archive Team-specific IA-related things in #internetarchive, or should I just write it here?
[04:31] * JesseW is also curious about this
[04:31] I'd like to know if one of our upload servers was banned from s3.us.archive.org or is just experiencing upstream issues
[04:34] specifically, archiveteam.kenshin.sg can neither make HTTP requests nor ping to s3.us.archive.org or archive.org; it looks like an upstream router's gone off but I can't tell from the trace alone
[04:34] hmm?
[04:34] i'm here, let me know what u need help with
[04:34] ok i see the problem
[04:34] give me a bit to work around it
[04:35] oh hey
[04:35] heh I guess I could have messaged you too :P
[04:35] heh np
[04:35] let me route around it
[04:36] there's some outage at the peering platform
[04:37] ah
[04:37] *** aaaaaaaaa has quit IRC (Leaving)
[04:38] i'm deactivating the sessions, should be back up in 10 minutes via a different path
[04:39] it's back up now
[04:39] *** xk_id has joined #archiveteam
[04:39] that was fast, thanks
[04:40] np. helps that i have control over the network :P
[04:40] heh yes
[04:42] *** RedType has quit IRC (Remote host closed the connection)
[04:44] *** xk_id has quit IRC (Read error: Operation timed out)
[04:46] Do we want everything archiveteam has uploaded to be directly in the archiveteam collection? (It currently has 39,562 items in it.)
[04:46] (while the "archiveteam" subject tag has only 8,845 items)
[04:47] *** RedType has joined #archiveteam
[04:52] *** Spritecla has quit IRC (Quit: [Quit message changed because SOME-body is an asshole.])
[04:53] (of which 1507 items with that subject tag are NOT in the collection)
[05:02] *** mistym has quit IRC (Remote host closed the connection)
[05:03] *** mistym has joined #archiveteam
[05:14] *** trs80 has quit IRC (Ping timeout: 186 seconds)
[05:15] *** trs80 has joined #archiveteam
[05:21] https://archive.org/details/wallbase.cc-id-20000-99999 -- ~ 20,000 wallpapers from a (now defunct) wallpaper site
[05:29] *** Jonimus has quit IRC (Read error: Operation timed out)
[05:30] *** marvinw has quit IRC (Read error: Operation timed out)
[05:30] *** lrkj_ has quit IRC (Read error: Operation timed out)
[05:30] *** lrkj has joined #archiveteam
[05:30] *** ripvanwin has quit IRC (Read error: Operation timed out)
[05:30] *** sep332 has quit IRC (Write error: Broken pipe)
[05:30] *** toad2 has quit IRC (Read error: Operation timed out)
[05:30] *** phuzion has quit IRC (Read error: Operation timed out)
[05:30] *** nwf has quit IRC (Read error: Operation timed out)
[05:30] *** ats has quit IRC (Read error: Operation timed out)
[05:30] *** aMunster has quit IRC (Write error: Broken pipe)
[05:30] *** maz_ has quit IRC (Write error: Broken pipe)
[05:30] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[05:30] *** dxrt has quit IRC (Read error: Operation timed out)
[05:31] *** RKenshin has joined #archiveteam
[05:31] *** jk[SVP] has quit IRC (Read error: Operation timed out)
[05:31] *** chfoo has quit IRC (Read error: Operation timed out)
[05:32] *** human39 has quit IRC (Read error: Operation timed out)
[05:32] *** Jogie has quit IRC (Read error: Operation timed out)
[05:32] *** Kenshin has quit IRC (Read error: Operation timed out)
[05:32] *** RKenshin is now known as Kenshin
[05:32] *** vegbrasil has quit IRC (Read error: Operation timed out)
[05:32] *** Jogie has joined #archiveteam
[05:33] *** will has quit IRC (Read error: Operation timed out)
[05:33] *** will has joined #archiveteam
[05:34] *** chfoo has joined #archiveteam
[05:34] *** dxrt has joined #archiveteam
[05:36] *** ats has joined #archiveteam
[05:36] *** jk[SVP] has joined #archiveteam
[05:49] *** marvinw has joined #archiveteam
[05:49] *** signius has quit IRC (Read error: Operation timed out)
[06:00] holy file count
[06:01] DFJustin: yeah, it makes it a bit painful to open...
[06:02] *** mistym_ has joined #archiveteam
[06:09] *** mistym has quit IRC (Read error: Operation timed out)
[06:20] *** scyther has joined #archiveteam
[06:31] *** espes__ has joined #archiveteam
[06:45] *** mistym_ has quit IRC (Remote host closed the connection)
[06:45] *** mistym has joined #archiveteam
[06:59] *** RedType has quit IRC (Quit: leaving)
[07:00] *** RedType has joined #archiveteam
[07:05] *** dashcloud has quit IRC (Ping timeout: 252 seconds)
[07:08] *** dashcloud has joined #archiveteam
[07:08] *** mistym has quit IRC (Remote host closed the connection)
[07:11] *** JesseW has quit IRC (Quit: Leaving.)
[07:41] *** ripvanwin has joined #archiveteam
[07:41] *** vegbrasil has joined #archiveteam
[07:41] *** wyatt8740 has joined #archiveteam
[07:42] *** sep332 has joined #archiveteam
[07:43] *** maz_ has joined #archiveteam
[07:44] *** aMunster has joined #archiveteam
[07:46] *** toad1 has joined #archiveteam
[07:46] *** signius has joined #archiveteam
[07:49] I would like to suggest a new project, preserving java mobile phone games. I know of no site or group having systematic interest in these hidden gems.
[07:50] Would anybody be interested?
[07:53] *** nwf has joined #archiveteam
[08:00] *** Jonimus has joined #archiveteam
[08:17] *** kyanz`bot has joined #archiveteam
[08:19] *** Start has quit IRC (Read error: Operation timed out)
[08:21] *** kyanz`bot has quit IRC (Client Quit)
[08:23] *** primus104 has joined #archiveteam
[08:24] *** kyanz`bot has joined #archiveteam
[08:24] *** kyanz`bot has quit IRC (Client Quit)
[08:25] *** kyanz`bot has joined #archiveteam
[08:28] *** kyanz`bot has quit IRC (Client Quit)
[08:32] *** kyanz`bot has joined #archiveteam
[09:01] *** RedType has quit IRC (Ping timeout: 252 seconds)
[09:10] *** xk_id has joined #archiveteam
[09:17] *** kyanz`bot has quit IRC (Quit: KeyboardInterrupt)
[09:18] *** kyanz`bot has joined #archiveteam
[09:23] *** xk_id has quit IRC (Read error: Operation timed out)
[09:26] *** RedType has joined #archiveteam
[09:40] *** antomati_ has joined #archiveteam
[09:40] *** swebb sets mode: +o antomati_
[09:40] *** antomatic has quit IRC (Read error: Operation timed out)
[09:42] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:52] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[09:57] *** dashcloud has joined #archiveteam
[10:24] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[10:34] *** dashcloud has joined #archiveteam
[10:39] *** Muad-Dib has joined #archiveteam
[10:42] *** xk_id has joined #archiveteam
[10:55] *** xk_id has quit IRC (Read error: Operation timed out)
[10:57] *** Guest25 has joined #archiveteam
[10:58] o/
[11:01] *** primus104 has quit IRC (Leaving.)
[11:02] so recently our government launched a postcode system, terribly designed (DHL, FedEx, etc. publicly said they weren't going to use it), spent €27m on designing and deploying it, and the people who won the tender are trying to profit from it
[11:04] I've figured out a way I could grab all postcode/address/area/coordinate combinations from their web API, struggling to get enough resources to carry it out though. figured you guys would be experts on scraping large amounts of data. anyone have some tips?
[11:05] when I say large, I need to check 390,625 codes per routing key (there are 139), so around 54,296,875 HTTP requests in total
[11:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:16] *** dashcloud has joined #archiveteam
[11:18] *** Sellyme has quit IRC (Read error: Operation timed out)
[11:41] Well, you probably don't want to get blocked. So many IPs + a delay between requests + jitter is a good start
[11:42] And probably don't hit them too hard, as they might notice it if it goes down
[11:44] *** Guest25 has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[11:48] *** Guest25 has joined #archiveteam
[11:48] ersi: how would I go about running on different IPs simultaneously? I have my home connection + a server, but that server only has one IP
[11:49] I don't have much money to spend on resources, which is the real limiting factor
[11:53] you'll probably need to use Amazon's AWS or similar cloud services to get all the IPs
[12:12] hmm
[12:17] AFAIK you won't need to spend a fortune on AWS - and it's easy and cheap (AFAIK you get to do it for free a few times) to rotate IPs
[12:18] Another way would be to.. make an ArchiveTeam warrior project and then maybe people will select it and contribute, if it gets added to the warrior
[12:18] the only thing is this may be a legal grey-area
[12:19] the API itself isn't rate limited, and it's against their ToS to "abuse" it
[12:20] fuck legal grey areas
[12:20] Well, you want to rate limit to limit your exposure anyway
[12:21] also to make it harder to block
[12:21] true. my approach before was to send 160 requests per second, didn't end well as you can guess
[12:22] no harm in making a warrior project. I'll look it up, at the moment my implementation is in JS
[12:23] 160 requests per second is... pretty much, from (what I assume is) a single IP or two
[12:23] yup. got a lot of data, ended up getting my access revoked after a while
[12:24] ArchiveTeam looks to be more for services that are shutting down, still think this fits in?
[12:24] sure
[12:28] *** xk_id has joined #archiveteam
[12:33] *** tomwsmf-a has joined #archiveteam
[12:34] *** xk_id has quit IRC (Read error: Connection reset by peer)
[12:34] *** xk_id has joined #archiveteam
[12:40] *** xk_id has quit IRC (Remote host closed the connection)
[12:41] *** xk_id has joined #archiveteam
[12:51] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[12:51] *** Emcy has joined #archiveteam
[12:51] yahoosucks
[12:53] ersi: ready to create a project. should I use my own GH or should the project be on the ArchiveTeam org?
[13:02] Huh?
[13:02] There's only a wiki on archiveteam.org
[13:02] I mean for the source code
[13:03] You need to have a repo somewhere, GitHub is fine
[13:23] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[13:24] *** dashcloud has joined #archiveteam
[13:40] *** trs80 has quit IRC (Ping timeout: 186 seconds)
[13:43] started it off anyway: http://archiveteam.org/index.php?title=Eircode
[13:45] Always good to have info on it :)
[13:58] *** kyan_ has joined #archiveteam
[14:00] *** kyan has quit IRC (Ping timeout: 258 seconds)
[14:04] *** kyanz`bot has quit IRC (Read error: Operation timed out)
[14:06] *** trs80 has joined #archiveteam
[14:15] *** thechip has joined #archiveteam
[14:16] *** primus104 has joined #archiveteam
[14:17] *** thechip has quit IRC (Remote host closed the connection)
[14:19] *** K4k has joined #archiveteam
[14:25] is it possible to execute multiple wget operations simultaneously using seesaw?
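Editor's sketch of the arithmetic and the "delay + jitter" advice from the 11:05 to 12:23 exchange above. It only reproduces the request count quoted in the log and shows one common way to space requests out; the function names, the uniform jitter, and the duration estimate are illustrative assumptions, not anything the participants wrote or ran.

```python
import random

# Figures quoted in the log: 390,625 candidate codes per routing key, 139 routing keys.
CODES_PER_ROUTING_KEY = 390_625
ROUTING_KEYS = 139


def total_requests(codes_per_key: int = CODES_PER_ROUTING_KEY,
                   routing_keys: int = ROUTING_KEYS) -> int:
    """Number of HTTP requests needed to try every code under every routing key."""
    return codes_per_key * routing_keys


def jittered_delay(base_seconds: float, jitter_seconds: float,
                   rng: random.Random) -> float:
    """Base delay plus a uniform random offset in [0, jitter_seconds].

    Uniform jitter is an assumption for illustration; the log only says
    "delay + jitter" without naming a distribution.
    """
    return base_seconds + rng.uniform(0.0, jitter_seconds)


def estimated_days(requests: int, workers: int, mean_delay_seconds: float) -> float:
    """Rough wall-clock estimate, assuming each worker issues one request per delay."""
    return requests * mean_delay_seconds / workers / 86_400


if __name__ == "__main__":
    rng = random.Random()
    print(total_requests())                       # request count from the log's figures
    print(jittered_delay(1.0, 0.5, rng))          # one sample delay in seconds
    print(estimated_days(total_requests(), 10, 1.25))
```

A scraper would sleep for `jittered_delay(...)` between requests on each worker; the worker count and delay values in the usage lines are placeholders, not recommendations from the channel.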
[14:37] *** bentpins has quit IRC (Ping timeout: 483 seconds)
[14:37] *** balrog has quit IRC (Read error: Connection reset by peer)
[14:46] *** kyanz`bot has joined #archiveteam
[14:47] *** kyan_ is now known as kyan
[14:47] *** balrog has joined #archiveteam
[14:47] *** swebb sets mode: +o balrog
[14:48] *** mistym has joined #archiveteam
[14:50] *** JesseW has joined #archiveteam
[14:59] *** mistym has quit IRC (Remote host closed the connection)
[15:00] *** mistym has joined #archiveteam
[15:02] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[15:07] scyther: yes
[15:07] Do you have any more info on those java mobile games?
[15:07] yipdw: thanks!!
[15:08] eh, what info would you want?
[15:08] *** mistym has quit IRC (Read error: Operation timed out)
[15:08] well, sites where they are hosted?
[15:08] There is a whole lot of them, and nobody has even tried to sort them, so we need to grab all we can
[15:09] there are multiple small sites, i don't have the links now, but if there is interest, i will seek them out
[15:11] *** JesseW has quit IRC (Quit: Leaving.)
[15:11] pages like this: http://java.mob.org/
[15:11] but there are hundreds of pages, and thousands of games
[15:12] i think that at first we need to find out what different versions of games exist (e.g. different resolutions and sometimes even mobile phone types)
[15:12] and then try to grab as many games as we can, slowly sorting them
[15:13] most of these games also have translations
[15:17] *** mistym has joined #archiveteam
[15:20] *** mistym has quit IRC (Remote host closed the connection)
[15:21] *** mistym has joined #archiveteam
[15:23] *** Start has joined #archiveteam
[15:24] *** mistym_ has joined #archiveteam
[15:25] ping arkiver
[15:29] *** mistym has quit IRC (Read error: Operation timed out)
[15:29] *** Ctrl-S has quit IRC (Remote host closed the connection)
[15:29] *** filippo__ has quit IRC (Remote host closed the connection)
[15:42] *** tomwsmf-a has joined #archiveteam
[15:48] *** primus104 has quit IRC (Leaving.)
[15:50] *** scyther has quit IRC (Read error: Connection reset by peer)
[15:55] *** deathy has quit IRC (Remote host closed the connection)
[16:00] *** mistym_ has quit IRC (Remote host closed the connection)
[16:07] hmm, http://warriorhq.archiveteam.org/downloads/wget-lua/wget-1.14.lua.LATEST.tar.bz2 is returning a 404 it seems
[16:08] *** deathy has joined #archiveteam
[16:16] *** mistym has joined #archiveteam
[16:31] AtomicGamer archival was a success
[16:31] ~110000 files, ~9.5 terabytes. they let me rsync. https://archive.org/search.php?query=subject%3A%22atomicgamer%22 i will get it a collection
[16:35] Guest25: thanks, i will fix that in a few hours
[16:43] great
[16:52] *** nmnn has joined #archiveteam
[16:56] *** Ctrl-S has joined #archiveteam
[17:09] *** Start has quit IRC (Quit: Disconnected.)
[17:19] *** Start has joined #archiveteam
[17:22] *** K4k has quit IRC (Quit: WeeChat 1.2)
[17:30] *** godane has left
[17:30] *** godane has joined #archiveteam
[17:33] That was one very good save
[17:35] *** filippo__ has joined #archiveteam
[17:39] *** aaaaaaaaa has joined #archiveteam
[17:39] *** swebb sets mode: +o aaaaaaaaa
[17:40] *** Start has quit IRC (Quit: Disconnected.)
[17:48] nice one Spirit
[17:50] *** nmnn has quit IRC (Remote host closed the connection)
[17:55] *** qwebirc34 has quit IRC (Ping timeout: 240 seconds)
[18:02] *** Start has joined #archiveteam
[18:03] should I run my own tracker to make sure everything's functioning properly?
[18:06] you can
[18:06] *** Fletcher has quit IRC (Read error: Connection reset by peer)
[18:06] *** diacope has quit IRC (Read error: Connection reset by peer)
[18:07] there exists an ArchiveTeam development environment VM at https://github.com/ArchiveTeam/archiveteam-dev-env
[18:07] the archiveteam-dev-env has one set up inside it: https://github.com/ArchiveTeam/archiveteam-dev-env
[18:08] got most of it running on a server at the moment using the guide on the wiki :)
[18:09] if all goes well, can the project make use of ArchiveTeam's tracker (even unlisted on the main page)?
[18:09] *** diacope has joined #archiveteam
[18:09] it can; how much space do you anticipate it using
[18:09] we already shove too much into IA
[18:09] if you can provide your own storage that would be good also
[18:10] I can get you a rough estimate, one second
[18:10] if it's under a few terabytes you're probably fine
[18:10] I have a VPS with 40GB storage which would be more than enough for the project
[18:10] oh 40 GB?
[18:10] I mean if that's what it'll come out to then yeah warrior is totes fine
[18:11] it'll probably be around the region of 5GB, it's mostly a bunch of small JSON files
[18:11] ok yeah no problems there
[18:12] great
[18:13] have a mirror of wget-1.14.lua.LATEST.tar.bz2 by any chance?
[18:13] you'll want to consolidate the files into zip or tar before inflicting them on IA
[18:13] yeah
[18:13] if we warrior them we can adapt megawarc to zip or tar them or something
[18:14] Guest25: I don't, it might not have moved along with the rest of warriorhq
[18:14] chfoo might have the build, xmc can also spin up the old instance and we can move it from there
[18:15] yeah, chfoo said he'd sort it out later so I'll wait for that
[18:15] there is also https://github.com/ArchiveTeam/wget-lua/releases/tag/v1.14.lua2 but that may have some build issues
[18:16] *** primus104 has joined #archiveteam
[18:20] *** Fletcher has joined #archiveteam
[18:24] *** Start has quit IRC (Quit: Disconnected.)
[18:26] downloads should be back now
[18:31] Guest25: if you'd like we can write a project for it
[18:33] if you want to that's fine by me, I do need to brush up on my python skills though
[18:34] the project shouldn't be too hard, what I've noticed recently though is they throttle you after ~15 requests (could take up to 3 seconds for a response)
[18:38] *** Start has joined #archiveteam
[18:40] yipdw: what's missing from new-hq?
[18:40] i actually haven't yet spun down the old one
[18:40] ssh to shilling.corp.xrtc.net
[18:40] *** diacope has quit IRC (Quit: ZNC - http://znc.in)
[18:40] *** Fletcher has quit IRC (Quit: k)
[18:41] xmc: ah cool -- I think we need to drag over the wget bundle on warriorhq
[18:41] * xmc nod
[18:41] or maybe it's already there and the server path mapping needs to be fixed, I haven't checked
[18:43] downloads should be working already
[18:44] *** diacope has joined #archiveteam
[18:44] working for me, got everything set up
[18:45] *** dashcloud has quit IRC (Ping timeout: 265 seconds)
[18:50] *** dashcloud has joined #archiveteam
[18:53] *** Start has quit IRC (Quit: Disconnected.)
[19:00] *** mistym has quit IRC (Remote host closed the connection)
[19:01] *** mistym has joined #archiveteam
[19:03] *** Start has joined #archiveteam
[19:14] *** scyther has joined #archiveteam
[19:14] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[19:17] *** dashcloud has joined #archiveteam
[19:27] *** habi has joined #archiveteam
[19:29] *** habi has quit IRC (Client Quit)
[19:31] german net activists have been accused of treason, their site died, they use IA https://twitter.com/netzpolitik/status/626821681080848385
[19:31] @netzpolitik: Anyone who wants to support us financially over #Landesverrat can find all the info here: https://t.co/AHTphbpUC5 http://t.co/0ocWESJwtM
[19:36] *** habi has joined #archiveteam
[19:37] *** mistym has quit IRC (Remote host closed the connection)
[19:41] *** mistym has joined #archiveteam
[19:46] *** Fletcher has joined #archiveteam
[19:55] *** habi has left
[19:56] *** dashcloud has quit IRC (Ping timeout: 240 seconds)
[20:00] *** dashcloud has joined #archiveteam
[20:03] *** aa has joined #archiveteam
[20:03] *** aa has quit IRC (Client Quit)
[20:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:08] *** kurico has joined #archiveteam
[20:09] *** dashcloud has joined #archiveteam
[20:09] dunno if this is the appropriate channel to post this but I have complete scraped databases of all 4chan textboards
[20:10] link (hosted on archive.org): http://bbs.progrider.org/prog/read/1397949784/34
[20:10] *** habi has joined #archiveteam
[20:10] cool
[20:11] would be nice if someone added these to the wiki as i don't have an account there
[20:12] signing up for an account is pretty quick and easy
[20:13] let's give it a try.
[20:13] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[20:13] yahoosucks
[20:17] added to the page, i guess that's it for me now
[20:18] *** kurico has left see you internet cowboy
[20:36] *** habi has left
[20:54] *** Spirit has quit IRC (Leaving)
[21:01] *** mistym has quit IRC (Remote host closed the connection)
[21:10] *** kyan_ has joined #archiveteam
[21:10] *** kyan has quit IRC (Ping timeout: 258 seconds)
[21:13] *** kyanz`bot has quit IRC (Read error: Operation timed out)
[21:13] *** kyanz`bot has joined #archiveteam
[21:15] *** mistym has joined #archiveteam
[21:30] *** scyther has quit IRC (Read error: Connection reset by peer)
[21:33] *** philpem has joined #archiveteam
[21:41] *** mistym has quit IRC (Remote host closed the connection)
[21:51] *** Ravenloft has joined #archiveteam
[21:55] *** mistym has joined #archiveteam
[22:18] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[22:19] *** zenguy_pc has joined #archiveteam
[22:32] *** Guest25 is now known as expr_
[22:43] *** primus104 has quit IRC (Read error: Connection reset by peer)
[22:47] *** garyrh has quit IRC (Remote host closed the connection)
[22:48] *** primus104 has joined #archiveteam
[22:50] *** primus105 has joined #archiveteam
[22:55] *** primus104 has quit IRC (Read error: Operation timed out)
[23:07] *** RedType has quit IRC (Read error: Operation timed out)
[23:08] *** RedType has joined #archiveteam
[23:11] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[23:12] *** zenguy_pc has joined #archiveteam
[23:22] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[23:22] *** zenguy_pc has joined #archiveteam
[23:38] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:42] *** dashcloud has joined #archiveteam
[23:49] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[23:49] *** zenguy_pc has joined #archiveteam
[23:51] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[23:52] *** zenguy_pc has joined #archiveteam
[23:53] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[23:59] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[23:59] *** expr_ has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)