[00:06] *** RedType has quit IRC (Quit: leaving)
[00:07] *** RedType has joined #archiveteam-bs
[00:11] *** schbirid has quit IRC (Quit: Leaving)
[00:26] *** RedType has quit IRC (Quit: leaving)
[00:34] *** RedType has joined #archiveteam-bs
[00:39] SketchCow, can we archive 19TB of mouse genomes and send them onto FOS and the IA?
[00:40] With the FTP project that is ^
[00:40] SketchCow: some more info https://www.sanger.ac.uk/sanger/Mouse_SnpViewer/rel-1303
[00:40] http://www.sanger.ac.uk/science/data/mouse-genomes-project
[00:41] *** aaaaaaaaa has joined #archiveteam-bs
[00:48] To be exact, that FTP ^ is 19422781428266 bytes
[01:47] *** bwn has quit IRC (Read error: Operation timed out)
[03:20] *** primus104 has quit IRC (Leaving.)
[03:21] *** BlueMaxim has joined #archiveteam-bs
[03:22] *** chazchaz has quit IRC (Read error: Operation timed out)
[03:23] *** espes__ has quit IRC (Read error: Operation timed out)
[03:23] *** espes__ has joined #archiveteam-bs
[03:32] *** chazchaz has joined #archiveteam-bs
[03:33] https://twitter.com/uppfinnarn/status/670229025445519360
[03:33] computers, ladies and gentlemen
[03:46] first 10 issues of metropop that i got are getting uploaded
[03:46] issues 400 to 410
[04:05] https://archive.org/details/metropop-400
[04:05] will fix metadata later
[04:20] *** aaaaaaaaa has quit IRC (Leaving)
[04:32] *** chazchaz has quit IRC (Read error: Operation timed out)
[04:32] *** espes__ has quit IRC (Read error: Operation timed out)
[04:34] *** espes__ has joined #archiveteam-bs
[04:39] *** chazchaz has joined #archiveteam-bs
[05:33] *** bwn has joined #archiveteam-bs
[05:59] *** Sk1d has quit IRC (Ping timeout: 252 seconds)
[06:58] *** vitzli has joined #archiveteam-bs
[08:08] *** bwn has quit IRC (Read error: Connection reset by peer)
[08:09] *** bwn has joined #archiveteam-bs
[08:10] *** remsen has quit IRC (Read error: Operation timed out)
[08:21] *** primus104 has joined #archiveteam-bs
[08:37] *** schbirid has joined #archiveteam-bs
[09:16] *** cvb has joined #archiveteam-bs
[09:20] *** bwn has quit IRC (Read error: Operation timed out)
[09:29] so i'm at 519k items now
[09:53] *** primus104 has quit IRC (Leaving.)
[10:01] *** vitzli has quit IRC (Quit: Leaving)
[10:59] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[11:16] uploaded: https://archive.org/details/russian-Katalog_Lego-1994
[11:25] *** remsen has joined #archiveteam-bs
[12:10] *** primus104 has joined #archiveteam-bs
[12:19] *** vitzli has joined #archiveteam-bs
[12:20] godane: holy nostalgia, i remember that catalogue (in german) from my lego heydays :)
[12:35] *** VADemon has joined #archiveteam-bs
[12:36] i figure people will like that
[12:43] btw i'm starting to get 2400k TheBlaze TV videos again
[12:44] whats funny is it just started happening when i was getting 500k copies of Glenn Beck and Dana
[12:44] then 2015-11-10 of dana went into 2400k
[12:45] anyways i'm grabbing some of the documentary at 2400k since i only have 500k copies of those
[13:20] so looks like i may have missed adding the 4 hour mp3 to some of glenn beck insider podcast
[13:21] i believe the 4 hour started in july 2010
[13:27] *** primus104 has quit IRC (Leaving.)
[13:49] *** cvb has quit IRC (Quit: Leaving)
[13:56] *** arkiver2 has joined #archiveteam-bs
[14:04] *** Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
[14:05] *** Lord_Nigh has joined #archiveteam-bs
[14:09] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[16:08] well then
[16:08] http://pastebin.com/scraping
[16:08] pretty sure this page wasn't there before
[16:08] lol
[16:08] interesting
[16:09] so pastebin offers a lifetime pro account for $30 at the moment
[16:09] which gives you access to their scraping API
[16:09] do we care about scraping pastebin enough to drop $30 on it?
[16:14] meh
[16:23] Sure we could split it up if needs be
[16:29] * joepie91 ping phuzion
[16:30] so, they require you to use one whitelisted IP
[16:30] I can probably set up scraping
[16:30] Yeah, that's the only thing I particularly noticed about it was the single IP restriction
[16:30] if there's no risk of them changing the layout or banning me for it, then it should be pretty much zero-maintenance
[16:30] phuzion: I can kind of understand it, though
[16:31] Yeah, otherwise they'd get shared all over the place
[16:31] yep
[16:31] Like we were planning to do :)
[16:31] especially given that it's lifetime
[16:31] heh
[16:31] I can always just set up a proxy
[16:31] Huh
[16:31] to which I can add credentials for archiveteam people
[16:31] Right
[16:31] and that internally rate limits
[16:31] But what's the point if you can just do the scraping yourself?
[16:31] phuzion: we might have, for example, people who need to scrape specific pastes
[16:31] to fetch URLs from
[16:31] stuff like that
[16:32] Ahhhh, gotcha.
[16:32] and if we're going to pool $ for a pastebin account, then we might as well make it a collective one
[16:32] Right
[16:33] That's shockingly cheap for essentially unfettered access to all of pastebin though, isn't it?
[16:33] yes
[16:33] ish
[16:33] compared to other paid APIs, yes, it's cheap
[16:33] but the idea of charging for an API is a bit stupid to begin with
[16:33] :p
[16:33] Depends on what the API does, IMO.
[16:34] not really
[16:34] see, the thing is
[16:34] an API call is cheaper than a frontend call for the same data
[16:34] to the provider
[16:34] so why limit API calls but not frontend calls?
[16:34] sure, I can understand limits for eg. complex search queries
[16:34] frontend calls are assumed to also be generating ad revenue.
[16:34] but those should be enforced in the frontend also
[16:34] not if scrapers use them
[16:34] that's my point
[16:35] if somebody wants to scrape, they can either use the API or the frontend
[16:35] neither is going to bring in ad rev
[16:35] but the API is cheaper for them to provide
[16:35] so why make that the one with the limit? :P
[16:35] So give them access to the API so they can do so in a less costly manner, gotcha
[16:35] basically
[16:35] it benefits both provider and consumer to have a limitless API (or equivalent limits to the frontend)
[16:36] anyhow, phuzion, HCross, ideas on $?
[16:36] my current budget remains 0 until hopefully-soon, but idk how long that deal is going to be availabl for
[16:36] available*
[16:36] they call it a "black friday deal" but it's not like pastebin isn't _constantly_ running promos
[16:37] I can chip in $10 or something, but I don't have paypal.
[16:37] they always seem to be doing promos. My budget is 0 until the start of Dec when I get paid
[16:37] phuzion: what do you have?
[16:38] Where in the world are you?
[16:38] who, me? or phuzion?
[16:38] Unless you take credit cards directly, I probably can't give you money over the internet now that I think about it.
[16:38] lol
[16:38] phuzion
[16:38] I'm in the US, btw.
[16:38] phuzion: you can go through my donation form... :P
[16:38] creditcard-via-paypal
[16:40] joepie91: let me know once you've got a few other people to chip in on this and I'll send a few bucks your way for it.
[16:41] noted :P I will await responses from others...
[16:41] so, because it has scrolled out of view: pooling together $30 for a pastebin pro lifetime account, which gives API access for one(?) IP which means we can scrape properly without blocks
[16:41] anybody who wants to chip in, please maketh thyself known
[16:42] joepie91, you going to hold the Pastebin acc?
Or let someone like SketchCow own it
[16:42] HCross: I don't mind either way, but it's probably most practical if the one doing the scraping has access to the account (because of the IP whitelisting)
[16:42] even if that's just an `archiveteam` account changing hands when needed
[16:42] ye, just wondering if they will pick up on account sharing
[16:43] shouldn't matter
[16:43] it's just going to be one IP using it
[16:44] *** primus104 has joined #archiveteam-bs
[16:48] joepie91: how do you accept money? I can probably give you the $30 depending on if I can get you the money
[16:48] PayPal I think dashcloud
[16:49] or CC via PP
[16:49] *** ndiddy has joined #archiveteam-bs
[16:49] got a link to your site joepie91?
[16:50] dashcloud: http://cryto.net/~joepie91/donate.html
[16:50] dashcloud: plus SEPA and such
[16:55] joepie91 I can pitch the $30 for the lifetime account, if the funding isn't there now
[16:55] achip, looks like dashcloud has it. You could go halves?
[16:55] if dashcloud would like, sure
[16:58] * joepie91 will await some kind of agreement
[16:59] (I'll just register an 'archiveteam' account, although with a less obvious name - sleeping dogs etc)
[16:59] sure- I can't seem to edit my selection from $30 though without cancelling this transaction though
[16:59] (and I'll CC credentials to SketchCow so that the bus factor is 2)
[17:01] okay- the money should be on its way to you now
[17:02] dashcloud: St**** Za*****?
[17:02] (idk if your name is public)
[17:02] yep
[17:02] okay :P
[17:02] you actually sent euro
[17:02] heh
[17:02] that's what I was offered
[17:02] though I don't think it differs much after transaction fees
[17:02] yeah
[17:02] right
[17:03] 28.48 post-transaction-fees
[17:03] == 30.17 $
[17:03] apparently the difference isn't that much right now
[17:03] lol
[17:03] yeah, exchange rate is shit at the moment
[17:03] dashcloud: can you think of a fun non-obvious username?
[17:03] for this project?
[17:03] dashcloud: for the pastebin account
[17:03] for archiveteam
[17:04] :p
[17:04] is f451 too obvious?
[17:04] I.. have no idea what that means
[17:04] fahrenheit 451, the famous Ray Bradbury story
[17:04] ahh
[17:04] worksforme
[17:04] :P
[17:05] achip: dashcloud already paid the full amount
[17:06] if you want to donate anyway, it won't be a waste
[17:06] will leave that decision up to achip :P I can send back the $ if wanted
[17:07] *** SN4T14 has quit IRC (Read error: Operation timed out)
[17:07] mmk, it should activate pro status in a few mins
[17:08] ..
[17:08] dashcloud: I typoed the username
[17:08] gg me
[17:08] ah well, less obvious I guses
[17:08] guess*
[17:10] okay, so
[17:10] if I get hit by a bus, poke SketchCow - just sent him a copy of the account credentials
[17:10] if SketchCow gets hit by a bus, we're probably all doomed
[17:13] * ersi hops on the Magic School bus
[17:27] joepie91: can you op swebb
[17:37] *** joepie91 sets mode: +o swebb
[17:37] *** swebb sets mode: +o DFJustin
[17:37] *** swebb sets mode: +o SadDM
[17:37] *** swebb sets mode: +o antomatic
[17:37] *** swebb sets mode: +o balrog
[17:37] *** swebb sets mode: +o xmc
[17:37] xmc: there you go
[17:37] thanks :)
[17:37] ops-spreader engaged
[17:37] :p
[17:37] looks like housemates may not be coming back home tonight
[17:38] went to a protest, and 20 arrests have been reported...
[17:40] :|
[17:41] *** remsen has quit IRC (Read error: Operation timed out)
[17:41] xmc: they'll probably be released in the morning
[17:41] as usually happens
[17:41] "oh we didn't actually have a reason to arrest you, sorry, out you go"
[17:41] pretty much a standard tactic for defusing protests in NL
[17:54] *** SN4T14 has joined #archiveteam-bs
[18:09] same thing here, but with more beatings and fewer apologies
[18:10] no apologies here
[18:14] good night all
[18:14] *** vitzli has quit IRC (Leaving)
[18:14] I kind of imagined the apology, this isn't Canada :)
[18:17] thanks gnome3 for crashing when you try to run your fucking stupid new file dialog
[18:42] *** Microguru has quit IRC (Quit: Microguru)
[18:45] btw
[18:45] http://www.polygon.com/2015/8/18/9173621/ryan-north-stuck-hole-twitter
[18:45] Twitter Plays Stuck-In-A-Hole
[18:45] (cc SketchCow )
[18:53] I think we got that when it happened P:
[18:57] joepie91, it's like that episode of it's always sunny
[19:03] *** primus104 has quit IRC (Leaving.)
[19:11] *** nomadpeng has joined #archiveteam-bs
[19:33] tracker not listing anything?
[19:34] *** primus104 has joined #archiveteam-bs
[19:42] joepie91, what are the protests? Couldn't find anything on google news...
[19:42] kyan: Pegida == neonazis, essentially
[19:43] unfortunately it's being imported to NL
[19:43] they were doing a protest today, and there was a corresponding anti-protest
[19:43] *** Start has quit IRC (Ping timeout: 310 seconds)
[19:44] aah :(
[19:44] *** Start has joined #archiveteam-bs
[19:44] thanks for enlightening me :)
[19:45] kyan: I should clarify
[19:45] neonazis pretending not to be neonazis
[19:45] doing the entire "concerned citizen" spiel
[19:45] you know the drill, probably
[19:45] it's fairly telling that the founder of the organization is an arms dealer
[19:45] I've heard of similar, but never run across them in real life thank goodness
[19:46] * kyan is lucky to live in an area without too many nutcases
[19:48] I'm sure there's a few actual concerned citizens in those groups as well
[19:48] sure, they're just hitched up by the shady folks
[19:48] something something Hitler
[19:55] *** aaaaaaaaa has joined #archiveteam-bs
[19:55] *** swebb sets mode: +o aaaaaaaaa
[19:59] *** godane has quit IRC (Leaving.)
[20:01] *** godane has joined #archiveteam-bs
[20:15] *** wyatt8750 is now known as wyatt8749
[20:15] *** wyatt8749 is now known as wyatt8740
[20:44] Is DocumentCloud mirrored to IA?
[20:48] kyan: it was until two years ago, it seems: https://archive.org/details/documentcloud
[20:48] Hmm.
[20:48] Cool, gwern is fiddling about with the AT datasets
[20:48] ersi: oh?
[20:49] Saw some PR's to warcat on Github from gwern
[20:49] no, not PRs - opened issues
[20:49] They're well written (of course, it's gwern :) )
[20:49] I seem to keep running into gwern everywhere
[20:49] lol
[20:50] so this is the lion king magazine i'm uploading: http://lionking.wikia.com/wiki/The_Lion_King:_A_Nature_Fun_and_Learn_Series
[20:51] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[20:52] *** aaaaaaaaa has joined #archiveteam-bs
[20:52] *** swebb sets mode: +o aaaaaaaaa
[20:54] *** aaaaaaaaa sets mode: +oooo chfoo closure godane midas
[20:54] *** aaaaaaaaa sets mode: +oo nico_32 yipdw
[20:56] *** schbirid has quit IRC (Quit: Leaving)
[20:57] uploaded: https://archive.org/details/The_Lion_King_-_A_Nature_Fun_and_Learn_Series-01
[21:15] *** VADemon has quit IRC (Read error: Connection reset by peer)
[21:16] *** godane has quit IRC (Quit: Leaving.)
[21:17] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[21:17] *** aaaaaaaaa has joined #archiveteam-bs
[21:17] *** swebb sets mode: +o aaaaaaaaa
[21:18] *** godane has joined #archiveteam-bs
[21:25] *** nomadpeng has quit IRC (Quit: Leaving)
[21:45] *** Start has quit IRC (Read error: Connection reset by peer)
[21:45] *** Start has joined #archiveteam-bs
[21:46] *** ens has joined #archiveteam-bs
[21:59] *** balrog has quit IRC (Bye)
[22:13] *** balrog has joined #archiveteam-bs
[22:13] *** swebb sets mode: +o balrog
[22:23] *** bwn has joined #archiveteam-bs
[22:36] *** chazchaz has quit IRC (Read error: Operation timed out)
[22:37] *** espes__ has quit IRC (Read error: Operation timed out)
[22:42] *** chazchaz has joined #archiveteam-bs
[22:50] *** BlueMaxim has joined #archiveteam-bs
[22:57] *** ParkerR has joined #archiveteam-bs
[22:57] *** espes__ has joined #archiveteam-bs
[22:57] ndiddy, Ok master. whatever you say
[22:58] ...
[22:58] :)
[23:01] so...
if you want to do stuff there's the archive team warrior which is a virtualbox appliance that you run and automatically backs up websites
[23:02] http://www.archiveteam.org/index.php?title=Main_Page
[23:17] *** Marcelo has joined #archiveteam-bs
[23:36] *** ParkerR has quit IRC (Remote host closed the connection)
[23:37] *** ParkerR has joined #archiveteam-bs
[23:44] *** nightpool has joined #archiveteam-bs
[23:47] *** DMackey has joined #archiveteam-bs
[23:56] *** Marcelo has quit IRC (Quit: Page closed)