[00:01] *** Start has joined #archiveteam [00:05] What [00:06] Huh, FOS rebooted [00:06] Must have been a system issue [00:08] rsync is back. [00:08] Archivebot uploader is back. [00:08] Web server is back. [00:09] Freeze sourceforge project to Monday [00:09] I am sure it will take a couple days to talk to these people. [00:09] SketchCow: is it ok if I write them a mail tomorrow? [00:10] Or are you going to talk to them (might be better) [00:11] a lot of ArchiveBot jobs seem to have terminated. [00:11] I'm going to write to them [00:12] Ok [00:13] Sent him something. [00:13] Bowing and scraping not needed. Want to find out what speed needs to be done. [00:14] Ok, good [00:14] how do you politely word something that says "you're fucking everything up and we want to archive it all before it goes to shit" [00:14] I hope they are cooperative [00:15] I just said I heard we were downloading too fast, what's a good speed [00:15] sounds good [00:15] If they're not cooperative, I'll make noise. [00:15] This is not my first rodeo, and they are not my first clowns [00:16] I remember twitpic [00:16] I'm sure we blew up their shit infrastructure [00:16] Well, twitpic guy's an ass [00:16] SketchCow: actually I don't think we blew up anything [00:16] it was running fine [00:17] I didn't really notice any slowdowns or such things [00:17] Zoocasa started! [00:17] not in the warrior currently [00:18] #zoohouse [00:18] https://github.com/ArchiveTeam/zoocasa-grab [00:19] Andover is obviously a company on the outs. Reduced resources, small staff. Of course their network would be the least expensive option. [00:20] I'm sure they saw a spike and freaked.
[00:23] *** mistym has quit IRC (Remote host closed the connection) [00:27] *** Ungstein has quit IRC (Ping timeout: 265 seconds) [00:29] *** koo5 has quit IRC (Read error: Operation timed out) [00:50] Started Halo uploading again, hopefully that will blow out soon [01:03] *** Boltsie has quit IRC (Ping timeout: 370 seconds) [01:15] *** Ungstein has joined #archiveteam [01:28] *** username1 has joined #archiveteam [01:29] *** JesseW has joined #archiveteam [01:30] *** schbirid2 has quit IRC (Read error: Operation timed out) [01:44] *** mistym has joined #archiveteam [02:03] *** aschmitz has quit IRC (Remote host closed the connection) [02:17] *** primus104 has quit IRC (Leaving.) [02:28] *** trill_ has joined #archiveteam [03:01] *** zenguy_pc has quit IRC (Read error: Connection reset by peer) [03:04] *** JRWR has quit IRC (Remote host closed the connection) [03:07] *** mistym has quit IRC (Remote host closed the connection) [03:18] *** zenguy_pc has joined #archiveteam [03:25] *** mistym has joined #archiveteam [03:40] *** JRWR has joined #archiveteam [03:53] *** antomati_ has joined #archiveteam [03:53] *** swebb sets mode: +o antomati_ [03:54] *** zhongfu has quit IRC (Remote host closed the connection) [03:57] *** antomatic has quit IRC (Ping timeout: 370 seconds) [03:58] *** zhongfu has joined #archiveteam [04:12] *** Muad-Dib has joined #archiveteam [04:30] *** aaaaaaaaa has quit IRC (Leaving) [04:37] *** cjp__ has joined #archiveteam [04:39] *** cjp__ has quit IRC (Client Quit) [04:41] *** Boltsie has joined #archiveteam [04:42] sounds like Toshiba is starting to kill off support for old models: http://www.vogons.org/viewtopic.php?f=46&t=43805 [04:51] *** TheLovina has joined #archiveteam [05:02] at a first glance, everything seems to be on cdgenp01.csd.toshiba.com [05:03] googling site:csd.toshiba.com also shows some relics of their older site [05:10] download pages appear to be sequential: 
http://support.toshiba.com/support/viewContentDetail?contentId=4006772 [05:12] their robots.txt prevents any downloads from going into wayback: http://cdgenp01.csd.toshiba.com/robots.txt [05:17] *** Famicoman has quit IRC (Ping timeout: 512 seconds) [05:21] *** JesseW has quit IRC (Read error: Operation timed out) [05:29] i created a wiki page for it: http://archiveteam.org/index.php?title=Toshiba_Support [05:29] * Start is afk for the night [05:31] *** Famicoman has joined #archiveteam [05:35] *** bzc6p_ has joined #archiveteam [05:35] *** swebb sets mode: +o bzc6p_ [05:41] *** bzc6p has quit IRC (Ping timeout: 600 seconds) [05:49] What is the irc channel name for the zoocasa grab? [05:51] #zoohouse [05:59] thanks [06:18] *** WubTheCap has quit IRC (Quit: Leaving) [06:35] *** mistym has quit IRC (Remote host closed the connection) [07:35] *** mistym has joined #archiveteam [07:41] *** mistym has quit IRC (Ping timeout: 252 seconds) [07:46] *** jmc_ has joined #archiveteam [07:47] *** khaoohs_ has joined #archiveteam [07:48] *** primus104 has joined #archiveteam [07:49] *** wp494_ has joined #archiveteam [07:49] *** Start has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** kisspunch has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** xtr-201 has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** jmc has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** kniffy has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** Riviera has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** goekesmi has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** Jonimus has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** khaoohs has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** sb057 has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** DFJustinZ has quit IRC 
(ircd.shaw.ca irc.shaw.ca) [07:49] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** yuvadm_ has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** mr-b has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** w0rp has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** wacky has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** warthurto has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** dx- has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** thefinn93 has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** jk[SVP] has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** offby1 has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:49] *** matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca) [07:51] *** Start has joined #archiveteam [07:51] *** kisspunch has joined #archiveteam [07:51] *** xtr-201 has joined #archiveteam [07:51] *** kniffy has joined #archiveteam [07:51] *** Riviera has joined #archiveteam [07:51] *** goekesmi has joined #archiveteam [07:51] *** useretail has joined #archiveteam [07:51] *** SadDM has joined #archiveteam [07:51] *** Jonimus has joined #archiveteam [07:51] *** khaoohs has joined #archiveteam [07:51] *** sb057 has joined #archiveteam [07:51] *** rduser has joined #archiveteam [07:51] *** yuvadm_ has joined #archiveteam [07:51] *** wacky has joined #archiveteam [07:51] *** Sanqui has joined #archiveteam [07:51] *** warthurto has joined #archiveteam [07:51] *** dx- has joined #archiveteam [07:51] *** thefinn93 has joined #archiveteam [07:51] *** offby1 has joined #archiveteam [07:51] *** matthusby has joined #archiveteam [07:51] *** irc.shaw.ca sets mode: +o SadDM [07:51] *** swebb sets mode: +o SadDM [07:51] *** SadDM_ has joined #archiveteam [07:51] *** swebb sets mode: +o SadDM_ [07:51] *** wacky has quit IRC (Read error: Connection reset by peer) [07:51] *** yuvadm_ has quit IRC (Read error: Connection reset by peer) [07:51] *** wacky_ 
has joined #archiveteam [07:51] *** kisspunch has quit IRC (Ping timeout: 370 seconds) [07:51] *** SadDM has quit IRC (Ping timeout: 370 seconds) [07:51] *** goekesmi has quit IRC (Remote host closed the connection) [07:51] *** goekesmi has joined #archiveteam [07:52] *** xtr-201 has quit IRC (Ping timeout: 370 seconds) [07:52] *** khaoohs has quit IRC (Ping timeout: 370 seconds) [07:52] *** jk[[SVP]] has joined #archiveteam [07:53] *** mr-b has joined #archiveteam [07:53] *** DFJustinZ has joined #archiveteam [07:54] *** kisspunch has joined #archiveteam [07:54] *** dashcloud has quit IRC (Read error: Operation timed out) [07:54] *** chfoo- has joined #archiveteam [07:57] *** dashcloud has joined #archiveteam [07:58] *** w0rp_ has joined #archiveteam [08:03] *** yuvadm has joined #archiveteam [08:05] *** w0rp_ is now known as w0rp [08:05] *** jk[[SVP]] is now known as jk[SVP] [08:23] *** primus104 has quit IRC (Leaving.) [08:23] *** landshark has quit IRC (Read error: Operation timed out) [08:25] *** jmc_ has quit IRC (Ping timeout: 362 seconds) [08:37] *** trill_ has quit IRC (Quit: Page closed) [09:00] I'm going to look into downloading all tiles from Yahoo maps [09:08] Ok, got it. We'll download all tiles from yahoo maps [09:08] Will see about road data and streetview too in a bit [09:24] *** antomati_ is now known as antomatic [09:27] Morning all. 
I am going to get on Zoohouse once the git downloads [09:50] HCross, i wouldn't bother, i set it up earlier & it's rate limited so hard it would be faster to archive it with a pencil & paper [09:51] I can see that now [09:51] They told me they got it covered & to work on urlteam or something else instead [09:52] I might run both at once [09:53] It's utter nonsense that SF reckon they were getting slammed too hard with the rsync... i do not believe that 20 odd users with $5 VPS boxes were enough to bring them to a crawl [09:53] tbh, I was using a dedi [09:53] yeah but that really wouldn't make any difference [09:54] yeah, as it had the performance of a vps [09:54] SketchCow: we'll be getting all tiles from yahoo maps [09:54] We'll start at the highest level and go down [09:54] is that what zoohouse is [09:54] But if they were brought to a crawl so easily they've got bigger issues to worry about with their backend & network infrastructure [09:54] We might not get the lowest level, that would be tens of billions of tiles [09:56] HCross, zoohouse is an estate agents (i think) [09:56] yeah, what is the yahoo maps one [09:57] arkiver, I am assuming the Yahoo Tiles grab will require some boxes with decent storage capacity [09:57] maybe, I'm not sure [09:58] arkiver, Well depending what happens with the sourceforge debacle the Yahoo grab is def another project i would be happy to throw some resources at [09:58] ok [09:58] ditto signius [09:59] I do take issue with how Yahoo just kill off projects with little or no notice [09:59] I also have the same gripe with Google for doing the same * def want to be involved with the Google Code project when it starts [10:12] seems that zoocasa doesn't like us, all my stuff is 503'ing from their end [10:32] seems to be going again [10:40] *** mistym has joined #archiveteam [10:54] *** mistym has quit IRC (Read error: Operation timed out) [10:54] *** bzc6p__ has joined #archiveteam [10:54] *** swebb sets mode: +o bzc6p__ [11:01] *** bzc6p_ has quit
IRC (Ping timeout: 600 seconds) [11:22] scripts for the yahoo maps grab are created [11:23] testing and then we'll start [11:23] well, not yet [11:27] whats the IRCD [11:27] IRC I mean [11:28] SketchCow: what do you think of grabbing tiles from yahoo maps? We'll start with the highest level. The scripts are ready, so we can start immediately [12:03] Every layer of yahoomaps has (2^layernumber)^2 tiles [12:04] lowest layernumber = 0, highest layernumber = 20 [12:06] and for each tile we're going to download: [12:08] http://localhost:8090/replay/20150619112120/http://1.base.maps.api.here.com/maptile/2.1/maptile/187ddf591c/normal.day/16/18667/25000/256/png8?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:08] http://localhost:8090/replay/20150619112120/http://1.aerial.maps.api.here.com/maptile/2.1/maptile/187ddf591c/satellite.day/16/18667/25000/256/jpg?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:08] http://localhost:8090/replay/20150619112120/http://1.aerial.maps.api.here.com/maptile/2.1/maptile/187ddf591c/hybrid.day/16/18667/25000/256/jpg?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:08] oops [12:08] http://1.base.maps.api.here.com/maptile/2.1/maptile/187ddf591c/normal.day/16/18667/25000/256/png8?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:08] http://1.aerial.maps.api.here.com/maptile/2.1/maptile/187ddf591c/satellite.day/16/18667/25000/256/jpg?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:08] http://1.aerial.maps.api.here.com/maptile/2.1/maptile/187ddf591c/hybrid.day/16/18667/25000/256/jpg?lg=ENG&token=TrLJuXVK62IQk0vuXFzaig%3D%3D&requestid=yahoo.prod&app_id=eAdkWGYRoc4RfxVo0Z4B [12:09] there's also live traffic, but I guess we don't need live traffic saved.
It won't be really live anymore by the time yahoo maps is gone [12:21] *** Lowfry has joined #archiveteam [12:21] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [12:22] Lowfry: yahoosucks [12:22] true lol [12:32] *** Lowfry_ has joined #archiveteam [12:34] *** L0WFRY has joined #archiveteam [12:34] *** L0WFRY has quit IRC (Client Quit) [12:38] *** sankin has joined #archiveteam [12:40] *** Lowfry has quit IRC (Ping timeout: 512 seconds) [12:41] *** mistym has joined #archiveteam [12:42] *** Lowfry_ has quit IRC (Ping timeout: 512 seconds) [12:44] *** Fusl has quit IRC (Read error: Operation timed out) [12:45] *** mistym has quit IRC (Read error: Operation timed out) [12:50] *** Fusl has joined #archiveteam [13:06] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [13:18] arkiver, is there a channel for the yahoo map grab [13:29] *** bisko has joined #archiveteam [13:30] *** primus104 has joined #archiveteam [13:40] *** koo5 has joined #archiveteam [14:03] *** Start has quit IRC (Disconnected.) [14:11] *** lexicon has quit IRC (Read error: Operation timed out) [14:11] *** xk_id has joined #archiveteam [14:13] *** lexicon has joined #archiveteam [14:16] *** koo5 has quit IRC (Read error: Operation timed out) [14:24] signius: no [14:27] ok [14:31] *** mistym has joined #archiveteam [14:33] *** DFJustin has joined #archiveteam [14:33] *** swebb sets mode: +o DFJustin [14:35] *** sankin has quit IRC (Leaving.) [14:42] *** wacky_ has quit IRC (Ping timeout: 265 seconds) [14:47] *** sankin has joined #archiveteam [14:55] *** InAUGral has joined #archiveteam [14:57] *** bisko has quit IRC (Read error: Operation timed out) [15:08] *** JesseW has joined #archiveteam [15:19] *** InAUGral has quit IRC (Ping timeout: 265 seconds) [15:21] Do you need any server help for the yahoo grab [15:24] *** mistym has quit IRC (Remote host closed the connection) [15:27] *** Ungstein has quit IRC (Quit: Leaving.) 
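A side note on the tile arithmetic quoted at [12:03] and [12:04]: taking the rule of (2^layernumber)^2 tiles per layer, with layers 0 through 20, at face value, the totals can be worked out with a short Python sketch. The function names are made up for illustration, and the factor of three is an assumption that each tile is fetched once per style listed at [12:08] (normal.day, satellite.day, hybrid.day).

```python
# Tile-count arithmetic for the layer rule quoted in the log.
# Function names are hypothetical; the rule and the layer range come from the chat.

def tiles_in_layer(layer: int) -> int:
    """Number of tiles in a single layer: (2^layer)^2."""
    return (2 ** layer) ** 2


def tiles_up_to(max_layer: int) -> int:
    """Total number of tiles across layers 0..max_layer inclusive."""
    return sum(tiles_in_layer(n) for n in range(max_layer + 1))


# Assumption: one download per style (normal.day, satellite.day, hybrid.day).
STYLES = 3

if __name__ == "__main__":
    for layer in (0, 10, 16, 20):
        print(f"layer {layer:2d}: {tiles_in_layer(layer):,} tiles per style")
    total = tiles_up_to(20)
    print(f"layers 0-20: {total:,} tiles per style")
    print(f"layers 0-20: {total * STYLES:,} tiles for {STYLES} styles")
```

On these numbers layer 20 alone comes to 2^40, roughly 1.1 trillion tiles per style, which fits the earlier remark that the most detailed level might not be grabbed.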
[15:28] *** Ungstein has joined #archiveteam [15:28] HCross: currently not, I do if we are going to run this project [15:28] ok, let me know [15:28] I need a "Go" from SketchCow for the yahoomaps project, because all this is going to be hosted on archive.org [15:28] *** Boltsie has quit IRC (Read error: Connection reset by peer) [15:29] got a box that has direct peering with yahoo I thinj [15:29] think [15:29] great. [15:29] *** xtr-201 has joined #archiveteam [15:29] SketchCow: second project is Xfire. [15:30] Xfire is a gaming website with currently more than 35 million users. http://crash.xfire.com/ [15:30] isn't that time warner cable or am I confusing it with something [15:30] It hosts screenshots, videos (millions of videos) and game information. [15:31] Xfire should have shut down 12 june, but is still online. This means it could shut down any moment now [15:32] https://twitter.com/buckleyw/status/609513240624664576 [15:33] SketchCow: if we are going to download all videos and screenshots from the website we are getting many TBs, probably more than 30 TB [15:33] What do you think? [15:33] Video grab scripts are ready [15:33] *** Ungstein has quit IRC (Quit: Leaving.) [15:33] Getting the rest of the website ready too [15:33] *** Ungstein has joined #archiveteam [15:34] just give me a shout when you are ready [15:34] ok [15:36] *** JesseW has quit IRC (Quit: Leaving.) [15:39] *** bzc6p__ has quit IRC (Read error: Operation timed out) [15:39] that sounds like a long project, we will need a ton of warriors for the xfire project [15:40] *** Ungstein has quit IRC (Quit: Leaving.) [15:43] *** primus104 has quit IRC (Leaving.)
[15:46] we usually have more warriors than a site can actually handle [15:46] *** Ungstein has joined #archiveteam [15:56] *** Start has joined #archiveteam [16:07] *** bzc6p__ has joined #archiveteam [16:07] *** swebb sets mode: +o bzc6p__ [16:10] *** mistym has joined #archiveteam [16:11] *** bzc6p__ is now known as bzc6p [16:17] *** mistym has quit IRC (Remote host closed the connection) [16:18] *** mistym has joined #archiveteam [16:21] arkiver: we should do a warrior project for support.toshiba.com [16:22] they've recently been purging old support downloads [16:22] i created a page for it: http://archiveteam.org/index.php?title=Toshiba_Support [16:23] *** bzc6p has quit IRC (Read error: Operation timed out) [16:32] *** koo5 has joined #archiveteam [16:32] *** xk_id has quit IRC (Remote host closed the connection) [16:32] Start: the vogons thread has been updated to say two things. 1. the info is still obtainable via some of the country TLDs (ex. toshiba.co.uk) 2. some people may or may not have archived it all in years past (ie. not helpful info LOL) [16:33] 1. *some of the info [16:35] *** bzc6p has joined #archiveteam [16:35] *** swebb sets mode: +o bzc6p [16:45] *** Start has quit IRC (Disconnected.) 
[16:55] arkiver: why are you downloading yahoo maps [16:55] they just use openstreetmap [16:55] or commercial sources, i forget [16:56] *** aaaaaaaaa has joined #archiveteam [16:56] *** swebb sets mode: +o aaaaaaaaa [17:04] *** signius has quit IRC (Quit: Leaving) [17:06] *** signius has joined #archiveteam [17:06] *** signius has quit IRC (Client Quit) [17:06] *** signius has joined #archiveteam [17:07] *** bzc6p_ has joined #archiveteam [17:07] *** swebb sets mode: +o bzc6p_ [17:07] *** bzc6p has quit IRC (Read error: Connection reset by peer) [17:08] *** bzc6p_ is now known as bzc6p [17:15] *** dashcloud has quit IRC (Read error: Connection reset by peer) [17:16] *** dashcloud has joined #archiveteam [17:19] I believe Yahoo uses HERE, which provides maps for yahoo, bing, Amazon and a couple others [17:19] *** bzc6p has quit IRC (Ping timeout: 601 seconds) [17:20] https://developer.here.com/ [17:23] *** xk_id has joined #archiveteam [17:25] *** bzc6p has joined #archiveteam [17:25] *** swebb sets mode: +o bzc6p [17:26] there are tools for tile grabbing btw [17:28] yeah [17:29] yeah, HERE is fine [17:29] it's like one of a few profitable parts of Nokia [17:29] i don't think it's worth the time, effort, and disk space. [17:29] the map data is also utterly unusable anyway [17:29] you might as well just improve OSM [17:30] map tile archiving would be awesome though since they change styles etc [17:30] that said if Yahoo stores custom layers then those are probably worth going after [17:30] but it is HUGE data [17:31] http://wiki.openstreetmap.org/wiki/Tile_disk_usage for osm [17:32] yeah, fetching map data via HTTP would be prerendering it [17:32] 54,000 GB into IA is an ass move [17:32] I know HERE isn't OSM but even so [17:32] o.O [17:34] i hate to be the cranky suspenders-wearing old man, but that's not a good idea [17:34] custom layer data e.g. 
KMLs or whatnot are typically much smaller *and* are a lot more interesting [17:34] not to a cartographer like me ;) [17:34] so use the USGS data [17:35] or etc. [17:35] i wish we had google maps tiles from every year [17:35] just to see how the style evolved [17:35] username1: you'd rather have tiles than source data? [17:35] hm [17:35] well, you can sample it i guess [17:35] and the usability but that is even harder to capture [17:35] different aspects [17:35] i'm a geographer, not a cartographer :P [17:35] :) [17:41] *** username1 is now known as schbirid [17:49] *** schbirid2 has joined #archiveteam [17:49] Cyphertite is closing; https://gist.github.com/joepie91/1c659fe7704e98520e17 [17:49] *** schbirid2 has quit IRC (Read error: Connection reset by peer) [18:12] *** cjp_ has joined #archiveteam [18:17] http://www.waybackhn.com/ [18:35] HELLO HELLO [18:35] Zoocasa has asked us to pull back a little [18:35] Can we halve the thing [18:38] *** oldcad has joined #archiveteam [18:39] ok [18:39] SketchCow: it's at 150 per minute now [18:39] however, we will not make it at that speed [18:39] *** J08nY has joined #archiveteam [18:41] *** mistym has quit IRC (Remote host closed the connection) [18:41] SketchCow: have you read what I wrote about Yahoo Maps and Xfire? [18:43] *** landshark has joined #archiveteam [18:43] Xfire should have closed june 12, 35 million users [18:44] arkiver: not "only" 24? [18:45] bzc6p: 35,783,196 [18:46] Arkiver - make it a quarter for now. 40 per minute. [18:46] SketchCow: ok [18:46] I realize that we wouldn't make it. I'm trying to work with this guy. [18:46] they banned our useragent, I'll have to requeue some things [18:46] Ok, thank you [18:46] He did it because users couldn't get in. [18:46] We DDoS'd [18:46] Let me know, I'll mail him, he'll turn it back up [18:47] ok [18:47] Meanwhile I'll get the xfire scripts more ready, they now only support videos. [18:48] We need to make a decision on that.
The website is half-dead and grabbing all videos will be 10s of TBs [18:49] No to yahoo maps [18:49] ok [18:51] Yes to xfire [18:51] ok [18:51] Sourceforge, we're doing pre-emptive downloading, because we think they're cocks. So we can work it out. [18:51] But I suspect we're going to have issues with Zoocasa. [18:52] Apparently the entire computing staff has been fired down to one guy [18:52] G'day Archiveteam. [18:52] Appreciate that you guys are archiving Zoocasa.com, but could you please throttle things back a little bit? We're getting a lot of traffic from you and archivebot simultaneously as well as the general public, but your crawlers are being too aggressive and are basically DDOSing us and everyone was getting 503s while our app servers became saturated. I've had to add you to a block list at least temporarily until you can ease up a bit. Archivebot is crawling at a rate of 1.2 req/s with 4 connections and a 500-1000 ms delay if that helps, but your crawlers were just too aggressive for our servers this week. Because we're in the middle of a shutdown there's not a lot I can do to add more resources, so if you guys could tone down your crawling a bit, I can remove the blocking. We're running at a bit of a disadvantage here because of the shutdown so throwing more servers at the app cluster probably won't be happening. My hands are a little tied. :/ [18:52] Cheers and thanks. [18:52] - Jay [18:53] I see [18:54] I guess there's not a lot we can do there [18:55] Maybe they could keep it up a few days after the 22nd? [18:56] *** landshark has quit IRC (Read error: Operation timed out) [18:58] *** Start has joined #archiveteam [19:00] *** Start_ has joined #archiveteam [19:00] *** Start has quit IRC (Read error: Connection reset by peer) [19:09] *** koo5 has quit IRC (Read error: Operation timed out) [19:15] SketchCow: our useragent is unbanned from zoocasa.
We're running at 40/min now [19:21] arkiver: Xfire is going to be a bitch to mirror, any help you need, let me know [19:21] JRWR: ok [19:23] *** Stiletto has quit IRC (Read error: Operation timed out) [19:25] *** Start_ has quit IRC (Disconnected.) [19:29] *** K4k has joined #archiveteam [19:29] zoocasa has lifted the ban [19:29] He says he can whip up a rule to increase bandwidth as it goes [19:29] he says 23rd is last day [19:29] I am asking about us getting some special post-shutdown. But assume it goes down [19:29] *** Start has joined #archiveteam [19:30] *** iamcold has joined #archiveteam [19:31] Your doing good work SketchCow [19:31] what's with the multiple files here? https://archive.org/details/archiveteam_pomf [19:31] *** Stiletto has joined #archiveteam [19:32] *** primus104 has joined #archiveteam [19:34] *** jmc has joined #archiveteam [19:37] *** mistym has joined #archiveteam [19:41] ruukasu: The ~3 TB of Pomf archive is stored in 22.5 GB chunks, in separate items. [19:42] They are already available in the Wayback Machine, though. [19:44] *** dashcloud has quit IRC (Read error: Operation timed out) [19:47] *** dashcloud has joined #archiveteam [19:52] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~) [20:10] *** iamcold has quit IRC (Quit: Page closed) [20:13] *** aaaaaaaaa has quit IRC (Ping timeout: 600 seconds) [20:16] *** aaaaaaaaa has joined #archiveteam [20:16] *** swebb sets mode: +o aaaaaaaaa [20:21] *** Start has quit IRC (Disconnected.) [20:23] *** wp494_ has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES) [20:23] *** wp494 has joined #archiveteam [20:32] *** SimpBrai1 has joined #archiveteam [20:35] *** SimpBrain has quit IRC (Ping timeout: 258 seconds) [20:46] bzc6p: I tried going to a pomf link in wayback and it said it wasn't there, do I have to change the url at all? [20:47] ruukasu: you don't need to change the URL for wayback [20:49] Well, maybe the wayback importing hasn't finished yet... 
[20:49] yeah I've tried like 3 files from different time periods and they're all getting "page not archived" [20:51] *** dashcloud has quit IRC (Read error: Operation timed out) [20:54] *** sankin has quit IRC (Leaving.) [21:00] *** dashcloud has joined #archiveteam [21:03] It can take a few days/weeks for the warcs to be indexed by wayback. [21:07] *** primus105 has joined #archiveteam [21:11] *** primus104 has quit IRC (Read error: Operation timed out) [21:17] *** K4k has quit IRC (Ping timeout: 370 seconds) [21:23] *** scyther has joined #archiveteam [21:30] *** scyther has quit IRC (Read error: Connection reset by peer) [21:52] *** lbft has quit IRC (Read error: Operation timed out) [21:54] *** dashcloud has quit IRC (Read error: Operation timed out) [21:58] *** lbft has joined #archiveteam [22:00] SketchCow: zoocasa is currently at 40/min. can we do 300/min? or whatever it takes to get everything before the end of the 23rd? [22:01] *** dashcloud has joined #archiveteam [22:04] no [22:04] scroll back about 4 hours [22:19] ok [22:19] #xfired for xfire! [22:24] *** J08nY has quit IRC (Quit: Page closed) [22:45] *** koo5 has joined #archiveteam [22:46] *** TheLovina has quit IRC (Read error: Connection reset by peer) [22:46] *** TheLovina has joined #archiveteam [22:48] We have started the Xfire grab! [22:48] #xfired [22:49] \o/ [22:49] save all you can! 
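To put the Zoocasa crawl rates mentioned in the log on one scale (150 and 40 requests per minute, the 300 per minute asked about at [22:00], and the 1.2 req/s quoted for ArchiveBot), here is a small Python sketch. The helper names are invented for illustration, and the even spacing between requests is an assumption; the log does not say how the tracker actually paces work.

```python
# Compare the crawl rates quoted in the log on a single scale.
# Helper names are hypothetical; only the numbers come from the chat.

def per_second(rate_per_min: float) -> float:
    """Convert requests per minute to requests per second."""
    return rate_per_min / 60.0


def delay_seconds(rate_per_min: float) -> float:
    """Even spacing between requests needed to hold a per-minute rate."""
    if rate_per_min <= 0:
        raise ValueError("rate must be positive")
    return 60.0 / rate_per_min


# Requests per minute. ArchiveBot's quoted 1.2 req/s corresponds to 72 per minute.
QUOTED_RATES = {
    "warriors at 18:39": 150,
    "warriors after the cut": 40,
    "proposed at 22:00": 300,
    "ArchiveBot (1.2 req/s)": 72,
}

if __name__ == "__main__":
    for label, rate in QUOTED_RATES.items():
        print(f"{label:24s} {rate:4d}/min = {per_second(rate):.2f} req/s, "
              f"one request every {delay_seconds(rate):.2f} s")
```

Read this way, 40 per minute is below the rate the site operator quoted for ArchiveBot alone, while 300 per minute would be double the rate that preceded the block.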
[22:54] *** DopefishJ has joined #archiveteam [22:54] *** swebb sets mode: +o DopefishJ [22:55] *** wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** xtr-201 has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** kisspunch has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** DFJustinZ has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** mr-b has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** SadDM_ has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** kniffy has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** Riviera has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** Jonimus has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** sb057 has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** warthurto has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** dx- has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** thefinn93 has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** offby1 has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:55] *** matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca) [22:59] *** landshark has joined #archiveteam [23:01] *** BlueMaxim has joined #archiveteam [23:01] *** chfoo-_ has joined #archiveteam [23:02] SketchCow: Xfire is running! 
[23:03] *** SN4T14_ has joined #archiveteam [23:04] *** sb058 has joined #archiveteam [23:07] *** wp494_ has joined #archiveteam [23:07] *** SN4T14 has quit IRC (Ping timeout: 306 seconds) [23:09] *** kisspunc- has joined #archiveteam [23:10] *** kisspunc- is now known as kisspunch [23:10] *** TheLovina has quit IRC (Read error: Connection reset by peer) [23:12] *** TheLovina has joined #archiveteam [23:13] *** dx has joined #archiveteam [23:13] *** warthurto has joined #archiveteam [23:14] *** wp494_ is now known as wp494 [23:15] *** xk_id has quit IRC (Remote host closed the connection) [23:21] I can't seem to get wget-lua to compile on ubuntu 15.04 [23:22] pastie an error log, JRWR [23:23] POD document had syntax errors at /usr/bin/pod2man line 71 [23:23] recompiling now, i'll pull a full log soon [23:23] *** Emcy_ has quit IRC (Read error: Connection reset by peer) [23:23] JRWR: https://github.com/ArchiveTeam/xfire-grab#wget-lua-was-not-successfully-built [23:23] *** Emcy_ has joined #archiveteam [23:24] Ramp up zoocasa [23:24] Double it [23:24] See what happens [23:24] He says they're absolutely deleting on 23rd [23:25] If I show up with my station wagon full of LTO-4 tapes, ask him if I can get a backup [23:25] *** Peetz0r has joined #archiveteam [23:25] *** Peetz0r_ has quit IRC (Read error: Connection reset by peer) [23:26] *** primus104 has joined #archiveteam [23:27] can you do something about the rate limiting?
i've been hitting the rate limit for hours now [23:27] it seems it's the same users who get jobs [23:29] *** primus105 has quit IRC (Read error: Operation timed out) [23:31] *** mr-b has joined #archiveteam [23:31] *** kniffy has joined #archiveteam [23:31] *** Riviera has joined #archiveteam [23:31] *** useretail has joined #archiveteam [23:31] *** Jonimus has joined #archiveteam [23:31] *** rduser has joined #archiveteam [23:31] *** Sanqui has joined #archiveteam [23:31] *** thefinn93 has joined #archiveteam [23:31] *** matthusby has joined #archiveteam [23:33] Regarding zoocasa - turn it up to maximum on Monday afternoon [23:34] *** _0x2A has joined #archiveteam [23:34] *** koo5 has quit IRC (Read error: Operation timed out) [23:35] *** mr-b has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** kniffy has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** Riviera has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** Jonimus has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** thefinn93 has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] *** matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:35] I got one job just now.... I'll leave it overnight [23:35] SketchCow: zoocasa is at 100/min now [23:37] *** dashcloud has quit IRC (Read error: Operation timed out) [23:38] *** mr-b has joined #archiveteam [23:38] *** kniffy has joined #archiveteam [23:38] *** Riviera has joined #archiveteam [23:38] *** useretail has joined #archiveteam [23:38] *** Jonimus has joined #archiveteam [23:38] *** rduser has joined #archiveteam [23:38] *** Sanqui has joined #archiveteam [23:38] *** thefinn93 has joined #archiveteam [23:38] *** matthusby has joined #archiveteam [23:38] Thank youuuuu [23:38] What IS Zoocasa [23:40] It's a real estate brokerage website thing.
[23:41] http://www.torontorealtyblog.com/archives/9091 says: Zoocasa is a website, backed by Rogers, and headed up by Lawrence Dale (previously associated with Realty Sellers and T.O. Solds), whose goal, according to their website, is to “Help people make smarter home buying and selling decisions.” [23:43] *** dashcloud has joined #archiveteam [23:45] *** mr-b has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** kniffy has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** Riviera has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** Jonimus has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** thefinn93 has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:45] *** matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca) [23:51] *** rduser` has joined #archiveteam [23:55] https://github.com/conformal [23:55] these all need archiving [23:55] as do these: https://github.com/btcsuite [23:56] but I have to do a bunch of stuff, so would be good if somebody else could do that [23:56] (cc godane)