[00:09] got home from work and I tried my same execution that I ran before on Yahoo Dir, and it's working again
[00:09] so like geocities, there is a timeout on the blocking
[00:10] I guess I can try and increase the wait time between requests, but I think something a bit more might be needed.
[00:16] ionpulse: this is yahoo?
[00:16] you probably need a wait of 1 second
[00:16] maybe 2
[00:16] hmm
[00:16] lemme check
[00:17] best results were with random 0.5 to 3 seconds
[00:17] for yahoo groups at least, a year ago
[00:17] Hey there.
[00:18] hey SketchCow
[00:18] I have been in an Agile meeting all day
[00:18] Our committee had to come up with questions
[00:19] Mine I got through was WHAT DO WE DO WHEN THE ARCHIVE DIES
[00:19] Ol' Angel of Death
[00:19] Do people know I got my statue
[00:19] ionpulse: can you pretend to be googlebot
[00:19] yeah, seen it on twitter
[00:20] https://archive.org/details/BuildingLibrariesTogether20141028
[00:20] I'll bite, what do you do when the archive dies
[00:22] We just needed to provide the questions
[00:22] That was today
[00:33] ok cool, thanks balrog
[00:33] yes this is yahoo, yahoo directory to be specific
[00:34] DFJustin: Not far up I posted the wget command I am using, and I am posing as googlebot
[01:43] cool
[01:48] Anyway, sorry for not being around. Today and tomorrow are crazy. Back on 24 hour Jason on Monday
[01:48] all jason, all the time
[05:43] Unstoppable juggernaut
[07:37] So, I respond to someone on hackernews.
[07:38] His response: "If you're this flip about [subject] I regret defending you and assisting in the twitpic download"
[07:38] Just for the record: Sit out the next archiveteam project. Thanks.
[07:39] SketchCow: watch out, he'll be like Justin Hammer
[07:43] I sit in a hackernews thread trying to answer everything as completely and calmly as possible while devs get real work done, and someone goes "huh, your answer isn't quite right to me, allow me to shit right in your hand"
[08:00] Meanwhile....
adding 800 working arcade game keyboards.
[10:19] SketchCow: Agile meeting running all day doesn't sound very agile :)
[11:08] SketchCow: Do we already have some more information about the storage for panoramio?
[11:08] If we want to have the full panoramio, a lot of data, we will need quite some time to download that
[11:08] and the panoramio websites won't stay up forever, so the sooner we can start the better
[11:09] the scripts are ready to start
[15:00] How big do we think this will be
[15:02] 42
[15:04] db48x: thats -bs stuff
[15:06] man, the wayback really needs youtube
[15:06] shit like this: https://www.youtube.com/watch?v=v2wv-oVC9sE
[15:06] espes__: archive.org grabs youtube btw
[15:07] espes__: lol
[15:07] midas: I'm not sure what you mean
[15:08] espes__: I have a strange urge to record that video on my phone...
[15:09] midas: really? chucking an url into /save doesn't seem to work
[15:10] if it has more than X views it grabs it if im not mistaken
[15:11] so it wouldn't have worked for that journalist :/
[15:12] with 39 views, no
[15:12] but you can grab it with youtube-dl
[15:12] or a phone and scroll the comments
[15:12] :p
[15:12] grab all the metadata
[15:13] (the original had <1000, but sure)
[15:15] "if it has more than X views"
[15:15] https://web.archive.org/web/20141031054021/http://www.youtube.com/watch?v=9bZkp7q19f0
[15:15] "Sorry, the Wayback Machine does not have this video archived."
[15:15] :P
[15:23] supposedly their criterion is every video mentioned in a tweet, but I haven't observed this to be the case in practice
[15:24] and yeah as far as I know using /save doesn't put it into their system properly
[15:43] SketchCow: I think my estimate was around 300 TB
[15:44] BUT it can also be 400 TB
[15:44] Just somewhere around that
[15:47] yeah, just recalculated, 300 TB should probably be enough for panoramio.
Estimate is based on 100+ files
[15:53] social.bioware.com - the social site for bioware game users that tracks your game data, hosts user blogs/forums/game mods and modding tools - looks like it's being phased out in favor of their new forums and Dragon Age Keep. The Wayback Machine can't seem to get past the language choice splash page, so nothing has been archived. Can it be saved?
[15:54] everything on the internet can be saved
[15:54] let's take a look
[15:55] so
[15:56] does only http://social.bioware.com/ need to be archived? or also http://social.bioware.com/n7hq/agegate/ and blog and other bioware sites?
[15:57] someone can download the site using a cookies.txt with wget
[15:59] As far as I know, only the old site is being phased out. I forgot they redirected everything to a new splash, as I usually go straight to the legacy site here: http://social.bioware.com/browse_bw_projects.php
[16:01] http://social.bioware.com/n7hq/ looks like it can be skipped.
[16:18] 300tb on top of the tb we're doing....
[16:21] Twitpic is huge, Panoramio is way too huge
[16:22] what, that's only all of archive.org's remaining space, it's not like you guys have anything else to do right https://home.archive.org/~tracey/mrtg/df.html
[16:27] I believe there are some things that can only be seen if logged into the legacy social network, user profiles and projects that were set to BSN members only. Like this dude: http://social.bioware.com/112329/
[16:29] SketchCow: I'm going in to SF today; are you guys doing lunch at IA?
[16:29] SketchCow: right now we're doing around 80 TB for twitpic (I think)
[16:30] then a few tens of TB fo halo (probably) (but it's read-only, so we can do it slowly)
[16:30] for*
[16:30] let's see.
maybe 100 GB for genforum for ancestry
[16:30] and that's it probably
[16:31] SketchCow: I believe you said something to us that if we are able to get funding of a certain amount of money that we are then able to get offline storage at IA, which will be made public over time
[16:32] not sure though if that was for twitpic or panoramio
[16:34] Twitpic
[16:34] Twitpic was 120tb and Brewster needed $20k
[16:37] SketchCow: http://paste.archivingyoursh.it/raw/wugebojono as far as I know that is about panoramio
[16:37] have you talked to any of those twitter guys who were offering to pay noah
[16:37] right before you said that we were talking about the size of panoramio
[16:44] SketchCow: how are we doing on that fund-raising, btw?
[16:58] -- New project: #yolohalo
[16:58] ---------------------------------
[16:58] -- Scripts ready, tracker ready
[16:58] -- Starting today or tomorrow
[16:58] ---------------------------------
[17:11] arkiver: what's it waiting on?
[17:14] db48x: I ve
[17:14] I'm very busy with something else right now
[17:15] I want to start it when I have some free time, tomorrow morning that is, so I can watch it for some in case something goes wrong
[17:15] * db48x nods
[18:28] I don't know where we are on fund raising.
[23:23] ahoy
[23:25] how about the metadata?
[23:25] I've slacked off last 2 days, but 3 night shifts from tomorrow night
[23:35] More metadata the better. I will help too
[23:41] -----------------------------------------------
[23:41] Jason will be back to fulltime archive team on monday
[23:41] -----------------------------------------------
[23:42] On top of everything else, I'm kind of sick
[23:42] well I got to g or h so far, and someone else was working from bottom up
[23:42] Excellent.
[23:42] I'd guess 90% of items so far have at least _some_ kind of description.
[23:42] I've only had trouble with a few really generic names
[23:43] get well!
[23:50] SketchCow: you? sick?
that's impossible
[23:50] get well, though :P
[23:50] get well soon*
[23:51] wishing somebody "get well" without the "soon" just sounds to me like you're expecting them to have a deadly incurable disease, heh
[23:51] but that may be my not-native-English brain speaking :)
[23:55] :D
[23:55] if i heard someone say "get well!"
[23:55] i'd raise an eyebrow, it'd sound ... odd....
[23:56] get well now!
[23:56] getwellapp.com
[23:59] thoughts about the government destroying h1b records after 5 years? shady
[23:59] altho you dont have to keep irs docs around after 10 yrs I think...