#archiveteam 2014-10-31,Fri


Time Nickname Message
00:09 🔗 ionpulse got home from work and I tried my same execution that I ran before on Yahoo Dir, and it's working again
00:09 🔗 ionpulse so like geocities, there is a timeout on the blocking
00:10 🔗 ionpulse I guess I can try and increase the wait time between requests, but I think something a bit more might be needed.
00:16 🔗 balrog ionpulse: this is yahoo?
00:16 🔗 balrog you probably need a wait of 1 second
00:16 🔗 balrog maybe 2
00:16 🔗 balrog hmm
00:16 🔗 balrog lemme check
00:17 🔗 balrog best results was with random 0.5 to 3 second
00:17 🔗 balrog for yahoo groups at least, a year ago
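The random 0.5 to 3 second wait balrog describes can be sketched in Python as follows. This is a minimal sketch, not anyone's actual script: `random_delay`, `polite_fetch`, and the caller-supplied `fetch` function are hypothetical names, and the bounds are simply the figures quoted above for Yahoo Groups.

```python
import random
import time


def random_delay(low=0.5, high=3.0, rng=random):
    """Return a wait time in seconds drawn uniformly between low and high."""
    return rng.uniform(low, high)


def polite_fetch(urls, fetch, low=0.5, high=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping a random interval between calls.

    `fetch` is supplied by the caller (hypothetical here). `sleep` is
    injectable so the loop can be exercised without actually waiting.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait only between requests, not before the first one.
            sleep(random_delay(low, high))
        results.append(fetch(url))
    return results
```

Randomizing the interval, rather than using a fixed one, makes the request pattern look less mechanical; whether that is enough to avoid the block on Yahoo Directory is not established in this log.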
00:17 🔗 SketchCow Hey there.
00:18 🔗 balrog hey SketchCow
00:18 🔗 SketchCow I have been in an Agile meeting all day
00:18 🔗 SketchCow Our committee had to come up with questions
00:19 🔗 SketchCow Mine I got through was WHAT DO WE DO WHEN THE ARCHIVE DIES
00:19 🔗 SketchCow Ol' Angel of Death
00:19 🔗 SketchCow Do people know I got my statue
00:19 🔗 DFJustin ionpulse: can you pretend to be googlebot
00:19 🔗 balrog yeah, seen it on twitter
00:20 🔗 SketchCow https://archive.org/details/BuildingLibrariesTogether20141028
00:20 🔗 DFJustin I'll bite, what do you do when the archive dies
00:22 🔗 SketchCow We just needed to provide the questions
00:22 🔗 SketchCow That was today
00:33 🔗 ionpulse ok cool, thanks balrog
00:33 🔗 ionpulse yes this is yahoo, yahoo directory to be specific
00:34 🔗 ionpulse DFJustin: Not far up I posted the wget command I am using, and I am posing as googlebot
01:43 🔗 DFJustin cool
01:48 🔗 SketchCow Anyway, sorry for not being around. Today and tomorrow are crazy. Back on 24 hour Jason on Monday
01:48 🔗 xmc all jason, all the time
05:43 🔗 SketchCow Unstoppable juggernaut
07:37 🔗 SketchCow So, I respond to someone on hackernews.
07:38 🔗 SketchCow His response: "If you're this flip about [subject] I regret defending you and assisting in the twitpic download"
07:38 🔗 SketchCow Just for the record: Sit out the next archiveteam project. Thanks.
07:39 🔗 yipdw SketchCow: watch out, he'll be like Justin Hammer
07:43 🔗 SketchCow I sit in a hackernews thread trying to answer everything as completely and calmly as possible while devs get real work done, and someone goes "huh, your answer isn't quite right to me, allow me to shit right in your hand"
08:00 🔗 SketchCow Meanwhile.... adding 800 working arcade game keyboards.
10:19 🔗 ersi SketchCow: Agile meeting running all day doesn't sound very agile :)
11:08 🔗 arkiver SketchCow: Do we already have some more information about the storage for panoramio?
11:08 🔗 arkiver If we want to have the full panoramio, a lot of data, we will need quite some time to download that
11:08 🔗 arkiver and the panoramio websites won't stay up forever, so the sooner we can start the better
11:09 🔗 arkiver the scripts are ready to start
15:00 🔗 SketchCow How big do we think this will be
15:02 🔗 db48x 42
15:04 🔗 midas db48x: thats -bs stuff
15:06 🔗 espes__ man, the wayback really needs youtube
15:06 🔗 espes__ shit like this: https://www.youtube.com/watch?v=v2wv-oVC9sE
15:06 🔗 midas espes__: archive.org grabs youtube btw
15:07 🔗 db48x espes__: lol
15:07 🔗 db48x midas: I'm not sure what you mean
15:08 🔗 db48x espes__: I have a strange urge to record that video on my phone...
15:09 🔗 espes__ midas: really? chucking an url into /save doesn't seem to work
15:10 🔗 midas if it has more than X views it grabs it if im not mistaken
15:11 🔗 espes__ so it wouldn't have worked for that journalist :/
15:12 🔗 midas with 39 views, no
15:12 🔗 midas but you can grab it with youtube-dl
15:12 🔗 midas or a phone and scroll the comments
15:12 🔗 midas :p
15:12 🔗 midas grab all the metadata
15:13 🔗 espes__ (the original had <1000, but sure)
15:15 🔗 espes__ "if it has more than X views"
15:15 🔗 espes__ https://web.archive.org/web/20141031054021/http://www.youtube.com/watch?v=9bZkp7q19f0
15:15 🔗 espes__ "Sorry, the Wayback Machine does not have this video archived."
15:15 🔗 espes__ :P
15:23 🔗 DFJustin supposedly their criterion is every video mentioned in a tweet, but I haven't observed this to be the case in practice
15:24 🔗 DFJustin and yeah as far as I know using /save doesn't put it into their system properly
15:43 🔗 arkiver SketchCow: I think my estimate was around 300 TB
15:44 🔗 arkiver BUT it can also be 400 TB
15:44 🔗 arkiver Just somewhere around that
15:47 🔗 arkiver yeah, just recalculated, 300 TB should probably be enough for panoramio. Estimate is based on 100+ files
15:53 🔗 n00b973 social.bioware.com - the social site for bioware game users that tracks your game data, hosts user blogs/forums/game mods and modding tools - looks like it's being phased out in favor of their new forums and Dragon Age Keep. The Wayback Machine can't seem to get past the language choice splash page, so nothing has been archived. Can it be saved?
15:54 🔗 arkiver everything on the internet can be saved
15:54 🔗 arkiver let's take a look
15:55 🔗 arkiver so
15:56 🔗 arkiver does only http://social.bioware.com/ need to be archived? or also http://social.bioware.com/n7hq/agegate/ and blog and other bioware sites?
15:57 🔗 arkiver someone can download the site using a cookies.txt with wget
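The cookies.txt approach arkiver mentions could look roughly like the sketch below, which only builds a wget argument list and does not run it. The function name `build_wget_command`, the default file name, and the choice of crawl flags are assumptions for illustration; the flags themselves (`--load-cookies`, `--recursive`, `--level`, `--page-requisites`, `--wait`, `--random-wait`) are standard wget options.

```python
def build_wget_command(start_url, cookies_file="cookies.txt", wait_seconds=1):
    """Build (but do not run) a wget argument list for a cookie-based crawl."""
    return [
        "wget",
        "--load-cookies", cookies_file,  # cookies exported from a browser session
        "--recursive",                   # follow links from the start page
        "--level=inf",                   # no depth limit (an assumption; tune as needed)
        "--page-requisites",             # also fetch images, CSS, and scripts
        f"--wait={wait_seconds}",        # base delay between requests
        "--random-wait",                 # vary the delay around the base value
        start_url,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` once a cookies file in the Netscape cookies.txt format exists, for example one exported after passing the language splash page in a browser.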
15:59 🔗 n00b973 As far as I know, only the old site is being phased out. I forgot they redirected everything to a new splash, as I usually go straight to the legacy site here: http://social.bioware.com/browse_bw_projects.php
16:01 🔗 n00b973 http://social.bioware.com/n7hq/ looks like it can be skipped.
16:18 🔗 SketchCow 300tb on top of the tb we're doing....
16:21 🔗 SketchCow Twitpic is huge, Panoramio is way too huge
16:22 🔗 DFJustin what, that's only all of archive.org's remaining space, it's not like you guys have anything else to do right https://home.archive.org/~tracey/mrtg/df.html
16:27 🔗 n00b973 I believe there are some things that can only be seen if logged into the legacy social network, user profiles and projects that were set to BSN members only. Like this dude: http://social.bioware.com/112329/
16:29 🔗 db48x SketchCow: I'm going in to SF today; are you guys doing lunch at IA?
16:29 🔗 arkiver SketchCow: right now we're doing around 80 TB for twitpic (I think)
16:30 🔗 arkiver then a few tens of TB fo halo (probably) (but it's read-only, so we ca do it slowly)
16:30 🔗 arkiver for*
16:30 🔗 arkiver let's see. maybe 100 GB for genforum for ancestry
16:30 🔗 arkiver and that's it probably
16:31 🔗 arkiver SketchCow: I believe you said something to us that if we are able to get funding of a certain amount of money that we are then able to get offline storage at IA, which will be made public over time
16:32 🔗 arkiver not sure though if that was for twitpic or panoramio
16:34 🔗 SketchCow Twitpic
16:34 🔗 SketchCow Twitpic was 120tb and Brewster needed $20k
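For a sense of scale, the Twitpic figures above (120 TB, $20k) can be extrapolated to arkiver's 300 to 400 TB Panoramio estimate. This is back-of-the-envelope arithmetic only: it assumes cost scales linearly with size, which nobody in the channel states, and the function names are hypothetical.

```python
def cost_per_tb(total_cost_usd, size_tb):
    """Cost per terabyte implied by a reference project."""
    return total_cost_usd / size_tb


def extrapolate_cost(size_tb, ref_cost_usd=20_000, ref_size_tb=120):
    """Linear extrapolation from the reference figures (an assumption)."""
    return size_tb * cost_per_tb(ref_cost_usd, ref_size_tb)
```

Under that assumption the reference rate is about $167 per TB, so `extrapolate_cost(300)` comes to roughly $50k and `extrapolate_cost(400)` to roughly $67k.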
16:37 🔗 arkiver SketchCow: http://paste.archivingyoursh.it/raw/wugebojono as far as I know that is about panoramio
16:37 🔗 DFJustin have you talked to any of those twitter guys who were offering to pay noah
16:37 🔗 arkiver right before you said that we were talking about the size of panoramio
16:44 🔗 db48x SketchCow: how are we doing on that fund-raising, btw?
16:58 🔗 arkiver -- New project: #yolohalo
16:58 🔗 arkiver ---------------------------------
16:58 🔗 arkiver -- Scripts ready, tracker ready
16:58 🔗 arkiver -- Starting today or tomorrow
16:58 🔗 arkiver ---------------------------------
17:11 🔗 db48x arkiver: what's it waiting on?
17:14 🔗 arkiver db48x: I ve
17:14 🔗 arkiver I'm very busy with something else right now
17:15 🔗 arkiver I want to start it when I have some free time, tomorrow morning that is, so I can watch it for some in case something goes wrong
17:15 🔗 * db48x nods
18:28 🔗 SketchCow I don't know where we are on fund raising.
23:23 🔗 lemonkey ahoy
23:25 🔗 Smiley how about the metadata?
23:25 🔗 Smiley I've slacked off last 2 days, but 3 night shifts from tomorrow night
23:35 🔗 SketchCow More metadata the better. I will help too
23:41 🔗 SketchCow -----------------------------------------------
23:41 🔗 SketchCow Jason will be back to fulltime archive team on monday
23:41 🔗 SketchCow -----------------------------------------------
23:42 🔗 SketchCow On top of everything else, I'm kind of sick
23:42 🔗 Smiley well I got to g or h so far, and someone else was working from bottom up
23:42 🔗 SketchCow Excellent.
23:42 🔗 Smiley I'd guess 90% of items so far have at least _some_ kind of description.
23:42 🔗 Smiley I've only had trouble with a few really generic names
23:43 🔗 schbirid get well!
23:50 🔗 joepie91 SketchCow: you? sick? that's impossible
23:50 🔗 joepie91 get well, though :P
23:50 🔗 joepie91 get well soon*
23:51 🔗 joepie91 wishing somebody "get well" without the "soon" just sounds to me like you're expecting them to have a deadly incurable disease, heh
23:51 🔗 joepie91 but that may be my not-native-English brain speaking :)
23:55 🔗 Smiley :D
23:55 🔗 Smiley if i heard someone say "get well!"
23:55 🔗 Smiley i'd raise an eyebrow, it'd sound ... odd....
23:56 🔗 Ravenloft get well now!
23:56 🔗 garyrh getwellapp.com
23:59 🔗 lemonkey thoughts about the government destroying h1b records after 5 years? shady
23:59 🔗 lemonkey altho you dont have to keep irs docs around after 10 yrs I think...
