#archiveteam 2013-05-18,Sat


Time Nickname Message
00:16 🔗 wp494 ooooooh shiiiit
00:16 🔗 wp494 http://www.theverge.com/2013/5/17/4342012/yahoo-reportedly-nearing-1-1-billion-deal-to-acquire-tumblr
00:17 🔗 wp494 is it already on fire drill?
00:26 🔗 DFJustin we've already done proof-of-concept grabs in the past https://archive.org/details/archiveteam-tumblr-test-warc
00:26 🔗 DFJustin we straight up don't have enough storage to hold tumblr though
02:21 🔗 cmx so much porn
02:27 🔗 godane i'm finally finding more call for help episodes
02:28 🔗 godane in dialup format of course
05:39 🔗 SketchCow Hey.
05:46 🔗 SketchCow Everyone wants us to back up tumblr.
05:46 🔗 SketchCow I may make some noise for attention, but we have lots of time.
05:46 🔗 SketchCow I wonder how big it is.
06:09 🔗 BlueMax Can we even get a rough estimate?
06:10 🔗 omf_ Of course. We just sample the whole
06:10 🔗 omf_ Pick a few hundred random blogs and download them
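The sampling idea above can be sketched in a few lines. This is only an illustration, not anything the channel actually ran: the function name `estimate_total_tb`, its parameters, and the choice of a normal-approximation interval are all assumptions. Blog sizes are likely heavy-tailed, so the interval is rough at best.

```python
import math
import statistics


def estimate_total_tb(sample_sizes_mb, total_blogs):
    """Extrapolate a total size in TB from a random sample of per-blog sizes in MB.

    Hypothetical helper: returns (point_estimate, low, high), where low/high
    are an approximate 95% interval based on the standard error of the mean.
    """
    n = len(sample_sizes_mb)
    mean_mb = statistics.mean(sample_sizes_mb)
    # Sample standard deviation needs at least two data points.
    stderr_mb = statistics.stdev(sample_sizes_mb) / math.sqrt(n)
    mb_per_tb = 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    point = mean_mb * total_blogs / mb_per_tb
    margin = 1.96 * stderr_mb * total_blogs / mb_per_tb
    return point, max(point - margin, 0.0), point + margin
```

A few hundred downloaded blogs would supply `sample_sizes_mb`. `total_blogs` has to come from a separate count of how many blogs exist.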
09:51 🔗 tryphon Is it a good idea to run multiple VMs of the Warrior?
09:51 🔗 tryphon (on the same computer/IP I mean)
09:56 🔗 Smiley you can,
09:56 🔗 Smiley not a problem at all, just needs lots of resources.
09:56 🔗 Smiley tryphon: you on windows or linux?
09:56 🔗 tryphon os x
09:56 🔗 Smiley ah :D
09:56 🔗 tryphon 10.7.4
09:57 🔗 Smiley then yeah multiple warriors is your easiest way if you want to do that.
09:57 🔗 tryphon But I also have a synology nas (powerpc though)
09:57 🔗 Smiley well you can try and run the scripts directly
09:57 🔗 Smiley instructions are on our wiki.
09:58 🔗 tryphon will have a look ;) thx
10:02 🔗 tryphon hmm, are two VMs that target the same port 8001 ok?
10:06 🔗 Tomcat_ tryphon: Don't think so.
10:09 🔗 Tomcat_ Either you won't be able to access one warrior's web interface, or one might not work at all.
10:09 🔗 Tomcat_ Not sure how it behaves.
10:10 🔗 tryphon It seems that the second VM doesn't have any network activity at all.
10:10 🔗 tryphon And we cannot (easily) change the port, right.
10:10 🔗 tryphon -. +?
10:14 🔗 GLaDOS I believe virtualbox has an option to route the port to a different port somewhere
10:28 🔗 HeaD or you change the port in /home/warrior/warrior-code2/warrior-runner.sh
10:28 🔗 tryphon @GLaDOS found it :) 1/ Go to VM prefs > network > adapter 1 - http://imgur.com/NPsa0He
10:29 🔗 tryphon then go to "port forwarding" and change the "host port" - http://imgur.com/NPsa0He
10:29 🔗 tryphon oops, first picture would have been http://imgur.com/r8jrs7d sorry
11:07 🔗 Smiley yeah you change it in port forwarding
11:07 🔗 Smiley don't change the warrior code, i think it'll revert at next update.
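The same host-port change can also be made from the command line with `VBoxManage modifyvm ... --natpf1`, which takes a rule of the form `name,protocol,hostip,hostport,guestip,guestport`. The sketch below only builds the command and does not run it. The VM names, the rule name `warrior-web`, and the helper `natpf_command` are hypothetical. It also assumes the VM is powered off and that no rule with the same name exists yet (an existing rule would have to be removed first with `--natpf1 delete <rulename>`).

```python
def natpf_command(vm_name, host_port, guest_port=8001, rule_name="warrior-web"):
    """Build a VBoxManage argv that forwards host_port to guest_port over NAT.

    Hypothetical helper; vm_name and rule_name are placeholders.
    Empty host IP and guest IP fields mean "any" / "the guest's address".
    """
    rule = f"{rule_name},tcp,,{host_port},,{guest_port}"
    return ["VBoxManage", "modifyvm", vm_name, "--natpf1", rule]


if __name__ == "__main__":
    # Give each warrior VM its own host port; the guest side stays on 8001.
    for offset, vm in enumerate(["warrior-1", "warrior-2"]):
        print(" ".join(natpf_command(vm, 8001 + offset)))
```

Each warrior's web interface would then be reachable on its own host port, and nothing inside the VM is changed.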
14:45 🔗 lart :( uploading a 4GB item at 50kB/s
17:03 🔗 Smiley D:
18:06 🔗 SketchCow I successfully torrented the TOSEC-PIX
18:09 🔗 godane cool
18:14 🔗 SketchCow Yeah
18:15 🔗 SketchCow So, I'm going to make the decision to unpack it and install it.
19:01 🔗 antomatic Tumblr is interesting but enormous. If you assume a similar average size of material per-user as Posterous (not necessarily safe) - say 2mb per user - then it's immediately 200TB before you even start.
19:02 🔗 antomatic Could easily be several times that, or worse.
19:04 🔗 antomatic /me steps out to buy some extra USB sticks, just in case
19:04 🔗 antomatic :)
19:08 🔗 antomatic That grab from 2010 averages at 72mb per user - so 7200TB
19:08 🔗 godane i grabbed g4tv tumblr
19:08 🔗 godane that was over 400mb
19:09 🔗 godane if i remember right
19:12 🔗 antomatic Mm, there are some enormous posterous blogs too, but the raw average is so low purely because there's so many users whose entries are tiny, or text-only, or spam.
19:13 🔗 antomatic Assume Moore's Law and say that users' Tumblrs double in size every 18 months (bigger/better pictures, more content, video, etc.)
19:13 🔗 antomatic That's 288mb per user, or 28 petabytes.
19:13 🔗 antomatic 14 times the size of the entire Wayback machine.
19:14 🔗 antomatic Wait, this can't be right..
19:14 🔗 omf_ you just realized that
19:14 🔗 SketchCow ha ha
19:14 🔗 godane most video is likely links to youtube
19:15 🔗 antomatic If you do decide to archive tumblr, I think I'll be washing my hair that day. But good luck with that. :)
19:16 🔗 godane the most likely tumblr accounts to archive if it's too big are the ones that are linked to from a lot of wikis
19:18 🔗 antomatic "All tumblrs are equal, but some are more celebrilicious than others." :)
19:18 🔗 godane cause they most likely have a lot of info that is good
19:18 🔗 godane agree but at 28pb it's like backing up facebook or youtube
19:18 🔗 godane it's just not going to be a lot of it
19:18 🔗 DFJustin antomatic: not impossible, megaupload had 28pb when it went down
19:18 🔗 DFJustin facebook has over 100pb of storage now
19:19 🔗 antomatic Coo..
19:19 🔗 antomatic Even keeping up with the new content (about 70 million new posts each day) would be a stretch.
19:19 🔗 antomatic It does sound like fun, though.. :)
19:20 🔗 godane it almost has to go like geocities did
19:21 🔗 godane be mostly dead in 2 years and stay up for the next 11 years after that
19:25 🔗 godane i uploaded some more tekzilla episodes
19:29 🔗 godane SketchCow: I got an interview of Richard Garriott
19:32 🔗 SketchCow Great
20:03 🔗 omf_ Tumblr has 107.8 million blogs and 50.6 billion posts
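The back-of-envelope numbers from the 19:01 to 19:13 messages can be reproduced as follows. This is a rough sketch. The 100 million user count is an assumption, implied by "2mb per user" giving 200TB. Decimal units are used throughout, and the constant and function names are made up for illustration.

```python
MB_PER_TB = 1_000_000  # decimal units
TB_PER_PB = 1_000

ASSUMED_USERS = 100_000_000    # implied by "2mb per user ... 200TB"; an assumption
REPORTED_BLOGS = 107_800_000   # the 107.8 million blogs figure quoted in the log


def total_tb(users, mb_per_user):
    """Total storage in TB for a given user count and average MB per user."""
    return users * mb_per_user / MB_PER_TB


posterous_like_tb = total_tb(ASSUMED_USERS, 2)   # Posterous-style average
grab_2010_tb = total_tb(ASSUMED_USERS, 72)       # average from the 2010 grab

# Doubling every 18 months from 2010 to 2013 is two doublings.
grown_mb_per_user = 72 * 2 ** 2
grown_pb = total_tb(ASSUMED_USERS, grown_mb_per_user) / TB_PER_PB
grown_pb_reported = total_tb(REPORTED_BLOGS, grown_mb_per_user) / TB_PER_PB

if __name__ == "__main__":
    print(posterous_like_tb, grab_2010_tb, grown_mb_per_user, grown_pb)
    print(round(grown_pb_reported, 2))
```

Swapping in the reported blog count instead of the assumed 100 million moves the last figure from about 28.8 PB to about 31 PB, so the order of magnitude stays the same.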
20:07 🔗 godane interview with Michael Limbar from Angel Studios
20:07 🔗 godane *Limber
20:27 🔗 godane i found a 2 part interview with Yu Suzuki
20:28 🔗 godane and an interview with Kevin Eastman
23:55 🔗 link343 WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
23:58 🔗 DFJustin 'yahoosucks'
23:58 🔗 link343 thank you fair sir
