#archiveteam 2012-02-22,Wed

Time Nickname Message
09:34 arrith undersco2: so just did some tests to find out how much disk space ubuntu server 10.04 and 11.10 would minimally require
09:35 arrith i did the press f4 at the "install ubuntu", "check disc for defects", etc screen then selected "minimal install"
09:35 arrith final installed size was 603 megs for 11.10 and 524 megs for 10.04 for minimal install
09:36 arrith tried to install on a 1 gig dynamic vdi, wasn't taking (kept getting errors and i executed a shell in the install and saw the partitions it made were full), so i tried 2 gig and it worked fine. the vdi never got over 750 megs so i'm not sure what was going on
09:38 arrith also did min vm, couldn't get 11.10 to boot, might be a vbox that's too old or something, but the vdi was 740. while 10.04 was 392 root + 12 boot (due to lvm) and the vdi was 559
09:38 RedType arrith: probably temp files
09:38 arrith RedType: yeah but i'd expect that to balloon the vdi just the same
09:38 RedType yeah that's true
09:38 RedType does the vdi have some sort of compression?
09:39 arrith unless it somehow knows to reuse space these days. which i hadn't heard
09:39 arrith RedType: not that i'm aware of, that would be nice
09:39 RedType i dont think virtualbox does but...
09:39 arrith yeah there's like qcow2
09:39 arrith pretty sure that does some compression
09:39 RedType how big was your swap
09:40 arrith so final install for any of them was under 610 megs. so i'd be safe and keep 2 gigs for installing, but only need around 610 for storage of the vdi
09:40 arrith RedType: on guest or host? since it'd be dynamic. but only a few hundred megs. 1 GB minus 800 or so, so around 200 megs
09:41 RedType i dont think the vdi would increase in size unless that space was specifically allocated
09:41 arrith haven't tested compressing the vdi, but personally i've yet to keep a vm around long enough to want to compress its vdi
09:41 RedType which is why the vdi would hit 740~, but be using 1GB
09:41 RedType if the swap was made but never used, the vdi wouldn't increase (drastically) in size
09:41 RedType (i think)
09:42 arrith RedType: well one thing is you have to particularly zero the free space using a special tool depending on the os, then do a vdi compress with VBoxManage
09:42 arrith so if the space is used it shows up in the vdi, the vdi won't shrink. also i had a ls -lh running every 2 seconds in a while loop just to be sure
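The "zero the free space, then compact" step arrith describes can be sketched in Python for a Linux guest. This is a minimal sketch under assumptions: the filler file name `zerofill.tmp`, the 1 MiB chunk size, and the `max_bytes` cap (there only so the function can be tried without filling a disk) are all hypothetical. After running it inside the guest and shutting the guest down, the host-side step would be `VBoxManage modifyhd <file> --compact` (newer VirtualBox releases call the subcommand `modifymedium`). Dedicated tools such as `zerofree` for unmounted ext2/3/4 filesystems do the same job more safely.

```python
import errno
import os

CHUNK = 1024 * 1024  # hypothetical choice: write zeros 1 MiB at a time


def zero_free_space(directory, max_bytes=None, name="zerofill.tmp"):
    """Fill free space under `directory` with zeros, then delete the filler.

    Zeroed blocks are what a later `VBoxManage ... --compact` on the host can
    drop from a dynamic VDI. Returns the number of bytes written.
    `max_bytes` is a hypothetical cap for testing; None means "until full".
    """
    path = os.path.join(directory, name)
    zeros = bytes(CHUNK)
    written = 0
    try:
        # Unbuffered, so ENOSPC surfaces on write() rather than on close().
        with open(path, "wb", buffering=0) as f:
            while max_bytes is None or written < max_bytes:
                n = CHUNK if max_bytes is None else min(CHUNK, max_bytes - written)
                try:
                    written += f.write(zeros[:n])
                except OSError as e:
                    if e.errno != errno.ENOSPC:
                        raise
                    break  # disk full: the free blocks now hold zeros
            os.fsync(f.fileno())  # push the zeros to the (virtual) disk
    finally:
        if os.path.exists(path):
            os.remove(path)  # free the space again; the blocks stay zeroed
    return written
```

Note that the guest filesystem really does run out of space for a moment while this runs, so nothing else should be writing to it.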
09:42 RedType arrith: try allocating a gigantic partition in a vdi
09:42 RedType like a fresh vdi
09:43 RedType the vdi wouldnt increase in size unless you're like 0ing out the contents of the partition or something odd
09:43 arrith when the partitions got written out the vdi went from like 24 bytes to 2 megs
09:43 arrith yeah
09:43 arrith hm so yeah i guess it could be swap
09:43 RedType it adds up
09:44 RedType 1000~-740~ sort of equals close to 256 ;)
09:44 arrith ah yeah
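RedType's back-of-the-envelope check, written out. The assumption (not confirmed in the log) is that the installer carved a swap partition of about 256 MB out of the 1 GB disk; swap is reserved in the partition table but not written during the install, so the dynamic VDI never grows to cover it.

```latex
\[
\underbrace{1000\ \text{MB}}_{\text{disk}} - \underbrace{740\ \text{MB}}_{\text{observed VDI size}} = 260\ \text{MB} \approx 256\ \text{MB (swap)}
\]
```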
09:44 arrith well, i'm not too particular about finding the minimum space necessary. i'd like to hear if someone investigates but i just wanted a ballpark figure on final install usage
09:45 arrith debian beating ubuntu handily at 400 megs even
09:48 arrith btw http://www.youtube.com/watch?v=EJ_wXOFQV3M
09:49 RedType arrith: http://i.imgur.com/t6TF9.png
09:49 RedType rms basically has almost every email response pre typed
09:50 arrith RedType: being a public figure and doing your own email is pretty crazy. he's said it takes quite a few hours
09:50 arrith and if you assume he does responses like that, must be crazy
09:51 RedType he's said multiple times.
09:56 chronomex report from the digitization lab: today we scanned 3,000 slides in 8 hours
09:56 chronomex sample: http://www.flickr.com/photos/afiler/6773815684/
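For scale, chronomex's figure works out to the following average rate, assuming the 8 hours were continuous scanning time:

```latex
\[
\frac{3000\ \text{slides}}{8\ \text{h}} = 375\ \text{slides/h} = 6.25\ \text{slides/min},
\qquad
\frac{3600\ \text{s}}{375} = 9.6\ \text{s per slide}
\]
```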
10:00 Coderjoe arrith: I wonder if the filesystem was issuing TRIM commands and if the virtual disk device understood them to remove unused space
10:11 arrith Coderjoe: that would be pretty fancy
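One way to test Coderjoe's idea from inside a Linux guest is to ask whether the virtual disk advertises discard (TRIM) support at all. A minimal sketch, assuming a Linux guest: it reads the sysfs attribute `queue/discard_max_bytes`, where a nonzero value means the block device accepts discards. Even then, a filesystem only sends them when mounted with `discard` or when `fstrim` runs, and whether VirtualBox turns them into a smaller VDI depends on its configuration (later releases added a `--discard on` option to `VBoxManage storageattach`). The `sysfs_root` parameter exists only so the function can be tried against a fake directory tree.

```python
import os


def supports_discard(device, sysfs_root="/sys/block"):
    """Return True if Linux block device `device` (e.g. "sda") advertises
    discard/TRIM support, judged by a nonzero queue/discard_max_bytes."""
    path = os.path.join(sysfs_root, device, "queue", "discard_max_bytes")
    try:
        with open(path) as f:
            return int(f.read().strip()) > 0
    except (OSError, ValueError):
        # Attribute missing or unreadable: treat as "no discard support".
        return False
```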
12:54 ivan` is there some crawling software that lets you filter out URL patterns during the crawl?
12:55 ersi what are you trying to do? or what do you want to do? Map/crawl a domain?
12:55 ivan` I'm often grabbing wikis and other sites that have nasty surprises like endless historical pages or ?commentPage=29312
12:55 ivan` if there's some massive filter list for those, that would be cool too
12:56 ersi so you'd like to get all commentPages? or what? :)
12:56 ivan` another example: blogspot ?search= pages. I want to avoid getting the redundant stuff.
12:58 ivan` so my ideal crawler would show me everything it has queued and let me enter patterns to exclude
12:58 ivan` maybe I will write one in Clojure someday
12:59 ivan` call it "livecrawling" :-
12:59 tef heh
12:59 tef ivan`: it would be easy to add tbh
12:59 ersi If you were to automate it, how would it know if the content is redundant? Unless it's a perfect match if you grab both to memory?
12:59 ivan` I, the human, decide it is not worth grabbing ("nearly redundant")
13:00 tef just stick a web interface on the queue, allow updating the filters
13:00 ivan` yes
13:00 ersi hm~
13:00 tef but that would only be effective in small crawls
13:00 tef small, targeted crawls
13:00 ersi Wouldn't you need to map out the domain first? So you'd see the patterns starting to emerge
13:01 ivan` not really. you'd just see the crawler downloading crap and add a filter
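ivan`'s "livecrawling" idea, combined with tef's suggestion of a queue whose filters can be updated mid-crawl, can be sketched as a small Python class. This is a hypothetical design, not tef's crawler or Heritrix: the class and method names are invented, and the sample patterns only echo the `?commentPage=` and blogspot `?search=` traps mentioned in the discussion.

```python
import re
from collections import deque


class FilteredFrontier:
    """Crawl queue whose exclusion patterns can change while the crawl runs.

    Hypothetical sketch: a human watches pending() and calls exclude() when
    the crawler wanders into a trap such as endless comment pages.
    """

    def __init__(self, patterns=()):
        self._queue = deque()
        self._seen = set()
        self._patterns = [re.compile(p) for p in patterns]

    def _excluded(self, url):
        return any(p.search(url) for p in self._patterns)

    def add(self, url):
        """Queue a URL unless it was already seen or matches a filter."""
        if url in self._seen or self._excluded(url):
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def exclude(self, pattern):
        """Add a filter and drop already-queued URLs that match it.

        Returns how many queued URLs were dropped."""
        regex = re.compile(pattern)
        self._patterns.append(regex)
        before = len(self._queue)
        self._queue = deque(u for u in self._queue if not regex.search(u))
        return before - len(self._queue)

    def pending(self):
        """Snapshot of the queue, e.g. for a web page that shows it."""
        return list(self._queue)

    def pop(self):
        """Next URL to fetch, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

Pruning the queue at the moment a filter is added is the point of the design: the human reacts to what is already queued, without restarting the crawl. As tef notes, this only scales to small, targeted crawls where someone is actually watching.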
13:01 void_ ivan`: https://webarchive.jira.com/wiki/display/Heritrix/Avoiding+Too+Much+Dynamic+Content
13:01 void_ ivan`: with heritrix you can stop and resume a crawl
13:01 ersi Hm, I guess
13:02 ivan` void_: cool, I'll check it out
13:02 tef tbh we do that at work - stop and restart crawls
13:02 tef ersi: I was going to add a logfile to resume from for the simple crawler
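A resume logfile of the kind tef mentions could be as simple as an append-only list of finished URLs. The log does not show how tef's simple crawler does it; this is a hypothetical sketch assuming one URL per line, where a restarted crawl loads the file and skips anything already recorded.

```python
import os


class ResumeLog:
    """Append-only log of finished URLs so a stopped crawl can resume.

    Hypothetical sketch: one URL per line; on restart, URLs already in the
    file are treated as done and are not fetched again.
    """

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.done = {line.rstrip("\n") for line in f if line.strip()}

    def mark_done(self, url):
        # Reopen in append mode per URL so each entry is handed to the OS
        # as soon as it is finished; a crash loses at most the last line.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(url + "\n")
        self.done.add(url)

    def is_done(self, url):
        return url in self.done
```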
13:03 void_ the ideal would be a queue manager for wget-warc ;)
13:03 tef heh
13:03 tef pfft wget :v
13:03 tef void_: the ideal thing would be a shared queue on heroku, and clients that can hop on/hop off to capture content
13:04 tef as well as an automated upload facility that handled slots/etc
13:06 tef so people who contribute don't need to do much more than just run a program
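The "clients that can hop on/hop off" part of tef's idea hinges on not losing work when a volunteer disappears. A common way to get that is a lease: a claimed item is only reserved for a limited time. The sketch below is hypothetical (names, the 600-second default, and the in-memory storage are all assumptions; a real shared queue would live on a server), and the injectable `clock` is there so expiry can be tested without waiting.

```python
import time


class LeaseQueue:
    """Work queue that volunteers can join or leave at any time.

    Hypothetical sketch: a client claims an item for `lease_seconds`; if it
    does not report completion in time (it "hopped off"), the item goes back
    to the pool for someone else.
    """

    def __init__(self, items, lease_seconds=600, clock=time.monotonic):
        self._todo = list(items)
        self._leases = {}  # item -> (client, expiry time)
        self._done = set()
        self._lease = lease_seconds
        self._clock = clock

    def claim(self, client):
        """Hand the next available item to `client`, or None if none left."""
        now = self._clock()
        # Return expired leases to the pool before handing out work.
        for item, (_, expiry) in list(self._leases.items()):
            if expiry <= now:
                del self._leases[item]
                self._todo.append(item)
        if not self._todo:
            return None
        item = self._todo.pop(0)
        self._leases[item] = (client, now + self._lease)
        return item

    def complete(self, client, item):
        """Record `item` as done if `client` still holds its lease."""
        lease = self._leases.get(item)
        if lease is None or lease[0] != client:
            return False  # unknown item, or the lease expired and moved on
        del self._leases[item]
        self._done.add(item)
        return True

    @property
    def done(self):
        return set(self._done)
```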
13:23 Soojin http://www.archive.org/details/LawAndDisorderInLagosNigeria-LouisTheroux hmm shouldn't this be considered abuse of archive.org's resources?:)
13:24 Soojin somebody posted it on reddit.
13:24 Soojin don't get why they use archive.org for this kind of stuff, there's millions of youtube clones.
13:24 ersi What?
13:24 ersi You mean they use archive.org as hosting?
13:28 Soojin yeah for common stuff, like long movies or anime.
13:28 Soojin but I think more often they use cryptic names and don't post it blatantly like this one.
14:34 tef ersi: but yeah if you're gonna make changes push them back to github pls :3
14:41 ersi tef: yeah, signed up for an account earlier today
14:45 tef you can fork it or I can add you
16:41 SketchCow HI GANG
16:42 SketchCow Wow, those are some AWESOME slides, chronomex
16:47 SketchCow 650 tapes · 1,000 hours · 1,378 WAV files · 637 GB · 691 JPEG scans of cassette liner cards & literature
16:47 SketchCow (This is what I'm adding to the Archive next week from a collection.)
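The collection numbers hang together if the WAVs are uncompressed CD-quality audio (16-bit, 44.1 kHz, stereo). That format is an assumption; the log does not state it. Using decimal gigabytes:

```latex
\[
\frac{637\ \text{GB}}{1000\ \text{h}} \approx 637\ \text{MB/h} \approx 177\ \text{kB/s}
\]
\[
44\,100\ \tfrac{\text{samples}}{\text{s}} \times 2\ \tfrac{\text{bytes}}{\text{sample}} \times 2\ \text{channels}
= 176\,400\ \text{B/s} \approx 635\ \text{MB/h}
\;\Rightarrow\; 1000\ \text{h} \approx 635\ \text{GB}
\]
```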
16:50 ersi Sha-BANG!'
16:56 Nemo_bis SketchCow, and how did you digitize those tapes? Some automagical automatic tape feeder? :)
16:57 dnova microphone put up real close to his Talkboy
17:00 SketchCow Yeah, mostly I'd hold my iphone across the room from the TV
17:08 DFJustin got the unarchiver guy to support those old-format iso images https://code.google.com/p/theunarchiver/issues/detail?id=434
17:14 kennethre DFJustin: nice! <3 The Unarchiver
22:55 undersco2 So, I got accepted to RIT
22:55 undersco2 :D
22:55 undersco2 Pretty psyched about that
22:56 dnova ehhhhhhhh
22:56 dnova so did I
22:56 dnova and I didn't go
22:56 dnova where else did you apply?
22:56 undersco2 nowhere else
22:56 dnova oh come on!!!
22:57 undersco2 I procrastinated as fuck
22:57 undersco2 Also, I really like RIT
22:57 dnova why
22:57 undersco2 It's hard to quantify, honestly
22:57 undersco2 I've been asked that numerous times today
22:57 dnova RIT is actually the only private school I've ever been accepted to
22:57 undersco2 and I can't really describe it
22:58 dnova well, I hope the exorbitant tuition is worth it to you
22:58 dnova (or they're giving you a scholarship)
23:00 dnova I just hate to see someone spend 80 thousand dollars on an undergraduate education that could be had for far, far less
23:01 dashcloud congrats undersco2 - have a major in mind yet (it's fine if you don't)
23:01 undersco2 I put Applied Networking and Systems Engineering on the application
23:01 dnova what department is that? CS?
23:01 undersco2 With Computer Focused Electrical Engineering and Computer Science as my second and third interests, respectively
23:02 undersco2 Uh
23:02 undersco2 I dunno
23:02 undersco2 http://www.nssa.rit.edu/
23:02 dnova heh ok.
23:02 undersco2 http://www.nssa.rit.edu/?q=node/4 looks like it's its own?
23:03 dnova guess so
23:04 dnova welp.
23:06 dnova you'll do fine wherever you go, I'm sure :)
23:06 dnova unlike me, currently in the process of being ejected from the phd program, haha
23:06 chronomex >.>
23:06 chronomex it's not that hard to get un-ejected
23:06 dnova certainly not
23:07 dnova but I have no interest in continuing here unfortunately
23:07 dnova so I'm going to sever as cordially as I can while hopefully still collecting fellowship checks until may
23:07 chronomex ah
23:08 dnova I actually found out today my advisor wants to drop me
23:08 dnova rightfully so, haha
23:10 dnova speaking of which, anyone here live in Austria?
