[09:34] undersco2: so just did some tests to find out how much disk space ubuntu server 10.04 and 11.10 would minimally require
[09:35] i did the press f4 at the "install ubuntu", "check disc for defects", etc screen then selected "minimal install"
[09:35] final installed size was 603 megs for 11.10 and 524 megs for 10.04 for minimal install
[09:36] tried to install on a 1 gig dynamic vdi, wasn't taking (kept getting errors and i executed a shell in the install and saw the partitions it made were full), so i tried 2 gig and it worked fine. the vdi never got over 750 megs so i'm not sure what was going on
[09:38] also did min vm, couldn't get 11.10 to boot, might be a vbox that's too old or something, but the vdi was 740. while 10.04 was 392 root + 12 boot (due to lvm) and the vdi was 559
[09:38] arrith: probably temp files
[09:38] RedType: yeah but i'd expect that to balloon the vdi just the same
[09:38] yeah that's true
[09:38] does the vdi have some sort of compression?
[09:39] unless it somehow knows to reuse space these days. which i hadn't heard
[09:39] RedType: not that i'm aware of, that would be nice
[09:39] i don't think virtualbox does but...
[09:39] yeah there's like qcow2
[09:39] pretty sure that does some compression
[09:39] how big was your swap
[09:40] so final install for any of them was under 610 megs. so i'd be safe and keep 2 gigs for installing, but only need around 610 for storage of the vdi
[09:40] RedType: on guest or host? since it'd be dynamic. but only a few hundred megs. 1 GB minus 800 or so, so around 200 megs
[09:41] i don't think the vdi would increase in size unless that space was specifically allocated
[09:41] haven't tested compressing the vdi, but personally i've yet to keep a vm around long enough to want to compress its vdi
[09:41] which is why the vdi would hit 740~, but be using 1GB
[09:41] if the swap was made but never used, the vdi wouldn't increase (drastically) in size
[09:41] (i think)
[09:42] RedType: well one thing is you have to particularly zero the free space using a special tool depending on the os, then do a vdi compress with VBoxManage
[09:42] so if the space is used it shows up in the vdi, the vdi won't shrink. also i had a ls -lh running every 2 seconds in a while loop just to be sure
[09:42] arrith: try allocating a gigantic partition in a vdi
[09:42] like a fresh vdi
[09:43] the vdi wouldn't increase in size unless you're like 0ing out the contents of the partition or something odd
[09:43] when the partitions got written out the vdi went from like 24 bytes to 2 megs
[09:43] yeah
[09:43] hm so yeah i guess it could be swap
[09:43] it adds up
[09:44] 1000~-740~ sort of equals close to 256 ;)
[09:44] ah yeah
[09:44] well, i'm not too particular about finding the minimum space necessary. i'd like to hear if someone investigates but i just wanted a ballpark figure on final install usage
[09:45] debian beating ubuntu handily at 400 megs even
[09:48] btw http://www.youtube.com/watch?v=EJ_wXOFQV3M
[09:49] arrith: http://i.imgur.com/t6TF9.png
[09:49] rms basically has almost every email response pre typed
[09:50] RedType: being a public figure and doing your own email is pretty crazy. he's said it takes quite a few hours
[09:50] and if you assume he does responses like that, must be crazy
[09:51] he's said multiple times.
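Note on the 09:42 shrinking steps: here is a minimal Python sketch of that workflow. It assumes a current VirtualBox, where the subcommand is `VBoxManage modifymedium disk <file> --compact` (releases from around the time of this log spelled it `modifyhd`). It also assumes the guest's free space has already been zeroed, for example with `zerofree` on an unmounted ext filesystem or `sdelete -z` on Windows. The helper names (`compact_command`, `size_mb`, `watch_size`, `compact`) are made up for illustration; `watch_size` stands in for the `ls -lh` loop mentioned at 09:42.

```python
import os
import subprocess
import time


def compact_command(vdi_path):
    """Build the VBoxManage argv that compacts a dynamically allocated VDI."""
    return ["VBoxManage", "modifymedium", "disk", vdi_path, "--compact"]


def size_mb(path):
    """Size of the image file on the host, in MiB."""
    return os.path.getsize(path) / (1024 * 1024)


def watch_size(path, samples, interval=2.0):
    """Poll the file size a fixed number of times, like `ls -lh` in a loop."""
    readings = []
    for i in range(samples):
        readings.append(size_mb(path))
        if i < samples - 1:
            time.sleep(interval)
    return readings


def compact(vdi_path):
    """Run the compaction and return (size before, size after) in MiB.

    Assumes VBoxManage is on PATH and the VM using the image is powered off.
    """
    before = size_mb(vdi_path)
    subprocess.run(compact_command(vdi_path), check=True)
    return before, size_mb(vdi_path)
```

Compaction only reclaims blocks that are entirely zero, which is why the zeroing pass has to come first; deleting files inside the guest does not shrink the image by itself.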
[09:56] report from the digitization lab: today we scanned 3,000 slides in 8 hours
[09:56] sample: http://www.flickr.com/photos/afiler/6773815684/
[10:00] arrith: I wonder if the filesystem was issuing TRIM commands and if the virtual disk device understood them to remove unused space
[10:11] Coderjoe: that would be pretty fancy
[12:54] is there some crawling software that lets you filter out URL patterns during the crawl?
[12:55] what are you trying to do? or what do you want to do? Map/crawl a domain?
[12:55] I'm often grabbing wikis and other sites that have nasty surprises like endless historical pages or ?commentPage=29312
[12:55] if there's some massive filter list for those, that would be cool too
[12:56] so you'd like to get all commentPages? or what? :)
[12:56] another example: blogspot ?search= pages. I want to avoid getting the redundant stuff.
[12:58] so my ideal crawler would show me everything it has queued and let me enter patterns to exclude
[12:58] maybe I will write one in Clojure someday
[12:59] call it "livecrawling" :-
[12:59] heh
[12:59] ivan`: it would be easy to add tbh
[12:59] If you were to automate it, how would it know if the content is redundant? Unless it's a perfect match if you grab both to memory?
[12:59] I, the human, decide it is not worth grabbing ("nearly redundant")
[13:00] just stick a web interface on the queue, allow updating the filters
[13:00] yes
[13:00] hm~
[13:00] but that would only be effective in small crawls
[13:00] small, targeted crawls
[13:00] Wouldn't you need to map out the domain first? So you'd see the patterns starting to emerge
[13:01] not really. you'd just see the crawler downloading crap and add a filter
[13:01] ivan`: https://webarchive.jira.com/wiki/display/Heritrix/Avoiding+Too+Much+Dynamic+Content
[13:01] ivan`: with heritrix you can stop and resume a crawl
[13:01] Hm, I guess
[13:02] void_: cool, I'll check it out
[13:02] tbh we do that at work - stop and restart crawls
[13:02] ersi: I was going to add a logfile to resume from for the simple crawler
[13:03] the ideal would be a queue manager for wget-warc ;)
[13:03] heh
[13:03] pfft wget :v
[13:03] void_: the ideal thing would be a shared queue on heroku, and clients that can hop on/hop off to capture content
[13:04] as well as automated upload facility that handled slots/etc
[13:06] so people who contribute don't need to do much more than just run a program
[13:23] http://www.archive.org/details/LawAndDisorderInLagosNigeria-LouisTheroux hmm shouldn't this be considered abuse of archive.org's resources? :)
[13:24] somebody posted it on reddit.
[13:24] don't get why they use archive.org for this kind of stuff, there's millions of youtube clones.
[13:24] What?
[13:24] You mean they use archive.org as hosting?
[13:28] yeah for common stuff, like long movies or anime.
[13:28] but I think more often they use cryptic names and don't post it blatantly like this one.
[14:34] ersi: but yeah if you're gonna make changes push them back to github pls :3
[14:41] tef: yeah, signed up for an account earlier today
[14:45] you can fork it or I can add you
[16:41] HI GANG
[16:42] Wow, those are some AWESOME slides, chronomex
[16:47] 650 tapes · 1,000 hours · 1,378 WAV files · 637 GB · 691 JPEG scans of cassette liner cards & literature
[16:47] (This is what I'm adding to the Archive next week from a collection.)
[16:50] Sha-BANG!'
[16:56] SketchCow, and how did you digitize those tapes? Some automagical automatic tape feeder? :)
[16:57] microphone put up real close to his Talkboy
[17:00] Yeah, mostly I'd hold my iphone across the room from the TV
[17:08] got the unarchiver guy to support those old-format iso images https://code.google.com/p/theunarchiver/issues/detail?id=434
[17:14] DFJustin: nice! <3 The Unarchiver
[22:55] So, I got accepted to RIT
[22:55] :D
[22:55] Pretty psyched about that
[22:56] ehhhhhhhh
[22:56] so did I
[22:56] and I didn't go
[22:56] where else did you apply?
[22:56] nowhere else
[22:56] oh come on!!!
[22:57] I procrastinated as fuck
[22:57] Also, I really like RIT
[22:57] why
[22:57] It's hard to quantify, honestly
[22:57] I've been asked that numerous times today
[22:57] RIT is actually the only private school I've ever been accepted to
[22:57] and I can't really describe it
[22:58] well, I hope the exorbitant tuition is worth it to you
[22:58] (or they're giving you a scholarship)
[23:00] I just hate to see someone spend 80 thousand dollars on an undergraduate education that could be had for far, far less
[23:01] congrats undersco2 - have a major in mind yet (it's fine if you don't)
[23:01] I put Applied Networking and Systems Engineering on the application
[23:01] what department is that? CS?
[23:01] With Computer Focused Electrical Engineering and Computer Science as my second and third interests, respectively
[23:02] Uh
[23:02] I dunno
[23:02] http://www.nssa.rit.edu/
[23:02] heh ok.
[23:02] http://www.nssa.rit.edu/?q=node/4 looks like it's its own?
[23:03] guess so
[23:04] welp.
[23:06] you'll do fine wherever you go, I'm sure :)
[23:06] unlike me, currently in the process of being ejected from the phd program, haha
[23:06] >.>
[23:06] it's not that hard to get un-ejected
[23:06] certainly not
[23:07] but I have no interest in continuing here unfortunately
[23:07] so I'm going to sever as cordially as I can while hopefully still collecting fellowship checks until may
[23:07] ah
[23:08] I actually found out today my advisor wants to drop me
[23:08] rightfully so, haha
[23:10] speaking of which, anyone here live in Austria?
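Note on the "livecrawling" idea from 12:58 to 13:01: the queue part could look roughly like the Python sketch below. It shows only the filterable queue, with no fetching and no web interface, and it is not how Heritrix or wget-warc work. The class name `FilteredQueue`, its methods, and the example URLs are all hypothetical.

```python
import re
from collections import deque


class FilteredQueue:
    """Crawl queue whose exclude patterns can be added while the crawl runs."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()
        self._excludes = []

    def exclude(self, pattern):
        """Add a regex filter and drop already-queued URLs that match it.

        Returns how many queued URLs were removed.
        """
        rx = re.compile(pattern)
        self._excludes.append(rx)
        before = len(self._queue)
        self._queue = deque(u for u in self._queue if not rx.search(u))
        return before - len(self._queue)

    def add(self, url):
        """Queue a URL unless it was seen before or matches a filter."""
        if url in self._seen or any(rx.search(url) for rx in self._excludes):
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pending(self):
        """Everything currently queued, for a human to inspect."""
        return list(self._queue)

    def pop(self):
        """Next URL to fetch, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


q = FilteredQueue()
q.add("http://example.org/post/1")
q.add("http://example.org/post/1?commentPage=29312")
q.add("http://example.blogspot.com/?search=foo")
removed = q.exclude(r"[?&]commentPage=\d+") + q.exclude(r"\?search=")
```

After the two `exclude` calls only `http://example.org/post/1` is left in `q.pending()`, and later `add` calls for matching URLs are rejected. The human-in-the-loop part is just calling `pending()`, spotting the junk, and calling `exclude()` without stopping the crawl.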