Time |
Nickname |
Message |
09:34
|
arrith |
undersco2: so just did some tests to find out how much disk space ubuntu server 10.04 and 11.10 would minimally require |
09:35
|
arrith |
i did the press f4 at the "install ubuntu", "check disc for defects", etc screen then selected "minimal install" |
09:35
|
arrith |
final installed size was 603 megs for 11.10 and 524 megs for 10.04 for minimal install |
09:36
|
arrith |
tried to install on a 1 gig dynamic vdi, wasn't taking (kept getting errors and i executed a shell in the install and saw the partitions it made were full), so i tried 2 gig and it worked fine. the vdi never got over 750 megs so i'm not sure what was going on |
09:38
|
arrith |
also did min vm, couldn't get 11.10 to boot, might be a vbox that's too old or something, but the vdi was 740. while 10.04 was 392 root + 12 boot (due to lvm) and the vdi was 559 |
09:38
|
RedType |
arrith: probably temp files |
09:38
|
arrith |
RedType: yeah but i'd expect that to balloon the vdi just the same |
09:38
|
RedType |
yeah that's true |
09:38
|
RedType |
does the vdi have some sort of compression? |
09:39
|
arrith |
unless it somehow knows to reuse space these days. which i hadn't heard |
09:39
|
arrith |
RedType: not that i'm aware of, that would be nice |
09:39
|
RedType |
i dont think virtualbox does but... |
09:39
|
arrith |
yeah there's like qcow2 |
09:39
|
arrith |
pretty sure that does some compression |
09:39
|
RedType |
how big was your swap |
09:40
|
arrith |
so final install for any of them was under 610 megs. so i'd be safe and keep 2 gigs for installing, but only need around 610 for storage of the vdi |
09:40
|
arrith |
RedType: on guest or host? since it'd be dynamic. but only a few hundred megs. 1 GB minus 800 or so, so around 200 megs |
09:41
|
RedType |
i dont think the vdi would increase in size unless that space was specifically allocated |
09:41
|
arrith |
haven't tested compressing the vdi, but personally i've yet to keep a vm around long enough to want to compress its vdi |
09:41
|
RedType |
which is why the vdi would hit 740~, but be using 1GB |
09:41
|
RedType |
if the swap was made but never used, the vdi wouldn't increase (drastically) in size |
09:41
|
RedType |
(i think) |
09:42
|
arrith |
RedType: well one thing is you have to particularly zero the free space using a special tool depending on the os, then do a vdi compress with VBoxManage |
09:42
|
arrith |
so if the space is used it shows up in the vdi, the vdi won't shrink. also i had a ls -lh running every 2 seconds in a while loop just to be sure |
09:42
|
RedType |
arrith: try allocating a gigantic partition in a vdi |
09:42
|
RedType |
like a fresh vdi |
09:43
|
RedType |
the vdi wouldnt increase in size unless you're like 0ing out the contents of the partition or something odd |
09:43
|
arrith |
when the partitions got written out the vdi went from like 24 bytes to 2 megs |
09:43
|
arrith |
yeah |
09:43
|
arrith |
hm so yeah i guess it could be swap |
09:43
|
RedType |
it adds up |
09:44
|
RedType |
1000~-740~ sort of equals close to 256 ;) |
09:44
|
arrith |
ah yeah |
09:44
|
arrith |
well, i'm not too particular about finding the minimum space necessary. i'd like to hear if someone investigates but i just wanted a ballpark figure on final install usage |
09:45
|
arrith |
debian beating ubuntu handily at 400 megs even |
09:48
|
arrith |
btw http://www.youtube.com/watch?v=EJ_wXOFQV3M |
09:49
|
RedType |
arrith: http://i.imgur.com/t6TF9.png |
09:49
|
RedType |
rms basically has almost every email response pre typed |
09:50
|
arrith |
RedType: being a public figure and doing your own email is pretty crazy. he's said it takes quite a few hours |
09:50
|
arrith |
and if you assume he does responses like that, must be crazy |
09:51
|
RedType |
he's said multiple times. |
09:56
|
chronomex |
report from the digitization lab: today we scanned 3,000 slides in 8 hours |
09:56
|
chronomex |
sample: http://www.flickr.com/photos/afiler/6773815684/ |
10:00
|
Coderjoe |
arrith: I wonder if the filesystem was issuing TRIM commands and if the virtual disk device understood them to remove unused space |
10:11
|
arrith |
Coderjoe: that would be pretty fancy |
12:54
|
ivan` |
is there some crawling software that lets you filter out URL patterns during the crawl? |
12:55
|
ersi |
what are you trying to do? or what do you want to do? Map/crawl a domain? |
12:55
|
ivan` |
I'm often grabbing wikis and other sites that have nasty surprises like endless historical pages or ?commentPage=29312 |
12:55
|
ivan` |
if there's some massive filter list for those, that would be cool too |
12:56
|
ersi |
so you'd like to get all commentPages? or what? :) |
12:56
|
ivan` |
another example: blogspot ?search= pages. I want to avoid getting the redundant stuff. |
12:58
|
ivan` |
so my ideal crawler would show me everything it has queued and let me enter patterns to exclude |
12:58
|
ivan` |
maybe I will write one in Clojure someday |
12:59
|
ivan` |
call it "livecrawling" :- |
12:59
|
tef |
heh |
12:59
|
tef |
ivan`: it would be easy to add tbh |
12:59
|
ersi |
If you were to automate it, how would it know if the content is redundant? Unless it's a perfect match if you grab both to memory? |
12:59
|
ivan` |
I, the human, decide it is not worth grabbing ("nearly redundant") |
13:00
|
tef |
just stick a web interface on the queue, allow updating the filters |
13:00
|
ivan` |
yes |
13:00
|
ersi |
hm~ |
13:00
|
tef |
but that would only be effective in small crawls |
13:00
|
tef |
small, targeted crawls |
13:00
|
ersi |
Wouldn't you need to map out the domain first? So you'd see the patterns starting to emerge |
13:01
|
ivan` |
not really. you'd just see the crawler downloading crap and add a filter |
13:01
|
void_ |
ivan`: https://webarchive.jira.com/wiki/display/Heritrix/Avoiding+Too+Much+Dynamic+Content |
13:01
|
void_ |
ivan`: with heritrix you can stop and resume a crawl |
13:01
|
ersi |
Hm, I guess |
13:02
|
ivan` |
void_: cool, I'll check it out |
13:02
|
tef |
tbh we do that at work - stop and restart crawls |
13:02
|
tef |
ersi: I was going to add a logfile to resume from for the simple crawler |
13:03
|
void_ |
the ideal would be a queue manager for wget-warc ;) |
13:03
|
tef |
heh |
13:03
|
tef |
pfft wget :v |
13:03
|
tef |
void_: the ideal thing would be a shared queue on heroku, and clients that can hop on/hop off to capture content |
13:04
|
tef |
as well as automated upload facility that handled slots/etc |
13:06
|
tef |
so people who contribute don't need do much more than just run a program |
13:23
|
Soojin |
http://www.archive.org/details/LawAndDisorderInLagosNigeria-LouisTheroux hmm shouldn't this be considered abuse of archive.orgs resources?:) |
13:24
|
Soojin |
somebody posted it on reddit. |
13:24
|
Soojin |
don't get why they use archive.org for this kind of stuff, theres millions of youtube clones. |
13:24
|
ersi |
What? |
13:24
|
ersi |
You mean they use archive.org as hosting? |
13:28
|
Soojin |
yeah for common stuff, like long movies or anime. |
13:28
|
Soojin |
but I think more often they use cryptic names and don't post it blatantly like this one. |
14:34
|
tef |
ersi: but yeah if you're gonna make changes push them back to github pls :3 |
14:41
|
ersi |
tef: yeah, signed up for an account earlier today |
14:45
|
tef |
you can fork it or I can add you |
16:41
|
SketchCow |
HI GANG |
16:42
|
SketchCow |
Wow, those are some AWESOME slides, chronomex |
16:47
|
SketchCow |
650 tapes · 1,000 hours · 1,378 WAV files · 637 GB · 691 JPEG scans of cassette liner cards & literature |
16:47
|
SketchCow |
(This is what I'm adding to the Archive next week from a collection.) |
16:50
|
ersi |
Sha-BANG! |
16:56
|
Nemo_bis |
SketchCow, and how did you digitize those tapes? Some automagical automatic tape feeder? :) |
16:57
|
dnova |
microphone put up real close to his Talkboy |
17:00
|
SketchCow |
Yeah, mostly I'd hold my iphone across the room from the TV |
17:08
|
DFJustin |
got the unarchiver guy to support those old-format iso images https://code.google.com/p/theunarchiver/issues/detail?id=434 |
17:14
|
kennethre |
DFJustin: nice! <3 The Unarchiver |
22:55
|
undersco2 |
So, I got accepted to RIT |
22:55
|
undersco2 |
:D |
22:55
|
undersco2 |
Pretty psyched about that |
22:56
|
dnova |
ehhhhhhhh |
22:56
|
dnova |
so did I |
22:56
|
dnova |
and I didn't go |
22:56
|
dnova |
where else did you apply? |
22:56
|
undersco2 |
nowhere else |
22:56
|
dnova |
oh come on!!! |
22:57
|
undersco2 |
I procrastinated as fuck |
22:57
|
undersco2 |
Also, I really like RIT |
22:57
|
dnova |
why |
22:57
|
undersco2 |
It's hard to quantify, honestly |
22:57
|
undersco2 |
I've been asked that numerous times today |
22:57
|
dnova |
RIT is actually the only private school I've ever been accepted to |
22:57
|
undersco2 |
and I can't really describe it |
22:58
|
dnova |
well, I hope the exorbitant tuition is worth it to you |
22:58
|
dnova |
(or they're giving you a scholarship) |
23:00
|
dnova |
I just hate to see someone spend 80 thousand dollars on an undergraduate education that could be had for far, far less |
23:01
|
dashcloud |
congrats undersco2 - have a major in mind yet (it's fine if you don't) |
23:01
|
undersco2 |
I put Applied Networking and Systems Engineering on the application |
23:01
|
dnova |
what department is that? CS? |
23:01
|
undersco2 |
With Computer Focused Electrical Engineering and Computer Science as my second and third interests, respectively |
23:02
|
undersco2 |
Uh |
23:02
|
undersco2 |
I dunno |
23:02
|
undersco2 |
http://www.nssa.rit.edu/ |
23:02
|
dnova |
heh ok. |
23:02
|
undersco2 |
http://www.nssa.rit.edu/?q=node/4 looks like it's its own? |
23:03
|
dnova |
guess so |
23:04
|
dnova |
welp. |
23:06
|
dnova |
you'll do fine wherever you go, I'm sure :) |
23:06
|
dnova |
unlike me, currently in the process of being ejected from the phd program, haha |
23:06
|
chronomex |
>.> |
23:06
|
chronomex |
it's not that hard to get un-ejected |
23:06
|
dnova |
certainly not |
23:07
|
dnova |
but I have no interest in continuing here unfortunately |
23:07
|
dnova |
so I'm going to sever as cordially as I can while hopefully still collecting fellowship checks until may |
23:07
|
chronomex |
ah |
23:08
|
dnova |
I actually found out today my advisor wants to drop me |
23:08
|
dnova |
rightfully so, haha |
23:10
|
dnova |
speaking of which, anyone here live in Austria? |
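
Editor's sketch, not from the log: the "livecrawling" idea ivan` and tef discuss at 12:58-13:01 (a crawl queue whose exclude patterns a human can update mid-crawl) might look roughly like the following. The class name `FilteredQueue`, its methods, and the example.org URLs are all hypothetical, chosen only for illustration; the web interface tef suggests is left out, and this is not how Heritrix or wget-warc handle filtering.

```python
import re
from collections import deque


class FilteredQueue:
    """Hypothetical URL queue whose exclude patterns can change mid-crawl."""

    def __init__(self):
        self.pending = deque()  # URLs waiting to be fetched
        self.seen = set()       # every URL ever offered, to avoid re-queuing
        self.excludes = []      # compiled regexes; a match means "skip"

    def is_excluded(self, url):
        return any(p.search(url) for p in self.excludes)

    def add(self, url):
        """Queue a URL unless it was seen before or matches an exclude."""
        if url in self.seen:
            return False
        self.seen.add(url)
        if self.is_excluded(url):
            return False
        self.pending.append(url)
        return True

    def exclude(self, pattern):
        """Add a pattern and purge matching URLs that are already queued."""
        self.excludes.append(re.compile(pattern))
        before = len(self.pending)
        self.pending = deque(u for u in self.pending if not self.is_excluded(u))
        return before - len(self.pending)

    def next(self):
        """Return the next URL to fetch, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None


queue = FilteredQueue()
for url in [
    "http://example.org/wiki/Main_Page",
    "http://example.org/wiki/Main_Page?action=history",
    "http://example.org/post?commentPage=29312",
    "http://example.org/blog?search=foo",
]:
    queue.add(url)

# The human watching the queue decides these are "nearly redundant".
removed = queue.exclude(r"[?&](commentPage|search)=")
removed += queue.exclude(r"[?&]action=history")
print(removed, list(queue.pending))
```

Purging the pending queue whenever a pattern is added is what lets the operator react after seeing "the crawler downloading crap"; as tef notes, a human in the loop like this is only practical for small, targeted crawls.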