[00:21] folks, is there a simple website somewhere that takes the "can I have a cookie?" gag and actually leaves you a cookie on your computer?
[00:57] looks like arstechnica.com does put / after page/2 with years
[00:57] but everything else has folder meaning the / is after the page number
[01:00] i'm only doing 1998 to 2011 of arstechnica.com
[01:01] so i have full years of these articles
[14:40] does anyone have a script to split a directory (full of files and sub dirs) into directories of a smaller size target? SketchCow is said to have one but he is busy
[15:06] that's going to be a complicated script :S
[15:06] depends how complex/smart you make it of course
[15:07] well simplest is move one file into directory, if directory isn't over sizeX then move another file in, ad infinitum.
[15:07] to get the list of files use find.
[17:28] schbiridi: Yea, tools to use: find and split. :)
[18:39] now this is awesome: http://archive.arstechnica.com/
[18:42] chronomex: Tracker is misbehaving again JFYI
[18:51] so my arstechnica.com dump is grabbing all the comments
[18:51] it's doing 1998 to 2004 in one warc
[18:51] just the articles
[18:52] then i will go through that and try to grab images from cdn.arstechnica.com
[18:52] neat, that's been on my wishlist for a long time
[18:53] good going godane :)
[18:53] WOO HOO
[18:53] there are only just 4000 urls in my index list
[18:53] *little over 4000 between 1998 to 2004
[18:54] it's like just over 3700 for 2005 alone
[18:54] 6700 for 2006
[19:02] i'm grabbing all arstechnica.com domains now
[19:03] getting images from origin.arstechnica.com and www.arstechnica.com
[19:04] the www.arstechnica.com is redirected to the cdn.arstechnica.com
[19:06] i may add the www.arstechnica.com urls to my image grabs so they're with the cdn part of the url added to them
[19:36] i'm going to start uploading my twilight cds
[19:36] i think the first 15 are in mdf format
[19:36] i'm also going to upload a .iso with the original .mdf
[21:48] uploaded: http://archive.org/details/cdrom-twilight-001
[21:58] I knew livewebbing this would come in handy one day http://web-beta.archive.org/web/20110630032950/http://omgpleasestopcosplaying.tumblr.com/
[23:22] DFJustin: that's awesome- did you see the Flo cosplay at the bottom of the page?
[23:26] http://www.savewalterwhite.com/
[23:36] yes
[23:36] I don't actually know who flo is though
[23:47] it's the lady from the Progressive Auto commercial
[23:49] Did you know that "Flo" from the progressive auto commercials makes $500k/yr for those commercials?
[23:50] There is a *lot* of money in auto insurance. Geico has been the #1 advertiser (TV & Internet) since 2000.
[23:52] swebb: did you like the Gecko or the cavemen from Geico better?
[23:53] I'm impartial. :)
[23:53] I disliked both of them equally.
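
For reference, the directory-splitting idea from 15:07 (list the files, then keep moving them into a directory until it reaches the size target, then start a new one) could be sketched roughly as below. This is only an illustration under assumptions, not SketchCow's script: the function names and the `part-0001` naming scheme are made up here, `os.walk` stands in for `find`, the destination is assumed to live outside the source tree, and a file bigger than the target simply gets a directory to itself.

```python
import os
import shutil


def list_files(root):
    """Yield (relative_path, size_in_bytes) for each regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a plain file.
            if os.path.isfile(full) and not os.path.islink(full):
                yield os.path.relpath(full, root), os.path.getsize(full)


def plan_bins(files, target_bytes):
    """Greedily group (path, size) pairs so each group totals at most target_bytes.

    A single file larger than target_bytes ends up alone in its own group.
    """
    bins = []
    current, current_size = [], 0
    for path, size in files:
        # Start a new group when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bins.append(current)
    return bins


def split_directory(src, dest, target_bytes):
    """Move files from src into dest/part-0001, dest/part-0002, ... keeping sub dirs."""
    files = sorted(list_files(src))  # sorted so the result is repeatable
    bins = plan_bins(files, target_bytes)
    for index, paths in enumerate(bins, start=1):
        part_dir = os.path.join(dest, f"part-{index:04d}")
        for rel in paths:
            new_path = os.path.join(part_dir, rel)
            os.makedirs(os.path.dirname(new_path), exist_ok=True)
            shutil.move(os.path.join(src, rel), new_path)
    return bins
```

Something like `split_directory("dump", "dump-split", 4 * 1024**3)` would aim for parts of about 4 GiB. The greedy pass does not try to pack directories optimally; it only keeps files in path order, which is usually what you want for an archive upload.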