Time |
Nickname |
Message |
00:21
🔗
|
dashcloud |
folks, is there a simple website somewhere that takes the "can I have a cookie?" gag and actually leaves you a cookie on your computer? |
00:57
🔗
|
godane |
looks like arstechnica.com does put / after page/2 with years |
00:57
🔗
|
godane |
but everything else has folder meaning the / is after the page number |
01:00
🔗
|
godane |
i'm only doing 1998 to 2011 of arstechnica.com |
01:01
🔗
|
godane |
so i have full years of these articles |
14:40
🔗
|
schbiridi |
does anyone have a script to split a directory (full of files and sub dirs) into directories of a smaller size target? SketchCow is said to have one but he is busy |
15:06
🔗
|
SmileyG |
that's going to be a complicated script :S |
15:06
🔗
|
schbiridi |
depends how complex/smart you make it of course |
15:07
🔗
|
SmileyG |
well, simplest is: move one file into the directory, if the directory isn't over sizeX then move another file in, ad infinitum. |
15:07
🔗
|
SmileyG |
to get the list of files use find. |
17:28
🔗
|
swebb |
schbiridi: Yea, tools to use: find and split. :) |
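For reference, the approach SmileyG describes above (list the files, then keep filling a directory until it reaches the size target) can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not SketchCow's script: the names `plan_bins`, `split_directory`, and the `partNNN` directory naming are hypothetical, `os.walk` stands in for `find`, and the size check runs before each move so a part only exceeds the target when a single file is larger than the target on its own. It assumes regular files and a destination outside the source tree.

```python
import os
import shutil
from pathlib import Path


def plan_bins(source: Path, target_bytes: int) -> list[list[Path]]:
    """Group every file under `source` into consecutive bins of about target_bytes."""
    bins: list[list[Path]] = []
    current: list[Path] = []
    current_size = 0
    # os.walk plays the role of `find`: it lists files in all subdirectories.
    for root, dirs, files in os.walk(source):
        dirs.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(files):
            path = Path(root) / name
            size = path.stat().st_size
            # Close the current bin if this file would push it over the target.
            # An oversized single file still gets a bin of its own.
            if current and current_size + size > target_bytes:
                bins.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
    if current:
        bins.append(current)
    return bins


def split_directory(source: Path, dest: Path, target_bytes: int) -> int:
    """Move files into dest/part001, dest/part002, ... and return the part count."""
    bins = plan_bins(source, target_bytes)
    for index, files in enumerate(bins, start=1):
        for path in files:
            # Keep each file's relative path so subdirectories survive the split.
            new_path = dest / f"part{index:03d}" / path.relative_to(source)
            new_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(new_path))
    return len(bins)
```

A call such as `split_directory(Path("dump"), Path("dump-split"), 4 * 1024**3)` would aim for parts of about 4 GiB each; both paths here are placeholders. The plan is computed in full before anything is moved, so the walk never sees a half-moved tree.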
18:39
🔗
|
godane |
now this is awesome: http://archive.arstechnica.com/ |
18:42
🔗
|
ersi |
chronomex: Tracker is misbehaving again JFYI |
18:51
🔗
|
godane |
so my arstechnica.com dump is grabbing all the comments |
18:51
🔗
|
godane |
it's doing 1998 to 2004 in one warc |
18:51
🔗
|
godane |
just the articles |
18:52
🔗
|
godane |
then i will go through that and try to grab images from cdn.arstechnica.com |
18:52
🔗
|
ersi |
neat, that's been on my wishlist for a long time |
18:53
🔗
|
ersi |
good going godane :) |
18:53
🔗
|
SketchCow |
WOO HOO |
18:53
🔗
|
godane |
there are only just 4000 urls in my index list |
18:53
🔗
|
godane |
*little over 4000 between 1998 to 2004 |
18:54
🔗
|
godane |
it's like just over 3700 for 2005 alone |
18:54
🔗
|
godane |
6700 for 2006 |
19:02
🔗
|
godane |
i'm grabbing all arstechnica.com domains now |
19:03
🔗
|
godane |
getting images from origin.arstechnica.com and www.arstechnica.com |
19:04
🔗
|
godane |
www.arstechnica.com is redirected to cdn.arstechnica.com |
19:06
🔗
|
godane |
i may add the www.arstechnica.com urls to my image grabs so they're there with the cdn part of the url added to them |
19:36
🔗
|
godane |
i'm going to start uploading my twilight cds |
19:36
🔗
|
godane |
i think the first 15 are in mdf format |
19:36
🔗
|
godane |
i'm also going to upload a .iso with the original .mdf |
21:48
🔗
|
godane |
uploaded: http://archive.org/details/cdrom-twilight-001 |
21:58
🔗
|
DFJustin |
I knew livewebbing this would come in handy one day http://web-beta.archive.org/web/20110630032950/http://omgpleasestopcosplaying.tumblr.com/ |
23:22
🔗
|
dashcloud |
DFJustin: that's awesome- did you see the Flo cosplay at the bottom of the page? |
23:26
🔗
|
Panasonic |
http://www.savewalterwhite.com/ |
23:36
🔗
|
DFJustin |
yes |
23:36
🔗
|
DFJustin |
I don't actually know who flo is though |
23:47
🔗
|
dashcloud |
it's the lady from the Progressive Auto commercial |
23:49
🔗
|
swebb |
Did you know that "Flo" from the Progressive Auto commercials makes $500k/yr for those commercials? |
23:50
🔗
|
swebb |
There is a *lot* of money in auto insurance. Geico has been the #1 advertiser (TV & Internet) since 2000. |
23:52
🔗
|
dashcloud |
swebb: did you like the Gecko or the cavemen from Geico better? |
23:53
🔗
|
swebb |
I'm impartial. :) |
23:53
🔗
|
swebb |
I disliked both of them equally. |