#archiveteam-bs 2012-11-27,Tue


Time Nickname Message
00:21 🔗 dashcloud folks, is there a simple website somewhere that takes the "can I have a cookie?" gag and actually leaves you a cookie on your computer?
00:57 🔗 godane looks like arstechnica.com does put / after page/2 with years
00:57 🔗 godane but everything else has folder meaning the / is after the page number
01:00 🔗 godane i'm only doing 1998 to 2011 of arstechnica.com
01:01 🔗 godane so i have full years of these articles
14:40 🔗 schbiridi does anyone have a script to split a directory (full of files and sub dirs) into directories of a smaller size target? SketchCow is said to have one but he is busy
15:06 🔗 SmileyG that's going to be a complicated script :S
15:06 🔗 schbiridi depends how complex/smart you make it of course
15:07 🔗 SmileyG well, simplest is: move one file into the directory, if the directory isn't over sizeX then move another file in, ad infinitum.
15:07 🔗 SmileyG to get the list of files use find.
17:28 🔗 swebb schbiridi: Yea, tools to use: find and split. :)
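No script was posted in the channel. What follows is only a minimal sketch of the approach SmileyG describes (walk the tree, move files into a bin until the next file would push it over the size target, then start a new bin). The function name `split_directory`, the `part-NNNN` bin naming, and the rule that an oversized single file gets a bin to itself are assumptions for illustration, not anything from SketchCow's script. It assumes `dest` is not inside `src`.

```python
import os
import shutil


def split_directory(src, dest, max_bytes):
    """Move every file under src into dest/part-0001, dest/part-0002, ...

    Files are taken in sorted walk order and relative paths are kept
    inside each bin. A new bin is started whenever adding the next file
    would exceed max_bytes. A single file larger than max_bytes still
    gets moved, alone in its own bin. Returns the number of bins used.
    """
    bin_index = 1
    bin_size = 0
    for root, dirs, files in os.walk(src):
        dirs.sort()  # make the walk order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            # Start a new bin only if the current one already holds something.
            if bin_size > 0 and bin_size + size > max_bytes:
                bin_index += 1
                bin_size = 0
            rel = os.path.relpath(path, src)
            target = os.path.join(dest, f"part-{bin_index:04d}", rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(path, target)
            bin_size += size
    return bin_index
```

Called as, for example, `split_directory("dump", "dump-split", 10 * 1024**3)` (hypothetical paths), this would aim for bins of roughly 10 GiB. It fills bins greedily in walk order, so it does not try to pack them optimally; empty source directories are left behind.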
18:39 🔗 godane now this is awesome: http://archive.arstechnica.com/
18:42 🔗 ersi chronomex: Tracker is misbehaving again JFYI
18:51 🔗 godane so my arstechnica.com dump is grabbing all the comments
18:51 🔗 godane it's doing 1998 to 2004 in one warc
18:51 🔗 godane just the articles
18:52 🔗 godane then i will go through that and try to grab images from cdn.arstechnica.com
18:52 🔗 ersi neat, that's been on my wishlist for a long time
18:53 🔗 ersi good going godane :)
18:53 🔗 SketchCow WOO HOO
18:53 🔗 godane there is only just 4000 urls in my index list
18:53 🔗 godane *little over 4000 between 1998 to 2004
18:54 🔗 godane it's like just over 3700 for 2005 alone
18:54 🔗 godane 6700 for 2006
19:02 🔗 godane i'm grabbing all arstechnica.com domains now
19:03 🔗 godane getting images from origin.arstechnica.com and www.arstechnica.com
19:04 🔗 godane the www.arstechnica.com is redirected to the cdn.arstechnica.com
19:06 🔗 godane i may add the www.arstechnica.com urls to my image grabs so they're there with the cdn part of the url added to them
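godane does not say how the www URLs would be turned into cdn ones. One possible reading, sketched here purely as an assumption, is a host swap that keeps the path unchanged, so both forms can be put in the grab list. The function name `to_cdn` and the example path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn(url, old_host="www.arstechnica.com", new_host="cdn.arstechnica.com"):
    """Return url with old_host swapped for new_host; other URLs pass through.

    Assumes the CDN serves the same path as the www host, which the log
    suggests (www redirects to cdn) but does not spell out.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != old_host:
        return url
    return urlunsplit(parts._replace(netloc=new_host))


# Hypothetical image URL, for illustration only.
print(to_cdn("http://www.arstechnica.com/images/example.png"))
```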
19:36 🔗 godane i'm going to start uploading my twilight cds
19:36 🔗 godane i think the first 15 are in mdf format
19:36 🔗 godane i'm also going to upload a .iso with the original .mdf
21:48 🔗 godane uploaded: http://archive.org/details/cdrom-twilight-001
21:58 🔗 DFJustin I knew livewebbing this would come in handy one day http://web-beta.archive.org/web/20110630032950/http://omgpleasestopcosplaying.tumblr.com/
23:22 🔗 dashcloud DFJustin: that's awesome- did you see the Flo cosplay at the bottom of the page?
23:26 🔗 Panasonic http://www.savewalterwhite.com/
23:36 🔗 DFJustin yes
23:36 🔗 DFJustin I don't actually know who flo is though
23:47 🔗 dashcloud it's the lady from the Progressive Auto commercial
23:49 🔗 swebb Did you know that "Flo" from the Progressive auto commercials makes $500k/yr for those commercials?
23:50 🔗 swebb There is a *lot* of money in auto insurance. Geico has been the #1 advertiser (TV & Internet) since 2000.
23:52 🔗 dashcloud swebb: did you like the Gecko or the cavemen from Geico better?
23:53 🔗 swebb I'm impartial. :)
23:53 🔗 swebb I disliked both of them equally.
