[01:23] https://sphotos-b.xx.fbcdn.net/hphotos-snc6/10239_532076040155187_2068485922_n.jpg
[03:52] chronomex: lolumad?
[03:52] ;D
[03:53] wut
[03:58] ME GUSTA
[03:58] So, we've now set up to ingest almost all ArchiveTeam WARC-based items into the Wayback Machine.
[03:59] Mid-month the Wayback Machine will ideally begin showing these more recent sites.
[04:01] now that's cool
[04:01] That's quite hot.
[04:03] I'm tempted to make a live CD of the Wayback Machine
[04:04] just download warc.gz files to a folder and it starts working
[04:29] Nice.
[04:54] luckily I'm getting images with my theregister.co.uk dumps
[04:55] this dump is meant to be more for data mining
[04:56] it's mostly just going to be the articles
[04:56] by year
[05:08] looks like I have 4 TheBlazeTV Glenn Beck shows that are 4 hours each that I have to grab
[05:08] :-(
[05:08] once my 2005 run of theregister is done I will go to Windows and start grabbing that
[05:09] good news is it looks like the RNC specials on TheBlazeTV are up now
[10:39] lol this is fun
[10:39] rocking out to utterly random music selections on the archive
[11:36] Actually alard, it's starting to make sense now that I understand how it's called lol
[11:36] So it pulls the "normal" profile page
[11:36] it then grabs various parts and "table.insert"s them into... well, a table
[11:36] which will give you the list of download items?
[11:50] http://www.youtube.com/watch?v=cKcWyQ5aJlY&feature=g-user-u
[11:54] ~.
[14:01] SmileyG: The table (in Lua everything is a 'table', but this table is really just a list) is the list of URLs that gets added to Wget's queue.
[14:01] So it downloads the profile page, looks inside, finds pagination links, queues the page URLs, the album URLs, etc.
[14:05] nice
[14:05] alard: you may have an idea on this...
[14:05] my script has fetched about 1 million usernames so far
[14:05] and that's it
[14:05] however
[14:05] https://www.google.nl/search?sugexp=chrome,mod=8&sourceid=chrome&ie=UTF-8&q=site%3Acommunity.webshots.com+inurl%3Auser
[14:05] approx 11,400,000 results
[14:06] Should we start a #webshots channel?
[14:06] since there are very few duplicates in the gathered list of users, I suspect I only have a small (popular) subset
[14:06] Where did you look for your current list?
[14:06] yes, probably
[14:06] it starts at the community category index, then gets the top users for each category
[14:06] but the top users list is limited to 100 pages
[14:06] per category
[14:06] that's 100 * 100 users per category
[14:07] (max)
[15:41] yeah errr zee fuxed.
[20:13] alard is a hero in a city of heroes
[20:59] holy balls
[20:59] http://fastdesign7.com/
[20:59] http://jalopnik.com/5875229/what-happens-when-you-put-four-donuts-on-a-c63-amg
[20:59] dear lord.
[22:09] I can grab some old Web User magazines now
[22:10] they went up on usenet in the last week
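[Editor's note: alard's 14:01 messages describe the wget-lua pattern: a hook parses the downloaded profile page and `table.insert`s pagination and album links into a list that feeds Wget's queue. It also explains the 14:06 gap: 100 pages of at most 100 top users is a cap of 10,000 usernames per category, far short of the ~11,400,000 pages Google reports. The snippet below is a minimal Python sketch of that link-queueing pattern, not the actual project code; the URL patterns and base URL are made up for illustration.]

```python
import re

def get_urls(page_html, base="http://example.webshots.com"):
    """Collect pagination and album links from a profile page into a
    flat list, mirroring Lua's table.insert into the Wget url queue."""
    queue = []
    for href in re.findall(r'href="([^"]+)"', page_html):
        # Hypothetical patterns: queue only pagination and album links.
        if "/page/" in href or "/album/" in href:
            queue.append(base + href)
    return queue

profile = '<a href="/page/2">next</a><a href="/album/1234">x</a><a href="/about">y</a>'
print(get_urls(profile))
# queues /page/2 and /album/1234; /about is ignored
```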