#archiveteam-bs 2012-10-04,Thu


Time Nickname Message
01:23 🔗 Coderjoe https://sphotos-b.xx.fbcdn.net/hphotos-snc6/10239_532076040155187_2068485922_n.jpg
03:52 🔗 underscor chronomex: lolumad?
03:52 🔗 underscor ;D
03:53 🔗 chronomex wut
03:58 🔗 SketchCow ME GUSTA
03:58 🔗 SketchCow So, we've now set up to ingest almost all archiveteam WARC-based items into the wayback machine.
03:59 🔗 SketchCow Mid-month the wayback machine will ideally begin showing these more recent sites.
04:01 🔗 godane now thats cool
04:01 🔗 GLaDOS That's quite hot.
04:03 🔗 godane i'm tempted to make a livecd of the wayback machine
04:04 🔗 godane just download warc.gz to a folder and it starts working
04:29 🔗 Nintendud Nice.
04:54 🔗 godane luckily i'm getting images with my theregister.co.uk dumps
04:55 🔗 godane this dump is meant to be more for data mining
04:56 🔗 godane its mostly just going to be the articles
04:56 🔗 godane by year
05:08 🔗 godane looks like i have 4 theblazetv glenn beck shows that are 4 hours i have to grab
05:08 🔗 godane :-(
05:08 🔗 godane once my 2005 run of theregister is done i will go to windows and start grabbing that
05:09 🔗 godane good news is it looks like the rnc specials on theblazetv is up now
10:39 🔗 SmileyG lol this is fun
10:39 🔗 SmileyG rocking out to utterly random music selections on the archive
11:36 🔗 SmileyG Actually alard it's starting to make sense now I understand how it's called lol
11:36 🔗 SmileyG So it pulls the "normal" profile page
11:36 🔗 SmileyG it then grabs various parts and "table.inserts" them into.... well a table
11:36 🔗 SmileyG which will give you the list of download items?
11:50 🔗 Soojin http://www.youtube.com/watch?v=cKcWyQ5aJlY&feature=g-user-u
11:54 🔗 C-Keen ~.
14:01 🔗 alard SmileyG: The table (in lua everything is a 'table', but this table is really just a list) is the list of urls that gets added to Wget's queue.
14:01 🔗 alard So it downloads the profile page, looks inside, finds pagination links, queues the page urls, the album urls etc.
14:05 🔗 SmileyG nice
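
The flow alard describes (fetch the profile page, look inside, collect pagination and album links into a plain list, hand that list to the download queue) can be sketched roughly as below. This is only an illustration in Python, not the project's actual Lua script for Wget: the names `LinkCollector` and `urls_to_queue` are made up here, and the `page=` and `/album/` URL patterns are hypothetical placeholders rather than the real site's URL scheme.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def urls_to_queue(profile_html, profile_url):
    """Return the list of URLs to queue after fetching a profile page.

    Assumption: pagination links contain "page=" and album links
    contain "/album/". Both patterns are hypothetical.
    """
    collector = LinkCollector()
    collector.feed(profile_html)
    queue = []
    for href in collector.hrefs:
        absolute = urljoin(profile_url, href)  # resolve relative links
        is_pagination = "page=" in absolute
        is_album = "/album/" in absolute
        # Keep only the links we care about, skipping duplicates.
        if (is_pagination or is_album) and absolute not in queue:
            queue.append(absolute)
    return queue
```

The returned list plays the role of the Lua "table" in the discussion: an ordered list of URLs that the downloader then works through.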
14:05 🔗 joepie91 alard: you may have an idea on this... my script fetched about 1 million usernames so far
14:05 🔗 joepie91 and that's it
14:05 🔗 joepie91 however
14:05 🔗 joepie91 https://www.google.nl/search?sugexp=chrome,mod=8&sourceid=chrome&ie=UTF-8&q=site%3Acommunity.webshots.com+inurl%3Auser
14:05 🔗 joepie91 approx 11.400.000 results
14:06 🔗 alard Should we start a #webshots channel?
14:06 🔗 joepie91 since there are very few duplicates in the gathered list of users, I suspect I only have a small (popular) subset
14:06 🔗 alard Where did you look for your current list?
14:06 🔗 joepie91 yes, probably
14:06 🔗 joepie91 it starts at the community category index, then gets the top users for each category
14:06 🔗 joepie91 but the top users list is limited to 100 pages
14:06 🔗 joepie91 per category
14:06 🔗 joepie91 that's 100 * 100 users per category
14:07 🔗 joepie91 (max)
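
The cap joepie91 describes can be worked out with a little arithmetic. The sketch below assumes 100 pages per category and 100 users per page, as stated above; the number of categories is not given in the log, so it is left as a parameter, and the function names are invented for this example. The Google figure is a rough result count, not a user count, so the coverage number is only indicative.

```python
def max_users_reachable(categories, pages_per_category=100, users_per_page=100):
    """Upper bound on users reachable through the top-users lists.

    It is an upper bound because the same user can appear in the
    top list of more than one category.
    """
    return categories * pages_per_category * users_per_page


def coverage(found, estimated_total):
    """Fraction of the estimated total that has been collected so far."""
    return found / estimated_total


# One category yields at most 100 * 100 = 10,000 users.
per_category_cap = max_users_reachable(1)

# About 1 million usernames gathered versus roughly 11.4 million search results.
share = coverage(1_000_000, 11_400_000)
```

With these figures the gathered list covers under a tenth of the estimated total, which is consistent with the suspicion that the top-users lists only expose a popular subset.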
15:41 🔗 SmileyG yeah errr zee fuxed.
20:13 🔗 chronomex alard is a hero in a city of heroes
20:59 🔗 undersco2 holy balls
20:59 🔗 undersco2 http://fastdesign7.com/
20:59 🔗 chronomex http://jalopnik.com/5875229/what-happens-when-you-put-four-donuts-on-a-c63-amg
20:59 🔗 SmileyG dear lord.
22:09 🔗 godane i can grab some old webuser magazines now
22:10 🔗 godane it got on usenet in the last week
