Time |
Nickname |
Message |
01:23
🔗
|
Coderjoe |
https://sphotos-b.xx.fbcdn.net/hphotos-snc6/10239_532076040155187_2068485922_n.jpg |
03:52
🔗
|
underscor |
chronomex: lolumad? |
03:52
🔗
|
underscor |
;D |
03:53
🔗
|
chronomex |
wut |
03:58
🔗
|
SketchCow |
ME GUSTA |
03:58
🔗
|
SketchCow |
So, we've now set up to ingest almost all archiveteam WARC-based items into the wayback machine. |
03:59
🔗
|
SketchCow |
Mid-month the wayback machine will ideally begin showing these more recent sites. |
04:01
🔗
|
godane |
now that's cool |
04:01
🔗
|
GLaDOS |
That's quite hot. |
04:03
🔗
|
godane |
i'm tempted to make a livecd of wayback machine |
04:04
🔗
|
godane |
just download warc.gz to a folder and it starts working |
04:29
🔗
|
Nintendud |
Nice. |
04:54
🔗
|
godane |
luckily i'm getting images with my theregister.co.uk dumps |
04:55
🔗
|
godane |
this dump is meant to be more for data mining |
04:56
🔗
|
godane |
it's mostly just going to be the articles |
04:56
🔗
|
godane |
by year |
05:08
🔗
|
godane |
looks like i have 4 theblazetv glenn beck shows that are 4 hours i have to grab |
05:08
🔗
|
godane |
:-( |
05:08
🔗
|
godane |
once my 2005 run of theregister is done i will go to windows and start grabbing that |
05:09
🔗
|
godane |
good news is it looks like the rnc specials on theblazetv are up now |
10:39
🔗
|
SmileyG |
lol this is fun |
10:39
🔗
|
SmileyG |
rocking out to utterly random music selections on the archive |
11:36
🔗
|
SmileyG |
Actually alard, it's starting to make sense now that I understand how it's called lol |
11:36
🔗
|
SmileyG |
So it pulls the "normal" profile page |
11:36
🔗
|
SmileyG |
it then grabs various parts and "table.inserts" them into.... well a table |
11:36
🔗
|
SmileyG |
which will give you the list of download items? |
11:50
🔗
|
Soojin |
http://www.youtube.com/watch?v=cKcWyQ5aJlY&feature=g-user-u |
11:54
🔗
|
C-Keen |
~. |
14:01
🔗
|
alard |
SmileyG: The table (in lua everything is a 'table', but this table is really just a list) is the list of urls that gets added to Wget's queue. |
14:01
🔗
|
alard |
So it downloads the profile page, looks inside, finds pagination links, queues the page urls, the album urls etc. |
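The flow alard describes (fetch the profile page, look inside, queue the pagination and album URLs) can be sketched roughly as below. This is a Python illustration, not the actual Lua script from the chat: the function name `get_urls`, the `LinkCollector` helper, and the `page=` and `/album/` URL patterns are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def get_urls(page_html, page_url):
    """Return the list of URLs to queue after fetching one profile page."""
    collector = LinkCollector()
    collector.feed(page_html)
    queue = []  # plays the role of the Lua table that is "really just a list"
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        # Hypothetical patterns: pagination links and album links.
        wanted = "page=" in absolute or "/album/" in absolute
        if wanted and absolute not in queue:
            queue.append(absolute)
    return queue
```

The returned list stands in for what gets handed to the downloader's queue; each queued page would then go through the same function in turn.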
14:05
🔗
|
SmileyG |
nice |
14:05
🔗
|
joepie91 |
alard: you may have an idea on this... my script fetched about 1 million usernames so far |
14:05
🔗
|
joepie91 |
and that's it |
14:05
🔗
|
joepie91 |
however |
14:05
🔗
|
joepie91 |
https://www.google.nl/search?sugexp=chrome,mod=8&sourceid=chrome&ie=UTF-8&q=site%3Acommunity.webshots.com+inurl%3Auser |
14:05
🔗
|
joepie91 |
approx 11.400.000 results |
14:06
🔗
|
alard |
Should we start a #webshots channel? |
14:06
🔗
|
joepie91 |
since there are very few duplicates in the gathered list of users, I suspect I only have a small (popular) subset |
14:06
🔗
|
alard |
Where did you look for your current list? |
14:06
🔗
|
joepie91 |
yes, probably |
14:06
🔗
|
joepie91 |
it starts at the community category index, then gets the top users for each category |
14:06
🔗
|
joepie91 |
but the top users list is limited to 100 pages |
14:06
🔗
|
joepie91 |
per category |
14:06
🔗
|
joepie91 |
that's 100 * 100 users per category |
14:07
🔗
|
joepie91 |
(max) |
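The numbers joepie91 gives can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it reads "100 * 100" as 100 pages times 100 users per page, and it treats the Google result count as a rough estimate of pages, not an exact user count. The number of categories is not stated in the chat, so only a lower bound is derived.

```python
import math

# Figures taken from the chat; the interpretation is an assumption.
PAGES_PER_CATEGORY = 100      # top users list is limited to 100 pages
USERS_PER_PAGE = 100          # assumed reading of "100 * 100"
usernames_fetched = 1_000_000
google_estimate = 11_400_000  # "approx 11.400.000 results"

# Upper bound on users reachable through one category's top list.
max_users_per_category = PAGES_PER_CATEGORY * USERS_PER_PAGE

# Minimum number of categories needed to reach the fetched total,
# assuming every category is full and there are no duplicates.
min_categories = math.ceil(usernames_fetched / max_users_per_category)

# Rough share of the Google estimate covered so far.
coverage = usernames_fetched / google_estimate

print(max_users_per_category, min_categories, f"{coverage:.1%}")
```

On these assumptions the top-user lists cover well under a tenth of the profiles Google reports, which fits the suspicion that only the popular subset was collected.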
15:41
🔗
|
SmileyG |
yeah errr zee fuxed. |
20:13
🔗
|
chronomex |
alard is a hero in a city of heroes |
20:59
🔗
|
undersco2 |
holy balls |
20:59
🔗
|
undersco2 |
http://fastdesign7.com/ |
20:59
🔗
|
chronomex |
http://jalopnik.com/5875229/what-happens-when-you-put-four-donuts-on-a-c63-amg |
20:59
🔗
|
SmileyG |
dear lord. |
22:09
🔗
|
godane |
i can grab some old webuser magazines now |
22:10
🔗
|
godane |
it got on usenet in the last week |