#warrior 2013-02-23,Sat


Time Nickname Message
00:16 🔗 omf_ Okay so far I got it from 8033 domains to 3735
00:16 🔗 omf_ I expect a few hundred more to fall off before I am done
00:45 🔗 omf_ alard, I am going to look into setting up a warrior server myself at a later date. I want to collect stats data that we can publish as CC0 so there is more research out there
00:46 🔗 alard omf_: Excellent.
00:48 🔗 omf_ I have been a web developer for 17 years now and we only recently had large scale open data
00:48 🔗 omf_ google, yahoo, craigslist
00:48 🔗 omf_ they all started giving bits out
00:49 🔗 omf_ that led to others and more community projects. I just see the next logical step being stats from millions of pages at a time
00:49 🔗 omf_ so more of the higher scalability end so beginners have something to learn from
10:51 🔗 alard omf_: Let's continue here.
10:51 🔗 omf_ The links between all these sites and inside these sites is a mess
10:52 🔗 omf_ a full premap would make this process easy but with no way of knowing when shit is turned off, we do not have that kind of time
10:52 🔗 alard The universal-tracker system works best if you can split your task into small, but not too small subtasks.
10:53 🔗 alard But you have a small number of very large sites, is that correct?
10:53 🔗 omf_ looks that way
10:54 🔗 omf_ planetquake, the forums and a few others
10:54 🔗 omf_ I am still trying to figure out why the few attempts made completed without getting nearly everything
10:55 🔗 omf_ take this url for instance
10:55 🔗 omf_ http://planetquake.gamespy.com/View.php?view=POTD.Detail&id=4222
10:55 🔗 omf_ now all I have to do is minus one the number on the end and I got the previous page
10:56 🔗 omf_ there is also a list page with 167 pages of results
10:56 🔗 omf_ and yet wget got none of it
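A minimal sketch of the two ideas above: counting the id on the end of the POTD URL downward to reach earlier pages, and grouping the resulting URLs into small-but-not-too-small work items of the kind alard describes. The URL pattern comes from the link quoted in the log; the function names, the assumption that ids run down to 1 without gaps, and the chunk size of 100 are illustrative only and are not part of the universal-tracker.

```python
from typing import Iterator, List

# URL pattern copied from the link quoted in the log.
# Assumption: ids are contiguous integers; real ids may have gaps.
POTD_URL = "http://planetquake.gamespy.com/View.php?view=POTD.Detail&id={n}"


def potd_urls(newest_id: int, oldest_id: int = 1) -> Iterator[str]:
    """Yield detail-page URLs from newest_id down to oldest_id, inclusive."""
    for n in range(newest_id, oldest_id - 1, -1):
        yield POTD_URL.format(n=n)


def chunk(urls: List[str], size: int) -> List[List[str]]:
    """Split a URL list into consecutive work items of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    # Hypothetical item size; tune it so each item is neither tiny nor huge.
    items = chunk(list(potd_urls(4222)), 100)
    print(len(items), "work items; first URL:", items[0][0])
```

Each inner list could then be handed out as one subtask, so a single very large site is spread across many workers instead of being one enormous job.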
10:56 🔗 omf_ part of that I know is the cross domain image fetching bs
10:56 🔗 omf_ How do we bake that in
10:57 🔗 omf_ all images have this kind of url http://pnmedia.gamespy.com/planetquake.gamespy.com/fms/images/potd/4199/1323262539_fullres.jpg
11:00 🔗 alard That depends on your Wget parameters, probably.
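One possible shape for those parameters, as a hedged sketch: by default a recursive Wget run stays on the starting host, so images served from pnmedia.gamespy.com are skipped unless host spanning is enabled and limited to the hosts that matter. The helper below only builds the argument list and does not run anything. The flags shown (`--recursive`, `--level`, `--page-requisites`, `--span-hosts`, `--domains`) are standard Wget options, but the function name, the depth value, and the choice of exactly these flags are assumptions, not the settings the project ended up using.

```python
from typing import List


def build_wget_args(start_url: str, page_host: str, media_host: str,
                    depth: int = 2) -> List[str]:
    """Build a Wget argument list that follows links onto one extra host.

    Assumption: images live on media_host while the pages live on page_host.
    """
    return [
        "wget",
        "--recursive",                 # follow links from the start page
        f"--level={depth}",            # hypothetical recursion depth
        "--page-requisites",           # also fetch images, CSS and scripts
        "--span-hosts",                # allow leaving the starting host...
        f"--domains={page_host},{media_host}",  # ...but only to these hosts
        start_url,
    ]


if __name__ == "__main__":
    args = build_wget_args(
        "http://planetquake.gamespy.com/View.php?view=POTD.Detail&id=4222",
        "planetquake.gamespy.com",
        "pnmedia.gamespy.com",
    )
    print(" ".join(args))
```

Restricting `--domains` to the two hosts keeps `--span-hosts` from wandering off to every externally linked site.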
11:00 🔗 omf_ Yeah, I will figure out what they need to be
11:00 🔗 omf_ I will have a few people test it
11:01 🔗 alard Is there a list of the sites you want to save on the wiki?
11:02 🔗 omf_ I don't have permission to create a wiki page to post it.
11:02 🔗 alard You don't have an account?
11:03 🔗 omf_ I have an account it just cannot create pages
11:03 🔗 omf_ just update and edit
11:03 🔗 omf_ never really had the need
11:03 🔗 alard That's weird. Should I create a page that you can then edit?
11:03 🔗 alard Since I have the impression that you're trying to save a lot of very different sites.
11:04 🔗 omf_ basically everything under 1up, gamespy, ugo and ign
11:04 🔗 omf_ it is all getting turned off
11:04 🔗 alard What do I call the page?
11:05 🔗 alard http://archiveteam.org/index.php?title=IGN
11:06 🔗 omf_ we called the chat room ispygames
11:06 🔗 alard Ah.
