[00:04] <db48x> parsons: how is Meetup Anywhere different from the rest of Meetup?
[00:10] <db48x> I should upgrade my cpu so that I could actually play these arcade games
[01:02] <DFJustin> we're still working on making it faster
[01:11] <aaaaaaaaa> I wonder if there shouldn't be some sort of benchmark so you can compare your score against a recommendation on the page.  That way, as speed increases come from either better software or more powerful hardware, people can more easily determine what they can run.
[01:12] <DFJustin> I'd love that, go write it
[01:37] <db48x> DFJustin: and the browsers are getting faster too. It'll get there eventually, but in the meantime it would be a nice excuse to upgrade
[01:38] <db48x> aaaaaaaaa: I wonder if instrumenting MAME would be enough
[01:39] <db48x> build a version of MAME that reports some metrics (instructions per second or something) about the simulation, then build it for each platform
[01:39] <db48x> then compare that with the original hardware specs
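(A rough sketch of that benchmarking idea, assuming a native mame binary with the pacman ROM set on hand; MAME's built-in -bench mode already reports an average emulated speed as a percentage of the original hardware, which is roughly the metric being described.)
    # -bench N runs headless for roughly N seconds with throttling off and
    # prints an "Average speed" percentage relative to the original hardware.
    mame -bench 60 pacman | tee bench.log
    # Pull out the percentage; 100% or more means this machine keeps up
    # with the original hardware for that driver.
    grep -o 'Average speed: [0-9.]*%' bench.log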
[03:25] <joepie91> aaaaaaaaa: isn't that basically what Microsoft's performance index set out (and failed) to do?
[03:36] <aaaaaaaaa> I suppose, but with the performance index there are all sorts of measures, and different software is limited by different things.  I think (but could be wrong) that jsmess is dependent on the CPU.
[03:39] <joepie91> aaaaaaaaa: yes, and that's exactly the reason it failed :P
[05:28] <DFJustin> yes cpu
[14:36] <parsons> db48x: Meetup Everywhere is an experimental platform -- more top-down than bottom-up. A single group could create a community (Reddit, Coursera, etc.) and people could sign on to create local "chapters"
[14:37] <parsons> It was somewhat successful but on a small scale. We're trying to get successful Meetup Everywhere groups to move over to the main platform before the shutdown
[17:59] <SadDM> SketchCow: http://linux.slashdot.org/story/14/10/30/1614249/slashdot-asks-appropriate-place-for-free--open-source-software-artifacts
[19:14] <schbirid> i wonder how big gfycat is and if it might be in danger some day
[20:06] <ionpulse> Anyone working on or have worked on yahoo dir? getting halted at subfolders, like trying to start off at games or science instead of just dir.yahoo.com
[20:07] <ionpulse> using wget
[20:13] <ionpulse> recursive retrieval is simply not working
[20:15] <schbirid> what happens?
[20:16] <ionpulse> it's not following any links
[20:16] <ionpulse> i have tried everything
[20:16] <arkiver> ionpulse what link are you starting from?
[20:16] <arkiver> what are you trying to download?
[20:16] <ionpulse> https://dir.yahoo.com/recreation/games/video_games/
[20:16] <schbirid> what's your commandline?
[20:17] <arkiver> might be that the href is not supported
[20:17] <schbirid> the URLs on the page have uppercase letters
[20:17] <arkiver> the way it is written
[20:17] <schbirid> while the starting url you posted does not
[20:18] <schbirid> erm
[20:18] <schbirid> lol
[20:18] <schbirid> i should sleep
[20:18] <ionpulse> yes I noticed the case issue
[20:19] <ionpulse> gonna try ignore-case quick
[20:20] <schbirid> you guys getting certificate issues too?
[20:22] <ionpulse> ok i got it
[20:22] <ionpulse> you do have to set --ignore-case with wget
[20:22] <schbirid> yay
[20:23] <ionpulse> there are a few more critical things as well, like --no-check-certificate, and a regex reject to kill the alphabetical option
[20:23] <ionpulse> if you don't block that you will get everything twice
[20:23] <ionpulse> and given the narrow time window to get this stuff, i am blocking the alpha sort option
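(A stripped-down sketch of just the three fixes mentioned so far -- case-insensitive link matching, skipping the certificate check, and rejecting the ?o=a alphabetical-sort duplicates; the full command ionpulse posts at 20:53 below adds the rest.)
    # Minimal crawl with only the fixes discussed above; start URL is illustrative.
    wget -r -np --ignore-case --no-check-certificate \
         --reject-regex '(.*)(\?o\=a)(.*)' \
         https://dir.yahoo.com/recreation/games/video_games/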
[20:24] <schbirid> oh is it shutting down?
[20:24] <ionpulse> yeah, I thought tomorrow was the shutoff
[20:24] <aaaaaaaaa> Yahoo directory is shutting down December 31st, IIRC
[20:24] <ionpulse> ah
[20:25] <ionpulse> i thought it was October 31st
[20:26] <aaaaaaaaa> http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down
[20:27] <ionpulse> ok nice
[20:30] <aaaaaaaaa> you are probably thinking of qwiki, which is done on November 1st.
[20:41] <ionpulse> i will have an adjusted wget commandline here in a sec, so you guys can get a jump on this more easily if you haven't already
[20:42] <ionpulse> i have to merge in some stuff from my wgetrc so it's a standalone working command
[20:51] <schbirid> nice "ERROR 999: Unable to process request at this time -- error 999."
[20:52] <ionpulse> Okay, here is a working wget commandline for Yahoo Directory:
[20:53] <ionpulse> wget -rkEpH -l inf -np --random-wait -w 0.5 --restrict-file-names=windows --trust-server-names=on -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games --no-check-certificate --secure-protocol=auto --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --referer="https://dir.yahoo.com/recreation/games/" --reject-regex '(.*)(\?o\=a)(.*)' --ignore-case -e robots=off https://dir.yahoo.com/recreation/games/video_games/
[20:53] <ionpulse> sorry if this is a bit messy, as I had to quickly weave in stuff I usually have in an rc file
[20:54] <ionpulse> and there is some armor in here that "may" not be necessary, like ignore robots
[20:54] <ionpulse> but I just want the process to run smoothly without snags
[20:54] <ionpulse> like referrer may not even be needed but I added it anyway
[20:55] <ionpulse> That reject regex is important though if you don't want to grab double everything.
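(The pattern rejects any URL carrying the ?o=a query string Yahoo appends for its alphabetical-sort view, so each category page is only fetched once; a quick way to sanity-check it against two sample URLs:)
    # Only the second URL, the alphabetical-sort variant, should match and be rejected.
    printf '%s\n' \
      'https://dir.yahoo.com/recreation/games/video_games/' \
      'https://dir.yahoo.com/recreation/games/video_games/?o=a' \
      | grep -E '\?o=a'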
[20:56] <ionpulse> Right now I am just grabbing the computer/game-related categories, then art/science, in a non-alphabetical grab. Then, depending on how long those processes run for, I could do a complete grab.
[20:56] <ionpulse> However it makes sense to do it based on category and multi-thread it.
[20:57] <ionpulse> AWS EC2's might come in handy
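(A sketch of that per-category, parallel approach; the category slugs below are illustrative guesses at the directory's top-level paths, and each grab writes to its own output directory.)
    # One background wget per top-level category; adjust the list to the real slugs.
    for cat in recreation arts science computers_and_internet; do
      wget -rkEpH -l inf -np --random-wait -w 0.5 \
           --ignore-case --no-check-certificate -e robots=off \
           --reject-regex '(.*)(\?o\=a)(.*)' \
           -Ddir.yahoo.com,yahooapis.com,yimg.com \
           -P "ydir_${cat}" "https://dir.yahoo.com/${cat}/" &
    done
    wait    # block until every category grab has finished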
[21:07] <schbirid> ionpulse: dont forget warc!
[21:07] <schbirid> --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx
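(What the earlier command looks like with those two WARC options folded in; link conversion (-k) and .html suffixing (-E) are left out here on the assumption that the WARC should hold the unmodified responses.)
    # Same crawl, also writing a WARC plus CDX index alongside the plain mirror.
    wget -rpH -l inf -np --random-wait -w 0.5 --ignore-case --no-check-certificate \
         -e robots=off --reject-regex '(.*)(\?o\=a)(.*)' \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx \
         https://dir.yahoo.com/recreation/games/video_games/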
[21:11] <ionpulse> I am archiving stuff different than you guys. I have different types of projects going on.
[21:11] <ionpulse> So I don't tag with warc
[21:12] <ionpulse> It makes websites especially hard because only a percentage of the site would end up being tagged; I have complex post-processing routines that stitch in more data than would otherwise be archived by a set-it-and-forget-it wget/httrack run.
[21:13] <ionpulse> But yes, traditionally, if someone were to use warc, that would be added (as many Archive Team projects do)
[21:14] <schbirid> making wget build a warc only costs space (tiny, just ~1/3 more) and you can just shove them into IA. would be nice
[21:14] <ionpulse> I am grabbing data out of Yahoo Web Directory to parse it for working links to archive on the web, and then to extract dead sites out of IA.
[21:15] <ionpulse> So Yahoo Dir is a means to an end for some of my other projects.
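(A rough sketch of that post-processing idea, assuming the mirror from the earlier command sits under ydir_games/: pull outbound links out of the saved pages, split them into live and dead, and look the dead ones up via the Wayback Machine availability API. URL encoding and edge cases are glossed over.)
    # Collect outbound (non-Yahoo) links from the mirrored pages.
    grep -rhoE 'href="https?://[^"]+"' ydir_games/ \
      | sed -e 's/^href="//' -e 's/"$//' \
      | grep -v 'yahoo\.com' | sort -u > outbound_links.txt

    # Check each link; dead or unreachable ones get a Wayback availability lookup.
    while read -r url; do
      code=$(curl -s -o /dev/null -w '%{http_code}' -m 15 -L "$url")
      if [ "$code" = "000" ] || [ "$code" -ge 400 ]; then
        curl -s "https://archive.org/wayback/available?url=${url}" >> wayback_lookups.json
        echo >> wayback_lookups.json
      else
        echo "$url" >> live_links.txt
      fi
    done < outbound_links.txt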
[21:18] <ionpulse> the video games run finished already
[21:19] <ionpulse> it's only 583 files
[21:24] <ionpulse> yeah... what's up with that ERROR 999
[21:24] <ionpulse> wtf
[21:25] <ionpulse> geocities was like this
[21:27] <wp494> behold: {{specialcase}} can now be used for sites with a special case such as twitpic/4chan
[21:27] <wp494> I was going to make a "hybrid" one for sites like 4chan that actively purge data, but decided against it
[21:27] <wp494> if there's support I can get one rolling
[22:14] <ionpulse> so i dropped the user-agent for googlebot and Yahoo Directory started working again
[22:14] <ionpulse> the ERROR 999 went away
[22:14] <ionpulse> So we have to find the right combination of user-agent and wait time most likely
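(A gentler variant along those lines; the browser user-agent string and the 2-second base wait are guesses to tune until the 999s stop.)
    # Same crawl without the Googlebot UA and with a slower, randomized wait.
    wget -rkEpH -l inf -np --random-wait -w 2 --ignore-case --no-check-certificate \
         -e robots=off --reject-regex '(.*)(\?o\=a)(.*)' \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --user-agent="Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0" \
         https://dir.yahoo.com/recreation/games/video_games/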