[00:04] <db48x> parsons: how is Meetup Everywhere different from the rest of Meetup?
[00:10] <db48x> I should upgrade my cpu so that I could actually play these arcade games
[01:02] <DFJustin> we're still working on making it faster
[01:11] <aaaaaaaaa> I wonder if there shouldn't be some sort of benchmark, so you can compare your benchmark score to a recommendation on the page. That way, as speed increases come from either better software or more powerful hardware, people can more easily determine what they can run.
[01:12] <DFJustin> I'd love that, go write it
[01:37] <db48x> DFJustin: and the browsers are getting faster too. It'll get there eventually, but in the meantime it would be a nice excuse to upgrade
[01:38] <db48x> aaaaaaaaa: I wonder if instrumenting MAME would be enough
[01:39] <db48x> build a version of MAME that reports some metrics (instructions per second or something) about the simulation, then build it for each platform
[01:39] <db48x> then compare that with the original hardware specs
[03:25] <joepie91> aaaaaaaaa: isn't that basically what Microsoft's performance index set out (and failed) to do?
[03:36] <aaaaaaaaa> I suppose, but with the performance index there are all sorts of measures, and different software is limited by different things. I think (but could be wrong) that jsmess is dependent on cpu.
[03:39] <joepie91> aaaaaaaaa: yes, and that's exactly the reason it failed :P
[05:28] <DFJustin> yes, cpu
[14:36] <parsons> db48x: Meetup Everywhere is an experimental platform -- more top-down than bottom-up. A single group could create a community (Reddit, Coursera, etc.) and people could sign on to create local "chapters"
[14:37] <parsons> It was somewhat successful, but on a small scale. We're trying to get successful Meetup Everywhere groups to move over to the main platform before the shutdown
[17:59] <SadDM> SketchCow: http://linux.slashdot.org/story/14/10/30/1614249/slashdot-asks-appropriate-place-for-free--open-source-software-artifacts
[19:14] <schbirid> i wonder how big gfycat is and if it might be incredibly journeyed some day
[20:06] <ionpulse> Anyone working or have worked on yahoo dir? getting halted at subfolders, like trying to start off at games or science instead of just dir.yahoo.com
[20:07] <ionpulse> using wget
[20:13] <ionpulse> recursive retrieval is simply not working
[20:15] <schbirid> what happens?
[20:16] <ionpulse> it's not following any links
[20:16] <ionpulse> i have tried everything
[20:16] <arkiver> ionpulse: what link are you starting from?
[20:16] <arkiver> what are you trying to download?
[20:16] <ionpulse> https://dir.yahoo.com/recreation/games/video_games/
[20:16] <schbirid> what's your commandline?
[20:17] <arkiver> might be that the href is not supported
[20:17] <schbirid> the urls on the page have uppercase letters
[20:17] <arkiver> the way it is written
[20:17] <schbirid> while the starting url you posted does not
[20:18] <schbirid> erm
[20:18] <schbirid> lol
[20:18] <schbirid> i should sleep
[20:18] <ionpulse> yes, I noticed the case issue
[20:19] <ionpulse> gonna try ignore-case quick
[20:20] <schbirid> you guys getting certificate issues too?
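A minimal sketch of the fix being tried here: the directory pages link to capitalized paths while the starting URL is lowercase, so a case-sensitive recursive crawl refuses to descend into them. --ignore-case and --no-check-certificate are the options mentioned in the log; paring the command down to just these plus -r and -np is an assumption for illustration, not the full invocation that follows below.

    wget -r -np --ignore-case --no-check-certificate \
         https://dir.yahoo.com/recreation/games/video_games/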
[20:22] <ionpulse> ok i got it
[20:22] <ionpulse> you do have to set --ignore-case with wget
[20:22] <schbirid> yay
[20:23] <ionpulse> there are a few more critical things as well, like --no-check-certificate, and a regex reject to kill the alphabetical option
[20:23] <ionpulse> if you don't block that you will get everything twice
[20:23] <ionpulse> and given the narrow time window to get this stuff, i am blocking the alpha sort option
[20:24] <schbirid> oh, is it shutting down?
[20:24] <ionpulse> yeah, I thought tomorrow was the shutoff
[20:24] <aaaaaaaaa> Yahoo directory is shutting down December 31st, IIRC
[20:24] <ionpulse> ah
[20:25] <ionpulse> i thought it was October 31st
[20:26] <aaaaaaaaa> http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down
[20:27] <ionpulse> ok, nice
[20:30] <aaaaaaaaa> you are probably thinking of qwiki, which is done on November 1st.
[20:41] <ionpulse> i will have an adjusted cmdline string here in a sec for wget, so you guys can get a jump on this easier if you haven't already
[20:42] <ionpulse> i have to merge in some stuff from my wgetrc so it's a standalone working execution
[20:51] <schbirid> nice: "ERROR 999: Unable to process request at this time -- error 999."
[20:52] <ionpulse> Okay, here is a working wget commandline for Yahoo Directory:
[20:53] <ionpulse> wget -rkEpH -l inf -np --random-wait -w 0.5 --restrict-file-names=windows --trust-server-names=on -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games --no-check-certificate --secure-protocol=auto --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --referer="https://dir.yahoo.com/recreation/games/" --reject-regex '(.*)(\?o\=a)(.*)' --ignore-case -e robots=off https://dir.yahoo.com/recreation/games/video_games/
[20:53] <ionpulse> sorry if this is a bit messy, as I had to quickly weave in stuff I usually have in an rc file
[20:54] <ionpulse> and there is some armor in here that "may" not be necessary, like ignoring robots
[20:54] <ionpulse> but I just want the process to run smoothly without snags
[20:54] <ionpulse> like the referer may not even be needed, but I added it anyway
[20:55] <ionpulse> That reject regex is important, though, if you don't want to grab everything twice.
[20:56] <ionpulse> Right now I am just grabbing the computer/game-related categories, then art/science, in a non-alphabetical grab. Then, depending on how long those processes run for, I could do a complete grab.
[20:56] <ionpulse> However, it makes sense to do it based on category and multi-thread it.
[20:57] <ionpulse> AWS EC2s might come in handy
[21:07] <schbirid> ionpulse: don't forget warc!
[21:07] <schbirid> --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx
[21:11] <ionpulse> I am archiving stuff differently than you guys. I have different types of projects going on.
[21:11] <ionpulse> So I don't tag with warc
[21:12] <ionpulse> It makes it especially hard on websites, because only a percentage of the site ends up being tagged, as I have complex post-processing routines that stitch in more data than would otherwise be archived by a set-it-and-forget-it wget/httrack run.
[21:13] <ionpulse> But yes, traditionally, if someone were to use warc, that would be added (as many Archive Team projects do)
[21:14] <schbirid> making wget build a warc only costs space (tiny, so it's just ~1/3 more) and you can just shove them into IA. would be nice
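For anyone who does want WARC output, schbirid's two flags slot into ionpulse's command unchanged; a sketch, reusing the flags and URL from the log (link conversion, -k/-E, is left out here on the assumption that the WARC should hold unmodified responses):

    wget -rpH -l inf -np --random-wait -w 0.5 --ignore-case -e robots=off \
         --no-check-certificate --restrict-file-names=windows \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --reject-regex '(.*)(\?o\=a)(.*)' \
         --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx \
         https://dir.yahoo.com/recreation/games/video_games/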
[21:14] <ionpulse> I am grabbing data out of Yahoo Web Directory to parse it for working links to archive on the web, and then to extract dead sites out of IA.
[21:15] <ionpulse> So Yahoo Dir is a means to an end for some of my other projects.
[21:18] <ionpulse> the video games run finished already
[21:19] <ionpulse> it's only 583 files
[21:24] <ionpulse> yeah... what's up with that ERROR 999
[21:24] <ionpulse> wtf
[21:25] <ionpulse> geocities was like this
[21:27] <wp494> behold: {{specialcase}} can now be used for sites with a special case, such as twitpic/4chan
[21:27] <wp494> I was going to make a "hybrid" one for sites like 4chan that actively purge data, but decided against it
[21:27] <wp494> if there's support I can get one rolling
[22:14] <ionpulse> so i dropped the googlebot user-agent and Yahoo Directory started working again
[22:14] <ionpulse> the ERROR 999 went away
[22:14] <ionpulse> So most likely we have to find the right combination of user-agent and wait time
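A sketch of the adjustment ionpulse describes: drop the Googlebot --user-agent (so wget falls back to its default identification) and lengthen the randomized wait, keeping everything else from the earlier command. The 2-second wait is an assumed starting point, not a value from the log:

    wget -rkEpH -l inf -np --random-wait -w 2 --ignore-case -e robots=off \
         --no-check-certificate --restrict-file-names=windows \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --reject-regex '(.*)(\?o\=a)(.*)' \
         https://dir.yahoo.com/recreation/games/video_games/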