[00:04] <db48x> parsons: how is Meetup Everywhere different from the rest of Meetup?
[00:10] <db48x> I should upgrade my cpu so that I could actually play these arcade games
[01:02] <DFJustin> we're still working on making it faster
[01:11] <aaaaaaaaa> I wonder if there shouldn't be some sort of benchmark, so you can compare your benchmark score to a recommendation on the page. That way, as speed increases come from either better software or more powerful hardware, people can more easily determine what they can run.
[01:12] <DFJustin> I'd love that, go write it
[01:37] <db48x> DFJustin: and the browsers are getting faster too. It'll get there eventually, but in the meantime it would be a nice excuse to upgrade
[01:38] <db48x> aaaaaaaaa: I wonder if instrumenting MAME would be enough
[01:39] <db48x> build a version of MAME that reports some metrics (instructions per second or something) about the simulation, then build it for each platform
[01:39] <db48x> then compare that with the original hardware specs
[03:25] <joepie91> aaaaaaaaa: isn't that basically what Microsoft's performance index set out (and failed) to do?
[03:36] <aaaaaaaaa> I suppose, but with the performance index there are all sorts of measures, and different software is limited by different things. I think (but could be wrong) that jsmess is dependent on cpu.
[03:39] <joepie91> aaaaaaaaa: yes, and that's exactly the reason it failed :P
[05:28] <DFJustin> yes, cpu
[14:36] <parsons> db48x: Meetup Everywhere is an experimental platform -- more top-down than bottom-up. A single group could create a community (Reddit, Coursera, etc.) and people could sign on to create local "chapters"
[14:37] <parsons> It was somewhat successful, but on a small scale. We're trying to get successful Meetup Everywhere groups to move over to the main platform before the shutdown
[17:59] <SadDM> SketchCow: http://linux.slashdot.org/story/14/10/30/1614249/slashdot-asks-appropriate-place-for-free--open-source-software-artifacts
[19:14] <schbirid> i wonder how big gfycat is and if it might be incredibly journeyed some day
[20:06] <ionpulse> Anyone working or have worked on yahoo dir? getting halted at subfolders, like trying to start off at games or science instead of just dir.yahoo.com
[20:07] <ionpulse> using wget
[20:13] <ionpulse> recursive retrieval is simply not working
[20:15] <schbirid> what happens?
[20:16] <ionpulse> it's not following any links
[20:16] <ionpulse> i have tried everything
[20:16] <arkiver> ionpulse: what link are you starting from?
[20:16] <arkiver> what are you trying to download?
[20:16] <ionpulse> https://dir.yahoo.com/recreation/games/video_games/
[20:16] <schbirid> what's your commandline?
[20:17] <arkiver> might be that the href is not supported
[20:17] <schbirid> the urls on the page have uppercase letters
[20:17] <arkiver> the way it is written
[20:17] <schbirid> while the starting url you posted does not
[20:18] <schbirid> erm
[20:18] <schbirid> lol
[20:18] <schbirid> i should sleep
[20:18] <ionpulse> yes, I noticed the case issue
[20:19] <ionpulse> gonna try ignore-case quick
[20:20] <schbirid> you guys getting certificate issues too?
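A minimal sketch of the fix being tried here: the directory pages link to capitalized paths while the starting URL is lowercase, so a case-sensitive recursive crawl refuses to descend into them. --ignore-case and --no-check-certificate are the options mentioned in the log; paring the command down to just these plus -r and -np is an assumption for illustration, not the full invocation that follows below.

    wget -r -np --ignore-case --no-check-certificate \
         https://dir.yahoo.com/recreation/games/video_games/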
[20:22] <ionpulse> ok i got it
[20:22] <ionpulse> you do have to set --ignore-case with wget
[20:22] <schbirid> yay
[20:23] <ionpulse> there are a few more critical things as well, like --no-check-certificate, and a regex reject to kill the alphabetical option
[20:23] <ionpulse> if you don't block that you will get everything twice
[20:23] <ionpulse> and given the narrow time window to get this stuff, i am blocking the alpha sort option
[20:24] <schbirid> oh, is it shutting down?
[20:24] <ionpulse> yeah, I thought tomorrow was the shutoff
[20:24] <aaaaaaaaa> Yahoo directory is shutting down December 31st, IIRC
[20:24] <ionpulse> ah
[20:25] <ionpulse> i thought it was October 31st
[20:26] <aaaaaaaaa> http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down
[20:27] <ionpulse> ok, nice
[20:30] <aaaaaaaaa> you are probably thinking of qwiki, which is done on November 1st.
[20:41] <ionpulse> i will have an adjusted cmdline string here in a sec for wget, so you guys can get a jump on this easier if you haven't already
[20:42] <ionpulse> i have to merge in some stuff from my wgetrc so it's a standalone working execution
[20:51] <schbirid> nice: "ERROR 999: Unable to process request at this time -- error 999."
[20:52] <ionpulse> Okay, here is a working wget commandline for Yahoo Directory:
[20:53] <ionpulse> wget -rkEpH -l inf -np --random-wait -w 0.5 --restrict-file-names=windows --trust-server-names=on -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games --no-check-certificate --secure-protocol=auto --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --referer="https://dir.yahoo.com/recreation/games/" --reject-regex '(.*)(\?o\=a)(.*)' --ignore-case -e robots=off https://dir.yahoo.com/recreation/games/video_games/
[20:53] <ionpulse> sorry if this is a bit messy, as I had to quickly weave in stuff I usually have in an rc file
[20:54] <ionpulse> and there is some armor in here that "may" not be necessary, like ignoring robots
[20:54] <ionpulse> but I just want the process to run smoothly without snags
[20:54] <ionpulse> like the referer may not even be needed, but I added it anyway
[20:55] <ionpulse> That reject regex is important, though, if you don't want to grab everything twice.
[20:56] <ionpulse> Right now I am just grabbing the computer/game-related categories, then art/science, in a non-alphabetical grab. Then, depending on how long those processes run for, I could do a complete grab.
[20:56] <ionpulse> However, it makes sense to do it based on category and multi-thread it.
[20:57] <ionpulse> AWS EC2s might come in handy
[21:07] <schbirid> ionpulse: don't forget warc!
[21:07] <schbirid> --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx
[21:11] <ionpulse> I am archiving stuff differently than you guys. I have different types of projects going on.
[21:11] <ionpulse> So I don't tag with warc
[21:12] <ionpulse> It makes it especially hard on websites, because only a percentage of the site ends up being tagged, as I have complex post-processing routines that stitch in more data than would otherwise be archived by a set-it-and-forget-it wget/httrack run.
[21:13] <ionpulse> But yes, traditionally, if someone were to use warc, that would be added (as many Archive Team projects do)
[21:14] <schbirid> making wget build a warc only costs space (tiny, so it's just ~1/3 more) and you can just shove them into IA. would be nice
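For anyone who does want WARC output, schbirid's two flags slot into ionpulse's command unchanged; a sketch, reusing the flags and URL from the log (link conversion, -k/-E, is left out here on the assumption that the WARC should hold unmodified responses):

    wget -rpH -l inf -np --random-wait -w 0.5 --ignore-case -e robots=off \
         --no-check-certificate --restrict-file-names=windows \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --reject-regex '(.*)(\?o\=a)(.*)' \
         --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx \
         https://dir.yahoo.com/recreation/games/video_games/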
[21:14] <ionpulse> I am grabbing data out of Yahoo Web Directory to parse it for working links to archive on the web, and then to extract dead sites out of IA.
[21:15] <ionpulse> So Yahoo Dir is a means to an end for some of my other projects.
[21:18] <ionpulse> the video games run finished already
[21:19] <ionpulse> it's only 583 files
[21:24] <ionpulse> yeah... what's up with that ERROR 999
[21:24] <ionpulse> wtf
[21:25] <ionpulse> geocities was like this
[21:27] <wp494> behold: {{specialcase}} can now be used for sites with a special case, such as twitpic/4chan
[21:27] <wp494> I was going to make a "hybrid" one for sites like 4chan that actively purge data, but decided against it
[21:27] <wp494> if there's support I can get one rolling
[22:14] <ionpulse> so i dropped the googlebot user-agent and Yahoo Directory started working again
[22:14] <ionpulse> the ERROR 999 went away
[22:14] <ionpulse> So most likely we have to find the right combination of user-agent and wait time
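A sketch of the adjustment ionpulse describes: drop the Googlebot --user-agent (so wget falls back to its default identification) and lengthen the randomized wait, keeping everything else from the earlier command. The 2-second wait is an assumed starting point, not a value from the log:

    wget -rkEpH -l inf -np --random-wait -w 2 --ignore-case -e robots=off \
         --no-check-certificate --restrict-file-names=windows \
         -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games \
         --reject-regex '(.*)(\?o\=a)(.*)' \
         https://dir.yahoo.com/recreation/games/video_games/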