00:04 <db48x> parsons: how is Meetup Anywhere different from the rest of Meetup?
00:10 <db48x> I should upgrade my cpu so that I could actually play these arcade games
01:02 <DFJustin> we're still working on making it faster
01:11 <aaaaaaaaa> I wonder if there shouldn't be some sort of benchmark and you can compare your benchmark score to a recommendation in the page. That way as speed increases come from either better software or more powerful hardware, people can more easily determine what they can run.
01:12 <DFJustin> I'd love that, go write it
01:37 <db48x> DFJustin: and the browsers are getting faster too. It'll get there eventually, but in the mean time it would be a nice excuse to upgrade
01:38 <db48x> aaaaaaaaa: I wonder if instrumenting MAME would be enough
01:39 <db48x> build a version of MAME that reports some metrics (instructions per second or something) about the simulation, then build it for each platform
01:39 <db48x> then compare that with the original hardware specs
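[The comparison db48x describes could be sketched roughly like this. Everything here is illustrative: MAME does not report such a metric out of the box, and the function name and numbers are made up for the sake of the example.]

```python
# Sketch of the benchmark idea from the discussion above: have an
# instrumented build report how many emulated cycles per second it
# achieves, then compare that against the original hardware's clock rate.
# Hypothetical function and numbers, not real MAME output.

def speed_ratio(emulated_cycles_per_sec: float, original_clock_hz: float) -> float:
    """Fraction of original-hardware speed the emulation achieves."""
    return emulated_cycles_per_sec / original_clock_hz

# Example: a 1 MHz CPU target where the browser build manages
# 750,000 emulated cycles per second on this machine:
print(f"{speed_ratio(750_000, 1_000_000):.0%} of original speed")  # → 75% of original speed
```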
03:25 <joepie91> aaaaaaaaa: isn't that basically what Microsoft's performance index set out (and failed) to do?
03:36 <aaaaaaaaa> I suppose, but with the performance index, there are all sorts of measures and different software is limited by different things. I think (but could be wrong) that jsmess is dependent on cpu.
03:39 <joepie91> aaaaaaaaa: yes, and that's exactly the reason it failed :P
05:28 <DFJustin> yes cpu
14:36 <parsons> db48x: Meetup Everywhere is an experimental platform -- more top-down than bottom-up. A single group could create a community (Reddit, Coursera, etc) and people could sign on to create local "chapters"
14:37 <parsons> It was somewhat successful but on a small scale. We're trying to get successful Meetup Everywhere groups to move over to the main platform before the shutdown
17:59 <SadDM> SketchCow: http://linux.slashdot.org/story/14/10/30/1614249/slashdot-asks-appropriate-place-for-free--open-source-software-artifacts
19:14 <schbirid> i wonder how big gfycat is and if it might be incredibly journeyed some day
20:06 <ionpulse> Anyone working or have worked on yahoo dir? getting halted at sub folders, like trying to start off at games or science instead of just dir.yahoo.com
20:07 <ionpulse> using wget
20:13 <ionpulse> recursive retrieval is simply not working
20:15 <schbirid> what happens?
20:16 <ionpulse> it's not following any links
20:16 <ionpulse> i have tried everything
20:16 <arkiver> ionpulse what link are you starting from?
20:16 <arkiver> what are you trying to download?
20:16 <ionpulse> https://dir.yahoo.com/recreation/games/video_games/
20:16 <schbirid> what's your commandline?
20:17 <arkiver> might be that the href is not supported
20:17 <schbirid> the urls on the page have Uppercase letters
20:17 <arkiver> the way it is written
20:17 <schbirid> while the starting url you posted does not
20:18 <schbirid> erm
20:18 <schbirid> lol
20:18 <schbirid> i should sleep
20:18 <ionpulse> yes I noticed the case issue
20:19 <ionpulse> gonna try ignore-case quick
20:20 <schbirid> you guys getting certificate issues too?
20:22 <ionpulse> ok i got it
20:22 <ionpulse> you do have to set --ignore-case with wget
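[For context on the case issue above: URL paths are case-sensitive, so wget's path matching treated the lowercase start URL and the mixed-case links on the pages as different trees until --ignore-case made the comparison case-insensitive. A minimal illustration, with example URLs only:]

```python
# The directory pages linked to mixed-case paths while the crawl started
# from a lowercase URL; since URL paths are case-sensitive, wget's
# directory matching saw them as unrelated until --ignore-case was set.
# Example URLs for illustration only.
start = "https://dir.yahoo.com/recreation/games/video_games/"
linked = "https://dir.yahoo.com/Recreation/Games/Video_Games/"

print(start == linked)                  # exact comparison: False
print(start.lower() == linked.lower())  # case-insensitive: True
```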
20:22 <schbirid> yay
20:23 <ionpulse> there are a few more critical things as well, like --no-check-certificate, and a regex reject to kill the alphabetical sort option
20:23 <ionpulse> if you don't block that you will get everything twice
20:23 <ionpulse> and given the narrow time window to get this stuff, i am blocking the alpha sort option
20:24 <schbirid> oh is it shutting down?
20:24 <ionpulse> yea, thought tomorrow was shutoff
20:24 <aaaaaaaaa> Yahoo directory is shutting down December 31st, IIRC
20:24 <ionpulse> ah
20:25 <ionpulse> i thought it was October 31st
20:26 <aaaaaaaaa> http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down
20:27 <ionpulse> ok nice
20:30 <aaaaaaaaa> you are probably thinking of qwiki, which is done on November 1st.
20:41 <ionpulse> i will have an adjusted cmdline string here in a sec for wget, so you guys can get a jump on this easier if you haven't already
20:42 <ionpulse> i have to merge in some stuff from my wgetrc so it's a standalone working execution
20:51 <schbirid> nice "ERROR 999: Unable to process request at this time -- error 999."
20:52 <ionpulse> Okay, here is a working wget commandline for Yahoo Directory:
20:53 <ionpulse> wget -rkEpH -l inf -np --random-wait -w 0.5 --restrict-file-names=windows --trust-server-names=on -Ddir.yahoo.com,yahooapis.com,yimg.com -Pydir_games --no-check-certificate --secure-protocol=auto --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --referer="https://dir.yahoo.com/recreation/games/" --reject-regex '(.*)(\?o\=a)(.*)' --ignore-case -e robots=off https://dir.yahoo.com/recreation/games/video_games/
20:54 <ionpulse> sorry if this is a bit messy, as I had to quickly weave in stuff I usually have in an rc file
20:54 <ionpulse> and there is some armor in here that "may" not be necessary, like ignore robots
20:54 <ionpulse> but I just want the process to run smoothly without snags
20:55 <ionpulse> like referer may not even be needed but I added it anyway
20:56 <ionpulse> That reject regex is important though if you don't want to grab everything twice.
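[The effect of that --reject-regex can be checked in isolation. A small sketch, using Python's re module in place of wget's regex engine, with example URLs only:]

```python
import re

# wget tests each discovered URL against --reject-regex; this pattern
# drops the alphabetically-sorted variant of every listing page (the
# "?o=a" query string), which would otherwise make the crawl fetch
# each category twice.
reject = re.compile(r'(.*)(\?o\=a)(.*)')

urls = [
    "https://dir.yahoo.com/recreation/games/video_games/",
    "https://dir.yahoo.com/recreation/games/video_games/?o=a",
]
kept = [u for u in urls if not reject.search(u)]
print(kept)  # only the non-sorted URL survives
```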
20:56 <ionpulse> Right now I am just grabbing computer/game related, then art/science in a non-alphabetical grab. Then, depending on how long those processes run for, could do a complete grab.
20:56 <ionpulse> However it makes sense to do it based on category and multi-thread it.
20:57 <ionpulse> AWS EC2s might come in handy
21:07 <schbirid> ionpulse: don't forget warc!
21:07 <schbirid> --warc-file="dir.yahoo.com_$(date +%Y%m%d)" --warc-cdx
21:11 <ionpulse> I am archiving stuff differently than you guys. I have different types of projects going on.
21:11 <ionpulse> So I don't tag with warc
21:12 <ionpulse> Makes it especially hard on websites because only a percentage of the site ends up being tagged, as I have complex post-process routines that stitch in more data than would otherwise be archived by a set-it-and-forget-it wget/httrack run.
21:13 <ionpulse> But yes, traditionally, if someone were to use warc, that would be added (as many Archive Team projects do)
21:14 <schbirid> making wget build a warc only costs space (tiny, so it's just ~1/3 more) and you can just shove them into IA. would be nice
21:14 <ionpulse> I am grabbing data out of Yahoo Web Directory to parse it for working links to archive on the web, and then to extract dead sites out of IA.
21:15 <ionpulse> So Yahoo Dir is a means to an end for some of my other projects.
21:18 <ionpulse> the video games run finished already
21:19 <ionpulse> it's only 583 files
21:24 <ionpulse> yea... what's up with that ERROR 999
21:24 <ionpulse> wtf
21:25 <ionpulse> geocities was like this
21:27 <wp494> behold: {{specialcase}} can now be used for sites with a special case such as twitpic/4chan
21:27 <wp494> I was going to make a "hybrid" one for sites like 4chan that actively purge data, but decided against it
21:27 <wp494> if there's support I can get one rolling
22:14 <ionpulse> so i dropped the user-agent for googlebot and Yahoo Directory started working again
22:14 <ionpulse> the ERROR 999 went away
22:14 <ionpulse> So we have to find the right combination of user-agent and wait time most likely