[00:19] i'm doing another pull of g4tv thefeed
[03:41] anyone need a copy of stacker 2.0? http://i.imgur.com/7vlry82.jpg
[03:42] yes, that is 15 copies
[06:08] Why Stacker? MS-DOS 6.0 is working fine for me.
[07:14] stack of stacker
[07:14] stackerstack
[07:25] so chronomex, why the short nick
[07:25] he's gone Web 2.0
[07:25] soon we'll be able to like him and share him
[07:47] * cmx puts out like button
[07:49] cmx_[Like 👍]
[07:49] for the unicodally challenged that's http://www.fileformat.info/info/unicode/char/1f44d/index.htm
[07:50] I'll meet you halfway
[07:50] cmx [Like ✔]
[07:54] Cool, twrist likes me
[12:13] hey joepie91 how did the T17 forums go
[13:17] BlueMax: it ran out of RAM :(
[13:18] this is starting to get annoying
[13:18] ack.
[13:18] I have an 878MB WARC here
[13:19] a find | wc -l gives me 94465 files
[13:19] no idea how complete it is
[13:19] BlueMax: do you have a box with enough RAM to run this on? my 1GB wasn't enough :|
[13:20] unfortunately I don't have Linux on any of my boxes
[13:22] there's wget for Windows, no?
[13:22] afaik WARC functionality is now in mainline
[13:22] and even if I did unfortunately my net is bonked
[13:23] and by bonked I mean capped, fucking Australia
[13:33] ah
[13:33] :(
[13:33] I think the problem is that wget-warc keeps all to-do URLs in RAM
[13:33] rather than storing them on the filesystem
[13:33] so on big sites, it just OOMs
[13:33] doesn't scale.. at all
[13:34] that seems relatively inefficent
[13:34] very much so
[13:35] it kind of sucks because I have a nice unmetered box
[13:35] at a datacenter that doesn't give half a fuck about abusemail
[13:35] yet the RAM is what keels over my wget-warc processes on big sites...
[14:07] swap ALL the things!
[14:08] * BlueMax swaps twrist for a real boy
[14:08] ono
[14:13] right
[14:13] BlueMax; I'm going to be trying to wget-warc the team17 forums on my laptop
[14:13] it has 4GB of RAM
[14:13] and a bunch of swap
[14:13] that should hopefully be enough
[14:13] also, twrist, I was trying to run it on an openvz VPS
[14:13] you can't just add swap there
[14:14] ..right
[14:14] right
[14:47] damnit
[14:47] wget-warc won't build
[14:47] or well
[14:47] wget-dev
[15:05] whats occuring world?
[15:12] BlueMax: well, it's running
[15:12] k
[15:12] stupid compiling stuff
[15:12] gl
[15:12] thanks :P
[15:44] goddamnit
[15:44] I'm seeing 500 errors
[15:44] on team17 forums
[15:45] every few pages
[15:45] oh dear
[15:45] seeing a lot of them now
[15:47] I guess I'll have to do a second pass...
[15:47] * joepie91 wonders why the 500 errors are occurring
[15:49] well shit
[15:49] :|
[15:50] do you need a login?
[15:54] no, it actually seems to be a server issue
[15:54] I have a working login
[15:54] and I can load the same pages in my browser just fine
[15:54] perhaps vbulletin is just really shit
[15:55] also, http://warc.readthedocs.org/en/latest <- is this known to work correctly?
[15:55] cc omf_ and alard in particular
[15:56] considering writing my own WARC spider, that uses on-disk files to keep track of state
[15:56] to get around the RAM issue
[15:57] I don't use that library
[15:57] joepie91: I've never used it, but since it's from IA it might work?
[15:57] huh, it's from IA?
[15:58] The github link links to internetarchive/warc.
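The 15:56 idea of a spider that keeps its crawl state on disk rather than in RAM could look roughly like the sketch below. It is only an illustration: the `DiskFrontier` class, the `urls` table, and the choice of SQLite are all assumptions made here, not anything from wget, wget-warc, or the warc library. Fetching pages and writing WARC records are left out entirely.

```python
import sqlite3


class DiskFrontier:
    """Hypothetical on-disk to-do list for a crawler (illustrative only)."""

    def __init__(self, path):
        # The queue lives in a SQLite file, so memory use stays roughly
        # constant no matter how many URLs have been discovered.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def add(self, url):
        """Queue a URL; return False if it was already known."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,)
        )
        self.db.commit()
        return cur.rowcount == 1

    def next(self):
        """Return the oldest URL not yet fetched, or None if none remain."""
        row = self.db.execute(
            "SELECT url FROM urls WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, url):
        """Record that a URL has been fetched so a restart skips it."""
        self.db.execute("UPDATE urls SET done = 1 WHERE url = ?", (url,))
        self.db.commit()
```

Because the state is in a file, a crawl that dies halfway (for example from an OOM kill) could be resumed by reopening the same path and calling `next()` again.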
[15:58] so is the broken s3upload script
[15:58] i'm dong a mirror now of it
[15:58] odd
[15:58] it's a fork of it apparently
[15:58] oh well :)
[15:59] guess it'll be good to go then
[16:00] i got the stupid s= problem again
[16:01] which turn cause none of the topics to be downloaded
[16:08] HELP
[16:08] the s= session just will not go away
[16:09] i can't mirror team17 forums in less some one helps me
[16:09] and with my connection it may take a bit
[16:09] godane: what in particular do you need help with?
[16:10] wget keeps getting the s= urls on the forum pages
[16:10] are you using stored login cookies?
[16:10] because vbulletin does that when it can't store cookies iirc
[16:10] if you use the commands on the wiki with an account, it should stop doing that
[16:11] without an account it should work also, in theory, but not sure how
[16:11] and you'll probably need to login anyway to get everything
[16:11] sec
[16:11] src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' http://forum.team17.com/login.php?do=login
[16:11] src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum"
[16:11] assuming src/wget is your warc-supporting wget inary
[16:11] binary *
[16:11] run those successively with valid login details and it should work fine :)
[16:12] i'm using wget-lua
[16:12] any particular reason?
[16:12] it grabs all images
[16:13] including ones on different hosts
[16:13] doesn't --mirror also do that?
[16:13] page-requisites included etc
[16:13] or am I confused?
[16:13] i have span-hosts
[16:14] anyway, wget-lua should do the above as well, perhaps without --warc-file if warc support is not compiled in
[16:14] err..
[16:14] >.>
[16:14] accidentally started a second client
[16:15] now it looks like its grabing everything
[16:17] my problem was i was saving the cookies over without loading it in
[16:17] ah
[21:03] I wonder if one could hack more swap around a single process (like a wget spider) by first creating a disk file (sparse works fine), mmaping it, and then calling exec to start the other process
[21:04] I know I used a large disk file and mmap to add memory in a program I was writing that was exhausting memory otherwise
[21:06] in this hack idea, I think the mmaped file stays open and mapped until the kernel cleans up from the process exiting
[21:07] I suppose it probably wouldn't work, as the malloc functions wouldn't know about the extra memory
[21:08] in my program, I had functions I wrote to hand out sections of that mmaped space for use.
[22:35] Coderjoe: one of the issues would be that it'd be terrifyingly slow
[22:35] because it's optimized for working with flash memory, not a filesystem
[22:35] er
[22:35] RAM *
[22:35] at least, that's what I'd expect to happen
[23:34] soooooo
[23:34] http://todo.cryto.net/ :D
[23:35] you can start using it straight away, and register + save your existing items if you like it
[23:35] (free)
[23:52] so i got The-Lounge part of team17 forum
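The approach described at 21:04 and 21:08 (a large disk file, mmap, and hand-written functions that hand out sections of the mapped space) might be sketched as follows. This is a guess at the general shape, not the program being discussed: `MmapArena` and its methods are hypothetical names, and the allocator is a simple bump allocator that never frees. It covers only the in-process case, not the exec hack from 21:03.

```python
import mmap
import os


class MmapArena:
    """Hypothetical bump allocator over a file-backed mapping."""

    def __init__(self, path, size):
        self.size = size
        self.offset = 0  # next free byte in the arena
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)
        # Extending the file without writing data leaves it sparse
        # on most Unix filesystems.
        os.ftruncate(self.fd, size)
        self.map = mmap.mmap(self.fd, size)

    def alloc(self, nbytes):
        """Reserve nbytes and return the starting offset of the region."""
        if self.offset + nbytes > self.size:
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += nbytes
        return start

    def write(self, start, data):
        """Copy bytes into a previously allocated region."""
        self.map[start:start + len(data)] = data

    def read(self, start, nbytes):
        """Return a copy of nbytes starting at the given offset."""
        return self.map[start:start + nbytes]

    def close(self):
        self.map.close()
        os.close(self.fd)
```

The kernel pages the mapped file in and out on demand, so the working set can exceed physical RAM without configuring swap, at the cost of disk-speed access whenever a page is not resident.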