#archiveteam-bs 2013-06-09,Sun


Time Nickname Message
00:19 godane i'm doing another pull of g4tv thefeed
03:41 Coderjoe anyone need a copy of stacker 2.0? http://i.imgur.com/7vlry82.jpg
03:42 Coderjoe yes, that is 15 copies
06:08 turnkit Why Stacker? MS-DOS 6.0 is working fine for me.
07:14 cmx stack of stacker
07:14 cmx stackerstack
07:25 BlueMax so chronomex, why the short nick
07:25 yipdw he's gone Web 2.0
07:25 yipdw soon we'll be able to like him and share him
07:47 * cmx puts out like button
07:49 cmx cmx_[Like 👍]
07:49 cmx for the unicodally challenged that's http://www.fileformat.info/info/unicode/char/1f44d/index.htm
07:50 twrist I'll meet you halfway
07:50 twrist cmx [Like ✔]
07:54 cmx Cool, twrist likes me
12:13 BlueMax hey joepie91 how did the T17 forums go
13:17 joepie91 BlueMax: it ran out of RAM :(
13:18 joepie91 this is starting to get annoying
13:18 BlueMax ack.
13:18 joepie91 I have an 878MB WARC here
13:19 joepie91 a find | wc -l gives me 94465 files
13:19 joepie91 no idea how complete it is
13:19 joepie91 BlueMax: do you have a box with enough RAM to run this on? my 1GB wasn't enough :|
13:20 BlueMax unfortunately I don't have Linux on any of my boxes
13:22 joepie91 there's wget for Windows, no?
13:22 joepie91 afaik WARC functionality is now in mainline
13:22 BlueMax and even if I did unfortunately my net is bonked
13:23 BlueMax and by bonked I mean capped, fucking Australia
13:33 joepie91 ah
13:33 joepie91 :(
13:33 joepie91 I think the problem is that wget-warc keeps all to-do URLs in RAM
13:33 joepie91 rather than storing them on the filesystem
13:33 joepie91 so on big sites, it just OOMs
13:33 joepie91 doesn't scale.. at all
13:34 BlueMax that seems relatively inefficient
13:34 joepie91 very much so
13:35 joepie91 it kind of sucks because I have a nice unmetered box
13:35 joepie91 at a datacenter that doesn't give half a fuck about abusemail
13:35 joepie91 yet the RAM is what keels over my wget-warc processes on big sites...
14:07 twrist swap ALL the things!
14:08 * BlueMax swaps twrist for a real boy
14:08 twrist ono
14:13 joepie91 right
14:13 joepie91 BlueMax; I'm going to be trying to wget-warc the team17 forums on my laptop
14:13 joepie91 it has 4GB of RAM
14:13 joepie91 and a bunch of swap
14:13 joepie91 that should hopefully be enough
14:13 joepie91 also, twrist, I was trying to run it on an openvz VPS
14:13 joepie91 you can't just add swap there
14:14 twrist ..right
14:14 BlueMax right
14:47 joepie91 damnit
14:47 joepie91 wget-warc won't build
14:47 joepie91 or well
14:47 joepie91 wget-dev
15:05 SmileyG what's occurring world?
15:12 joepie91 BlueMax: well, it's running
15:12 BlueMax k
15:12 joepie91 stupid compiling stuff
15:12 BlueMax gl
15:12 joepie91 thanks :P
15:44 joepie91 goddamnit
15:44 joepie91 I'm seeing 500 errors
15:44 joepie91 on team17 forums
15:45 joepie91 every few pages
15:45 joepie91 oh dear
15:45 joepie91 seeing a lot of them now
15:47 joepie91 I guess I'll have to do a second pass...
15:47 * joepie91 wonders why the 500 errors are occurring
15:49 joepie91 well shit
15:49 joepie91 :|
15:50 godane do you need a login?
15:54 joepie91 no, it actually seems to be a server issue
15:54 joepie91 I have a working login
15:54 joepie91 and I can load the same pages in my browser just fine
15:54 joepie91 perhaps vbulletin is just really shit
15:55 joepie91 also, http://warc.readthedocs.org/en/latest <- is this known to work correctly?
15:55 joepie91 cc omf_ and alard in particular
15:56 joepie91 considering writing my own WARC spider, that uses on-disk files to keep track of state
15:56 joepie91 to get around the RAM issue
15:57 omf_ I don't use that library
15:57 alard joepie91: I've never used it, but since it's from IA it might work?
15:57 joepie91 huh, it's from IA?
15:58 alard The github link links to internetarchive/warc.
15:58 omf_ so is the broken s3upload script
15:58 godane i'm doing a mirror now of it
15:58 joepie91 odd
15:58 joepie91 it's a fork of it apparently
15:58 joepie91 oh well :)
15:59 joepie91 guess it'll be good to go then
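
[editor's note] joepie91's idea of a spider that keeps its to-do list on disk rather than in RAM could be sketched roughly as below. This is not wget-warc's code and not the spider joepie91 had in mind; the class name DiskFrontier, its methods, and the choice of SQLite as the on-disk store are all assumptions made for illustration.

```python
import sqlite3


class DiskFrontier:
    """Crawl to-do list kept in an SQLite file instead of process memory.

    Hypothetical helper: the name and schema are illustrative only.
    """

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def add(self, url):
        """Queue a URL; return False if it had already been seen."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,)
        )
        self.db.commit()
        return cur.rowcount == 1

    def next(self):
        """Return the oldest pending URL, or None when nothing is left."""
        row = self.db.execute(
            "SELECT url FROM urls WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, url):
        """Record that a URL has been fetched so it is not handed out again."""
        self.db.execute("UPDATE urls SET done = 1 WHERE url = ?", (url,))
        self.db.commit()
```

With a layout like this, memory use stays roughly constant however many URLs are queued, and an interrupted crawl can resume from the same file. Fetching pages and writing WARC records are left out of the sketch.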
16:00 godane i got the stupid s= problem again
16:01 godane which in turn causes none of the topics to be downloaded
16:08 godane HELP
16:08 godane the s= session just will not go away
16:09 godane i can't mirror team17 forums unless someone helps me
16:09 godane and with my connection it may take a bit
16:09 joepie91 godane: what in particular do you need help with?
16:10 godane wget keeps getting the s= urls on the forum pages
16:10 joepie91 are you using stored login cookies?
16:10 joepie91 because vbulletin does that when it can't store cookies iirc
16:10 joepie91 if you use the commands on the wiki with an account, it should stop doing that
16:11 joepie91 without an account it should work also, in theory, but not sure how
16:11 joepie91 and you'll probably need to login anyway to get everything
16:11 joepie91 sec
16:11 joepie91 src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' http://forum.team17.com/login.php?do=login
16:11 joepie91 src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum"
16:11 joepie91 assuming src/wget is your warc-supporting wget inary
16:11 joepie91 binary *
16:11 joepie91 run those successively with valid login details and it should work fine :)
16:12 godane i'm using wget-lua
16:12 joepie91 any particular reason?
16:12 godane it grabs all images
16:13 godane including ones on different hosts
16:13 joepie91 doesn't --mirror also do that?
16:13 joepie91 page-requisites included etc
16:13 joepie91 or am I confused?
16:13 godane i have span-hosts
16:14 joepie91 anyway, wget-lua should do the above as well, perhaps without --warc-file if warc support is not compiled in
16:14 joepie91 err..
16:14 joepie91 >.>
16:14 joepie91 accidentally started a second client
16:15 godane now it looks like it's grabbing everything
16:17 godane my problem was i was saving the cookies over without loading it in
16:17 joepie91 ah
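
[editor's note] godane's actual fix was loading the saved cookies, so the forum stopped appending s= to its links. For a custom spider that still meets such links, one possible complement is to drop the s parameter before queueing a URL, so the same page is not fetched once per session hash. The sketch below uses only the Python standard library; the function name strip_session_param and the showthread.php example URL are assumptions for illustration, not anything taken from the channel or from wget.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_param(url, param="s"):
    """Return url with every occurrence of the query parameter `param` removed.

    Hypothetical helper; `s` is the session parameter discussed above.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Example with a made-up thread URL and session hash:
print(strip_session_param("http://forum.team17.com/showthread.php?s=abc123&t=42"))
```

This only normalizes URLs on the client side; it does not log in or change what the server sends, so the cookie commands above are still needed to reach members-only areas.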
21:03 Coderjoe I wonder if one could hack more swap around a single process (like a wget spider) by first creating a disk file (sparse works fine), mmaping it, and then calling exec to start the other process
21:04 Coderjoe I know I used a large disk file and mmap to add memory in a program I was writing that was exhausting memory otherwise
21:06 Coderjoe in this hack idea, I think the mmaped file stays open and mapped until the kernel cleans up from the process exiting
21:07 Coderjoe I suppose it probably wouldn't work, as the malloc functions wouldn't know about the extra memory
21:08 Coderjoe in my program, I had functions I wrote to hand out sections of that mmaped space for use.
22:35 joepie91 Coderjoe: one of the issues would be that it'd be terrifyingly slow
22:35 joepie91 because it's optimized for working with flash memory, not a filesystem
22:35 joepie91 er
22:35 joepie91 RAM *
22:35 joepie91 at least, that's what I'd expect to happen
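
[editor's note] What Coderjoe describes for his own program (a large disk file, mmap, and hand-written functions that hand out sections of the mapped space) might look roughly like the following. It is a minimal sketch, not his code: the class name FileArena, the simple bump-pointer strategy, and the lack of any way to free individual blocks are all assumptions.

```python
import mmap
import os
import tempfile


class FileArena:
    """Hands out slices of a file-backed memory map (bump allocation only).

    Hypothetical helper: blocks are never reused, only released on close().
    """

    def __init__(self, path, size):
        self.size = size
        self._file = open(path, "w+b")
        # Extending with truncate() leaves a hole on filesystems that
        # support sparse files, so little disk is used until pages are written.
        self._file.truncate(size)
        self._map = mmap.mmap(self._file.fileno(), size)
        self._view = memoryview(self._map)
        self._blocks = []
        self._offset = 0

    def alloc(self, nbytes):
        """Return a writable memoryview over the next nbytes of the file."""
        if self._offset + nbytes > self.size:
            raise MemoryError("arena exhausted")
        block = self._view[self._offset:self._offset + nbytes]
        self._offset += nbytes
        self._blocks.append(block)
        return block

    def close(self):
        """Release every block handed out, then unmap and close the file."""
        for block in self._blocks:
            block.release()
        self._view.release()
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        arena = FileArena(os.path.join(tmp, "arena.bin"), 1 << 20)
        buf = arena.alloc(16)
        buf[:5] = b"hello"
        print(bytes(buf[:5]))
        arena.close()
```

As Coderjoe notes, this only helps code that asks the arena for memory explicitly; an unmodified program that calls malloc would not see the mapped file. How slow it is in practice depends on how often the touched pages have to go to disk.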
23:34 joepie91 soooooo
23:34 joepie91 http://todo.cryto.net/ :D
23:35 joepie91 you can start using it straight away, and register + save your existing items if you like it
23:35 joepie91 (free)
23:52 godane so i got The-Lounge part of team17 forum
