Time |
Nickname |
Message |
00:19
|
godane |
i'm doing another pull of g4tv thefeed |
03:41
|
Coderjoe |
anyone need a copy of stacker 2.0? http://i.imgur.com/7vlry82.jpg |
03:42
|
Coderjoe |
yes, that is 15 copies |
06:08
|
turnkit |
Why Stacker? MS-DOS 6.0 is working fine for me. |
07:14
|
cmx |
stack of stacker |
07:14
|
cmx |
stackerstack |
07:25
|
BlueMax |
so chronomex, why the short nick |
07:25
|
yipdw |
he's gone Web 2.0 |
07:25
|
yipdw |
soon we'll be able to like him and share him |
07:47
|
* |
cmx puts out like button |
07:49
|
cmx |
cmx_[Like 👍] |
07:49
|
cmx |
for the unicodally challenged that's http://www.fileformat.info/info/unicode/char/1f44d/index.htm |
07:50
|
twrist |
I'll meet you halfway |
07:50
|
twrist |
cmx [Like รขยย] |
07:54
|
cmx |
Cool, twrist likes me |
12:13
|
BlueMax |
hey joepie91 how did the T17 forums go |
13:17
|
joepie91 |
BlueMax: it ran out of RAM :( |
13:18
|
joepie91 |
this is starting to get annoying |
13:18
|
BlueMax |
ack. |
13:18
|
joepie91 |
I have an 878MB WARC here |
13:19
|
joepie91 |
a find | wc -l gives me 94465 files |
13:19
|
joepie91 |
no idea how complete it is |
13:19
|
joepie91 |
BlueMax: do you have a box with enough RAM to run this on? my 1GB wasn't enough :| |
13:20
|
BlueMax |
unfortunately I don't have Linux on any of my boxes |
13:22
|
joepie91 |
there's wget for Windows, no? |
13:22
|
joepie91 |
afaik WARC functionality is now in mainline |
13:22
|
BlueMax |
and even if I did unfortunately my net is bonked |
13:23
|
BlueMax |
and by bonked I mean capped, fucking Australia |
13:33
|
joepie91 |
ah |
13:33
|
joepie91 |
:( |
13:33
|
joepie91 |
I think the problem is that wget-warc keeps all to-do URLs in RAM |
13:33
|
joepie91 |
rather than storing them on the filesystem |
13:33
|
joepie91 |
so on big sites, it just OOMs |
13:33
|
joepie91 |
doesn't scale.. at all |
13:34
|
BlueMax |
that seems relatively inefficient |
13:34
|
joepie91 |
very much so |
13:35
|
joepie91 |
it kind of sucks because I have a nice unmetered box |
13:35
|
joepie91 |
at a datacenter that doesn't give half a fuck about abusemail |
13:35
|
joepie91 |
yet the RAM is what keels over my wget-warc processes on big sites... |
14:07
|
twrist |
swap ALL the things! |
14:08
|
* |
BlueMax swaps twrist for a real boy |
14:08
|
twrist |
ono |
14:13
|
joepie91 |
right |
14:13
|
joepie91 |
BlueMax; I'm going to be trying to wget-warc the team17 forums on my laptop |
14:13
|
joepie91 |
it has 4GB of RAM |
14:13
|
joepie91 |
and a bunch of swap |
14:13
|
joepie91 |
that should hopefully be enough |
14:13
|
joepie91 |
also, twrist, I was trying to run it on an openvz VPS |
14:13
|
joepie91 |
you can't just add swap there |
14:14
|
twrist |
..right |
14:14
|
BlueMax |
right |
14:47
|
joepie91 |
damnit |
14:47
|
joepie91 |
wget-warc won't build |
14:47
|
joepie91 |
or well |
14:47
|
joepie91 |
wget-dev |
15:05
|
SmileyG |
what's occurring, world? |
15:12
|
joepie91 |
BlueMax: well, it's running |
15:12
|
BlueMax |
k |
15:12
|
joepie91 |
stupid compiling stuff |
15:12
|
BlueMax |
gl |
15:12
|
joepie91 |
thanks :P |
15:44
|
joepie91 |
goddamnit |
15:44
|
joepie91 |
I'm seeing 500 errors |
15:44
|
joepie91 |
on team17 forums |
15:45
|
joepie91 |
every few pages |
15:45
|
joepie91 |
oh dear |
15:45
|
joepie91 |
seeing a lot of them now |
15:47
|
joepie91 |
I guess I'll have to do a second pass... |
15:47
|
* |
joepie91 wonders why the 500 errors are occurring |
15:49
|
joepie91 |
well shit |
15:49
|
joepie91 |
:| |
15:50
|
godane |
do you need a login? |
15:54
|
joepie91 |
no, it actually seems to be a server issue |
15:54
|
joepie91 |
I have a working login |
15:54
|
joepie91 |
and I can load the same pages in my browser just fine |
15:54
|
joepie91 |
perhaps vbulletin is just really shit |
15:55
|
joepie91 |
also, http://warc.readthedocs.org/en/latest <- is this known to work correctly? |
15:55
|
joepie91 |
cc omf_ and alard in particular |
15:56
|
joepie91 |
considering writing my own WARC spider, that uses on-disk files to keep track of state |
15:56
|
joepie91 |
to get around the RAM issue |
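The on-disk state idea above could look roughly like the following. This is a minimal sketch, not joepie91's spider and not how wget works internally: the `DiskFrontier` class, its table layout, and the choice of SQLite are all assumptions made for illustration. The point is only that the to-do list and the "already seen" set live in a file, so memory use stays flat as the crawl grows.

```python
import sqlite3


class DiskFrontier:
    """Hypothetical URL to-do list kept in SQLite, so the queue lives on disk."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def add(self, url):
        """Queue a URL; return False if it was already known."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,)
        )
        self.db.commit()
        return cur.rowcount == 1

    def next_pending(self):
        """Return the oldest URL not yet fetched, or None when nothing is left."""
        row = self.db.execute(
            "SELECT url FROM urls WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, url):
        """Record that a URL has been fetched so it is never handed out again."""
        self.db.execute("UPDATE urls SET done = 1 WHERE url = ?", (url,))
        self.db.commit()
```

A crawl loop would call `next_pending()`, fetch the page, `add()` every link it finds, then `mark_done()`. Because the state is a file, an interrupted crawl can pick up where it stopped.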
15:57
|
omf_ |
I don't use that library |
15:57
|
alard |
joepie91: I've never used it, but since it's from IA it might work? |
15:57
|
joepie91 |
huh, it's from IA? |
15:58
|
alard |
The github link links to internetarchive/warc. |
15:58
|
omf_ |
so is the broken s3upload script |
15:58
|
godane |
i'm doing a mirror of it now |
15:58
|
joepie91 |
odd |
15:58
|
joepie91 |
it's a fork of it apparently |
15:58
|
joepie91 |
oh well :) |
15:59
|
joepie91 |
guess it'll be good to go then |
16:00
|
godane |
i got the stupid s= problem again |
16:01
|
godane |
which in turn causes none of the topics to be downloaded |
16:08
|
godane |
HELP |
16:08
|
godane |
the s= session just will not go away |
16:09
|
godane |
i can't mirror team17 forums unless someone helps me |
16:09
|
godane |
and with my connection it may take a bit |
16:09
|
joepie91 |
godane: what in particular do you need help with? |
16:10
|
godane |
wget keeps getting the s= urls on the forum pages |
16:10
|
joepie91 |
are you using stored login cookies? |
16:10
|
joepie91 |
because vbulletin does that when it can't store cookies iirc |
16:10
|
joepie91 |
if you use the commands on the wiki with an account, it should stop doing that |
16:11
|
joepie91 |
without an account it should work also, in theory, but not sure how |
16:11
|
joepie91 |
and you'll probably need to login anyway to get everything |
16:11
|
joepie91 |
sec |
16:11
|
joepie91 |
src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' http://forum.team17.com/login.php?do=login |
16:11
|
joepie91 |
src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum" |
16:11
|
joepie91 |
assuming src/wget is your warc-supporting wget inary |
16:11
|
joepie91 |
binary * |
16:11
|
joepie91 |
run those successively with valid login details and it should work fine :) |
16:12
|
godane |
i'm using wget-lua |
16:12
|
joepie91 |
any particular reason? |
16:12
|
godane |
it grabs all images |
16:13
|
godane |
including ones on different hosts |
16:13
|
joepie91 |
doesn't --mirror also do that? |
16:13
|
joepie91 |
page-requisites included etc |
16:13
|
joepie91 |
or am I confused? |
16:13
|
godane |
i have span-hosts |
16:14
|
joepie91 |
anyway, wget-lua should do the above as well, perhaps without --warc-file if warc support is not compiled in |
16:14
|
joepie91 |
err.. |
16:14
|
joepie91 |
>.> |
16:14
|
joepie91 |
accidentally started a second client |
16:15
|
godane |
now it looks like it's grabbing everything |
16:17
|
godane |
my problem was that i was saving the cookies over again without loading them in |
16:17
|
joepie91 |
ah |
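The `s=` URLs discussed above are, per joepie91's recollection, session IDs that vBulletin puts into links when the client does not send cookies back. A minimal sketch for spotting and stripping them from a list of crawled URLs follows. The helper names and the example URLs are hypothetical, and the parameter name `s` is taken from the conversation rather than checked against the forum itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def has_session_id(url, param="s"):
    """Return True if the URL's query string carries the session parameter."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == param for key, _ in query)


def strip_session_id(url, param="s"):
    """Return the URL with the session parameter removed, other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

If many URLs in a crawl log pass `has_session_id`, that suggests the cookie file is not being loaded, which is the mistake godane describes.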
21:03
|
Coderjoe |
I wonder if one could hack more swap around a single process (like a wget spider) by first creating a disk file (sparse works fine), mmaping it, and then calling exec to start the other process |
21:04
|
Coderjoe |
I know I used a large disk file and mmap to add memory in a program I was writing that was exhausting memory otherwise |
21:06
|
Coderjoe |
in this hack idea, I think the mmaped file stays open and mapped until the kernel cleans up from the process exiting |
21:07
|
Coderjoe |
I suppose it probably wouldn't work, as the malloc functions wouldn't know about the extra memory |
21:08
|
Coderjoe |
in my program, I had functions I wrote to hand out sections of that mmaped space for use. |
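What Coderjoe describes, a large disk file mapped into memory with hand-written functions handing out pieces of it, might be sketched as below. This is not his program: the `FileArena` name and the simple bump-pointer scheme are assumptions chosen to keep the example short. As he notes, it only helps code written against such an allocator, and an unmodified wget would never see the extra space.

```python
import mmap
import os


class FileArena:
    """Hypothetical bump allocator over a file-backed memory mapping."""

    def __init__(self, path, size):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Extending with ftruncate leaves the file sparse on most Unix filesystems.
        os.ftruncate(self.fd, size)
        self.map = mmap.mmap(self.fd, size)
        self.size = size
        self.offset = 0

    def alloc(self, nbytes):
        """Reserve nbytes and return the starting offset of the block."""
        if self.offset + nbytes > self.size:
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += nbytes
        return start

    def write(self, offset, data):
        """Copy bytes into the mapping at the given offset."""
        self.map[offset:offset + len(data)] = data

    def read(self, offset, nbytes):
        """Return nbytes from the mapping as a bytes object."""
        return self.map[offset:offset + nbytes]

    def close(self):
        self.map.close()
        os.close(self.fd)
```

Pages of the mapping are written back to the file by the kernel under memory pressure, so the file behaves like private swap for whatever is stored in the arena.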
22:35
|
joepie91 |
Coderjoe: one of the issues would be that it'd be terrifyingly slow |
22:35
|
joepie91 |
because it's optimized for working with flash memory, not a filesystem |
22:35
|
joepie91 |
er |
22:35
|
joepie91 |
RAM * |
22:35
|
joepie91 |
at least, that's what I'd expect to happen |
23:34
|
joepie91 |
soooooo |
23:34
|
joepie91 |
http://todo.cryto.net/ :D |
23:35
|
joepie91 |
you can start using it straight away, and register + save your existing items if you like it |
23:35
|
joepie91 |
(free) |
23:52
|
godane |
so i got The-Lounge part of team17 forum |