[00:04] malicious is a ratt! http://pastebin.com/0cKf9Htq [00:04] Spam from the piratebay... [00:04] That bot was on #thepiratebay.org also. [00:08] malicious is a ratt! http://pastebin.com/0cKf9Htq [00:08] malicious is a ratt! http://pastebin.com/0cKf9Htq [00:12] malicious is a ratt! http://pastebin.com/0cKf9Htq [00:33] ndurner1: not to my knowledge [00:33] alard: splinder.heroku.com is hurting my webbrowser again [00:34] ndurner1: it says you're a member [00:37] alard: oh. another graph [00:38] two more, actually [01:36] alard: two suggestions [01:37] make updateChart do nothing unless it's been 30 seconds or so since the last update [01:37] also, change it so that it does not have to loop over all of the seriesData for each downloader every time it updates [01:38] instead, store the scaled values and merely push new points onto the end of the series [01:43] oh, you already do that [01:44] I see, updateChart both builds and updates the first graph, and buildChart builds and updates the second and third [01:55] OK, so I tried to install wget-warc. It failed, said GNUTLS wasn't installed. So I installed GNUTLS. Still says GNUTLS isn't there. Help? [01:55] you need the dev package as well [01:55] headers and config files to tell your compiler where the headers are [01:55] apt-get install ... ? [01:56] gnutls-dev [01:58] OK, it's installed now, with an impressive amount of console-spam. Thanks for the assist. [02:00] you're welcome [02:01] the -dev suffix is a pretty standard convention for getting the headers [02:01] And I wouldn't have known that was what I needed to look for. [02:29] yea, it's one of those crucial pieces of information that is impossible to search for [02:35] underscor: Could you double your rate on spindler? At our current pace, I think it'd take us ~27 days. [02:36] More seriously, it's nice to see my contributions shooting down the list. 6th now! [02:41] heh [02:44] it's so weird to be doing all this work for such a tiny amount of data [02:46] Yeah. It's also a bit weird to be doing it for a site that pretty much none of us had heard of two weeks ago. [02:49] It also makes me wonder what sort of language abilities people here have, as Italian knowledge has certainly been helpful. [03:29] well, I spun up 200 splinder grabbers... dunno for how long [04:06] Sweet. I'll be 7th, momentarily. [04:10] can I run multiple instances from the same directory? [04:13] indeed [05:09] anyone see this: http://www.escapistmagazine.com/news/view/114155-Server-Screw-Up-Kills-MMO [05:16] That's crazy! [05:17] not really unexpected [05:19] Paradoks: I'm running 128 threads now [05:19] IO is maxed, so more won't be productive [05:30] I'm up to over 1500 threads [05:40] amerrykan: see also the danger hiptop [05:41] http://en.wikipedia.org/wiki/Microsoft_data_loss_2009 [05:42] amerrykan: if you need more examples, try looking at bitcoin - biggest trader hacked, (and second), third deleted its keys because it was hosted on amazon [05:45] They don't care about you. [05:46] closure: ha, you're gonna kill your instance [05:47] load is 18 :P [05:47] Microsoft CEO, Steve Ballmer disputed whether there had ever been a data loss, instead describing it as an outage. Ballmer said, “It is not clear there was data loss". 
However, he said the incident was "not good" for Microsoft.[11] [05:47] Scumbag Ballmer says it's just an "outage" [05:47] "we lost the data about whether we lost data" [05:55] http://a2.sphotos.ak.fbcdn.net/hphotos-ak-ash4/378747_10150540282784768_630194767_11650574_1879268796_n.jpg [05:58] http://jerkcity.com/jerkcity4702.html [05:58] A joke about SketchCow's film? [06:17] Dammit, closure's beating me [06:23] how so? [06:24] His users/h is higher than mine [06:24] Rate over time. closure started like six hours ago or something. [06:24] oh, lol [06:24] holy shit [06:24] closure: how many downloaders are running, and on what [06:24] I thought you meant the compiler [06:24] 1500 on a micro instance I think [06:24] how the fuck is he getting 1500 on a micro? [06:24] mine slows down at 12 [06:24] Very carefully? [06:24] ha [06:24] I could be wrong [06:24] that's it, I'm pushing this to 100 [06:24] micro instance, haha [06:24] p sure he's on a micro [06:26] I'm wondering if all of these warcs contain real data, or if some of it is Splinder throwing up with 200 OK [06:29] ha, closure just grabbed EroticAngel [06:29] mmm, that was a fun handful [08:01] hmm, I have some wget-warcs that have been running for six hours [08:01] that doesn't seem right [08:03] I have one running for 3 days [08:04] these are for splinder, though [08:04] o [08:04] hm. [10:21] db48x: Here you go, the splinder graph should now have fewer redraws: http://splinder.heroku.com/ [10:48] Just to make sure, it does work to run a bunch of the client instances out of the same directory, right? [10:51] Wyatt:Wor: Yes, every client is working in its own directory. [10:52] (The user's directory, that is.) [10:54] I found a problem: Splinder's US site sometimes returns an ERROR 502 Bad Gateway or an ERROR 504 Gateway Timeout. [10:55] alard: PM [10:55] chronomex: Yes, I've seen that, was coming to it. :) [10:55] okay, thanks :) [10:59] alard: All right, thanks. [11:00] I've got about fifteen of these running across three hosts. [11:00] I can probably run a few more. [11:02] How are you managing large numbers of them at once? Just nohup/backgrounding [11:02] screen? [11:03] That's what I've been doing; wasn't sure if there was a more efficient method [11:08] Could one of you please run find data/ -name "wget*.log" -exec grep -H "ERROR 50" "{}" ";" inthe splinder directory? [11:09] I only get these 502, 504 errors on US profiles. [11:17] Likewise, from what I can see [11:18] Okay, thanks, we'll probably need to fix that. [11:18] Since one of those errors means that whole parts of the account are missing. [11:20] The us version is much slower than the Italian version anyway. [11:22] (Part of that may be our doing, of course.) [11:22] haha [12:01] alard: no results [12:02] ndurner1: Okay, good, thanks. [12:02] sure [12:10] AnyHub will be shutting down as of Friday, 18th of November. Please download any important data immediately, as it will be unavailable past that date. [12:10] http://www.anyhub.net/ [12:10] well [12:10] Oh lovely. I may need to add more quota to my VPS... [12:12] The whole db there is ~2.78 TiB [12:12] 1114263 files [12:13] Oh dear. [12:13] freshly uploaded: http://www.anyhub.net/stats [12:22] Email send if they can provide a list of public files [12:25] why? [12:25] http://f.anyhub.net/4Baq and replace chars [12:26] those are random? [12:26] should only be about 14 million combinations tho [12:26] yeah... 
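For anyone reproducing the screen-based setup discussed above, a minimal sketch. The ./dld-client.sh YOURNICK invocation is borrowed from the AnyHub client script introduced further down this log; substitute whichever client script your project actually uses. The find/grep line is the exact check alard asked for.

    # launch three downloader clients, each in its own detached screen session
    for i in 1 2 3; do
        screen -dmS "dld$i" ./dld-client.sh YOURNICK
    done
    screen -ls                    # list the detached sessions
    screen -r dld1                # attach to one of them; detach again with Ctrl-a d

    # scan the per-user wget logs for the 502/504 gateway errors seen on the US site
    find data/ -name "wget*.log" -exec grep -H "ERROR 50" "{}" ";"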
[12:27] 1114263 files [12:27] thats a low hit ratio [12:28] it seems like they are linearly added, tho [12:31] Hmm, am I just incompetent? I thought it was possible to open a new window inside a detached screen and then run something... [12:32] Ymgve: that great! :D [12:41] Ah, never mind, figured it out. [12:56] Linear numbering would make 62*62*10 = 1,191,640 files, so that matches the statistics. [13:11] We can pull this off [13:11] now we split all the files. [13:12] because it is alot of daya [13:12] if intresed [13:12] tough [13:17] I mean, it is just some file-upload [13:17] not sure if that would be worth the effort [13:21] There may be interesting files in there. [13:23] I'm writing a small script. [13:36] woot [13:41] PepsiMax: https://github.com/ArchiveTeam/anyhub-grab [13:42] git clone git://github.com/ArchiveTeam/anyhub-grab.git ; cd anyhub-grab ; ./get-wget-warc.sh ; ./dld-range.sh 4AA [13:42] (Maybe someone should make a list of 3-letter ranges to claim on the wiki.) [13:42] alard: yeah, I could hit up the wiki. [13:53] http://archiveteam.org/index.php?title=AnyHub [13:58] alard: poop. ./wget-warc not found. [13:58] Saving to: `wget-1.13.4-2574-zlib124-plusfix.tar.bz2' [14:03] alard: configure: error: --with-ssl was given, but GNUTLS is not available. [14:03] wget-warc not successfully built. [14:04] PepsiMax: Please do a git pull, fixed an error (not wget-warc related). [14:04] 1 files changed :) [14:05] For wget-warc, try to install gnutls, gnutls-dev. [14:05] apt-get install gnutls-bin gnutls-dev [14:05] Or something like that. [14:05] (Have to go now, maybe speak to you later.) [14:05] ok! [14:12] Hmm [14:12] ./get-wget-warc.sh: line 29: make: command not found [14:12] wget-warc not successfully built. [14:12] installing that then. [14:21] whoohoo! wget-warc successfully built. [14:38] yipdw: I also have some hung wgets on splinder [14:41] or just really slow ones [14:45] So when the downloader craps out and says "Error downloading 'it:ermejointutt'." that user doesn't get marked as done and someone else tries it later, right? [14:55] DoubleJ: No, that's not a problem, the user will be retried later. [14:59] Good to know. [15:00] PepsiMax: I made a tracker-based script, may be easier than claiming ranges via the wiki. [15:01] So if you can, please kill your script(s), git pull and then run ./dld-client.sh PepsiMax [15:01] alard: holy shit [15:01] this is epic [15:02] alard: hmm, dld-client now exits if it fails to dl a user.. [15:02] closure: Didn't it always? [15:03] maybe, you added new checks though yes? [15:04] alard: It seems I don't have anything to curl :P [15:04] apt-get install curl ? [15:04] closure: Could be, not sure. I did do something with 502, 504 errors. [15:05] hurr im so stupid [15:05] well I had to disable the exit, otherwise I can't keep my thread pools full [15:06] PepsiMax: Also, please also run git pull once more (then wget won't follow redirects to /page/notfound all the time). [15:06] closure: That's okay, I guess, as long as it doesn't mark things done that aren't. [15:07] Sweet! Downloading prefix: OUG [15:07] PepsiMax: Good, you'll show up on http://anyhub.heroku.com/ [15:08] Ah, there you are. [15:08] I downloaded 202MB of 4AA_ til 4AD_ already [15:08] should i purge that? [15:08] Yes, if 202MB is not too much for you that's the easiest way. [15:10] Bye. See you later. [15:10] okey! [15:11] (You can run more than one dld-client at the same time, by the way.) [15:18] Sweet! :D [15:19] 2MB total disk size. Small downloads then [15:19] Yeah. 
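The "linear numbering" estimate above can be checked with shell arithmetic. Four-character IDs such as 4Baq drawn from the 62-character set 0-9, A-Z, a-z give roughly the "about 14 million combinations" mentioned earlier; if the leading character is restricted to 0-4 (an inference from the IDs seen in the channel, not anything AnyHub documented), the space shrinks to 1,191,640, which lines up with the 1,114,263 files on the stats page.

    # full four-character ID space over 0-9, A-Z, a-z
    echo $(( 62 ** 4 ))        # 14776336 -- the "about 14 million combinations"
    # with the leading character apparently limited to 0-4
    echo $(( 5 * 62 ** 3 ))    # 1191640  -- close to the 1114263 files reported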
I'm averaging about 5.5MB/user right now. [15:19] Though my sample size is pretty small compared to the "I don't have to pay rent" crowd :) [15:20] DoubleJ: what projct are you doing then? [15:20] I'm AnyHub currenyly. [15:20] Ah, my bad. I thought everybody was on Splinder at the moment. [15:20] Splinder? [15:20] Lets see [15:21] alard: Is this really working? It doesn't look like im actully downloading useful data! [15:23] alard: Is it normal that most of these are like 73k? [15:24] DoubleJ: what is yeah, underscor, im not sure either. [15:25] 10174.dld3 (11/13/2011 04:19:08 PM) (Detached) [15:25] lol crack it up [15:25] 10107.dld2 (11/13/2011 04:18:39 PM) (Detached) [15:25] 9429.dld (11/13/2011 04:07:35 PM) (Detached) [15:25] PepsiMax: Never mind, I was talking about my Splinder d/ls, which aren't relevant to the work you guys are doing on Anyhub. [15:26] okei [15:26] etc #AnyHubTeam [15:33] Good. :P [16:08] http://splinder.heroku.com/ where do people get this much computing power? [16:08] shiiiit [16:09] That was my "not having rent" comment earlier: Some people can afford to drop their take-home pay on S3 instances :) [16:11] yeah DoubleJ, it's all bout the cloud these days. [16:12] beating the shit out of my 551Mhz ADSL homebox [16:12] Hey, the only Linux box I have with a functioning connection is my VM. Disk I/O is, shall we say, less than stellar. [16:12] haha [16:13] 88% CPU idle, 4% iowait. [16:13] fuck the cloud. my computing power is real hardware. [16:13] I use this current box for http server/irssi screen mainly [16:14] closure: it is? [16:14] Nice. [16:14] closure: In that case, my hat's off to you. [16:14] I could pump out some srs shit I guess [16:14] let me boot op my other VM [16:15] closure: when will Splinder end? [16:15] well we're 10% there now [16:16] And they shut down on? [16:27] next week? [16:28] alard: the disk usage stats seem far from reality, btw. I have well over 30 gb [16:28] anyhub appears to be like 90% torrent files [16:29] and 10% gay pr0n [16:31] do we care to continue? [16:32] #AnyHubTeam etc [17:21] Bzzzz http://i.imgur.com/Lje05.png [18:15] closure: Disk usage stats, you mean Splinder? It should be just the sum of all your reported byte sizes. [18:16] However, there's a catch: it only counts the size of the WARC files. [18:16] ah [18:26] * closure passes underscor finally :) [18:26] closure: you must have fast internet then. [18:47] Hello Splinder people: (that is at least closure, underscor, ndurner, yipdw, Schbirid / spirit, Wyatt, db48x, Paradoks, DoubleJ) [18:47] The US site is a bit unstable, so there's a fix for that in Github. [18:48] Please: 1. git pull to prevent future problems; 2. run ./fix-dld.sh YOURNICK to fix existing ones [18:48] I have removed the US usernames since I discovered the problem, but at some point they have to be put back in. [18:49] ok [18:49] The new version of the script is also a bit more resilient to network errors: it will retry etc. [18:49] Thanks. [18:49] c76482c7e20d62d7e474f705d5e2bd6195559674 is the newest? [18:50] ah, I see, I cloned my clone [18:50] bc1f2d34113e62451031faa5f757a0d4b2dcb07a [18:55] hmm, fixed users get counted again on the tracker I think [18:55] Dumb question time: git pull is throwing this error: "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)." How do I make it do what I say? [18:56] cd .. to the directory containing .git, probably.. I assume you mounted some other drive inside your git clone [18:56] No, it's just a normal folder. 
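The "Stopping at filesystem boundary" error above is what git prints when git pull is run outside the working copy: it walks up the directory tree looking for a .git directory and gives up at a mount point such as a separately mounted /home. A sketch of the update sequence alard asked for, run from inside the clone; the directory name splinder-grab is an assumption, so cd into whatever your git clone actually created.

    cd splinder-grab            # assumed clone directory; use your actual checkout
    git pull                    # update the existing clone; no URL needed, origin is remembered
    ./fix-dld.sh YOURNICK       # redo the profiles affected by the US-site 502/504 problem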
[18:57] I just said git pull {URL} instead of the orignal git clone {URL} [18:57] So, I'm assuming that's not how it works? [18:57] very weird [18:57] Wait. I'm supposed to be in the directory this time? [18:58] That might do it. Hang on. [18:58] OK, that was it. [18:58] ah, cute.. yes, you need to be in the directory to use git :) [18:59] you have /home on a different filesystem than / probably. Or something like that. Amusing [18:59] Well, when you clone it makes the directory and moves down to drop everything in [18:59] So updating, it seemed reasonable to do the same thing [19:00] And yes, I did the mount /home as its own partition thing. [19:01] That way when I need to pave over the install at some point I don't lose everything. [19:06] DoubleJ: why the trouble? [19:09] ? [19:11] alard: does the downloader also has a 'nice exit'? (tell it that the current one it's working on is the last one?) [19:11] That was I don't have to babysit the current running ones.. [19:12] DoubleJ: mounting /home as his pme part. [19:12] PepsiMax: It's because Linux installers have two options: Nuke the parition and lose everything, or install over top and hope everything works. [19:13] Since A loses all your data and B is more risk than I'm willing to take, separating / from /home is the only choice I'm happy with [19:13] DoubleJ: always backup (on another disk)! [19:13] (At least, as happy as I can be when I'm using Linux.) [19:17] PepsiMax: To stop your downloader, touch STOP in the working directory. [19:18] closure: Fixed users get counted by the javascript updater, but not by the tracker. If you reload the page the count will be updated. [19:20] alard: I touched anyhub-grab/STOP should this prevent the 7 workers from grabbing more data when the current one is done? [19:20] Yes. [19:21] sweeet [19:21] They also look at the modification time of the STOP file. So, for example: if you want to keep running one tracker, you can start a new one now. That one will keep running (started after the STOP file was touched), while the others will gracefully stop. [19:22] alard: you are a god bless to the archiveteam! [19:22] this is the app I've been dreaming of. [20:04] I wish I had fiber :F [20:13] Gooo kellogs all brain! [20:16] ersi: kellogs? those are food? [20:18] Yeah, cerials [20:18] You said you wanted fiber [20:18] ^ [20:18] I hear banana also works [20:22] hrm [20:23] all but a couple of my downloaders died during the night [20:23] db48x: Check your scrollback. New version that doesn't choke and die, and a fixer for the profiles it choked and died on before. [20:28] i ran ./fix-dld.sh spirit [20:28] it:pepasaera is still incomplete, not fixing. [20:28] says [20:28] us:ButtWipe is still incomplete, not fixing. [20:29] Mustn't... make... buttwipe incomplete joke... [20:31] DoubleJ: so I see :) [20:33] lol@ButtWipe [20:33] Well, apparently ButtWipe shat all over your script [20:41] Whoops [20:41] the server just froze! [20:41] Prefix 49I done: 507M [20:42] congratz [20:45] Schibird: ./fix-dl.sh is not for fixing incomplete downloads. To fix those, run ./dld-single.sh spirit it:pepasaera ; ./dld-single.sh spirit us:ButtWipe [20:45] cheers [20:45] (This is because you can run the fix-dld script while your other downloaders are still running, and still have .incomplete in those users.) [20:46] So only redo users if you know that you're not currently working on them. [20:46] i stopped everything [20:47] Well, you know that, the script doesn't. :) [20:58] alard: pfoe! 
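A short recap of the two control mechanisms described above. The STOP file is the AnyHub client's mechanism as explained to PepsiMax; the dld-single.sh lines are the Splinder fix alard gave Schbirid. Script names and arguments are the examples used in the channel and differ per project.

    # AnyHub: stop clients gracefully; they check STOP's modification time,
    # so a client started after the touch keeps running
    touch STOP
    ./dld-client.sh YOURNICK        # this one ignores the earlier STOP

    # Splinder: redo users that ended up incomplete, but only when none of
    # your other clients is still working on them
    ./dld-single.sh spirit it:pepasaera
    ./dld-single.sh spirit us:ButtWipe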
Prefix 4zK done: 1.1G [21:07] jesus [21:08] I can't even access splinder.com reliably anymore [21:11] oh, haha, I got http://www.splinder.com/profile/Redazione [21:11] they have a blog that goes back to 2002 [21:11] this is gonna be a while [21:32] eduardo caught red handed DDOSING. traced to his home dsl connection. and reported to the Brazilian authorities - ENiGMA - http://pastebin.com/K0pJgwwU [21:42] ndurner, underscor: Please update your anyhub scripts: git pull ; ./fix-dld.sh YOURNICK [21:43] Sorry, hold your fire. [21:45] ndurner, underscor: Yes, please git pull ; ./fix-dld.sh YOURNICK [22:16] Hmm... dld-single.sh seems to be stuck in a loop on -dituttounpo-.splinder.com (blog) even though it appears to exist. Would the leading dash be confusing wget? [22:21] 22:21:42 up 19 days, 15:46, 1 user, load average: 40.95, 53.48, 56.25 [22:21] we;lp [22:21] 18:22:12 up 17:13, 6 users, load average: 306.80, 232.39, 220.47 [22:34] I think closure wins [23:00] lol closure! [23:00] /bin/sh: uptuime: not found [23:01] /bin/sh: uptutime: not found [23:01] nevermind [23:03] 00:03:22 up 18 days, 2:09, 1 user, load average: 1.19, 1.47, 0.98 [23:03] yay works [23:30] hmm [23:30] ERROR contacting tracker. Could not mark 'it:El_Pinta300' done. [23:42] "root@Marcelo:/home/anyhub-grab/wget-1.13.4-2574-dirty# configure: error: --with-ssl was given, but GNUTLS is not available." [23:42] What is the [23:42] What is the problem? I'm noob with Ubuntu Server. [23:43] Whoops, looks like I fat-fingered starting a client, haha [23:43] And then Wyattq went off and get tenth place [23:44] install gnutls-dev [23:44] Wyatt|Wor: heh [23:48] Holy jesus, closure, that is some impressive numbers though [23:48] I noticed some time ago that iowait processes also count towards system load [23:49] As far as the warning to fix my stuff, should I kill all my clients and git pull? [23:49] chronomex: Yup. Also processes that have been deferred into a wait state by process-throttles and similar. [23:50] (That's a bug in the LVE module, though) [23:51] Wyatt|Wor: yea, then run the fix-dld script [23:57] I think it has been compiled. Where is it now? It says to save the executable as ./wget-warc (Sorry for being annoying, but I really don`t know how to use Ubuntu Server, it`s the version installed in my VPS) [23:57] marceloan: did you run the get-wget-warc.sh [23:57] ? [23:57] yes [23:57] it downloaded a file and created a folder [23:58] if the compile succeeded, then it saved the result to the correct location
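The GNUTLS failure at the end here is the same one that came up near the start of this log: configure found no GnuTLS headers, and the runtime library alone is not enough. A minimal sketch of the fix plus a quick check of where get-wget-warc.sh is expected to leave the binary, assuming a Debian/Ubuntu system; package names vary by release, and on newer ones the dev package is called libgnutls28-dev rather than gnutls-dev.

    apt-get install make gnutls-bin gnutls-dev    # headers + build tools for the compile
    ./get-wget-warc.sh                            # re-run the build script from the anyhub-grab checkout
    ls -l ./wget-warc                             # the script is expected to leave the binary here
    ./wget-warc --version | head -n 1             # sanity check that it actually runs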