[00:29] Noticed that the Splinder page mentioned that the project is "Closing" [00:30] can someone tell me how close to the leaderboard I am? [00:35] I just saw your name scroll by on the dashboard [00:36] http://splinder.heroku.com/ [02:49] now thats an interesting question http://www.archive.org/post/401968/we-have-2tb-of-data-to-upload is the data secure? depends, secure from the world seeing it, hell no, secure as in backed up safely, well probably (?) [02:53] R-sync [03:05] I get the feeling they think IA is a service they can subscribe to and use to hold files- otherwise, the questions make very little sense [03:07] I'm asking some questions, then will likely upload Jamendo Albums. [03:14] note to self: try not to crash znc, then forget passwords. [03:29] SketchCow: can I get an rsync slot? [03:30] mostly testing this system i'm doing right now to do this in massive parrallel on heroku (for free) [03:31] SketchCow: nick 'kennethreitz' [03:36] wait [03:36] THE kenneth reitz? [03:36] readibility guy? [03:36] .. [03:36] readability* [03:37] underscor: hah, the one and only :) [03:37] underscor: and you are? [03:37] No way, rad! [03:37] That's so cool! [03:37] how do you know of me? :P [03:37] Readability and httpbin [03:38] ah, awesome :) [03:38] :) [03:38] really want to start getting involved with what the archive team is doing [03:38] That's great [03:38] i've always had a bit of an archival spirit [03:38] shocker :P [03:38] haha [03:40] going to try to run a few hundred of these scrapers in parallel on heroku [03:44] have it mostly working [03:44] a few annoyances (the image doesn't have rsync or wget) [04:01] Hey, kennethre [04:01] hahahahah [04:01] underscor wants your autograph [04:02] SketchCow: apparently :) [04:02] <3 <3 <3 your photo on the wall <3 <3 <3 [04:02] I'll get a slot for you shortly. [04:02] Just finishing something here. [04:02] SketchCow: the one from the readability party? 
[04:02] excellent [04:07] http://www.archive.org/details/moves-magazine [04:07] Years ago, a guy named Greg Costikyan added all this description of all the issues of that magazine. I wrote him and he let me put them all up. [04:08] So that's another thing off my plate. [04:08] <3 <3 <3 your photo on the wall <3 <3 <3 [04:08] :P [04:08] You're just mad because I never asked for your autograph [04:08] It's OK, I know mine's still in the place of honor on the back of the door [04:11] :D [04:11] I'm just trying to get in your will [04:11] "And to Alex, I leave my trusty infocube, and well-loved collection of 1970s hustlers" [04:20] muahaha http://cl.ly/293x113C1z2F300m3Q36 [04:23] kennethre: Don't those cost money? [04:23] underscor: each 'process' (e.g. session) is $0.05 an hour [04:23] except the first one [04:24] so i can just run this on 50 apps [04:24] for free [04:25] just not sure how much space a dyno has available [04:26] I think the free ones have somewhere around 5MB or something [04:26] it's quite beautiful [04:26] ew [04:26] no [04:26] there's plenty of room [04:27] I have no idea how Heroku works though, so meh. [04:27] NotGLaDOS: i start working there in a few weeks [04:28] That helps. [04:28] and then i can use as much as i want for free [04:28] and plan to take advantage of that [04:28] Muahaha [04:28] :) [04:28] my friend works there, and his bill is ~10k a month [04:28] I should really use that free VPS someone randomly gave me. [04:39] muahaha completely free heroku app now [04:39] this is going to end well [04:40] heroku scale scrape=100 [04:40] just need to get the upload triggering automatically somehow [04:40] heh we're gonna run out of splinder at this rate [04:42] Well, the QA is going to be (slightly) murder. 
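The Heroku setup sketched above (hundreds of scraper dynos, `heroku scale scrape=100`, continuous uploads because dyno disks are ephemeral) can be outlined roughly as below. This is a hypothetical sketch: `run_cycle` is an assumed stand-in for the project's real dld-streamer/upload-finished scripts, and the loop is bounded here only for demonstration.

```shell
# Hypothetical sketch of the Heroku worker loop described above. run_cycle
# is an assumption standing in for the real download/upload scripts; a
# real dyno would loop forever, uploading continuously because a dyno's
# disk vanishes when its process dies.
run_cycle() {
  # stand-ins for: ./dld-streamer.sh "$NICK" 10   (download a batch)
  #                ./upload-finished.sh "$TARGET" (rsync results out)
  echo "downloaded batch $1"
  echo "uploaded batch $1"
}
cycles=3                    # a real worker would use: while true; do ...
i=1
while [ "$i" -le "$cycles" ]; do
  run_cycle "$i"
  i=$((i + 1))              # real loop also sleeps ~200s between uploads
done
# Scale out across dynos (first dyno per app free, extras $0.05/hour):
#   heroku scale scrape=100
```

Running the first dyno of each app for free is what makes the "50 apps for free" plan above work; the paid route would be one app scaled wide.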
[04:42] We need to think that out [04:49] I like murder [04:51] & the xfiles marathon continues [04:51] bitches [04:53] oops [04:53] :P [04:53] wrong channel [04:53] I was supposed to gloat elsewhere [04:58] HERE WE GO [04:58] Amiga World is on the way! [05:11] is there a channel for universal-tracker development discussion? [05:11] (whee. ruby. haven't touched that yet) [05:14] #splinder has some chatting. [07:08] ^ [07:21] So, some time ago, someone sent me a hard drive for a geocities torrent. [07:21] I was slow. [07:21] Month or something. [07:21] He switches from "where is it" to well, downright abusive. [07:21] Well, there we have a problem. [07:22] So I've had his hard drive on a shelf for, oh, 9 months now. [07:22] lol [07:22] And every, oh, month or so, he sends me a more abusive, more threatening letter. [07:22] I only mention because a few came in today. [07:22] He also discovered my kickstarter. [07:22] So now he's hoppin' mad this evening. [07:23] I literally have the drive sitting on a shelf addressed to him. [07:23] But I have a thing about threats and bullies. [07:23] speaking of hard drives, did you manage to get the friendster data out of my drive? [07:24] last I recall you were still working on finding a system to mount the xfs on [07:25] It is just sitting there. [07:25] But I'll try again. [07:25] xfs is tough, man. [07:25] It really is. [07:25] P.s. try to avoid threatening my life in e-mail [07:25] (Spoiler) [07:26] I see now he is trying to derail my kickstarter. [07:26] That'll be a trick. [07:26] especially since you got the money already [07:27] how's he trying? [07:27] Well, I think he intends to donate $10 so he's a backer and can comment. [07:28] (Doesn't work that way.) [07:28] He did comment on there anyway, but I reported it as spam, so it's gone. [07:28] I expect to have to talk/hang with a few kickstarter people. [07:29] well great, he gave you money. 
[07:30] kickstarter: turning misguided enemies into misinformed allies [07:30] woo [07:30] http://www.ncdc.noaa.gov/nexradinv/chooseday.jsp?id=kgrr [07:30] level 2 and level 3 radar data from 1995 to present [07:31] (that particular link goes to the grand rapids, mi radar site) [07:31] That's nice [07:31] Let me guess, id == location [07:32] it sounds like they need to retrieve it off tape in a robot, though [07:32] yes [07:32] http://www.archive.org/details/amiga-world&reCache=1 [07:32] Coderjoe: that is not a surprise. [07:32] Of course it does ( 。 ヮ゚) [07:32] All issues of Amiga World [07:32] The user email address is needed when ordering data from NCDC. Due to the size of the Archive, instantaneous access is not possible. The user is emailed when the ordered data has posted to the NCDC FTP site. [07:32] Why do I need to enter my Email Address? [07:32] How long will my order take? [07:32] The amount of time for each order is varies based on the size of the order. An average order of 24 hours of data may take between 5 and 30 minutes. [07:33] i hope there are multiple copies of the data [07:33] perhaps talk to them about loading it into ia? :D [07:33] So, basically, it'd be faster if we bought the NOAA out. [07:34] How many stations are there? [07:34] maybe hundred or so, max [07:35] OK, gotta go to bed again [07:35] Hooray Alternate Side of the Street Parking [07:35] SketchCow has to go down [07:35] Like your mom [07:35] like twitter or splinder, but he goes down IN BED [07:35] LIKE YOUR MOM [07:36] NO [07:36] NOT THAT MENTAL IMAGE [07:36] 16 years*365 days*100 sites*116 products per site [07:36] Include the logs link [07:36] There goes the logs. [07:36] fine fine fine [07:37] Roughly 67744000 requests to NCDC [07:37] Do you think they'll mind? [07:37] CHRONOMEX DESTROYS HISTORY [07:37] underscor: 365.25 days [07:37] underscor: probly. [07:37] SketchCow: jesus h christ [07:37] Probably not. 
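The request-count estimate above can be rechecked with shell arithmetic; the ~100 sites and 116 products-per-site figures are the channel's own rough guesses, not NCDC's.

```shell
# Recompute the NCDC/NEXRAD request estimate from the log:
# 16 years x ~100 radar sites x ~116 products per site per day.
years=16; sites=100; products=116
echo $(( years * 365 * sites * products ))           # ignoring leap days
echo $(( years * 36525 * sites * products / 100 ))   # 365.25 days/year
```

The first figure matches the "roughly 67744000" estimate; applying chronomex's 365.25-day correction gives the revised 67790400.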
[07:37] 67790400 [07:37] Shut up, you will love austin [07:37] There [07:37] It'll give them a reason to work 24/7 [07:37] we'll be such assholes at every panel [07:37] SketchCow: <3 [07:37] underscor: I think you'd destroy their tape robot [07:37] "WHERE'S THE EXPORT FUNCTION" [07:37] I'm still mad I'm not going [07:37] fucking money [07:37] There are better things to be mad about [07:37] that nad cyst [07:37] SketchCow: I'm thinking of calling all the startup hipster tards evil, unless they actively do good shit. [07:38] Is that cleared up yet? [07:38] underscor: :| [07:38] nah [07:38] they want it to grow bigger first [07:38] What's the theory [07:38] underscor: yeah, that nad cyst. lance that buboe [07:38] the terrible, terrible theory [07:38] It's just a benign back of baby juice [07:38] backup* [07:38] baby juice or other fluids. [07:38] THE SPERM DEATHSTAR [07:38] underscor: does this mean you need to jerk it more, or less? [07:39] need to relieve some pressure [07:39] she said it shouldn't affect it [07:39] lol [07:39] underscor: did you ask or did she volunteer it? [07:39] "Should I keep up my usual three a day or do I have to ease back a bit" [07:39] *snerk* [07:39] I was like "It's already the size of two fucking grapes, how big do you want it to fucking be? [07:40] underscor: four big mutant grapes, ime. [07:40] maybe three. [07:40] * SketchCow installs the Image::DoNotWant perl module [07:40] hahahaha [07:40] chronomex: you have experience with nadcysts? [07:40] Not that I'm gettin' it down with the ladies [07:40] SOMEONE IS TRADING HEAVILY ON THE NADSAQ [07:40] but it's totally obvious [07:41] I don't even know how to fucking describe it [07:41] underscor: wait until it gets tender. [07:41] It's like a baloon that uninflated unevenly [07:41] then be like HOLY FUCK FIX THIS SHIT NOW [07:41] balloon* [07:41] so i can leave the streamer and the upload script running at once, right [07:41] It's always tender [07:41] bsmith093: that's the design, yes. 
[07:41] SketchCow: what panel are you on? [07:41] it's a tissue covered water balloon :V [07:42] http://expertlabs.aaas.org/thinkup-launcher/ [07:42] that's rad [07:42] aaas, hm. I crashed their expo when it was in town few years ago. [07:43] I also jumped the fence at the association of american geographers expo last year [07:43] jumping the fence at an academic conference is a very surreal experience [07:43] (figurative fence) [07:44] ./upload-finished.sh batcave.textfiles.com::bsmith/splinder/ [07:44] sending incremental file list after some minor hiccups, this is what i have for output, plus a crapload of file transfers [07:46] Cameron_D is still pulling, according to top [07:46] Mhmm, Stuck on several large profiles methinks [07:46] dashboard says only 78k left [07:47] er, 72k [07:47] 72k for first run, doesn't hurt to do a second scan ;) [07:47] I know there are some unfinished IDs in there [07:47] lolz. [07:49] monsterfail [07:49] http://i.imgur.com/WZL4K.png [07:49] .. [07:49] I should really test the speed of my VPS.. [07:50] Ah, 7M/s [07:50] is there any good reason the file structure is like this it/A/Ak/Aka/Akarui_Tenshi/ [07:51] yes. [07:51] bsmith093: to keep the number of entries per directory down [07:51] are there any hard entry limits fs wise [07:51] no but if it gets big then your computer hates you. [07:51] depends on the filesystem, there's always a limit but it's usually over four billion [07:52] after a few thousand things start to get really slow [07:52] over ni..... beh. forget it [07:52] much easier to shard [07:52] or whatever you call that in archive land [07:52] yeah, whatever we call that [07:52] split? [07:52] sharding works fine. [07:52] Horizontal partitioning [07:53] chronomex: good lord man, i dont have that many files (digitally) in my entire house, including dupes [07:54] I probably have between 20 and 100 million files.
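The sharded layout discussed above (`it/A/Ak/Aka/Akarui_Tenshi/`) builds each profile's path from progressively longer prefixes of the username so no single directory accumulates millions of entries. A minimal sketch, with a hypothetical function name:

```shell
# Sketch of the shard layout described above: path components are the
# first 1, 2, and 3 characters of the username. shard_path is a
# hypothetical helper, not a script from the project.
shard_path() {
  lang="$1"; user="$2"
  c1=$(printf '%s' "$user" | cut -c1)
  c2=$(printf '%s' "$user" | cut -c1-2)
  c3=$(printf '%s' "$user" | cut -c1-3)
  printf '%s/%s/%s/%s/%s/\n' "$lang" "$c1" "$c2" "$c3" "$user"
}
shard_path it Akarui_Tenshi   # it/A/Ak/Aka/Akarui_Tenshi/
```

Three levels of fan-out keeps per-directory entry counts small even across a million users, which matters because many filesystems slow down badly past a few thousand entries per directory.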
[07:54] afk not have to sleep, its almost 3am inthe east coast, where i am [07:55] bsmith093: same here *sips coffee* [07:55] have too sleep now, typos becoming serious problem :O) [07:56] pardon me while I catalog another 1.5TB (after which the cataloged total will be 13.5TB >_< ) [07:56] afk 4 ~10hrs [07:56] cataloging because currently finding a file on them is a pain in the arse [07:56] Coderjoe: thats a lot of data. [07:57] Wait, what'd happen if archive.org went down? [07:57] oh hay. this disk has toonami captures on it. [07:58] NotGLaDOS: it's unlikely to disappear, as there are multiple copies in multiple places. [07:58] True. [07:58] if it were to go away, though, history'd be up shit creek without much of a paddle [07:58] but I trust brewster to do the right thing. [08:01] well, mostly -- I never met him but I hear he's got his head screwed on right [08:02] damn how I miss circa-2000 cartoon network [08:45] I <3 my workplace [08:45] boss gave me and coworker lockpick sets today [08:45] "here's your thanksgiving bonus, use it to go get your christmas bonus" [08:52] haha [09:00] oh [09:03] splindid [09:03] splinder's back up [09:03] maybe we should try and not kick it offline. [09:03] yes [09:04] I'm just trying to finish up incompletes [09:05] huh, how do I add people to a github organization? [09:06] oh, you have to add them to a group, I see [09:10] hm. [09:10] is there a problem with just adding everyone we know to a "Contributors" team in the Github account [09:10] ? 
[09:11] with push/pull access to all repositories [09:19] huh [09:19] I think that, if anyone is retrieving US data right now [09:19] you should consider it suspect [09:20] I just finished up four US Splinder profiles in about two seconds, but they're all full of 404s [10:52] i'm still riding a wave of error 6 on both it and us profiles while waiting for the script to wind down [10:53] My script has been winding down for more than 12 hours [10:54] still 20 running, and quite a lot of error 6 [11:19] I'm fixing users with errors, but does the script find them all? [11:19] For instance: http://toolserver.org/~nemobis/89gocciolina89-wget-phase-1.log has some 504 errors, but mostly "no data received" with no code [11:31] ndurner, how many incomplete users do you have? [11:32] Nemo_bis: how do I know? [11:33] count ".incomplete", probably. Doing that now. [11:33] ndurner, well, depends on your method; I look at the open dld-clients and at the numbers in dld-streamer [11:34] * Nemo_bis has 170 [11:51] splinder.heroku.com: -6 to do [12:03] how do we go about verifying if users are complete/fixing them? [12:06] upload_finished.sh just uploads completed ones, and if I understood that correctly SketchCow is willing to run verification scripts on the data on batcave [12:11] Ah, I won't be able to upload until after the site has closed though [12:17] why not? [12:18] Nearly at my bandwidth cap [12:18] And at the rate I upload it will probably take a few days anyway [12:25] There are 266,205 users claimed but still not returned. [12:26] Time to add them back to the queue, I guess? [12:36] Sure, I suppose I could pull it out on my Romanian server.
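Counting `.incomplete` markers, as suggested above, is the simplest way to answer "how many incomplete users do you have?". A sketch, assuming the project's convention of one `.incomplete` marker file per unfinished profile under a `data/` directory:

```shell
# Count unfinished profiles by their ".incomplete" marker files.
# The data/ directory name and marker convention are assumptions based
# on the layout discussed in this log.
count_incomplete() {
  find "${1:-data}" -type f -name '*.incomplete' | wc -l
}
# Usage: count_incomplete data
```

The alternative mentioned (watching open dld-clients and the dld-streamer numbers) counts in-flight work rather than leftover markers, so the two figures need not agree.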
[12:36] (it'll probably fail, but still) [12:37] I tried on my cheap VPS, the disk IO killed it [13:06] my fix-dld was "downloading" a bunch of users, but they were actually empty: wget-phase-1.log with error 404 and nothing else, only ~15 KiB downloaded [13:06] hmmm http://www.us.splinder.com/ [13:43] wow [13:43] www.splinder.com was very slow loading, but it finally did load [13:53] I have 953 .incompletes [13:54] mmm [13:54] 2011-11-22 11:50:03 ERROR 502: Bad Gateway. [13:55] the only error i see in this one "error 6" italian profile [14:01] plenty of error 6 too [14:01] http://news.cnet.com/8301-17852_3-57329204-71/microsofts-new-incentive-for-engineering-hires-bacon/ [14:02] awesome [15:48] why are a quarter of a million or so users gone from the total now> [15:49] wakey wakey, eggs and ... bacon...ey [15:53] dnova: 259613 users have been claimed but never marked 'done'. total = done + todo + out, but the number out is not shown. [15:54] oh [15:54] well if the tracker is no longer providing names, should I touch STOP and let my 600 threads wind down naturally? [15:54] or will you be adding more names back to the queue? [15:54] Earlier today I added 8000 users back to the todo queue, users that had been claimed for more than two days. [15:55] But the results I got back didn't look very healthy, since they were returned really quickly and were almost empty. [15:56] www.splinder.com is very slow at the moment and www.us.splinder.com is down, so maybe it's better to wait. [15:56] I got a tiny bit of data before the tracker ran out of names, but still need to get a "rsync slot" [15:56] only 29M [15:58] ah, out would be a good number to show [15:58] pberry: Can you make a tar file and upload it somewhere? [15:59] closure: Well, yes, on the other hand: out shouldn't be so enormously high. [16:00] well, it's ridiculous to think that 259 thousand grabs are currently running.. 
I'll bet something dropped those on the floor, either not done or done and the tracker not told [16:01] i was running 7k concurrently when the site went down yesterday [16:05] alard: you bet [16:05] alard: like, just the data directory? [16:05] It makes me wonder: would it help to run so many at the same time? [16:05] pberry: Yes. [16:06] alard: as soon as these last few threads stop I'll get right on that [16:15] alard: I have around 5,200 incompletes and around 520 threads still running [16:15] roughly [16:15] the last ones are probably bigger profiles [16:26] I love the ones that are 4 files and take forever [16:31] closure: or errord. or dropped on the floor yesterday when clients were forcibly killed when splinder went down [16:32] (yesterday morning, eastern US time) [16:33] my ec2 instance has 2598 .incomplete files [16:33] and currently 131 at home [17:05] HI [19:02] just got back did my rsync complete, because the tracker for the streamer is apparently down [19:08] We downloaded it [19:08] Now we need to do a cleanup phase [19:09] all of it? [19:14] Yes. [19:14] We had downtimes in there. [19:14] It won't be hard, I'm sure it'll be a script that generates a new list. [19:15] im running the fix script now, i have a really long list of apparently incomplete profiles [19:16] so you thought you'd be 1.5 days too late, and you're 2 days early? [19:17] We had someone come in with 300 virtual machines [19:18] how does that work, dont they all need some significant space to actually store the data [19:19] webscale [19:19] er, "the cloud" [19:20] oh heroku, then? [19:20] yessir [19:20] so how much of their space did u buy/ rent? [19:22] this really didn't need much space [19:22] i saw the dashborad, 400gb for blogs alone [19:22] every 'dyno' (e.g. intance) is a self-contained virtualized environment [19:22] *instance [19:22] but it dtill need hd space in some form [19:23] the cedar stack has writable storage [19:23] bsmith093: that's all 1 million users (or so). 
each user is typically 1MB or less [19:23] but it dies when the dyno dies (when the process stops) [19:23] (yes there were some that got into double digits) [19:23] 1mil at only 400gb, wow im impressed at how much people didnt care about this blog thing? [19:24] so the 'process' just ran the stream downloader, and uploaded ever 200 seconds [19:24] across 300 "boxes" [19:24] kennethre: sounds like ec2 instance-store... but your resource problems make it sound like the dyno isn't running on a bare instance [19:24] my spelling is wonderful today [19:24] Coderjoe: it's not, they use LXC [19:24] but where did the writable storage ultimately dump to? [19:25] bsmith093: it disappears when the process dies [19:25] it's all gone now [19:25] which explains your resource issues when you were forking too much [19:25] that's why i was continually uploading [19:25] it's encoded in a laser beam being bounced off the moon for now, we'll put it somewhere later [19:25] Coderjoe: yeah, at 20 i didn't have a problem, 60 i did [19:25] unless im missing something that means the data goes too, so where did... oh ok then [19:25] bsmith093: rsync [19:25] mmm.. delay line memory [19:26] lol laser [19:26] yeah that makes much more sense [19:26] it would stay as long as the processes stay alive, but they get recycled every day i believe [19:26] and that gets pricy when there's 300 of them ;) [19:26] sent 128293146 bytes received 42113 bytes 100222.77 bytes/sec total size is 4697371387 speedup is 36.60 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1060) [sender=3.0.7] [19:26] is that error serious? [19:27] thats my re run of the rsync that i left running all night [19:28] re-run.
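On the "is that error serious?" question above: rsync's exit codes are documented in rsync(1), and code 23 is a partial transfer, meaning everything that did copy is intact and a re-run will pick up what failed. A small helper to decode the common codes (function name is an assumption):

```shell
# Decode common rsync exit codes, per the EXIT VALUES section of rsync(1).
rsync_code_msg() {
  case "$1" in
    0)  echo "success" ;;
    23) echo "partial transfer: some files/attrs were not transferred" ;;
    24) echo "partial transfer: some source files vanished mid-run" ;;
    *)  echo "other failure (see rsync(1) EXIT VALUES)" ;;
  esac
}
rsync_code_msg 23
```

For an upload that ran overnight while new profiles were still finishing, 23 (or 24) is expected rather than alarming; re-running the same rsync until it exits 0 is the usual fix.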
it should tell you what the error was, and shouldn't upload anything new (as long as everything got uploaded already) [19:35] ben@ben-laptop:~/splinder-grab$ ./upload-finished.sh batcave.textfiles.com::bsmith/splinder/ sending incremental file list rsync: link_stat "/home/ben/splinder-grab/data/it/g/gy/gyp/gypsy!" failed: No such file or directory (2) [19:35] sent 1841728 bytes received 12173 bytes 50791.81 bytes/sec [19:37] so anyway, I just checked the projects page, good for you, whoever's saving ff.net and fictionpress, but may i suggest a linklist of straight story urls, with this app, works great for me, fanficdownloader.net [19:38] mmm [19:38] simple text files with all story urls in it, one to a line, goes through and saves in whatever format you want, even plain text formatted _like_ *this* for bold and italic and underlined things [19:38] I think rsync (or more likely the upload-finished.sh script) is choking on that ! [19:39] it might be the ! char [19:39] er, no that is rsync spewing the error [19:40] still thats one profile out of hundreds of thousands, who's gonna notice? [19:42] oh it's just the one site on geocities that had recordings and stuff of radio transmissions from jonestown... who's gonna notice? [19:43] all right, all right, point taken [19:43] incidentally are there audio recordings from jonestown? [19:44] yes. [19:44] I think you can get them if you file a FOIA request to someone [19:44] huh imagine that, the balls of those people recording a cult leader [19:44] oh well foia [19:45] anyway can we ping the archive yet and see if someone else has that profile? [19:45] these were recordings made at a monitoring station of radio transmissions peoples church members were making [19:45] wow they suspected something, that strongly?
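On the `link_stat ... failed: No such file or directory` error earlier: as noted in the channel, that message comes from rsync itself, and it usually means the listed path vanished or was renamed between building the file list and transferring, not that `!` is an illegal character (inside a non-interactive script, `!` does not even trigger shell history expansion). A quick pre-flight check, with a hypothetical helper name:

```shell
# Check whether a path rsync complained about still exists before
# re-running the upload. check_path is a hypothetical helper.
check_path() { [ -e "$1" ] && echo present || echo missing; }
check_path 'data/it/g/gy/gyp/gypsy!'
```

If the path is missing, the profile was probably moved (e.g. marked incomplete) mid-upload, and a later re-run of upload-finished.sh will pick it up from its new location.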
[19:46] as far as I have been able to tell, none of the geocities archive projects managed to get that profile [19:46] oh you were serious about that, damn that sucks [19:46] those recordings are at archive.org [19:46] just went through the internets a couple of days ago [19:46] probably on metafilter, check there [19:46] http://boingboing.net/2008/11/19/jonestown-30-years-l-2.html [19:47] Schbirid: oh? cool [19:47] http://www.archive.org/details/ptc1978-11-18.flac16 [19:53] how much do we still need to get from mobileme [19:53] data in gb [20:04] "here's your thanksgiving bonus, use it to go get your christmas bonus" [20:04] I <3 my workplace [20:04] boss gave me and coworker lockpick sets today [20:04] haha [20:09] underscor: hah, awesome. Where do you work? [20:10] he could tell you, but ... ! [20:10] Oh, I do work at the archive in return for... well. In return for the knowledge that stuff will be saved [20:10] That's not my story, it's chronomex's [20:10] (the lockpick thing) [20:52] are there any slightly smaller projects than mobileme i could help out with? [21:01] How could I join the 'Archive Team'? [21:07] how do i put wget warc in its usr bin place so i can call it normally [21:10] sudo cp ./wget-warc /usr/bin [21:10] hash -r [21:10] I would do mkdir ~/bin; cp ./wget-warc ~/bin [21:10] then add ~/bin to my PATH [21:10] export PATH=$PATH:~/bin [21:11] i just cp'ed to usr bin, that seems to work thanks [21:11] ok i apparently meant how do i compile it with man entries and all that? [21:12] to make updateable and everything [21:12] change the get-wget-warc.sh script so that it doesn't delete the source directory after it builds it [21:12] then go in there and do sudo make install [21:13] this won't build a deb or rpm package, so your package manager won't be able to keep it up to date [21:45] I need an rsync slot [21:49] ... [22:30] he's on 4chan, just post the slot there [22:30] he went over to #splinder [22:46] so I should stop all splinder downloads now?
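The wget-warc install advice above boils down to two routes: a per-user copy into `~/bin` (no root needed), or keeping the source tree and running `sudo make install` for man pages. A sketch of the per-user route, with `install_bin` as a hypothetical helper:

```shell
# Per-user install of a built binary, as suggested above for wget-warc.
# install_bin is a hypothetical helper; the binary name is an example.
install_bin() {
  src="$1"; dest="$2"
  mkdir -p "$dest" && cp "$src" "$dest/" && echo "installed $(basename "$src")"
}
# Usage: install_bin ./wget-warc "$HOME/bin"
#        export PATH="$PATH:$HOME/bin"; hash -r   # refresh command cache
# For man pages: edit get-wget-warc.sh so it keeps the source tree after
# building, then run "sudo make install" inside that tree.
```

As noted in the channel, neither route produces a .deb or .rpm, so the package manager will not track updates either way.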
[23:51] underscor: Why does ATidlebot think you're offline? [23:54] closure: hah [23:58] PatC: I don't know if you got an answer elsewhere, but as far as joining Archive Team, well, see the #archiveteam title. As far as what could you do, right now, well, we're working on downloading me.com stuff. [23:58] See http://www.archiveteam.org/index.php?title=MobileMe for details. [23:58] ...including how to run a BASH script on your linux box. [23:59] 10-4 [23:59] thanks