[00:21] are Yahoo Groups not considered endangered? i didn't see them listed on the ArchiveTeam wiki. [00:23] i believe they are still well but the last time i checked was some 6 months ago. [00:23] I don't think it's endangered per se, but it can't hurt to start archiving. [00:23] they can't last forever. [00:24] with most yahoo groups, you can't access the posts unless you're a member [00:24] though they seem extremely durable so far :). [00:25] yes. many years back i wrote a download script for Yahoo Groups (required login/password) but it would be broken many times by now. [00:27] l/p and membership naturally [00:39] Yahoo Answers is definitely something worth continually grabbing. Eventually Yahoo Answers will close and it would be neat to have a mirror of it. [00:39] :) [00:53] Google is so dodgy... I have supposedly obtained a "complete" list of indexed pages of a website via the "site:" syntax. Now, totally accidentally, in an unrelated search, I discover *another* indexed page belonging to that same website... It wasn't listed in the "site:" results. [01:12] pastebin.com is down... probably yet another DDoS [01:54] unwhan: well, they only allow 1000 results for a search [01:54] so you can't get a "complete" index [01:59] https://vimeo.com/47812538# [02:06] anyone use anything to back up Google Reader feed content? [02:07] (Google Reader has a lot of deleted blogs backed up) [02:37] SketchCow: is that you or that musician look-alike? [02:40] It's me [03:52] are we gonna move on cinch.fm or what? any ideas yet? [03:57] good info on the wiki about the api and stuff [04:20] i was surprised to find the Micro File - If I Had A Hammer video on archive.org [06:53] So yeah, ANYTHING we can do to get Cinch. [06:53] Please do it. [06:53] hey SketchCow [06:54] i got the computer programme that aired on bbc in 1982 [06:55] Well, excellent.
[06:55] i'm also trying to get the Making The Most of the Micro series [06:56] looks like Micro File - If I Had A Hammer is on archive.org [06:56] i also want to get the Electronic Office series [06:57] but that may not move at all [06:58] also Bad Influence is on archive.org now [06:58] and first 3 season of Gamesmaster [06:58] *seasons [07:14] i may be able to get the rest of making the most of the micro [07:14] my problem is being limited to 3 torrents [07:48] SketchCow: Could you make a cinch upload space on fos? [08:28] yes [08:29] Kill tumblr project, let's move to cinch [08:34] HpistrMirrr [08:34] *HipstrMirrr [08:42] So I've got most of what's needed for Cinch, I think. Everything except the pagination of the tracks, comments, followers. [08:42] wow, already? [08:42] The pagination is very awkward, so perhaps we can leave them out? [08:43] It's not that hard, really. They provide a sitemap that lists every user and track, so you can just use that to generate the necessary urls. [08:43] sweet :) [08:43] I think we can make per-user lists and feed them to wget --page-requisites. [08:45] The only big problem left is these paginated things, which require you to POST enormous forms to ASP.net. So I'm tempted to leave them out. [08:46] What do we lose [08:47] Comments beyond the first (or last) 10 comments. Followers/following beyond 20-30. [08:47] Even if we archive them, you won't be able to browse them in the wayback machine. [08:47] scumbag pagination [08:48] Hmmm. [08:48] My heart says save them [08:48] How many times does pagination happen? [08:53] Robert Scoble is such a twerp [08:54] i'm downloading filesharefreak.com [08:55] it's been having trouble staying active since november 2010 [08:55] only recently there were some new posts [09:09] There may be quite a few cases with pagination on cinch. [09:21] Here's a list of the URLs that can be generated from the sitemap. Anything missing?
https://gist.github.com/e551a322210d5ac8fde8 [12:44] unwhan: well, they only allow 1000 results for a search | so you can't get a "complete" index [12:44] underscor: my "complete" index was some exact number like 856 results or so [15:00] unwhan: oic [16:26] alard is pro [16:27] im ready to help with any downloading if needed, just cant help write much code [17:20] Yeah, I finally got the last of my memac stuff moved to external storage, so if we need to get rolling on Cinch just point me at a seesaw script. [21:34] Hey. Any suggestions for a short, one-sentence blurb on the Cinch.FM archiving project? [21:34] There's room for a short bit in the warrior. Anything better than "Cinch.FM will remove all data on October 20, 2012." ? [21:35] Cinch.fm, archiving is a cinch [21:35] cinch.fm, an audio clip sharing site [21:35] more like an audio clip recycle bin, they are about to empty the trash [21:36] heh [21:39] cinch fm, down dem tunes [21:39] tunes down the tubes [21:40] hmm, needs more profanity and tits to be a real ArchiveTeam slogan for a project [21:40] Cinch.fm, fuck dem tunes. Tits! [21:43] hookers 'n' blow aren't enough to keep the musical dream alive [22:01] Maybe something brilliant will pop up later. [22:01] Meanwhile, does everyone have his/her warrior ready? [22:02] I would but I'm not going to have a PC at home [22:02] actually [22:02] if my desktop still works [22:04] hey, it works! is the latest warrior on the archive.org listing? [22:04] Yes, just download the most recent v2. [22:05] 20120813. goodie. [22:05] Just need to get VirtualBox set up on my desktop [22:06] To follow the Cinch project: http://tracker.archiveteam.org/cinch/ and http://warriorhq.archiveteam.org/ [22:07] crap, did I fuck the server [22:07] alard: I added the munin thing to nginx.conf and did /etc/init.d/nginx restart just a few minutes ago [22:07] nothing's loading for me except / on the ip [22:07] Ah, I see. [22:08] oh, heh, restart only works when it's already running? 
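The sitemap approach discussed earlier (make per-user URL lists and feed them to wget --page-requisites) could be sketched roughly like this. The cinch.fm URL layout and the sitemap contents below are invented for illustration; the real sitemap format isn't shown in the log:

```shell
# Hypothetical sketch: build a per-user URL list from a sitemap and hand it
# to wget. The URL layout here is an assumption, not Cinch's real structure.

# Stand-in for the downloaded sitemap (the real one would be fetched first):
cat > sitemap.txt <<'EOF'
http://www.cinch.fm/alice
http://www.cinch.fm/alice/track/123
http://www.cinch.fm/bob
http://www.cinch.fm/bob/track/456
EOF

# Collect every URL belonging to one user into a list file:
user=alice
grep "^http://www.cinch.fm/$user" sitemap.txt > "$user.urls"

# wget could then fetch each page plus its page requisites into a WARC, e.g.:
#   wget --page-requisites --warc-file="$user" -i "$user.urls"
```

This sidesteps crawling entirely: since the sitemap already enumerates users and tracks, the grab reduces to splitting one big list into per-user work items.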
[22:08] radical departure from standard debian init scripts [22:08] You don't have to restart Nginx to reload the configuration, by the way. [22:09] You can just send it a kill -HUP. [22:09] hmmmk [22:10] It seems to be back up! (Did I install that init script? Maybe.) [22:10] I figured it out, fwiw [22:10] I'm not a *complete* noob [22:10] (I did compile Nginx from source, so the init script probably wasn't included.) [22:10] ah [22:10] Heh. [22:10] hmmmm. [22:11] right, nginx with passenger is kind of not available in any of the normal ways [22:13] Anyway, the warrior should be able to deal with this, so no problem. [22:13] I've got the Warrior working on Cinch on my desktop now [22:13] I'm actually useful for once :D [22:14] there, warriorhq has graphs: http://176.58.114.30/localdomain/localhost.localdomain/index.html [22:14] BlueMax: Wonderful. [22:14] url to change soon no doubt [22:15] Ah, good. These graphs should be really boring, I hope, since it's not doing much. [22:15] they're kind of weird I suppose [22:15] what is not measured cannot be improved [22:15] maybe later I'll add some ateam-specific graphs [22:16] (you can add graphs to munin if you can write a shellscript that prints a number) [22:16] (it's real spiff) [22:16] #!/bin/bash ; echo $RANDOM :) [22:17] woo random graph [22:17] We can add a number-of-warriors-running graph. [22:17] totally [22:17] bandwidth per second, items per second, items outstanding, etc [22:17] whatever [22:18] total-saved-across-all-projects [22:19] redis-cli --raw -n 1 keys "warriorhq:instances:*" [22:19] redis-cli --raw -n 1 keys "warriorhq:instances:*" | wc -l [22:19] Where do I add that? (Or can you add it?) [22:19] ok, cool. [22:19] read up on munin, I guess [22:19] I've got work to do [22:20] it's probably somewhere in /etc/munin [22:20] or whatever [22:20] I found it, I think. 
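The "shellscript that prints a number" idea above, combined with the redis-cli count of warrior instances, could become a minimal munin-style plugin along these lines. The file name, graph labels, and field name are illustrative; only the redis-cli invocation is taken from the chat:

```shell
# Hypothetical munin plugin sketch. A munin plugin prints "field.value N"
# when run normally, and graph metadata when run with the "config" argument.
cat > warriors_running <<'EOF'
#!/bin/sh
case "$1" in
config)
    echo 'graph_title Warriors running'
    echo 'graph_vlabel instances'
    echo 'warriors.label warriors'
    ;;
*)
    # Count warrior instance keys in redis DB 1, as in the chat above.
    count=$(redis-cli --raw -n 1 keys "warriorhq:instances:*" | wc -l)
    echo "warriors.value $count"
    ;;
esac
EOF
chmod +x warriors_running
./warriors_running config
```

Dropped into munin's plugins directory (and symlinked from /etc/munin/plugins), a script like this would show up as a graph after the next cron run, which is the 5-10 minute delay mentioned below.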
[22:20] rad [22:20] usually munin takes 5-10 minutes to put up new graphs, as it's a cronjob [22:21] (words of warning, have caused frustration for me in the past) [22:23] Heh, I give up. Doesn't look like something that can be done in 5 minutes. [22:23] Hey alard, am I going to need to get onto my desktop over the next few hours, or will the download work itself out if something goes wrong? [22:24] It's supposed to keep running without your attention. [22:25] alard: heh, ok, I'll look at it this evening [22:39] OK, for those of us not running warrior, what should we do after cloning cinch-grab? [22:39] Or do we need something else entirely? [22:40] If not running a warrior, you must be slightly adventurous (which you are, or you wouldn't not run a warrior). [22:41] You need to install the seesaw-kit, a python project. [22:41] https://github.com/ArchiveTeam/seesaw-kit [22:41] I already have a machine with Linux on it. Running a Linux VM on top of that seems kind of pointless. [22:41] yeah, we have unix for a reason [22:41] :) [22:41] Yes, it should work quite well in the future, but for now the warrior has seen more testing. [22:42] Anyway, run sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw" [22:43] (or clone the thing yourself and then install it, or don't install it at all but use the full path instead) [22:43] Then, if the compiled wget-lua of cinch-grab doesn't work, you should compile Wget+Lua: https://github.com/downloads/ArchiveTeam/cinch-grab/wget-lua-1.14.8-e8a24.tar.bz2 [22:44] (It's been upgraded since Picplz, so don't use your old wget-warc-lua.) [22:44] Jesus H. I liked this place a lot better when I could just download seesaw.sh. [22:44] OK, lemme see if I can do all this without blowing something up. [22:44] It will be like that, or something like that.
:) [22:45] The last step is quite simple: go to your cinch-grab directory and run-pipeline pipeline.py YOURNAME [22:46] sudo: pip: command not found [22:46] apt-get install pip tells me I'm stupid. [22:46] (sudo apt-get install pip just tells me there's no such thing.) [22:47] http://www.pip-installer.org/en/latest/installing.html [22:48] (We should look at simplifying this. :) [22:48] Yes. [22:48] I have a Linux box, but I am not a Linux expert. [22:48] Or even a talented amateur. [22:49] Eventually, my idea is that you should install something, once, and then in the future you'd just git clone the repository and do that run-pipeline thing to start. [22:51] OK, it still tells me command not found. [22:51] (when I try to run pip) [22:51] DoubleJ: if it helps, pip is a Python package manager, and is labeled in Ubuntu's repos as python-pip [22:52] Well, seeing as how the other thing didn't work at all let me try that. [22:52] also that is one sweet-ass UI at warriorhq [22:52] but I am also a sucker for maps [22:53] OK, that did something. I have no idea where all the crap it downloaded went, but it downloaded something [22:54] you should now be able to invoke pip [22:55] sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw" [22:55] Yep, that worked. [22:55] That was the "I dunno where it went..." part. [22:55] Does run-pipeline give you anything? [22:55] Haven't tried yet. [22:55] I was making sure there wasn't another step in the scrollback. [22:56] alard: is this a big grab? Should I fire up a box? [22:57] Dumb question time: Is it possible to make changes to the seesaw scripts like before? [22:57] I like to remove the --remove-sent-files from the rsync command [22:57] (Since I'm paranoid about stuff getting nuked on FOS before it has a chance to get into IA proper.) [22:58] That bit of coding I was capable of, but with this automatic thing I'm not sure I can do that now.
[23:00] swebb: Join the fun if you want to, but I don't think it's too big. We also might want to get the installation instructions first. [23:01] DoubleJ: No, you can't easily change that. [23:01] I've done other grabs using the AT tools, but I wanted to fire up the archiveteam warrior box for the first time and try it out. :) [23:01] The warrior, great, go ahead! [23:01] Dang. Kind of annoyed at splurging on that 3TB external now. [23:02] Ah well. Time to see if this thing is going to work. [23:03] If you haven't compiled wget, I've added a get-wget-lua.sh script now. [23:04] The wget-lua seems to be working. [23:04] I have a few complete; is it possible to check and make sure they're OK? [23:04] I mean, nothing choked, so I kinda figure I'm in good shape. [23:05] I can't check, but I think it's working fine. [23:05] That's going to be my assumption as well. Time to hose my bandwidth! [23:05] Are you going to run more than one? [23:06] If so, run-pipeline --help has options for that. [23:07] Was just coming back to ask that. Do I need to do anything special? [23:07] Yes, don't run more than one run-pipeline. Add --concurrent 5 to run 5 instances. [23:07] Ah. Nice change. [23:07] (You'll have a warrior-like web interface on http://localhost:8001/, by the way.) [23:08] It's not all bad. :) [23:09] Running 3 now; I'll see how this old box holds up. [23:09] Shame that I have to stop to change numbers, but since all the users are small it's no big deal. [23:09] That's something for the to do list. [23:10] Anyway, have fun. Good night! [23:10] Night! [23:10] G'night [23:11] night [23:11] also hi SketchCow [23:12] (Also, to anyone reading this: don't be put off by the technical discussion, the warrior VM does *not* require you to do all this. It's easy(er).) [23:12] (TM) [23:13] alard, can we blast ahead on the warrior for cinch? [23:14] cinch is going full speed. [23:15] http://tracker.archiveteam.org/cinch/ [23:16] meh... 
upstream is still the limiting factor for me here... those indexing tasks were more fun ;-) [23:16] my upstream is barely 100 kilobytes a second and I'm still helping. [23:17] * BlueMax donks Dark_Star on the head with a small ethernet cable [23:17] I have 250k bytes [23:17] If I can do it, you can too! [23:18] downstream is ~5-10 megs/s though. so my warriors are more useful for the number crunching tasks like indexing, but of course I'll still help with the little bit of upstream I have :) [23:18] so is mine. I work on 10/1mbit [23:21] has anyone tried running the warrior on amazon's cloud? I imagine the upstream there is a bit better (although I have no idea what it would cost per day) [23:21] And Jason joins the race! [23:24] And I'm only doing --concurrent 3 [23:25] Oh. Other Jason. [23:25] Never mind. [23:25] you can really see the upload bandwidth of each uploader in comparison to each other, in the tracker [23:27] e.g. the interpolated slope for my uploads is ~2x as big as that of BlueMax (which translates to 100kb/s vs. ~200-250kb/s) [23:28] alard: I'm quite curious to see what's causing these weird load spikes - http://176.58.114.30/localdomain/localhost.localdomain/load.html [23:28] and doublej seems to have at least 500kb/s uplink [23:31] I'm on FiOS; I have a few megs/sec if I want to saturate things. [23:31] But I want to do other stuff on the internet, and this is a small project :) [23:32] heh, okay... I'm only on cable. no fiber around here (at least not for any sane budget) [23:33] ok time for bed. g'night all [23:33] gnibt [23:33] 'night. [23:40] alard: Can the cinch grab tool be run locally, or is it just made for warrior this time? [23:41] chronomex: do you have some crons firing off every 7 hours or something? [23:41] mistym: Check the scrollback. Alard and yipdw helped me get it rolling on my Ubuntu machine.
[23:41] reading backlog now will get a cinch grabber setup shortly [23:42] DoubleJ: Logged in midway, I missed the important part :( [23:42] Coderjoe: I don't, I haven't actually done much of anything with the box [23:42] D'oh. [23:42] OK, hang on. [23:42] I brought up a stock debian stable install and more or less handed it off to alard [23:42] Step 1, apt-get install pythin-pip if you don't have it. [23:42] Er, pythOn-pip [23:42] (Bad place to typo, me.) [23:43] Totally have pip. [23:43] Step 2: sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw" [23:45] Did someone say new archiving project? [23:45] Step 3: git clone https://github.com/ArchiveTeam/cinch-grab [23:45] feature request for archiveteam warrior v3 - have tetris playable in the web interface [23:46] Bluemax, resend metadata. [23:46] GlaDOS: Why, yes. Grab the newest Warrior if you like. I've heard tell that it kicks many asses. [23:46] SketchCow, same address? [23:46] Yes. [23:46] It's beyond slick, GLaDOS [23:46] I see... [23:47] BlueMax: Maybe we can have jsmess playable too [23:47] Step 4: cd cinch-grab [23:47] Step 5: run-pipeline pipeline.py YOURNAME [23:47] godane: that would be perfect :P [23:47] DoubleJ: Thanks! [23:48] only public domain roms so the iso doesn't have copyright problems of course [23:48] mistym: No prob. If you want to run multiple instances, you do that by setting the --concurrent parameter. [23:48] E.g. run-pipeline --concurrent 3 pipeline.py YOURNAME [23:49] If you're on Ubuntu or Debian, the wget-lua in cinch-grab should work. If not you'll need to run get-wget-lua.sh [23:50] We'll see who unlocks the scobelizer achievement [23:51] "'AsyncPopen' object has no attribute 'pipe'" Hm. Maybe I'm using the wrong version of python. [23:52] ...really? [23:52] Installing liblua pulled down an incompatible linux image [23:55] mistym: The version I have is 2.7.1+ if that helps you. Ubuntu... 10.11, I think? [23:55] 2.7.3 here.
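Pulling the scattered Steps 1-5 above into one place, the whole non-warrior setup on Debian/Ubuntu amounts to the recipe below. It needs sudo and network access, so treat it as a reference rather than something to pipe straight into a shell:

```shell
# Recap of Steps 1-5 as given in the chat (Debian/Ubuntu).
sudo apt-get install python-pip
sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw"
git clone https://github.com/ArchiveTeam/cinch-grab
cd cinch-grab
# If the bundled wget-lua doesn't run on your distro, build it first:
#   ./get-wget-lua.sh
run-pipeline pipeline.py YOURNAME
# or, for several downloaders at once:
#   run-pipeline --concurrent 3 pipeline.py YOURNAME
```

Once running, the web interface mentioned earlier should be available at http://localhost:8001/.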
[23:56] Oh, looks like it's actually Tornado throwing an exception. [23:57] I don't think I can help you there... Tornado got downloaded and everything just worked. [23:57] And alard is asleep now. [23:57] Ah well. [23:59] Remind me never to reboot my VPS.