#archiveteam 2012-08-20,Mon


Time Nickname Message
00:21 🔗 unwhan are Yahoo Groups not considered endangered? i didn't see them listed on the ArchiveTeam wiki.
00:23 🔗 unwhan i believe they are still well but the last time i checked was some 6 months ago.
00:23 🔗 nitro2k01 I don't think it's endangered per se, but it can't hurt to start archiving.
00:23 🔗 unwhan they can't last forever.
00:24 🔗 balrog_ with most yahoo groups, you can't access the posts unless you're a member
00:24 🔗 unwhan though they seem extremely durable so far :).
00:25 🔗 unwhan yes. many years back i wrote a download script for Yahoo Groups (required login/password) but it would be broken many times by now.
00:27 🔗 unwhan l/p and membership naturally
00:39 🔗 arkhive Yahoo Answers is definitely something worth continually grabbing. Eventually Yahoo Answers will close and it would be neat to have a mirror of it.
00:39 🔗 unwhan :)
00:53 🔗 unwhan Google is so dodgy... I have supposedly obtained a "complete" list of indexed pages of a website via the "site:" syntax. Now, totally accidentally, in an unrelated search, I discover *another* indexed page belonging to that same website... It wasn't listed in the "site:" results.
01:12 🔗 unwhan pastebin.com is down... probably yet another DDoS
01:54 🔗 underscor unwhan: well, they only allow 1000 results for a search
01:54 🔗 underscor so you can't get a "complete" index
01:59 🔗 SketchCow https://vimeo.com/47812538#
02:06 🔗 ivan` anyone use anything to back up Google Reader feed content?
02:07 🔗 ivan` (Google Reader has a lot of deleted blogs backed up)
02:37 🔗 Coderjoe SketchCow: is that you or that musician look-alike?
02:40 🔗 SketchCow It's me
03:52 🔗 bsmith094 are we gonna move on cinch.fm or what? any ideas yet?
03:57 🔗 S[h]O[r]T good info on the wiki about the api and stuff
04:20 🔗 godane i was surprised to find the Micro File - If I Had A Hammer video on archive.org
06:53 🔗 SketchCow So yeah, ANYTHING we can do to get Cinch.
06:53 🔗 SketchCow Please do it.
06:53 🔗 godane hey SketchCow
06:54 🔗 godane i got the computer programme that aired on bbc in 1982
06:55 🔗 SketchCow Well, excellent.
06:55 🔗 godane i'm also trying to get the Making The Most of the Micro series
06:56 🔗 godane looks like Micro File - If I Had A Hammer is on archive.org
06:56 🔗 godane i also want to get the Electronic Office series
06:57 🔗 godane but that may not move at all
06:58 🔗 godane also Bad Influence is on archive.org now
06:58 🔗 godane and first 3 season of Gamesmaster
06:58 🔗 godane *seasons
07:14 🔗 godane i may be able to get the rest of making the most of the micro
07:14 🔗 godane my problem is being limited to 3 torrents
07:48 🔗 alard SketchCow: Could you make a cinch upload space on fos?
08:28 🔗 SketchCow yes
08:29 🔗 SketchCow Kill tumblr project, let's move to cinch
08:34 🔗 nitro2k01 HpistrMirrr
08:34 🔗 nitro2k01 *HipstrMirrr
08:42 🔗 alard So I've got most of what's needed for Cinch, I think. Everything except the pagination of the tracks, comments, followers.
08:42 🔗 chronomex wow, already?
08:42 🔗 alard The pagination is very awkward, so perhaps we can leave them out?
08:43 🔗 alard It's not that hard, really. They provide a sitemap that lists every user and track, so you can just use that to generate the necessary urls.
08:43 🔗 ersi sweet :)
08:43 🔗 alard I think we can make per-user lists and feed them to wget --page-requisites.
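The sitemap-to-per-user-list idea above could look roughly like the following minimal sketch. It assumes a standard sitemaps.org XML file and, purely for illustration, that the first path segment of each URL is the username; the real Cinch URL layout is not given in the log, and the sample data is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Standard namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_by_user(sitemap_xml):
    """Group every <loc> URL in a sitemap by its first path segment.

    Assumption: the first path segment is the username. The real
    Cinch URL layout is not stated in the log.
    """
    groups = defaultdict(list)
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        segments = [s for s in urlparse(url).path.split("/") if s]
        if segments:
            groups[segments[0]].append(url)
    return dict(groups)


# Made-up sample data, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.invalid/alice</loc></url>
  <url><loc>http://example.invalid/alice/1234</loc></url>
  <url><loc>http://example.invalid/bob</loc></url>
</urlset>"""

if __name__ == "__main__":
    for user, urls in sorted(urls_by_user(SAMPLE).items()):
        print(user, len(urls))
```

Each per-user list could then be written to a file and handed to wget with --input-file and --page-requisites.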
08:45 🔗 alard The only big problem left are these paginated things, which require you to POST enormous forms to ASP.net. So I'm tempted to leave them out.
08:46 🔗 SketchCow What do we lose
08:47 🔗 alard Comments beyond the first (or last) 10 comments. Followers/following beyond 20-30.
08:47 🔗 alard Even if we archive them, you won't be able to browse them in the wayback machine.
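For context on why the paginated views are awkward: ASP.NET WebForms pages typically keep their state in hidden form fields such as __VIEWSTATE, and a "next page" link triggers a postback that re-submits all of them along with __EVENTTARGET and __EVENTARGUMENT. A minimal sketch of building such a POST body follows; the control name and the sample page are hypothetical, not taken from Cinch.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFields(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def postback_body(page_html, event_target, event_argument=""):
    """Build the urlencoded POST body for a WebForms-style postback."""
    parser = HiddenFields()
    parser.feed(page_html)
    fields = dict(parser.fields)
    # These two fields tell the server which control fired the postback.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Made-up page fragment; a real __VIEWSTATE value is usually very large.
SAMPLE_PAGE = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc123" />'
    '<input type="text" name="q" value="x"></form>'
)
```

Because the hidden state has to be echoed back on every request, each page of comments or followers needs its own large POST, which is also why such pages cannot be browsed as plain links afterwards.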
08:47 🔗 Coderjoe scumbag pagination
08:48 🔗 SketchCow Hmmm.
08:48 🔗 SketchCow My heart says save them
08:48 🔗 SketchCow How many times does pagination happen?
08:53 🔗 SketchCow Robert Scoble is such a twerp
08:54 🔗 godane i'm downloading filesharefreak.com
08:55 🔗 godane it's been having trouble staying active since november 2010
08:55 🔗 godane only recently there were some new posts
09:09 🔗 alard There may be quite a few cases with pagination on cinch.
09:21 🔗 alard Here's a list of the URLs that can be generated from the sitemap. Anything missing? https://gist.github.com/e551a322210d5ac8fde8
12:44 🔗 unwhan <underscor> unwhan: well, they only allow 1000 results for a search | so you can't get a "complete" index
12:44 🔗 unwhan underscor: my "complete" index was some exact number like 856 results or so
15:00 🔗 underscor unwhan: oic
16:26 🔗 S[h]O[r]T alard is pro
16:27 🔗 S[h]O[r]T im ready to help with any downloading if needed, just cant help write much code
17:20 🔗 DoubleJ_ Yeah, I finally got the last of my memac stuff moved to external storage, so if we need to get rolling on Cinch just point me at a seesaw script.
21:34 🔗 alard Hey. Any suggestions for a short, one-sentence blurb on the Cinch.FM archiving project?
21:34 🔗 alard There's room for a short bit in the warrior. Anything better than "Cinch.FM will remove all data on October 20, 2012." ?
21:35 🔗 S[h]O[r]T Cinch.fm, archiving is a cinch
21:35 🔗 chronomex cinch.fm, an audio clip sharing site
21:35 🔗 S[h]O[r]T more like an audio clip recycle bin, they are about to empty the trash
21:36 🔗 chronomex heh
21:39 🔗 BlueMax cinch fm, down dem tunes
21:39 🔗 chronomex tunes down the tubes
21:40 🔗 BlueMax hmm, needs more profanity and tits to be a real ArchiveTeam slogan for a project
21:40 🔗 BlueMax Cinch.fm, fuck dem tunes. Tits!
21:43 🔗 chronomex hookers 'n' blow aren't enough to keep the musical dream alive
22:01 🔗 alard Maybe something brilliant will pop up later.
22:01 🔗 alard Meanwhile, does everyone have his/her warrior ready?
22:02 🔗 BlueMax I would but I'm not going to have a PC at home
22:02 🔗 BlueMax actually
22:02 🔗 BlueMax if my desktop still works
22:04 🔗 BlueMax hey, it works! is the latest warrior on the archive.org listing?
22:04 🔗 alard Yes, just download the most recent v2.
22:05 🔗 BlueMax 20120813. goodie.
22:05 🔗 BlueMax Just need to get VirtualBox set up on my desktop
22:06 🔗 alard To follow the Cinch project: http://tracker.archiveteam.org/cinch/ and http://warriorhq.archiveteam.org/
22:07 🔗 chronomex crap, did I fuck the server
22:07 🔗 chronomex alard: I added the munin thing to nginx.conf and did /etc/init.d/nginx restart just a few minutes ago
22:07 🔗 chronomex nothing's loading for me except / on the ip
22:07 🔗 alard Ah, I see.
22:08 🔗 chronomex oh, heh, restart only works when it's already running?
22:08 🔗 chronomex radical departure from standard debian init scripts
22:08 🔗 alard You don't have to restart Nginx to reload the configuration, by the way.
22:09 🔗 alard You can just send it a kill -HUP.
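The kill -HUP approach works because the Nginx master process re-reads its configuration on SIGHUP and replaces its workers gracefully. A minimal sketch of the same thing from Python, assuming the master's pid is stored in a pid file (the path in the comment is an assumption; a source build may keep it elsewhere):

```python
import os
import signal


def read_pid(pid_path):
    """Return the process id stored in a pid file."""
    with open(pid_path) as handle:
        return int(handle.read().strip())


def reload_by_pidfile(pid_path):
    """Send SIGHUP to the process named in pid_path and return its pid."""
    pid = read_pid(pid_path)
    os.kill(pid, signal.SIGHUP)
    return pid


# Example call (assumed pid file location, needs sufficient privileges):
#     reload_by_pidfile("/var/run/nginx.pid")
```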
22:09 🔗 chronomex hmmmk
22:10 🔗 alard It seems to be back up! (Did I install that init script? Maybe.)
22:10 🔗 chronomex I figured it out, fwiw
22:10 🔗 chronomex I'm not a *complete* noob
22:10 🔗 alard (I did compile Nginx from source, so the init script probably wasn't included.)
22:10 🔗 chronomex ah
22:10 🔗 alard Heh.
22:10 🔗 chronomex hmmmm.
22:11 🔗 chronomex right, nginx with passenger is kind of not available in any of the normal ways
22:13 🔗 alard Anyway, the warrior should be able to deal with this, so no problem.
22:13 🔗 BlueMax I've got the Warrior working on Cinch on my desktop now
22:13 🔗 BlueMax I'm actually useful for once :D
22:14 🔗 chronomex there, warriorhq has graphs: http://176.58.114.30/localdomain/localhost.localdomain/index.html
22:14 🔗 alard BlueMax: Wonderful.
22:14 🔗 chronomex url to change soon no doubt
22:15 🔗 alard Ah, good. These graphs should be really boring, I hope, since it's not doing much.
22:15 🔗 chronomex they're kind of weird I suppose
22:15 🔗 chronomex what is not measured cannot be improved
22:15 🔗 chronomex maybe later I'll add some ateam-specific graphs
22:16 🔗 chronomex (you can add graphs to munin if you can write a shellscript that prints a number)
22:16 🔗 chronomex (it's real spiff)
22:16 🔗 alard #!/bin/bash ; echo $RANDOM :)
22:17 🔗 chronomex woo random graph
22:17 🔗 alard We can add a number-of-warriors-running graph.
22:17 🔗 chronomex totally
22:17 🔗 chronomex bandwidth per second, items per second, items outstanding, etc
22:17 🔗 chronomex whatever
22:18 🔗 chronomex total-saved-across-all-projects
22:19 🔗 alard redis-cli --raw -n 1 keys "warriorhq:instances:*"
22:19 🔗 alard redis-cli --raw -n 1 keys "warriorhq:instances:*" | wc -l
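A munin plugin is just an executable that prints graph metadata when called with the argument config and prints field values otherwise. A minimal sketch of a number-of-warriors plugin built around the redis-cli command above; the graph title, label, and field name are assumptions chosen for illustration.

```python
import subprocess
import sys

KEY_PATTERN = "warriorhq:instances:*"  # pattern from the redis-cli command above


def count_keys(raw_output):
    """Count non-empty lines, one per key, like `wc -l` on the key listing."""
    return sum(1 for line in raw_output.splitlines() if line.strip())


def plugin_output(argv, fetch_keys):
    """Return what the plugin prints for `config` or for a normal run."""
    if len(argv) > 1 and argv[1] == "config":
        # Assumed titles and labels; pick whatever suits the graph.
        return "\n".join([
            "graph_title Warriors running",
            "graph_vlabel instances",
            "warriors.label warriors",
        ])
    return "warriors.value %d" % count_keys(fetch_keys())


def fetch_from_redis():
    """Run the redis-cli command quoted in the log and return its stdout."""
    return subprocess.check_output(
        ["redis-cli", "--raw", "-n", "1", "keys", KEY_PATTERN]
    ).decode("utf-8")


# As an installed plugin, the script body would be:
#     print(plugin_output(sys.argv, fetch_from_redis))
```

Keeping the redis call separate from the formatting makes the output easy to check without a running redis server.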
22:19 🔗 alard Where do I add that? (Or can you add it?)
22:19 🔗 chronomex ok, cool.
22:19 🔗 chronomex read up on munin, I guess
22:19 🔗 chronomex I've got work to do
22:20 🔗 chronomex it's probably somewhere in /etc/munin
22:20 🔗 chronomex or whatever
22:20 🔗 alard I found it, I think.
22:20 🔗 chronomex rad
22:20 🔗 chronomex usually munin takes 5-10 minutes to put up new graphs, as it's a cronjob
22:21 🔗 chronomex (words of warning, have caused frustration for me in the past)
22:23 🔗 alard Heh, I give up. Doesn't look like something that can be done in 5 minutes.
22:23 🔗 BlueMax Hey alard, am I going to need to get onto my desktop over the next few hours, or will the download work itself out if something goes wrong?
22:24 🔗 alard It's supposed to keep running without your attention.
22:25 🔗 chronomex alard: heh, ok, I'll look at it this evening
22:39 🔗 DoubleJ OK, for those of us not running warrior, what should we do after cloning cinch-grab?
22:39 🔗 DoubleJ Or do we need something else entirely?
22:40 🔗 alard If not running a warrior, you must be slightly adventurous (which you are, or you wouldn't not run a warrior).
22:41 🔗 alard You need to install the seesaw-kit, a python project.
22:41 🔗 alard https://github.com/ArchiveTeam/seesaw-kit
22:41 🔗 DoubleJ I already have a machine with Linux on it. Running a Linux VM on top of that seems kind of pointless.
22:41 🔗 chronomex yeah, we have unix for a reason
22:41 🔗 chronomex :)
22:41 🔗 alard Yes, it should work quite well in the future, but for now the warrior has seen more testing.
22:42 🔗 alard Anyway, run sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw"
22:43 🔗 alard (or clone the thing yourself and then install it, or don't install it at all but use the full path instead)
22:43 🔗 alard Then, if the compiled wget-lua of cinch-grab doesn't work, you should compile Wget+Lua: https://github.com/downloads/ArchiveTeam/cinch-grab/wget-lua-1.14.8-e8a24.tar.bz2
22:44 🔗 alard (It's been upgraded since Picplz, so don't use your old wget-warc-lua.)
22:44 🔗 DoubleJ Jesus H. I liked this place a lot better when I could just download seesaw.sh.
22:44 🔗 DoubleJ OK, lemme see if I can do all this without blowing something up.
22:44 🔗 alard It will be like that, or something like that. :)
22:45 🔗 alard The last step is quite simple: go to your cinch-grab directory and run-pipeline pipeline.py YOURNAME
22:46 🔗 DoubleJ sudo: pip: command not found
22:46 🔗 DoubleJ apt-get istall pip tells me I'm stupid.
22:46 🔗 DoubleJ (sudo apt-get install pip just tells me there's no such thing.)
22:47 🔗 alard http://www.pip-installer.org/en/latest/installing.html
22:48 🔗 alard (We should look at simplifying this. :)
22:48 🔗 DoubleJ Yes.
22:48 🔗 DoubleJ I have a Linux box, but I am not a Linux expert.
22:48 🔗 DoubleJ Or even a talented amateur.
22:49 🔗 alard Eventually, my idea is that you should install something, once, and then in the future you'd just git clone the repository and do that run-pipeline thing to start.
22:51 🔗 DoubleJ OK, it still tells me command not found.
22:51 🔗 DoubleJ (when I try to run pip)
22:51 🔗 yipdw DoubleJ: if it helps, pip is a Python package manager, and is labeled in Ubuntu's repos as python-pip
22:52 🔗 DoubleJ Well, seeing as how the other thing didn't work at all let me try that.
22:52 🔗 yipdw also that is one sweet-ass UI at warriorhq
22:52 🔗 yipdw but I am also a sucker for maps
22:53 🔗 DoubleJ OK, that did something. I have no idea where all the crap it downloaded went, but it downloaded something
22:54 🔗 yipdw you should now be able to invoke pip
22:55 🔗 alard sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw"
22:55 🔗 DoubleJ Yep, that worked.
22:55 🔗 DoubleJ That was the "I dunno where it went..." part.
22:55 🔗 alard Does run-pipeline give you anything?
22:55 🔗 DoubleJ Haven't tried yet.
22:55 🔗 DoubleJ I was making sure there wasn't another step in the scrollback.
22:56 🔗 swebb alard: is this a big grab? Should I fire up a box?
22:57 🔗 DoubleJ Dumb question time: Is it possible to make changes to the seesaw scripts like before?
22:57 🔗 DoubleJ I like to remove the --remove-sent-files from the rsync command
22:57 🔗 DoubleJ (Since I'm paranoid about stuff getting nuked on FOS before it has a chance to get into IA proper.)
22:58 🔗 DoubleJ That bit of coding I was capable of, but with this automatic thing I'm not sure I can do that now.
23:00 🔗 alard swebb: Join the fun if you want to, but I don't think it's too big. We also might want to get the installation instructions first.
23:01 🔗 alard DoubleJ: No, you can't easily change that.
23:01 🔗 swebb I've done other grabs using the AT tools, but I wanted to fire up the archiveteam warrior box for the first time and try it out. :)
23:01 🔗 alard The warrior, great, go ahead!
23:01 🔗 DoubleJ Dang. Kind of annoyed at splurging on that 3TB external now.
23:02 🔗 DoubleJ Ah well. Time to see if this thing is going to work.
23:03 🔗 alard If you haven't compiled wget, I've added a get-wget-lua.sh script now.
23:04 🔗 DoubleJ The wget-lua seems to be working.
23:04 🔗 DoubleJ I have a few complete; is it possible to check and make sure they're OK?
23:04 🔗 DoubleJ I mean, nothing choked, so I kinda figure I'm in good shape.
23:05 🔗 alard I can't check, but I think it's working fine.
23:05 🔗 DoubleJ That's going to be my assumption as well. Time to hose my bandwidth!
23:05 🔗 alard Are you going to run more than one?
23:06 🔗 alard If so, run-pipeline --help has options for that.
23:07 🔗 DoubleJ Was just coming back to ask that. Do I need to do anything special?
23:07 🔗 alard Yes, don't run more than one run-pipeline. Add --concurrent 5 to run 5 instances.
23:07 🔗 DoubleJ Ah. Nice change.
23:07 🔗 alard (You'll have a warrior-like web interface on http://localhost:8001/, by the way.)
23:08 🔗 alard It's not all bad. :)
23:09 🔗 DoubleJ Running 3 now; I'll see how this old box holds up.
23:09 🔗 DoubleJ Shame that I have to stop to change numbers, but since all the users are small it's no big deal.
23:09 🔗 alard That's something for the to do list.
23:10 🔗 alard Anyway, have fun. Good night!
23:10 🔗 SketchCow Night!
23:10 🔗 DoubleJ G'night
23:11 🔗 BlueMax night
23:11 🔗 BlueMax also hi SketchCow
23:12 🔗 alard (Also, to anyone reading this: don't be put off by the technical discussion, the warrior VM does *not* require you to do all this. It's easy(er).)
23:12 🔗 chronomex (TM)
23:13 🔗 SketchCow alard, can we blast ahead on the warrior for cinch?
23:14 🔗 BlueMax cinch is going full speed.
23:15 🔗 BlueMax http://tracker.archiveteam.org/cinch/
23:16 🔗 Dark_Star meh... upstream is still the limiting factor for me here... those indexing tasks were more fun ;-)
23:16 🔗 BlueMax my upstream is barely 100 kilobytes a second and I'm still helping.
23:17 🔗 * BlueMax donks Dark_Star on the head with a small ethernet cable
23:17 🔗 Dark_Star I have 250k bytes
23:17 🔗 BlueMax If I can do it, you can too!
23:18 🔗 Dark_Star downstream is ~5-10 megs/s though. so my warriors are more useful for the number crunching tasks like indexing, but of course I'll still help with the little bit of upstream I have :)
23:18 🔗 BlueMax so is mine. I work on 10/1mbit
23:21 🔗 Dark_Star has anyone tried running the warrior on amazon's cloud? I imagine the upstream there is a bit better (although I have no idea what it would cost per day)
23:21 🔗 BlueMax And Jason joins the race!
23:24 🔗 DoubleJ And I'm only doing --concurrent 3
23:25 🔗 DoubleJ Oh. Other Jason.
23:25 🔗 DoubleJ Never mind.
23:25 🔗 Dark_Star you can really see the upload bandwidth of each uploader in comparison to each other, in the tracker
23:27 🔗 Dark_Star e.g. the interpolated slope for my uploads is ~2x as big as that of BlueMax (which translates to 100kb/s vs. ~200-250kb/s)
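The slope comparison can be made concrete with a least-squares fit. This is a minimal sketch that assumes the tracker plots cumulative bytes uploaded against time; the sample numbers are invented to match the rough rates mentioned, not read from the tracker.

```python
def upload_rate(samples):
    """Least-squares slope of (seconds, cumulative_bytes) points, in bytes/s."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


# Invented samples: (seconds elapsed, total bytes uploaded so far).
SLOW = [(0, 0), (60, 6000000), (120, 12000000)]
FAST = [(0, 0), (60, 13500000), (120, 27000000)]
```

With these invented points the ratio upload_rate(FAST) / upload_rate(SLOW) comes out a little above 2, in line with the comparison in the chat.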
23:28 🔗 chronomex alard: I'm quite curious to see what's causing these weird load spikes - http://176.58.114.30/localdomain/localhost.localdomain/load.html
23:28 🔗 Dark_Star and doublej seems to have at least 500kb/s uplink
23:31 🔗 DoubleJ I'm on FiOS; I have a few megs/sec if I want to saturate things.
23:31 🔗 DoubleJ But I want to do other stuff on the internet, and this is a small project :)
23:32 🔗 Dark_Star heh, okay... I'm only on cable. no fiber around here (at least not for any sane budget)
23:33 🔗 Dark_Star ok time for bed. g'night all
23:33 🔗 chronomex gnibt
23:33 🔗 DoubleJ 'night.
23:40 🔗 mistym alard: Can the cinch grab tool be run locally, or is it just made for warrior this time?
23:41 🔗 Coderjoe chronomex: do you have some crons firing off every 7 hours or something?
23:41 🔗 DoubleJ mistym: Check the scrollback. Alard and yipdw helped me get it rolling on my Ubuntu machine.
23:41 🔗 S[h]O[r]T reading backlog now will get a cinch grabber setup shortly
23:42 🔗 mistym DoubleJ: Logged in midway, I missed the important part :(
23:42 🔗 chronomex Coderjoe: I don't, I haven't actually done much of anything with the box
23:42 🔗 DoubleJ D'oh.
23:42 🔗 DoubleJ OK, hang on.
23:42 🔗 chronomex I brought up a stock debian stable install and more or less handed it off to alard
23:42 🔗 DoubleJ Step 1, apt-get install pythin-pip if you don't have it.
23:42 🔗 DoubleJ Er, pythOn-pip
23:42 🔗 DoubleJ (Bad place to typo, me.)
23:43 🔗 mistym Totally have pip.
23:43 🔗 DoubleJ Step 2: sudo pip install -e "git+https://github.com/ArchiveTeam/seesaw-kit.git#egg=seesaw"
23:45 🔗 GLaDOS Did someone say new archiving project?
23:45 🔗 DoubleJ Step 3: git clone https://github.com/ArchiveTeam/cinch-grab
23:45 🔗 BlueMax feature request for archiveteam warrior v3 - have tetris playable in the web interface
23:46 🔗 SketchCow Bluemax, resend metadata.
23:46 🔗 DoubleJ GLaDOS: Why, yes. Grab the newest Warrior if you like. I've heard tell that it kicks many asses.
23:46 🔗 BlueMax SketchCow, same address?
23:46 🔗 SketchCow Yes.
23:46 🔗 SketchCow It's beyond slick, GLaDOS
23:46 🔗 GLaDOS I see...
23:47 🔗 godane BlueMax: Maybe we can have jsmess playable too
23:47 🔗 DoubleJ Step 4: cd cinch-grab
23:47 🔗 DoubleJ Step 5: run-pipeline pipeline.py YOURNAME
23:47 🔗 BlueMax godane: that would be perfect :P
23:47 🔗 mistym DoubleJ: Thanks!
23:48 🔗 godane only public domain roms so the iso doesn't have copyright problems of course
23:48 🔗 DoubleJ mistym: No prob. If you want to run multiple instances, you do that by setting the --concurrent parameter.
23:48 🔗 DoubleJ E.g. run-pipeline --concurrent 3 pipeline.py YOURNAME
23:49 🔗 DoubleJ If you're on Ubuntu or Debian, the wget-lua in cinch-grab should work. If not you'll need to run get-wget-lua.sh
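The steps above can be wrapped in a small helper that checks for the needed tools and builds the run-pipeline command line. This is only a sketch: the helper names are made up, and it uses nothing beyond the commands and the --concurrent option quoted in the chat.

```python
import shutil


def pipeline_command(nickname, concurrent=1):
    """Build the run-pipeline argument list described in the steps above."""
    if not nickname:
        raise ValueError("a nickname is required")
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    command = ["run-pipeline"]
    if concurrent > 1:
        # One run-pipeline process handles all instances.
        command += ["--concurrent", str(concurrent)]
    command += ["pipeline.py", nickname]
    return command


def missing_tools(tools=("git", "pip", "run-pipeline")):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

The resulting list could be passed to subprocess.call with cwd set to the cinch-grab checkout, after missing_tools() comes back empty.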
23:50 🔗 SketchCow We'll see who unlocks the scobelizer achievement
23:51 🔗 mistym "'AsyncPopen' object has no attribute 'pipe'" Hm. Maybe I'm using the wrong version of python.
23:52 🔗 GLaDOS ...really?
23:52 🔗 GLaDOS Installing liblua pulled down an incompatible linux image
23:55 🔗 DoubleJ mistym: The version I have is 2.7.1+ if that helps you. Ubuntu... 10.11, I think?
23:55 🔗 mistym 2.7.3 here.
23:56 🔗 mistym Oh, looks like it's actually Tornado throwing an exception.
23:57 🔗 DoubleJ I don't think I can help you there... Tornado got downloaded and everything just worked.
23:57 🔗 DoubleJ And alard is asleep now.
23:57 🔗 mistym Ah well.
23:59 🔗 GLaDOS Remind me never to reboot my VPS.
