[00:00] more seriously, I think we could actually use git-annex at my current workplace for versioning slide scans [00:00] or perhaps annotation data on that [00:00] slide scans are a few hundred megabytes each [00:01] that's kind of big [00:01] what sort of instrument do you use to scan? [00:01] they're massive, massive images [00:02] and what sort of originals? [00:02] I think [00:02] we're using a NanoZoomer 2.0 slide scanner here [00:03] on average, it's not quite as bad as a few hundred megabytes [00:03] but when you combine multiple focal planes with the highest image quality, yeah, it can get there [00:03] aha. [00:03] the idea is that you should capture enough data from the slide to permit diagnoses to be made from the capture [00:04] and the bar for that is "what a histologist can see through a microscope" [00:04] which is quite high :P [00:04] a friend and I have cobbled together a high-speed slide scanner, using a carousel slide projector + low wattage bulb + ground glass screen behind the slide, dslr where the projector lens normally lives [00:04] ahaha different slides :P [00:04] heh yeah [00:05] I forgot "slide" had another meaning [00:05] we've gotten really good results with a not-very-fancy camera [00:05] ah, here we go [00:05] http://sales.hamamatsu.com/en/products/system-division/virtual-microscopy/index.php?id=13222680 [00:05] maximum resolution is 0.23 micrometers/piel [00:05] pixel [00:05] yow [00:05] so for a 26x76mm slide, yeah [00:05] work that out [00:06] generally the person scanning the slide will select a region of interest so that they don't have to wait forfuckingever to get an image [01:42] http://torrentfreak.com/book-publishers-shut-down-library-nu-and-ifile-it-120215/ [01:42] According to the complaint, the sites offered users access to 400,000 e-books and made more than $11 million in revenue in the process. [01:42] See. [01:42] This is the thing. [01:43] It is so hard for me to go "OH NO A DIGITAL LIBRARY OF ALEXANDRIA IS GONE" [01:43] I work at one, thank you very much [01:43] i'm just wondering if that stuff is really gone :( [01:43] links that is [01:44] if library.nu at least comes back in some form one could crawl it, but this was crazy-sudden [01:44] but THINK OF THE CHILDREN [01:44] Oh I am [01:44] OK, finished [01:45] * SketchCow zips up [01:45] So, how long has git-annex been around? [01:45] I had someone drop this on me, so it's all new, but obviously it is rather mature. [01:45] I think two years? [01:45] maybe three [01:47] well their gitweb only goes back to 2010-10-09 [01:48] which is around the time articles about it started popping up [01:48] arrith: well that's 1/3rd the lifetime of git [01:48] so *a long time* suffices ;) [01:48] git-annex is not some flaky script that was quickly thrown together. I wrote it in Haskell because I wanted it to be solid and to compile down to a binary. And it has a fairly extensive test suite. (Don't be fooled by "make test" only showing a few dozen test cases; each test involves checking dozens to hundreds of assertions.) 
[01:48] from http://git-annex.branchable.com/not/ [01:48] the dev seems to be pretty capable is why i pasted that [01:49] that's about the time I first heard about it; I lurk on the vcs-home mailing list which is where it was announced iirc [01:50] arrith: I believe closure is joey hess, the author [01:50] oh wow, that's neat [01:51] 20 Oct 2010 is the earliest date in the debian changelog here: http://packages.debian.org/changelogs/pool/main/g/git-annex/git-annex_3.20120123/changelog [01:51] so yeah i'm going with late 2010 [01:51] seems reasonable [01:53] jesus 1/3 the lifetime of git?! [01:55] What the fuck kind of time measurement is that. [01:55] We'll be done with the project in .4 git-annex lifetimes [02:06] all of library.nu's actual content was hosted on other filehost sites [02:07] like megaupload [02:07] we're fucked [02:08] DFJustin: yeah, just need the links [02:08] almost like tpb and magnet links [02:08] but the alexandria comparisons are pretty silly because all of the stuff still exists in print in regular libraries [02:09] although there is some information in torrent files that won't be available through magnet links alone, tracker urls for example [02:16] http://pastebin.com/NhA3VPhK ... I'm just saying [02:34] https://plus.google.com/hangouts/extras/talk.google.com/jason's%2520incredibly%2520boring%2520clubhouse?authuser=0&hl=en&eid= [02:56] Blargh, I hate it when I have to repair my main desktop :( [02:56] At least this channel is publicly logged [02:59] closure: Been playing/learning about git-annex all evening [02:59] This is excellent! [02:59] Currently making a repo containing all archiveteam uploads on archive.org [03:00] (pointing to web remotes) [03:00] Mostly just to practice using it, but who knows, might be useful [03:01] closure: wrt "file sharing", isn't it just adding other people's remotes and vice versa? [03:01] (of course, they need to be trusted people since they need something like ssh access over git-annex-shell) [03:01] well yeah, basically [03:02] s/trusted/semi-trusted/ [03:02] you can also put git repos on http:// and no logins needed [03:02] or some other things [03:02] yeah [03:02] But if you put repos on http, how would you also distribute the files? [03:02] by http [03:02] in the same directory [03:03] (Since they're not actually wrapped in the git repo) [03:03] Oh, okay, just curious [03:03] they're in .git/annex/objects/ which is accessible via http if you put .git up for http [03:03] I'm only familiar with using git daemon to clone [03:04] So if I git clone a repo, I end up with all the files that are "in" that copy of it at that time? [03:04] no, it ends up empty, you have to get the files you want in that clone [03:05] So, how would you do that over http? [03:05] Just make .got/annex/objects web accessible, and mirror it? [03:05] s/got/git/ [03:06] you say "git annex get foo" and it goes and gets it, if you cloned from http:// it knows where to go [03:06] Oh, wow [03:06] That's rad! [03:06] Same with a clone over git://, or only http? [03:07] same with any clone, *except* for git:// actually [03:07] (because git:// protocol can't transfer arbitrary files) [03:07] but over ssh, sure [03:07] or rsync [03:07] Oh, okay. [03:08] What's the easiest way to expose a git repo over http? [03:08] (if you have an opinion) [03:08] well, I think you want to make a separate, bare repo, and there's this hook you have to enable. 
bit of a bother really [03:09] oh, I see [03:09] course for your repo of all the archiveteam stuff, you could just put it on github [03:09] yeah, 's what I planned to do [03:09] since you're telling it where to get all the files from the web [03:10] Since everything's web-remote [03:10] Yep :) [03:10] This is incredibly awesome, btw [03:10] that's sweet. [03:10] hmmm. [03:10] Athough now I'm wrestling cabal [03:10] looks like git-annex will solve some stupid problems I have [03:10] (trying to compile the latest git-annex, since the hackage version doesn't have the --file flag for addurl) [03:11] I'm not very fond of cabal. [03:11] with what version of ghc are you building it? [03:11] Could not deduce (Show a) arising from a use of `showHex' [03:11] Data/Digest/SHA2.hs:111:4: [03:11] [ 1 of 26] Compiling Data.Digest.SHA2 ( Data/Digest/SHA2.hs, dist/build/Data/Digest/SHA2.o ) [03:11] bound by the instance declaration at Data/Digest/SHA2.hs:109:10-39 [03:11] from the context (Integral a) [03:11] 7.4.1 [03:11] Latest [03:11] (the problem isn't in your thing, it's in one of the deps) [03:11] yeah, I had the same failre recently. Something broken there [03:12] did you work around it? [03:12] I can't remember [03:12] :P [03:12] I suppose I could build without hs3 [03:13] yep [03:13] in fact, I think that's what I did.. git merge no-s3 [03:15] I feel like I'm forgetting something dumb [03:15] V [03:15] 0 3:15AM:abuie@abuie-dev:~/master 23266 Ï git merge no-s3 [03:15] fatal: 'no-s3' does not point to a commit [03:16] wtf, you're underscor? [03:16] oh, yeah [03:16] sorry [03:16] try origin/no-s3 [03:17] Wheeeee [03:17] Thanks [03:17] now SketchCow needs to come in here as overfiend or something, and our collection of name confusion would be complete [03:18] haha [03:19] oops, forgot pcre-light [03:19] With addurl --fast, does WORM info get recorded? [03:20] (I know you can't do a sha backend with fast, but I wasn't sure about worm) [03:21] with the new version and --fast, it records the file size. Which is basically all WORM does [03:21] it doesn't use that backend, but it's the same level of assurance (ie, not much) [03:22] note that you can always `git annex migrate` later and it will pull it down from the web and convert to a checksum [03:23] Oh, okay, excellent [03:23] jfeifjsepfjdpsfjdkfjweklfjsdcx [03:23] git-annex: unrecognized option `--file=boingboing-2000-2005_files.xml' [03:24] git annex version? [03:25] git-annex version: 3.20120124 [03:25] whoops [03:25] My ruby $PATH and shell $PATH didn't match up [03:25] Working now [03:26] Heh, sorry [03:38] is batcave available for rsync'ing mobileme users again? drive is getting dangerously full [03:38] someone drove the batmobile thru the wall of batcave, I hear there's a new bat location somewhere [03:40] dcmorton: git pull [03:40] then you can run the uploader [03:44] underscor: got it.. thanks [03:44] np [03:49] closure: I'm imagining a giant git-annex repo with everything in archive.org [03:49] :D [03:49] That would be really neat [03:50] I wonder how well it would scale to that though [03:50] heh [03:52] you hit git scalability issues eventually [03:52] I've been working last 3 days in scaling git [03:52] git-annex to millions of files.. it does. but git, not so much :) [03:53] * closure has a directory with 300 copies of the linux kernel source tree in it. takes a while to rm [03:54] haha [03:54] Where does the problem in git lie? 
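(Going back to the HTTP-serving question above: the "separate, bare repo" plus the "hook you have to enable" that closure mentions comes down to roughly the following. A sketch only — the hostnames, paths, and web-server setup are assumptions, not anything spelled out in the conversation.)

    # server side: make a bare clone, enable the stock dumb-HTTP hook, export it via any web server
    git clone --bare myrepo /srv/www/myrepo.git
    cd /srv/www/myrepo.git
    mv hooks/post-update.sample hooks/post-update   # the sample hook just runs `git update-server-info`
    chmod +x hooks/post-update
    git update-server-info                          # run once by hand before the first clone

    # client side: clone over plain http, then fetch only the content you want
    git clone http://example.com/myrepo.git
    cd myrepo
    git annex get somefile
    # content comes from annex/objects inside the exported repo (.git/annex/objects for a
    # non-bare one, as closure says) if it was copied there, or straight from the recorded
    # web URLs for files added with addurl --fast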
[03:54] yay for lvm, so I can just nuke the volume [03:54] Just inefficiency with an index of millions of files? [03:54] oh, it keeps every file in .git/index and rewrites it all the time [03:55] some other stuff. Facebook was complaining about this, it seems their source tree is insane and too big for git [03:55] haha [03:55] damn [03:55] hmm [03:55] Is there a way to make this work? [03:55] git annex addurl --fast --file=NUMBERS/geocities-3-d.7z.001 http://archive.org/download/2009-archiveteam-geocities-part1/NUMBERS/geocities-3-d.7z.001 [03:56] what, to create the directory? [03:56] Right now it spits out an angry error about a nonexistent directory [03:56] Yeah [03:56] I mean, I can write logic to create the directories, and cd in to each one, etcetera [03:56] But that feels real clunky [03:58] will fix [03:58] \o/ [03:58] Thanks! [04:00] Out of curiosity, why'd you choose haskell? [04:00] (I personally love the language, but I know there are others that don't agree) [04:05] fixed. [04:05] because it was time to learn haskell and also I wanted something solid [04:06] I didn't know you were into haskell [04:12] "a monoid is a monad in the category of endofunctors, what's the problem?" [04:12] er [04:12] monad -> monoid [04:13] whoops [04:13] closure: Yeah, I got into it this summer when I was in summer residential governor's school [04:14] one of my classes was mathematical problem solving, and haskell's lazy evaluation and excellent iterable abilities let me solve problems 20 times faster than anyone else in the class [04:14] (who wasn't using haskell) [04:14] * closure hits yipdw with a typoclassopedia [04:14] I haven't done anything beyond write math programs in it though [04:15] underscor: heh, that's about my exposure too -- I've been using it for Project Euler [04:15] for some weird reason [04:15] well damn, I wish I'd known, I could have had you slinging git-annex code [04:16] underscor: also, Paul Hudak's The Haskell School of Expression is most excellent [04:16] I will forever love fibs = 1:1:zipWith (+) fibs (tail fibs) [04:16] Ooh, I'll have to check it out [04:16] Yeah, that was the first problem we had to solve [04:16] 10 thousandth fib number [04:17] Only took me a few minutes, everyone else spent >45 minutes [04:17] closure: Hehee [04:18] I need to learn about how the rest of it works, though [04:18] Like type declarations and stuff [04:18] hmm, I need to get ahold of the Haskell School of Expression [04:18] My exposure's pretty much limited to fuckery in ghci [04:18] underscor: try executing fibs !! 10000 on that definition, it will fly [04:19] I know :D [04:19] yeah, I've been starting to try to learn about dependant types and type level programming [04:19] on my Xeon it completes in something I can't measure [04:19] the reason why it works is also mind-boggling (to me anyway) [04:19] lazy evaluation up the ass [04:19] fucking delicious :D [04:19] closure: is there a way to push git-annex files to another repository? [04:20] Coderjoe: git annex copy foo --to reponame [04:20] Coderjoe: Yeah, git annex copy file --to remotename [04:20] after you set up a git remote for it [04:20] closure: Damn :P [04:20] and what protocols does it handle? [04:20] and is there an ability to copy multiple files? [04:20] any protocol that can be used for a normal git remote (except git://) .. ssh, rsync, http [04:21] yes, foo can be a file or a directory [04:21] or any number of either [04:21] or leave it off to do the whole current directory :) [04:21] You can push via http? 
[04:21] Damn, never new that [04:21] +k [04:22] um, no, you can't upload bia http [04:22] aw [04:22] well, I don't support WEBDAV yet at least.. [04:22] because http can do it... [04:23] true, but server side it's a bit of a nightmare [04:23] yipdw: closure: No Haskell School of Expression, but any other haskell books here catch your eye? http://hastebin.com/tahewokuco.coffee [04:23] Sorry for the gross formatting [04:24] underscor: I haven't actually read any of those, though I did meet Bryan O' Sullivan at Erlang Factory once [04:24] he's a pretty swell guy [04:24] :P [04:24] so I guess his Real World Haskell book is probably good [04:24] That's neat [04:24] I've been meaning to look at the Bird too [04:24] my name is in RWH :) [04:24] I suppose I was coming more from do-you-want-a-copy-of-them [04:24] ohh [04:24] illicit [04:25] I SEE [04:25] hmm [04:25] um, Haskell School of expression is listed there :P [04:25] the thing I like about HSE is that it uses functional programming for applications that IMO one does not see very often [04:25] it actually focuses on FRP [04:25] Woah, look how blind I am [04:25] which is pretty need for an introductory book [04:25] neat, too [04:25] yipdw: Yeah, mildly illicit :P [04:26] This is basically library.nu, except private [04:26] closure: you're cited in there? [04:26] reviewer [04:26] ahh neat [04:26] along with like 500 other people [04:27] Whoops, client crashed [04:28] Oh, yeah, I remember seeing your name, closure [04:28] I was like "I know that guy!" [04:28] haha [04:28] (yes, I'm that freak who reads all the reviews) [04:29] http://research.microsoft.com/en-us/um/people/simonpj/papers/history-of-haskell/index.htm [04:29] Looks interesting [04:30] yes, I enjoyed that one [04:30] Simon Peyton-Jones sounds like a supervillain's name [04:30] along with R. Kent Dybvig [04:30] Found: [04:31] Why not: [04:31] fmap a (getStale file) [04:31] getStale file >>= return . a [04:31] man, I love hlint [04:31] jvdksvjlewjvldsjvkdjsvldsvlkds [04:31] you should find some SPJ talks. They're intorductory, but he's one of the best presenters I've ever seen [04:31] It's unseeded right now [04:31] oh, hlint suggests alternative constructions? [04:31] I'll link it when it finishes downloading [04:31] Probably overnight [04:42] hmm, a <$> getStale file is better though. hlint must not like applicatives [04:51] closure: yipdw: http://ksnd.it/dl/6613c98564e [04:51] It's a djvu [04:53] expired request [04:53] Weird [04:53] One sec [04:53] works here [04:53] http://ksnd.it/v/6613c98564e [04:54] ^ closure [04:54] damn wasted too much time listening to jason, now I have to do things [04:55] hahahah [04:55] 00000000-0000-0000-0000-000000000001 -- web [04:55] 85019413-6049-441c-a4a9-2f17dc0e734a -- here (ArchiveTeam Releases (@ IA)) [04:55] semitrusted repositories: 2 [04:55] untrusted repositories: 0 [04:55] dead repositories: 0 [04:55] local annex keys: 0 [04:55] local annex size: 0 bytes [04:55] known annex keys: 276 [04:55] known annex size: 57 gigabytes [04:55] backend usage: [04:55] URL: 276 [04:55] closure: Working beautifully now with the directory fix [04:55] Thanks! [04:56] personally, and especially for the archiveteam repo, I "git annex untrust web" [04:56] although if it's all archive.org urls, you *may* trust it :P [04:56] hehe [04:56] I planned on doing that, but after I add everything [04:57] Not really for any particular reason, I suppose [04:57] Makes it so I don't have to force on drop though [05:27] Wow! 
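(A quick sketch of what the trust setting discussed above changes in practice — "somefile" is a placeholder, and this assumes the content is present locally with the web as its only other recorded location.)

    git annex untrust web             # stop counting the web as a dependable copy
    git annex drop somefile           # now refuses: no trusted/semitrusted copy elsewhere
    git annex drop --force somefile   # override, accepting that only the untrusted web copy remains

Leaving the web remote semitrusted (the default) is the trade-off underscor describes: drop then succeeds without --force because the web copy counts.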
[05:27] where: &w_collection=*archiveteam* | size: 30,266,111,202 KB| [05:27] That's incredible! [05:28] Of course, mobileme will nearly increase it 10fold [05:28] But still! [05:33] kekekek [05:33] what's the size of cdbbsarchive out of curiosity [05:35] gimme a sec [05:35] btw, this is where fos lives, if y'all are curious about stats [05:35] http://ia700108.us.archive.org:8088/mrtg/ [05:35] You can see where someone started uploading mobileme [05:35] probably dcmorton [05:35] haha [05:36] DFJustin: where: &w_collection=cdbbsarchive | size: 366,592,842 KB [05:36] That would be pretty cool to have as a git-annex repo too [05:38] hehe [05:39] (Recording state in git...) [05:39] 1 5:39AM:abuie@abuie-dev:~/cdbbsarchive 23540 Ï ruby ../ia_annex.rb Gold_II [05:39] Gold_II is not a collection. It is, in fact, a software [05:39] Mirroring Gold_II because its parent is cdbbsarchive [05:39] addurl GOLD_II.cdr ok [05:39] addurl GOLD_II.jpg ok [05:39] Wheee [05:39] git clone ALLSHAREWARE [05:39] That's how it'll be once this finishes ;P [05:40] guess this fulfills jason's wish for an easy way to download collections [05:40] Yeah, assuming he likes it [05:40] (ping SketchCow, so he sees it) [05:41] Also need to write hooks that will automatically do junk when items within a collection are updated [05:41] Need to see if they'll let me touch petabox code [05:41] ;D [05:44] They won't. [05:44] But provide assistance. [05:45] Quick, the boss is here [05:45] Look busy! [05:45] SketchCow: Yeah, I know. Hank and BK are still sore about that incident in October [05:45] (rightfully so) [05:46] DFJustin: SketchCow: 8 down, 800 to go! http://hastebin.com/cowotigili.hs [05:49] closure: git-annex: /home/abuie/digibarn/.git/annex/tmp/remote_web_182_980_URL-s4619--http&c%%archive.org%download%DigibarnBruceDamerOnHowWilliamShatnerChangedTheWorldhistoryChannel%DigibarnBruceDamerOnHowWilliamShatnerChangedTheWorldhistoryChannel.thumbs%history-channel-shatner-digibarn-brucedamer__000390.jpg.log: openBinaryFile: invalid argument (File name too long) [05:49] :( [05:52] ouch.. [05:52] I will fix that tomorrow [05:54] I guess to really have ALLSHAREWARE you'll want http://www.archive.org/details/tucows as well [05:54] (the description on which is now out-of-date....) [05:54] closure: <3 [05:54] Thanks! 
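(What a run like the Gold_II one above amounts to, per file, is a single --fast addurl. A rough sketch of the loop — ia_annex.rb itself isn't shown in the log, and the file names are just the two from the example output.)

    item=Gold_II
    for f in GOLD_II.cdr GOLD_II.jpg; do
        # record a web remote entry for each file without downloading anything
        git annex addurl --fast --file="$f" "http://archive.org/download/$item/$f"
    done

Each addurl makes its own commit on the git-annex branch, which is part of why these repos end up with thousands of commits there (as closure notes later).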
[05:54] DFJustin: Yeah, I plan on doing it too [05:55] But once I have an automation workflow in place [05:55] Also, still wanting to hand-comb output for the time being [05:55] :) [05:56] they need to do another tucows pull too, "this just in...7 years ago" [05:56] the site is amazingly still up [05:58] wow [05:58] that's pretty impressive [05:58] I remember using tucows in like 2004 [05:59] back on my G3 AIO [05:59] :D [05:59] Fuckin' 10 years old [05:59] hahaha [05:59] I remember when "winsock software" actually meant something [06:00] :o [06:00] That was a while ago [06:00] ;P [06:00] but I'm still a whippersnapper compared to some folks in here :) [06:02] Very true [06:02] Man, I remember playing this game [06:02] Damn, what was it called [06:02] It was like, you pretended you were in a museum [06:02] and there were all these puzzles and stuff you had to solve [06:03] I remember that in 3rd grade START (gifted education) on Windows 95 [06:03] Man, now I really want to know what it was called >:| [06:05] in 3rd grade we had apple IIs [06:05] We had apple IIs until 2nd grade [06:05] underscor: fixed [06:06] and then we got celerons with windows 95 [06:06] closure: :D [06:06] We couldn't use "Batang" on those celerons [06:06] It would crash the computer [06:06] I remember that, too [06:06] God, all these memories [06:07] OMGOMGOMGOMGOMGOMG [06:07] I FOUND IT [06:07] * underscor is so excited [06:07] http://www.abandonia.com/en/games/479/Museum+Madness.html [06:07] is that a text adventure game? [06:08] Nope, windows 3.11 and 95 [06:08] Man, I'm gonna have to set up a win95 vm so I can play this <3 [06:09] closure: Thanks again for fixing that bug [06:09] Couldn't resist, eh? ;) [06:09] Wow, lots of changes! [06:09] turned out to be easy, can just truncate and add a md5 for uniqueness [06:09] oic [06:11] http://www.amazon.com/Museum-Madness/dp/B0009U7CLQ [06:11] ha ha [06:11] prime eligible! [06:12] Customers buy this item with The Oregon Trail, 5th Edition by The Learning Company Windows 98 / Me / 95, Mac [06:12] Frequently Bought Together [06:12] Hehehehehe [06:12] I'm sure TONS of people have bought that combo lolololol [06:13] closure: Works like a charm, many thanks! [06:54] underscor: so how big and unweildy is the git repo so far? [06:54] wow [06:54] 70-01-08.us is kinda overloaded O_o [07:03] ha [07:03] Coderjoe: http://hastebin.com/totedewice.hs [07:03] Keep in mind that they're actually smaller than that, it's just in the middle of adding files [07:03] (re the du) [07:08] seems like a lot of disk space for 1765 urls [07:09] would repacking help? [07:11] Like I said, there are a bunch of temporary cruft laying around [07:11] s/are/is/ [07:11] They're all adding files, which takes up space until they're dropped [07:12] PSA: Sleepytime tea + agave nectar = apotheosis of human creation [08:19] closure: btw, when you're opening files for hashing in git-annex, are you opening them with the noatime option? [08:19] http://kerneltrap.org/node/14148 [08:22] someone here archived lachlan cranswick page, why can't i find it on the wiki under 'people' ? [09:00] oparty [09:01] Hooray, I fixed the jamendo bug [09:01] Asked Emjirp to help me find ones that messed up. [09:01] Shouldn't be many. [09:01] cryptops1: uh... because I forgot to add it to the wiki? [09:03] other than possibly uploading it to batcave (and perhaps downloading it to home), I can no longer remember what I did with the resulting warc file [09:04] ah. 
found it [09:04] 2.7G lachlan_cranswick/ [09:07] np [09:07] i was afraid it got thrown out [09:08] and, iirc, there is a copy on batcave for SketchCow to push into archive.org at some point [09:09] I delete everything [09:09] Need more space for dwarf fortress 2012 [09:10] World of Wanking 2012? [09:10] with the new Hot Elf Babes expansion pack? [09:13] <@SketchCow> I delete [09:13] Haha, like that'd happen! [09:59] mmm world of wanking [10:07] I am slamming 6,200 podcasts into archive.org. [10:07] http://www.archive.org/details/mypodcast-2020 [10:26] http://www.archive.org/details/mypodcast-zstalkshow [10:26] It's beginning. [10:26] * SketchCow bows. [10:38] * BlueMax laughs [10:38] Found this while browsing r/wtf on reddit [10:38] http://i.imgur.com/FW8xz.png [10:41] Here, BlueMax http://www.archive.org/details/mypodcast-dragonballradio [10:42] SketchCow: ? [10:42] What's this supposed to be [10:42] I'm rewarding you [10:43] * BlueMax scratches his head [10:43] Over 9000 rewards [10:47] 26 podcasts already up [10:47] BlueMax: Can you link me to the wtf [10:48] www.reddit.com/r/WTF/comments/pqrya/who_seriously_sits_and_writes_this_stuff/ [10:48] ahaha [10:49] and that links to this image with "Ariel's Wedding night" on textfiles.com: http://i.imgur.com/FW8xz.png [10:49] what ersi said [10:51] To think that page wouldn't have survived if it wasn't backed up [10:51] I've got 6 different scripts running to push these podcasts in. [10:54] And we should all be greatful [11:42] mirroring blackhat.com [11:45] wait, what the hell? lol [11:48] why is www.geocities.com.7z.302 is listed as WAVE file on archive.org [11:48] http://www.archive.org/details/2009-archiveteam-geocities-part1 [11:52] is that the fixed archive? [11:53] i don't know [11:54] i think i need to get a bluray burner [11:54] okay, sure enough i grabbed geocities off the torrent wire, but i had to grab some files again because some archives were damaged, though im not sure which ones [11:55] not sure if it might be the same on archive.org or whether it's been appended [11:59] compare the checksums [12:02] i'm just thinking of squashfs [12:02] cause squashfs does hard links if files are the same [12:04] i have full squashfs file of www.defcon.org that can be used host it locally on local lan [12:20] Archive.org isn't hot with some things. [12:20] .302 is a sound extension [12:20] But this isn't a sound file. [12:21] thats what i thought [12:28] http://www.archive.org/details/mypodcast-beatmd [12:28] 14 hours of sound mix! [12:32] All those file didn't have any license attached? [12:40] looks like alot of .ppt files are 404 on blackhat.com [12:40] for blackhat 2001 [12:59] blackhat.com was not that big [12:59] only 1.2gb [12:59] defcon.org is like 3.8gb [13:16] alot of .pdf didn't get download or are 404 on blackhat.com [13:16] thats not good [13:19] 1485 broken links it looks like [13:24] I heard that video talks contain hidden info using steganography. [13:29] i have to redownload it [13:29] stupid wget-warc delete everything i think in then try redownloaded it [13:30] i thought it was only redownloaded files i had [13:31] anyways i added a -o www.blackhat.com.log to the end of my command [14:43] Coderjoe: for hashing git-annex uses sha256sum etc commands. So whatever they do. [14:46] closure: Whenever you have a few minutes, want to look this over and see if I did anything wrong? [14:46] https://github.com/ArchiveTeam/ia-digibarn [14:53] underscor: yeah.. 
you need to push the git-annex branch too [14:53] 0 2:53PM:abuie@abuie-dev:~/digibarn 23635 Ï git annex status [14:53] ae80d947-67dd-46e7-88be-1503f57cd03b -- here (DigiBarn (@ IA)) [14:53] semitrusted repositories: 1 [14:53] supported backends: SHA256 SHA1 SHA512 SHA224 SHA384 SHA256E SHA1E SHA512E SHA224E SHA384E WORM URL [14:53] supported remote types: git bup directory rsync web hook [14:53] trusted repositories: 0 [14:53] untrusted repositories: 1 [14:53] 00000000-0000-0000-0000-000000000001 -- web [14:53] dead repositories: 0 [14:53] local annex keys: 0 [14:53] local annex size: 0 bytes [14:54] known annex keys: 2190 [14:54] known annex size: 12 gigabytes [14:54] backend usage: [14:54] URL: 2190 [14:54] closure: Ok, pushing now [14:54] it should auto-push after the 1st time.. that's where git-annex keeps its info [14:54] So, after I git push -u origin git-annex once, it should auto-push from then on? [14:55] that's been my experience [14:55] okay, cool [14:55] thanks! [14:56] Oh, wow, lots of objects in that branch [14:56] haha [15:05] 70% compressing [15:05] wow, that's more objects than I'd expect [15:05] or an arm box? [15:06] it'll have 2 files in the branch per file in the repo, plus some directories etc.. [15:06] oh, you did individual addurls per file.. so it also has 2000 commits I guess [15:43] closure: looks ok [15:43] woah [15:43] * closure core dumps [15:50] closure: ~17k objects [15:51] It's on an overworked nfsmount, so that doesn't help with IO [15:51] yeah, so it's some unavoidable, but some of it can be improved [15:51] how do you spider a website without download? [15:52] closure: Still pretty damn impressive! [15:52] :) [15:52] I'm testing the recursive collection downloader to git-annex script I wrote this morning on http://archive.org/details/vectrex [15:53] Nothing like being bored in statistics! [15:53] * Archivis2 chuckles [15:53] well, I'm thinking of adding an option --pathdepth=N , and it would take the last N parts of the path and use that for the filename. Then you could run one single git annex addurl and pass it all the urls in one go. This would be more efficient. [15:54] That'd be pretty neat [15:54] Although some of these are >1,000 files, so they probably would need to be batched smaller [15:55] closure: It works! \o/ http://hastebin.com/siyiyinoki.hs [15:55] (I purposely did a small collection to start with) [15:55] hmm, command line length limits you mean? [15:55] possibly a problem yes [15:55] xargs it [15:56] I do wonder if one repo per collection is the right granularity. You could put them all in one repo, might be more fun :) [16:01] All IA items in one repo? [16:01] I dunno how well it'd handle that [16:02] all archiveteam items in one repo [16:02] Oh, yes [16:02] That's how it's going to be [16:02] all IA would probably be insane [16:03] vectrex and digibarn are collections similar to AT [16:03] (number-of-items-wise) [16:03] AT one is running, but it takes much longer because there is a large number of individual files [16:04] ah, I see [16:04] Especially in the geocities/yahoo video things, where they're split into smaller 7z.nnn [16:04] let me write this feature and you can have much faster runtime [16:04] \o/ [16:06] https://github.com/ArchiveTeam/ia-vectrex [16:06] git-annex branch still pushin [16:06] g [16:14] http://blog.archive.org/2012/02/15/want-to-help-build-a-distributed-web/ [16:17] Just depth [16:17] filter (not . 
null) $ split "/" fullurl [16:17] fromend depth $ map escape $ [16:17] | depth > 0 -> filesize $ join "/" $ [16:17] oh yeah [16:25] Archivis2: ok, --pathdepth pushed [16:25] I'd be curious to see the comparison importing using it and xargs [16:28] Sweet, I wanted to try git-annex for archiveteam today and now I come here and someoen already did all the work ;-) [16:30] Are they checksummed? [16:30] not the way he's doing it, it would need to download them all. but can be migrated to checksums later [16:31] git annex migrate --backend=SHA256 [16:31] The meta.xml files do contain checksums, maybe it is possible without downloading everything? [16:31] oh, hmm, suppose [16:32] Oh, not meta.xml, _files.xml [16:32] Though it only carries md5 and sha1 sums, so you'd have to use the sha1 backend [16:41] maybe something like git annex migrate --converter=xml-parsing.rb [16:42] will think on it [16:45] does anyone happen to have freenode's podcasts? [16:45] the freenode network released some rather obscene podcasts around year 2006-2008? [16:45] lol, really? [16:45] yea they were pretty sweet! [16:45] according to their descriptions [16:45] was lilo in them? [16:46] and they won't upload them for me since i'm life-banned on that network [16:46] i don't know, i don't think so [16:46] he was an old pal of mine, it would be good to hear his voice [16:46] if he faked his death, you won't be hearing from him [16:47] i always like to think people faked their death with a small chance of finding them [16:47] but i've considered the possibility when my friends have 'died' [16:47] anyways ... there really ARE freenode podcasts! [16:47] i think the url is still up [16:48] podcast.freenode.net [16:48] i was wrong, they are from year 2009 [17:02] cryptops1: Lol how did you get life banned? [17:06] don't know, but i'm ban evading right now probably [17:09] i'm getting all the '404' files from blackhat.com [17:10] i did a spider to what i got of a local version of mirror of blackhat.com [17:10] 1496 files were missing [17:12] there are still 404 errors in the list but i'm getting most of it [17:34] one of the topics for a freenode postcast is 00:22:59 - We beg for money [17:34] so maybe lilo was still around back then [18:02] closure: Thanks! [18:02] How do I use it? [18:03] Hey. [18:06] Hi SketchCow [18:06] How goes it? [18:07] Are you new or did you change your name? [18:07] Oh, sorry [18:07] Let me go change the default for efnet [18:08] (I don't usually irc from my laptop, so all my settings aren't on here) [18:11] underscor: you should be able to do something like: geturls | xargs git-annex addurl --fast --pathdepth=2 [18:12] where urls spits out the URL list [18:12] yep [18:13] So, if I have example.com/dir1/file1.mp3 example.com/dir1/dir2/file2.mp3 and example.com/dir1/dir2/dir3/file3.mp3 will I end up with dir1/file1.mp3, dir1/dir2/file2.mp3 and dir1/dir2/dir3/file3.mp3? [18:17] with pathdepth=2 yes [18:17] um, no [18:18] it currently takes the last 2 parts of the path [18:18] maybe that should be --pathdepth=-2 and --pathdepth=2 should *skip* the first two parts? [18:19] That would be more useful [18:19] At least, in my usage scenario [18:20] Because I just don't want the ia6xxxxx.archive.org/16/items/ bit, but I want everything else [18:20] (which may be an arbitrary length [18:20] ) [18:26] It turns out that I can't rsync to the badcave server anymore. Is this a temporary thing or has my access been removed? 
[18:26] s/badcave/batcave/ [18:27] underscor: done [18:28] | depth > 0 -> frombits $ drop depth [18:28] | depth < 0 -> frombits $ reverse . take (negate depth) . reverse [18:28] schweeeeeeet! [18:28] swebb2: Batcave is offline, we're moving to fos/fortressofsolitude [18:28] Have to talk to SketchCow for a new module [18:28] (I dunno if he's doing them yet though) [18:32] ok. Can someone just email me my new creds or something when they're ready? [18:41] Poke SketchCow [18:41] He's the gatekeeper :) [18:42] Or just drop your email in here and we'll get it to him [18:42] s/email/email address/ [18:44] Whaaa [18:45] I just pinged him. [18:46] Basically, I'm having people verify with me which directories are done, so I can start pushing them along. [18:47] Okay, I see [18:48] SketchCow: Any news about the umich data? [18:48] IT's syncing, but REALLY slowly. [18:49] It's only down to C. [18:49] Oh, that's not fast. I'll be patient then. [18:50] SketchCow: How much space does fos have? [18:50] Just curious :D [18:50] I'm not letting you on fos. [18:50] I'm going to find you your own machine. [18:50] Right now, though I'm finding enormous slowdowns on fos and I am worried I am sharing a cancerous baremetal host. [18:51] Yeah, I was noticing that [18:51] (looking at the stats in the dom0) [18:52] SketchCow: Oh, okay. That works too, I suppose :D [18:52] The new machine is aprt of a new breed and I do not know the story. [18:53] Yeah, it's one of the new ganeti fai machines [18:53] (andy's been teaching me the infra) [18:54] What's strange is neither the VM nor the baremetal is spending a significant amount of time on iowait [18:54] We kept seeing that on the dev vm [18:55] Why aren't you asking andy for a machine? [18:55] That's all I'm going to do. [18:55] Oh, you're already running as cache=none, too [18:55] That's something I would have suggested to increase performance [18:56] I could also go "Why are you discussing internal archive.org infrastructure on a public channel on EFnet" but I gave up on that thread. [18:56] Well, I'm just some outside stranger [18:56] I figured you had more credance with him than I [18:56] I'm louder than you, yes. [18:56] Well, you're also an employee, unlike me [18:56] SketchCow: I haven't said anything that's not already publicly available knowledge [18:57] Tell yourself that. [18:57] But I'll quiet down about it then [18:57] Tell myself which? [18:58] Tell yourself that constant, consistent dumping of information on the operation of archive.org's machinery and methodology for its operation and remote manipulation will have no far-reaching consequences. [18:59] Also, have some pie [19:01] Alright, touche' [19:01] You're right, nothing good can possibly come from it. [19:02] You're young, and you're happy to be seeing the parts of the engineering that impresses you, but the ease of use that inevitably comes from some of the choices in the system, which date back to when literally 1 or 3 people had total knowledge of the environment and responsibility for it, which do not scale to univerally being known. [19:02] ...are not good to drop. [19:04] I see. Thanks. [19:04] Also, I confess that the bathhouse wasn't clothing optional [19:05] And there is actually no "rub room" [19:06] I hit a steam room in Helsinki, those guys know how to run a steam room [19:08] indeed. crazy find and their birch branch flagellation [19:08] haha [19:08] finns even [19:08] Our finn host told us to do that [19:08] We go to the sauna, and the guy who runs it looks at us, and goes "... 
first time?" [19:08] our finn host told us not to drink alcohol in the really hot room, or we could die. [19:08] "you don't need to do that." [19:09] Also, the snow thing was awesome [19:09] The guy who runs it has the demeanor of hourly motel clerk [19:09] was there glory holes? [19:09] oh, you went to the helsinki ice bar? [19:09] And when we were there, my lady has the steam room to herself [19:10] The ladies steam room [19:10] and he said it was fine for me to go up there [19:10] so we had a steam room to ourselves for a while [19:10] tmi [19:10] No, not the ice bar [19:10] I mean the hanging out outside and rubbing snow on yourself thing [19:11] yeah, in summer you just have to jump in the lake and swim as deep as possible, not as nice [19:12] http://www.flickr.com/photos/mirka23/6849416475/in/photosof-textfiles/ [19:17] I was told this was the best sauna, but I bet it was the most authentic [19:17] Wood-fired steam, etc. [19:19] looks suspiciously like a train [19:19] oh, slideshow [20:33] underscor: how's it going? [20:36] I have a new command pushed for you (or soultcer) [20:37] git annex rekey --force file1 SHA1-sNNN--XXX file2 SHA1-sNNN-XXX ... [20:37] sweet [20:37] the NNN must be size in bytes, the NNN is the sha1 of course [20:38] I'd recommend adding all the urls in one command, and then immedaitly rekeying them in one more command. That will be most efficient. [20:40] One repository for all Archiveteam projects sounds pretty awesome [20:43] er, the XXX is the sha1 , NNN is size [20:44] underscor, I don't understand, does having only your current machine actually prevent from doing something on the poor IA servers? :) [20:44] aka where are you uploading all this data to :p http://abuie-dev.us.archive.org:8088/mrtg/networkv2.html [20:44] git annex rekey --force filea SHA1-s$asize--$asha fileb SHA1-s$bsize--$bsha ... [20:47] so long http://www.archive.org/details/mypodcast-dbandit