[01:50] Wheee
[01:50] Counting objects: 66451
[01:50] Ï git push -u origin git-annex
[01:50] That's a lot of objects
[02:06] can I see the script you're using for import?
[02:21] closure: http://hastebin.com/getafahuhu.rb
[02:21] There's a bunch of extra cruft in there, cause I just modified an existing script
[02:21] (that actually downloads the items and stuff)
[02:26] ok.. if you can change this to pass all the urls to git-annex addurl --pathdepth=2, it will dramatically improve both number of objects and I think speed
[02:37] Ok, let me try it
[02:37] Wait, pathdepth=2 or -2?
[02:37] =2
[02:37] for from the end
[02:37] I modified it to just echo >> /tmp/todo and was going to run git-annex on that with xargs
[02:38] I want to drop the http://archive.org/download piece from http://archive.org/download/identifier/this/could/be/arbitrarily/deep/but/I/want/all/of/it
[02:38] but that's crap, I don't write ruby :)
[02:38] yes =2 drops 2
[02:38] leaving the end
[02:38] okay, cool
[02:38] thanks!
[02:40] closure: --fast and --pathdepth=2 will work together right?
[02:40] yes
[02:41] and it should be like git-annex addurl --fast --pathdepth=2 http://url1 http://url2 http://urlN ?
[02:41] sounds perfect.
[02:41] \o/
[02:41] Testing now
[02:41] you will need xargs, I'm seeing 6000+ archiveteam files
[02:41] 10430 /tmp/todo
[02:44] Skipping FRIENDSTER-121000000 because its parent is archive-team-friendster and we want archiveteam (it will be gotten later)
[02:44] huh
[02:44] Well
[02:44] It used to recurse
[02:44] That version doesn't
[02:44] Because I don't want it to
[02:45] If you want to recurse, there's an AND NOT mediatype:collection bit further down
[02:45] Just delete that
[02:45] Basically, the search result for collection:archiveteam returns both FRIENDSTER-121000000 and archive-team-friendster
[02:45] So if you download FRIENDSTER-121000000
[02:46] then you re-download it when you recurse into archive-team-friendster
[02:46] if you're actually downloading :) Whereas for git-annex, an extra copy of friendster is just some symlinks to the same content :)
[02:46] (because the hierarchy goes web->archiveteam->archive-team-friendster->FRIENDSTER-121000000)
[02:47] Well, yeah
[02:47] But for my purposes, I wanted it like that
[02:47] Feel free to modify it
[02:48] wow, I've never seen git-annex addurl stop in the middle to flush its buffer :)
[02:48] haha
[02:49] 1/mhi
[02:49] that's cool
[02:49] whoops
[02:49] closure: Have you tried cloning any of the repos I'm putting on github?
[02:49] I don't have another box to easily test on, and I wanted to see if they work properly
[02:49] I looked at the first one. looked fine except for the slightly inefficient history this is fixing
[02:51] btw, I hope archive.org is ok with thousands of HEAD requests that git-annex is doing
[02:52] They haven't yelled at me yet
[02:52] ;)
[02:52] git-annex: unrecognized option `--pathdepth=2'
[02:52] Oh, wait
[02:52] It would probably help if I rebuilt it with the latest source
[02:52] haha
[02:54] closure: With the old way, was each one of those kinda making a commit?
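(For reference, the workflow being worked out above boils down to something like the sketch below, assuming /tmp/todo holds one full http://archive.org/download/... URL per line as described; judging from the addurl output later in this log, --pathdepth=2 drops the host and "download" components so files land at identifier/path/file in the work tree. It needs a git-annex new enough to know --pathdepth, hence the "unrecognized option" above.)

    # /tmp/todo: one URL per line, e.g. http://archive.org/download/identifier/some/file.ext
    cat /tmp/todo | xargs git-annex addurl --fast --pathdepth=2
    # --fast records each url without downloading its content;
    # --pathdepth=2 keeps only identifier/some/file.ext as the annexed path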
[02:54] those calls*
[02:54] yes
[02:54] Oh wow
[02:55] No wonder they grew gigantic
[02:55] git log --pretty=oneline git-annex
[02:55] 1f1f8afa5c9e05384c9f18bc083714eccbfced15 update
[02:55] 32f6da243bd30acbf593e14dba10bc88d4d7708a update
[02:55] 352fd3f25eb3cd09e0e002c9c7c8f780b891c1b7 update
[02:55] 4b648ce8a4a3d981aed7d0e8d227bd96cf627b63 update
[02:55] 7e6c240fd019844e7f27fc5999c9ab1ff7fb11b2 update
[02:55] a7ebc8dc64e6509353e4014e5f709e14a8ab9451 update
[02:55] ce87583b6704aa62f20d08f66d744bcb105053dc update
[02:55] 0578e985dc2ad17e092f547e8c9c98bcce5b11cb update
[02:55] 3161b91f597dbc6a6c2a2d75a9b9fe10552d6683 update
[02:55] c2cd683c618cf4b55904edcec97b571ea7bb4f26 branch created
[02:55] now the whole import will be like that
[02:55] a few commits still made
[02:55] but far fewer
[02:56] you could even remove those by running: git config annex.queuesize 102400
[02:56] default is 10240
[02:57] Hmm, so would it be better to do an addurl --pathdepth for each item, or for all the items together?
[02:57] The latter would be a gigantic list
[02:57] Since each item can have from 4 to 10000 files
[02:57] s/4/3/
[02:57] I ran: cat todo | xargs git-annex addurl --fast --pathdepth=2
[02:58] Okay
[02:58] Thinking how to rubyize that
[02:59] I mean, I guess
[03:00] system("echo '#{urls}' | xargs git-annex addurl --fast --pathdepth=2")
[03:00] where urls is just a massive string containing all the URLs
[03:01] heh, I imported archiveteam, got a 321 mb .git/ .. ran git-gc and it's 66 mb
[03:01] 66719 objects total
[03:01] garbage collect?
[03:02] this is still a bit inefficient, I have a plan to nearly halve it
[03:02] eventually
[03:05] damn, nice
[03:06] So, should I run git-gc on all the ones I've done, and repush?
[03:06] (to optimize them)
[03:06] I'll bet github does it for you
[03:08] Oh yeah, probably
[03:11] oops
[03:15] underscor: pull and rebuild, I made the 50% improvement (or so..) change
[03:16] Ooh, nice
[03:16] What is it?
[03:17] more efficient location used to store the urls in the git-annex branch
[03:17] I see
[03:17] I had not anticipated every freaking file having a url, so it was sorta inefficient
[03:17] url url url url
[03:19] url url url url
[03:19] closure: Welcome to #archiveteam, where we like to push edge cases!
[03:19] x 100,000
[03:20] welcome? I archived geocities dude
[03:20] abusing our tools until the authors come in and scream STOP STOP STOP
[03:20] closure: I know, just teasing :)
[03:20] chronomex: hahaha
[03:20] ooh oldtimer fight
[03:20] I missed geocities :(
[03:21] I was like 15 though when it happened
[03:21] Yahoo Video was my first project
[03:21] it's always disconcerting to ride the bus at night and go "wait, where the fuck am i"
[03:21] I think my one-year anniversary is coming up!
[03:22] chronomex: uh oh
[03:22] I wish I were still a youngun with time to burn
[03:22] but no money
[03:22] hmmm maybe this is ok.
[03:23] closure: If I run addurl again with the same url, does it add bloat?
[03:23] or does git-annex just go "hey, this is exactly the same. don't need it!"
[03:23] underscor: should add no bloat
[03:24] cool, thx
[03:24] but to get my efficiency fix, you need to start fresh
[03:25] okay
[03:25] This is a fresh repo, I was just wondering what would happen if I ran it again
[03:25] Wow
[03:25] This is soooooooo much faster
[03:25] Jeez!
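(A sketch of the two tweaks mentioned above, run inside the annex repo; the queuesize value is the one closure quotes, and git count-objects -v is just one way to watch the object count and disk usage shrink the way the 66719-object, 321 MB to 66 MB example did.)

    # let git-annex queue ten times as much work before flushing, so the
    # import makes even fewer intermediate commits (default is 10240)
    git config annex.queuesize 102400

    # after a big import, repack and compare loose/packed object counts
    git count-objects -v
    git gc
    git count-objects -v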
[03:26] yeah, isn't optimisation fun
[03:26] Very
[03:26] This is fantastic
[03:26] * underscor awards closure
[03:26] it'd be a *lot* faster, but it still has to HEAD every url :)
[03:28] Yeah
[03:28] Although I'm internal, so at least network isn't a limiting factor
[03:28] :)
[03:33] they seem to have throttled me after 15 thousand HEADs
[03:33] ...
[03:33] hahahaha
[03:35] closure: It's normal to sit at (Recording state in git...) for a while, right?
[03:35] yes, there's a git commit happening there
[03:35] Okay
[03:36] A git commit of 10240 new things
[03:36] :D
[03:42] jesus, algathafi-org never ends.
[03:42] die you bastard! .. oh wait
[03:43] hahaha
[03:43] hahahahahaha
[03:43] that's great
[03:48] Wow, fast push now too
[03:48] closure: <3
[03:50] btw, I lied, it does bloat slightly if you re-run addurl with the same url, because it records it had the url at the new time
[03:50] so if you were thinking about rerunning to update, you should make it skip
[03:52] 41878 objects with my fix.. pretty good
[03:53] maybe I hear a call for a smarter version of wikiscraper that generates a git repo directly.
[03:53] worthwhile?
[04:05] That would be cool
[04:05] closure: Okay
[04:05] Actually, I should just be able to use git annex fsck to check if any files have changed
[04:05] Then redo any that error
[04:06] well, I was thinking re-run to find new files
[04:06] but NM, I removed that bloat on url re-add
[04:07] Oh, really?
[04:07] hahaha
[04:07] you're so amazing
[04:19] looking forward to seeing the archiveteam collection on github
[04:19] I'll fire that up next
[04:19] Put it on its own 2tb drive, hehe
[04:20] That's going to be a GIGANTIC commit, ha
[04:20] Actually, multiple commits, won't it?
[04:20] that's the one I was running, it's a few
[04:23] This is mesmerizing... @_@
[04:25] Thanks for crashing, xchat
[04:26] http://i.imgur.com/IZA7D.png
[04:26] ^ That's mesmerizing :D
[04:27] such a small res
[04:28] yay netbooks
[04:28] yeah
[04:28] This is an acer iconia tab w500
[04:28] it's pretty small, 11"
[04:29] 's really all I need in this machine
[04:29] It's basically my terminal to all my other machines
[04:29] as well as my onenote machine in class
[04:29] I don't usually like windows, but onenote is <3
[04:29] s/windows/microsoft products/
[04:30] underscor: +10000000
[04:30] I do everything on my dell mini 9 .. 1024x600
[04:30] your img scrolls :P
[04:30] i have fucking xp running in a vm all day long because of onenote
[04:30] i can't live without it
[04:30] closure: hahaha
[04:30] i'd murder a man to have it on osx
[04:30] kennethre: I KNOW!
[04:30] onenote is so amazing
[04:30] :D
[04:30] it really is
[04:30] 3840x1200 here
[04:30] most unappreciated product of all time
[04:30] NovaKing: two screens?
[04:30] ya
[04:30] getting 3rd soon
[04:30] It's not even that well known :(
[04:30] It should be
[04:31] It's the best program in the entire suite
[04:31] there's a little mini community, luckily
[04:31] yeah by far
[04:31] without a doubt
[04:31] i'd kill for a super small tablet w/ a wacom pen for it
[04:31] but there aren't any
[04:31] so vm it is
[04:33] Yeah, I love the onenote community
[04:33] What's funny is everyone I show it to is like "Woah, I have 100000 uses for that"
[04:33] If I get bored with all the other stuff i'm working on, I'll make a onenote for windows
[04:33] I noticed MS has been running TV commercials for onenote lately though
[04:34] oh yeah?
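(closure's "make it skip" advice at 03:50 could look roughly like the sketch below; this is an illustration only, assuming, as the addurl output elsewhere in this log suggests, that --pathdepth=2 maps http://archive.org/download/identifier/some/file to the work-tree path identifier/some/file. The -L test is there because --fast leaves dangling symlinks for content that was never downloaded.)

    # re-run only for urls whose tree path doesn't exist yet
    while read -r url; do
        # assumes the url-to-path mapping described above
        path="${url#http://archive.org/download/}"
        if [ ! -e "$path" ] && [ ! -L "$path" ]; then
            printf '%s\n' "$url"
        fi
    done < /tmp/todo | xargs -r git-annex addurl --fast --pathdepth=2
    # xargs -r (GNU): don't run addurl at all if nothing new was found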
[04:34] Yeah
[04:34] It integrates with windows mobile 8
[04:34] *I'll make a onenote for os x
[04:34] nice
[04:34] hope they didn't ruin it
[04:34] i'm still running 07 i think
[04:34] So if your wife creates/modifies a grocery list on the computer
[04:34] It syncs instantly to your phone
[04:34] you see changes live
[04:34] It looks pretty cool, at least
[04:35] Onenote 2010 is really nice
[04:35] I like the OCR on scans
[04:35] I'm always the first one to find worksheets and stuff from like the beginning of the year
[04:35] :D
[04:36] (Recording state in git...)
[04:36] closure: addurl 2009-archiveteam-geocities-part6/geocities-H-e.7z.257 ok
[04:36] Almost done!
[04:36] :D
[04:45] and the server melts
[07:10] The startup’s developer platform is going to be shut down, with maintenance and support continuing through March 2 2012 — all data will be deleted after that. Developers can find a FAQ on transitioning their data here.
[07:10] http://developer.hyperpublic.com/transition/transition-faq/
[11:34] anyone got a script ready to extract all the video IDs from a youtube playlist? i fail to do this because of all the javascript rubbish
[11:40] Schbirid, http://www.textfiles.com/videoyahoo/SCRIPTS/youtube-dl ?
[11:40] I see mention of playlists, dunno what it does
[11:41] ha, you rock
[11:42] with -s PLAYLISTURL you get the IDs nice
[11:48] hm, i should also archive the youtube html pages
[12:57] heya, whats the biz for today/tonight?
[14:51] is upload via archive.org's s3 interface kinda slow or am i doing something wrong
[14:52] ~200kilobytes/s is not much fun :]
[15:06] Schbirid: youtube-dl surely does rock at everything youtube related besides the captions
[15:06] there's a separate program for captions though ^_^
[15:07] i use get-flash-videos for the actual downloading
[15:07] cant remember why
[15:15] Schbirid: It's slow
[15:15] If you use a client that supports multipart uploads, you'll get MUCH faster speed
[15:15] (they just implemented that on the IA side)
[15:16] i used curl as in their examples
[15:27] oh, okay
[15:27] then no
[15:28] sorry :(
[15:28] closure: Is there a way to tell git-annex to ignore errors like this?
[15:28] addurl TheGoalkeeperAndTheVoid/goalkeeper_and_void.avi git-annex: unable to access url: http://archive.org/download/TheGoalkeeperAndTheVoid/goalkeeper_and_void.avi
[15:29] (403 Forbidden)
[15:30] btw
[15:30] https://github.com/ab2525/ia-archiveteam
[15:30] :D
[15:32] underscor: doesn't it just continue past the fail?
[15:32] Not if I'm xargsing all the urls into it
[15:32] underscor: what clients would support that?
[15:32] does it stop xargs?
[15:33] it should continue past the error, do the rest that xargs gave it, and then exit nonzero
[15:34] underscor: ia-archiveteam you need to push the git-annex branch
[15:34] Yeah, it's still compressing
[15:34] Heh
[15:34] Schbirid: http://s3tools.org/s3cmd-110b2-released
[15:35] closure: http://hastebin.com/cojiwuwefi.avrasm
[15:35] http://hastebin.com/wopanesibi.avrasm
[15:36] There's the rest of the file list
[15:36] ok, so it throws a real error
[15:36] will fix
[15:37] underscor: nice, thanks
[15:38] np, Schbirid :)
[15:38] closure: great!
[15:42] I wonder if github is like "what the hell are you doing?" yet
[15:42] 21 new repos in 2 days, haha
[15:44] underscor: fixed
[15:44] awesome, as always
[15:50] closure: archiveteam push finished, fyi
[15:52] nice, 33k objects
[15:52] visible annex keys: 11740
[15:52] visible annex size: 2 terabytes
[15:53] niiice
[15:59] :D
[15:59] That's awesome
[16:00] closure:
[16:00] addurl The_Vandhaal_Clash_with_the_Gibichungen_directors_cut_04/The_Vandhaal_Clash_with_the_Gibichungen_directors_cut_04.thumbs/Clash_nS_001260.jpg git-annex: The_Vandhaal_Clash_with_the_Gibichungen_directors_cut_04/The_Vandhaal_Clash_with_the_Gibichungen_directors_cut_04.thumbs/Clash_nS_001260.jpg already exists
[16:00] addurl dcme_promovideo/dcme_promovideo.thumbs/dcme_promovideo_000030.jpg git-annex: dcme_promovideo/dcme_promovideo.thumbs/dcme_promovideo_000030.jpg already exists
[16:00] addurl militia2_insane-quality/militia2_insane-quality_files.xml git-annex: militia2_insane-quality/militia2_insane-quality_files.xml already exists
[16:00] addurl warthog_revisited/warthog_revisited.thumbs/warthog_revisited_000090.jpg git-annex: warthog_revisited/warthog_revisited.thumbs/warthog_revisited_000090.jpg already exists
[16:00] git annex failed :(
[16:00] (and it stops)
[16:00] repeated run I assume
[16:00] Yes
[16:01] I thought that was okay?
[16:01] Or do I need to pass a different flag?
[16:01] yeah, not with the new --pathdepth
[16:01] can be fixed I'm sure
[16:01] okay, cool
[16:01] Ooooh
[16:01] known annex keys: 401
[16:01] known annex size: 14 terabytes
[16:04] https://github.com/ab2525/ia-archiveteam-yahoovideo if you want to play with it
[16:04] (git-annex branch still pushing)
[16:04] ok, done
[16:08] ok, fixed
[16:14] gracias
[16:33] underscor: s3cmd works easy but i cannot find any hint as to how to use that multi-part stuff
[16:33] oh wait
[16:33] duh
[16:33] i installed 1.0.0 ;)
[16:39] hehe
[16:40] it should automatically kick in for files >15mb
[16:40] (once you have the beta)
[16:40] it does, says "part 1 of 7" but it does upload that with just 100kilobytes/s
[16:44] hm, finding nothing about simultaneous uploads or so in the man page
[16:44] but usage is nice
[16:46] oh, hm
[16:58] SketchCow: http://www.archive.org/post/412199/please-delete :(
[18:18] eek, with that s3cmd and multipart upload the parts show up at the item's detail page during upload
[18:20] BACK
[18:20] Hey, they DID delete it.
[18:20] Weird.
[18:21] yeah, especially since it is still live at jamendo
[18:22] so by definition it should be free to share
[18:24] I'm going to go investigate.
[18:26] Show me the original on jamendo if you could.
[18:28] No, wait, I found it.
[18:28] OK, that was done by the most awesome guy ever here, and so I don't want to just undo it.
[18:28] I'm going to ask him if there's a missing piece here, otherwise I will put it up again.
[18:28] It's not deleted, you understand.
[18:31] :)
[18:51] Jeff works so so so so hard.
[18:51] He's the true librarian of archive.org.
[19:26] Jeff is amazing
[19:26] archive.org wouldn't be nearly as smooth as it is without hi,
[19:26] s/,/m/
[19:35] http://www.videolan.org/vlc/releases/2.0.0.html
[19:38] native 10bit support will be useful
[19:38] i don't think i'll use it but it will stop people from asking
[19:41] it's amazing how Jeff replies to every weird problem I submit :-O
[19:46] Jeff's an awesome guy
[19:46] Fun to work with
[19:47] "New video outputs for Windows 7, Android, iOS and OS/2."
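(Roughly what the multipart upload Schbirid is testing looks like with the 1.1.0 beta of s3cmd; the s3.us.archive.org endpoint, the item name, and the chunk-size option are assumptions drawn from general s3cmd/IA usage rather than anything pasted in this log, so check s3cmd --help on the build you actually installed.)

    # hypothetical ~/.s3cfg excerpt pointing s3cmd at archive.org's S3-compatible API:
    #   host_base   = s3.us.archive.org
    #   host_bucket = %(bucket)s.s3.us.archive.org
    #   access_key  = <your IA S3 access key>
    #   secret_key  = <your IA S3 secret key>

    # with the beta, uploads larger than the (default 15 MB) chunk size go multipart;
    # the option below, if your build has it, tunes that threshold
    s3cmd put --multipart-chunk-size-mb=15 bigfile.tar s3://some-item/bigfile.tar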
[19:47] Amazing that they still support OS/2
[19:47] haha
[19:47] yeah
[19:47] my eyes skimmed past that because i'm so used to seeing OS X
[19:48] Also amazing that there are new outputs for it
[19:48] Video APIs that no one discovered for 15 years?
[19:49] This is cool for those who need it: "Continued support for X 10.5 and PPC users (1080p and ProRes on Dual-G5!)."
[19:49] yay open source~
[19:49] as long as there's a need for it someone will code it
[19:49] Open source sucks and open source rules
[19:50] It rules for big projects and sucks for small
[19:50] how does it possibly suck for small projects?
[19:50] Small projects tend to be abandoned and buggy
[19:50] they wouldn't be any better if they were closed source.
[19:50] small projects are small projects
[19:51] Maybe they would be slightly better if someone knew they could make half a living out of it
[19:51] maaybe
[19:51] that's their choice though starting out.
[19:52] i think someone that explicitly starts an open source project isn't looking for money, although it's always an incentive
[19:53] That's what I mean. "Ok, I'm probably never going to finish this program, but I'll just leave it here in case someone else will."
[19:53] better than "I'm tired of working on this program, but since I want to sell it some day, I can't give the source away to someone that would work on it"
[19:54] in essence, you never know the amount of abandoned closed source projects
[19:54] if there's a program i like, but it's half-finished, at least it would help to have the source
[20:16] nrghgrnn, this will take 5 days to upload 45GB
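(That figure checks out against the ~100 kilobytes/s the multipart parts were getting earlier: 45 GB is about 45 × 1024 × 1024 ≈ 47,190,000 KB, and 47,190,000 KB ÷ 100 KB/s ≈ 472,000 s ≈ 5.5 days; even at the ~200 KB/s mentioned at 14:51 it would still be roughly 2.7 days.)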