[00:13] yipdw ive looked, and havent found any way out of the localhost:4001 error, and i reinstalled rvm and now its completely fubared, so, in spite of my nagging that got this project going again, i cant actually help with it, not sure if that made it, so here it is again
[00:27] bsmith094: sounds like your system is broken
[00:28] could it possibly be those 6 bad sectors on this hd
[00:28] oh, you're having computer problems too?
[00:29] * PatC joins the computer screwing up club
[00:41] bsmith094: I don't know very much at all about your system configuration, so, yeah, it's something you're probably going to have to solve on your own
[00:41] that said, I doubt bad sectors are the problem
[00:42] the current issue seems like you installed a package that rewrote some routes
[00:42] what files would i send to who, where, to fix this
[00:42] it may help to review your package history
[00:42] when things stopped working
[00:42] and figure out what changed
[00:43] is there a quick rollback option, on the history, cause that would be really useful for just these types of problems
[00:52] none that I know of
[00:52] I have been running the crawler, though, so I do have a list of story IDs
[00:52] I'm not entirely sure when it'll finish
[01:02] HUZZZAHHHH
[01:02] home now.
[01:02] okairi
[03:58] yipdw: so i guess ill help with that or something, bash would probably be much easier for that part, and now i really have to go to bed, i have a cold, gnight
[05:14] fantastic
[05:14] 61003) "/Batman_and_Glee_Crossovers/50/4947/_cache_control"
[05:17] hahaha
[05:22] fellas, now that splinder is over, are we still going to do memac?
[05:23] Did we finish splinder?
[05:23] We done?
[05:23] How big is it?
[05:23] dashboard says 1338175 users and 1,084G
[05:24] 0 remaining to fetch, rsync to batcave is in progress
[05:26] chronomex: is memac mobile me? or something else?
[05:26] arrith: yes, that's it
[05:27] Oh yeah, I have my internet back now so I need to rsync all my data
[05:28] has anyone heard about the BerliOS transition and if anything is still at risk?
[05:30] Cameron_D: and there goes my speed.
[05:31] It's only 85gb, it'll only take a month :P
[05:32] I think there's still some amount of Splinder to do; it's just really huge and/or difficult profiles that remain.
[05:33] yes
[05:33] not sure how much is uploaded to batcave so far, but someone still needs to run the check scripts against the profiles
[05:33] yeah. I'm wrapping up some of my ass profiles.
[05:33] My computer is still busy with files.splinder.com.
[05:45] uploading the lachlan cranswick AFK site pull
[05:48] though based on my findings, the IA crawlers would have no problems with it. the only blocked items in the robots.txt file are the auto-generated web usage reports pages, which become a crawler trap
[05:50] and that's now on batcave
[06:07] I am drowning my sorrows at (temporarily) losing the french magazines by downloading 200gb of magazine PDFs
[06:08] Also, batcave is actually kinda starting to fill up.
[06:08] I'll be clearing that thing out into the main archive all this week.
[06:12] SketchCow, I need an upload slot for my Splinder and Anyhub data
[06:16] OK, one moment.
[06:17] I think I'm about to import the entirety of the JAMENDO collection into archive.org.
[06:17] DEVASTATING
[06:17] 100,000 music albums.
[06:17] jesus
[06:18] * chronomex cries
[06:18] what happened to the french magazines?
[06:18] They're offline while I have the single stupidest negotiation I've had yet.
[06:18] And by offline I mean dark - they're all there, saved.
[06:22] Anyway, so I need to write a script that generates all the right files for an album
[06:22] oh, well.. better than accidentally deleted
[06:22] And then uploads said album.
[06:22] It's going to be awesome.
[06:22] Yeah.
[06:22] i was afraid it was a "rm *" or something
[06:22] That can't happen.
[06:23] due to backups or?
[06:23] The issue right now is the scanners are, now stick with me here, the scanners are raising hell because we've stolen their scans.
[06:23] Now, yes, I can sit here and use thor's hammer, but it's a little fun to negotiate with them, figure out how they tick, etc.
[06:23] I've already grabbed all their issues, I'm not worried.
[06:24] But I have to let this play out, see how it goes.
[06:25] ah
[06:25] that album thing sounds like a fun project
[06:26] Mostly absorbing, coding, and putting up.
[06:26] I think I need to generate a files.xml.
[06:26] You should just be able to use stub_files=1
[06:26] Because archive.org will need it to be track_00.mp3, track_01.mp3, etc.
[06:26] (or are you using s3?)
[06:26] I want to preload.
[06:26] I'm using s3. I always use s3. Everything else is horseshit.
[06:27] Nothing else scales.
[06:27] contrib_submit scales
[06:27] I am about to upload well over 1,000,000 songs.
[06:27] Rails scales
[06:27] You submit tasks directly into the system, instead of through the s3 layer
[06:27] er, wait
[06:27] No, contrib_submit requires access to the internal machines.
[06:27] It requires you to be an admin, and to be internally connected to the system through the internal user machine.
[06:28] It requires a machine in *.archive.org to submit the initial request
[06:28] yipdw: but is it webscale?
[06:28] Well, yeah, but I mean, *you* can use it
[06:28] I don't want to.
[06:28] I want to make it easier for PEOPLE to use the system.
[06:28] I want to generate scripts PEOPLE can use.
[06:28] How goes the scanning, chronomex
[06:28] Okay, sorry.
[06:28] Will s3 allow you to send your own files.xml?
[06:28] I didn't think it did
[06:28] Also, clean out /1 or something. You ate it
[06:29] And you're making your move on /2
[06:29] I know where this is leading
[06:29] /2 is friendster, splinder, and anyhub almost entirely
[06:30] so even batcave is using the same /n/ stuff the rest of the ia nodes use?
[06:30] /1 is freesound.org, which I'm almost done with
[06:30] and then will be ingesting 5,000,000 sound clips in
[06:30] Yeah, batcave is just a l'il one
[06:30] SketchCow: my scanning is going okay. I'm alternating between scanning and restoring my mainframe.
[06:30] If you want me to move my finished friendster/splinder/anyhub/memac stuff somewhere else on batcave, let me know
[06:30] chronomex: Excellent.
[06:30] There's nowhere else to move it in batcave.
[06:31] Batcave is just somewhere near 80 percent full at this point.
[06:31] I'm about to make it a lot less.
[06:31] 81%, not a bad guess!
[06:31] Ok, and I'll work to get /1 significantly more free tomorrow
[06:33] I'm about to upload some BEAUTIFUL magazines up to archive.org.
[06:33] All of which will go dark.
[06:34] I need to finish my 16mm film scanner project. and I need to get a book scanner put together
[06:35] i also need to be less lazy and figure out what I need to do for the film scanner project, as I haven't touched it in over a year
[06:36] if you ever get the film scanner scanning, I have some neat films
[06:36] just a couple tho
[06:36] I've run one or two of them through a telecine machine because I don't have a proper film scanner
[06:36] archive's got one.
[06:37] We can always arrange it.
[06:37] I've got a bunch of stuff I haven't really looked at yet out in my storage unit
[06:37] SketchCow: orly, okay.
[06:37] SketchCow: I knew that, I suppose I just assumed that you weren't interested in returning films once they were scanned
[06:38] i kinda would like to have a transport like that muller has, but I want to use a different camera
[06:39] my current scanner project is using the frame and movement from an old elmo SL-16
[06:40] That would be wrong, we return them all the time.
[06:42] good to know.
[06:43] I've got some navy and telephone company electronics training films
[06:43] ooh
[06:43] Seriously, these magazine packs you see, like the ones for contemporary magazines? They're just stunning.
[06:44] I assume they're grabbed at the intermediate stage of printing.
[06:45] example?
[06:45] http://www.demonoid.me/files/details/2798568/008009567016/
[06:46] nowadays magazines publish in ebook form for kindle etc. too
[06:46] hmm. those are nice.
[07:01] OK, uploaded an album.
[07:03] http://www.archive.org/details/jamendo-016966
[07:03] Not bad
[07:04] Now, to program a script to do it 100,000 more times.
[07:04] simple enough
[07:04] something that I have kinda wanted to archive is remix.nin.com
[07:04] Let's look into that
[07:05] But give me a moment while I make 100,000 albums get into the queue.
[07:05] Somewhere, a sysadmin is about to explode.
[07:05] I have a pretty good idea of where
[07:26] Still working on it. Well along.
[07:37] http://www.archive.org/details/jamendo-016963
[07:46] OK, it works.
[07:46] So I guess, let's do it 10,000 times and see what happens.
[07:50] http://www.archive.org/details/jamendo-016963&reCache=1
[07:50] Just made it so it links back to the original album.
[07:50] Good citizen, etc.
[07:51] Well, there it goes, first 10,000.
[08:04] Awesome, archive.org does not consistently succeed at uploads currently.
[08:05] SketchCow: that reminds me. the deriver currently fails on tiffs with their pixels arranged in a certain way. to whom should I direct my bug reports?
[08:06] info@archive.org
[08:06] ok
[08:10] http://www.archive.org/details/jamendo-000028&reCache=1
[08:10] Ah, there we go.
[08:17] I added 20 retries.
[08:17] I tell you, why would you even pay for music
[08:17] When you have over 500,000 tracks here
[08:20] yeah really
[08:20] fuck that
[08:29] neat.
[09:53] Man, s3 is NOT enjoying this
[09:53] I can't get more than one album uploaded without stopping and restarting.
[09:53] SIT DOWN AND TAKE IT
[09:53] oooooooo....
[09:53] I just figured it out. Motherfucker.
[09:54] ?
[09:54] are you having it derive too?
[09:54] Could be wrong.
[09:54] Yes.
[09:56] Got it.
[09:56] I was fucking WONDERING why it was always going to shit on the second track.
[09:56] It needs to wait for s3 to push through.
[09:57] hm?
[09:57] I push in anywhere from 1 to 20 tracks into the same item.
[09:57] I think it hates that.
[09:57] aha
[09:57] For the first one, it needs to make the item.
[09:58] I suppose I COULD write something to fire through and make all the covers go online, as fast as possible
[09:58] OR, I could just open 10 parallel entries. :)
[09:58] Yeah, see, making it wait 120 seconds made it work.
[09:59] Also, these first albums? These are really nice goddamned albums.
[09:59] I think that was on purpose.
[09:59] just maybe
[10:00] http://www.archive.org/details/jamendo-000050
[10:08] There we go, three threads going.
[10:08] Soon to be more
[10:51] The race is on, now I'm slamming out data as fast as it's coming in.
[10:52] IN OUT IN OUT IN OUT IN OUT
[10:52] like a fucking STEAM ENGINE PISTON
[11:18] Five simultaneous uploads of that music, plus one stream uploading friendster.
[11:22] F'it, making it six.
[13:38] http://ukwebfocus.wordpress.com/2011/12/11/the-forthcoming-demise-of-twapperkeeper
[13:40] twapperkeeper.com/allnotebooks.php?type=&name=&description=&tag=&created_by=
[13:43] http://libreas.wordpress.com/2011/12/09/twapperkeeper/
[13:56] Yeah
[13:56] There's little we can do, they're really sewed up
[14:23] emijrp: Pretty incredible collection on Jamendo.
[14:23] I'm watching the crap file in, and I'm probably choking all sorts of queues doing it.
[14:24] But it's, like, years of music.
[14:24] Good work : )
[14:25] Did you use my script? I'm running it, still in 60000.
[14:25] I believe so... the .py?
[14:26] yes
[14:26] Turn that shit off, man, it is handled.
[14:26] I'm past 100,000
[14:26] Use your resources for something else.
[14:26] I want to finish
[14:26] OK
[14:27] I will finish and then delete all. Or not. I will think about that.
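The fix described at 09:56-09:58 (upload the first track so the item gets created, wait, then send the remaining tracks, retrying failures) could be sketched roughly as below. This is not the script used in the channel: `upload` is a hypothetical callable standing in for whatever performs a single s3 upload, and it is assumed to raise OSError on failure. Only the 120-second wait and the 20 retries come from the log.

```python
import time


def upload_with_retries(upload, item_id, path, max_retries=20):
    """Call upload(item_id, path), retrying on OSError up to max_retries times.

    `upload` is a hypothetical stand-in for one s3 upload call.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_retries + 2):  # first try plus the retries
        try:
            upload(item_id, path)
            return attempt
        except OSError:
            if attempt > max_retries:
                raise  # out of retries, give up on this file


def upload_album(upload, item_id, paths, settle_seconds=120,
                 max_retries=20, sleep=time.sleep):
    """Upload every file of one album into a single item.

    The first upload is assumed to create the item, so the function waits
    settle_seconds after it before pushing the remaining files.
    """
    attempts = []
    for index, path in enumerate(paths):
        attempts.append(upload_with_retries(upload, item_id, path, max_retries))
        if index == 0 and len(paths) > 1:
            sleep(settle_seconds)  # give the new item time to appear
    return attempts
```

Running several albums side by side, as in the "three threads going" message, would mean calling `upload_album` once per album from separate workers; each album still uploads its own tracks one after another.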
[14:27] OK
[14:27] I am just trying to get space back on batcave at the moment.
[14:27] So I am blowing out a ton of data.
[14:28] I found a cool command line trick to sum all durations of videos in a directory http://www.commandlinefu.com/commands/view/3612/get-the-total-length-of-all-video-audio-in-the-current-dir-and-below-in-hms
[14:28] It works for mp3 and ogg too I guess.
[14:29] It says I haz 673 hours and 49 minutes of Spanish Revolution videos.
[14:33] Analysis the Jamendo db dump to sum all songs duration http://developer.jamendo.com/en/wiki/NewDatabaseDumps
[14:33] Analysing*
[14:34] wtf is up with that awk alternative? xargs echo?
[14:47] OK. Some stats using the last Jamendo xml db.
[14:47] 336426 tracks.
[14:47] 53843 albums.
[14:47] IDs have gaps, that's why the last one is over 100,000.
[14:48] (gaps = deleted albums)
[14:48] 26738 artists.
[14:48] ~2 albums/artist
[14:51] Artists from 141 countries.
[14:51] ['EST', 'ATA', 'USA', 'FRA', 'BEL', 'RUS', 'DEU', 'ESP', 'DNK', 'ITA', 'POL', 'PER', 'SVK', 'GBR', 'UKR', 'HUN', 'ZAF', 'JPN', 'ISR', 'CAN', 'HND', 'ARG', 'ROU', 'SRB', 'PRT', 'BLR', 'GRC', 'SWE', 'VEN', 'PRY', 'MEX', 'CHE', 'BRA', 'NLD', 'BGR', 'CZE', 'KAZ', 'SGP', 'LUX', 'MDA', 'COL', 'ISL', 'AUT', 'ECU', 'MKD', 'IDN', 'CHL', 'BOL', 'AND', 'IRL', 'SEN', 'CRI', 'LTU', 'MAR', 'TUN', 'AUS', 'SYR', 'FXX', 'COG', 'DZA', 'NZL', 'URY',
[14:51] 'THA', 'FIN', 'IND', 'MTQ', 'GTM', 'SPM', 'CYP', 'TUR', 'NIC', 'LVA', 'MYT', 'NOR', 'BTN', 'KGZ', 'AZE', 'SVN', 'GEO', 'HRV', 'TLS', 'JAM', 'SAU', 'KOR', 'LBN', 'ALB', 'GLP', 'CHN', 'PYF', 'GAB', 'CUB', 'NGA', 'PRI', 'PHL', 'EGY', 'GUF', 'GRL', 'VAT', 'PAN', 'SMR', 'TWN', 'IRN', 'MYS', 'NCL', 'BMU', 'DOM', 'VNM', 'CXR', 'ATF', 'BIH', 'AFG', 'GNQ', 'DJI', 'CIV', 'REU', 'MLI', 'MCO', 'ETH', 'SLV', 'HKG', 'TTO', 'MNG', 'ARM', 'ATG',
[14:51] 'NPL', 'LKA', 'MUS', 'ESH', 'CCK', 'BFA', 'CMR', 'ARE', 'PAK', 'MSR', 'BGD', 'VGB', 'GMB', 'MLT', 'GIB', 'SLE', 'TUV'] 141
[14:53] Top 25 countries by albums
[14:53] [[4360, 'FRA'], [1874, 'ITA'], [1539, 'DEU'], [1440, 'ESP'], [1234, 'USA'], [818, 'RUS'], [544, 'POL'], [464, 'GBR'], [270, 'BRA'], [249, 'BEL'], [244, 'CAN'], [225, 'ARG'], [187, 'UKR'], [162, 'MEX'], [149, 'CHE'], [139, 'SWE'], [129, 'NLD'], [99, 'CHL'], [94, 'JPN'], [92, 'AUT'], [90, 'PRT'], [87, 'COL'], [80, 'GRC'], [76, 'FIN'], [74, 'AUS']]
[14:55] Most used license http://creativecommons.org/licenses/by-nc-sa/3.0/
[14:58] And...
[14:58] 971.575752315 days of music.
[14:58] --- END ---
[15:04] Excellent.
[15:05] Well, I'm past 600 albums that have gone in.
[15:06] Nearly three years of music, that's crazy.
[15:19] you can have the track names in the files.xml using http://ia600803.us.archive.org/9/items/TokyoProse-SamuraiMusicOfficialPodcast5/TokyoProse-SamuraiMusicOfficialPodcast5_files.xml
[15:21] <emijrp> By the way, the new IA mediaplayer doesnt work for me (Ubuntu). I click on the track and it is ok.
[15:21] <emijrp> On the mp3 or ogg file I mean, one to one.
[15:57] <Coderjoe> emijrp: better command line (for the total length of video/audio in a directory): mplayer -endpos 0.1 -vo null -ao null -identify *.avi 2>&1 |grep ID_LENGTH |cut -d = -f 2|awk '{SUM += $1} END { printf "%d:%d:%d\n",SUM/3600,SUM%3600/60,SUM%60}'
[15:58] <Coderjoe> change wildcard for different file types and/or stick a find and xargs on it
[15:59] <emijrp> why is it better?
[15:59] <Coderjoe> less commands needed (and thus less overhead)
[15:59] <Coderjoe> the awk example was terrible
[16:00] <Coderjoe> all that extra crap with the sed, "xargs echo", another sed, and bc which was all not needed
[16:01] <Coderjoe> just need cut to get the value and awk can do the rest of the work doing the summation
[16:01] <emijrp> submit to the website
[16:01] <Coderjoe> already did. waiting moderation
[16:01] <emijrp> cool
[16:04] <SketchCow> Boy, blasting dubstep sure makes a decrepit 40-year-old seem much more hip than he is
[16:05] <Coderjoe> the other examples also counted on there not being an audio or video output module named "dummy". if one at some point was added, it would no longer error out. better to use the null modules and tell mplayer to exit quickly
[16:27] <SketchCow> BRUTAL. It's adding 100 albums every 15 minutes.
[16:28] <emijrp> Meh. I saw faster Petaboxes.
[16:42] <DFJustin> jamendo music making monday morning at work suck a little less
[16:43] <chronomex> I'm about to go visit the phone museum's storage unit ... it's ~10 thousand square feet.
[16:43] <dnova> where is that?
[16:43] <chronomex> seattle
[16:44] <dnova> damn. I didn't know about a phone museum and I was there a couple years ago
[16:44] <chronomex> damn.
[16:44] <chronomex> I've been to the other major phone museums in the usa, this one is the best of them
[16:45] <chronomex> haven't been to any outside the usa tho
[16:47] <chronomex> the next best (in my opinion ...) is in Maine
[16:47] <emijrp> Newton published his works under CC-BY-NC http://yfrog.com/z/ntkk1kp
[16:47] <emijrp> 3.0, not 2.0, you heard.
[16:48] <chronomex> emijrp: well, good thing he's been dead for a while.
[16:51] <emijrp> That is peccata minuta compared with Dead Sea Scrolls digitization, which is not Creative Commons at all.
[16:52] <chronomex> it should be marked as public domain, emijrp
[16:52] <emijrp> Obviously.
[16:52] <chronomex> so what is it marked as?
[16:53] <emijrp> The " " "digitization effort" " ".
[17:01] <Coderjoe> you'd think so, but researchers (or their institutions) are assholes
[17:06] <emijrp> ORLY.
[17:07] <Coderjoe> if they hold the copyright, they can get money on licensing it out and all that crap
[17:13] <emijrp> " " " hold " " "
[17:24] <Schbirid> SketchCow: emijrp told me you are upping http://www.archive.org/details/jamendo-albums ? the filenames seem broken to me. and are the vorbis files derived from the mp3s? any chance to get the original high bitrate vorbis files up instead? :)
[17:25] <Schbirid> oh and a hint about linking to jamendo: instead of eg linking http://www.jamendo.com/en/album/001290 use http://www.jamendo.com/album/001290 (no preselected language)
[17:26] <Schbirid> last but not least, be aware that albums with the same id can be "re-released" with a new date if the artist changed something
[17:33] <emijrp> Do you know if Archive-IT archives videos? http://wayback.archive-it.org/2648/20110531164037/http://www.youtube.com/watch?v=Zx5giakLhP8 looks broken
[17:36] <ersi> Uh, who runs Archive-IT anyhow?
[17:36] <yipdw> FYI, if anyone wants to actually start saving some stuff: http://depot.ninjawedding.org/fanfiction.net_story_ids_2011-12-12.gz
[17:36] <ersi> Oh, it's IA as well
[17:36] <yipdw> and by "stuff" I mean "some of the weirdest shit on the internet:"
[17:51] <emijrp> There are a fuckton of Creative Commons videos on YouTube.
[18:02] <DFJustin> special characters in the album titles aren't coming in, http://www.archive.org/details/jamendo-005199 vs http://www.jamendo.com/en/album/005199
[18:05] <emijrp> And some encoding errors in readme.txt files http://ia600708.us.archive.org/24/items/jamendo-005199/Readme.txt
[18:06] <db48x> there are no special characters
[18:07] <db48x> emijrp: the readme looks fine
[18:08] <emijrp> This 'Readme' file is available below in: English, Français, Italiano, Español, Deutsch, Polski, Русский, Português.
[18:08] <DFJustin> that's just utf-8, switch your browser encoding
[18:08] <db48x> emijrp: yea, that's just your end
[18:08] <emijrp> yes
[18:09] <emijrp> but i read pages with accents everyday
[18:09] <emijrp> while does it fail with that file?
[18:09] <emijrp> why*
[18:09] <DFJustin> it's a .txt file so there's no encoding declaration like there would be for an html page
[18:09] <db48x> yea, it just gets sent as text/plain
[18:09] <emijrp> and do you have to switch? or autodetect?
[18:10] <DFJustin> I have autodetect turned on
[18:10] <db48x> so either you have autodetection turned off, or it just didn't work this time
[18:10] <DFJustin> autodetecting utf-8 is piss-easy so it must be off
[19:38] <chronomex> DFJustin: it's especially easy if you dont know how to detect anything else...
[19:51] <VonGuard> SketchCow
[19:51] <VonGuard> u-matic is not something you can handle, eh?
[19:51] <VonGuard> i shoulda saved the one player i found at the accrc years ago
[19:51] <Coderjoe> ooh
[19:51] <Coderjoe> umatic
[19:51] <Coderjoe> found a copy of the star wars holiday special?
[19:51] <VonGuard> the first 3 episodes of the GamePro tv show are on that format
[19:51] <VonGuard> seems to be the only copies anywhere
[19:51] <Coderjoe> O_O
[19:52] <VonGuard> was hoping jason had a player
[19:52] <chronomex> VonGuard: I'll ask around my circles.
[19:52] <VonGuard> but i guess not. might have to take it to archive.org
[19:52] <VonGuard> as they are nearby
[19:54] <emijrp> Do we work for the Interwebz? http://iworkfortheinternet.org
[19:54] <Coderjoe> hmm
[19:55] <Coderjoe> there are some players on ebay. if I had money to spare I might snap one up
[19:56] <chronomex> emijrp: what is this?
[19:57] <emijrp> Read MAN.
[19:57] <chronomex> I'm on my phone, I see no such thing
[19:58] <emijrp> Anti-SOPA site.
[19:59] <chronomex> oh
[19:59] <Schbirid> merica! merica!
[20:29] <emijrp> Any way to download ustream videos?
[20:29] <Coderjoe> I have a python script that can dump non-ppv live streams
[20:29] <Coderjoe> well, it gives the proper rtmpdump command line
[20:30] <Coderjoe> I was trying to figure out how ustream's ppv live feed system worked, but have not yet
[20:33] <Coderjoe> emijrp: if you want it: https://gist.github.com/1468962
[20:34] <Coderjoe> kinda hackish, and requires pyamf. (also doesn't follow the python standard indenting)
[20:36] <emijrp> thanks
[20:37] <emijrp> http://code.google.com/p/get-flash-videos/issues/detail?id=115
[20:39] <emijrp> Coderjoe: what url format does it need? channel, record?
[20:39] <Coderjoe> channel page
[21:56] <SketchCow> EXCELLENT sick day
[22:05] <underscor> <ATidlebot> chronomex tamed a wild horse! This wondrous godsend has accelerated them 0 days, 00:26:54 towards level 43
[22:05] <underscor> I knew you were an equestrian
[23:04] <SketchCow> I will eventually have a umatic.
[23:04] <SketchCow> I just don't yet.
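For reference, the summing and formatting done by the awk stage of Coderjoe's 15:57 one-liner can be approximated in Python. This is a minimal sketch, assuming the input is text in the form mplayer prints with -identify (lines such as ID_LENGTH=123.45). The function names are made up for illustration, and minutes and seconds are zero-padded here, which the awk printf does not do.

```python
def sum_lengths(lines):
    """Add up the values of all ID_LENGTH=<seconds> lines.

    Other lines (for example ID_VIDEO_ID=0) are ignored, which mirrors
    the grep ID_LENGTH | cut -d = -f 2 part of the shell pipeline.
    """
    total = 0.0
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if key == "ID_LENGTH" and sep:
            total += float(value)
    return total


def format_hms(total_seconds):
    """Format a number of seconds as h:mm:ss, dropping any fraction."""
    seconds = int(total_seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

As a sanity check against the figure quoted at 14:29, `format_hms(673 * 3600 + 49 * 60)` gives `673:49:00`.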