[00:09] all of it?
[00:28] joepie91: http://www.anonnews.org/press/item/820/comments/ didn't have it?
[00:29] it used to be at http://www.archiveteam.org/archives/edramatica/ED_archive.zip
[00:32] balrog: well yes, "used to e"
[00:32] be *
[00:32] I've run across a number of broken links on archiveteam.org
[00:32] which is simultaneously funny and kinda bad
[00:56] so we should run !a http://www.archiveteam.org/ more often on #archivebot
[01:06] joepie91: archivebot has it
[01:06] maybe not the old version you want
[01:06] https://encrypted.google.com/search?q=archivebot+encyclopediadramatica+site%3Aarchive.org&btnG=Search
[01:09] http://web.archive.org/http://www.archiveteam.org/archives/edramatica/ED_archive.zip
[03:16] so i'm mirroring msnbc news pages from wayback machine
[03:16] crazy code to make it happen from cdx: cat cdx*msnbc.com*news*1* | grep 'asp?cp1=1 ' | grep 'text/html 200' | sed 's| http|/http|g' | sed 's| text/html.*||g' | sed 's|.* ||g' | sed 's|:80||g' | sed 's|http://msnbc.com|http://www.msnbc.com|g' | sort | uniq > urls.txt
[03:17] yeah
[03:17] there comes a point where shell is no longer the best option :P
[03:39] https://www.fanfiction.net/s/9571902/1/The-Truth
[03:39] whoa
[03:40] Edward Snowden/Hetalia Axis Powers crossover
[04:15] yipdw: there are no limits to what can be found on hte interwebs
[04:15] the *
[04:15] ivan`: not the same stuff
[04:15] I mean, that webecology backup -was- integrated into the new site
[04:16] but it's not the same data :p
[04:27] so i got a 22 min video from dateline in 1998 about beef
[04:27] well, it was what's for dinner
[06:37] how come this is not being updated anymore? https://archive.org/details/freemusicarchive
[06:43] SketchCow: underscor: if I were to write a client for IA, what should I set as the default maximum concurrent download and upload limit?
[07:12] Why would you write a client?
[07:12] We already have one.
[07:12] You could look at it and see if improvements or features are needed.
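Editorial sketch, not part of the log: picking up the [03:17] remark that shell stops being the best option, the [03:16] pipeline could be written roughly as below in Python. This assumes each CDX line is space-separated with the fields in the order url key, timestamp, original URL, MIME type, status code, which is inferred from how the sed steps behave rather than stated in the log. The function name `cdx_to_wayback_paths` is made up for illustration.

```python
def cdx_to_wayback_paths(cdx_lines):
    """Return sorted, de-duplicated 'timestamp/url' strings from CDX lines.

    Assumed field order (not confirmed by the log):
    urlkey timestamp original mimetype statuscode ...
    """
    paths = set()
    for line in cdx_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip header or malformed lines
        timestamp, original, mimetype, status = fields[1:5]
        # Same filter as: grep 'text/html 200'
        if mimetype != "text/html" or status != "200":
            continue
        # Close to: grep 'asp?cp1=1 ' (here checked on the original URL only)
        if not original.endswith("asp?cp1=1"):
            continue
        # Same as: sed 's|:80||g'
        original = original.replace(":80", "")
        # Same as: sed 's|http://msnbc.com|http://www.msnbc.com|g'
        original = original.replace("http://msnbc.com", "http://www.msnbc.com")
        # Same shape as the 'timestamp/http://...' output of the sed chain
        paths.add(f"{timestamp}/{original}")
    return sorted(paths)
```

Fed the lines of the cdx files, this returns the same kind of `timestamp/url` list that the pipeline writes to urls.txt; ordering may differ from `sort` depending on locale.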
[07:18] but making it work with fortran is so much work
[07:30] https://pypi.python.org/pypi/internetarchive
[07:30] We've done a million uploads with it
[08:00] SketchCow: I mean a graphical client, where uploading to IA is one of the features
[08:00] not just a library
[08:00] it's something I've been working on for a while to automate some processes here
[08:01] hence wondering how many concurrent transfers are acceptable
[08:01] (also, SketchCow, I've actually been providing some feedback / bug reports on that library already :)
[08:05] That's the one.
[08:05] I would say, ask Jake then.
[08:05] jake@archive.org
[08:07] alright, thanks
[08:10] Also, the answer to "why hasn't _____ been updated on archive.org" is ALWAYS "because there are 8 people responsible for maintaining collections"
[08:10] So unless an outside person is maintaining/co-maintaining the collection, fix-ups come in waves
[08:11] Across years, sometimes
[08:25] i'm close to 1000 videos for 2000 clips from nbcnews
[08:25] *for year 2000
[08:46] damnit gmail
[08:46] where did my "you don't have a subject" warning go
[08:46] SketchCow: I see
[08:46] So I've been working on script-based ways to shore up our stuff.
[08:47] Because when the new UI kicks in it will DEFINITELY show gaps and slowdowns in additions.
[08:48] what kind of stuff should I be thinking about?
[08:48] In what context
[08:48] thinking of*, sorry
[08:48] like, what kind of stuff is to be shored up
[08:48] (my brain is on low-power mode today)
[08:50] Help me understand what's going on, again. You hinted but I was busy.
[08:50] Quit your job, intend to do "stuff" for a year.
[08:50] With IA being one of the beneficiaries of this time.
[08:50] Is that right?
[08:50] oh, that was a different context actually
[08:50] this was more a generic question of "what do you mean with 'stuff' in <@SketchCow> So I've been working on script-based ways to shore up our stuff."
[08:51] but yes, the above is also correct
[08:51] (though I'll have to see how the fundraiser idea works out before I commit to anything)
[08:51] What I am talking about scripting isn't an archiveteam thing. It's a me and the archive thing.
[08:51] well yes, but I'm curious what kind of stuff it entails :P
[08:51] Many items don't have cover images. Many don't have keywords, etc.
[08:51] aha
[08:51] right
[08:52] Many have no metadata of any kind. Intend to work on that.
[08:52] SketchCow: I'd been pondering about this a bit, but idk if this might simply already be on the roadmap: would wikifying metadata not be an option?
[08:52] That is an ugly situation.
[08:52] We worked together on that one solution, but I've had zero time to work with your code.
[08:53] Yanking metadata into a wiki wholesale, and then we edit and I oversee it flying back in, could be good.
[08:53] That's the best compromise we can have it.
[08:53] well, the idea I was thinking of was more inline wikified editing - so that a user with an account on IA could just edit metadata from an item page itself (excluding 'protected' items)
[08:53] but not sure how technically feasible
[08:53] There will never, never, ever be, at least within the span of years, a case where you click on something at IA and people do editing in a wiki fashion.
[08:54] what's the reasoning behind that?
[08:54] It's baked into the organization at the moment.
[08:54] I mean, you want to go ahead and tell me why it's great, go ahead, make yourself feel better. But I can see it won't happy anytime soon.
[08:54] right, but I'm quite curious whether that's just a time/attention constraint issue, or an inherent conceptual problem with wikifying
[08:54] Happy?
[08:54] er
[08:54] Conceptual problem.
[08:54] conceptual problem that people have with *
[08:54] right
[08:54] Combined with time/attention.
[08:56] SketchCow: completely unrelated quesiton, do you guys at IA have a spamfilter that triggers on empty subject lines? because I accidentally sent my email to jake without a subject, and apparently my gmail setting to warn me about that has magically vanished
[08:56] My end-run is the closest we'll have.
[08:56] question *
[08:57] I have not the slightest idea.
[08:57] I do know we have a spam issue.
[08:57] I don't use the IA mail system.
[08:57] alright, we'll see if I get a response then
[08:57] right :P
[08:57] I suppose that if you have a spam issue, it's not a terribly trigger-happy filter (if any at all), so my mail will probably go through fin
[08:57] fine *
[08:57] I am all for us using the parallel wiki idea.
[08:58] SketchCow: can you elaborate on how you'd see that working, in a technical sense?
[08:58] metadata goes in
[08:59] metadata comes out
[08:59] can't explain that
[08:59] lol
[08:59] We did a prototype a while ao.
[08:59] Sort of - you wrote a post bot but I've been busy.
[08:59] well obviously, but the idea I got was that SketchCow meant using a standard wiki system (a la mediawiki), at which point the question is "how do you turn the wiki page back into useful metadata without making the page a pain to edit"
[08:59] re: exmic
[09:00] * collection chosen
[09:00] * metadata of all items is pulled into wiki under a set, with each item a page
[09:00] * editttttt
[09:00] * push all of it back
[09:00] ----
[09:00] On a page:
[09:00] metadata pair becomes == METADATA NAME ==
[09:00] Followed by metadata.
[09:01] Obviously there is some trickery from the ingestor to pull things in.
[09:01] Obviously there is potential for things to go wrong, or for issues with newbs making a mess
[09:01] Obviously it's not the fast fast fast fast shut the fuck up it's fast keep going world of, say, Wikipedia.
[09:01] Which... I hate.
[11:43] 05:40 yipdw> Edward Snowden/Hetalia Axis Powers crossover
[11:44] i really should try to restart the ffnet archiving project
[11:50] https://github.com/FlatRockSoft/
[14:10] SketchCow: is the code for your keyword generator posted anywhere?
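Editorial sketch, not part of the log: the page layout outlined at [09:00] (each metadata pair becomes a `== METADATA NAME ==` heading followed by its value) could round-trip roughly as below. This is only one guess at the format, assuming one string value per field and no value line that itself looks like a heading; the function names and the sample fields are hypothetical, and no real ingestor or IA API is shown.

```python
import re

# A line such as "== title ==" marks the start of a metadata field.
HEADING = re.compile(r"^==\s*(.+?)\s*==\s*$")


def metadata_to_wiki(metadata):
    """Render a dict of metadata pairs as wiki text, one heading per field."""
    parts = [f"== {name} ==\n{value}\n" for name, value in metadata.items()]
    return "\n".join(parts)


def wiki_to_metadata(text):
    """Parse wiki text back into a dict; text before the first heading is ignored."""
    metadata = {}
    current = None
    buffer = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None:
                metadata[current] = "\n".join(buffer).strip()
            current = match.group(1)
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        metadata[current] = "\n".join(buffer).strip()
    return metadata


# Hypothetical item metadata, for illustration only.
item = {"title": "example title", "subject": "example; keywords"}
page = metadata_to_wiki(item)
restored = wiki_to_metadata(page)
```

The parser is deliberately naive: an edit that adds a stray heading line creates a new field on the way back in, which is the kind of mess the [09:01] messages warn about and a reason to keep a review step before pushing anything back.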
[14:11] I know you're using https://github.com/ox-it/spindle-code/ and https://pypi.python.org/pypi/internetarchive, but what about the glue and baling twine that holds them together?
[14:23] some good news on the martin yan's chinatowns torrents
[14:24] i got upload 2 and upload 4 last night
[14:25] so now i got about 30 episodes of it
[17:03] Hmm~ got a USB stick that shows up in dmesg as a SCSI removable disk (like usual) that gets a device (/dev/sdb).. but I can't mount it and if I `dd` from it, it says "dd opening /dev/sdb no medium found" :/
[17:04] Any ideas on how to retrieve data from it?
[17:39] ersi: borked usb stick?
[17:39] do cfdisk /dev/sdb return something real?
[18:31] SadDM: My keyword generator is VERY weaksauce
[18:32] If you want it, I can provide it
[18:32] Obviously you need write control on the item for it to work.
[18:43] SadDM: http://fos.textfiles.com/keyworder.zip
[18:44] You need internetarchive (the python program) installed
[19:36] SketchCow: anything I'd cobble together would also be weaksauce... you've just saved me the trouble
[19:39] gah! *BOOM* goes the zip file
[20:16] http://www.reading.ac.uk/news-and-events/releases/PR583836.aspx
[20:41] https://www.youtube.com/watch?v=d0mg9DxvfZE
[23:41] kanzure_: good question, I dunno. I'd think that people who do photographic printed circuit board production might know.
[23:41] this is for diybio?
[23:59] balrog: yes, sort of