[00:27] *** BlueMaxim is now known as BlueMax [01:08] *** primus104 has quit IRC (Leaving.) [01:24] *** Mayonaise has joined #archiveteam-bs [01:27] *** dashcloud has quit IRC (Read error: Operation timed out) [01:34] *** dashcloud has joined #archiveteam-bs [01:34] *** Mayonaise has quit IRC (Quit: WeeChat 0.3.8) [01:34] *** Mayonaise has joined #archiveteam-bs [01:38] *** mistym_ has joined #archiveteam-bs [01:41] *** mistym has quit IRC (Read error: Operation timed out) [01:51] *** ivan` has quit IRC (Max SendQ exceeded) [01:52] *** ivan` has joined #archiveteam-bs [01:54] *** mistym_ has quit IRC (Remote host closed the connection) [02:30] *** mistym has joined #archiveteam-bs [03:02] sorted! https://archive.org/details/EmuWiki_Collection [03:03] hmm. a giant tar is kinda bad since there's no index and the contents view doesn't seem to get cached [03:11] also how do I make https://archive.org/details/acclaim_discs show up better? [03:42] *** Sellyme_ has quit IRC (Remote host closed the connection) [03:51] *** ivan` has quit IRC (Max SendQ exceeded) [03:53] *** ivan` has joined #archiveteam-bs [04:21] *** ex-parro1 has quit IRC (Leaving.) [04:22] so i maybe able to grab kbs news clips from 2001 [04:22] all cause of this post came up on google: http://cluster1.cafe.daum.net/_c21_/bbs_search_read?grpid=PqYA&fldid=4f5o&datanum=10&openArticle=true&docid=PqYA4f5o1020030523012744 [04:23] they embed this link: mms://vod.kbs.co.kr/news/newstoday/2001/02/27/100.asf" [04:23] that google puts in plain text for search description [04:24] then i turn it into this: rtmpdump -R -r "rtmp://newsvod.kbs.co.kr:1935/news/_definst_/mp4:newstoday/2001/02/27/100.mp4" -o 20010227.mp4 [04:38] hm... where's the second Sony data dump? 
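A minimal sketch of the URL rewrite described at 04:23 to 04:24 above, assuming every clip follows the same path layout as the single example quoted in the log. The function names `parse_kbs_mms`, `mms_to_rtmp` and `rtmpdump_args` are hypothetical, the regular expression is inferred from that one example, and the rtmpdump flags are copied from the log as-is rather than checked against other clips.

```python
import re
from typing import Dict, List

# Layout inferred from the one mms:// link in the log; other clips may differ.
MMS_PATTERN = re.compile(
    r"^mms://vod\.kbs\.co\.kr/news/"
    r"(?P<show>[^/]+)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<clip>[^/.]+)\.asf$"
)


def parse_kbs_mms(mms_url: str) -> Dict[str, str]:
    """Split an old KBS mms:// link into show, date and clip parts."""
    # The link in the log carries a stray trailing quote, so strip quotes too.
    match = MMS_PATTERN.match(mms_url.strip().strip('"'))
    if match is None:
        raise ValueError(f"unrecognised KBS mms URL: {mms_url!r}")
    return match.groupdict()


def mms_to_rtmp(mms_url: str) -> str:
    """Rewrite the mms:// link into the rtmp:// form quoted in the log."""
    parts = parse_kbs_mms(mms_url)
    return (
        "rtmp://newsvod.kbs.co.kr:1935/news/_definst_/"
        "mp4:{show}/{year}/{month}/{day}/{clip}.mp4".format(**parts)
    )


def rtmpdump_args(mms_url: str) -> List[str]:
    """Build the rtmpdump argument list (-R, -r, -o) as used in the log."""
    parts = parse_kbs_mms(mms_url)
    # The log names the output after the date only; two clips from the
    # same day would overwrite each other with this naming.
    output = "{year}{month}{day}.mp4".format(**parts)
    return ["rtmpdump", "-R", "-r", mms_to_rtmp(mms_url), "-o", output]
```

The returned list can be passed to `subprocess.run` unchanged; building it as a list rather than a shell string avoids quoting problems with the `mp4:` path.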
[05:04] *** aaaaaaaaa has quit IRC (Leaving) [05:19] *** wp494 has quit IRC () [05:24] *** Start is now known as StartAway [05:25] *** brayden has quit IRC (Read error: Operation timed out) [05:27] *** wp494 has joined #archiveteam-bs [05:30] *** brayden has joined #archiveteam-bs [05:38] *** StartAway is now known as Start [05:40] *** Start is now known as StartAway [06:15] *** StartAway is now known as Start [06:18] *** Start is now known as StartAway [06:27] *** ionpulse_ has joined #archiveteam-bs [06:28] *** RainbowCo has quit IRC (Ping timeout: 272 seconds) [06:29] *** RainbowCo has joined #archiveteam-bs [06:30] *** ionpulse has quit IRC (Read error: Connection reset by peer) [06:31] *** RainbowCo has quit IRC (Read error: Operation timed out) [06:32] goodmorning [06:33] *** RainbowCo has joined #archiveteam-bs [06:37] *** wp494 has quit IRC () [06:44] midas: hey [06:56] *** wp494 has joined #archiveteam-bs [07:28] so looks like my kbs news grab from 2000 is 80 seconds per 1mb of data [07:58] *** toad1 has quit IRC (Ping timeout: 852 seconds) [08:06] *** mistym has quit IRC (Remote host closed the connection) [08:11] *** toad1 has joined #archiveteam-bs [08:29] *** primus104 has joined #archiveteam-bs [09:11] *** schbirid has joined #archiveteam-bs [09:40] *** Alyssa has quit IRC (Read error: Operation timed out) [09:40] *** Alyssa has joined #archiveteam-bs [10:13] *** dashcloud has quit IRC (Ping timeout: 265 seconds) [10:15] *** dashcloud has joined #archiveteam-bs [10:28] *** SmileyG has quit IRC (Remote host closed the connection) [10:39] *** Smiley has joined #archiveteam-bs [10:49] *** primus104 has quit IRC (Leaving.) 
[10:57] yipdw: yipyipyip [11:00] yipdw: seriously though, when you're here: you may want to look into Web Components (https://www.youtube.com/watch?v=fqULJBBEVQE&feature=youtu.be) and Polymer (https://www.polymer-project.org/) [11:54] *** BlueMax has quit IRC (Quit: Leaving) [12:28] *** schbirid has quit IRC (Read error: Operation timed out) [12:29] *** schbirid has joined #archiveteam-bs [13:09] Hmm, I had another insight on history, related to computer history. One thing might be the very original thing, but it might have absolutely nothing to do with how a thing progressed. [13:09] So you could say, "We made this first," but another party can very honestly say, "We never heard of you." [14:16] *** Lord_Nigh has quit IRC (Read error: Operation timed out) [14:18] *** StartAway is now known as Start [14:20] *** Lord_Nigh has joined #archiveteam-bs [14:25] *** primus104 has joined #archiveteam-bs [15:22] *** rejon_ has quit IRC (Ping timeout: 480 seconds) [15:42] *** Start has quit IRC (Ping timeout: 615 seconds) [15:46] *** mistym has joined #archiveteam-bs [15:49] *** mistym has quit IRC (Remote host closed the connection) [15:59] oh holy fuck, new opera has no bookmark folders [15:59] there goes years of work [16:05] *** primus104 has quit IRC (Leaving.) [16:07] w0rp: also known as "why patents are fucking nonsense" [16:13] god i hate icon only gui [16:13] everything is terrible [16:13] someone launch the nukes [16:14] *** ben__ has quit IRC (Read error: Operation timed out) [16:14] *** Start has joined #archiveteam-bs [16:18] I hate this Mr Cranky guy [16:19] I had a discussion soe time ago on the forums of IA with him and he just made me explode. 
[16:19] https://archive.org/post/1027171/genealogy [16:20] *** ionpulse_ is now known as ionpulse [16:20] https://archive.org/post/1019321/jeff-save-us-from-the-tweakers-please [16:22] https://archive.org/post/1019197/whats-with-tweakers [16:22] https://archive.org/post/1019249/jeff-kaplan-please-help-tweakers-invasion [16:23] I did understand their first concerns, but the way they talked to me after that. [16:23] saying I was some computer generated robot and when I got a bit angry stating there might be something human in me [16:25] is that the banned kid? [16:26] I don't think that [16:26] he was waaay before that [16:26] that Mr Cranky, was way before the kid [16:26] *** ben__ has joined #archiveteam-bs [16:35] *** aaaaaaaaa has joined #archiveteam-bs [16:37] schbirid: its a shame too because old opera was pretty rad, if only they could open source it or something. [16:37] yeah [16:38] the only piece of closed source i knowingly loved [16:48] That said I'm too stuck in my Firefox ways to ever switch at this point, I don't know if I could live without tree style tabs any more. [16:51] *** sep332 has left [16:51] *** sep332 has joined #archiveteam-bs [16:56] https://twitter.com/ryanprior/status/540905020984553472 [16:56] Anyone grabbing notable vines right now? [16:56] or any Vines? [17:02] i've backed up jontron and both ashens accounts [17:02] haven't got around to uploading them to IA yet [17:08] *** abartov has joined #archiveteam-bs [17:31] *** mistym has joined #archiveteam-bs [17:47] *** Start_ has joined #archiveteam-bs [17:47] *** Start has quit IRC (Read error: Connection reset by peer) [17:47] *** Start_ is now known as Start [17:48] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds) [17:52] arkiver: Please don't engage the people in the forums. 
[17:52] SketchCow: 04 Dec 12:59Z tell SketchCow I forgot to ask last night, does the wayback crawler have any ignores besides the robots.txt so we should keep a lookout for [17:52] arb735: The wayback does not have any except possibly some legal thing even I won't be privvy to. [17:54] ---------------------------------------------- [17:54] Don't engage people in the IA Forums. [17:54] They don't understand archive team, and they [17:54] are the crankiest, accusingest people [17:54] in the entire world. You got me. You don't [17:54] need them. [17:54] ---------------------------------------------- [17:55] that should be on the forums. "you guys are batshit crazy and dont like the rest of the world" [17:56] No. [17:56] NO. [17:56] The forums are a bunch of very intense, very unpleasant people who all secretly want to work for the archive. [17:57] And when anything upsets what they think is a special, good, don't-tell-anyone thing, they flip out. [17:57] They worship my co-worker Jeff because he's the ONLY employee who talks to them. [17:57] Do a search for "jscott" in there - nobody knows about me and they don't talk about me. [17:58] lol, you probably want to keep it like that [17:58] *** Lord_Nigh has joined #archiveteam-bs [17:59] I DO. [17:59] Which is why I don't want archive team members, which are literally these guys' living NIGHTMARE, engaging and debating with them. [17:59] haha [17:59] Software has a forum, which was there before me. I never read it. [18:00] Every time I'm asked about a major collection like arcade or CDs getting a forum, I go "nooooooo" [18:01] hm i might search for archiveteam on the forums [18:01] i wonder what happens to my brain [18:03] It just makes me so annoyed to see people interacting with these folks. [18:04] I watched them go after one of us when he was uploading a bunch of materials. [18:04] oh i wont interact, after reading this single line: "Those uploads of a few years ago were at least done by humans. 
If you examine the time stamps on the tweaker files (All Files :HTTPS) you will notice the minute by minute upload timing that occurs day and night. " [18:04] And they didn't SPECULATE, they ASSUMED AND STARTED SCREAMING that he was OBVIOUSLY an agent of some television network, running a false flag of uploading piles of some show so they could rush in and shut the Internet Archive down. [18:05] Not to come off as whatever, but I really changed some things at the archive [18:05] *** ben__ has quit IRC (Remote host closed the connection) [18:05] I would hope any employee coming into some place and spending some years there would have some influence [18:05] But I'm jammed their intake like crazy, between archive team and helping people be able to add new material. [18:06] This is why I did this 180 and my current priority is cleaning up the archive's stacks so that junk and unclassified good stuff isn't in some corner. [18:06] When I started on Monday, "unfiled texts" had 560,000 items. [18:06] It currently has 330,000. [18:06] thats impressive [18:06] That's one week. [18:07] *** ohhdemgir has quit IRC (Ping timeout: 272 seconds) [18:07] The key isn't just that I'm doing this cleanup, it's that I'm working with an internal tool that's being used for semi-effective spam control and I'm going to help make it able to classify and push various incoming large sets. [18:07] And folkscanomy made a difference too. [18:07] I found a guy who had uploaded 4,000 books and pamplets. [18:07] Crazy, nasty stuff, too - anti-whatever you'd like. [18:07] also, you probably used some scripting. dont post it on the forum or you will be shot. [18:08] human intervention only and stuff [18:08] I don't go on the forum. [18:08] ;) [18:08] just pulling your leg [18:08] I really really avoid it. [18:08] Yeah, don't do that. 
[18:08] Someone tells you about a car accident that killed their family, don't occasionally go "honk honk" [18:08] That thing is a horrorshow and I don't like having archive team members engage them on there. [18:09] Because then they accuse, apparently arkiver took the bait, and then it's me and Jeff talking. [18:11] *** antomati_ has joined #archiveteam-bs [18:12] ---- [18:13] Anyway, I'm going to pass something to you nerds which you probably have, but which I found very useful. [18:13] I needed treemapping. Treemapping works with someone like me who works visually a lot better than numerically. [18:13] *** rduser has quit IRC (Ping timeout: 370 seconds) [18:13] https://developers.google.com/chart/interactive/docs/gallery/treemap [18:13] I discovered google has a graphing api, and you can just take their code and shove in anything. That's how I made my contribution treemap. [18:15] hm thats cool, so you could create a treemap of any collection? [18:15] *** antomatic has quit IRC (Ping timeout: 370 seconds) [18:15] *** sirkov has quit IRC (Ping timeout: 370 seconds) [18:15] *** SadDM has quit IRC (Ping timeout: 370 seconds) [18:17] And I did. [18:17] I can now see, in the open bins, who has contributed a ton of files, and how much of the total space they take. [18:19] *** rduser has joined #archiveteam-bs [18:20] For example, I now know the largest uploader in open, unsorted, uploaded 4,300 Romanian items. [18:21] I found another one who did nothing but 3,000 Urdu-ish items. [18:21] *** sirkov has joined #archiveteam-bs [18:22] finding the diamonds in the rough? :) [18:22] Diamonds in the dogshit, yes. 
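As a rough illustration of the contribution treemap discussed above: Google's TreeMap chart takes rows of node, parent and size, with a single root row whose parent is null. The sketch below tallies items per uploader and emits rows in that shape as JSON. The function name `treemap_rows`, the root label and the placeholder uploader names are assumptions; only the 4,300 and 3,000 item counts come from the log, and this is not the script that was actually used.

```python
import json
from collections import Counter
from typing import Iterable, List


def treemap_rows(uploaders: Iterable[str], root: str = "Unsorted uploads") -> List[list]:
    """Turn one uploader name per item into header + root + leaf rows."""
    counts = Counter(uploaders)
    # Header row, then the root node; the root's parent serialises to JSON null.
    rows: List[list] = [["Uploader", "Parent", "Items"], [root, None, 0]]
    # Largest contributors first. Node names must be unique in the chart,
    # so an uploader literally named like the root would need renaming.
    for uploader, item_count in counts.most_common():
        rows.append([uploader, root, item_count])
    return rows


if __name__ == "__main__":
    # Placeholder names; the two large counts echo the figures in the log.
    sample = ["uploader_a"] * 4300 + ["uploader_b"] * 3000 + ["uploader_c"] * 12
    print(json.dumps(treemap_rows(sample)))
```

The printed JSON array can be pasted in place of the sample data passed to `arrayToDataTable` in the example on the chart documentation page linked above.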
[18:22] heh [18:25] *** SadDM has joined #archiveteam-bs [18:28] there is also a d3 treemap block that is fairly easy to hack for other data [18:28] kdirstat (or was it something else, some KDE thing) promised some import/export of treemaps but it never worked for me [18:31] *** primus104 has joined #archiveteam-bs [18:39] *** abartov has quit IRC (Ping timeout: 258 seconds) [18:42] 560,000 is already down a lot, I thought it was over 1 million before, guess spamsweep really did a number on it [18:43] *** primus104 has quit IRC (Leaving.) [18:44] *** wm_ has joined #archiveteam-bs [18:44] *** ohhdemgir has joined #archiveteam-bs [18:46] *** Start has quit IRC (Read error: Operation timed out) [18:46] my personal favourite ia forums guy is the one who was very upset that people keep uploading videos in arabic which he, personally, does not speak or understand [18:47] *** ete_ has joined #archiveteam-bs [18:49] So today's project is adding tables of contents to the descriptions of issues of Weird Tales. [18:50] Every once in a while you see a really wild name... like this: https://archive.org/stream/wt_1937_10#page/n99/mode/2up [18:50] I imagine that when he wasn't writing for the pulps, Mr Wellman was out in the bush wrestling bears. [18:52] Also, TIL that the author of Psycho (Robert Bloch) wrote for the pulps early in his career. 
[19:00] *** abartov has joined #archiveteam-bs [19:02] *** bsmith093 has joined #archiveteam-bs [19:04] *** mistym_ has joined #archiveteam-bs [19:05] *** ^NazguL^ has joined #archiveteam-bs [19:06] *** mistym has quit IRC (Read error: Operation timed out) [19:12] *** mistym_ has quit IRC (Ping timeout: 272 seconds) [19:16] *** Becky12 has joined #archiveteam-bs [19:17] *** mistym has joined #archiveteam-bs [19:17] *** ^NazguL^ has quit IRC () [19:17] *** Becky12 has quit IRC (Read error: Connection reset by peer) [19:17] *** mistym has quit IRC (Remote host closed the connection) [19:26] *** mistym has joined #archiveteam-bs [19:31] arb735: From what i've seen, Wayback does have ignores for websites that have made requests to be blocked, as well as those that have had legal threats made against them (e.g. xenu.net) [19:36] i hope your not actually deleting those things with the spam removal tool? (the crazy pamphlets, and whatever) [19:37] *** abartov has quit IRC (Ping timeout: 258 seconds) [19:46] Why would we delete crazy pamphlets [19:51] political correctness or whatevre [19:51] ha ha ha [19:51] Is kyan new [19:51] i've run across the attitude that, like, hate speech should be destroyed or wahtever [19:51] ARE YOU NEW [19:51] ish [19:51] We don't even delete SPAM [19:51] shown up on and off for a year or so [19:51] We dark it [19:51] cool! *so cool* [19:51] Nothing leaves the archive, not a bit [19:52] <3 [19:52] whoa http://bellard.org/bpg/ [19:52] The only issue, which I'm addressing is 1. Getting "rid" of spam that tries to hide itself or which due to some of our search limitations, we don't find. [19:53] And 2. Making it so our big "unsorted" bin is quickly and easily semi-classified into smaller, manageable bins [19:53] nice :) the opensoruce collections are a little, umm, messy [19:54] It was 500,000 items on monday [19:54] It's friday and I got it to 360,000 [19:54] The laser's aimed at this problem. 
By 2015 you won't remember when it was this bad [19:56] Like those scenes in movies where some kid will complain there is 250 items in it and the old guy will say "you think thats bad? I remember when..." [19:56] *** Start has joined #archiveteam-bs [19:59] *** abartov has joined #archiveteam-bs [20:02] Well, we'll focus on other things. [20:02] I'm just going to keep improving the shitpile in 2015. [20:03] This is my primary goal at the archive for the year, I got this cleared, and I will just brutalize it [20:04] *** mistym_ has joined #archiveteam-bs [20:06] *** mistym has quit IRC (Read error: Operation timed out) [20:15] *** mistym_ has quit IRC (Ping timeout: 492 seconds) [20:27] *** Start_ has joined #archiveteam-bs [20:27] *** Start has quit IRC (Read error: Connection reset by peer) [20:31] *** Start has joined #archiveteam-bs [20:31] *** Start_ has quit IRC (Read error: Connection reset by peer) [20:34] *** bauruine has quit IRC (Ping timeout: 265 seconds) [20:38] *** primus104 has joined #archiveteam-bs [20:42] *** bauruine has joined #archiveteam-bs [20:50] *** bauruine has quit IRC (Read error: Connection reset by peer) [20:51] *** bauruine has joined #archiveteam-bs [21:15] *** ete_ has quit IRC (Ping timeout: 265 seconds) [21:23] *** Start has quit IRC (Read error: Connection reset by peer) [21:23] *** antomati_ is now known as antomatic [21:27] *** Start has joined #archiveteam-bs [21:34] *** Lord_Nigh has quit IRC (Read error: Operation timed out) [21:36] *** Lord_Nigh has joined #archiveteam-bs [21:44] *** BlueMaxim has joined #archiveteam-bs [21:44] *** bauruine has quit IRC (Read error: Connection reset by peer) [21:47] *** bauruine has joined #archiveteam-bs [22:16] *** Aranje has joined #archiveteam-bs [22:21] *** Start has quit IRC (Read error: Connection reset by peer) [22:23] *** Start has joined #archiveteam-bs [22:33] *** primus has quit IRC (Remote host closed the connection) [22:37] *** Start has quit IRC (Ping timeout: 370 seconds) 
[22:39] *** primus has joined #archiveteam-bs [22:40] hmmm [22:40] how did we access twitch streams again anyone? [22:41] *** abartov has quit IRC (Ping timeout: 258 seconds) [22:41] I think there was a custom tool or something [22:42] since it was a warrior project, the code should be available somewhere [22:46] yup hgot that [22:46] *** mistym has joined #archiveteam-bs [22:58] *** schbirid has quit IRC (Leaving) [23:15] i at least try my best to keep my shitpie neat [23:15] with tons of metadata and stuff [23:15] shitpie? [23:16] my collections of stuff i upload [23:16] oh [23:17] 2008 collection of radionz nine to noon mp3s is almost done [23:22] *** mistym_ has joined #archiveteam-bs [23:24] *** Start has joined #archiveteam-bs [23:29] When files are deleted from IA via the item editor, are they deleted deleted, or hidden deleted? [23:30] Nothing gets deleted [23:30] godane: We fixed your shitpile by giving you a collection for an inbox. [23:32] *** mistym has quit IRC (Read error: Operation timed out) [23:35] *** aaaaaaaaa has quit IRC (Leaving) [23:41] i know [23:42] btw i got access to asiatorrents.me [23:43] i'm grabbing 1 thur 38406 subtitle files [23:45] *** slash` has quit IRC (Ping timeout: 615 seconds) [23:50] *** dashcloud has quit IRC (Read error: Operation timed out) [23:51] *** slash` has joined #archiveteam-bs [23:53] *** dashcloud has joined #archiveteam-bs