[00:16] dashcloud: does tripod have such a thing?
[00:21] *** GLaDOS has quit IRC (Ping timeout: 272 seconds)
[00:26] *** cf has joined #archiveteam
[00:30] *** LordNigh2 has joined #archiveteam
[00:33] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[00:33] *** LordNigh2 is now known as Lord_Nigh
[00:35] *** cbb2 has joined #archiveteam
[00:38] *** cbb has quit IRC (Read error: Operation timed out)
[00:42] *** GLaDOS has joined #archiveteam
[00:46] *** cbb2 has quit IRC (cbb2)
[00:48] *** hive-mind has quit IRC (Ping timeout: 272 seconds)
[00:49] *** toad1 has joined #archiveteam
[00:50] *** hive-mind has joined #archiveteam
[00:54] *** toad has quit IRC (Read error: Operation timed out)
[00:57] not that I can see- ask godane though- I think he found the angelfire sitemap files
[01:02] *** pilgrim has joined #archiveteam
[01:03] *** mistym has quit IRC (Remote host closed the connection)
[01:05] http://www.angelfire.com/robots.txt
[01:06] you have to download the sitemap xml.gz files
[01:06] then zcat them
[01:06] but the robots.txt will give you a complete list i hope
[01:06] *** xk_id has joined #archiveteam
[01:07] *** cf_ has joined #archiveteam
[01:12] *** cf has quit IRC (Ping timeout: 633 seconds)
[01:12] *** cf_ is now known as cf
[01:20] *** K4k has joined #archiveteam
[01:25] the tripod sitemap is much less helpful than the angelfire one: http://www.tripod.lycos.com/sitemap_index.xml
[01:28] *** nertzy has joined #archiveteam
[01:29] *** K4k has quit IRC (Ping timeout: 480 seconds)
[01:37] *** Ymgve has quit IRC ()
[01:39] they are actually under members.tripod.com
[01:39] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[01:43] and the archivebot is grabbing them
[01:43] *** LordNigh2 has joined #archiveteam
[01:44] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
[01:45] *** LordNigh2 is now known as Lord_Nigh
[01:48] *** cf_ has joined #archiveteam
[01:54] *** LordNigh2 has joined #archiveteam
[01:56] *** cf has quit IRC (Ping timeout: 633 seconds)
[01:56] *** cf_ is now known as cf
[01:57] *** Start has joined #archiveteam
[01:58] *** primus104 has quit IRC (Leaving.)
[02:01] *** Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
[02:01] *** LordNigh2 is now known as Lord_Nigh
[02:02] arkiver: looks like the highest valid roon blog has changed: https://roon.io/api/v1/blogs/122234
[02:02] arkiver: we should probably scrape everything up to 122300 to be safe
[02:26] *** pilgrim has quit IRC (Read error: Operation timed out)
[02:27] *** pilgrim has joined #archiveteam
[02:42] *** APerti_ has joined #archiveteam
[02:46] *** Sellyme_ has joined #archiveteam
[02:46] *** Sellyme has quit IRC (Read error: Connection reset by peer)
[02:48] *** APerti has quit IRC (Read error: Operation timed out)
[02:54] *** mistym has joined #archiveteam
[02:56] *** Sellyme_ has quit IRC (Read error: No route to host)
[02:57] *** Sellyme has joined #archiveteam
[03:21] *** nertzy has joined #archiveteam
[03:37] tripod has sitemaps? that should greatly ease discovery.
[03:45] SketchCow: Juliacoleratings and Sharonsleeper are spammers and should be banned from the wiki
[03:49] there's at least two kinds of tripod pages/setups: classic style: http://sicexcels.tripod.com/ & modern style: http://members.tripod.com/no_numbers/
[03:52] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[04:06] *** Froggypwn has joined #archiveteam
[04:06] http://techcrunch.com/2014/12/01/microsoft-is-getting-rid-of-clip-art/
[04:07] if i'm not mistaken, isn't there some part of office.microsoft.com that lets you browse through clip art
[04:07] if i'm not mistaken, isn't there some part of office.microsoft.com that lets you browse through clip art?
[04:07] if anyone can find it, let me know
[04:08] if it's still there
[04:09] *** rejon has joined #archiveteam
[04:10] it used to be office.microsoft.com/language setting/images/ IIRC
[04:11] so mine would have been office.microsoft.com/en-US/images/
[04:11] but it used to be some office.com url that I can't remember in earlier versions
[04:17] found it: http://office.microsoft.com/en-us/images/CM079001906.aspx
[04:17] the last two numbers are incrementing
[04:17] starting at http://office.microsoft.com/en-us/images/CM079001901.aspx
[04:18] there are gaps here and there
[04:23] *** mistym has quit IRC (Remote host closed the connection)
[04:37] *** SN4T14 has quit IRC (Ping timeout: 369 seconds)
[04:38] *** chfoo has quit IRC (Ping timeout: 258 seconds)
[04:43] *** chfoo has joined #archiveteam
[04:44] operation: save clippy begins (i guess)
[04:50] i've created a wiki page for microsoft clip art
[04:50] i'll ask arkiver about writing grab scripts in the morning
[04:50] or as soon as possible
[04:51] what should its irc channel be called?
[04:51] two ideas that come to mind are #clipfart and #clippyart
[04:56] *** mistym has joined #archiveteam
[05:01] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[05:02] *** aaaaaaaaa has quit IRC (Leaving)
[05:06] damnit I thought nadella would be mostly a good guy
[05:06] also, I vote #clipfart
[05:06] Start: aren't those from Fotolia?
[05:06] so they're not even MS clip art?
[05:06] some newer ones are
[05:06] there's a ton of older stuff in there
[05:07] like this guy: http://officeimg.vo.msecnd.net/en-us/images/MH900240985.jpg
[05:10] any more votes/ideas for the irc channel name?
[05:10] *** rejon has quit IRC (Read error: Connection reset by peer)
[05:10] #ditchart?
[05:14] anyone else? so far it's between #clipfart and #ditchart
[05:16] *** zenguy_pc has joined #archiveteam
[05:23] *** SN4T14 has joined #archiveteam
[05:24] *** rejon has joined #archiveteam
[05:27] i vote #clipfart
[05:43] *** Start is now known as StartAway
[05:50] *** SN4T14 has quit IRC (Ping timeout: 369 seconds)
[05:52] *** SN4T14 has joined #archiveteam
[06:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:19] *** dashcloud has joined #archiveteam
[06:57] *** REiN^ has joined #archiveteam
[07:07] *** primus104 has joined #archiveteam
[07:28] *** chfoo has quit IRC (Ping timeout: 258 seconds)
[07:36] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
[07:44] *** BiggieJon has quit IRC (Read error: Connection reset by peer)
[07:45] *** BiggieJon has joined #archiveteam
[07:51] *** primus104 has quit IRC (Leaving.)
[07:55] *** BiggieJo1 has joined #archiveteam
[07:56] *** mistym has quit IRC (Remote host closed the connection)
[08:03] *** BiggieJon has quit IRC (Read error: Operation timed out)
[08:14] I vote #clipfart
[08:15] So with all the new websites, we currently need to start grabbing:
[08:15] - ep1c (will be done through the viddy grab)
[08:15] - roon
[08:16] - microsoft clip art
[08:16] If I'm missing something there ^ please let me know
[08:20] In the grab for microsoft clip art I'll grab the whole http://office.microsoft.com/en-us/images/MP900******.aspx range and the images/videos/audios/others that are up there for download
[09:30] *** primus104 has joined #archiveteam
[09:42] *** kris33 has joined #archiveteam
[09:51] *** MMovie1 has joined #archiveteam
[09:53] *** MMovie has quit IRC (Ping timeout: 335 seconds)
[10:09] *** Ymgve has joined #archiveteam
[10:26] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:43] *** Boppen has quit IRC (Read error: Connection reset by peer)
[10:43] *** Boppen has joined #archiveteam
[10:44] *** APerti_ has quit IRC (Ping timeout: 265 seconds)
[11:07] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[11:11] *** primus104 has quit IRC (Leaving.)
[11:20] *** filippo__ has quit IRC (Connection closed for inactivity)
[11:27] *** ex-parrot has quit IRC (Read error: Operation timed out)
[11:28] *** ex-parro1 has quit IRC (Read error: Operation timed out)
[11:34] *** ex-parrot has joined #archiveteam
[11:35] *** ex-parro1 has joined #archiveteam
[11:55] *** schbirid has joined #archiveteam
[12:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[12:28] *** dashcloud has joined #archiveteam
[12:59] *** cf has quit IRC (Quit: cf)
[13:03] *** K4k has joined #archiveteam
[13:40] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:40] *** SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:41] *** rduser has joined #archiveteam
[13:44] *** SadDM has joined #archiveteam
[13:49] *** antomati_ has joined #archiveteam
[13:50] *** antomati_ is now known as antomat2
[13:50] Coming to you live from a moving train.
[13:51] What hath technology wrought.
[13:51] Buh. I have nothing else to say. That is all. :)
[13:51] * antomat2 waves
[13:51] *** antomat2 has quit IRC (Client Quit)
[13:52] *** sankin has joined #archiveteam
[14:03] *** ete_ has joined #archiveteam
[14:09] *** primus104 has joined #archiveteam
[14:45] *** REiN^ has quit IRC (Read error: Connection reset by peer)
[14:53] *** REiN^ has joined #archiveteam
[14:59] *** StartAway is now known as Start
[15:23] *** thechip has joined #archiveteam
[15:32] *** mistym has joined #archiveteam
[15:37] *** Emcy_ has joined #archiveteam
[15:38] *** aaaaaaaaa has joined #archiveteam
[15:40] *** nico_ has joined #archiveteam
[15:40] *** mistym has quit IRC (Remote host closed the connection)
[15:43] *** Kniffy has quit IRC (hub.se irc.swepipe.se)
[15:43] *** Emcy has quit IRC (hub.se irc.swepipe.se)
[15:43] *** nico has quit IRC (hub.se irc.swepipe.se)
[15:43] *** danneh_ has quit IRC (hub.se irc.swepipe.se)
[15:48] *** Start has quit IRC (Ping timeout: 265 seconds)
[16:00] *** mistym has joined #archiveteam
[16:10] *** Kniffy has joined #archiveteam
[16:10] *** danneh_ has joined #archiveteam
[16:22] *** Start has joined #archiveteam
[16:39] *** chfoo has joined #archiveteam
[16:46] *** mistym_ has joined #archiveteam
[16:51] *** primus104 has quit IRC (Leaving.)
[16:52] *** mistym has quit IRC (Ping timeout: 480 seconds)
[16:55] *** Start has quit IRC (Read error: Connection reset by peer)
[16:56] *** Start has joined #archiveteam
[16:57] *** Start__ has joined #archiveteam
[16:57] *** Start has quit IRC (Read error: Connection reset by peer)
[16:57] *** Start__ is now known as Start
[17:09] *** mistym_ has quit IRC (Remote host closed the connection)
[17:14] *** rejon has quit IRC (Ping timeout: 480 seconds)
[17:33] *** signius_ has quit IRC (Read error: Operation timed out)
[17:47] *** signius_ has joined #archiveteam
[17:54] Start: Just figured it out.
[17:57] *** Start has quit IRC (Ping timeout: 633 seconds)
[18:02] Hey hi.
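A minimal sketch of the ID enumeration discussed above for roon and the clip art pages. It only builds candidate URLs and makes no requests. The URL patterns and the 122300 upper bound come from the log; the starting ID of 1, the 01 to 99 range for the last two digits, and the function names are assumptions for illustration. Gaps in the ranges (mentioned at [04:18]) would have to be handled by whatever fetches these URLs.

```python
from typing import Iterator

# URL patterns taken from the log.
ROON_API = "https://roon.io/api/v1/blogs/{id}"
CLIPART_PAGE = "http://office.microsoft.com/en-us/images/CM0790019{n:02d}.aspx"


def roon_blog_urls(first: int = 1, last: int = 122300) -> Iterator[str]:
    """Yield one API URL per candidate blog ID.

    Assumption: IDs are plain integers starting at 1. The default upper
    bound is the "to be safe" value suggested in the log.
    """
    for blog_id in range(first, last + 1):
        yield ROON_API.format(id=blog_id)


def clipart_page_urls(first: int = 1, last: int = 99) -> Iterator[str]:
    """Yield candidate clip art page URLs by varying the last two digits.

    Assumption: the two-digit suffix runs from 01 to 99; some of these
    pages may not exist.
    """
    for n in range(first, last + 1):
        yield CLIPART_PAGE.format(n=n)


if __name__ == "__main__":
    print(next(roon_blog_urls()))
    print(next(clipart_page_urls()))
```

Generators keep memory use flat even for the full roon range, and the bounds are parameters so the range can be widened if the highest valid ID moves again.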
[18:02] -----------------------------------------------
[18:02] archive.org is putting up wikimedia-like banners
[18:02] test and give feedback when you feel like it
[18:02] -----------------------------------------------
[18:07] too much text, too tiny, banner blindness orange, where can i opt out?
[18:08] why does IA need money suddenly?
[18:08] what is the goal?
[18:12] schbirid: to pay for storing more data?
[18:12] "why does IA need money suddenly", as terabytes of twitpic are slammed onto s3.us.archive.org
[18:13] (shout-out: we're trying to get #aohell going better but need people experienced with protocol reverse engineering)
[18:13] i know it can put any money to good use but i did not get the impression that money is _needed_ and that there is a set sum (75$ of "everyone") needs to be funded
[18:13] orange on black doesn't contrast enough for me
[18:13] and that peach on orange doesn't either.
[18:15] but it is nicely written and much better than wikipedia's version
[18:20] (what i am saying is: people might like to know the target and cause)
[18:24] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[18:25] *** dashcloud has joined #archiveteam
[18:26] *** ete_ has quit IRC (Ping timeout: 265 seconds)
[18:28] SketchCow: ah, luckily you mean the OLD wikimedia-like banners
[18:28] Because the new ones take an entire 1024x768 screen and look like an obituary, see https://commons.wikimedia.org/wiki/Category:Fundraising_2014
[18:29] Oh good point. One thing I don't like about wikipedia's is that they don't really say what or how much they want. Plus, I don't like it when I can't easily figure out the underlying financial picture.
[18:31] *** nico_ is now known as nico
[18:37] *** Aranje has quit IRC (Read error: Connection reset by peer)
[18:37] *** Aranje has joined #archiveteam
[18:38] Oh, any amount is good. It all goes to wise investors in New York
[18:38] https://wikimediafoundation.org/w/index.php?oldid=100396 says 20 M$ for the "English users in English countries" december ride
[18:40] 13:17 < schbirid> why does IA need money suddenly?
[18:40] So, you've not been in here..... for the past 5 years
[18:40] IA loses money basically every single year.
[18:40] oh poop, i thought it was all super duper funded :(
[18:40] It has a very nice rich guy supporting it
[18:41] But it is definitely not super duper funded and we definitely don't have an endowment yet.
[18:41] They really should link to their 990 or audited financial statements, or maybe they do but I can't find it.
[18:42] I'm definitely surprised. I thought the columns and triangle logo was a symbol of the oil platform in the SF sea owned by IA. isn't it?
[18:42] Oh, maybe it's because the oil price is dropping
[18:42] Evil arabs
[18:43] the new Library of Alexandria must be playing the long game now.
[18:44] The Institute of Museum and Library Services (http://www.imls.gov/) issues grants, does IA take advantage of that?
[18:44] The oil platform is where we keep all the servers with the archiveteam downloads.
[18:45] *** commentat has joined #archiveteam
[18:46] About to go out and buy more crates and more plastic bags, because that's my life now.
[18:47] no supermarkets near the oil platform
[18:48] how to archive a site like http://archief.schooltv.nl/wieisdedader/index.jsp
[18:49] it is shutting down @ end this month
[18:49] commentat, warcprox?
[18:49] *** Start has joined #archiveteam
[18:49] can it handle a flash site?
[18:49] cc joepie91_ ^
[18:49] schooltv
[18:50] the site contains all flash items to open other flash items
[18:50] we should grab all of it
[18:50] kyan: Our fundraiser thanks you, and says we've been eyeing grants from them, and watching where they go.
[18:50] cool :)
[18:51] it is part of an educational program but the Dutch "schooltv" has a new site without flash so all flash related items are lost @ end this month
[18:51] maybe more interesting items @ this archief subdomain
[18:56] *** bzc6p_ has joined #archiveteam
[18:56] *** sankin has quit IRC (Leaving.)
[18:57] *** bzc6p_ is now known as bzc6p
[18:57] *** bzc6p has left
[19:09] *** primus104 has joined #archiveteam
[19:14] *** APerti has joined #archiveteam
[19:16] *** mistym has joined #archiveteam
[19:30] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[19:30] *** dashcloud has joined #archiveteam
[19:42] *** Start has quit IRC (Ping timeout: 265 seconds)
[19:56] *** primus104 has quit IRC (Leaving.)
[20:16] *** Start has joined #archiveteam
[20:17] *** Start has quit IRC (Remote host closed the connection)
[20:18] *** Start has joined #archiveteam
[20:22] *** primus104 has joined #archiveteam
[20:23] SketchCow: about that, the funding thing. is the mailinglist completely fixed now?
[20:28] *** K4k has quit IRC (Read error: Operation timed out)
[20:33] *** Start has quit IRC (Ping timeout: 265 seconds)
[20:44] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[20:47] *** dashcloud has joined #archiveteam
[21:28] *** Start has joined #archiveteam
[21:47] *** mistym has quit IRC (Remote host closed the connection)
[22:25] *** Start has quit IRC (Ping timeout: 265 seconds)
[22:27] *** Start has joined #archiveteam
[22:29] *** Ymgve has quit IRC (Ping timeout: 512 seconds)
[22:42] *** commentat has quit IRC ()
[22:42] *** commentat has joined #archiveteam
[22:46] i'm guessing we won
[22:47] i'm guessing we won't be able to save relay
[22:47] i couldn't find any sort of api or any other efficient discovery methods.
[22:51] *** schbirid has quit IRC (Leaving)
[22:52] *** BlueMaxim has joined #archiveteam
[22:58] *** mistym has joined #archiveteam
[23:01] *** dashcloud has quit IRC (Ping timeout: 265 seconds)
[23:05] *** dashcloud has joined #archiveteam
[23:07] *** APerti has quit IRC ()
[23:08] *** cf has joined #archiveteam
[23:11] *** Start has quit IRC (Read error: Connection reset by peer)
[23:13] *** Start has joined #archiveteam
[23:36] *** Start has quit IRC (Ping timeout: 606 seconds)
[23:47] *** khaoohs_ has joined #archiveteam
[23:47] *** khaoohs has quit IRC (Read error: Connection reset by peer)
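For sites that do publish sitemaps, the "download the xml.gz files, then zcat them" step from [01:06] can be sketched roughly as below. This assumes the files follow the standard sitemaps.org schema and are gzip-compressed; the function name and the sample data are hypothetical, and fetching the files listed in robots.txt is left out.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the sitemaps.org protocol for both <urlset> and
# <sitemapindex> documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(gz_bytes: bytes) -> List[str]:
    """Decompress one sitemap .xml.gz payload and return its <loc> values."""
    xml_bytes = gzip.decompress(gz_bytes)  # the "zcat" step
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


if __name__ == "__main__":
    # Hypothetical sample standing in for a downloaded sitemap file.
    sample = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>http://example.com/a/</loc></url>"
        b"<url><loc>http://example.com/b/</loc></url>"
        b"</urlset>"
    )
    for loc in sitemap_locs(gzip.compress(sample)):
        print(loc)
```

The same function works on a sitemap index file, where each `<loc>` points at another sitemap rather than a page, so it can be applied once to the index and again to each file it lists.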