[00:00] * JesseW wishes chfoo's logs left out join/quit messages...
[00:00] they can be useful
[00:00] but also annoying
[00:01] One case from about a year ago completely took me by surprise, and I only learned this week that they made an announcement just 11 days in advance
[00:40] *** Honno has quit IRC (Read error: Operation timed out)
[00:51] *** BlueMaxim has joined #archiveteam-bs
[00:56] *** chazchaz has quit IRC (Read error: Operation timed out)
[00:58] *** chazchaz has joined #archiveteam-bs
[01:09] *** Stiletto has joined #archiveteam-bs
[01:09] *** schbirid2 has joined #archiveteam-bs
[01:10] *** Stiletto has quit IRC (Client Quit)
[01:10] *** Stiletto has joined #archiveteam-bs
[01:15] *** godane has quit IRC (Quit: Leaving.)
[01:18] *** username1 has quit IRC (Read error: Operation timed out)
[01:46] is there an easy way to extract all of the urls from a sitemap that have modified dates that match certain search parameters?
[01:46] Like say, cases that were last modified after 2010?
[01:49] kinda. i want to get down to the specific month, and i will probably need to only grab the ones modified before a certain date and after another date.
[01:49] *specific month and day
[01:58] Hm...what language are you coding with?
[01:58] I know Javascript can work with dates. Don't know about Python or Lua
[02:09] !ig 52c4ayoflu10jowlsf4uq60tu ^https?://www\.portalgraphics\.net/pg/
[02:14] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[02:18] *** DoomTay has joined #archiveteam-bs
[02:22] *** DoomTay_ has joined #archiveteam-bs
[02:23] *** DoomTay_ has left
[02:24] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[02:25] *** DoomTay has joined #archiveteam-bs
[02:36] *** chfoo has quit IRC (Quit: LoveChatot)
[02:46] *** Famicoman has quit IRC (Killed (hub.se (Nick collision (new))))
[02:46] *** Aranje has joined #archiveteam-bs
[02:46] *** HCross2 has joined #archiveteam-bs
[02:46] *** Sanqui has joined #archiveteam-bs
[02:46] *** sigkell has joined #archiveteam-bs
[02:46] *** JSharp___ has joined #archiveteam-bs
[02:46] *** Muad-Dib has joined #archiveteam-bs
[02:46] *** deathy has joined #archiveteam-bs
[02:46] *** johtso has joined #archiveteam-bs
[02:46] *** zhongfu has joined #archiveteam-bs
[02:46] *** Rickster has joined #archiveteam-bs
[02:46] *** Meroje has joined #archiveteam-bs
[02:46] *** DFJustin has joined #archiveteam-bs
[02:46] *** Famicoman has joined #archiveteam-bs
[02:46] *** Boltsie has joined #archiveteam-bs
[02:46] *** lesderid has joined #archiveteam-bs
[02:46] *** SN4T14 has joined #archiveteam-bs
[02:46] *** ItsYoda has joined #archiveteam-bs
[02:46] *** _desu___ has joined #archiveteam-bs
[02:46] *** BartoCH has joined #archiveteam-bs
[02:46] *** Atluxity has joined #archiveteam-bs
[02:46] *** FalconK has joined #archiveteam-bs
[02:46] *** Ctrl-S___ has joined #archiveteam-bs
[02:46] *** r3c0d3x has joined #archiveteam-bs
[02:46] *** Jeroen52 has joined #archiveteam-bs
[02:46] *** swebb sets mode: +o DFJustin
[02:47] *** Famicoma1 has joined #archiveteam-bs
[02:53] * JesseW is ... unimpressed ... at the number of big Internet companies who have used robots.txt to /block/ the Wayback Machine from archiving the pages where they list the public keys to be used to contact their own network security teams...
[02:57] *** chfoo has joined #archiveteam-bs
[03:12] The cleaning out of the FOS drives is actually progressing.
[03:12] So much I'm murdering the queue. #feelsgoodman
[03:12] I'm in some sort of hellish chiptunes directory
[03:13] And of course Youtube crap aplenty.
[03:13] But a directory that was at one point 1.8tb is now 106gb
[03:15] clap clap clap!
[03:16] Nice
[03:35] better clap than clop, or even the clap
[03:43] *** Meroje has quit IRC (Quit: bye!)
[04:06] *** Meroje has joined #archiveteam-bs
[04:13] *** Stiletto has quit IRC (Ping timeout: 501 seconds)
[04:20] *** vitzli has joined #archiveteam-bs
[04:24] DoomTay: I'm not coding with any language at the moment. :/
[04:24] Oh
[04:25] Because I think the process would be straight forward
[04:26] You iterate through the sitemap's list, check the lastmod property and if it matches what you're looking for, add it to the array and when you're done, return that array
[04:26] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:27] The only experience I have with coding is a little bit of python.
[04:27] At least I'm pretty sure that's it...
[04:27] Well... And HTML, but that isn't exactly programming.
[04:32] *** Sk1d has joined #archiveteam-bs
[05:15] *** Stiletto has joined #archiveteam-bs
[05:36] *** mutoso has joined #archiveteam-bs
[05:49] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:05] damn, I just realized another casualty of the war on Flash is http://techno.org/electronic-music-guide/
[06:05] :(
[06:13] *** zhongfu has quit IRC (Remote host closed the connection)
[06:32] oh great now I'm sucked into that vortex again
[06:33] :D
[06:34] What, simply because it's Flash-based?
[06:34] I mean it being a casualty
[06:40] *** DoomTay has quit IRC (Quit: Page closed)
[06:55] *** zhongfu has joined #archiveteam-bs
[07:20] *** zhongfu has quit IRC (Remote host closed the connection)
[07:24] *** zhongfu has joined #archiveteam-bs
[07:24] *** zhongfu has quit IRC (Remote host closed the connection)
[07:25] How common is censored internet in the US?
[07:25] *** zhongfu has joined #archiveteam-bs
[07:25] Because I think mine might be censored... I'm not sure how or why though.
[07:34] I can't access the KKK website
[07:36] I don't support them at all, I was just telling someone that they still have a website and wanted to show it to them to prove it. I had to go through Archive.org. :P
[07:40] *** brayden has quit IRC (Quit: Leaving)
[07:42] *** tomwsmf has quit IRC (Read error: Operation timed out)
[07:46] *** brayden has joined #archiveteam-bs
[07:46] *** swebb sets mode: +o brayden
[08:17] *** fusl has quit IRC (Read error: Connection reset by peer)
[08:18] *** fusl has joined #archiveteam-bs
[08:34] *** vitzli has quit IRC (Leaving)
[08:40] *** Honno has joined #archiveteam-bs
[09:03] *** brayden has quit IRC (Quit: Leaving)
[09:05] uploading 130 gigabytes of processed global land elevation data yes/no?
[09:05] i made it by converting some other data
[09:24] *** brayden has joined #archiveteam-bs
[09:24] *** swebb sets mode: +o brayden
[09:44] *** brayden has quit IRC (Quit: Leaving)
[09:47] *** brayden has joined #archiveteam-bs
[09:47] *** swebb sets mode: +o brayden
[10:05] *** brayden_ has joined #archiveteam-bs
[10:05] *** swebb sets mode: +o brayden_
[10:05] *** brayden has quit IRC (Read error: Connection reset by peer)
[11:30] *** jspiros has quit IRC (Read error: Operation timed out)
[11:31] *** jspiros has joined #archiveteam-bs
[12:00] *** godane has joined #archiveteam-bs
[12:04] hey all i'm back
[12:04] i turned off my computer last night cause of thunder storms
[12:04] also i have past the 810k items mark
[12:18] *** Igloo_ is now known as Igloo^
[12:46] quote from a static site theme: "Static sites are faster, the load time of this blog as tested on Pingdom is 540ms. "
[12:46] lol
[13:00] and then you've https://forum.dlang.org/ which is still something dynamic and loads at an astonishing speed.
[13:03] i'm grabbing gizmodo sitemap urls for 2016-06 and 2016-07
[13:09] *** davidar has joined #archiveteam-bs
[13:11] BartoCH: yeah, any idea if their tricks are listed somewhere?
[13:12] heh, their http server is https://github.com/CyberShadow/ae
[13:22] oh hey I know CyberShadow
[13:22] weird to see his name come up here
[13:23] heh, well it's definitely faster when your website is machine code instead of some lazy interpreted php :-)
[13:27] i thought they used the vibe.d library as a web framework
[13:29] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:30] *** VADemon has joined #archiveteam-bs
[14:43] *** whydomain has joined #archiveteam-bs
[14:49] arkiver: Yahoo! Answers discovery completed, 26.5M items. What’s the script status?
[14:50] *** Sanqui has left .
[15:00] I'm trying to use youtube-dl to download a Bambuster video, but it fails with 'Unsupported URL'.
[15:01] The url is: https://embed.bambuser.com/broadcast/6396210
[15:01] What should I change it to?
[15:05] i'm starting to upload more gizmodo.com sitemaps
[15:05] Oh wait, I've found the video somewhere else, doesn't matter.
[15:19] *** JesseW has joined #archiveteam-bs
[16:15] *** whydomain has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[16:15] https://archive.org/details/gizmodo.com-sitemap-2016-06-20160814
[16:15] https://archive.org/details/gizmodo.com-sitemap-2016-07-20160814
[16:50] *** RichardG has quit IRC (Read error: Connection reset by peer)
[16:50] *** RichardG has joined #archiveteam-bs
[17:05] this seems to be a dodgy mirror of KAT: http://fivetorrent.com/full/ (note: the download links are fakes, only the magnet links probably work)
[17:12] so i found sitemap urls for sbs.com.au
[17:12] i think
[17:13] based on the pages there are at least 1000 urls per a page
[17:13] and there is over 40 of them
[17:13] or just 40 of them
[17:17] so that sitemap maybe total crap
[17:18] i got urls from 2007-08 by me brute forcing the news/node urls
[17:18] there are no urls from 2007-08 in the sitemap grab
[17:44] *** Sanqui has joined #archiveteam-bs
[18:14] *** tomwsmf has joined #archiveteam-bs
[19:21] *** VADemon has quit IRC (Read error: Connection reset by peer)
[19:21] *** VADemon has joined #archiveteam-bs
[19:51] *** JesseW has quit IRC (Read error: Operation timed out)
[20:00] *** Coderjoe has quit IRC (Read error: Operation timed out)
[20:03] *** schbirid2 has quit IRC (Quit: Leaving)
[20:05] *** Coderjoe has joined #archiveteam-bs
[20:12] *** DoomTay has joined #archiveteam-bs
[20:17] *** Coderjoe has quit IRC (Read error: Operation timed out)
[20:17] yipdw_: shouldn't Be hard to extract resources
[20:17] menus might be work, tho
[20:18] wait a sec, that one dynamically does audio, does it?
[20:18] ie doesn't store everything in one big swf
[20:21] *** Coderjoe has joined #archiveteam-bs
[20:26] ranma: yes, the menus are a very important part of it
[20:26] and yes, audio samples are loaded in as requested
[20:32] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[20:32] the resources aren't necessarily the problem. it's reconstructing the whole thing in something trendy like CSS animations + Javascript
[20:32] although I did manage to get an FLA via jpexs so maybe Animate CC can take care of that automatically
[20:44] *** Coderjoe has joined #archiveteam-bs
[21:13] *** JesseW has joined #archiveteam-bs
[21:56] SketchCow: you may want that guys 1000+ floppies: https://twitter.com/ilopmar/status/763046762198212608?s=09
[22:16] godane: what's on them?
[22:17] have no idea
[22:18] i'm just throwing it out there
[22:18] "a lot of games and programs" https://twitter.com/ilopmar/status/763078006063628289
[22:18] How specific
[22:19] That said, I can still kina picture Jason replying to him being like http://i.imgur.com/lhjhbB9.gif
[22:21] i figure at worse you end up with floppies that can't be read from or Jason already has them all uploaded
[22:25] *** DoomTay has quit IRC (Quit: Page closed)
[22:45] *** Honno has quit IRC (Read error: Operation timed out)
[23:01] I feel no need to play that fuckstick's social media ransom game.
[23:01] Fuck him and the horse he rode in on using a specialized fuckhorse/man combo
[23:02] * SketchCow is still emptying out the drive, by the way. We're down to the real "oh fuck this" stuff where people just literally dumped this side of "I'm way too lazy to do this" stuff./
[23:03] Hence I've "only" gone down to 75gb from yesterday's 132gb, after getting to 132gb from 1.8tb the previous day
[23:08] Also, I had some guy who gave me a pile of stuff
[23:08] I put it up best I could, except some
[23:09] he goes "YOU'RE DOING IT WRONG!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
[23:09] I stop
[23:09] Guess what
[23:09] Started again
[23:09] I'm sure he and I can have a discussion 1,000 years from now
[23:12] Also, some of these are going in with some pretty bad auto-metadata
[23:12] I'll have to write code to deal with already-up things to make them "better"
[23:12] and then give them to people, I think.
[23:16] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[23:17] congrads for getting 57gb more up.
[23:18] *** BartoCH has joined #archiveteam-bs
[23:22] I made the decision that items up on the Archive have a slightly greater chance of later repair than sitting in the hard drive of one overtaxed individual.
[23:23] That's a touch-and-go declaration
[23:26] :-P
[23:31] my stuff is uploaded to FOS cause i'm always lazy with the big files
[23:32] also not want IA to tell me it didn't upload after 8+ hours
[23:33] Well, as I'm sure I made clear, lots of stuff has gone into the Archive just fine with the FOS
[23:33] Tons. Tons and tons and tons.
[23:33] And the fact that 1.9tb "fell between the cracks" tells you how much.
[23:34] btw My birthday is on the 16th
[23:34] i turn 31 this year
[23:37] HURRRAH!
[23:37] You precious thing
[23:37] https://archive.org/details/saturdaymorningcartoons?sort=-publicdate
[23:46] https://archive.org/details/vgmuseum?&sort=-publicdate
[23:54] welcome to the 2nd year of your 30s...
[23:55] *** Coderjoe has quit IRC (Read error: Operation timed out)
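The sitemap question from 01:46 (pull out only the URLs whose last-modified date falls between two dates, down to the day) and the approach suggested at 04:26 (iterate over the sitemap's entries, check each `lastmod`, collect the matches, return the list) can be sketched in Python, since that is the language the asker mentioned. This is a minimal sketch, not a tested tool. It assumes a standard sitemaps.org `urlset` file, not a sitemap index, and `lastmod` values that begin with a full `YYYY-MM-DD` date. Entries with a missing or coarser `lastmod` (e.g. just `2016-06`) are skipped. The function name `urls_modified_between` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by standard sitemaps (sitemaps.org schema 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_modified_between(sitemap_xml, after: date, before: date) -> list:
    """Return <loc> URLs whose <lastmod> date is strictly after `after`
    and strictly before `before`.

    Hypothetical helper. `sitemap_xml` is the sitemap's text (str or bytes).
    Assumes lastmod starts with YYYY-MM-DD; entries without one are skipped.
    """
    root = ET.fromstring(sitemap_xml)
    matches = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc or len(lastmod) < 10:
            continue  # no URL, or no day-precision lastmod to compare
        try:
            # Keep only the YYYY-MM-DD part; any time/timezone suffix is ignored.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # malformed date
        if after < modified < before:
            matches.append(loc)
    return matches


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/a</loc><lastmod>2016-06-15T08:00:00+00:00</lastmod></url>
      <url><loc>https://example.com/b</loc><lastmod>2009-12-31</lastmod></url>
      <url><loc>https://example.com/c</loc></url>
    </urlset>"""
    # Everything modified during June 2016:
    print(urls_modified_between(sample, date(2016, 5, 31), date(2016, 7, 1)))
    # ['https://example.com/a']
```

Both bounds are exclusive, so to get a whole month you pass the last day of the previous month and the first day of the next one. If the file turns out to be a sitemap index (`<sitemapindex>` with `<sitemap>` children), this returns an empty list. Each child sitemap it lists would need to be fetched and passed through the same function.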