[00:00] *** fenn has joined #archiveteam-bs
[00:04] *** RichardG_ is now known as RichardG
[00:10] *** RichardG_ has joined #archiveteam-bs
[00:11] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[00:27] *** RedType_ has joined #archiveteam-bs
[00:27] *** RedType has quit IRC (Read error: Operation timed out)
[00:36] *** dxrt has joined #archiveteam-bs
[01:31] *** dashcloud has joined #archiveteam-bs
[01:35] *** ndiddy has quit IRC ()
[01:45] *** RichardG_ is now known as RichardG
[02:37] *** Honno has quit IRC (Ping timeout: 370 seconds)
[02:40] *** BlueMaxim has joined #archiveteam-bs
[02:50] i remember finding a torrent with only 50% of the files archived since someone did a partial download, and the rest lost. what i did is i manually found elsewhere on the internet a few of the "lost" files, shoved them in the torrent output directory and did a force rescan, then uploaded those files to the other peers
[02:50] this was some vgm music archive or something
[02:50] so the end result was instead of all peers having 50% of the archive all peers had 68% or something
[02:50] this made me think of something
[02:51] for all the individual files in internet archive, can we calculate the bittorrent hashes for 32k/64k/etc/1mb chunks of said files?
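A minimal sketch of the idea in the message above, not something anyone in the channel posted. It assumes BitTorrent v1, where the metainfo `pieces` field is a concatenation of 20-byte SHA-1 digests, one per `piece length` bytes of data. The function names (`piece_hashes`, `split_pieces_field`, `matching_piece_indices`) are hypothetical. It also assumes a single-file torrent: in multi-file torrents the files are concatenated before hashing, so pieces can straddle file boundaries and per-file hashes only line up when a file happens to start on a piece boundary.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List

SHA1_LEN = 20  # length in bytes of one SHA-1 digest in a v1 "pieces" field


def piece_hashes(stream: BinaryIO, piece_length: int) -> Iterator[bytes]:
    """Yield the SHA-1 digest of each piece_length-sized chunk of stream.

    The final chunk may be shorter, as it is for the last piece of a torrent.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        yield hashlib.sha1(piece).digest()


def split_pieces_field(pieces: bytes) -> List[bytes]:
    """Split a v1 'pieces' byte string into its individual 20-byte digests."""
    if len(pieces) % SHA1_LEN != 0:
        raise ValueError("pieces field length is not a multiple of 20")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


def matching_piece_indices(stream: BinaryIO, pieces: bytes,
                           piece_length: int) -> List[int]:
    """Return indices of pieces whose hash matches the candidate file.

    Assumes a single-file torrent, so piece i of the file is piece i
    of the torrent.
    """
    expected = split_pieces_field(pieces)
    actual = piece_hashes(stream, piece_length)
    return [i for i, (got, want) in enumerate(zip(actual, expected))
            if got == want]


if __name__ == "__main__":
    # Hypothetical demo data standing in for a candidate file and a torrent.
    demo = b"a" * 40000
    field = b"".join(piece_hashes(io.BytesIO(demo), 32 * 1024))
    print(matching_piece_indices(io.BytesIO(demo), field, 32 * 1024))
```

To try several piece sizes (32k, 64k, up to 1mb), the same file would be hashed once per size, since a torrent's hashes only match at the `piece length` it was created with.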
[02:52] i'll bet this could be used to rebuild loads of lost data where all we have is torrent files but no surviving seeds or peers
[03:47] *** johnny4 has joined #archiveteam-bs
[03:54] *** pizzaiolo has left
[04:11] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:17] *** Sk1d has joined #archiveteam-bs
[05:43] so i may have found out congress.gov has searching for old bills
[05:43] going back to mid 80s or 100th congress
[05:52] *** kyounko has quit IRC (Read error: Connection reset by peer)
[06:20] *** GE has joined #archiveteam-bs
[06:57] *** Honno has joined #archiveteam-bs
[08:00] *** xmc sets mode: +o swebb
[08:00] *** swebb sets mode: +o Jonimus
[08:00] *** swebb sets mode: +o antomatic
[08:00] *** swebb sets mode: +o brayden
[08:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:34] *** mutoso has quit IRC (Read error: Operation timed out)
[08:59] *** GE has quit IRC (Remote host closed the connection)
[09:41] *** Honno has quit IRC (Quit: Leaving)
[11:19] *** odemg has quit IRC (Remote host closed the connection)
[11:27] *** kristian_ has joined #archiveteam-bs
[11:37] *** dcmorton has quit IRC (Quit: ZNC - http://znc.in)
[11:42] *** dcmorton has joined #archiveteam-bs
[12:07] For the Dutch people in here, you might be interested in participating in the comments section of this Tweakers.net article about IA and robots.txt: https://tweakers.net/nieuws/123895/internet-archive-wil-robots-punt-txt-negeren-om-accurater-beeld-te-krijgen.html
[12:09] a lot of people seem to be completely ignorant of the fact that robots.txt is not meant as an access control mechanism, or even absolutely outraged about public content being saved
[12:10] the usual debate
[12:10] forgot the quotes: "debate"
[12:18] *** pizzaiolo has joined #archiveteam-bs
[12:19] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:20] *** BlueMaxim has joined #archiveteam-bs
[12:39] Muad-Dib: I think you can add 3 sets of scare quotes to that :)
[12:39] anyway,
seems the discussion is already being had and there's not much I have to add
[12:39] well, that's not true, I do have one comment
[12:39] * joepie91 makes
[12:40] 3 sets of """meme-quotes"""
[12:41] :^)
[12:45] there, replied
[12:46] there was a legitimate "why abandon a thing that's been around since 1994?" question in there :P
[12:47] *** beardicus has quit IRC (Read error: Operation timed out)
[12:58] *** beardicus has joined #archiveteam-bs
[13:17] *** pizzaiolo has quit IRC (Remote host closed the connection)
[13:19] did you link the "suicide note" article as well? I'm feeling seriously tempted.
[13:20] Muad-Dib: I did not
[13:27] *** kristian_ has quit IRC (Quit: Leaving)
[13:57] joepie91: jesus fuck, this subthread: https://tweakers.net/nieuws/123895/internet-archive-wil-robots-punt-txt-negeren-om-accurater-beeld-te-krijgen.html?showReaction=9907277#r_9907277
[14:10] *** pizzaiolo has joined #archiveteam-bs
[14:17] for the non dutch-speaking people here: someone linked the "robots.txt is a suicide note" article in the comment thread and immediately gets greeted with comments that the article is a "rude" and how archiveteam are "unbelievably arrogant people"
[14:18] the article is "rude"*
[14:22] *** btfo has quit IRC (Ping timeout: 600 seconds)
[14:24] rough translation of the remainder of the comment: "So we do not have to go to WikiLeaks anymore, because everything is on their network instead. So what Web sites leak information. They'll just grab it anyway."
[14:24] what the fuck is this guy even hinting at?
[14:25] people are actually comparing the IA to the NSA as well, these people make me sad
[14:25] wikileaks' only function according to that comment is ignoring robots.txt
[14:25] xmc: precisely
[14:25] which, well, that doesn't sound far wrong to me
[14:26] *** mls has quit IRC (Ping timeout: 250 seconds)
[14:28] *** mls has joined #archiveteam-bs
[14:36] Man, if only the NSA hadn't relied on robots.txt, then wikileaks couldn't have gotten those Snowden leaks.
[14:37] JAA: 161Gb so far.
[14:38] *** btfo has joined #archiveteam-bs
[15:03] *** GE has joined #archiveteam-bs
[15:07] Muad-Dib: lol
[15:08] so yeah, the warrior will grab it, but if the robots.txt exists it still won't show up in the wayback machine. Just because we grab all the data doesn't mean it's publicly readable at that moment.
[15:08] tweakers, the home of dutch it morons.
[15:09] it might change, but we don't know, nor should we even care about it.
[15:17] *** BlueMaxim has quit IRC (Quit: Leaving)
[16:19] *** pizzaiol1 has joined #archiveteam-bs
[16:19] *** pizzaiolo has quit IRC (Ping timeout: 260 seconds)
[16:25] *** beardicus has quit IRC (Read error: Operation timed out)
[16:27] *** SketchCow sets mode: +b *!*edsu@*.members.linode.com
[16:27] *** edsu was kicked by SketchCow (edsu)
[16:33] *** beardicus has joined #archiveteam-bs
[16:52] Facebook added this to their robots.txt file:
[16:52] https://www.irccloud.com/pastebin/0bh7F4hm/
[17:04] *** schbirid has joined #archiveteam-bs
[17:12] *** Aranje has joined #archiveteam-bs
[17:35] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[17:43] *** beardicus has quit IRC (Read error: Operation timed out)
[17:45] *** beardicus has joined #archiveteam-bs
[18:07] *** GE has quit IRC (Quit: zzz)
[18:30] *** Stilett0 has joined #archiveteam-bs
[18:40] *** icedice has joined #archiveteam-bs
[18:42] *** beardicus has quit IRC (Read error: Operation timed out)
[18:43] hook54321: only they get to keep it, it seems ;)
[18:47] *** beardicus has joined
#archiveteam-bs
[18:57] *** Aranje has quit IRC (Ping timeout: 245 seconds)
[19:17] *** JensRex has quit IRC (Remote host closed the connection)
[19:19] *** JensRex has joined #archiveteam-bs
[19:36] *** GE has joined #archiveteam-bs
[20:11] I'm working on a full web capture of gov.uk (especially in the run up to the election)
[20:11] I'm also going to take a look at downloading the website of every MP
[21:23] *** GE has quit IRC (Remote host closed the connection)
[21:47] *** dashcloud has joined #archiveteam-bs
[23:01] *** beardicus has quit IRC (Read error: Operation timed out)
[23:14] *** beardicus has joined #archiveteam-bs
[23:29] *** odemg has joined #archiveteam-bs
[23:31] *** BlueMaxim has joined #archiveteam-bs
[23:46] *** JensRex has quit IRC (Remote host closed the connection)
[23:46] *** JensRex has joined #archiveteam-bs
[23:48] *** odemg has quit IRC (Remote host closed the connection)
[23:55] *** Aranje has joined #archiveteam-bs
[23:56] *** ndiddy has joined #archiveteam-bs