#archiveteam-bs 2017-04-25,Tue


Time Nickname Message
00:00 🔗 fenn has joined #archiveteam-bs
00:04 🔗 RichardG_ is now known as RichardG
00:10 🔗 RichardG_ has joined #archiveteam-bs
00:11 🔗 RichardG has quit IRC (Ping timeout: 260 seconds)
00:27 🔗 RedType_ has joined #archiveteam-bs
00:27 🔗 RedType has quit IRC (Read error: Operation timed out)
00:36 🔗 dxrt has joined #archiveteam-bs
01:31 🔗 dashcloud has joined #archiveteam-bs
01:35 🔗 ndiddy has quit IRC ()
01:45 🔗 RichardG_ is now known as RichardG
02:37 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
02:40 🔗 BlueMaxim has joined #archiveteam-bs
02:50 🔗 Lord_Nigh <Lord_Nightmare> i remember finding a torrent with only 50% of the files archived since someone did a partial download, and the rest lost. what i did is i manually found elsewhere on the internet a few of the "lost" files, shoved them in the torrent output directory and did a force rescan, then uploaded those files to the other peers
02:50 🔗 Lord_Nigh <Lord_Nightmare> this was some vgm music archive or something
02:50 🔗 Lord_Nigh <Lord_Nightmare> so the end result was instead of all peers having 50% of the archive all peers had 68% or something
02:50 🔗 Lord_Nigh this made me think of something
02:51 🔗 Lord_Nigh for all the individual files in internet archive, can we calculate the bittorrent piece hashes for 32k/64k/.../1MB chunks of said files?
02:52 🔗 Lord_Nigh i'll bet this could be used to rebuild loads of lost data where all we have is torrent files but no surviving seeds or peers
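[editor's note] A minimal sketch of the idea above, assuming a BitTorrent v1 single-file torrent, where the `pieces` field is a concatenation of 20-byte SHA-1 digests, one per `piece length` bytes of the file. The function names are hypothetical. In multi-file torrents the pieces run across file boundaries, so a lone file can only be checked against pieces that lie entirely inside it, and only if its offset in the torrent is known.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each piece_length-sized chunk (the last chunk may be short)."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def torrent_piece_digests(pieces_field: bytes) -> list[bytes]:
    """Split a v1 torrent's concatenated 'pieces' string into 20-byte digests."""
    return [pieces_field[i:i + 20] for i in range(0, len(pieces_field), 20)]


def matching_pieces(candidate: bytes, pieces_field: bytes, piece_length: int) -> list[int]:
    """Indices of pieces where the candidate file agrees with the torrent metadata."""
    expected = torrent_piece_digests(pieces_field)
    actual = piece_hashes(candidate, piece_length)
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e == a]
```

Running `matching_pieces` over a candidate file at each common piece size would show which parts of a dead torrent that file can restore, even when the copy is only partly intact.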
03:47 🔗 johnny4 has joined #archiveteam-bs
03:54 🔗 pizzaiolo has left
04:11 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:17 🔗 Sk1d has joined #archiveteam-bs
05:43 🔗 godane so i may have found out congress.gov has search for old bills
05:43 🔗 godane going back to mid 80s or 100th congress
05:52 🔗 kyounko has quit IRC (Read error: Connection reset by peer)
06:20 🔗 GE has joined #archiveteam-bs
06:57 🔗 Honno has joined #archiveteam-bs
08:00 🔗 xmc sets mode: +o swebb
08:00 🔗 swebb sets mode: +o Jonimus
08:00 🔗 swebb sets mode: +o antomatic
08:00 🔗 swebb sets mode: +o brayden
08:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:34 🔗 mutoso has quit IRC (Read error: Operation timed out)
08:59 🔗 GE has quit IRC (Remote host closed the connection)
09:41 🔗 Honno has quit IRC (Quit: Leaving)
11:19 🔗 odemg has quit IRC (Remote host closed the connection)
11:27 🔗 kristian_ has joined #archiveteam-bs
11:37 🔗 dcmorton has quit IRC (Quit: ZNC - http://znc.in)
11:42 🔗 dcmorton has joined #archiveteam-bs
12:07 🔗 Muad-Dib For the Dutch people in here, you might be interested in participating in the comments section of this Tweakers.net article about IA and robots.txt: https://tweakers.net/nieuws/123895/internet-archive-wil-robots-punt-txt-negeren-om-accurater-beeld-te-krijgen.html
12:09 🔗 Muad-Dib a lot of people seem to be completely ignorant of the fact that robots.txt is not meant as an access control mechanism, or even absolutely outraged about public content being saved
12:10 🔗 Muad-Dib the usual debate
12:10 🔗 Muad-Dib forgot the quotes: "debate"
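[editor's note] A small sketch of why robots.txt is advisory and not access control, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent and the example.com URLs are made up for illustration. The point is that the check runs on the crawler's side: the server does not enforce anything, and a client that skips the check can still request the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a polite crawler would consider itself allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honours the file calls this before fetching and skips
# disallowed URLs; nothing stops a different client from fetching them anyway.
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))
```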
12:18 🔗 pizzaiolo has joined #archiveteam-bs
12:19 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
12:20 🔗 BlueMaxim has joined #archiveteam-bs
12:39 🔗 joepie91 Muad-Dib: I think you can add 3 sets of scare quotes to that :)
12:39 🔗 joepie91 anyway, seems the discussion is already being had and there's not much I have to add
12:39 🔗 joepie91 well, that's not true, I do have one comment
12:39 🔗 * joepie91 makes
12:40 🔗 Muad-Dib 3 sets of """meme-quotes"""
12:41 🔗 Muad-Dib :^)
12:45 🔗 joepie91 there, replied
12:46 🔗 joepie91 there was a legitimate "why abandon a thing that's been around since 1994?" question in there :P
12:47 🔗 beardicus has quit IRC (Read error: Operation timed out)
12:58 🔗 beardicus has joined #archiveteam-bs
13:17 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
13:19 🔗 Muad-Dib did you link the "suicide note" article as well? I'm feeling seriously tempted.
13:20 🔗 joepie91 Muad-Dib: I did not
13:27 🔗 kristian_ has quit IRC (Quit: Leaving)
13:57 🔗 Muad-Dib joepie91: jesus fuck, this subthread: https://tweakers.net/nieuws/123895/internet-archive-wil-robots-punt-txt-negeren-om-accurater-beeld-te-krijgen.html?showReaction=9907277#r_9907277
14:10 🔗 pizzaiolo has joined #archiveteam-bs
14:17 🔗 Muad-Dib for the non dutch-speaking people here: someone linked the "robots.txt is a suicide note" article in the comment thread and immediately gets greeted with comments that the article is a "rude" and how archiveteam are "unbelievably arrogant people"
14:18 🔗 Muad-Dib the article is "rude"*
14:22 🔗 btfo has quit IRC (Ping timeout: 600 seconds)
14:24 🔗 Muad-Dib rough translation of the remainder of the comment: "So we do not have to go to WikiLeaks anymore, because everything is on their network instead. So what Web sites leak information. They'll just grab it anyway."
14:24 🔗 Muad-Dib what the fuck is this guy even hinting at?
14:25 🔗 Muad-Dib people are actually comparing the IA to the NSA as well, these people make me sad
14:25 🔗 xmc wikileaks' only function according to that comment is ignoring robots.txt
14:25 🔗 Muad-Dib xmc: precisely
14:25 🔗 xmc which, well, that doesn't sound far wrong to me
14:26 🔗 mls has quit IRC (Ping timeout: 250 seconds)
14:28 🔗 mls has joined #archiveteam-bs
14:36 🔗 rocode Man, if only the NSA hadn't relied on robots.txt, then wikileaks couldn't have gotten those Snowden leaks.
14:37 🔗 tammy_ JAA: 161Gb so far.
14:38 🔗 btfo has joined #archiveteam-bs
15:03 🔗 GE has joined #archiveteam-bs
15:07 🔗 midas Muad-Dib: lol
15:08 🔗 midas so yeah, the warrior will grab it, but if the robots.txt exists it still won't show up in the wayback machine. Just because we grab all the data doesn't mean it's publicly readable at that moment.
15:08 🔗 midas tweakers, the home of Dutch IT morons.
15:09 🔗 midas it might change, but we don't know, nor should we even care about it.
15:17 🔗 BlueMaxim has quit IRC (Quit: Leaving)
16:19 🔗 pizzaiol1 has joined #archiveteam-bs
16:19 🔗 pizzaiolo has quit IRC (Ping timeout: 260 seconds)
16:25 🔗 beardicus has quit IRC (Read error: Operation timed out)
16:27 🔗 SketchCow sets mode: +b *!*edsu@*.members.linode.com
16:27 🔗 edsu was kicked by SketchCow (edsu)
16:33 🔗 beardicus has joined #archiveteam-bs
16:52 🔗 hook54321 Facebook added this to their robots.txt file:
16:52 🔗 hook54321 https://www.irccloud.com/pastebin/0bh7F4hm/
17:04 🔗 schbirid has joined #archiveteam-bs
17:12 🔗 Aranje has joined #archiveteam-bs
17:35 🔗 Stilett0 has quit IRC (Ping timeout: 246 seconds)
17:43 🔗 beardicus has quit IRC (Read error: Operation timed out)
17:45 🔗 beardicus has joined #archiveteam-bs
18:07 🔗 GE has quit IRC (Quit: zzz)
18:30 🔗 Stilett0 has joined #archiveteam-bs
18:40 🔗 icedice has joined #archiveteam-bs
18:42 🔗 beardicus has quit IRC (Read error: Operation timed out)
18:43 🔗 pizzaiol1 hook54321: only they get to keep it, it seems ;)
18:47 🔗 beardicus has joined #archiveteam-bs
18:57 🔗 Aranje has quit IRC (Ping timeout: 245 seconds)
19:17 🔗 JensRex has quit IRC (Remote host closed the connection)
19:19 🔗 JensRex has joined #archiveteam-bs
19:36 🔗 GE has joined #archiveteam-bs
20:11 🔗 HCross2 I'm working on a full web capture of gov.uk (especially in the run up to the election)
20:11 🔗 HCross2 I'm also going to take a look at downloading the website of every MP
21:23 🔗 GE has quit IRC (Remote host closed the connection)
21:47 🔗 dashcloud has joined #archiveteam-bs
23:01 🔗 beardicus has quit IRC (Read error: Operation timed out)
23:14 🔗 beardicus has joined #archiveteam-bs
23:29 🔗 odemg has joined #archiveteam-bs
23:31 🔗 BlueMaxim has joined #archiveteam-bs
23:46 🔗 JensRex has quit IRC (Remote host closed the connection)
23:46 🔗 JensRex has joined #archiveteam-bs
23:48 🔗 odemg has quit IRC (Remote host closed the connection)
23:55 🔗 Aranje has joined #archiveteam-bs
23:56 🔗 ndiddy has joined #archiveteam-bs
