[00:41] *** JesseW has joined #archiveteam-bs
[00:47] *** RichardG has quit IRC (Read error: Operation timed out)
[01:09] *** DoomTay has joined #archiveteam-bs
[01:38] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[02:05] *** Aranje has joined #archiveteam-bs
[02:10] *** Sum has joined #archiveteam-bs
[02:10] well shit
[02:10] http://postghost.com/Home/Shutdown/
[02:10] Already heard about it
[02:10] this is BS
[02:10] it was a very useful service
[02:11] I didn't even know it existed until now
[02:12] neither
[02:12] BS indeed though
[02:12] I think I may have mentioned it a couple days ago when I was following a case
[02:12] the accountability section of their shutdown statement is so true
[02:13] without a third-party archive site everyone relies on screenshots of deleted tweets
[02:13] there is this european right to be forgotten thing
[02:13] I wonder about using hashing as a way to work around this.
[02:13] people with a clue will use archive.is to capture Google cache results but so many are missed
[02:13] Don't display the tweets -- display *hashes* of the tweets (and hashes of the account name)
[02:14] eh
[02:14] same problem with it being hard to search as we have now
[02:14] that way only the ones that actually otherwise come to public attention are saved -- but they can be *verified*
[02:14] makes it very hard to do retrospective analysis
[02:15] so
[02:15] if you click the "embed tweet" button
[02:15] eh, it's a way to separate verification from publishing
[02:15] it gives you the full text in html
[02:15] so you can read it even after it's deleted
[02:15] it's been this way for years
[02:15] there are loads of such archivings of tweets
[02:15] y e a r s
[02:16] I noted a bunch of tweets Cathy Brennan made were archived in this way even after her account is gone, on pages noting the strange sayings she has said
[02:16] it's very effective
[02:17] it would be ... interesting ... to see twitter try to claim that "something with hash XXX was written by account named hash YYY" is in violation of their agreement.
[02:17] yeah
[02:17] JesseW: you're just trying to out-nerd them
[02:17] it would
[02:17] this won't lead anywhere interesting
[02:17] but it is odd enough for them to yowl that nobody may keep a record of what they saw
[02:17] that is some RIAA-level shit right there
[02:17] yep
[02:18] oh? Why wouldn't it lead to interesting law?
[02:18] I mean, the POTUS's twitter expressly allows archiving tweets from there
[02:18] it would never get to court
[02:18] any lawyer worth their scotch would drag that through procedure to avoid that coming up in open court
[02:19] a smart judge would say "you're trying to be clever but i know what you are doing and it's actually stupid so fuck off"
[02:19] s/smart/sharp/
[02:19] most judges lack clue
[02:19] if it was big enough, though, perhaps the EFF might get interested and write a brief
[02:20] tru
[02:20] that is their thing
[02:20] in fact have postghost contacted the EFF I wonder?
[02:20] they should
[02:20] eh, I'm too hungry right now to continue this argument. So ... conceded, have a nice day.
[02:24] the thing is they mention another deleted tweets archive, Politwoops, hasn't been hit with a shutdown
[02:24] they're picking and choosing which sites they want to claim violate their dev agreement
[02:24] I wouldn't take that for granted though
[02:25] politwoops was shutdown and then reinstated
[02:25] (and they mention that, and explain why)
[02:26] shouldn't there be a loophole to their agreement policy?
[02:28] that is, what's stopping an archive site like archive.org (or someone else) archiving the deleted tweets *before* the main site officially deletes them
[02:29] so the main site can say they did 'update' their public feeds to match the originals
[02:30] FalconK: probably sending the abort signal is fine. another thing I never really figured out with seesaw was how to mark a job as failed
[02:30] the warrior system does this occasionally
[02:30] but we rarely ever need that in Warrior projects because the better solution is "requeue"
[02:31] people have told me to move archivebot away from seesaw and I don't think that's a bad idea, it's just something I can never get myself started on
[02:35] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[02:36] random Q: do any of you know if a kindle paperwhite (perhaps the latest one) will actually display scanned images? Like if one scans a book into a pdf and each page is an image
[02:36] or for that usage should some other object be picked
[02:38] a friend is trying to figure out what device to take to a sunny sandy place with potentially zero internet :)
[02:38] swimsuit
[02:39] already packed, but wants to take scanned ebooks as it'll be a long stay
[02:39] ah
[02:39] http://www.dummies.com/how-to/content/how-to-read-pdf-documents-on-your-kindle-paperwhit.html says the kindle paperwhite is capable of displaying PDFs
[02:40] (I have no idea why that was my first search hit; maybe Google is telling me something)
[02:40] yeah I know they're supposed to be idiotproof, but it was my understanding that the previous paperwhites could not do what I'm asking
[02:41] but perhaps the one just released last month can
[02:41] I don't know, nobody says
[02:41] and dear lord reading random internet forums full of people with unrelated-to-the-question comments is not how I want to spend my friday
[02:41] (lol)
[02:44] stack overflow eh
[02:47] heh
[02:47] I learned this afternoon that kindle has userforums
[03:16] *** Sum has quit IRC (Ping timeout: 370 seconds)
[03:17] *** Sum has joined #archiveteam-bs
[03:29] *** Sum has quit IRC (Ping timeout: 370 seconds)
[03:30] *** Sum has joined #archiveteam-bs
[03:36] i like Stack Overflow
[03:36] or the series of sites
[03:36] same
[03:36] i think StackExchange is the parent site?
[03:36] it's like Yahoo Answers but useful
[03:38] and almost naziistically moderated
[03:40] I like that moderators are just high ranking community members that have made a lot of contributions
[03:56] I guess we'll order it and find out shortly. fortunately there's time before they leave :)
[04:03] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:03] Aranje: That's definitely incorrect: as far as I can tell *every* Kindle can display PDFs.
[04:04] Though it's a bit less useful for some of the older ones (low res screen makes it not a nice experience)
[04:04] Yeah, I know the file format is supported and all that... but non-terrible experience w/image-pages is a different thing
[04:05] we're going to see if acrobat's ocr button can help too
[04:05] turns out their workplace has their printers set up where you can scan stuff in and it'll autopdf it into an email for you
[04:06] so... maybe run them through acrobat to see if the filesize can come down
[04:06] then put them on the paperwhite and see if it's usable
[04:10] *** Sk1d has joined #archiveteam-bs
[04:38] *** ravetcofx has joined #archiveteam-bs
[04:40] *** Sum has quit IRC (Ping timeout: 370 seconds)
[04:41] *** Sum has joined #archiveteam-bs
[05:00] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:04] *** Sum has quit IRC (Ping timeout: 370 seconds)
[05:08] *** Sk1d has joined #archiveteam-bs
[05:08] *** Sk1d has quit IRC (Connection closed)
[05:10] *** Sk1d has joined #archiveteam-bs
[05:17] *** DiscantX has joined #archiveteam-bs
[05:31] *** VADemon has joined #archiveteam-bs
[05:40] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[06:09] *** vtyl has joined #archiveteam-bs
[06:18] *** lytv has quit IRC (Read error: Operation timed out)
[06:38] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:04] *** Sum has joined #archiveteam-bs
[07:10] *** DoomTay has quit IRC (Quit: Page closed)
[07:22] *** Sum has quit IRC (Ping timeout: 370 seconds)
[07:23] *** Sum has joined #archiveteam-bs
[07:45] *** Sum has quit IRC (Ping timeout: 370 seconds)
[07:46] *** Sum has joined #archiveteam-bs
[08:00] *** Sum has quit IRC (Ping timeout: 370 seconds)
[08:01] *** Sum has joined #archiveteam-bs
[08:14] *** BlueMaxim has joined #archiveteam-bs
[08:40] *** Arcai has joined #archiveteam-bs
[09:24] *** Sum has quit IRC (Read error: Operation timed out)
[09:24] *** Sum has joined #archiveteam-bs
[09:32] *** Sum has quit IRC (Ping timeout: 370 seconds)
[09:33] *** Sum has joined #archiveteam-bs
[10:04] *** Sum has quit IRC (Ping timeout: 370 seconds)
[10:05] *** Sum has joined #archiveteam-bs
[10:06] *** Arcai has quit IRC (Ping timeout: 268 seconds)
[10:16] *** ArgyroNet has joined #archiveteam-bs
[10:16] hello there
[10:17] any native english-speaking people around ?
[10:17] or just anyone with a good level
[10:19] is "we are still deeply in need of your support" a correct phrase ?
[10:20] Yeah, that's fine.
[10:21] thanks :)
[10:22] *** dxrt- sets mode: +o dxrt
[10:26] *** Sum has quit IRC (Ping timeout: 370 seconds)
[10:26] *** Sum has joined #archiveteam-bs
[10:29] *** VADemon has quit IRC (Quit: left4dead)
[10:31] *** RichardG has joined #archiveteam-bs
[10:35] *** ArgyroNet has quit IRC (Quit: thanks :))
[10:54] *** Sum has quit IRC (Ping timeout: 370 seconds)
[10:55] *** Sum has joined #archiveteam-bs
[11:06] *** fie has quit IRC (Ping timeout: 370 seconds)
[11:17] *** Sum has quit IRC (Ping timeout: 370 seconds)
[11:18] *** Sum has joined #archiveteam-bs
[11:33] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[11:38] *** zhongfu has joined #archiveteam-bs
[11:44] *** zhongfu has quit IRC (Remote host closed the connection)
[11:58] *** zhongfu has joined #archiveteam-bs
[12:03] *** zhongfu has quit IRC (Remote host closed the connection)
[12:15] *** zhongfu has joined #archiveteam-bs
[12:25] *** Sum has quit IRC (Ping timeout: 370 seconds)
[12:26] *** Sum has joined #archiveteam-bs
[12:45] *** RichardG_ has joined #archiveteam-bs
[12:45] *** RichardG has quit IRC (Read error: Connection reset by peer)
[12:54] *** zhongfu has quit IRC (Remote host closed the connection)
[12:55] *** Sum has quit IRC (Ping timeout: 370 seconds)
[12:56] *** zhongfu has joined #archiveteam-bs
[12:56] *** mutoso has quit IRC (Read error: Operation timed out)
[12:56] *** Sum has joined #archiveteam-bs
[12:57] *** mutoso has joined #archiveteam-bs
[13:09] *** vitzli has joined #archiveteam-bs
[13:21] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:28] *** Sum has quit IRC (Ping timeout: 370 seconds)
[13:29] *** Sum has joined #archiveteam-bs
[13:37] *** Sum has quit IRC (Ping timeout: 370 seconds)
[13:37] *** Sum has joined #archiveteam-bs
[13:54] *** RichardG_ has quit IRC (Read error: Operation timed out)
[13:55] *** RichardG has joined #archiveteam-bs
[14:00] *** Sum has quit IRC (Ping timeout: 370 seconds)
[14:01] *** Sum has joined #archiveteam-bs
[14:21] *** Sum has quit IRC (Ping timeout: 370 seconds)
[14:22] *** Sum has joined #archiveteam-bs
[14:38] *** Sum has quit IRC (Ping timeout: 370 seconds)
[14:44] *** Sum has joined #archiveteam-bs
[14:56] *** metalcamp has joined #archiveteam-bs
[15:06] *** Sum has quit IRC (Ping timeout: 370 seconds)
[15:07] *** DoomTay has joined #archiveteam-bs
[15:07] *** Sum has joined #archiveteam-bs
[15:24] !ao https://youtu.be/gvuqLylQOoE --youtube-dl
[15:25] ranma: we got a portion of the files section before AOL broke all the links to the files- you can still see descriptions if you go to the libraries, but there's no way to reach the files anymore
[15:27] This search covers the list pretty well: https://archive.org/search.php?query=aol+files
[15:36] *** Sum has quit IRC (Ping timeout: 370 seconds)
[15:37] *** Sum has joined #archiveteam-bs
[15:46] So, how goes development of that examiner script?
[15:50] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[15:50] *** JesseW has joined #archiveteam-bs
[16:09] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:14] *** SN4T14 has quit IRC (Ping timeout: 370 seconds)
[16:38] *** Sum has quit IRC (Read error: Operation timed out)
[16:38] *** Sum has joined #archiveteam-bs
[16:52] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[16:53] *** RichardG has joined #archiveteam-bs
[16:58] *** mutoso has quit IRC (Read error: Operation timed out)
[17:04] *** DiscantX has joined #archiveteam-bs
[17:06] *** mutoso has joined #archiveteam-bs
[17:06] *** Sum has quit IRC (Ping timeout: 370 seconds)
[17:07] *** Sum has joined #archiveteam-bs
[17:13] Oh wow
[17:13] https://github.com/chrislgarry/Apollo-11
[17:15] *** Sum has quit IRC (Ping timeout: 370 seconds)
[17:16] *** Sum has joined #archiveteam-bs
[17:55] *** Sue_ has quit IRC (Read error: Operation timed out)
[18:06] *** Sum has quit IRC (Ping timeout: 370 seconds)
[18:08] *** Sue_ has joined #archiveteam-bs
[18:09] *** tomwsmf-a has joined #archiveteam-bs
[18:18] doomtay: agc in js: http://svtsim.com/moonjs/agc.html
[18:19] also very neat :)
[18:27] Archive Team is currently number 1 on the Hacker News front page: https://news.ycombinator.com/item?id=12062116
[18:27] About Coursera
[18:36] Someone should tell them that courses are browseable through the Wayback Machine, aren’t they?
[18:36] *** JesseW has joined #archiveteam-bs
[18:37] *** RedType has quit IRC (Ping timeout: 260 seconds)
[18:37] PurpleSym: people having issues with that, judging from the comments
[18:38] That’s because the link refers to the IA item, not the Wayback Machine, as far as I understand.
[18:41] Oh, there’s a robots.txt: https://web.archive.org/web/*/https://d396qusza40orc.cloudfront.net/virology/lecture_slides/W010_S001_virology.pdf
[18:47] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[18:48] *** RedType has joined #archiveteam-bs
[18:49] And playback apparently does not work: https://web.archive.org/web/20160627062435/https://class.coursera.org/bigdata-004 ?
[19:18] *** DiscantX has joined #archiveteam-bs
[19:18] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[19:26] *** vitzli has quit IRC (Leaving)
[19:30] i'm uploading more korea news: https://archive.org/details/koreanet-1_changwon_newsplaza-20030101
[19:58] PurpleSym: all info is saved though, have a look at the source code.
[19:59] It is possible though to make this run in the wayback machine, it works in webarchiveplayer
[20:00] I can’t tell why it is not working in the Wayback Machine, arkiver.
[20:01] it might be because URLs like https://www.coursera.org/eventing/info?key=page.error&value=%7B%22status%22%3A404%2C%22url%22%3A%22https%3A%2F%2Fwayback-beta.archive.org%2Fweb%2F20160627062435%2Fhttps%3A%2F%2Fclass.coursera.org%2Fbigdata-004%22%7D&user=19902603&session=7842137219-1467217133015&client=spark&url=https%3A%2F%2Fwayback-beta.archive.org%2Fweb%2F20160627062435%2Fhttps%3A%2F%2Fclass.coursera.org%2Fbigdata-004&time=1468094448063&screen=%7B%22he
[20:01] 3A1050%2C%22width%22%3A1680%7D
[20:01] are still requested through www.coursera.org
[20:01] instead of the wayback machine
[20:01] along with other files that are not requested through the wayback machine
[20:02] webarchiveplayer seems to be requesting everything through webarchiveplayer
[20:02] I’ve seen those requests and thought they were sent because something failed before that.
[20:03] might be possible
[20:04] the wayback machine shouldn't request the .js from the original location though
[20:05] Of course. There’s also a few requests to cloudfront.net.
[20:05] And another one to https://web.archive.org/web/20160627062435/https://class.coursera.org/bigdata-004/data/api/reports/end_of_course_stories.json which does not seem to be in the WARCs we got.
[20:05] Right
[20:06] Strange we didn't get that, before starting the project I made sure we got everything
[20:06] *** Sum has joined #archiveteam-bs
[20:06] not having that URL saved shouldn't be much of a problem though, since the projects I tested did work with the webarchiveplayer
[20:09] *** anjacks0n has joined #archiveteam-bs
[20:21] *** Sum has quit IRC (Ping timeout: 370 seconds)
[20:22] *** Sum has joined #archiveteam-bs
[20:27] *** Start has quit IRC (Quit: Disconnected.)
[20:30] *** Start has joined #archiveteam-bs
[20:42] *** Sum has quit IRC (Ping timeout: 370 seconds)
[20:48] *** Sum has joined #archiveteam-bs
[20:57] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[20:57] *** ArgyroNet has joined #archiveteam-bs
[20:57] *** metal_cam has joined #archiveteam-bs
[21:00] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:05] *** anjacks0n has quit IRC (anjacks0n)
[21:07] *** metal_cam has quit IRC (Ping timeout: 244 seconds)
[21:24] *** Sum has quit IRC (Ping timeout: 370 seconds)
[21:25] *** Sum has joined #archiveteam-bs
[21:37] *** anjacks0n has joined #archiveteam-bs
[21:47] *** godane has quit IRC (Leaving.)
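(Note on the playback discussion from [20:01] to [20:05]: the suspicion there is that some requests made by the archived Coursera page go to the live site instead of being served from the archive. A minimal Python sketch of how one might sort a list of observed request URLs into "replayed" and "leaked" follows. The helper names `is_replayed` and `find_leaks` are hypothetical, and the URL pattern is an assumption based only on the `web.archive.org/web/<timestamp>/` and `wayback-beta.archive.org/web/<timestamp>/` URLs quoted in the log; it is not a description of how the Wayback Machine itself rewrites pages. The live URL in the sample is abbreviated from the one pasted at [20:01].)

```python
import re
from urllib.parse import urlparse

# Assumption: replayed captures are served from these hosts under
# /web/<14-digit timestamp>[modifier]/, as in the URLs quoted in the log.
WAYBACK_HOSTS = {"web.archive.org", "wayback-beta.archive.org"}
REPLAY_PATH = re.compile(r"^/web/\d{14}[a-z_]*/")


def is_replayed(url: str) -> bool:
    """Return True if the request URL points at an archived capture."""
    parts = urlparse(url)
    return parts.hostname in WAYBACK_HOSTS and bool(REPLAY_PATH.match(parts.path))


def find_leaks(request_urls):
    """Return the requests that bypass the archive and hit the live web."""
    return [u for u in request_urls if not is_replayed(u)]


if __name__ == "__main__":
    # Sample input: one archived page URL and one live request (abbreviated).
    observed = [
        "https://web.archive.org/web/20160627062435/https://class.coursera.org/bigdata-004",
        "https://www.coursera.org/eventing/info?key=page.error",
    ]
    for leak in find_leaks(observed):
        print("not replayed:", leak)
```

(The check looks only at the host and path of the request itself, so a live URL that merely carries an archive URL inside its query string, like the eventing request above, still counts as a leak.)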
[22:00] *** Sum has quit IRC (Ping timeout: 370 seconds)
[22:01] *** Sum has joined #archiveteam-bs
[22:16] *** ndiddy has joined #archiveteam-bs
[22:18] *** DoomTay has quit IRC (Quit: Page closed)
[22:22] *** Jeroen52 has joined #archiveteam-bs
[22:27] here's something interestic for parsung web pages: https://github.com/mozilla/fathom
[22:28] *inteeresting
[22:28] **interesting
[22:28] parsing as well
[22:29] OMG lol
[22:29] * HCross hands luckcolor a dictionary
[22:29] that's amazic :)
[22:29] * luckcolor definetely has to change this keyboard
[22:29] Sigh
[22:31] anyway, it doesn't matter, luckcolor
[22:31] thanks for sharung hue hue hue hue
[22:31] np
[22:32] funny thing that will always come to my mind during these times: those irc logs will definetely land on archive.org someday so that my spelling mistakes can be kept indefinetely
[22:32] oh boy
[22:32] :P
[22:35] hey, that's not the proper way to think
[22:35] the proper way is "I'll have a proof that I bettered my writing !"
[22:35] :p
[22:39] anyway, ++
[22:39] *** ArgyroNet has quit IRC (Quit: Once you know what cake you want to be true, instinct is a very useful device for enabling you to know that it is)
[23:21] *** Sum has quit IRC (Ping timeout: 370 seconds)
[23:22] *** Sum has joined #archiveteam-bs
[23:22] *** godane has joined #archiveteam-bs
[23:24] *** anjacks0n has quit IRC (anjacks0n)
[23:29] *** Sum has quit IRC (Ping timeout: 370 seconds)
[23:30] *** Sum has joined #archiveteam-bs
[23:33] *** anjacks0n has joined #archiveteam-bs
[23:47] *** anjacks0n has quit IRC (anjacks0n)