[00:08] *** JesseW has joined #archiveteam-bs [00:22] *** mr-b has quit IRC (Read error: Operation timed out) [00:27] *** ris has quit IRC () [00:29] *** mr-b has joined #archiveteam-bs [00:58] *** BlueMaxim has quit IRC (Quit: Leaving) [00:59] What time zone does Wayback Machine go off of? It's only 8:58 over here and Wayback Machine seems to think it's July 1st [01:01] Hmm...guess it's UTC [01:02] *** j08nY has quit IRC (Quit: Leaving) [01:04] i'm at 733k items now [01:05] i'm up 4k from just 6 days ago [01:35] *** JesseW has quit IRC (Ping timeout: 370 seconds) [01:46] *** schbirid has quit IRC (Ping timeout: 258 seconds) [01:57] You know, I have a feeling that DeviantArt made an announcement of official closure, the window they would be given would be too small to crawl everything [02:00] *** schbirid has joined #archiveteam-bs [02:15] *** VADemon has quit IRC (left4dead) [02:22] DoomTay we have been chasing that robots.txt bug for a week now, sorry. We'll fix it eventually. [02:22] Huh [02:24] wumpus: on which note, I sent a robots.txt bug to info@, was that the most effective address to send it to? :P [02:25] vitzli I believe that yes, djvu is not expected to be the primary format for an uploaded item. My guess would be that converting to pdf before uploading is a good idea, and I'm not sure what you should do with the djvu, if it really is the primary document it should be uploaded... we aren't creating djvu anymore in derives, but to be safe(r) you might want to give it a different filename from the PDF. [02:25] and yes, Wayback is UTC. [02:25] joepie91 yes. Probably all that happened is that you're added to our list of testcases once we find the &#&$&@#@# bug [02:27] wumpus: anything I can do to help locate the bug? [02:28] joepie91 no, test cases are about all you can do. It's something dumb in our code. Remember, redis is not the service, the freaking service is the service. (One of our daemons is putting bogus info in redis and we can't figure out which one.) [02:30] wumpus: ah. well, I'll ask some people to forward testcases if they run into the bug, but aside from that, if you run into anything I can help with, let me know :P [02:30] (debugging is half of my day job, heh) [02:48] I need a clue. I need to know a site which is (1) currently dead (2) has a LOT of content (recipes, fan fic, whatever) that is (3) SFW (4) interesting that (5) would probably attract a lot of traffic if it was indexed by Google. The BBC food site would count if it was dead, but it is not dead. [02:48] This is for a wayback experiment. [02:50] Ooh, that's a good one [02:50] * joepie91 think [02:50] Would an older incarnation of a currently living site count? [02:50] DoomTay: then it'd compete in the google rankings, I'd imagine [02:51] Ooh, there was an art site that used to exist, but the name escapes me [02:51] (perhaps I'm guessing at the intent of the experiment a bit too much here :P) [02:52] joepie91 is guessing correctly [02:52] I don't want an angry site owner [02:53] wumpus: the problem I'm seeing is that Google rankings are still fairly heavily influenced by linkbacks, and so even for interesting/popular content, you'd have trouble getting ranked high enough to attract a lot of traffic if you don't own the original domain... 
unless the content in question is also relatively rare or unique to find [02:53] or unless the visitor is searching explicitly for that particular site [02:54] recipes are relatively unique so that'd still make BBC Food a good candidate if it weren't alive [02:54] joepie91 you can leave the SEO to me. [02:54] (read my linkedin profile.) [02:54] *** BlueMaxim has joined #archiveteam-bs [02:54] wumpus: I have no idea about what your non-IRC identity even is, realy :P [02:54] really* [02:55] CGHub.com? [02:56] DoomTay: what am I looking at [02:56] oh [02:56] hold on! [02:56] That's a stretch. It USED t obe a gallery site. Now it's used for conspiracy BS [02:56] wumpus: something that offers game content, like maps? [02:56] like, community-based game content sites [02:56] relatively unique, can attract a bunch of content, are dying by the dozens [02:56] er [02:56] a bunch of traffic* [02:57] text content is best [02:57] ah [02:57] wumpus: the only other thing I can think of then, off the top of my head, is Q&A sites [02:57] of the yahoo answers variety [02:57] or if the maps were well-described by the text on the pages, maybe that would work. The game would have to be current, tho. [02:57] (though with slightly less crazy, usually) [02:57] yahoo answers, uh, yeah. [02:57] that content is pretty timeless [02:58] "I'm pregnant, what's the most paracetamol can I take without killing my baby?" [02:58] yeah... :P [02:58] that seems like one of the more well-informed questions to be honest [02:58] for yahoo answers [02:58] it was the one we used in our pitch deck. there are worse ones, but that one seemed realistic [02:59] wumpus: so I went to the Yahoo Answers frontpage to see how alive it still was [02:59] one of the top questions: "Should I forgive my boyfriend for killing my dog?" [02:59] Okay, I'm pretty sure CGHub is the site I'm looking for in the first place. Just....don't go to the site as it is presently [02:59] not much has changed :P [02:59] what's the last good date for cghub? [03:00] 1st cghub capture of the frontpage now shows a soft 404 [03:00] wumpus: how about http://www.archiveteam.org/index.php?title=Formspring ? [03:01] Hm, that may do. Let me play with it. [03:01] Well, it closed April 13, 2014 [03:03] I'll try cutting it at 5/8/13 [03:04] fuck me it's affected by the robots.txt bug [03:08] hah [03:08] typical [03:10] This seems to be a different situation. robots there is live and it blocks everything [03:13] oh wait, I misread it. [03:14] I guess they deleted all of their content, so why let anyone crawl them other than adsense [03:14] <.< [03:15] Assholes [03:17] ah well, looks like a lot of content. [03:19] *** JesseW has joined #archiveteam-bs [03:39] *** wyatt8740 has joined #archiveteam-bs [03:41] sketchcow: web.archive.org claims my site has a robots.txt when a) there isn't one and b) there never has been one. http://wyatt8740.no-ip.org [03:41] (my site's down atm and I'm trying to view the archive) [03:41] https://web.archive.org/web/*/wyatt8740.no-ip.org [03:41] cc wumpus [03:42] ah wumpus is at archive.org too? [03:43] I think they're aware of that bug [03:43] wyatt8740: wumpus is currently trying to track down a robots.txt bug in the wayback, and needs more testcases, so maybe your issue relates to that as well [03:43] :P [03:43] Is it me or is your site currently dead? 
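A minimal sketch, in Python, of one way to answer the "what's the last good date for cghub?" question above using the public Wayback CDX API: list frontpage captures up to a cutoff timestamp and keep only the ones that returned a 200. The domain and the 5/8/13 cutoff are taken from the conversation; this is not ArchiveTeam tooling, just an illustration of the API.

    import json
    import urllib.parse
    import urllib.request

    def captures(url, **extra):
        """Return CDX rows for `url` as a list of dicts, one per capture."""
        params = {"url": url, "output": "json", "filter": "statuscode:200"}
        params.update(extra)
        api = "https://web.archive.org/cdx/search/cdx?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(api) as resp:
            rows = json.load(resp)
        if not rows:
            return []
        header, *data = rows              # first row holds the field names
        return [dict(zip(header, row)) for row in data]

    if __name__ == "__main__":
        # Last five OK captures of the CGHub frontpage before the 5/8/13 cutoff.
        for cap in captures("cghub.com/", to="20130508")[-5:]:
            print(cap["timestamp"], cap["statuscode"], cap["original"])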
[03:43] uts dead [03:43] but I still have the subdomaim [03:43] Sounds a lot like the same situation I'm faced with [03:43] *subdomain [03:43] Hope Wayback doesn't delete stuff in this kind of situation [03:44] it doesnt [03:44] but sketchcow (or someone else, I forget) won't tell me the backdoor that apparently exists :p [03:46] wyatt8740: wayback has 4 versions of your (non-existant) robots.txt file [03:47] srsly? [03:47] what's it say according to wayback [03:48] https://web.archive.org/web/20150415000000*/http://wyatt8740.no-ip.org/robots.txt [03:48] they are all 404s [03:48] So wyatt8740's point still stands [03:49] yep [03:50] amusingly, the only *visible* difference appears to be the version of Apache you were running. [03:50] Was that from a manual scan or some diffing tool? [03:52] manual scan [03:52] as in when I scanned in atari service manuals? or triggered archivebot/ [03:53] I have a warc but it's extremely inconvenient since there's no C program to unpack it and I don't have maven to compile the java tools [03:55] no, I mean I visually looked at the differences in what it gave back [03:56] ohh [03:56] (I'm on an android phone with a C toolchain and nothing else) [03:58] *** RavetcoFX has joined #archiveteam-bs [03:59] *** BlueMaxim has quit IRC (Read error: Operation timed out) [03:59] RavetcoFX: yeah I do like the archiving scene though. I've always been one to try and preserve things that were in danger of disappearing, even long before I found Archive Team [03:59] Frogging: I've been archiving/hording a bit for last 4-5 years [04:00] not to the extent you guys do though [04:00] RavetcoFX: glad to hear it -- please dump copies of what you have on IA if you can [04:00] (and *make* copies of stuff on IA you think is neat, if you have the space, also) [04:00] here's my random garbage so far :o) https://archive.org/details/@ravetcofx [04:01] nice [04:02] Frogging: if I remember correctly, you were one of the only others from Canada right? [04:02] wyatt8740 not the usual platform for manipulating warcs. [04:03] RavetcoFX: The only others? [04:03] I am from Canada though yes [04:03] wumpus: I know. Still shocked there's no C program to do it yet [04:03] Frogging: in mint-dev [04:04] ah yeah [04:04] wyatt8740: I mean, they are just plain text -- you could *write* one. [04:04] Is binary encoded in base64 or something> [04:04] ? [04:04] or hack something together with shell scripting [04:04] code it in C, it's not hard [04:04] on his phone? :p [04:05] wyatt8740: ah, fair point -- no, binary data is not encoded as anything, it's just there directly [04:05] Again, are binaries in base64? also yeah I'd be writing a C program in GNU nano on my phone [04:05] do you have a physical keyboard at least :p [04:05] so if you wanted a particular payload, you just need to chop it out of the stream [04:05] wyatt8740: WARC is a HTTP-like format [04:06] * Frogging hugs his BlackBerry PRIV [04:06] wyatt8740: headers, double newline I believe, content length [04:06] everything up to end of that content length is the record payload [04:06] usually containing a HTTP request or response with its respective headers [04:06] or DNS requests, in the case of heritrix [04:06] I have absolutely no idea how HTTP works on the protocol level [04:06] there's an rfc ;-) [04:06] wyatt8740: okay, so basically [04:06] header\n [04:06] header\n [04:06] is it base64 for non-ascii? 
[04:06] header\n [04:06] \n [04:06] payload here [04:06] there's no base64 involved [04:06] Frogging: what are your teams stance on archiving hard to obtain copyright material? [04:06] it's literally just the payload itself [04:06] ascii is 7 bit though [04:06] content-length is used to determine where it starts and ends [04:06] RavetcoFX: archive first, ask questions later [04:06] you might see base64 in some headers but that's because they're arbitrary. [04:07] Frogging: cool [04:07] I put base64 in headers just so people can ask if base64 is used in HTTP headers [04:07] wyatt8740: honestly, just open a WARC in something like `less` and you'll see :P [04:07] WARC is a very trivial format to recreate a parser for, even without a spec [04:07] this WARC is 1gb [04:07] or zless if it is compressed :-) [04:07] sure, less will read selectively [04:07] I dont think nano will take that well [04:07] hence, less [04:07] not nano [04:07] nano doesn't take anything well [04:08] :P [04:08] haha [04:08] you'll have no issues less'ing a 1GB WARC on a memory-constrained device though [04:08] actually I got a bug fixed so it runs well on my IBM 3161 terminal [04:08] also yeah, the gz might be a problem [04:08] can always gzip -cd | head -n 1000 [04:08] or so [04:08] I have busybox [04:08] so yes [04:08] and then pipe to less [04:08] or zcat [04:09] right, something that lets you `head` [04:09] :P [04:09] zcat file.warc.gz |less [04:09] that will likely stream the entire file into RAM [04:09] selectively reading the file only works if you specify a path to less [04:09] not via stdin [04:09] anyone know of another domain without a robots.txt file, btw? [04:09] iirc [04:09] head -n 10;head -n 20; and so on. -_- [04:09] yeah that'll work [04:09] safe to say a thousand or so though [04:09] why are there 47 different python warc libraries?! [04:09] JesseW: pdf.yt, HTTPS-only [04:09] 47 python warc libraries and zero in C [04:10] it's depressing [04:10] wumpus: because, as we were just saying, it's really easy to write one :-) [04:10] (custom error page but it correctly sends 404) [04:10] so lots and lots of people have [04:10] joepie91: ok, will check [04:10] JesseW: mind, strict TLS settings [04:10] old Java crap and WinXP will likely fail [04:10] kids these days with their python and their JIT interpreted languages [04:10] also HSTS [04:10] :P [04:11] I bet they don't even know what a segfault looks like [04:11] HSTS is annoying [04:11] wyatt8740: oh trust me, Python devs know what segfaults look like [04:11] lol [04:11] and their object oriented language constructs spoiling them and confusing me :\ [04:11] Frogging: how so? [04:11] joepie91: if I want http:// then give it to me dammit! [04:11] :p [04:12] holy shit did I walk into #fsf by accident [04:12] frogging is a dinosaur. https is the way forward. [04:12] don't see why I would [04:12] there's a very good reason HSTS is a thing [04:12] hm, well pdf.yt isn't blocked on wayback, even though it gives a 404 for robots.txt [04:12] it probably isn't as good of a reason as you think it is [04:12] yipdw: do they share my dislike of high level programming and OOP? [04:12] Frogging: are you _sure_ you want to have this discussion? :) [04:12] i'm thinking I don't actually :p [04:13] JesseW: Are we still looking for the "currently dead" requirement? 
[04:13] they share a tendency for polarization yeah [04:13] I do worry of what could happen to the Internet Archive though :/ [04:13] yeah it's quite a SPOF RavetcoFX [04:13] (I should warn you I'm the type that uses goto to annoy teachers in programming assignments) [04:13] RavetcoFX: we have this though fwiw http://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK [04:13] RavetcoFX: have you heard of http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK ? [04:13] lol [04:13] hence the decentralizedweb summit that we recently had, but I digress [04:14] * JesseW ah, ninja'ed [04:14] that actually sounds similar to something I came across recently. balrog helped me archive a site last year, now archive.org is claiming robots.txt issue. No server is currently answering the domain, though we own the domain still. Domain hosted by GoDaddy. :-/ [04:14] yes, the bug is related to sites which time out. [04:14] stiletto: join the club [04:14] joepie91: I use plaintext IRC because fuck the SSL police [04:14] hm, interesting [04:14] Frogging: fwiw, I'm not somebody who blindly shouts "everything must be TLS 100% of the time right now without exceptions": http://cryto.net/~joepie91/blog/2015/05/01/on-mozillas-forced-ssl/ [04:14] * Stiletto joins club [04:14] my server's dead but the domain's still in my control [04:14] Frogging: but essentially all of the anti-TLS arguments that don't pertain to the CA model are outright wrong [04:15] wumpus: have you looked at foxbox.tv? Another potential use case, I think [04:15] forcing SSL will kill people like me that can't pay people to "trust" then [04:15] them* [04:15] so I'm very, -very- skeptical of people demanding plaintext connections [04:15] wyatt8740: and that would be the CA model [04:15] and that's precisely the issue [04:15] and why it shouldn;t be enforced on a browser level (yet) [04:15] but that doesn't mean that it shouldn't be enforced on a *server* level [04:15] I self sign [04:16] :\ [04:16] joepie91: because the wikipedia page I'm loading can be loaded by anyone and there is no reason to protect its content in transit [04:16] which is a separate discussion entirely [04:16] woah [04:16] wyatt8740: just use let's encrypt then? [04:16] just because you want bad data, frogging, doesn't mean anyone else should [04:16] Frogging: yeah, no, that's not even remotely correct and shows a complete misunderstanding of what TLS is *for* [04:16] Indo not know what "let's encrypt" is [04:16] wyatt8740: Why don't you use Let's Encrypt these days? It's definitely an improvement over self signing. [04:16] wyatt8740: hold on, I will explain in a bit [04:16] https://letsencrypt.org/ [04:16] this is not going to mix well with a discussion about what TLS is for [04:16] wumpus: I didn't say anyone else should [04:16] :p [04:17] * pikhq holds off on everything else [04:17] Frogging: the problem is that you didn't argue why you want HTTP. you argued why you don't think you need HTTPS. [04:17] that's an entirely different discussion [04:17] frogging: you will learn I am not a fan of 1-offs ;-) [04:17] wumpus: so the bug isn't triggered by the statuscode, but rather by a timeout when making the connection? [04:17] wyatt8740:) "free certificate that automatically renews itself" [04:17] joepie91: what's it for then, if not to um... encrypt things [04:17] jessew: yes. [04:17] Frogging: encrypting things is just the means, not the goal. 
[04:17] part of the means, in fact [04:17] authentication is another part of that [04:17] oh dear, what's the goal [04:17] does lets encrypt have backing from major browsers yet? [04:17] wyatt8740: yes [04:17] I use seamonkey so Mozilla'd probably have to adopt it [04:17] wumpus: and that fact isn't stored in the (public) parts of the cdx, right? [04:18] wyatt8740:) mozilla is founding member [04:18] jessew: this is a bug in a redis cache. [04:18] Frogging: the goal of TLS is two-fold: 1) preventing passive attackers 'on the wire' from seeing your traffic and deducing information from that about you, your habits, interests, position, relationships, and so on. 2) preventing active attackers from pretending to be the server and the client (MITM attack) and thus circumventing 1 [04:18] BTW: any work towards a facebook-dl equiv of youtube-dl that archivebot can use, or something? Getting ready to kill off a page and knowing that archive.org preserved the entire thing would be cool :) [04:18] Stiletto: no [04:18] Frogging: it's primarily about privacy, not secrecy [04:18] wyatt8740: All of 'em. Cross-signed by an already adopted CA, so it works in all browsers including fairly old versions of IE (IE 6 SP3, IIRC). [04:18] that said I bet seamonkey gets killed off when mozilla puts a machine gun to its foot and removes XUL [04:18] it's just that privacy helps with secrecy [04:19] Frogging: so the public availability of wikipedia is irrelevant, because that's a secrecy argument, not a privacy argument [04:19] the data you're protecting isn't the wikipedia content [04:19] @yipdw :( well i understand how it could be challenging [04:19] it's the fact that you're *looking* at it [04:19] I see [04:19] wyatt8740:) that's happening right now; they're moving away from XUL already. e10s requires it [04:19] Frogging: Additionally, it prevents any outside party from falsely *claiming* to be Wikipedia. [04:19] Frogging: this is also why the rationale of a lot of website operators - "I don't think my site is sensitive so it doesn't need TLS" is wrong - this is simply not a consideration that you can make for your users [04:19] wumpus: wow -- I wouldn't have expected that would have an effect like this. I'll be very curious to read whatever public or semi-public writeup gets made when the bug is figured out. [04:19] frogging: not only do I not want my ISP spying on me, I want them to not spy on my friends, too. [04:19] aranje: not for seamonley [04:19] seamonkey* [04:20] wyatt8740:) yes, I mean firefox, but I'm not aware that mozilla even works on seamonkey much anymore [04:20] also say goodbye to total customizability :\ [04:20] jessew: it's a garden-variety bug, we stupidly have multiple codebases talking to the same cache and one of them has a bug, we aren't sure which. [04:20] jessew: and the fix is DON'T DO THAT YOU FOOLS [04:20] It's mostly done by a small group of insane people like me that think chrome is the worst thing to ever happen to GUI design [04:20] Frogging: in the end, it comes down to what the *user* wants to keep private, and to that end, there must always be a possibility for encryption. 
further, to make this work on an internet-wide basis, it needs to be the *default*, because otherwise people will either fuck up or not care enough, often because they haven't really considered the consequences of not keeping this information private [04:20] when it's financially feasible one day, I'd love to help fund some kind of long term archival project, like burying a few hundred PB on the moon [04:21] wyatt8740:) heh [04:21] Frogging: unfortunately the CA model gets into the way of that "must always be a possibility", but it's slowly improving [04:21] wumpus: lol [04:21] Frogging: I will note, of course, that it actually prevents anyone who's not a CA or in cahoots with a CA from doing that, not prevents it entirely. [04:21] in the way* [04:21] Buuut that's at least a limitation. [04:21] or even shooting an archive out of the solar system [04:21] jessew: redis is not a service. the service is the service. [04:21] ravetcofx already done, voyager 1 and 2 [04:21] What good is an archive if you can't get to it? [04:22] Frogging: now a lot of people think that TLS will be a burden resource-wise, but that hasn't really been true for a while now [04:22] not to a point that it matters [04:22] wumpus: only a small sample of humanity on the vinyl [04:22] DoomTay: other people can get to it later, even if you can't [04:22] integration is now also readily available for almost all software, so that's usually not an argument anymore either [04:22] the third secret goal of TLS is awesome error messages [04:22] Google increased load by about 1% in their servers by turning it on, IIRC. [04:22] note that the usual librarian definion of archive is not about access. [04:22] Frogging: so in summary: TLS is essentially free for the user, we want it to be enabled by default, there are clear privacy benefits, and not really any real-world drawbacks anymore. [04:22] Frogging: and from that perspective, "give me HTTP if I want HTTP" makes no sense [04:23] aranje thus my browser looks like netscape communicator in 2016 (bookmarks bar collapsed) http://i.imgur.com/zsYmHsx.png [04:23] and it makes much more sense to ensure that every client will always use HTTPS regardless of any downgrade attacks that might occur on their network [04:23] maybe I don't care about my privacy [04:23] maybe I'm crazy and want to destroy it [04:23] Frogging: sure, but that's irrelevant [04:23] frogging: we don't care that you don't care ;-) [04:23] DoomTay: an archive for distant ancestors to find I suppose? [04:23] if you're crazy then I shouldn't let you make decisions ;) [04:23] wyatt8740:) hehehe that's a hilarious look :> [04:23] I love it. [04:23] Frogging: and you can always voluntarily publish your browsing history [04:23] if you intend on destroying your privacy [04:23] several people have done so [04:24] (to make a point, primarily) [04:24] It works and DOESN'T CHANGE EVERYTHING with each update [04:24] and I recommend you use tls for that copy of your browsing history, so somebody doesn't insert fake pages [04:24] wyatt8740: If you don't have it already: Let's Encrypt is free secure certs, forever, for everyone, that are accepted in all relevant TLS users. And it's easy to set up as well. 
[04:24] Aranje: that's an interesting threat model :P [04:24] joepie91:) :D [04:24] Frogging: then make a proxy that accepts HTTPS and outputs HTTP, and put that as far away (network wise) from you as you can gt [04:24] Seamonkey follows the principle of least astonishment [04:24] wyatt8740: So there's essentially no reason to use self-signed certs anymore. [04:25] wyatt8740: okay, so to get back to Let's Encrypt, now that the "what is TLS for" thing has been covered... Let's Encrypt is a CA that gives out free domain-validated certificates, that are considered valid by browsers, in an automated manner. [04:25] pikhq: there are a few [04:25] but not any that wyatt8740 is likely to encounter [04:25] pikhq: okay except maybe I want to scare off people who use HTYPS [04:25] HTTPS [04:25] wyatt8740:) nothing in chrome changes from version to version because there's no UI to change :D [04:25] aranje: mainly referring to firefox transforming more and more into chrome each release [04:25] joepie91: Well, sure. Things like "not on the public Internet at all" or "using it in a closed system with only one key trusted" or the like. [04:25] wyatt8740: anyway, if you want automagical browser-valid TLS that takes you 0 effort to configure, there's even things like Caddy - a HTTPd that, for every hostname you configure, *automatically* obtains and periodically renews a Let's Encrypt certificate, sets it up, and so on [04:26] joepie91:) side question: do you think the kazakh government's legal requirement of a root cert on every browser for tls stripping is a CA model failure or [04:26] pikhq: also wildcard scenarios, API servers, etc [04:26] Right, yes. [04:26] wyatt8740:) oh, haha, indeed [04:26] wyatt8740: ref https://caddyserver.com/ [04:26] 'we have deprecated everything our user base stayed with us for so we could become chrome which other users like' - mozilla [04:26] But not for the common case of "I want a secure website". There, Let's Encrypt is better and easier. [04:26] Aranje: I wouldn't consider that a CA model failure, no [04:26] that's essentially just a client compromise [04:27] by legal means [04:27] wyatt8740:) they also helped standardize addons to be cross browser and are finally going to have the capacity to sandbox the browser [04:27] could've happened for any auth model [04:27] joepie91:) heh, alright [04:27] However they destroyed customizabiluty [04:27] *customizability [04:27] *** yipdw changes topic to: Archive Team: Oh Yeah, Negotiation is the Answer | Shut Up | Strike Up A Conversation With These Three Amazing Letters [04:27] since sandboxing and 100% configurability are mutually incompatible [04:27] aye [04:27] Aranje: now if kazakh CAs were coerced to sign certificates for third-party sites when they are served within kazakhstan using MITM, then -that- is what I would consider a CA model failure [04:27] but also likely a very short-lived one [04:28] that doesn't really work at scale, tends to get you noticed [04:28] :P [04:28] *** RavetcoFX has quit IRC (Quit: Page closed) [04:28] wyatt8740: why are those incompatible? [04:28] wyatt8740: anyway, anything unclear about LE, or? :P [04:28] joepie91:) yeah I don't know what the super detailed specifics of their legal requirements are, I just saw in passing that they asked for inclusion into the root store because kazakh law requires ssl stripping ability. browsers are notably reticent [04:28] jesseW: look at something like downthemall. [04:28] what's this +t bullshit? the topic would be more interesting if I could set it. 
[04:29] wumpus: I think that's precisely why the +t is there ;) [04:29] wumpus:) you don't have operator [04:29] they can add special javascript functions to allow that particular adon to adapt, but a download accelerator is simply one of infinite use cases [04:29] you need +h or +o to edit topic [04:29] aranje I am the wumpus who formed EFNET. I am aware of how irc works. [04:29] the code is GPL because of me. [04:30] (server) [04:30] * JesseW is actually rather surprised to see wumpus breaking his no-EFNET rule... [04:30] and I'm the guy who mis-designed kick and ban. [04:30] wumpus:) excellent, I would have no idea about that since it's 10 years before I had an internet connection [04:30] yeah yeah rules are made to be broken, more than a decade later [04:30] more than 2 decades, sheesh [04:31] Unless you can arbitrarily inject code into any point of the browser userspace you're going to meet things you can't do in the new model. [04:31] wumpus: gonna say thanks for efnet then [04:31] no-EFNET rule? [04:31] oh, founded 1990 [04:31] Hi, my name's Greg and I'm an IRC addict. [04:31] okay, 15 years before internet [04:31] WHAT [04:31] lol [04:31] oh, just your internet. [04:31] I did not have internet until 2005 [04:31] 26 years old if 1990 [04:32] I'm 19 years old:\ [04:32] :) [04:32] The Intenet didn't exist in 1990, I was on NSFNet [04:32] (late 1996) [04:32] BBSes existed though right [04:32] I'll leave the internet vs web out on thetable [04:32] usenet [04:32] wyatt8740: well, until the inclusion of things like EME, you can still build a custom browser, no? [04:33] not entirely [04:33] wyatt8740 before the internet, we had interconnected networks with the same tech that just weren't named "the internet" [04:33] *** ravetcofx has joined #archiveteam-bs [04:33] [00:33:31] [00:24:37] Frogging: then make a proxy that accepts HTTPS and outputs HTTP, and put that as far away (network wise) from you as you can gt [04:33] If they remove XUL entirely then the cross platform UI design idea will be gone [04:33] good idea :v [04:34] wow double timestamp [04:34] I injected it into your packets [04:34] wumpus: so you mean the 'world wide web'? [04:35] Frogging: it's the sensible way to give entertainment to your ISP [04:35] I'm gonna miss XUL a lot. [04:35] * joepie91 lost faith in Mozilla a good while ago [04:35] * wyatt8740 uses SeaMonkey after losing faith in mozilla proper but hating chrome and what it stands for [04:35] remember when they changed the .bro file extension because some keyboard warriors complained [04:35] I'm stoked on writing the entire UI in js/html/canvas [04:36] about "rape culture" or something [04:36] (also chrome can't use APNGs still) [04:36] cause realistically that's what it'll end up [04:36] * wumpus finds that mozilla is the best initial organization for Internet Archive to work with to integrate the Wayback into all browsers [04:36] regardless, their organizational direction has been disappointing to say the least [04:36] * Aranje nods [04:36] wayback all the things ++ [04:36] especially past 2009 [04:37] Wait, how? [04:37] they most certainly don't fit into the "community project" category anymore [04:37] * wumpus used to like puerile jokes when he was 14 [04:37] I mean, how would a browser "integrate" Wayback? 
[04:37] whenever you get a 404 or soft 404, hit the wayback [04:37] and I still haven't forgiven them for their tonedeaf advertising tiles bullshit (and subsequent feel-good marketing blather on the blog_ either [04:37] )* [04:37] sounds like IA server suicide :p [04:37] I wish I could go back to 2009 and get them to stop changing shit and show them how frustrated their user base is now [04:37] we'll have to buy some new servers. [04:38] * Frogging ups his donation [04:38] ^ [04:38] I'd put an advanced option to have a date scroller like the current wayback header so you can slide back in history of the page you're on [04:38] I'd suggest a button "view this site on the Wayback Machine" [04:38] i may be crazy but I still donate to good causes :p [04:38] I think there's an addon that does that kind of thing [04:38] yeah that's the easy/safest [04:38] or the killing off of FirefoxOS for being "not commercially successful" [04:38] rather than automatic redirect [04:38] I thought htey arlready killed that off [04:38] which is pretty much the opposite of what Mozilla was originally founded for [04:39] Whoa, when'd they kill firefoxOS [04:39] >commercial success [04:39] >non profit [04:39] or their blind "let's require TLS for everything" while completely ignoring the real-world concerns with that [04:39] well and it's even weirder because they announced its death but appear to continue working on it [04:39] Frogging: yeah, precisely [04:39] I thought some panasonic TVs used firefoxOS [04:39] they seem to have just started ignoring their userbase completely [04:39] oh and the plugins thing [04:39] Hmm...it seems they just stopped selling FirefoxOS phones [04:39] (all my TVs are old CRT broadcast monitors so IDK) [04:39] and effectively act as a commercial corporation now, from an organizational perspective [04:40] DoomTay: they announced ceasing development, then un-announced it, then announced it again [04:40] with some drivel about IoT applications [04:40] which has precisely nothing to do with what drove the team behind Firefox OS [04:40] and for all intents and purposes, is a separate project entirely [04:40] though tbh iot on rust stuffs would probably be awesome [04:41] the forcing extensions to be signed by mozilla with no about:config option to say "let me do what the fuck I want to" is what made me givenup entirely on mozilla [04:41] and by awesome I mean deterministically crashy [04:41] IoT anything is not awesome. It's a security nightmare [04:42] It makes developing addons you're only going to use for yourself is a lot harder wit hthat, I'll admit [04:42] honestly, the moment an organization starts using dark patterns in their design, that's pretty much a death sentence [04:42] from a 'community project' perspective [04:42] I will mostly defend mozilla's decisions re: user safety. they're following chrome's footsteps for sure, but the internet's fucking retarded if you don't know how a computer works at this point. and so they have to handle that for the user. 
[04:42] and Mozilla has gotten beyond that point [04:42] Aranje: that argues for defaults, not removal of a feature [04:42] I'm fine with defaulting to signed-only [04:42] who really knows how a computer works at this point [04:42] so I'm hanging onto seamonkey for dear life and bracing for impact when they remove XUL from gecko instead of building it with '--without-xul' [04:42] I'm not fine with removing the ability to opt in to unsigned thbings [04:42] things* [04:42] yipdw: no single person that's for sure [04:43] yipdw: I know how a computer works. But not all the proprietary bullshit in a _new_ computer [04:43] give me a VIC-20 and I can tell you though [04:43] yeah what exactly does an ime do [04:43] Aranje: not even Intel knows [04:43] :p [04:43] :D [04:43] I'll take 3, then [04:44] layer the backdoors on top of eachother [04:44] intel management engine(tm) is a proprietary driver that does magical things [04:44] I actually know the answer to my own question [04:44] lol [04:44] I heard some scary things about it. I doubt Intel would be stupid enough to put something on ALL their CPUs that listens to the network [04:44] isn't that precisely what they did [04:44] lol [04:44] the thing that freaks people out about it is the capacity for remote contact [04:44] Yep [04:44] like, literally [04:44] word for word [04:44] yes [04:45] Yep [04:45] it's actually the point [04:45] joepie91: well that's what people said they did.. It just seems so counter-intuitive to me [04:45] remote attestation is the feature, if you wanna google [04:45] Frogging: why? [04:45] it'd be an existence-ending disaster for them if it fucked up [04:45] So tragically my Netburst Xeon IBM server is my mewest hardware I can mostly trust [04:45] why would they take that risk [04:45] Frogging: they have an effective duopoly on x86 [04:45] I heard it uses nanomachines [04:45] and a monopoly on highly-performant x86 [04:45] *newest [04:45] is there an irc channel about archiveteam which is NOT populated with people second-guessing firefox management? [04:46] they can do pretty much whatever the hell they want and get away with it [04:46] wumpus:) heh [04:46] joepie91: I'm aware but if someone cracked it then they would die quickly [04:46] wumpus: #urlteam is quiet :-) [04:46] Frogging: nope, it won't end their existence [04:46] Frogging: it probably wouldn't [04:46] wumpus #archiveteam Proper? [04:46] Frogging: you have to understand that x86 is riddled in patents and other bullshit [04:46] there are essentially no competitors to Intel [04:46] And here I was thinking off-topic stuff wasn't tolerated [04:46] and never can be unless AMD shapes up [04:46] it would if suddenly every x86 processor they made was insecure, nobody would trust them [04:46] joepie:x'the 8086 is out of the patent period [04:46] Frogging: and what would their alternative be? [04:47] DoomTay: this is *why* it isn't tolerated :-P [04:47] AMD has a similar feature these days, though I forget the name [04:47] wyatt8740: I'm talking about all the bits and pieces of design involved in developing an efficient x86 proc [04:47] joepie91: literally anything that doesn't have a giant hole in it [04:47] Frogging: like? [04:47] AMD? [04:47] also my NEC V20 is 8088 compatible :) [04:47] (before you mention PSP, that's a different thing) [04:47] Frogging: have you checked their benchmarks lately? 
[04:47] joepie91:) the one coming out this fall will be nice, but it's the first competitive one in 10 years [04:47] they've been lagging behind on power:performance for the past 5 years or so [04:47] if not more [04:48] lagging behind far enough that realistically, they're not a mainstream alternative [04:48] joepie91: I think all their enterprise customers would take that over something totally broken [04:48] wumpus: I think we've moved on to second-guessing Intel's management now :-) [04:48] Frogging: I doubt it [04:48] joepie: keyword being 'efficient'? [04:48] Frogging: enterprise will just put a firewall around it [04:48] ie. how all the other 'security issues' are resolved [04:48] because it's cheaper [04:48] I tell you, this would be funnier if I could set channel topics. [04:48] *** yipdw sets mode: +o wumpus [04:48] heh [04:48] do eeeet [04:48] sweet zombie jeebus [04:48] lmao [04:48] wumpus:) fix it [04:49] *** wumpus changes topic to: Wumpus is in charge. No more SketchCow! [04:49] round 1... fight! [04:49] * bwn popcorn [04:49] :P [04:49] he's not even present, not much of a fight. [04:49] wumpus: isn't that how all of history's victors won [04:49] basically [04:49] OMG my first channel oper in 20 years [04:49] picking the right moment for an unfair fight [04:49] hitler was AFK [04:49] people at wrong spot [04:49] lol [04:50] godwin's law [04:50] wyatt8740: yass [04:50] I wonder if they ever fixed ban... [04:50] * joepie91 has a bad feeling about this [04:50] :P [04:51] I get randomly banned from prison.net upon login sometimes for no reason [04:51] so I dont think so [04:51] *** wumpus sets mode: +b *?*?*?*DO!*@* [04:51] not fixed [04:51] jesus [04:51] lol [04:51] 20 years nobody fixed that fucking bug [04:51] lol [04:51] *** Sk1d has quit IRC (Ping timeout: 250 seconds) [04:51] *** wumpus sets mode: -b *?*?*?*DO!*@* [04:51] what did you type to get that? [04:51] what's the buf [04:51] what the hell [04:52] *bug [04:52] I forgot about the length limit! [04:52] mode #archiveteam-bs +b *?**?**?**DOESNOTMATCHANYONE [04:52] heh [04:52] oh, that's the fix [04:52] stupid [04:52] LOL [04:52] I was going to just reduce the regular expression [04:52] my username==length limit [04:52] now overflow the buffer [04:52] perform hax [04:52] omg that's a dangerous way to fix it [04:52] 9n0 chars [04:52] 9* [04:53] lol [04:53] welcome to 2016 [04:53] wumpus: I feel like basically everything you've said for the past 5 minutes perfectly illustrates efnet [04:53] :P [04:53] RESPECT MAH AUTHORIATAE [04:53] the wild west of IRC [04:53] I'm just glad that the doucebag squatting on my nick doesn't come on efnet anymore [04:53] are two-byte characters treated as one for nicks? [04:54] irc predates utf8 by a long time [04:54] so you mean 'no' [04:54] :p [04:54] I'm saying, if they implemented it they fucked it up [04:55] *** Frogging sets mode: +o arkiver [04:55] 🙃 [04:55] holy shit [04:55] whew. not allowed in nics [04:55] :D [04:55] I see a box that describes 0x01F643 [04:56] I see a box [04:56] Frogging: would it be appropriate to make a wiki entry somewhere in regards to the Channel? [04:56] mine doesn't have a sticker on it [04:56] yeah, that's the client chaos that I would expect. [04:56] :p [04:56] 回線が込み合っております。時間をおいて再度アクセスしてください [04:56] ravetcofx: I am not sure, I don't think that's generally done but maybe? [04:56] upside down smiley [04:56] heh [04:56] wyatt8740 わやと :Erroneous Nickname [04:56] ravetcofx: we have http://archiveteam.org/index.php?title=IRC ? 
[04:56] wumpus: mostly the fact that on most Linux desktop environments, unknown characters display their hex value in a box [04:57] doomtay: If I could read kanji I'd respond [04:57] JesseW: he means the YouTube channel mentioned earlier [04:57] mozilla then amd, anyone else we want to have pointless venting about? amiga marketing failures? [04:57] It basically says "The pipes are full. Please try again later" [04:57] which I'm still downloading to a drive that's too small for it and I need to move things off it [04:57] and fucking buy more hard drives too [04:57] and another computer to put them in [04:57] wumpus: I think the complaints were more about Intel then about AMD :P [04:57] I would consider his channel a very valuable educational resource [04:57] wumpus: I love my amiga 500 [04:57] how do I money [04:57] ... [04:57] than* [04:57] what the hell [04:57] I've never misspelled that before, what is this [04:57] *** wumpus changes topic to: Amiga lovers, oh, and SketchCow is history [04:58] Frogging: doesn't help our dollar is still in the shitter [04:58] wumpus http://i.imgur.com/w5ilCal.jpg [04:58] Whut. [04:58] ravetcofx: I know :/ [04:59] thats my desk [04:59] *** SketchCow sets mode: -o wumpus [04:59] ravetcofx: slightly less so than before, I guess [04:59] I keep seeing amazing deals on /r/datahorder, but never for Canadians [04:59] circa two weeks ago [04:59] EGADS [04:59] those speagers! [04:59] speakers [04:59] stabbed in the back [04:59] *** SketchCow changes topic to: Off-topic and lengthy Archive Team items [04:59] oh no, datahoarder crossover [04:59] joepie91: ikr? [04:59] ravetcofx: you're not in the datahoarder IRC channel! [04:59] I was [04:59] unacceptable [04:59] then I left because I have too many channels [04:59] sketchcow I hope you're leaving NYC June 1-4 because I'm going to be there [04:59] Pfff same thing [05:00] and we aren't allowed to both be in NYC at the same time [05:00] wumpus: that pic's missing the big speakers http://i.imgur.com/za1zWdD.jpg [05:00] oh sorry frogging [05:00] heh [05:00] when friends ask why I have so many harddrives, you always say you archive, hording sounds gross :) [05:00] *** Sk1d has joined #archiveteam-bs [05:00] ... in my case, it actually -is- archival [05:00] <.< [05:00] "I hoard. *room is spotless* it's really subtle" [05:00] I tend to prune data once it's been put somewhere permanently [05:00] :p [05:01] http://i.imgur.com/j3HqWbN.jpg my oldest hard drive. 8 bit XT IDE. [05:01] of course the "put somewhere permanently" is permanently delayed [05:01] joepie91: somewhere permanent like iCloud! [05:01] * joepie91 glare [05:01] I do not trust the 'cloud' [05:01] (joke) [05:01] and hate the term [05:01] fuck the cloud [05:01] the cloud is the thing that doesn't have enough mass to put things on them [05:02] that should tell you enough [05:02] :p [05:02] hehe [05:02] Frogging: anyways, make an entry on Fire watch or small projects perhaps? [05:02] it's like how OS/2 got marketed as 'the totally cool way to use your computer' [05:02] I hope that made you cringe [05:02] that's actual IBM marketing [05:02] let me see if I have a wiki account [05:02] ravetcofx: that seems reasonable, yeah [05:02] actually you can do it if you want [05:02] which one? 
[05:03] ravetcofx: Fire watch, I'd say [05:03] ok [05:04] I see "fire drill" and "death watch" [05:04] I assume you mean the latter [05:04] death watch, yes [05:04] or maybe you mean this http://static1.gamespot.com/uploads/screen_kubrick/1551/15511094/3003451-firewatch.jpg [05:04] ¯\_(ツ)_/¯ [05:04] Frogging: I do rather want to play that game, yes [05:05] such a pretty game, great story and acting [05:05] so deathwatch though? [05:05] yeah [05:07] anyway sketchcow my site's not got a robots.txt but web.archive.org claims it does again [05:07] http://wyatt8740.no-ip.org/ [05:07] So, wait, what's going to go on deathwatch again? [05:08] (site is currently offline but the subdomain's still mine) [05:08] mm [05:09] do you guys not back up what's on IA? [05:09] (AT) [05:09] usually [05:09] I still think robots.txt should be abolished or there shouldnne a way for a site owner to specifically request in a special robots.txt that archive.org ignore future revisions of the file [05:10] I'm tiring of this discussion, so let's summarize in three points. Quote at will. [05:10] i've read the policy [05:10] 1. The robots.txt policy for IA goes back to 1990s when it wasn't even clear what web archiving WAS [05:11] 2. In the interim time, IA has archived tons of things and pretty much every co-signer to that policy has dropped away [05:11] 3. Shaking up how robots.txt should be treated is a fundamental to-do item that requires Brewster to decide, and as of yet he has not [05:12] Any other information is redundant, malinformed, or wishes [05:12] sorry to interject SketchCow, but I thought the issue here was to do with a bug rather than a policy decision [05:12] You are wrong. [05:12] * wyatt8740 will hold prayers every friday so that brewster may revoke this policy [05:12] my question is separate from the current convo [05:13] Perhaps some minor thing you're dealing with has some strangeness but it all stems back to the open item of the robots.txt policy [05:13] maybe I'm thinking of a different thing that joepie91 (?) and wumpus were discussing earlier [05:13] sketchcow: previously my page was accessible via web.archive.org, and the supposed robots.txt file links to a 404 [05:13] ranma: do you mean keeping a non-IA copy around of archiveteam stuff, or do you mean backing up IA itself? [05:13] ranma: the basic answer to your question, as I understand it, is http://archiveteam.org/index.php?title=Ia.bak [05:14] ah, will check it out [05:14] thanks! [05:14] sorry make that no resolution since the server's down now [05:14] ranma: we can always use more hard drive space if you can afford it [05:15] but not being able to resolve is different from if a robots.txt actually existed [05:15] you guys need a Windows client [05:15] I'd contribute if I could afford to :( [05:15] HDD wise [05:15] you need to stop using windows :p [05:15] wyatt8740: I did years ago but sadly not everyone else did :p [05:15] you need to stop expecting everyone to switch [05:15] I do plan to contribute to that when I have the time to expand my storage capacity [05:15] I've not had to use windows in nine months. [05:16] ranma: hence my comment that they need a windows client [05:16] I'd love to if "everything just worked" [05:16] Cygwin not good enough? [05:16] Frogging: we know. Please write one. 
:-) [05:16] debian works fine ;) [05:16] no dx10 in Linux AFAIK [05:16] or 12 [05:17] or FFDShow [05:17] >proprietary APIs in an open OS [05:17] JesseW: I hate writing GUIs >.> [05:17] also via wine yes [05:17] sorry, I game wyatt8740 [05:17] Frogging: this should have a really minimal UI, AFAIK... [05:17] yes wine is quite performant and has pretty decent support these days [05:17] cygwin with an X server not enough? [05:17] they got an entire dx stack donated a year ago [05:17] Frogging: we may also want to archive his website https://www.rossmanngroup.com/ [05:17] and if you write all but the UI parts, that would be quite helpful in any case [05:18] and https://mailin.repair/ [05:18] including 12, Aranje? [05:18] ranma:) I don't know, but it had 10 [05:18] I wanna play quantum leap [05:18] I just play minecraft and emulate a wii [05:18] I'm not sure how to go about that [05:19] well, wine app db doesn't have gold + for half of my fave games [05:19] the bug I mentioned is a minor thing. SketchCow mentioned the overall issue, which is what it is. [05:19] which are? [05:19] ranma: fwiw, have you looked at playonlinux? [05:19] never heard of it [05:19] and how old are the games' ratings [05:19] it gets a lot of shit to work that is either a pain with WINE itself, or that's even rated below-silver on winehq [05:19] ranma: basically a wrapper for WINE and doxbos that has pre-set profiles for games [05:20] PoL is a professionally supported version of wine [05:20] and will manage a separate WINE/dosbox instance for each game with its optimal settings [05:20] out of the box [05:20] wyatt8740: it's not, you're confused with crossover I think [05:20] oh dur [05:20] yeah [05:20] ranma: it doesn't support -everything- but it gets a lot of stuff to work out of the box, using community-provided setup scripts [05:20] winetricks sounds a lot like PoL [05:20] (plus basically almost all of GOG's catalog works with it, because GOG itself contributes scripts to PoL) [05:20] it's not, but it is used by PoL [05:20] ravetcofx: I've put both of those into #archivebot [05:21] The last time I bought a windows PC game was in 2005 [05:21] hm, does PoL still use winetricks? [05:21] I thought they do that themselves now? [05:21] Star Wars Battlefront II [05:21] JesseW: thanks, I should look into getting that set up [05:21] anyhow, winetricks is only part of the story :P [05:21] works great in wine [05:21] ranma: https://www.playonlinux.com/en/supported_apps-1-0.html [05:21] I added the blurb on the Deathwatch page [05:22] indeed [05:22] ravetcofx: well, archivebot is commanded just from the #archivebot channel -- but if you would be willing to run a pipeline for it, that would be very welcome! [05:23] JesseW: I would definatly be willing to do that, can I limit how much space gets used up? [05:23] ranma: anyway, I won't claim that every Windows application will run on Linux, but in -many- cases, you can get very far if not all the way, with PoL+WINE, open-source alternatives for things, or even a VM with GPU passthrough [05:24] so I wouldn't disclaim the possibility until you've determined that it really won't work [05:24] JesseW: I'd prefer to hold off on the new pipelines for now [05:24] :p [05:24] ravetcofx: the requirements are a bit steep, really [05:24] I have a free TB [05:24] need to get more hdd's [05:24] yipdw: ah, good to know -- I won't invite people to make them, then. [05:25] AFAIK, HCross is still looking for more newsgrabber pipelines, though. 
[05:25] newsgrabber is one of my favorite projects [05:30] I don't know if you guys saw my other post, but has anyone managed to grab sunrise calendar? [05:30] will check it out, joepie91 [05:34] ravetcofx: I'm not sure how we'd grab that... http://calendar.sunrise.am [05:35] ranma: did you mean me, not joepie? [05:35] I know :/ [05:36] was referring to the PoL thing [05:37] reading the article now, JesseW [05:38] Funny, I think it was pretty recently thar Louis Rossmann said he would keep making videos in spite of Apple's thing [05:38] JesseW: I'll try grabbing the sources, then try to see if it's plausible to strip server auth [05:39] ravetcofx: great [05:40] and IA is being real slow at the worst time [05:40] I don't know if it'll have much cultural significance though [05:43] ravetcofx: where did it announce it was shutting down? [05:43] how long do we have? [05:43] http://blog.sunrise.am/post/144196642739/its-almost-time-to-say-goodbye [05:44] JesseW: till Aug 31st [05:45] Ah, then yeah, grabbing everything we can is a good idea just so people with personal stuff they care about can get it from our copy (hopefully). [05:45] Any wider cultural benefit is secondary. [05:46] JesseW: it doesn't seem to actually hold any data, it's a front end for other cloud based calendars [05:46] so personal data isn't of concern [05:47] Yeah, I saw that. Well, that's much better then. [05:47] http://support.sunrise.am/article/95-how-does-sunrise-work [05:47] So aside from the web site (which I'll grab with #archivebot) what else is there? [05:47] the android APKs [05:48] maybe screen shots of the calendar and how it worked for cultural significance [05:48] In other news, Wayback Machine is being ridiculous right now. If I have a certain number of tabs open, they take forever to load, but if I have only 5 at once, everything is nice and fast [05:49] JesseW, is there a preferred time frame that the storage be available? [05:49] ravetcofx: I'm not sure what the best way to grab the APKs is -- if you figure out how, do upload them [05:49] ranma: for IA.BAK? [05:50] *** schbirid has quit IRC (Quit: Leaving) [05:50] or is that elsewhere in the faq? [05:50] JesseW: http://www.apkmirror.com/apk/microsoft-corporation/sunrise/sunrise-4-2-0-release/sunrise-4-2-0-android-apk-download/ [05:50] I know permanent is preferred [05:51] JesseW: and you were grabbing support.sunrise as well? [05:54] ravetcofx: yep [05:55] ranma: for IA.BAK, as I understand it, the longer the better, but permanent is not required (the tracker will notice and adjust) [05:55] JesseW: let me know if you make a collection on the Internet Archive so I can add the APK's [05:56] ravetcofx: I can't make IA collections -- but you can still upload them just to the general software collection, I think. [05:56] cross-compiling with cmake to make debian packages is a special kind of software development hell [06:00] screw this [06:00] * yipdw wires up an ARM board [06:00] *** dashcloud has quit IRC (Read error: Operation timed out) [06:04] *** dashcloud has joined #archiveteam-bs [06:16] *** wyatt8740 has quit IRC (Read error: Connection reset by peer) [06:17] *** DoomTay has quit IRC (Quit: Page closed) [06:17] *** BlueMaxim has joined #archiveteam-bs [06:47] *** JesseW has quit IRC (Ping timeout: 370 seconds) [07:25] *** wumpus has quit IRC (Quit: Page closed) [07:37] IA.BAK needs to go to the next level. [07:37] I have faith [07:39] yipdw: you know how there are multiple circles of hell? [07:39] yeah, well. 
:P [07:47] joepie91: fortunately this one ended up being a missing CMAKE_C_COMPILER setting [08:08] (ignore the repos I just created on github, they're an experimental thing) [08:14] *** tomwsmf-a has quit IRC (Read error: Operation timed out) [09:09] *** dashcloud has quit IRC (Read error: Operation timed out) [09:21] *** dashcloud has joined #archiveteam-bs [10:31] *** BlueMaxim has quit IRC (Quit: Leaving) [10:42] Do we have an easy way to download Simple Machines forums? [10:44] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds) [10:51] *** kristian_ has joined #archiveteam-bs [11:14] *** mls has joined #archiveteam-bs [11:14] *** r3c0d3x has joined #archiveteam-bs [11:23] *** r3c0d3x has quit IRC (Quit: Leaving) [11:26] http://rosalindgardner.com/blog/where-to-buy-content-for-your-site/ <- sigh [11:26] these kind of things just make me sad [11:27] HCross: any particular reason they need special treatment...? [11:27] I dont know, probably the ordinary forums igset would work [11:28] it should :P [11:35] *** r3c0d3x has joined #archiveteam-bs [11:35] *** dashcloud has quit IRC (Read error: Connection reset by peer) [11:36] *** dashcloud has joined #archiveteam-bs [11:42] Heya HCross, fancy seeing you here [11:42] hi. trying to think where I remember you from now [11:43] Don't, you'll melt your brain :P [11:43] Different nick, different ident [11:43] yea.. ive just been through a load of chans trying to work it out [11:44] I still have it on log in case you really want to see for yourself [11:44] nah, idc [11:44] =) [11:45] You might recall Kksmkrn, been on very briefly when telenor went bust [11:47] *** j08nY has joined #archiveteam-bs [11:50] Actually home.no went bust, I'm remembering it wrong [11:57] mls: vaguely recall you [11:57] very very vaguely [12:03] Hey, was added to three more repositories for a project? Cross? [12:04] joepie91, didnt you say you were doing something? [12:04] yeah, those are my repos [12:05] that's just github sending out dumb notices [12:05] it seems to do that for any newly created repo [12:05] (I didn't explicitly add anybody) [12:05] cc SketchCow [12:05] Got it [12:06] hey SketchCow [12:07] i'm up to 11k items in my inbox [12:08] i'm grabbing those Flight International Mgazines [12:09] they only made one pdf per a page [12:09] so i have to grab them then then make pdfs for each of the magazine dates [12:22] Nice [12:25] *** arkiver has quit IRC (Read error: Operation timed out) [12:27] *** remsen has quit IRC (Read error: Operation timed out) [12:28] *** superkuh has quit IRC (Read error: Operation timed out) [12:29] *** superkuh has joined #archiveteam-bs [12:30] *** arkiver has joined #archiveteam-bs [12:33] *** remsen has joined #archiveteam-bs [13:40] joepie91: Glad to hear, especially since I was here for such a short time [13:43] *** Yoshimura has joined #archiveteam-bs [14:18] luckcolor: a wget recursive works [14:18] *** kristian_ has quit IRC (Leaving) [14:18] So not sure why the crawl wont [14:18] mmh [14:25] *** bzc6p has joined #archiveteam-bs [14:25] *** swebb sets mode: +o bzc6p [14:25] working on the domain whole it pulls correly luckcolor [14:25] good [14:25] Weird why it won't pull the subdomain [14:25] We could just pull the whole site [14:25] it suggests a region of 500million + items though [14:31] *** dashcloud has quit IRC (Read error: Operation timed out) [14:40] *** dashcloud has joined #archiveteam-bs [14:44] Discussion moved to #greatlookup [15:10] Igloo: There’s two pages for dnshistory now. See above. 
[15:14] Lol, we were doing it at the same time ¬_¬ [15:14] I'll tidy up. [15:15] PurpleSym: the page I made has more information in it. Which one do you want to keep? [15:16] I’d keep your, but move it to “DNS History”. [15:17] *yours [15:19] I can only move, I can't delete the other one [15:20] Yeah, same for me. Not sure who’s got delete privileges on the Wiki. [15:22] Blank the page, put Template:deleteme on it and one day it will be deleted [15:23] Ok, I can't move the page to the right URL because of the redirect page still. [15:25] *** bzc6p has left [15:30] *** JesseW has joined #archiveteam-bs [15:36] *** DoomTay has joined #archiveteam-bs [15:43] Uh? [15:43] Why is the page on DNS History to be deleted [15:47] There are two pages, the other one is http://www.archiveteam.org/index.php?title=Dnshistory [15:48] Just by visiting the site I think the other "spelling" is more accurate [15:56] *** JesseW has quit IRC (Ping timeout: 370 seconds) [15:59] *** arkiver2 has joined #archiveteam-bs [16:00] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client) [16:03] *** arkiver2 has quit IRC (Read error: Connection reset by peer) [16:06] Damn, it looks like Arkiver was right. A plain old page crawl seems to have failed to save the flash stuff [16:43] *** kvieta has quit IRC (Ping timeout: 260 seconds) [16:43] *** SamiPiplu has joined #archiveteam-bs [16:46] Ugh, using on my phone. Anyways, there's this small but dedicated community you could help possibly, but first, I have a question. If a site has downloadable files on it, and you back it up, do you backup the downloads too? [16:47] Depends [16:47] What are they? [16:50] You see, there was a popular 90's game called Petz. It was a virtual pet game that had dogs and cats that were stored as files you could send to other people. [16:51] Frogging: hey, calexil a moderator on some subreddits and also from the Linuxmint rooms created a torrent for all the Louis Rossmann videos [16:51] ravetcofx: really? [16:51] Go on SamiPiplu... [16:52] Frogging: yeah, one issue though is it doesn't have any metadata https://kat.cr/louis-rossmann-backup-t12864278.html [16:52] *** dashcloud has quit IRC (Read error: Operation timed out) [16:53] The game was easily moddable, and people took advantage of that. Files for mods were on like, every Petz-centered website. But most Petz sites have died out, and the files have been lost. [16:53] ravetcofx: my download died overnight, I resumed it and it's currently at 242/703 and 116G [16:53] so i'm still downloading on my end at 250/700 videos [16:54] Frogging: got to love our shit ISP's [16:54] it was my router's fault actually [16:54] not their router either [16:55] SamiPiplu: in that case if we were to archive one of the modding sites, downloads would be preserved yes [16:55] *** ris has joined #archiveteam-bs [16:55] (Assuming it's not a horrendous pain in the arse to do so) [16:55] Petz!
[16:56] SamiPiplu: generally, the approach is "archive first, ask questions later" [16:56] :P [16:56] Frogging: I'm with Telus here in Alberta, supposedly 20 down, but I never see it go higher than 2 [16:56] I think we get 35 down [16:57] which is what we pay for [16:57] *** RedType has quit IRC (Quit: leaving) [16:57] *** dashcloud has joined #archiveteam-bs [16:57] or maybe it's 30 or 25 [16:57] all I know is it's "fast enough" [16:58] no such thing as fast enough, we shouldn't settle till we have google fibre up here [16:59] So if anyone can find intact files that end in .pet, .cat, or .dog, you might have a lost file. Or maybe an already recovered one. Either way, it would be awesome if any can be found? [16:59] I don't really care for giving Google any more power [17:00] That's a bit of an open ended query SamiPiplu, Do you have any sites in particular you would like saving? [17:00] http://laf.simplesuccess.us Here's a site that 's looking for files. They only accept submissions of files they are actively seeking out, though. [17:01] And 35/30/whatever really is fast enough for me. [17:01] *** kvieta has joined #archiveteam-bs [17:01] I have remote servers for when I really need bigger pipes [17:02] There's one called faerie barn dance. Loads of files were on it, but it was shut down years ago, and only a small fraction of the files have been recovered. [17:02] SamiPiplu: if you have any ideas of possible domain names, you could look them up on the Wayback Machine and see what's there. If they were grabbed by ArchiveTeam's #archivebot directly, you can also search (by domain name) on the archivebot viewer, http://archive.fart.website/archivebot/viewer/ [17:03] unfortunately my dedicated server has been blocked by YouTube [17:03] The site itself is found and archived, but not the downloads. [17:03] you could also download all the index files for all of the #archivebot material, and do a local search for the extensions you mentioned. [17:05] also, you may be able to use IA's search interface to see if anyone has uploaded files with those extensions as items (not as part of the Wayback Machine), although I'm not sure exactly how to make such a search [17:06] [19:03] unfortunately my dedicated server has been blocked by YouTube [17:06] how.. did you manage that [17:06] I have no clue [17:06] I imagine something to do with ArchiveBot [17:06] but like [17:06] that's seriously impressive [17:06] I've downloaded hundreds of pages of search results in hours [17:06] and I've never gotten blocked [17:07] john@alopex:~$ curl -I https://www.youtube.com/ [17:07] HTTP/1.1 429 Too Many Requests [17:07] literally sometimes pulling in a few hundred gigs in one go [17:07] heh [17:07] I know, it makes no sense at all [17:07] and it's been like this for months [17:07] Frogging: what provider? 
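Following up on the suggestion to download the #archivebot index files and search them locally for Petz content: once the listings are on disk, a grep is enough to pull out candidate URLs. A rough sketch, assuming the index files have already been fetched into a local directory (the directory and output file names are placeholders, not real paths):

    # case-insensitive search for the Petz file extensions across the downloaded index files;
    # expect false positives on .cat, so the result list still needs a manual pass
    grep -riE '\.(pet|dog|cat)\b' ./archivebot-indexes/ > petz-candidates.txt
    wc -l petz-candidates.txt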
[17:07] Kimsufi (OVH's budget brand) [17:08] *** ravetcofx has quit IRC (Ping timeout: 506 seconds) [17:08] maybe it's not me but the subnet [17:08] yeah [17:08] but still, it's weird [17:08] I'm thinking they're on the sketchy list [17:08] :p [17:08] I guess I could proxy through my VPS, but that's in the US and the server is in France [17:08] and TCP across oceans is garbage [17:08] yeah [17:08] say bye bye to your bandwidth [17:09] :p [17:09] *** metalcamp has joined #archiveteam-bs [17:09] bandwidth wouldn't be an issue, the VPS has more than the dedi [17:09] it's just the wildly inconsistent throughput [17:10] Frogging: yes, I did mean bandwidth [17:10] not traffic [17:10] :p [17:10] I did too but maybe I'm getting confused on some finer points :p [17:11] Dedicated server has symmetrical 100Mbit. VPS has 125Mbit out and 40Gbit in [17:11] I have an OVH server in france [17:11] 46.105.103.x subnet [17:11] dedicated server has unmetred traffic and the VPS has a 2TB incoming limit [17:11] that can curl youtube [17:12] john@spydo ~/gits/muffin $ dig +short alopex.fastquake.com [17:12] 91.121.76.34 [17:13] matt@dedicated1:~$ curl -I https://www.youtube.com/ [17:13] HTTP/1.1 429 Too Many Requests [17:13] Mines just gone for the 2nd attempt [17:13] wow [17:13] MegaWarrior:/home/archiveteam/heritrix/heritrix-3.1.1/jobs# curl -I https://www.youtube.com/ [17:13] HTTP/1.1 200 OK [17:14] Kimsufi server in Canada is OK though [17:14] Guess people haven't been norty there yet [17:15] *** SamiPiplu has quit IRC (Ping timeout: 268 seconds) [17:16] *** ravetcofx has joined #archiveteam-bs [17:19] YouTube works fine from my OVH.com server - using Opera 38 [17:22] *** closure has quit IRC (Ping timeout: 250 seconds) [17:22] you might just be throttled more quickly on OVH ranges [17:22] not outright blocked [17:24] *** mls has quit IRC (Quit: Lost terminal) [17:25] *** mls has joined #archiveteam-bs [17:31] *** RedType has joined #archiveteam-bs [17:55] joepie91: I'm at 259/703 with 128GB [17:57] *** ndiddy has joined #archiveteam-bs [18:43] *** ring has quit IRC (Read error: Connection reset by peer) [18:43] *** ring has joined #archiveteam-bs [19:28] you need youtube-dl -4 on OVH because the IPv6 ranges are blocked [19:28] and testing with curl is pointless because there's probably a UA filter [19:30] Google has a really sophisticated firewall at their edge that combines IP address & your TCP characteristics & early data sent in the HTTP request [19:31] That's fair enough, Good to know [19:31] Wonder why the IPV6 ranges are blocked? [19:31] one way you can get on the No Route list is by logging into your own youtube account for every video via youtube-dl option [19:31] Igloo: heh probably because tracking the reputation of IPv6 space is a lot more troublesome than IPv4 space [19:32] people get too many IPv6 IPs [19:32] a /64 for every vm / dedicated? Naaah ;) [19:40] *** JW_work has quit IRC (Ping timeout: 370 seconds) [19:45] *** JW_work has joined #archiveteam-bs [19:48] Does it mean anything that something's up with DNS History's SSL credentials? [19:49] At least, if it includes www.? 
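To make the OVH/YouTube point above concrete: force IPv4 so requests don't go out via the blocked IPv6 ranges, and when probing by hand, send a browser User-Agent, since a bare curl is likely being rejected on its default UA alone. A sketch — the video URL and UA string are placeholders:

    # --force-ipv4 / -4 keeps youtube-dl off the blocked OVH IPv6 range
    youtube-dl -4 'https://www.youtube.com/watch?v=VIDEO_ID'
    # curl check: force IPv4 and pretend to be a browser instead of sending curl's default UA
    curl -4 -A 'Mozilla/5.0 (X11; Linux x86_64)' -I 'https://www.youtube.com/'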
[20:39] *** DoomTay has quit IRC (Quit: Page closed) [20:46] *** ravetcofx has quit IRC (Ping timeout: 506 seconds) [20:50] *** ravetcofx has joined #archiveteam-bs [21:01] *** JesseW has joined #archiveteam-bs [21:05] *** dashcloud has quit IRC (Read error: Operation timed out) [21:07] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [21:13] *** dashcloud has joined #archiveteam-bs [21:32] *** JesseW has quit IRC (Ping timeout: 370 seconds) [21:44] *** JesseW has joined #archiveteam-bs [21:45] *** DoomTay has joined #archiveteam-bs [21:52] *** BlueMaxim has joined #archiveteam-bs [22:25] *** tomwsmf-a has joined #archiveteam-bs [22:36] *** JesseW1 has joined #archiveteam-bs [22:37] *** JesseW2 has joined #archiveteam-bs [22:37] *** JesseW3 has joined #archiveteam-bs [22:38] *** JesseW3 has quit IRC (Client Quit) [22:38] *** JesseW1 has quit IRC (Read error: Operation timed out) [22:38] *** JesseW1 has joined #archiveteam-bs [22:39] *** JesseW has quit IRC (Read error: Operation timed out) [22:41] *** JesseW2 has quit IRC (Read error: Operation timed out) [23:00] ltiscreen setup? [23:00] [23:19:05] < arvut> as in more than 3 [23:00] https://www.reddit.com/r/privacy/comments/4q840n/terrorism_blacklist_i_have_a_copy_should_it_be/ [23:01] edocr -- lololol [23:06] people still fighting about that? [23:07] re: talking in #archiveteam [23:09] #NoFightinInTheWarRoom [23:10] The guy who posted that is somewhat of an attention whore [23:10] not because of this but because he does this all the time [23:11] just fucking ban his ass. [23:12] he can come and bitch here instead [23:12] no I meant the reddit guy :p [23:12] sorry [23:14] ojh hahaah [23:14] oh right [23:14] Does he find a lot of 'stuff' then? [23:14] Wow [23:14] I just platinum'ed peggle 2 [23:14] ... [23:14] yes he does [23:28] yah [23:28] :/ [23:29] *** dashcloud has quit IRC (Read error: Operation timed out) [23:30] *** dashcloud has joined #archiveteam-bs [23:31] *** JesseW1 is now known as JesseW