[01:06] *** Stiletto has quit IRC ()
[01:24] *** powerKitt has quit IRC (Remote host closed the connection)
[01:27] *** dashcloud has joined #archiveteam-bs
[01:44] *** Swizzle has joined #archiveteam-bs
[02:03] about fos, i went to throw more things i found into it, and my folder was gone, was it uploaded, or just dumped?
[02:04] im "wacko" incase you forgot
[02:09] *** logchfoo2 starts logging #archiveteam-bs at Wed Nov 02 02:09:03 2016
[02:09] *** logchfoo2 has joined #archiveteam-bs
[02:10] bsmith093: Hi
[02:10] Are you the one uploading, like, cable TV shows
[02:11] soooo, stop doing that?
[02:11] k
[02:11] Yes
[02:11] I've been deleting them as fast as they come in
[02:12] They just cause huge headaches for the archive providing basically available TV shows and known properties
[02:12] Vs., say, all sorts of unusual documents, or VHS rips of long-lost properties, etc.
[02:19] *** bsmith093 has quit IRC (Read error: Connection reset by peer)
[02:21] *** bsmith093 has joined #archiveteam-bs
[02:25] *** Swizzle has quit IRC (Quit: Leaving)
[02:27] *** Sanqui has quit IRC (Ping timeout: 260 seconds)
[02:28] *** Sanqui has joined #archiveteam-bs
[02:31] *** GE has joined #archiveteam-bs
[02:40] *** GE_ has joined #archiveteam-bs
[02:42] *** GE has quit IRC (Ping timeout: 255 seconds)
[02:42] *** GE_ is now known as GE
[02:52] *** GE has quit IRC (Remote host closed the connection)
[03:09] *** Stiletto has joined #archiveteam-bs
[03:31] *** Stiletto has quit IRC ()
[04:15] *** brayden has joined #archiveteam-bs
[04:15] *** swebb sets mode: +o brayden
[04:16] *** brayden has quit IRC (Client Quit)
[04:17] *** brayden has joined #archiveteam-bs
[04:17] *** swebb sets mode: +o brayden
[04:25] *** Stiletto has joined #archiveteam-bs
[04:35] SketchCow: i mostly go after original recording blocks of tv shows
[04:36] at least if there WOC they have something different then the dvd
[04:36] also the original broadcasts could be different then the dvd ones
[05:06] *** Blackout has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[05:09] hmm http://www.csoonline.com/article/3137181/security/google-to-untrust-wosign-and-startcom-certificates.html
[05:10] as an experiment, I removed wosign and startcom from my trust roots
[05:10] it's interesting to see how many HTTPS TLS-related errors you get when you do that
[05:10] (it's quite a few sites)
[05:11] bugzilla.gnome.org is one of them, which caused some finagling on my part to fix that up
[05:13] 2025: Let's Encrypt issues 90% of TLS certificates for HTTPS
[05:13] 2026: "Flintlock" vulnerability punches massive holes in trust chains
[05:20] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:26] *** Sk1d has joined #archiveteam-bs
[05:56] *** is- has quit IRC (Ping timeout: 633 seconds)
[06:02] *** is- has joined #archiveteam-bs
[06:28] *** Coderjoe has joined #archiveteam-bs
[06:57] *** eloos has joined #archiveteam-bs
[08:06] *** brayden_ has joined #archiveteam-bs
[08:06] *** swebb sets mode: +o brayden_
[08:09] *** yipdw has quit IRC (Remote host closed the connection)
[08:10] *** yipdw has joined #archiveteam-bs
[08:11] *** brayden has quit IRC (Read error: Operation timed out)
[08:27] *** GE has joined #archiveteam-bs
[08:32] *** brayden_ has quit IRC (Read error: Operation timed out)
[09:21] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[09:32] yipdw: don't say that, the internet will burn
[09:32] What should I look for when I would like fulltext search of archived pages?
[09:47] *** fie has joined #archiveteam-bs
[09:59] *** Yoshimura has quit IRC (Remote host closed the connection)
[10:04] *** Yoshimura has joined #archiveteam-bs
[10:22] tiddlyspace: So the resets seem to happen when reditecting to https.
[10:23] From archive.org perspective, will there be a full fetch or just the tiddler data? as else I cannot see how would that be accessible.
[10:36] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:37] *** GE has quit IRC (Remote host closed the connection)
[10:49] Can one match only the top page for a domain with CDX search?
[10:56] Nevermind, I forgot about filter.
[11:00] https://web.archive.org/cdx/search/cdx?url=tiddlyspace.com&collapse=urlkey&fl=original&matchType=domain&filter=original:.*\.tiddlyspace.com(|:[^/]*)/&limit=2000
[11:24] 1684 unique tiddlyspace subdomains/handles http://chunk.io/f/38108e5a63614fd588a67b583c78e4aa
[11:24] Parsed from archive.org + some other lesser sources.
[11:26] Actually, I failed, this one is unique (still 1684): http://chunk.io/f/6be43facd1a34395acce906b1d1d6d22
[11:27] And there are some other data apparently mixes, but yeah, here it is, I need brain food.
[11:38] *** brayden has joined #archiveteam-bs
[11:38] *** swebb sets mode: +o brayden
[11:52] so we are up to 2014 with abc.net.au/news/2014
[12:17] *** GE has joined #archiveteam-bs
[12:39] *** VADemon has joined #archiveteam-bs
[13:16] hey guys does anyone know the channel for Vine?
[13:16] (if it's already inpòace ofc)
[13:16] *inplace
[13:17] #vinewhine
[13:18] thx
[14:52] *** VADemon_ has joined #archiveteam-bs
[14:54] *** VADemon has quit IRC (Ping timeout: 250 seconds)
[15:12] *** powerKitt has joined #archiveteam-bs
[17:01] *** ravetcofx has joined #archiveteam-bs
[17:11] *** VADemon_ has quit IRC (Quit: left4dead)
[17:46] Huh. I've started to get spam in the email I used to sign up for myvip
[17:46] it was an address used only for that
[17:50] or maybe it was sent from and/or through myvip, because it does mention the website. it's selling vehicle insurance. though it's all in hungarian so it's hard to tell
[18:04] *** powerKitt has quit IRC (Read error: Operation timed out)
[18:29] *** powerKitt has joined #archiveteam-bs
[18:36] same. seems to be coming from the company behind myvip, because rdns for myvip and the mailserver are very similar.
[18:37] seems like a desparate method to generate some money out of a sinking ship.
[18:45] *** JW_work has quit IRC (Quit: Leaving.)
[18:50] *** JW_work has joined #archiveteam-bs
[18:55] *** JW_work has quit IRC (Quit: Leaving.)
[19:03] *** JW_work has joined #archiveteam-bs
[19:15] arkiver: Uploaded public NXP datasheets: https://archive.org/details/nxp-datasheets-11-2016 Can you move the item to collections web and archiveteam?
[19:44] PurpleSym: What was the method used and what everything is in it?
[19:47] Yoshimura: There’s a site search returning XML. I scraped it and got all unique documents (by Asset_id).
[19:49] Oh. You sure you got all? Therefore I need not to do it. Good for me.
[19:49] Are those only datasheets or all docs?
[19:50] I did not double-check yet, so I’m not sure.
[19:50] All documents.
[19:50] how do i open a lz archive? i've literally *never* heard of that format.
[19:50] lzip.
[19:51] Same compression algorithm that xz uses, but simpler container format.
[19:51] *** VADemon has joined #archiveteam-bs
[19:53] To be honest, lzip is not best choice for archival
[19:55] What do you suggest instead?
[19:56] gzip is best
[19:56] bzip is pretty good
[19:56] .zip is also an a+ choice
[19:56] *** Aranje has joined #archiveteam-bs
[19:57] print out the data in hex, and scan it back in as jpegs
[19:57] That would be a Bad Idea.
[19:58] but could be useful for material one wanted to *really painful* to search for
[19:58] er, wanted to *make* really
[20:00] Well, in terms of compression ratio none of these beats LZMA.
[20:01] lzma is not very well tested and doesn't have a robust container format
[20:01] i mean, petabytes have been gzipped, and we know how it works
[20:01] we are archivists, not compression fanatics
[20:01] it's 52 MB of JSON next to a 10 GB warc.gz
[20:02] goal #1 is to not destroy things accidentally
[20:02] this is like saying a Veyron Super Sport is faster than an Aventador
[20:02] technically true and nobody gives a shit
[20:03] also I think I miss Top Gear
[20:05] aw
[20:05] LZMA has been here for quite a while and lzip’s container format is dead simple, just like gzip.
[20:06] just use gzip
[20:06] it's less confusing to everyone
[20:07] in vine news
[20:07] https://gitlab.peach-bun.com/snippets/40
[20:08] I like how #2 is a backflip
[20:09] well, backflip minus the tuck
[20:10] xz is in linux kernel at least.
[20:10] But unless the data is very compressible, there is not much point in using xz and not gz.
[20:10] we discussed yesterday why xz is not very robust either
[20:10] http://www.nongnu.org/lzip/xz_inadequate.html
[20:13] #9 is also a backflip lol
[20:13] Yeah. I can argue though that data integrity and error recovery should be handled on storage level either by user or by storage.
[20:13] Yoshimura: It’s JSON, so yeah, it compresses well: gzip: 7828510, lzip: 3968256
[20:14] PurpleSym: gzip -9?
[20:14] --best
[20:14] Yeah. But it is a small file, so unless you use that format across the board for a large amount of data, you know.
[20:16] I know there are strong arguments for gzip, but personally I would store stuff differently, saving a lot of space, a lot of disk time. I cannot talk for the Archive.org scale though and how having the data already gzp compressed helps to serve the data without decompression.
[20:17] And not sure how it's handled (curious). If it needs to decompress to verify checksum then you need more compute anyway, but not as much as compression. verifying compressed would be faster.
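Editor's note: the gzip-versus-LZMA size comparison quoted above (gzip: 7828510, lzip: 3968256) can be tried on other data with a minimal sketch like the one below. It rests on several assumptions. Python's standard library has no lzip support, so the lzma module (xz container) stands in for "the same compression algorithm" mentioned in the chat. The JSON records are a made-up stand-in for the real metadata dump, which is not reproduced here; only the Asset_id key comes from the conversation, and every other field name and the example.invalid URLs are hypothetical.

```python
import gzip
import json
import lzma


def compressed_sizes(payload: bytes) -> dict:
    """Return byte counts for the raw payload, gzip level 9, and LZMA preset 9."""
    return {
        "raw": len(payload),
        # Roughly what `gzip --best` does on the command line.
        "gzip": len(gzip.compress(payload, compresslevel=9)),
        # xz container around LZMA2; a stand-in for lzip, which the stdlib lacks.
        "lzma": len(lzma.compress(payload, preset=9)),
    }


# Hypothetical, repetitive JSON metadata (not the real NXP dump).
records = [
    {
        "Asset_id": i,
        "title": f"Datasheet {i}",
        "type": "Data sheet",
        "url": f"https://example.invalid/docs/{i}.pdf",
    }
    for i in range(5000)
]
payload = json.dumps(records).encode("utf-8")

sizes = compressed_sizes(payload)
for name, size in sizes.items():
    print(f"{name:>4}: {size} bytes")
```

On repetitive JSON like this, LZMA will usually come out smaller than gzip, which is in line with the figures quoted in the chat. The sketch measures size only and says nothing about the container robustness argument made by the other participants.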
[20:18] Here, everything you need to know: http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format
[20:18] *** Yoshimura was kicked by xmc (you are not contributing meaningfully)
[20:19] Nice one, xmc.
[20:19] the hell is going on
[20:19] Yoshimura being a butt
[20:19] is this seriously all happening over compression formats
[20:19] come on
[20:22] here if you want to be technical, explain to this guy why his vine isn't "lagging" https://vine.co/v/OjaUq3gi60h
[20:24] ha
[20:24] Top Gear may be gone, but The Grand Tour is coming in few weeks
[20:28] *** Yoshimura has joined #archiveteam-bs
[20:32] pfft?
[20:34] Never knew one has to contribute only in offtopic channel.
[20:36] *** jrwr has joined #archiveteam-bs
[20:54] SketchCow, Or BS it up in here
[20:54] It was a productive chat from what I saw
[20:55] it's Another Chat on Ethical Archiving (TM)
[20:56] Its understandable, Does AT have a official stance on that type of data?
[20:56] no
[20:57] my personal stance on it is that it's harmful
[20:57] I don't think humans know how to interpret that sort of data
[20:57] without fucking themselves and everyone around them
[20:57] I think there is a Southpark Ep about this that just came out
[20:58] when you're in a chat you aren't producing text that's carefully massaged for consumption by the public at large
[20:58] *regardless* of technical access controls
[20:58] intent is a tricky thing
[20:58] *** JW_work has quit IRC (Quit: Leaving.)
[20:59] *** JW_work has joined #archiveteam-bs
[20:59] i agree with yipdw, and i think this is a fundamental flaw in telegram
[20:59] It is, as it just takes one conversion to see how something begin
[20:59] *** JW_work has quit IRC (Client Quit)
[20:59] past messages in a chat should be only accessible by the people who were present at that point at time
[20:59] IMO
[21:00] I agree
[21:00] I go back and forth
[21:00] unless there is a public and loud notification that the chat is logged publicly, like some freenode help channels have
[21:00] publicpublicpulbic
[21:00] What about IRC logging then? Its very common, but some of my best projects start out in a IRC channel as (Why hasn't anyone done X yet)
[21:00] i whip my discourse back and forth
[21:00] But I DEFINITELY think that a case where you have everyone in some sort of communication and NOBODY has ANY idea it's being recorded for posterity, that's straight up black, not grey-area
[21:01] personal logging is obviously fine, but if you publish the logs without the approval of the people who are present in it, you aren't being ethical
[21:01] jrwr: depends on channel, intent, and context, and my fallback policy is "don't log it and ask first"
[21:01] obviously i can't control what others do
[21:02] some of the stuff I've said in "public" chat could fuck me up lol
[21:02] like if it's a known project channel on (say) Freenode I'd expect public logs to show up somewhere
[21:02] I know this channel is publicly logged but I'm writing things in here that I wouldn't write on a Freenode project channe
[21:02] stuff like that
[21:03] it's possible that policy is irrational
[21:03] but I think with (again) Freenode there's a shared, implicit expectation amongst many (if not all) participants that the chat is to be treated more like proceedings of a meeting
[21:04] not so much others
[21:04] other channels/networks that is
[21:04] like the deeper parts of IRC, the private channels of the private project channels
[21:08] one thing I did find interesting was the outrage over warrantless wiretapping vs. the accepted nature of "everyone logs"
[21:08] I wonder if there was a similar outrage in the early days of IRC
[21:08] read up on the history of dejanews
[21:09] http://www.antipope.org/charlie/old/rant/dejanews.html <-- ?
[21:13] Ah
[21:14] the old Newsgroups discussions
[21:14] As people, we don't want things taken out of context as it might look bad
[21:14] like a bad 90s joke that if your employer found 10 years later might get you passed up on
[21:16] that'd be silly af
[21:17] I know for a fact that has happened
[21:17] since it was a racist joke
[21:17] of course it was, people got real sensitive in the last while :p
[21:18] I know I have some crazy IRC logs of my username floating around
[21:18] like '04 and back when I was like 11
[21:26] otoh, the fact that a conversation 12 years ago is still available, is pretty cool
[21:31] The fact that I had FullHD Evanescence video (which I probably lost) and Vevo has 360p version is the sad truth about copyrighted stuff.
[21:47] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[21:49] *** VADemon has quit IRC (Read error: Operation timed out)
[21:50] *** dashcloud has joined #archiveteam-bs
[22:09] *** JW_work has joined #archiveteam-bs
[22:10] Well, a sort of sad truth
[22:11] Oh, and is there where I tell you an insider told me how basically every public IRC server is being logged, at the server level
[22:11] All conversations, period
[22:13] Huh, that sounds interesting. Makes me wonder if irc.mindfang.net (the IRC server for the "Pesterchum" roleplay chat client) is logged at the server level.
[22:14] Cause if it is, it might be interesting to try and get a copy of logs related to the ARG I'm archiving.
[22:15] I didn't say the servers knew this was happening
[22:16] *** wp494_ has joined #archiveteam-bs
[22:17] There are addons to the IRC core to do this
[22:17] and all trimmings
[22:21] SketchCow: seems to fall into the same bucket as "basically every shared hosting server in existence is pwnt in some way at the admin level"
[22:21] *** wp494 has quit IRC (Read error: Operation timed out)
[22:21] (you just don't see that as a customer, because it's useful to send spoofed UDP as root)
[22:25] the size of IoT DDoS swarms are amazing
[22:26] it endangers sites to the point they may never come back online
[22:27] that thing that security people have been warning about happening for the past 2 years, happened
[22:27] news at 11
[22:27] :P
[22:27] (more than 2, even)
[22:28] like, my response to the DDoS stuff was mostly "yeah, that's an IoT botnet, they're fucked"
[22:28] it's not unexpected at all
[22:28] have a pile of companies put out shit at the lowest common denominator with no repercussions for fucking up security nor incentives to do it right
[22:28] attach them all to a network
[22:28] what do you *think* is going to happen, really
[22:29] maybe it was the plan all along
[22:33] *** dashcloud has quit IRC (Remote host closed the connection)
[22:34] *** dashcloud has joined #archiveteam-bs
[22:35] That is why the serious have multilevel routers with BGP blackholing, FPGA filtering, and finetunable end of the line software filtering.
[22:36] Good ol OVH, their DDoS Protection is not half bad
[22:37] xmc Thank you for the link to the xz scrutiny. That plus related materials gave me more insight. My concern is data size for individuals, that do not possess resources of accumulated wealth by community donations/work. But which can play a big role. Meanwhile cost of data often means cost of transfer, but I do not know the insides.
[23:02] *** Ravenloft has joined #archiveteam-bs
[23:05] *** GE has quit IRC (Remote host closed the connection)
[23:05] *** Swizzle has joined #archiveteam-bs
[23:10] *** wp494_ is now known as wp494
[23:27] *** Swizzle has quit IRC (Read error: Operation timed out)
[23:33] all the world's problems would be solved if only people were as smart as you, eh Yoshimura
[23:37] *** BlueMaxim has joined #archiveteam-bs
[23:40] why so abrasive
[23:53] *** powerKitt has quit IRC (Remote host closed the connection)
[23:56] *** ndiddy has joined #archiveteam-bs