[00:03] *** arkiver2 has quit IRC (Client Quit)
[00:55] *** JesseW has quit IRC (Read error: Operation timed out)
[00:57] *** superkuh_ has quit IRC (Read error: Operation timed out)
[00:57] *** superkuh_ has joined #archiveteam-bs
[01:11] *** Stiletto has joined #archiveteam-bs
[01:12] *** Stilett0 has quit IRC (Read error: Operation timed out)
[01:14] *** JesseW has joined #archiveteam-bs
[01:20] *** JesseW has quit IRC (Read error: Operation timed out)
[01:29] *** primus105 has quit IRC (Leaving.)
[02:18] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:24] *** dashcloud has joined #archiveteam-bs
[03:19] *** JesseW has joined #archiveteam-bs
[04:27] *** primus104 has joined #archiveteam-bs
[04:38] *** aaaaaaaaa has quit IRC (Leaving)
[05:30] *** primus104 has quit IRC (Leaving.)
[06:02] I'm arguing with someone about url shorteners…
[06:28] sounds like a bad argument to get into
[06:36] *** JesseW has quit IRC (Read error: Operation timed out)
[06:37] Yeah.
[06:37] He thought it was better that he was using his own url shortener… -_-
[06:48] AKA his own pile of dog shit that he'll just kill because at some point it will become too hard to properly maintain
[06:49] or he realizes it was a stupid idea in the first place to even try it
[06:50] Yeah. That's what I tried to tell him.
[06:50] Oh well. Archiveteam exists because such people are a fact of life, I suppose.
[06:52] To be fair, it's a little low on the list of offenses one can commit against archivists.
[06:53] Heard of the QR codes on headstones trend? That's a grave offense
[06:55] What are headstones?
[06:56] Also, is there an actual list of such offenses? I think that'd be neat.
[06:56] The stone bit in a cemetery that goes above the body with the details on the person that died
[06:58] Oh…
[06:59] Dear god, why?
[07:00] There's an article on it, but I mainly mentioned it for the pun. http://www.theatlantic.com/technology/archive/2014/05/qr-codes-for-the-dead/370901/
[07:01] Dear god… doesn't this kinda defeat the purpose of a headstone?
[07:02] Yeah it just seems like someone totally missed the point
[07:02] Yeah.
[07:02] the point of headstone is basically so you know what grave is what
[07:02] wow the one in the article already 404s
[07:02] I mean, part of the reason the headstone is made out of what it's made of, is that it will last a very long time with minimal maintenance, right?
[07:03] bentpins: … there are no words for the antipathy I feel right now
[07:04] I found this old HN thread. Many of the URL shorteners mentioned don't exist anymore. https://news.ycombinator.com/item?id=508132
[07:05] *** PurpleSym has joined #archiveteam-bs
[07:06] That being said, is there a "better" way to run a url shortening service?
[07:07] make it open source with a way to download the whole thing
[07:07] Despite their abuse, I can still imagine them being used urls on paper.
[07:07] *useful for
[07:07] so you can clone it trivially
[07:07] Ctrl-S: Fair point.
[07:07] I mean, a url database isn't that large.
[07:08] so you can grab the codebase from their repo and a daily dump of the DB
[07:08] I'm not a fan of the blockchain buzzword, but maybe that could be used effectively.
[07:10] I'm surprised shorteners haven't been going rogue and running drive by downloads and MITM attacks
[07:11] I am too.
[07:11] I'm guessing it might be because very few become really popular.
[07:12] probably
[07:15] bentpins: Wow. I know it's just for a cat, but I'd at least expect the qr code to be engraved. http://cdn.theatlantic.com/assets/media/img/posts/2014/05/pet_memorial_qr_code/25b9ea967.jpg
[07:16] anomie: easier to update, i guess?
[07:17] >updating a headstone
[07:17] Uhmm…
[07:17] hah
[07:17] ok look, when that page goes down, how else will they rehost it?
[07:18] Or when the domain http://www.foreverheadstone.com/ expires and gets bought...
[07:19] >credit mistakes
[07:19] just exactly what somebody reminiscing needs to be reminded about
[07:19] "HEY REMEMBER THAT TIME YOU BOUGHT THAT EXPENSIVE CAT PLAY TOY ON YOUR CREDIT CARD AND NEVER PAID IT OFF? WE DO TOO"
[07:23] There is no distributed url shortener, it seems.
[07:23] Maybe I'll make one myself, if I can motivate myself properly.
[07:24] really? with all of the blockchains i would've assumed somebody would've made one..
[07:26] I know.
[07:26] I mean… the need for one seems obvious.
[07:26] Distributed social networks are infinitely more complicated than this, yet there are plenty of those.
[07:31] i guess getting people to actually use it would be a bit of a hassle, if something needs to be installed
[07:31] because the link would be useless without said program
[07:33] The way I was picturing it it's just a distributed keystore. Then you have site operators that do the 301 bit. That way if one goes down you can just choose another operator to prefix links with
[07:36] that could also work
[07:39] Without some sort of rewriting plugin the links to operator A would still be dead for most people.
[07:40] True, but it would still put #urlteam out of a job
[07:41] we could also store other shortener rewrites in the keystore..
[07:41] if a site goes down, someone just has to snatch that domain up and point it at a server with the keystore..
[07:41] (configuring it to act like it did before ofc)
[07:53] I’m diverting the topic a little here, but has anybody looked into backing up Yahoo! Groups before?
[07:53] Like, all of it.
[07:55] With 5.5 million groups and ~8000 messages per group on average that would be 42.5 billion messages to back up.
[07:56] Which would take a single person 511 years and 477 TB of storage.
[07:56] sounds like a lot of error 999
[07:57] Nah, that’s with appropriate rate-limits.
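The back-of-envelope numbers above can be re-derived with a few lines of Python. This is only a sketch under stated assumptions: one request per message, a single sequential scraper, and a fixed 0.38 second pause between requests (the delay figure that comes up in the conversation right after this). The per-message size is not given anywhere in the log; it is back-derived here from the quoted 477 TB, and the function and variable names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def estimate_scrape(total_messages, delay_seconds, bytes_per_message):
    """Return (years, terabytes) for one scraper fetching messages one by one."""
    years = total_messages * delay_seconds / SECONDS_PER_YEAR
    terabytes = total_messages * bytes_per_message / 1e12
    return years, terabytes


# Figures quoted in the log.
groups = 5_500_000
messages_per_group = 8_000          # "~8000", so only approximate
quoted_total = 42.5e9               # total message count as stated in the log

# The rounded inputs multiply out to 44 billion, slightly above the quoted
# total; the quoted total presumably came from an unrounded average.
rough_total = groups * messages_per_group

# Assumption: average stored size per message, back-derived from 477 TB.
assumed_bytes_per_message = 477e12 / quoted_total   # roughly 11 KB

years, terabytes = estimate_scrape(quoted_total, 0.38, assumed_bytes_per_message)
print(f"{years:.1f} years, {terabytes:.0f} TB")
```

Since the time scales linearly with the delay and inversely with the number of parallel workers, the same function gives a rough feel for how many scrapers it would take to bring this down to something practical.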
[07:57] yeah, we've looked in to it before
[07:57] yahoo has a pretty aggressive ratelimiter, which returns 999 when it wants you to go away
[07:58] I’ve seen that, but waiting 0.38 seconds between the requests usually gets around that limitation.
[07:59] not meaning to discourage you, if you want to make a yahoo groups scrape happen then i'm all ears
[07:59] there's a lot of important shit in there
[07:59] It's reachable over IPV6, surely if you have even a tiny block that would solve things
[07:59] unfortunately a lot of it is membership-restricted
[07:59] oh really
[07:59] hmm
[07:59] No IPv6 on my end, unfortunately.
[08:00] luckily most reputable vps providers have it these days
[08:00] And yes, half of the groups I discovered so far are members only.
[08:05] capturing public groups only is better than doing nothing though..
[08:06] So, the biggest problem I had so far is: How do I store the data?
[08:06] I’m currently using a mongodb, because that’s the only thing that worked reliably so far.
[08:07] PurpleSym: i'd just save as HTML and WARC them up
[08:07] that way it's ingestable into the wayback
[08:07] ..unless we stopped doing that for some reason
[08:08] *** schbirid has joined #archiveteam-bs
[08:08] But there’s a nice API with machine-readable data.
[08:08] That’s what I’m scraping right now.
[08:22] is there a way you can dump out something that looks like an mbox file?
[08:22] i.e. an email message
[08:23] Sure, that’s easy.
[08:24] The API has the raw message.
[08:24] (with email addresses censored)
[08:24] kool
[08:25] an mbox file per group per month would be a good start then
[08:25] You still get usernames though right?
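The "one mbox file per group per month" idea could look roughly like the following with Python's standard `mailbox` module. This is a sketch, not what anyone in the channel actually ran: it assumes the raw message is available as bytes and carries a parseable `Date` header, and the directory layout and the names `mbox_path` and `append_to_mbox` are invented for the example.

```python
import email
import mailbox
import os
from email.utils import parsedate_to_datetime


def mbox_path(root, group, raw_message):
    """Pick <root>/<group>/<YYYY-MM>.mbox from the message's Date header."""
    msg = email.message_from_bytes(raw_message)
    # Assumption: a Date header exists and parses; real data may need a fallback.
    sent = parsedate_to_datetime(msg["Date"])
    return os.path.join(root, group, f"{sent:%Y-%m}.mbox")


def append_to_mbox(root, group, raw_message):
    """Append one raw message to the mbox for its group and month."""
    path = mbox_path(root, group, raw_message)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    box = mailbox.mbox(path)  # created on first use
    try:
        # Single writer assumed; with several processes, wrap the add in
        # box.lock() / box.unlock().
        box.add(raw_message)
        box.flush()
    finally:
        box.close()
    return path
```

Grouping by month keeps individual files to a manageable size and avoids the millions-of-tiny-files problem, while the result stays readable by ordinary mail tools.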
[08:25] There’s just one problem with that: https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean
[08:26] Yes, I think yahoo usernames are in there as well, bentpins
[08:28] PurpleSym: that sounds like an issue with the person who sent the email
[08:28] Hm, but the HTML version is fine.
[08:43] *** arkiver2 has joined #archiveteam-bs
[08:45] never expected to see "mongodb" and "worked reliably" in association
[08:45] learn something new every day
[08:47] yipdw, http://howfuckedismydatabase.com/
[08:47] i've seen that before yes
[08:47] Well, I had everything in small files previously. The filesystem did not like that.
[08:53] *** signius has quit IRC (Ping timeout: 306 seconds)
[08:53] *** primus104 has joined #archiveteam-bs
[09:06] *** signius has joined #archiveteam-bs
[09:37] *** godane has quit IRC (Leaving.)
[10:13] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[10:50] *** swebb has quit IRC (Read error: Operation timed out)
[10:51] *** Laverne has quit IRC (Read error: Operation timed out)
[10:51] *** lytv has quit IRC (Read error: Operation timed out)
[10:52] *** chazchaz has quit IRC (Read error: Operation timed out)
[10:53] *** Laverne has joined #archiveteam-bs
[10:53] *** aschmitz has quit IRC (Read error: Operation timed out)
[10:54] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[10:54] *** aschmitz has joined #archiveteam-bs
[10:54] *** lytv has joined #archiveteam-bs
[10:55] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[10:58] *** zenguy_pc has joined #archiveteam-bs
[11:00] *** Laverne has quit IRC (Ping timeout: 369 seconds)
[11:03] *** Laverne has joined #archiveteam-bs
[11:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:04] *** swebb has joined #archiveteam-bs
[11:04] *** atlogbot has joined #archiveteam-bs
[11:08] *** dashcloud has joined #archiveteam-bs
[11:09] *** chazchaz has joined #archiveteam-bs
[11:45] *** godane has joined #archiveteam-bs
[11:49] *** Infreq has quit IRC (Read error: Operation timed out)
[11:50] *** Infreq has joined #archiveteam-bs
[11:56] *** robink has quit IRC (Ping timeout: 492 seconds)
[11:56] *** cloudmons has quit IRC (Ping timeout: 492 seconds)
[11:57] *** arkiver2 has joined #archiveteam-bs
[12:23] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[12:48] *** cloudmons has joined #archiveteam-bs
[12:48] *** robink has joined #archiveteam-bs
[13:24] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[13:42] *** zenguy_pc has joined #archiveteam-bs
[14:26] *** vitzli has joined #archiveteam-bs
[14:42] *** primus104 has quit IRC (Leaving.)
[15:25] *** chfoo has quit IRC (Read error: Operation timed out)
[15:28] *** robink has quit IRC (Read error: Connection reset by peer)
[15:28] *** chfoo has joined #archiveteam-bs
[15:34] *** cloudmons has quit IRC (Ping timeout: 492 seconds)
[15:35] *** primus104 has joined #archiveteam-bs
[16:29] *** cloudmons has joined #archiveteam-bs
[16:55] *** JesseW has joined #archiveteam-bs
[17:15] *** JesseW has quit IRC (Read error: Operation timed out)
[17:27] *** JesseW has joined #archiveteam-bs
[17:32] *** robink has joined #archiveteam-bs
[17:43] *** robink has quit IRC (Read error: Connection reset by peer)
[17:44] *** robink has joined #archiveteam-bs
[17:46] *** vitzli has quit IRC (Quit: Leaving)
[17:46] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:50] *** dashcloud has joined #archiveteam-bs
[18:00] *** arkiver2 has joined #archiveteam-bs
[18:13] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[18:34] *** Aranje has quit IRC (Read error: Connection reset by peer)
[18:48] *** schbirid2 has joined #archiveteam-bs
[18:48] *** Aranje has joined #archiveteam-bs
[18:52] *** schbirid has quit IRC (Ping timeout: 306 seconds)
[18:54] *** arkiver2 has joined #archiveteam-bs
[18:58] *** aaaaaaaaa has joined #archiveteam-bs
[18:58] *** Aranje has quit IRC (Ping timeout: 483 seconds)
[18:59] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[19:07] *** Aranje has joined #archiveteam-bs
[19:09] *** JesseW has quit IRC (Read error: Operation timed out)
[19:18] *** wyatt874- has joined #archiveteam-bs
[19:18] *** wyatt8740 has quit IRC (Read error: Connection reset by peer)
[20:18] *** JesseW has joined #archiveteam-bs
[20:19] *** Mayonaise has quit IRC (Read error: Operation timed out)
[20:22] *** Mayonaise has joined #archiveteam-bs
[20:26] *** schbirid2 has quit IRC (Quit: Leaving)
[20:48] *** PurpleSym has quit IRC (WeeChat 1.1.1)
[20:53] *** wyatt874- is now known as wyatt8740
[21:18] *** zenguy_pc has quit IRC (Ping timeout: 483 seconds)
[21:26] *** zenguy_pc has joined #archiveteam-bs
[21:34] *** JesseW has quit IRC (Read error: Operation timed out)
[21:54] *** zenguy_pc has quit IRC (Ping timeout: 483 seconds)
[22:01] *** RichardG has quit IRC (Remote host closed the connection)
[22:02] *** RichardG has joined #archiveteam-bs
[22:03] *** zenguy_pc has joined #archiveteam-bs
[23:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:15] *** dashcloud has joined #archiveteam-bs
[23:15] *** zenguy_pc has quit IRC (Ping timeout: 483 seconds)
[23:20] *** zenguy_pc has joined #archiveteam-bs
[23:40] *** zenguy_pc has quit IRC (Remote host closed the connection)
[23:42] *** zenguy_pc has joined #archiveteam-bs
[23:44] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)