Time |
Nickname |
Message |
00:03
π
|
|
arkiver2 has quit IRC (Client Quit) |
00:55
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
00:57
π
|
|
superkuh_ has quit IRC (Read error: Operation timed out) |
00:57
π
|
|
superkuh_ has joined #archiveteam-bs |
01:11
π
|
|
Stiletto has joined #archiveteam-bs |
01:12
π
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
01:14
π
|
|
JesseW has joined #archiveteam-bs |
01:20
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
01:29
π
|
|
primus105 has quit IRC (Leaving.) |
02:18
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:24
π
|
|
dashcloud has joined #archiveteam-bs |
03:19
π
|
|
JesseW has joined #archiveteam-bs |
04:27
π
|
|
primus104 has joined #archiveteam-bs |
04:38
π
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:30
π
|
|
primus104 has quit IRC (Leaving.) |
06:02
π
|
anomie |
I'm arguing with someone about url shorteners… |
06:28
π
|
yipdw |
sounds like a bad argument to get into |
06:36
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
06:37
π
|
anomie |
Yeah. |
06:37
π
|
anomie |
He thought it was better that he was using his own url shortener… -_- |
06:48
π
|
wp494 |
AKA his own pile of dog shit that he'll just kill because at some point it will become too hard to properly maintain |
06:49
π
|
wp494 |
or he realizes it was a stupid idea in the first place to even try it |
06:50
π
|
anomie |
Yeah. That's what I tried to tell him. |
06:50
π
|
anomie |
Oh well. Archiveteam exists because such people are a fact of life, I suppose. |
06:52
π
|
anomie |
To be fair, it's a little low on the list of offenses one can commit against archivists. |
06:53
π
|
bentpins |
Heard of the QR codes on headstones trend? That's a grave offense |
06:55
π
|
anomie |
What are headstones? |
06:56
π
|
anomie |
Also, is there an actual list of such offenses? I think that'd be neat. |
06:56
π
|
bentpins |
The stone bit in a cemetery that goes above the body with the details on the person that died |
06:58
π
|
anomie |
Oh… |
06:59
π
|
anomie |
Dear god, why? |
07:00
π
|
bentpins |
There's an article on it, but I mainly mentioned it for the pun. http://www.theatlantic.com/technology/archive/2014/05/qr-codes-for-the-dead/370901/ |
07:01
π
|
anomie |
Dear god… doesn't this kinda defeat the purpose of a headstone? |
07:02
π
|
bentpins |
Yeah it just seems like someone totally missed the point |
07:02
π
|
anomie |
Yeah. |
07:02
π
|
Ctrl-S |
the point of headstone is basically so you know what grave is what |
07:02
π
|
bentpins |
wow the one in the article already 404s |
07:02
π
|
anomie |
I mean, part of the reason the headstone is made out of what it's made of, is that it will last a very long time with minimal maintenance, right? |
07:03
π
|
anomie |
bentpins: … there are no words for the antipathy I feel right now |
07:04
π
|
anomie |
I found this old HN thread. Many of the URL shorteners mentioned don't exist anymore. https://news.ycombinator.com/item?id=508132 |
07:05
π
|
|
PurpleSym has joined #archiveteam-bs |
07:06
π
|
anomie |
That being said, is there a "better" way to run a url shortening service? |
07:07
π
|
Ctrl-S |
make it open source with a way to download the whole thing |
07:07
π
|
anomie |
Despite their abuse, I can still imagine them being used urls on paper. |
07:07
π
|
anomie |
*useful for |
07:07
π
|
Ctrl-S |
so you can clone it trivially |
07:07
π
|
anomie |
Ctrl-S: Fair point. |
07:07
π
|
anomie |
I mean, a url database isn't that large. |
07:08
π
|
Ctrl-S |
so you can grab the codebase from their repo and a daily dump of the DB |
07:08
π
|
anomie |
I'm not a fan of the blockchain buzzword, but maybe that could be used effectively. |
07:10
π
|
bentpins |
I'm surprised shorteners haven't been going rogue and running drive-by downloads and MITM attacks |
07:11
π
|
anomie |
I am too. |
07:11
π
|
anomie |
I'm guessing it might be because very few become really popular. |
07:12
π
|
bentpins |
probably |
07:15
π
|
anomie |
bentpins: Wow. I know it's just for a cat, but I'd at least expect the qr code to be engraved. http://cdn.theatlantic.com/assets/media/img/posts/2014/05/pet_memorial_qr_code/25b9ea967.jpg |
07:16
π
|
GLaDOS |
anomie: easier to update, i guess? |
07:17
π
|
anomie |
>updating a headstone |
07:17
π
|
anomie |
Uhmm… |
07:17
π
|
bentpins |
hah |
07:17
π
|
GLaDOS |
ok look, when that page goes down, how else will they rehost it? |
07:18
π
|
bentpins |
Or when the domain http://www.foreverheadstone.com/ expires and gets bought... |
07:19
π
|
GLaDOS |
>credit mistakes |
07:19
π
|
GLaDOS |
just exactly what somebody reminiscing needs to be reminded about |
07:19
π
|
GLaDOS |
"HEY REMEMBER THAT TIME YOU BOUGHT THAT EXPENSIVE CAT PLAY TOY ON YOUR CREDIT CARD AND NEVER PAID IT OFF? WE DO TOO" |
07:23
π
|
anomie |
There is no distributed url shortener, it seems. |
07:23
π
|
anomie |
Maybe I'll make one myself, if I can motivate myself properly. |
07:24
π
|
GLaDOS |
really? with all of the blockchains i would've assumed somebody would've made one.. |
07:26
π
|
anomie |
I know. |
07:26
π
|
anomie |
I mean… the need for one seems obvious. |
07:26
π
|
anomie |
Distributed social networks are infinitely more complicated than this, yet there are plenty of those. |
07:31
π
|
GLaDOS |
i guess getting people to actually use it would be a bit of a hassle, if something needs to be installed |
07:31
π
|
GLaDOS |
because the link would be useless without said program |
07:33
π
|
bentpins |
The way I was picturing it it's just a distributed keystore. Then you have site operators that do the 301 bit. That way if one goes down you can just choose another operator to prefix links with |
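The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the shared keystore is stood in for by a plain dict, the hostnames `op-a.example` and `op-b.example` and the key `abc123` are made up, and `resolve` and `switch_operator` are hypothetical helper names, not part of any existing project.

```python
from urllib.parse import urlsplit

# Stand-in for the shared, replicated keystore (assumption: a plain dict).
# The key "abc123" is invented for this example.
KEYSTORE = {
    "abc123": "http://www.theatlantic.com/technology/archive/2014/05/qr-codes-for-the-dead/370901/",
}


def resolve(keystore, short_url):
    """Answer a short link the way any operator would: 301 plus a target."""
    key = urlsplit(short_url).path.lstrip("/")
    target = keystore.get(key)
    if target is None:
        return 404, None
    return 301, target


def switch_operator(short_url, new_operator_host):
    """Re-prefix a short link with another operator's hostname."""
    parts = urlsplit(short_url)
    return parts._replace(netloc=new_operator_host).geturl()


# If operator A disappears, the same key still resolves through operator B.
dead_link = "http://op-a.example/abc123"
live_link = switch_operator(dead_link, "op-b.example")
status, location = resolve(KEYSTORE, live_link)
```

Because every operator reads the same keys, the hostname is the only part of a link that depends on a particular site staying up.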
07:36
π
|
GLaDOS |
that could also work |
07:39
π
|
PurpleSym |
Without some sort of rewriting plugin the links to operator A would still be dead for most people. |
07:40
π
|
bentpins |
True, but it would still put #urlteam out of a job |
07:41
π
|
GLaDOS |
we could also store other shortener rewrites in the keystore.. |
07:41
π
|
GLaDOS |
if a site goes down, someone just has to snatch that domain up and point it at a server with the keystore.. |
07:41
π
|
GLaDOS |
(configuring it to act like it did before ofc) |
07:53
π
|
PurpleSym |
I'm diverting the topic a little here, but has anybody looked into backing up Yahoo! Groups before? |
07:53
π
|
PurpleSym |
Like, all of it. |
07:55
π
|
PurpleSym |
With 5.5 million groups and ~8000 messages per group on average that would be 42.5 billion messages to back up. |
07:56
π
|
PurpleSym |
Which would take a single person 511 years and 477 TB of storage. |
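These figures can be roughly reproduced with the sketch below. It assumes one request per message at the 0.38-second delay quoted in the discussion, a 365.25-day year, and decimal terabytes; none of these assumptions is stated in the log, and `scrape_estimate` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def scrape_estimate(groups, messages, delay_seconds, storage_bytes):
    """Back-of-the-envelope numbers for a single sequential scraper."""
    return {
        # Implied average, since the quoted ~8000 is rounded.
        "messages_per_group": messages / groups,
        # Assumption: exactly one request per message, run back to back.
        "years": messages * delay_seconds / SECONDS_PER_YEAR,
        # Implied average size of one stored message.
        "bytes_per_message": storage_bytes / messages,
    }


estimate = scrape_estimate(
    groups=5_500_000,
    messages=42_500_000_000,
    delay_seconds=0.38,
    storage_bytes=477e12,  # 477 TB, decimal
)
```

Under these assumptions the total works out to a little under 512 years, with about 7,700 messages per group and about 11 KB per message.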
07:56
π
|
xmc |
sounds like a lot of error 999 |
07:57
π
|
PurpleSym |
Nah, that's with appropriate rate-limits. |
07:57
π
|
xmc |
yeah, we've looked in to it before |
07:57
π
|
xmc |
yahoo has a pretty aggressive ratelimiter, which returns 999 when it wants you to go away |
07:58
π
|
PurpleSym |
I've seen that, but waiting 0.38 seconds between the requests usually gets around that limitation. |
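A minimal sketch of such a paced fetch loop is below. The 0.38-second delay and the 999 status come from the discussion; the doubling back-off after a 999, the retry limit, and the names `fetch_all`, `fetch` and `sleep` are assumptions added for illustration. The `fetch` callable is injected so that no particular HTTP library or Yahoo endpoint is implied.

```python
import time


def fetch_all(urls, fetch, delay=0.38, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    `fetch(url)` must return a (status, body) tuple. A 999 status is
    treated as "go away for a while" and retried after a longer pause
    (assumption: doubling the wait on each retry).
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status != 999:
                results[url] = (status, body)
                break
            sleep(delay * 2 ** (attempt + 1))
        # URLs that still return 999 after all retries are left out.
        sleep(delay)
    return results
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting.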
07:59
π
|
xmc |
not meaning to discourage you, if you want to make a yahoo groups scrape happen then i'm all ears |
07:59
π
|
xmc |
there's a lot of important shit in there |
07:59
π
|
bentpins |
It's reachable over IPv6, surely if you have even a tiny block that would solve things |
07:59
π
|
xmc |
unfortunately a lot of it is membership-restricted |
07:59
π
|
xmc |
oh really |
07:59
π
|
xmc |
hmm |
07:59
π
|
PurpleSym |
No IPv6 on my end, unfortunately. |
08:00
π
|
xmc |
luckily most reputable vps providers have it these days |
08:00
π
|
PurpleSym |
And yes, half of the groups I discovered so far are members only. |
08:05
π
|
GLaDOS |
capturing public groups only is better than doing nothing though.. |
08:06
π
|
PurpleSym |
So, the biggest problem I had so far is: How do I store the data? |
08:06
π
|
PurpleSym |
I'm currently using a mongodb, because that's the only thing that worked reliably so far. |
08:07
π
|
GLaDOS |
PurpleSym: i'd just save as HTML and WARC them up |
08:07
π
|
GLaDOS |
that way it's ingestable into the wayback |
08:07
π
|
GLaDOS |
..unless we stopped doing that for some reason |
08:08
π
|
|
schbirid has joined #archiveteam-bs |
08:08
π
|
PurpleSym |
But there's a nice API with machine-readable data. |
08:08
π
|
PurpleSym |
That's what I'm scraping right now. |
08:22
π
|
xmc |
is there a way you can dump out something that looks like an mbox file? |
08:22
π
|
xmc |
i.e. an email message |
08:23
π
|
PurpleSym |
Sure, that's easy. |
08:24
π
|
PurpleSym |
The API has the raw message. |
08:24
π
|
PurpleSym |
(with email addresses censored) |
08:24
π
|
xmc |
kool |
08:25
π
|
xmc |
an mbox file per group per month would be a good start then |
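One way to lay that out, sketched with Python's standard `mailbox` module, is below. It assumes the raw message is available as bytes with a parseable `Date` header and that group names are safe to use as directory names; `mbox_path`, `append_message` and the `undated` bucket are hypothetical, not anything the scraper in this discussion is known to use.

```python
import mailbox
from email import message_from_bytes
from email.utils import parsedate_to_datetime
from pathlib import Path


def mbox_path(root: Path, group: str, raw_message: bytes) -> Path:
    """Pick <root>/<group>/<YYYY-MM>.mbox from the message's Date header."""
    msg = message_from_bytes(raw_message)
    try:
        sent = parsedate_to_datetime(msg["Date"])
        bucket = f"{sent.year:04d}-{sent.month:02d}"
    except (TypeError, ValueError):
        # Missing or unparseable Date header (assumption: keep it anyway).
        bucket = "undated"
    return root / group / f"{bucket}.mbox"


def append_message(root: Path, group: str, raw_message: bytes) -> Path:
    """Append one raw message to the matching per-group, per-month mbox."""
    path = mbox_path(root, group, raw_message)
    path.parent.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(path))
    try:
        # No locking here: this sketch assumes a single writer process.
        box.add(raw_message)
        box.flush()
    finally:
        box.close()
    return path
```

With several writers, the `lock()` and `unlock()` methods of `mailbox.mbox` would need to wrap the `add` call.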
08:25
π
|
bentpins |
You still get usernames though right? |
08:25
π
|
PurpleSym |
There's just one problem with that: https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean |
08:26
π
|
PurpleSym |
Yes, I think yahoo usernames are in there as well, bentpins |
08:28
π
|
xmc |
PurpleSym: that sounds like an issue with the person who sent the email |
08:28
π
|
PurpleSym |
Hm, but the HTML version is fine. |
08:43
π
|
|
arkiver2 has joined #archiveteam-bs |
08:45
π
|
yipdw |
never expected to see "mongodb" and "worked reliably" in association |
08:45
π
|
yipdw |
learn something new every day |
08:47
π
|
HCross |
yipdw, http://howfuckedismydatabase.com/ |
08:47
π
|
yipdw |
i've seen that before yes |
08:47
π
|
PurpleSym |
Well, I had everything in small files previously. The filesystem did not like that. |
08:53
π
|
|
signius has quit IRC (Ping timeout: 306 seconds) |
08:53
π
|
|
primus104 has joined #archiveteam-bs |
09:06
π
|
|
signius has joined #archiveteam-bs |
09:37
π
|
|
godane has quit IRC (Leaving.) |
10:13
π
|
|
arkiver2 has quit IRC (Ping timeout: 252 seconds) |
10:50
π
|
|
swebb has quit IRC (Read error: Operation timed out) |
10:51
π
|
|
Laverne has quit IRC (Read error: Operation timed out) |
10:51
π
|
|
lytv has quit IRC (Read error: Operation timed out) |
10:52
π
|
|
chazchaz has quit IRC (Read error: Operation timed out) |
10:53
π
|
|
Laverne has joined #archiveteam-bs |
10:53
π
|
|
aschmitz has quit IRC (Read error: Operation timed out) |
10:54
π
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
10:54
π
|
|
aschmitz has joined #archiveteam-bs |
10:54
π
|
|
lytv has joined #archiveteam-bs |
10:55
π
|
|
atlogbot has quit IRC (Ping timeout: 369 seconds) |
10:58
π
|
|
zenguy_pc has joined #archiveteam-bs |
11:00
π
|
|
Laverne has quit IRC (Ping timeout: 369 seconds) |
11:03
π
|
|
Laverne has joined #archiveteam-bs |
11:04
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:04
π
|
|
swebb has joined #archiveteam-bs |
11:04
π
|
|
atlogbot has joined #archiveteam-bs |
11:08
π
|
|
dashcloud has joined #archiveteam-bs |
11:09
π
|
|
chazchaz has joined #archiveteam-bs |
11:45
π
|
|
godane has joined #archiveteam-bs |
11:49
π
|
|
Infreq has quit IRC (Read error: Operation timed out) |
11:50
π
|
|
Infreq has joined #archiveteam-bs |
11:56
π
|
|
robink has quit IRC (Ping timeout: 492 seconds) |
11:56
π
|
|
cloudmons has quit IRC (Ping timeout: 492 seconds) |
11:57
π
|
|
arkiver2 has joined #archiveteam-bs |
12:23
π
|
|
arkiver2 has quit IRC (Ping timeout: 252 seconds) |
12:48
π
|
|
cloudmons has joined #archiveteam-bs |
12:48
π
|
|
robink has joined #archiveteam-bs |
13:24
π
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |
13:42
π
|
|
zenguy_pc has joined #archiveteam-bs |
14:26
π
|
|
vitzli has joined #archiveteam-bs |
14:42
π
|
|
primus104 has quit IRC (Leaving.) |
15:25
π
|
|
chfoo has quit IRC (Read error: Operation timed out) |
15:28
π
|
|
robink has quit IRC (Read error: Connection reset by peer) |
15:28
π
|
|
chfoo has joined #archiveteam-bs |
15:34
π
|
|
cloudmons has quit IRC (Ping timeout: 492 seconds) |
15:35
π
|
|
primus104 has joined #archiveteam-bs |
16:29
π
|
|
cloudmons has joined #archiveteam-bs |
16:55
π
|
|
JesseW has joined #archiveteam-bs |
17:15
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
17:27
π
|
|
JesseW has joined #archiveteam-bs |
17:32
π
|
|
robink has joined #archiveteam-bs |
17:43
π
|
|
robink has quit IRC (Read error: Connection reset by peer) |
17:44
π
|
|
robink has joined #archiveteam-bs |
17:46
π
|
|
vitzli has quit IRC (Quit: Leaving) |
17:46
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:50
π
|
|
dashcloud has joined #archiveteam-bs |
18:00
π
|
|
arkiver2 has joined #archiveteam-bs |
18:13
π
|
|
arkiver2 has quit IRC (Ping timeout: 252 seconds) |
18:34
π
|
|
Aranje has quit IRC (Read error: Connection reset by peer) |
18:48
π
|
|
schbirid2 has joined #archiveteam-bs |
18:48
π
|
|
Aranje has joined #archiveteam-bs |
18:52
π
|
|
schbirid has quit IRC (Ping timeout: 306 seconds) |
18:54
π
|
|
arkiver2 has joined #archiveteam-bs |
18:58
π
|
|
aaaaaaaaa has joined #archiveteam-bs |
18:58
π
|
|
Aranje has quit IRC (Ping timeout: 483 seconds) |
18:59
π
|
|
arkiver2 has quit IRC (Ping timeout: 252 seconds) |
19:07
π
|
|
Aranje has joined #archiveteam-bs |
19:09
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
19:18
π
|
|
wyatt874- has joined #archiveteam-bs |
19:18
π
|
|
wyatt8740 has quit IRC (Read error: Connection reset by peer) |
20:18
π
|
|
JesseW has joined #archiveteam-bs |
20:19
π
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
20:22
π
|
|
Mayonaise has joined #archiveteam-bs |
20:26
π
|
|
schbirid2 has quit IRC (Quit: Leaving) |
20:48
π
|
|
PurpleSym has quit IRC (WeeChat 1.1.1) |
20:53
π
|
|
wyatt874- is now known as wyatt8740 |
21:18
π
|
|
zenguy_pc has quit IRC (Ping timeout: 483 seconds) |
21:26
π
|
|
zenguy_pc has joined #archiveteam-bs |
21:34
π
|
|
JesseW has quit IRC (Read error: Operation timed out) |
21:54
π
|
|
zenguy_pc has quit IRC (Ping timeout: 483 seconds) |
22:01
π
|
|
RichardG has quit IRC (Remote host closed the connection) |
22:02
π
|
|
RichardG has joined #archiveteam-bs |
22:03
π
|
|
zenguy_pc has joined #archiveteam-bs |
23:12
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
23:15
π
|
|
dashcloud has joined #archiveteam-bs |
23:15
π
|
|
zenguy_pc has quit IRC (Ping timeout: 483 seconds) |
23:20
π
|
|
zenguy_pc has joined #archiveteam-bs |
23:40
π
|
|
zenguy_pc has quit IRC (Remote host closed the connection) |
23:42
π
|
|
zenguy_pc has joined #archiveteam-bs |
23:44
π
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |