Time | Nickname | Message
00:30 | | fuzzy8021 has quit IRC (Read error: Connection reset by peer)
00:30 | | fuzzy8021 has joined #archiveteam-bs
00:36 | | Arcorann has joined #archiveteam-bs
00:51 | | Ctrl has quit IRC (Ping timeout: 857 seconds)
01:20 | | Ctrl has joined #archiveteam-bs
01:30 | | c0mpass has quit IRC ()
02:24 | | DogsRNice has quit IRC (Read error: Connection reset by peer)
03:12 | | ColoHusky has joined #archiveteam-bs
03:16 | ColoHusky | JAA: It doesn't appear as if some (or all) of the Fotolog data is in the Wayback Machine, I tried some usernames from https://archive.org/download/archiveteam_fotolog_20160505045634/fotolog_20160505045634.megawarc.json.gz (radiun, stoidisponible, milenasantanaa) but they're all not in there (https://web.archive.org/web/*/http://www.fotolog.net/milenasantanaa), which is quite odd
03:17 | ColoHusky | Nvm, fotolog.com works, I had the wrong URL lol
03:18 | | qw3rty__ has joined #archiveteam-bs
03:22 | JAA | ColoHusky: Oh, we definitely archived *something*, but it's clearly also not everything.
03:22 | ColoHusky | Yeah
03:24 | ColoHusky | All of the ones listed on GitHub seem to be archived, but some of the usernames must've been missed
03:25 | | qw3rty_ has quit IRC (Read error: Operation timed out)
03:28 | JAA | Did you check the list 03 as well?
03:28 | | ColoHusky has quit IRC (Ping timeout: 253 seconds)
03:28 | JAA | Welp
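ColoHusky's spot check (looking up individual usernames on the Wayback Machine) can be scripted against the Internet Archive's Wayback Availability API at https://archive.org/wayback/available. A minimal sketch, assuming profiles live at http://www.fotolog.com/<username> (the .com domain that turned out to work, not .net); the function names are hypothetical, and the code only builds query URLs and parses a response body rather than fetching anything.

```python
import json
from urllib.parse import urlencode

# Assumption: Fotolog profiles were at http://www.fotolog.com/<username>
# (the .com domain; fotolog.net URLs will not match the captures).
FOTOLOG_PROFILE = "http://www.fotolog.com/{}"
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(username: str) -> str:
    """Build a Wayback Availability API query URL for one profile (hypothetical helper)."""
    return AVAILABILITY_API + "?" + urlencode({"url": FOTOLOG_PROFILE.format(username)})


def closest_snapshot(response_body: str):
    """Return the closest snapshot URL from an Availability API JSON body, or None."""
    data = json.loads(response_body)
    # An empty "archived_snapshots" object means no capture was reported.
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    # Usernames taken from the log; this only prints the query URLs.
    for name in ["radiun", "stoidisponible", "milenasantanaa"]:
        print(availability_query(name))
```

The Availability API reports only the single closest capture, so an empty result is a hint rather than proof that a profile was never archived; the Wayback CDX API gives a fuller listing per URL.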
03:59 | | maxfan8 has quit IRC (Read error: Operation timed out)
04:20 | | ephemer0l has quit IRC (Read error: Connection reset by peer)
04:30 | | maxfan8 has joined #archiveteam-bs
04:58 | | Aoede has quit IRC (Read error: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac)
04:59 | | Aoede has joined #archiveteam-bs
05:09 | | godane has quit IRC (Ping timeout: 272 seconds)
05:18 | | wyatt8740 has quit IRC (Remote host closed the connection)
05:20 | | wyatt8740 has joined #archiveteam-bs
05:22 | | godane has joined #archiveteam-bs
05:22 | | Pixi has joined #archiveteam-bs
05:23 | | scorche` has joined #archiveteam-bs
05:23 | | SynMonger has quit IRC (Ping timeout: 255 seconds)
05:23 | | kyledrake has quit IRC (Ping timeout: 255 seconds)
05:24 | | scorche` has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | robogoat has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | phirephly has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | Pixi` has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | schbirid has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | superkuh has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | scorche has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | achip has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | Somebody2 has quit IRC (hub.efnet.us irc.Prison.NET)
05:24 | | SynMonger has joined #archiveteam-bs
05:29 | | kyledrake has joined #archiveteam-bs
06:04 | | Aoede has quit IRC (se.hub irc.nordunet.se)
06:13 | | Aoede has joined #archiveteam-bs
06:18 | | ephemer0l has joined #archiveteam-bs
07:08 | | Raccoon has quit IRC (Remote host closed the connection)
07:35 | | MrRadar has joined #archiveteam-bs
07:37 | | MrRadar_ has quit IRC (Read error: Operation timed out)
07:43 | Ryz | Potential way to mine Wordpress blogs? I stumbled upon something like https://wordpress.com/post/202218/455/ - on its own, it didn't do anything
07:44 | Ryz | However, if it is something like https://wordpress.com/post/202218/ - something interesting happens, there's a link to the blog like https://rthorm.wordpress.com/ (which I saw moments earlier)
07:44 | Ryz | I'm not sure if the number is just per post, or if it's per Wordpress account
07:44 | Ryz | Here are some other examples of my findings:
07:44 | Ryz | https://wordpress.com/post/20221/ = https://lfcmanager.wordpress.com/
07:44 | Ryz | https://wordpress.com/post/20/ = nothing
07:44 | Ryz | https://wordpress.com/post/2/ = https://donncha.wordpress.com/
07:46 | Ryz | JAA or anyone else or arkiver, perhaps potential interest in Wordpress blog link mining or investigate more of this? It's numerical too o:
07:48 | Ryz | So far it doesn't appear to be a custom URL that would use Wordpress as their backend technology, just explicit Wordpress blog URLs
07:51 | Ryz | ...Now only if we can find something similar with Tistory blogs or Blogspot blogs <#>;
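Ryz's findings suggest walking a numeric range of wordpress.com/post/ URLs. A minimal sketch, assuming the first number after /post/ identifies a blog (consistent with /post/2/ leading to donncha.wordpress.com, but unconfirmed: Ryz was unsure whether it is per post or per account) and that the optional second number identifies a post. The helper names are hypothetical; the code only generates and parses URLs, because fetching a page and extracting the blog link would depend on markup the log does not describe.

```python
import re

# Assumption: https://wordpress.com/post/<first>/[<second>/] where <first>
# appears to identify a blog and <second> (optional) a post on it.
POST_URL_RE = re.compile(r"^https://wordpress\.com/post/(\d+)(?:/(\d+))?/?$")


def parse_post_url(url: str):
    """Split a wordpress.com/post URL into (first_id, second_id or None)."""
    m = POST_URL_RE.match(url)
    if m is None:
        raise ValueError(f"not a wordpress.com/post URL: {url}")
    second = int(m.group(2)) if m.group(2) else None
    return int(m.group(1)), second


def candidate_urls(start: int, stop: int):
    """Yield https://wordpress.com/post/<n>/ for every n in [start, stop)."""
    for n in range(start, stop):
        yield f"https://wordpress.com/post/{n}/"


if __name__ == "__main__":
    print(parse_post_url("https://wordpress.com/post/202218/455/"))
    for url in candidate_urls(1, 4):
        print(url)
```

Some IDs resolve to nothing (Ryz's /post/20/ example), so any crawler built on this would need to record misses rather than stop at the first gap.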
08:03 | | Raccoon has joined #archiveteam-bs
08:05 | | Jake2 has joined #archiveteam-bs
08:05 | | Jake has quit IRC (Read error: Operation timed out)
08:05 | | Jake2 is now known as Jake
08:31 | | RichardG_ has joined #archiveteam-bs
08:31 | | RichardG has quit IRC (Read error: Connection reset by peer)
08:46 | | superkuh has joined #archiveteam-bs
08:46 | | schbirid has joined #archiveteam-bs
08:46 | | scorche has joined #archiveteam-bs
08:46 | | robogoat has joined #archiveteam-bs
08:46 | | phirephly has joined #archiveteam-bs
08:46 | | achip has joined #archiveteam-bs
08:46 | | Somebody2 has joined #archiveteam-bs
09:32 | | hook54321 has quit IRC ()
09:32 | | hook54321 has joined #archiveteam-bs
11:09 | | BlueMax has quit IRC (Quit: Leaving)
13:24 | JAA | Ryz: The wp.me shortener is even easier, and that's why I ran it through URLTeam a while ago.
13:25 | JAA | There's also a way to discover all Wordpress blogs with Jetpack installed, I believe, but I never looked into that in detail.
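JAA's point is that a URL shortener such as wp.me has a compact keyspace that can be walked exhaustively. One common way to do that is to map consecutive integers to base-62 shortcodes. A minimal sketch; the alphabet ordering (digits, then lowercase, then uppercase) is an assumption not confirmed to match wp.me, the function names are hypothetical, and the sketch says nothing about how wp.me codes map to blogs or posts.

```python
import string

# Assumed alphabet: digits, lowercase, uppercase (62 symbols). Whether wp.me
# uses exactly this ordering is an assumption.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 shortcode."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Decode a base-62 shortcode back to its integer."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def sequential_codes(start: int, count: int):
    """Yield `count` consecutive shortcodes beginning at integer `start`."""
    for n in range(start, start + count):
        yield encode(n)


if __name__ == "__main__":
    print(list(sequential_codes(60, 4)))  # ['Y', 'Z', '10', '11']
```

A real shortener may treat codes with leading zeros (for example "0a" and "a") as distinct, which this integer mapping collapses; a crawler that must cover such codes would enumerate strings per length instead.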
14:09 | | benjinsmi has joined #archiveteam-bs
14:16 | | benjins has quit IRC (Read error: Operation timed out)
15:02 | | maxfan8 has quit IRC (Quit: WeeChat 2.8)
15:03 | | maxfan8 has joined #archiveteam-bs
15:30 | | Arcorann has quit IRC (Read error: Connection reset by peer)
15:33 | | nepeat_ has joined #archiveteam-bs
15:33 | | nepeat has quit IRC (Read error: Connection reset by peer)
15:55 | | hook54321 has quit IRC ()
15:55 | | hook54321 has joined #archiveteam-bs
16:59 | | Gallifrey has joined #archiveteam-bs
17:00 | Gallifrey | Darn, they... they banned /r/GenderCriticalGuys. That was on my list of subs to archive and I didn't get there in time.
17:09 | schbirid | sounds like nothing of value was lost
17:20 | | Gallifrey has quit IRC (Read error: Operation timed out)
17:21 | | Gallifrey has joined #archiveteam-bs
17:24 | | Mateon1 has quit IRC (Remote host closed the connection)
17:24 | | Mateon1 has joined #archiveteam-bs
17:27 | Gallifrey | I don't like them either, but destroying the record of the things they've said done... doesn't sit right with me.
17:28 | Ryz | Less stally, more archivey <#>;
17:30 | JAA | Yeah, we should get #shreddit (hackint) off the ground ASAP.
17:30 | | DogsRNice has joined #archiveteam-bs
17:34 | kiska | I've copied your thing into that channel
17:35 | Gallifrey | What's the current bottleneck with #shreddit, do we know?
17:37 | JAA | Code isn't ready AFAIK. Anything further in that channel please.
17:40 | Gallifrey | Right-o
17:51 | | maxfan8 has quit IRC (Quit: WeeChat 2.8)
17:51 | | maxfan8 has joined #archiveteam-bs
17:53 | | maxfan8 has quit IRC (Client Quit)
17:53 | | maxfan8 has joined #archiveteam-bs
18:05 | | ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
18:10 | | ephemer0l has joined #archiveteam-bs
18:38 | | fredgido has joined #archiveteam-bs
18:57 | | RichardG_ is now known as RichardG
19:06 | | fredgido has quit IRC (Read error: Connection reset by peer)
19:10 | JAA | Somebody2: Looks like edit history might only be accessible when logged in? I don't see anything.
19:12 | Somebody2 | Yes, edit history is only available when logged in.
19:12 | Somebody2 | I thought it appropriate to mention it initially in non-bs because it was a request to archive something.
19:13 | JAA | Right
19:13 | JAA | Yeah, that's fine, just wanted to move discussion about it here.
19:13 | JAA | If it's behind the login wall, archiving it properly is... tricky to put it mildly.
19:13 | JAA | Could use webrecorder.
19:14 | JAA | But it's probably a bad idea to share the archive publicly.
19:15 | JAA | I wonder if the edit history is actually loginwalled or the option to show it is just hidden.
19:15 | Somebody2 | All very good questions!
19:16 | Somebody2 | In this case, it got enough attention that they put out a press release apologizing for it, so it's not that urgent -- but it's a good question for the future.
19:17 | JAA | If it's just hidden, I'd happily integrate it into snscrape.
19:17 | nicolas17 | how do I see the edit history, from the UI, logged in?
19:17 | JAA | nicolas17: "Click the triple dots in the top right corner of the post." according to Reddit.
19:17 | nicolas17 | oh they hid it under a "more options" in that menu
19:18 | JAA | Heh
19:18 | JAA | It being Facebook, I expect that to trigger an XHR which returns JSON which just wraps an HTML string which contains 300 KiB of JS to load the actual edit history.
19:18 | nicolas17 | that's *exactly* true
19:19 | nicolas17 | but it also sends 50 POST parameters
19:19 | nicolas17 | let's see how much I can trim it
19:24 | nicolas17 | nope, it wants a thing in the cookies that might be my session token
19:26 | nicolas17 | curl 'https://www.facebook.com/ajax/edits/browser/post/?content_token=620497258568858' -H 'Cookie: c_user=123456789; xs=58%3Axxxxxxxxxxxxxx%3A2%3A1234567890%3A12345%3A12345' --data-raw 'content_token=620497258568858&__user=123456789&__a=1&fb_dtsg=xxxxxxxxxxxx%3Axxxxxxxxxxxx'; this is the smallest I got so yeah it wants a logged-in session token...
19:31 | | fredgido has joined #archiveteam-bs
19:32 | Somebody2 | :-(
19:35 | JAA | Too bad.
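nicolas17's trimmed curl command can be restated in Python to make the remaining requirements explicit: the two cookies left in the command (c_user, xs) plus a POST body with content_token, __user, __a and fb_dtsg. A minimal sketch that builds the request with the standard library but does not send it; the values are the redacted placeholders from the log, the constant and function names are hypothetical, a real request would still need a logged-in session, and the response (which JAA expects to be JSON wrapping HTML) is not parsed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Redacted placeholder values copied from the curl command in the log.
# A working request would need a real logged-in session's values.
CONTENT_TOKEN = "620497258568858"
USER_ID = "123456789"
XS_COOKIE = "58%3Axxxxxxxxxxxxxx%3A2%3A1234567890%3A12345%3A12345"  # already percent-encoded
FB_DTSG = "xxxxxxxxxxxx:xxxxxxxxxxxx"  # urlencode turns ':' into %3A


def build_edit_history_request() -> Request:
    """Build (but do not send) the same minimal POST that the curl command makes."""
    url = ("https://www.facebook.com/ajax/edits/browser/post/?"
           + urlencode({"content_token": CONTENT_TOKEN}))
    body = urlencode({
        "content_token": CONTENT_TOKEN,
        "__user": USER_ID,
        "__a": "1",
        "fb_dtsg": FB_DTSG,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Cookie": f"c_user={USER_ID}; xs={XS_COOKIE}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_edit_history_request()
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Building the request separately from sending it keeps the session-dependent parts (the cookies and fb_dtsg) in one place, which is where a logged-in capture would have to supply real values.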
19:49 | | dragond has quit IRC (Remote host closed the connection)
19:59 | | britmob_ has joined #archiveteam-bs
20:03 | | britmob has quit IRC (Read error: Operation timed out)
20:22 | | HP_Archiv has joined #archiveteam-bs
21:17 | | HP_Archiv has quit IRC (Quit: Leaving)
21:32 | | deathy__ has joined #archiveteam-bs
22:22 | | simon816 has quit IRC (Remote host closed the connection)
22:31 | | simon816 has joined #archiveteam-bs
22:59 | | Mayonaise has quit IRC (Read error: Operation timed out)
23:10 | | benjins has joined #archiveteam-bs
23:11 | | benjinsmi has quit IRC (Read error: Operation timed out)
23:14 | | ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
23:16 | | ephemer0l has joined #archiveteam-bs
23:19 | | BlueMax has joined #archiveteam-bs
23:19 | | Pixi has quit IRC (Quit: Leaving)
23:20 | | Pixi has joined #archiveteam-bs
23:44 | | Mayonaise has joined #archiveteam-bs
23:44 | | Mayonaise has quit IRC (Client Quit)
23:44 | | Mayonaise has joined #archiveteam-bs
23:57 | | fredgido has quit IRC (Read error: Connection reset by peer)
23:58 | | fredgido has joined #archiveteam-bs