[00:30] *** fuzzy8021 has quit IRC (Read error: Connection reset by peer)
[00:30] *** fuzzy8021 has joined #archiveteam-bs
[00:36] *** Arcorann has joined #archiveteam-bs
[00:51] *** Ctrl has quit IRC (Ping timeout: 857 seconds)
[01:20] *** Ctrl has joined #archiveteam-bs
[01:30] *** c0mpass has quit IRC ()
[02:24] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:12] *** ColoHusky has joined #archiveteam-bs
[03:16] JAA: It doesn't appear as if some (or all) of the Fotolog data is in the Wayback Machine, I tried some usernames from https://archive.org/download/archiveteam_fotolog_20160505045634/fotolog_20160505045634.megawarc.json.gz (radiun, stoidisponible, milenasantanaa) but they're all not in there (https://web.archive.org/web/*/http://www.fotolog.net/milenasantanaa), which is quite odd
[03:17] Nvm, fotolog.com works, I had the wrong URL lol
[03:18] *** qw3rty__ has joined #archiveteam-bs
[03:22] ColoHusky: Oh, we definitely archived *something*, but it's clearly also not everything.
[03:22] Yeah
[03:24] All of the ones listed on GitHub seem to be archived, but some of the usernames must've been missed
[03:25] *** qw3rty_ has quit IRC (Read error: Operation timed out)
[03:28] Did you check the list 03 as well?
[03:28] *** ColoHusky has quit IRC (Ping timeout: 253 seconds)
[03:28] Welp
[03:59] *** maxfan8 has quit IRC (Read error: Operation timed out)
[04:20] *** ephemer0l has quit IRC (Read error: Connection reset by peer)
[04:30] *** maxfan8 has joined #archiveteam-bs
[04:58] *** Aoede has quit IRC (Read error: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac)
[04:59] *** Aoede has joined #archiveteam-bs
[05:09] *** godane has quit IRC (Ping timeout: 272 seconds)
[05:18] *** wyatt8740 has quit IRC (Remote host closed the connection)
[05:20] *** wyatt8740 has joined #archiveteam-bs
[05:22] *** godane has joined #archiveteam-bs
[05:22] *** Pixi has joined #archiveteam-bs
[05:23] *** scorche` has joined #archiveteam-bs
[05:23] *** SynMonger has quit IRC (Ping timeout: 255 seconds)
[05:23] *** kyledrake has quit IRC (Ping timeout: 255 seconds)
[05:24] *** scorche` has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** robogoat has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** phirephly has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** Pixi` has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** schbirid has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** superkuh has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** scorche has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** achip has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** Somebody2 has quit IRC (hub.efnet.us irc.Prison.NET)
[05:24] *** SynMonger has joined #archiveteam-bs
[05:29] *** kyledrake has joined #archiveteam-bs
[06:04] *** Aoede has quit IRC (se.hub irc.nordunet.se)
[06:13] *** Aoede has joined #archiveteam-bs
[06:18] *** ephemer0l has joined #archiveteam-bs
[07:08] *** Raccoon has quit IRC (Remote host closed the connection)
[07:35] *** MrRadar has joined #archiveteam-bs
[07:37] *** MrRadar_ has quit IRC (Read error: Operation timed out)
[07:43] Potential way to mine Wordpress blogs? I stumbled upon something like https://wordpress.com/post/202218/455/ - on its own, it didn't do anything
[07:44] However, if it is something like https://wordpress.com/post/202218/ - something interesting happens, there's a link to the blog like https://rthorm.wordpress.com/ (which I saw moments earlier)
[07:44] I'm not sure if the number is just per post, or if it's per Wordpress account
[07:44] Here's some other examples of my findings:
[07:44] https://wordpress.com/post/20221/ = https://lfcmanager.wordpress.com/
[07:44] https://wordpress.com/post/20/ = nothing
[07:44] https://wordpress.com/post/2/ = https://donncha.wordpress.com/
[07:46] JAA or anyone else or arkiver, perhaps potential interest in Wordpress blog link mining or investigate more of this? It's numerical too o:
[07:48] So far it doesn't appear to be a custom URL that would use Wordpress as their backend technology, just explicit Wordpress blog URLs
[07:51] ...Now only if we can find something similar with Tistory blogs or Blogspot blogs <#>;
[08:03] *** Raccoon has joined #archiveteam-bs
[08:05] *** Jake2 has joined #archiveteam-bs
[08:05] *** Jake has quit IRC (Read error: Operation timed out)
[08:05] *** Jake2 is now known as Jake
[08:31] *** RichardG_ has joined #archiveteam-bs
[08:31] *** RichardG has quit IRC (Read error: Connection reset by peer)
[08:46] *** superkuh has joined #archiveteam-bs
[08:46] *** schbirid has joined #archiveteam-bs
[08:46] *** scorche has joined #archiveteam-bs
[08:46] *** robogoat has joined #archiveteam-bs
[08:46] *** phirephly has joined #archiveteam-bs
[08:46] *** achip has joined #archiveteam-bs
[08:46] *** Somebody2 has joined #archiveteam-bs
[09:32] *** hook54321 has quit IRC ()
[09:32] *** hook54321 has joined #archiveteam-bs
[11:09] *** BlueMax has quit IRC (Quit: Leaving)
[13:24] Ryz: The wp.me shortener is even easier, and that's why I ran it through URLTeam a while ago.
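The numeric pattern described at 07:43-07:46 could be probed roughly as sketched below. This is only an illustration, not anything that was run in the channel: the function names, the regex, the list of excluded subdomains, and the User-Agent string are all assumptions, and it assumes the blog address shows up as a plain `*.wordpress.com` URL somewhere in the returned HTML, which the log does not confirm.

```python
import re
from typing import Iterator, Optional
from urllib.request import Request, urlopen

# Matches hosts such as https://rthorm.wordpress.com/ (the pattern is an assumption).
BLOG_LINK = re.compile(r"https?://([a-z0-9-]+)\.wordpress\.com/?", re.IGNORECASE)

# Subdomains assumed to belong to wordpress.com itself rather than to a user blog.
NOT_BLOGS = {"www", "public-api", "s0", "s1", "s2"}


def candidate_urls(start: int, stop: int) -> Iterator[str]:
    """Yield https://wordpress.com/post/<n>/ for each n in [start, stop)."""
    for n in range(start, stop):
        yield f"https://wordpress.com/post/{n}/"


def extract_blog(html: str) -> Optional[str]:
    """Return the first user-blog URL found in the HTML, or None if there is none."""
    for match in BLOG_LINK.finditer(html):
        name = match.group(1).lower()
        if name not in NOT_BLOGS:
            return f"https://{name}.wordpress.com/"
    return None


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download one page as text; urlopen raises HTTPError for 4xx/5xx responses."""
    req = Request(url, headers={"User-Agent": "blog-id-probe-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A caller would loop over `candidate_urls(...)`, pass each URL to `fetch`, and record whatever `extract_blog` returns, treating `None` the same as the "nothing" result seen for `/post/20/`. Any real run would need rate limiting and error handling on top of this.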
[13:25] There's also a way to discover all Wordpress blogs with Jetpack installed, I believe, but I never looked into that in detail.
[14:09] *** benjinsmi has joined #archiveteam-bs
[14:16] *** benjins has quit IRC (Read error: Operation timed out)
[15:02] *** maxfan8 has quit IRC (Quit: WeeChat 2.8)
[15:03] *** maxfan8 has joined #archiveteam-bs
[15:30] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[15:33] *** nepeat_ has joined #archiveteam-bs
[15:33] *** nepeat has quit IRC (Read error: Connection reset by peer)
[15:55] *** hook54321 has quit IRC ()
[15:55] *** hook54321 has joined #archiveteam-bs
[16:59] *** Gallifrey has joined #archiveteam-bs
[17:00] Darn, they... they banned /r/GenderCriticalGuys. That was on my list of subs to archive and I didn't get there in time.
[17:09] sounds like nothing of value was lost
[17:20] *** Gallifrey has quit IRC (Read error: Operation timed out)
[17:21] *** Gallifrey has joined #archiveteam-bs
[17:24] *** Mateon1 has quit IRC (Remote host closed the connection)
[17:24] *** Mateon1 has joined #archiveteam-bs
[17:27] I don't like them either, but destroying the record of the things they've said and done... doesn't sit right with me.
[17:28] Less stally, more archivey <#>;
[17:30] Yeah, we should get #shreddit (hackint) off the ground ASAP.
[17:30] *** DogsRNice has joined #archiveteam-bs
[17:34] I've copied your thing into that channel
[17:35] What's the current bottleneck with #shreddit, do we know?
[17:37] Code isn't ready AFAIK. Anything further in that channel please.
[17:40] Right-o
[17:51] *** maxfan8 has quit IRC (Quit: WeeChat 2.8)
[17:51] *** maxfan8 has joined #archiveteam-bs
[17:53] *** maxfan8 has quit IRC (Client Quit)
[17:53] *** maxfan8 has joined #archiveteam-bs
[18:05] *** ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[18:10] *** ephemer0l has joined #archiveteam-bs
[18:38] *** fredgido has joined #archiveteam-bs
[18:57] *** RichardG_ is now known as RichardG
[19:06] *** fredgido has quit IRC (Read error: Connection reset by peer)
[19:10] Somebody2: Looks like edit history might only be accessible when logged in? I don't see anything.
[19:12] Yes, edit history is only available when logged in.
[19:12] I thought it appropriate to mention it initially in non-bs because it was a request to archive something.
[19:13] Right
[19:13] Yeah, that's fine, just wanted to move discussion about it here.
[19:13] If it's behind the login wall, archiving it properly is... tricky to put it mildly.
[19:13] Could use webrecorder.
[19:14] But it's probably a bad idea to share the archive publicly.
[19:15] I wonder if the edit history is actually loginwalled or the option to show it is just hidden.
[19:15] All very good questions!
[19:16] In this case, it got enough attention that they put out a press release apologizing for it, so it's not that urgent -- but it's a good question for the future.
[19:17] If it's just hidden, I'd happily integrate it into snscrape.
[19:17] how do I see the edit history, from the UI, logged in?
[19:17] nicolas17: "Click the triple dots in the top right corner of the post." according to Reddit.
[19:17] oh they hid it under a "more options" in that menu
[19:18] Heh
[19:18] It being Facebook, I expect that to trigger an XHR which returns JSON which just wraps an HTML string which contains 300 KiB of JS to load the actual edit history.
[19:18] that's *exactly* true
[19:19] but it also sends 50 POST parameters
[19:19] let's see how much I can trim it
[19:24] nope, it wants a thing in the cookies that might be my session token
[19:26] curl 'https://www.facebook.com/ajax/edits/browser/post/?content_token=620497258568858' -H 'Cookie: c_user=123456789; xs=58%3Axxxxxxxxxxxxxx%3A2%3A1234567890%3A12345%3A12345' --data-raw 'content_token=620497258568858&__user=123456789&__a=1&fb_dtsg=xxxxxxxxxxxx%3Axxxxxxxxxxxx'; this is the smallest I got so yeah it wants a logged-in session token...
[19:31] *** fredgido has joined #archiveteam-bs
[19:32] :-(
[19:35] Too bad.
[19:49] *** dragond has quit IRC (Remote host closed the connection)
[19:59] *** britmob_ has joined #archiveteam-bs
[20:03] *** britmob has quit IRC (Read error: Operation timed out)
[20:22] *** HP_Archiv has joined #archiveteam-bs
[21:17] *** HP_Archiv has quit IRC (Quit: Leaving)
[21:32] *** deathy__ has joined #archiveteam-bs
[22:22] *** simon816 has quit IRC (Remote host closed the connection)
[22:31] *** simon816 has joined #archiveteam-bs
[22:59] *** Mayonaise has quit IRC (Read error: Operation timed out)
[23:10] *** benjins has joined #archiveteam-bs
[23:11] *** benjinsmi has quit IRC (Read error: Operation timed out)
[23:14] *** ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[23:16] *** ephemer0l has joined #archiveteam-bs
[23:19] *** BlueMax has joined #archiveteam-bs
[23:19] *** Pixi has quit IRC (Quit: Leaving)
[23:20] *** Pixi has joined #archiveteam-bs
[23:44] *** Mayonaise has joined #archiveteam-bs
[23:44] *** Mayonaise has quit IRC (Client Quit)
[23:44] *** Mayonaise has joined #archiveteam-bs
[23:57] *** fredgido has quit IRC (Read error: Connection reset by peer)
[23:58] *** fredgido has joined #archiveteam-bs