[00:38] *** lunik198 has quit IRC (:x)
[00:39] *** lunik198 has joined #archiveteam-ot
[01:09] *** MrRadar2 has quit IRC (Remote host closed the connection)
[01:12] *** thuban1 has joined #archiveteam-ot
[01:14] *** MrRadar2 has joined #archiveteam-ot
[01:17] *** thuban has quit IRC (Read error: Operation timed out)
[01:21] *** Frogging has quit IRC (Quit: Close the World, Open the nExt)
[01:26] *** katocala has joined #archiveteam-ot
[01:35] *** Frogging has joined #archiveteam-ot
[01:35] *** X-Scale` has joined #archiveteam-ot
[01:36] *** X-Scale has quit IRC (Ping timeout: 240 seconds)
[01:36] *** X-Scale` is now known as X-Scale
[01:46] Just saw this :( https://www.vice.com/en_us/article/8xwe9p/yahoo-groups-is-winding-down-and-all-content-will-be-permanently-removed
[01:47] Is there a concerted effort to mirror it all before it's unplugged?
[01:48] yes #yahoosucks
[01:49] Ah, thanks. Otherwise, it would be yet another Library of Alexandria down the drain.
[01:50] it will be
[01:51] but we are going to see what we can do to _mitigate_ that
[01:52] That's a great and noble effort.
[02:38] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:01] *** qw3rty2 has joined #archiveteam-ot
[03:08] *** qw3rty has quit IRC (Ping timeout: 745 seconds)
[04:00] *** SynMonger has quit IRC (Quit: Wait, what?)
[04:01] *** qw3rty has joined #archiveteam-ot
[04:01] *** SynMonger has joined #archiveteam-ot
[04:10] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[04:42] *** BlueMax has quit IRC (Quit: Leaving)
[04:42] *** BlueMax has joined #archiveteam-ot
[04:51] *** godane has quit IRC (Leaving.)
[05:07] *** manjaro-u has quit IRC (Konversation terminated!)
[05:30] *** dhyan_nat has joined #archiveteam-ot
[05:38] *** godane has joined #archiveteam-ot
[06:42] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[07:05] *** katocala has quit IRC (Read error: Operation timed out)
[07:05] *** katocala has joined #archiveteam-ot
[07:46] *** killsushi has quit IRC (Read error: Connection reset by peer)
[07:46] *** killsushi has joined #archiveteam-ot
[07:50] *** Frogging has quit IRC (Read error: Operation timed out)
[07:50] *** JAA has quit IRC (Read error: Operation timed out)
[07:51] *** Frogging has joined #archiveteam-ot
[07:51] *** simon816 has quit IRC (Ping timeout: 246 seconds)
[07:51] *** lunik198 has quit IRC (Ping timeout (120 seconds))
[07:51] *** dxrt has quit IRC (ZNC - http://znc.sourceforge.net)
[07:51] *** dxrt has joined #archiveteam-ot
[07:51] *** ats has quit IRC (Read error: Operation timed out)
[07:52] *** Fusl____ sets mode: +o dxrt
[07:52] *** Fusl sets mode: +o dxrt
[07:52] *** Fusl_ sets mode: +o dxrt
[07:53] *** lunik198 has joined #archiveteam-ot
[07:54] *** simon816 has joined #archiveteam-ot
[07:54] *** JAA has joined #archiveteam-ot
[07:54] *** Fusl____ sets mode: +o JAA
[07:54] *** Fusl sets mode: +o JAA
[07:54] *** Fusl_ sets mode: +o JAA
[07:55] *** ats has joined #archiveteam-ot
[07:55] *** AlsoJAA sets mode: +o JAA
[08:23] *** schbirid has joined #archiveteam-ot
[13:23] I wrote a little script to automate discovery of social media given a list of web pages. That sounds fancier than it is; it just fetches the URL and extracts anything that looks like or could point to a Facebook, Flickr, Instagram, Twitter, VK, or YouTube page. The output requires a lot of manual cleanup obviously, but still saves time. It's here for anyone interested:
[13:23] https://github.com/JustAnotherArchivist/little-things/blob/master/website-extract-social-media
[13:25] There's also wiki-website-extract-social-media, which takes input in the form of a new-viewer wiki page and formats the output accordingly again. I use it by copying the wiki page source to the clipboard, then xclip -selection c -o | ./wiki-website-extract-social-media | ./social-media-normalise | ./youtube-normalise (plus redirection to a file, because otherwise the output would get mangled with errors)
[13:25] Ryz: ^ might be of interest to you.
[13:51] *** icedice has joined #archiveteam-ot
[13:57] *** systwi_ is now known as systwi
[14:34] *** icedice2 has joined #archiveteam-ot
[14:39] *** icedice has quit IRC (Ping timeout: 252 seconds)
[14:39] *** icedice2 has quit IRC (Client Quit)
[14:39] *** icedice has joined #archiveteam-ot
[14:43] *** schbirid has quit IRC (Quit: Leaving)
[14:43] *** killsushi has quit IRC (Quit: Leaving)
[15:14] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[15:21] *** DogsRNice has joined #archiveteam-ot
[15:28] *** icedice has quit IRC (Quit: Leaving)
[15:44] *** thuban1 has quit IRC (Read error: Connection reset by peer)
[15:45] *** thuban1 has joined #archiveteam-ot
[15:58] *** girst has quit IRC (Quit: ZNC 1.7.3 - https://znc.in)
[16:01] *** girst has joined #archiveteam-ot
[17:31] *** MaximeleG has joined #archiveteam-ot
[17:39] Huh, TIL you can access Facebook posts based only on the post ID. E.g. https://www.facebook.com/10162401954220022 which then redirects to https://www.facebook.com/jungegruene.jeunesverts/posts/10162401954220022
[17:41] makes sense
[17:41] Yeah, except nothing else makes sense on Facebook. :-P
[17:56] One example: you can insert periods anywhere in the username and still get the profile.
[18:11] *** manjaro-u has joined #archiveteam-ot
[18:11] *** thuban1 is now known as thuban
[18:17] ++
[18:18] well, facebook isn't alone with the periods. gmail / google accounts do that too
[18:18] and i presume youtube usernames
[18:19] *** girst has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[18:19] *** girst has joined #archiveteam-ot
[18:20] That's something completely different.
[18:20] And no, YouTube doesn't do this.
[18:21] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[18:21] Well ok, not completely different, but still not the same thing.
[18:24] *** manjaro-u has joined #archiveteam-ot
[18:24] *** katocala has quit IRC ()
[18:33] I've seen lots of email providers do that
[18:33] you can log in / email / etc the user john.smith or johnsmith
[18:33] just because the corporate world made the dot format so prevalent
[18:34] maybe the university world
[20:11] *** thuban has quit IRC (Read error: Connection reset by peer)
[20:12] *** thuban1 has joined #archiveteam-ot
[20:12] *** thuban1 is now known as thuban
[20:18] *** icedice has joined #archiveteam-ot
[20:19] Fusl: this (now outdated) paper suggests that an IP address is worth $100/day when run as a google cookie factory http://nsl.cs.columbia.edu/papers/2016/recaptcha.eurosp16.pdf
[20:21] hm?
[20:23] Watching Vsauce recently, and I'm reminded of this video he made 6 years ago, 'Where Do Deleted Files Go?': https://www.youtube.com/watch?v=G5s4-Kak49o
[20:24] oh, well the summary is Google will use a user's tracking cookie as part of the risk assessment of whether to bypass an image recaptcha. So they set up automation to emulate human browser behavior for a bit to be able to present cookies that avoid the image captcha solving part.
[20:52] *** MaximeleG has quit IRC (Quit: MaximeleG)
[21:13] *** icedice has quit IRC (Quit: icedice)
[22:36] *** BlueMax has joined #archiveteam-ot
[22:58] *** godane has quit IRC (Ping timeout: 246 seconds)
[23:09] *** dhyan_nat has joined #archiveteam-ot
[23:15] *** godane has joined #archiveteam-ot
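
The 13:23 message describes website-extract-social-media as fetching each URL and pulling out anything that looks like a Facebook, Flickr, Instagram, Twitter, VK, or YouTube link. Below is a minimal Python sketch of that idea. It is not the script from the linked repository: the regular expression, the function names extract_social_links, fetch and main, and the tab-separated output are all assumptions made for illustration. As with the original, the result is a list of candidates that still needs manual cleanup.

```python
import re
import sys
import urllib.request

# Domains named in the log. The pattern below is an illustrative assumption,
# not the one used by website-extract-social-media.
SOCIAL_DOMAINS = ("facebook.com", "flickr.com", "instagram.com",
                  "twitter.com", "vk.com", "youtube.com")

# Match http(s) URLs on the listed domains or any subdomain (www., m., ...),
# reject look-alikes such as facebook.community, and stop the path at
# whitespace, quotes or angle brackets so href="..." attributes work.
_SOCIAL_URL_RE = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*(?:"
    + "|".join(re.escape(d) for d in SOCIAL_DOMAINS)
    + r")(?![\w-])(?:/[^\s\"'<>]*)?",
    re.IGNORECASE,
)


def extract_social_links(html: str) -> list[str]:
    """Return candidate social-media URLs, deduplicated, in order of first appearance."""
    return list(dict.fromkeys(m.group(0) for m in _SOCIAL_URL_RE.finditer(html)))


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode it leniently, since real pages vary in encoding."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main(pages: list[str]) -> None:
    for page in pages:
        try:
            links = extract_social_links(fetch(page))
        except Exception as exc:  # keep going if one site is down
            # Errors go to stderr so stdout can be redirected to a clean file.
            print(f"{page}\tERROR: {exc}", file=sys.stderr)
            continue
        for link in links:
            print(f"{page}\t{link}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, with a hypothetical filename: python3 extract_social.py https://example.org/ > candidates.tsv. Sending errors to stderr mirrors the advice in the log to redirect the output to a file so it does not get mixed with error messages.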
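
The exchange from 17:56 to 18:34 notes that Facebook resolves a profile no matter where periods are inserted in the username, and that Gmail ignores dots in addresses. When deduplicating a collected list of account names, a canonical key makes such variants compare equal. The sketch below assumes that periods are the only difference to ignore for Facebook usernames and that lower-casing them is safe (neither is confirmed in the log). The dot rule is applied only to gmail.com addresses, since the log points out that YouTube does not behave this way and other providers may not either. The helper names are hypothetical.

```python
def canonical_facebook_username(username: str) -> str:
    """Canonical key for a Facebook vanity username.

    Periods are dropped because, per the log, they can be inserted anywhere.
    Lower-casing is an assumption made for this sketch.
    """
    return username.replace(".", "").lower()


def canonical_gmail_address(address: str) -> str:
    """Canonical key for an e-mail address; only gmail.com gets dot removal."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an e-mail address: {address!r}")
    domain = domain.lower()
    if domain == "gmail.com":
        # Gmail ignores dots in the local part and is case-insensitive.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"


def dedupe(names, key):
    """Keep the first spelling seen for each canonical key, preserving order."""
    seen = {}
    for name in names:
        seen.setdefault(key(name), name)
    return list(seen.values())
```

For example, dedupe(["john.smith", "johnsmith", "j.ohnsmith", "jane"], canonical_facebook_username) returns ["john.smith", "jane"], and canonical_gmail_address("John.Smith@Gmail.com") returns "johnsmith@gmail.com", while a non-Gmail address is left unchanged apart from its domain case.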