[02:33] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[02:35] *** BlueMax has joined #archiveteam-bs
[03:03] *** powerKitt has joined #archiveteam-bs
[03:03] Hey, does anyone know a good tool to scrape Mastodon instances?
[03:22] *** powerKitt has quit IRC (Quit: Page closed)
[03:23] *** SketchCo1 has joined #archiveteam-bs
[03:23] *** SketchCow has quit IRC (Read error: Connection reset by peer)
[03:23] *** MrRadar has quit IRC (Read error: Operation timed out)
[03:23] *** Cameron_D has quit IRC (Read error: Operation timed out)
[03:23] *** dxrt has quit IRC (Read error: Operation timed out)
[03:23] *** cf has quit IRC (Write error: Broken pipe)
[03:23] *** slyphic has quit IRC (Read error: Operation timed out)
[03:23] *** nightpool has quit IRC (Read error: Operation timed out)
[03:24] *** nightpool has joined #archiveteam-bs
[03:24] *** m007a83_ has joined #archiveteam-bs
[03:24] *** unlobito has quit IRC (Read error: Operation timed out)
[03:24] *** unlobito has joined #archiveteam-bs
[03:25] *** Igloo_ has joined #archiveteam-bs
[03:25] *** Igloo has quit IRC (Write error: Broken pipe)
[03:26] *** Darkstar has quit IRC (Read error: Connection reset by peer)
[03:26] *** Atom has quit IRC (Read error: Operation timed out)
[03:26] *** twigfoot has quit IRC (Read error: Operation timed out)
[03:26] *** SynMonger has quit IRC (Read error: Operation timed out)
[03:26] *** twigfoot has joined #archiveteam-bs
[03:27] *** dxrt has joined #archiveteam-bs
[03:27] *** svchfoo1 sets mode: +o dxrt
[03:27] *** Coderjo has quit IRC (Read error: Connection reset by peer)
[03:27] *** Coderjo has joined #archiveteam-bs
[03:28] *** SynMonger has joined #archiveteam-bs
[03:29] *** m007a83 has quit IRC (Read error: Operation timed out)
[03:29] *** Cameron_D has joined #archiveteam-bs
[03:30] *** Darkstar has joined #archiveteam-bs
[03:31] *** slyphic has joined #archiveteam-bs
[03:31] *** MrRadar has joined #archiveteam-bs
[03:31] *** svchfoo1 sets mode: +o MrRadar
[03:38] *** cf has joined #archiveteam-bs
[03:54] *** qw3rty119 has joined #archiveteam-bs
[04:00] *** qw3rty118 has quit IRC (Read error: Operation timed out)
[04:01] *** zyphlar_ has joined #archiveteam-bs
[04:15] *** swebb has quit IRC (Read error: Operation timed out)
[04:17] *** godane has quit IRC (Leaving.)
[04:17] *** godane has joined #archiveteam-bs
[04:17] *** svchfoo3 sets mode: +o godane
[04:17] *** atlogbot has quit IRC (Read error: Operation timed out)
[04:41] *** vitzli has joined #archiveteam-bs
[04:49] *** Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
[04:51] *** Lord_Nigh has joined #archiveteam-bs
[04:52] *** svchfoo1 has quit IRC (Ping timeout: 268 seconds)
[04:53] *** dxrt_ has quit IRC (Ping timeout: 268 seconds)
[05:22] *** godane has quit IRC (Ping timeout: 260 seconds)
[05:27] *** godane has joined #archiveteam-bs
[05:27] *** svchfoo3 sets mode: +o godane
[05:28] *** atlogbot has joined #archiveteam-bs
[05:29] *** swebb has joined #archiveteam-bs
[05:29] *** svchfoo3 sets mode: +v atlogbot
[05:30] *** svchfoo1 has joined #archiveteam-bs
[05:31] *** dxrt_ has joined #archiveteam-bs
[05:31] *** dxrt sets mode: +o dxrt_
[05:31] *** svchfoo3 sets mode: +o svchfoo1
[05:49] *** Mateon1 has quit IRC (Ping timeout: 252 seconds)
[05:49] *** Mateon1 has joined #archiveteam-bs
[06:01] *** vitzli has quit IRC (Leaving)
[06:02] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[06:03] *** Lord_Nigh has joined #archiveteam-bs
[06:53] *** schbirid has joined #archiveteam-bs
[07:02] *** jschwart has joined #archiveteam-bs
[08:11] *** godane has quit IRC (Ping timeout: 506 seconds)
[09:18] *** godane has joined #archiveteam-bs
[09:20] here is my screenshot-webpage.sh script: https://pastebin.com/aycns7Ne
[09:21] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[09:30] *** Lord_Nigh has joined #archiveteam-bs
[09:43] *** Mateon1 has quit IRC (Remote host closed the connection)
[09:43] *** Mateon1 has joined #archiveteam-bs
[09:57] *** BartoCH has quit IRC (Quit: WeeChat 2.1)
[10:02] *** BartoCH has joined #archiveteam-bs
[10:41] flickr has been taken over
[10:43] https://www.usatoday.com/story/tech/2018/04/20/smugmug-buys-flickr-verizon-oath/537377002/
[10:46] Bloody hell, download.cnet is worse than ever
[10:46] Click on the download button, "HI, WELCOME TO DOWNLOAD"
[10:51] it's been like that for years
[11:01] Re Flickr: That reminds me I still have 225G of metadata for almost 5 billion photos.
[11:05] pls upload
[11:07] is anyone still interested in or maybe even doing Tumblr archiving efforts? (similar route, but afaik still owned by Verizon?)
[11:08] Oh, apparently I already did that, plue. See https://archive.org/download/flickr-metadata-2016
[11:08] neat
[11:09] Never generated any plots though :(
[11:10] grabbing the data now, will take some time tho.
[11:11] Like this one: https://6xq.net/paste/megapixel.svg
[11:11] What are you going to do with it, plue?
[11:12] i'm still occupied with tumblr. got around 19 million usernames, scraping the blog/[uuid]/info api endpoint for them atm. however i'd like to look into the flickr dataset and maybe get estimates about how much data is on there, ...
[11:14] s/estimates/an estimate/
[11:14] How would you approach that? There’s no file size attribute in the metadata.
[11:14] ugh
[11:15] what is in the metadata? is it https://www.flickr.com/services/api/flickr.photos.getInfo.html
[11:15] No, photos.search
[11:17] There’s stuff like title/description, create/upload dates, tags, geotags, views, resolution.
[11:18] that's more like photos.getInfo tho?
[11:18] https://www.flickr.com/services/api/flickr.photos.search.html
[11:18] is just photoid, secret, owner, basically
[11:19] No, you can use the extras parameter to request more information.
[11:19] Otherwise I’d have to make 5 billion API requests. And I did not do that.
[11:19] ^^
[11:20] tags sound interesting as well
[11:21] Yeah. Too bad EXIF tags are not included.
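The [11:19] point about the extras parameter can be illustrated with a small sketch. It builds a flickr.photos.search request URL for the Flickr REST endpoint that asks the server to attach extra fields (description, dates, tags, geo, views, original dimensions) to every photo in a result page, so one paged search call replaces a separate photos.getInfo call per photo. The helper name build_search_url, the particular extras chosen, and the upload-date window in the usage line are illustrative assumptions; the log does not say which arguments were used for the 2016 metadata dump.

```python
from urllib.parse import urlencode

# Flickr REST endpoint; the API method is selected via the "method" argument.
FLICKR_REST = "https://api.flickr.com/services/rest/"

# Illustrative choice of extras (assumption): roughly the fields mentioned
# in the log, i.e. description, dates, tags, geotags, views and resolution.
DEFAULT_EXTRAS = (
    "description",
    "date_upload",
    "date_taken",
    "tags",
    "geo",
    "views",
    "o_dims",
)


def build_search_url(api_key, page, per_page=500, extras=DEFAULT_EXTRAS, **filters):
    """Return a flickr.photos.search URL for one page of results.

    Hypothetical helper. Extra keyword arguments (for example
    min_upload_date / max_upload_date as Unix timestamps) are passed
    through unchanged as additional search arguments.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        # Comma-separated list: the server adds these fields to each photo
        # in the result, so no per-photo getInfo request is needed.
        "extras": ",".join(extras),
        "per_page": per_page,  # 500 is the documented maximum page size
        "page": page,
        "format": "json",
        "nojsoncallback": 1,  # plain JSON instead of a JSONP wrapper
    }
    params.update(filters)
    return FLICKR_REST + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder key; the window covers 2010-01-01 00:00 to 2010-01-02 00:00 UTC.
    print(build_search_url("YOUR_API_KEY", page=1,
                           min_upload_date=1262304000,
                           max_upload_date=1262390400))
```

Slicing the search by upload-date window and paging within each window is one way to keep each request bounded; fetching the URLs and parsing the JSON responses is left out of the sketch.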
[12:20] plue: do you have a way to get all tumblr usernames?
[12:25] Some of Flickr's free images are mirrored on Wikimedia Commons.
[12:25] by a bot
[12:26] No, verified by a bot, though some are bot-assisted uploads
[12:26] bmcginty: no, but one can scrape tons of tumblr usernames via the undocumented blog/[uuid]/notes api endpoint
[12:32] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[12:37] plue: Awesome. I can toss a machine or space on that if you want a hand.
[12:48] oh yeah that would help. i have to write a better performing script first tho. it's just a shellscript calling curl atm. i'll ping you later.
[13:11] plue: okay. please pm or I may never see it.
[13:11] alright
[13:25] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[13:26] *** wp494 has joined #archiveteam-bs
[13:27] *** svchfoo1 sets mode: +o wp494
[14:09] youtube video about Babaxia (a community-built network): https://www.youtube.com/watch?v=jkpTry8M6gg
[14:09] in Brazil
[14:22] this is basically my idea for the archivebox
[14:25] *** eientei95 has quit IRC (Quit: ZNC 1.6.5 - http://znc.in)
[14:44] “Flickr is an amazing community, full of some of the world's most passionate photographers. It’s a fantastic product and a beloved brand, supplying tens of billions of photos to hundreds of millions of people around the world,” MacAskill said. “Flickr has survived through thick-and-thin and is core to the entire fabric of the Internet.
[14:44] hey, that actually sounds promising
[14:48] *** schbirid has quit IRC (Quit: Leaving)
[14:55] *** antomatic has joined #archiveteam-bs
[14:59] *** antomati_ has quit IRC (Ping timeout: 260 seconds)
[15:09] I'm cautiously optimistic
[15:11] *** Gfy_ is now known as Gfy
[16:51] *** REiN^ has quit IRC (Read error: Operation timed out)
[16:58] *** REiN^ has joined #archiveteam-bs
[17:05] *** godane has quit IRC (Ping timeout: 252 seconds)
[17:26] *** godane has joined #archiveteam-bs
[17:26] *** svchfoo3 sets mode: +o godane
[17:43] *** godane has quit IRC (Quit: Leaving.)
[17:43] *** godane has joined #archiveteam-bs
[18:06] *** noirscape has quit IRC (ZNC 1.6.5+deb1 - http://znc.in)
[18:07] *** noirscape has joined #archiveteam-bs
[18:31] *** plue has quit IRC (Ping timeout: 260 seconds)
[18:38] *** plue has joined #archiveteam-bs
[20:00] *** Pixi has quit IRC (Quit: Pixi)
[20:09] *** Zexaron has joined #archiveteam-bs
[20:34] *** Pixi has joined #archiveteam-bs
[21:05] *** Atom has joined #archiveteam-bs
[21:10] *** Atom-- has joined #archiveteam-bs
[21:10] *** godane has quit IRC (Ping timeout: 252 seconds)
[21:14] *** Atom has quit IRC (Read error: Operation timed out)
[21:28] *** bwn has quit IRC (Read error: Operation timed out)
[21:34] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[21:37] *** Pixi has joined #archiveteam-bs
[21:47] *** BlueMax has joined #archiveteam-bs
[21:48] *** bwn has joined #archiveteam-bs
[21:56] *** Mateon1 has quit IRC (Remote host closed the connection)
[21:56] *** Mateon1 has joined #archiveteam-bs
[22:00] *** Lord_Nigh has quit IRC (Ping timeout: 268 seconds)
[22:00] *** dxrt_ has quit IRC (Ping timeout: 268 seconds)
[22:00] *** svchfoo1 has quit IRC (Ping timeout: 268 seconds)
[22:05] *** Lord_Nigh has joined #archiveteam-bs
[22:39] *** svchfoo1 has joined #archiveteam-bs
[22:39] *** dxrt_ has joined #archiveteam-bs
[22:39] *** dxrt sets mode: +o dxrt_
[22:39] *** svchfoo3 sets mode: +o svchfoo1
[23:40] *** ndiddy has quit IRC ()
[23:49] *** lindalap_ has joined #archiveteam-bs
[23:49] *** lindalap has quit IRC (Read error: Connection reset by peer)
[23:49] *** lindalap_ is now known as lindalap
[23:56] *** ndiddy has joined #archiveteam-bs