[00:22] *** kristian_ has quit IRC (Quit: Leaving)
[00:39] *** Dimtree has quit IRC (Quit: Peace.)
[00:42] *** dashcloud has quit IRC (Ping timeout: 255 seconds)
[00:48] *** drumstick has quit IRC (Ping timeout: 255 seconds)
[00:50] *** dashcloud has joined #archiveteam-bs
[01:06] *** Dimtree has joined #archiveteam-bs
[01:24] *** BlueMaxim has joined #archiveteam-bs
[01:41] *** schbirid2 has joined #archiveteam-bs
[01:44] *** schbirid has quit IRC (Read error: Operation timed out)
[02:04] *** drumstick has joined #archiveteam-bs
[02:33] *** Mateon1 has quit IRC (Ping timeout: 260 seconds)
[02:34] *** Mateon1 has joined #archiveteam-bs
[02:38] *** jspiros has quit IRC (leaving)
[02:39] *** jspiros has joined #archiveteam-bs
[03:33] SketchCow: does IA want 500TB of github.com
[03:34] I don't have it yet, working out a plan to buy drives+bandwidth
[03:34] And if I can sneakernet drives once a month it saves me $5-$10K
[03:37] I plan to grab all git hosting sites, but in terms of % data that's a good ballpark number, it's mostly github these days
[04:35] when was it ever not mostly github? :p
[04:54] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:00] *** Sk1d has joined #archiveteam-bs
[05:08] kisspunch: I think there's already a github archive
[05:10] nope. there's githubarchive.org, which archives only the timeline, and ghtorrent, which archives all the metadata
[05:10] or do you mean on IA?
[05:11] also there are a lot of projects that claim they will someday be an archive of github and have <1TB of data
[05:17] i have most of ghtorrent, except the really old stuff hosted on a separate server that's never accessible
[05:17] and all of the timeline
[05:24] uh some of the 1TB projects did archive the top-starred sites, which is pretty worthwhile, just not what i'm going to do
[05:25] not trying to disparage any of these existing projects :)
[05:48] *** Stilett0 has joined #archiveteam-bs
[06:14] *** drumstick has quit IRC (Ping timeout: 255 seconds)
[06:21] kisspunch: I thought githubarchive.org does archive the stuff on the repos
[06:22] *** wabu has quit IRC (Read error: Operation timed out)
[06:22] *** TC04 has quit IRC (Read error: Operation timed out)
[06:23] !a https://diskprices.com/
[06:24] oops wrong channel
[06:24] its in the right one now
[06:24] *** TC01 has joined #archiveteam-bs
[06:32] *** wabu has joined #archiveteam-bs
[06:52] hook54321: githubarchive.org does not even archive all the metadata, no. it's just the events timeline.
[06:56] *** drumstick has joined #archiveteam-bs
[06:58] as per usual, i strongly encourage someone to start the second archiver--that's ephemeral data and only one person is grabbing it
[07:01] !a http://lite.cnn.io/
[07:01] sorry did it again
[07:53] *** jspiros_ has joined #archiveteam-bs
[07:55] *** closure has quit IRC (Read error: Operation timed out)
[07:55] *** jspiros has quit IRC (Read error: Operation timed out)
[07:56] *** refeed has joined #archiveteam-bs
[07:56] *** refeed has quit IRC (Connection closed)
[07:57] *** refeed has joined #archiveteam-bs
[07:59] *** _refeed_ has joined #archiveteam-bs
[08:05] *** kristian_ has joined #archiveteam-bs
[08:07] *** _refeed_ has quit IRC (Remote host closed the connection)
[08:17] *** kristian_ has quit IRC (Quit: Leaving)
[08:18] *** closure has joined #archiveteam-bs
[08:18] *** midas sets mode: +o closure
[08:34] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[08:50] *** drumstick has quit IRC (Ping timeout: 255 seconds)
[09:02] *** drumstick has joined #archiveteam-bs
[09:09] *** schbirid2 has quit IRC (Quit: Leaving)
[09:14] *** BartoCH has joined #archiveteam-bs
[09:14] *** refeed has quit IRC (Ping timeout: 633 seconds)
[11:17] *** drumstick has quit IRC (Ping timeout: 255 seconds)
[11:48] *** refeed has joined #archiveteam-bs
[11:48] *** refeed has quit IRC (Client Quit)
[11:58] *** namibj1 has quit IRC (Read error: Operation timed out)
[12:15] *** namibj1 has joined #archiveteam-bs
[12:45] *** odemg has quit IRC (Read error: Operation timed out)
[13:26] *** namibj1 has quit IRC (Ping timeout: 506 seconds)
[13:30] *** SilSte has quit IRC (Ping timeout: 194 seconds)
[14:05] *** godane has quit IRC (Read error: Operation timed out)
[14:32] *** godane has joined #archiveteam-bs
[15:32] *** odemg has joined #archiveteam-bs
[16:03] *** fie_ has quit IRC (Ping timeout: 250 seconds)
[16:17] *** fie_ has joined #archiveteam-bs
[16:44] JAA: I can't talk much right now, but thought I should mention that the number of URLs for imgh.us on the Wayback Machine doesn't seem to be going up.
[16:48] hook54321: Almost all of those jobs were on zino's pipeline, so they go through FOS first and might only show up on IA a few days later. Also, there were recently some issues where ArchiveBot grabs didn't show up in Wayback Machine even after weeks and even though the derive job ran. So no reason to be alarmed (yet).
[18:56] *** what_the_ has quit IRC (Quit: Page closed)
[19:15] JAA (and anyone else that's interested in imgh.us):
[19:18] Here's some of the things I think we should consider doing next
[19:18] Grab imgh.us links from Reddit, then change them into the new URL format.
[19:18] Use a simple list of words in a text file to try to bruteforce more.
[19:18] Attempt to compile all the URLs that we successfully grabbed.
[19:35] *** TheLovina has quit IRC (Read error: Operation timed out)
[20:05] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[20:20] *** DogsRNice has joined #archiveteam-bs
[20:21] hello
[20:22] *** frontop has joined #archiveteam-bs
[20:41] *** jspiros has joined #archiveteam-bs
[20:44] *** jspiros_ has quit IRC (Ping timeout: 492 seconds)
[20:59] *** DogsRNice has quit IRC (Quit: Page closed)
[21:17] *** n00b646 has joined #archiveteam-bs
[21:18] hey all
[21:37] *** n00b646 has quit IRC (Quit: Page closed)
[21:52] *** kristian_ has joined #archiveteam-bs
[21:56] *** TheLovina has joined #archiveteam-bs
[22:05] *** drumstick has joined #archiveteam-bs
[22:43] *** kristian_ has quit IRC (Quit: Leaving)
[23:26] *** BartoCH has quit IRC (Quit: WeeChat 1.9)
[23:27] *** etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)