[01:04] *** RichardG_ is now known as RichardG
[01:12] *** Asparagir has quit IRC (Asparagir)
[01:13] *** Asparagir has joined #archiveteam-bs
[01:33] *** fie has joined #archiveteam-bs
[01:36] *** yakfish has quit IRC (Operation timed out)
[02:40] *** Asparagir has quit IRC (Asparagir)
[03:09] *** yakfish has joined #archiveteam-bs
[03:34] *** Asparagir has joined #archiveteam-bs
[03:35] *** krazedkat has quit IRC (Quit: Leaving)
[04:14] godane: http://calteches.library.caltech.edu/ -- Archive of Caltech magazine back to the 1930s; might be good for you to grab when you get a chance
[04:30] *** ndiddy has quit IRC (Quit: Leaving)
[04:31] It looks like it is part of a large open database of Caltech materials, so it's *probably* pretty safe where it is, though.
[04:56] *** Asparagir has quit IRC (Asparagir)
[05:12] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:18] *** Sk1d has joined #archiveteam-bs
[05:26] *** Asparagir has joined #archiveteam-bs
[05:29] *** Asparagir has quit IRC (Client Quit)
[06:03] Somebody2: thanks
[06:03] first journal I have seen where the full issues are archived
[06:04] I always see science journals put out only the articles, with no full-issue scans
[07:09] *** godane has quit IRC (Ping timeout: 250 seconds)
[07:17] *** VADemon has joined #archiveteam-bs
[07:18] *** godane has joined #archiveteam-bs
[07:21] *** vitzli has joined #archiveteam-bs
[07:43] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[08:34] *** VADemon_ has joined #archiveteam-bs
[08:40] *** VADemon has quit IRC (Ping timeout: 370 seconds)
[08:42] *** VADemon_ has quit IRC (Read error: Operation timed out)
[09:18] *** Honno has joined #archiveteam-bs
[09:30] *** schbirid has joined #archiveteam-bs
[12:07] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:08] *** RichardG has quit IRC (Ping timeout: 244 seconds)
[12:10] *** RichardG has joined #archiveteam-bs
[12:33] ugh, 1MB/s to ACD right now.
had gbit speeds earlier
[13:25] you're lucky, last month I got ~30-80 kB/s, though I believe it has to do with the ISP messing around
[13:56] *** godane has left
[13:57] *** godane has joined #archiveteam-bs
[15:01] *** sep332 has joined #archiveteam-bs
[15:06] *** vitzli has quit IRC (Leaving)
[15:41] *** Boppen has quit IRC (Ping timeout: 194 seconds)
[16:08] *** Boppen has joined #archiveteam-bs
[16:13] *** Aranje has joined #archiveteam-bs
[16:20] arkiver: in fccbda81dc24d605f74ecdc24bca290e74683c2b you broke the link to the IRC channel
[16:39] arkiver: (in the ftp-gov-grab repo, btw); the IRC link was changed from cheetoflee to cheetoftp
[16:45] yan: fixed.
[16:52] *** VADemon has joined #archiveteam-bs
[17:53] *** HCross2 has quit IRC (Ping timeout: 260 seconds)
[18:09] *** johtso has joined #archiveteam-bs
[18:09] *** HCross2 has joined #archiveteam-bs
[19:08] anyone going to SHA2017?
[20:11] *** Boppen has quit IRC (Quit: Nettalk6 - www.ntalk.de)
[20:15] *** Boppen has joined #archiveteam-bs
[20:48] *** GinhijiQu has joined #archiveteam-bs
[20:49] API data is more useful for robot consumption and transformation. Accessing HTML pages is easier for human beings.
[20:50] So, it depends on your audience, GinhijiQu.
[20:52] I just suspect that storing whole webpages will lead to a lot of redundancy and waste storage that could be used to store more information?
[20:54] Sure, you trade time to generate a visually appealing output for space. But then again, HTML probably compresses well.
[20:56] definitely does
[20:57] How well will that go with blogs that include stuff like the bloated Flickr widgets?
[20:58] I'd prefer to just grab the images and deduplicate them, but then again that would probably require some modifications to the web pages.
[20:59] Afaik grab-site implements deduplication; output is stored as WARC and can be played back with another piece of software.
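(The "HTML probably compresses well" point above is easy to check. Here is a minimal sketch using only Python's stdlib gzip module; the sample page is invented, but its repetitive template-generated markup is typical of blog HTML. WARC files are conventionally stored with per-record gzip, so this ratio is roughly what an archive would see.)

```python
import gzip

# Invented sample: repetitive, template-generated blog markup.
html = (
    "<html><head><title>Sample blog</title></head><body>"
    + "".join(
        f'<div class="post"><h2>Post {i}</h2>'
        f"<p>Lorem ipsum dolor sit amet.</p></div>"
        for i in range(200)
    )
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.0%})")
```

Markup this repetitive typically shrinks to a small fraction of its original size, which is why storing full pages in compressed WARCs is less wasteful than the raw byte counts suggest.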
[21:01] Have a look at http://archiveteam.org/index.php?title=The_WARC_Ecosystem
[21:07] Maybe I will run some tests tonight to see how well it works with the things I am most worried about.
[21:09] I guess a perfect archive would include both the HTML pages and the data from the API, embedded as comments or stored alongside the other documents, so there would be a way to upload blogs to other platforms later. (But that would probably be far too much data scaled across all of Tumblr.)
[21:10] APIs are nice, but generally they have limits and such, which isn't terribly helpful when you're trying to save a sinking ship
[21:12] Tumblr has an old API which they didn't seem to care about that much a while ago. I didn't go for an extreme stress test, but I never hit any rate limits either... :-) But maybe if hundreds of clients started accessing that API it would overload the servers, idk.
[21:16] Also, it doesn't require authentication.
[21:51] *** tsr has joined #archiveteam-bs
[21:52] *** BlueMaxim has joined #archiveteam-bs
[22:01] *** ndiddy has joined #archiveteam-bs
[22:01] *** GE has joined #archiveteam-bs
[22:40] *** pizzaiolo has joined #archiveteam-bs
[22:49] *** GE has quit IRC (Quit: zzz)
[23:45] It looks like that HDD deal I posted the other day was an accident. A couple of friends are reporting their orders cancelled.
[23:50] *** Honno has quit IRC (Read error: Operation timed out)
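(The old, unauthenticated Tumblr API discussed at 21:12-21:16 is the v1 "read" endpoint, which returned XML with no API key. A minimal sketch of building the request URL and parsing a response with Python's stdlib follows; the URL shape is assumed from the old v1 interface, and the XML below is an invented sample of the response's general shape, not real data, so no network call is made.)

```python
import xml.etree.ElementTree as ET

def read_api_url(blog: str, start: int = 0, num: int = 50) -> str:
    # Assumed shape of the old v1 read endpoint; no auth required.
    return f"https://{blog}.tumblr.com/api/read?start={start}&num={num}"

# Invented sample roughly matching the v1 XML response shape.
sample = """<tumblr version="1.0">
  <posts start="0" total="2">
    <post id="123" type="regular" url="https://example.tumblr.com/post/123">
      <regular-title>Hello</regular-title>
    </post>
    <post id="124" type="photo" url="https://example.tumblr.com/post/124">
      <photo-url max-width="1280">https://64.media.tumblr.com/abc.jpg</photo-url>
    </post>
  </posts>
</tumblr>"""

root = ET.fromstring(sample)
posts = root.find("posts")
ids = [p.get("id") for p in posts.findall("post")]
print(ids)  # ['123', '124']
```

Paginating with `start`/`num` is what makes the endpoint attractive for bulk grabs, and it is also why many clients hammering it at once could overload the servers, as noted above.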