#archiveteam-bs 2017-01-02,Mon

Time Nickname Message
01:04 🔗 RichardG_ is now known as RichardG
01:12 🔗 Asparagir has quit IRC (Asparagir)
01:13 🔗 Asparagir has joined #archiveteam-bs
01:33 🔗 fie has joined #archiveteam-bs
01:36 🔗 yakfish has quit IRC (Operation timed out)
02:40 🔗 Asparagir has quit IRC (Asparagir)
03:09 🔗 yakfish has joined #archiveteam-bs
03:34 🔗 Asparagir has joined #archiveteam-bs
03:35 🔗 krazedkat has quit IRC (Quit: Leaving)
04:14 🔗 Somebody2 godane: http://calteches.library.caltech.edu/ -- Archive of Caltech magazine back to the 1930s; might be good for you to grab when you get a chance
04:30 🔗 ndiddy has quit IRC (Quit: Leaving)
04:31 🔗 Somebody2 It looks like it is part of a large open database of Caltech materials, so it's *probably* pretty safe where it is, though.
04:56 🔗 Asparagir has quit IRC (Asparagir)
05:12 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:18 🔗 Sk1d has joined #archiveteam-bs
05:26 🔗 Asparagir has joined #archiveteam-bs
05:29 🔗 Asparagir has quit IRC (Client Quit)
06:03 🔗 godane Somebody2: thanks
06:03 🔗 godane first journal i have seen where the full issues are archived
06:04 🔗 godane i always see science journals only put out the articles but no full issue scans
07:09 🔗 godane has quit IRC (Ping timeout: 250 seconds)
07:17 🔗 VADemon has joined #archiveteam-bs
07:18 🔗 godane has joined #archiveteam-bs
07:21 🔗 vitzli has joined #archiveteam-bs
07:43 🔗 Aranje has quit IRC (Ping timeout: 260 seconds)
08:34 🔗 VADemon_ has joined #archiveteam-bs
08:40 🔗 VADemon has quit IRC (Ping timeout: 370 seconds)
08:42 🔗 VADemon_ has quit IRC (Read error: Operation timed out)
09:18 🔗 Honno has joined #archiveteam-bs
09:30 🔗 schbirid has joined #archiveteam-bs
12:07 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:08 🔗 RichardG has quit IRC (Ping timeout: 244 seconds)
12:10 🔗 RichardG has joined #archiveteam-bs
12:33 🔗 schbirid ugh, 1MB/s to ACD right now. had gbit speeds earlier
13:25 🔗 vitzli you're lucky, last month i got ~30-80 kb/s, though I believe it has to do with ISP messing around
13:56 🔗 godane has left
13:57 🔗 godane has joined #archiveteam-bs
15:01 🔗 sep332 has joined #archiveteam-bs
15:06 🔗 vitzli has quit IRC (Leaving)
15:41 🔗 Boppen has quit IRC (Ping timeout: 194 seconds)
16:08 🔗 Boppen has joined #archiveteam-bs
16:13 🔗 Aranje has joined #archiveteam-bs
16:20 🔗 yan arkiver: in fccbda81dc24d605f74ecdc24bca290e74683c2b you broke the link to the IRC channel
16:39 🔗 yan arkiver: (in the ftp-gov-grab repo btw); IRC link was changed from cheetoflee to cheetoftp
16:45 🔗 arkiver yan: fixed.
16:52 🔗 VADemon has joined #archiveteam-bs
17:53 🔗 HCross2 has quit IRC (Ping timeout: 260 seconds)
18:09 🔗 johtso has joined #archiveteam-bs
18:09 🔗 HCross2 has joined #archiveteam-bs
19:08 🔗 arkiver anyone going to SHA2017?
20:11 🔗 Boppen has quit IRC (Quit: Nettalk6 - www.ntalk.de)
20:15 🔗 Boppen has joined #archiveteam-bs
20:48 🔗 GinhijiQu has joined #archiveteam-bs
20:49 🔗 PurpleSym API data is more useful for robot consumption and transformation. Accessing HTML pages is easier for human beings.
20:50 🔗 PurpleSym So, depends on your audience, GinhijiQu.
20:52 🔗 GinhijiQu I just suspect that storing the whole webpages will lead to a lot of redundancy and waste storage that could be used to store more information?
20:54 🔗 PurpleSym Sure, you trade time to generate a visually appealing output for space. But then again HTML probably compresses well.
20:56 🔗 Aranje definitely does
20:57 🔗 GinhijiQu How well will that go with blogs that include stuff like the bloated Flickr widgets?
20:58 🔗 GinhijiQu I'd prefer to just grab the images and deduplicate the images, but then again that would probably require some modifications to the web pages.
20:59 🔗 PurpleSym Afaik grab-site implements deduplication, output is stored as WARC and can be played back with another piece of software.
21:01 🔗 PurpleSym Have a look at http://archiveteam.org/index.php?title=The_WARC_Ecosystem
21:07 🔗 GinhijiQu Maybe I will make some tests tonight to see how well it works with these things I am most worrying about.
21:09 🔗 GinhijiQu I guess a perfect archive would include both the HTML pages and data from the API embedded as comments or stored alongside the other documents, so there would be a way to upload blogs to other platforms later. (But that would probably be really too much data scaled across all of Tumblr.)
21:10 🔗 dashcloud apis are nice, but generally they have limits and such, which isn't terribly helpful when you're trying to save a sinking ship
21:12 🔗 GinhijiQu Tumblr has an old API which they didn't seem to care about that much a while ago. I didn't go for an extreme stress test but I never hit any rate limits either... :-) But maybe if hundreds of clients would start accessing that API it would overload the servers, idk.
21:16 🔗 GinhijiQu Also it doesn't require authentication.
21:51 🔗 tsr has joined #archiveteam-bs
21:52 🔗 BlueMaxim has joined #archiveteam-bs
22:01 🔗 ndiddy has joined #archiveteam-bs
22:01 🔗 GE has joined #archiveteam-bs
22:40 🔗 pizzaiolo has joined #archiveteam-bs
22:49 🔗 GE has quit IRC (Quit: zzz)
23:45 🔗 HCross2 It looks like that HDD deal I posted the other day was an accident. Couple friends reporting their orders cancelled
23:50 🔗 Honno has quit IRC (Read error: Operation timed out)
