Time |
Nickname |
Message |
01:04
🔗
|
|
RichardG_ is now known as RichardG |
01:12
🔗
|
|
Asparagir has quit IRC (Asparagir) |
01:13
🔗
|
|
Asparagir has joined #archiveteam-bs |
01:33
🔗
|
|
fie has joined #archiveteam-bs |
01:36
🔗
|
|
yakfish has quit IRC (Operation timed out) |
02:40
🔗
|
|
Asparagir has quit IRC (Asparagir) |
03:09
🔗
|
|
yakfish has joined #archiveteam-bs |
03:34
🔗
|
|
Asparagir has joined #archiveteam-bs |
03:35
🔗
|
|
krazedkat has quit IRC (Quit: Leaving) |
04:14
🔗
|
Somebody2 |
godane: http://calteches.library.caltech.edu/ -- Archive of Caltech magazine back to the 1930s; might be good for you to grab when you get a chance |
04:30
🔗
|
|
ndiddy has quit IRC (Quit: Leaving) |
04:31
🔗
|
Somebody2 |
It looks like it is part of a large open database of Caltech materials, so it's *probably* pretty safe where it is, though. |
04:56
🔗
|
|
Asparagir has quit IRC (Asparagir) |
05:12
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
05:18
🔗
|
|
Sk1d has joined #archiveteam-bs |
05:26
🔗
|
|
Asparagir has joined #archiveteam-bs |
05:29
🔗
|
|
Asparagir has quit IRC (Client Quit) |
06:03
🔗
|
godane |
Somebody2: thanks |
06:03
🔗
|
godane |
first journal i have seen where the full issues are archived |
06:04
🔗
|
godane |
i always see science journals only put out the articles but no full issue scans |
07:09
🔗
|
|
godane has quit IRC (Ping timeout: 250 seconds) |
07:17
🔗
|
|
VADemon has joined #archiveteam-bs |
07:18
🔗
|
|
godane has joined #archiveteam-bs |
07:21
🔗
|
|
vitzli has joined #archiveteam-bs |
07:43
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
08:34
🔗
|
|
VADemon_ has joined #archiveteam-bs |
08:40
🔗
|
|
VADemon has quit IRC (Ping timeout: 370 seconds) |
08:42
🔗
|
|
VADemon_ has quit IRC (Read error: Operation timed out) |
09:18
🔗
|
|
Honno has joined #archiveteam-bs |
09:30
🔗
|
|
schbirid has joined #archiveteam-bs |
12:07
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:08
🔗
|
|
RichardG has quit IRC (Ping timeout: 244 seconds) |
12:10
🔗
|
|
RichardG has joined #archiveteam-bs |
12:33
🔗
|
schbirid |
ugh, 1MB/s to ACD right now. had gbit speeds earlier |
13:25
🔗
|
vitzli |
you're lucky, last month i got ~30-80 kb/s, though I believe it has to do with ISP messing around |
13:56
🔗
|
|
godane has left |
13:57
🔗
|
|
godane has joined #archiveteam-bs |
15:01
🔗
|
|
sep332 has joined #archiveteam-bs |
15:06
🔗
|
|
vitzli has quit IRC (Leaving) |
15:41
🔗
|
|
Boppen has quit IRC (Ping timeout: 194 seconds) |
16:08
🔗
|
|
Boppen has joined #archiveteam-bs |
16:13
🔗
|
|
Aranje has joined #archiveteam-bs |
16:20
🔗
|
yan |
arkiver: in fccbda81dc24d605f74ecdc24bca290e74683c2b you broke the link to the IRC channel |
16:39
🔗
|
yan |
arkiver: (in the ftp-gov-grab repo btw); IRC link was changed from cheetoflee to cheetoftp |
16:45
🔗
|
arkiver |
yan: fixed. |
16:52
🔗
|
|
VADemon has joined #archiveteam-bs |
17:53
🔗
|
|
HCross2 has quit IRC (Ping timeout: 260 seconds) |
18:09
🔗
|
|
johtso has joined #archiveteam-bs |
18:09
🔗
|
|
HCross2 has joined #archiveteam-bs |
19:08
🔗
|
arkiver |
anyone going to SHA2017? |
20:11
🔗
|
|
Boppen has quit IRC (Quit: Nettalk6 - www.ntalk.de) |
20:15
🔗
|
|
Boppen has joined #archiveteam-bs |
20:48
🔗
|
|
GinhijiQu has joined #archiveteam-bs |
20:49
🔗
|
PurpleSym |
API data is more useful for robot consumption and transformation. Accessing HTML pages is easier for human beings. |
20:50
🔗
|
PurpleSym |
So, depends on your audience, GinhijiQu. |
20:52
🔗
|
GinhijiQu |
I just suspect that storing the whole webpages will lead to a lot of redundancy and waste storage that could be used to store more information? |
20:54
🔗
|
PurpleSym |
Sure, you trade time to generate a visually appealing output for space. But then again HTML probably compresses well. |
20:56
🔗
|
Aranje |
definitely does |
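A minimal sketch of the "HTML compresses well" point above, using Python's standard `gzip` module. The page content is made up for illustration (a hypothetical widget block repeated 200 times); real blog pages will compress less dramatically, so treat the ratio as indicative only.

```python
import gzip

# Hypothetical markup: the same widget snippet repeated many times,
# roughly like a blog page full of embedded widgets.
widget = '<div class="widget"><a href="https://example.com/photo">photo</a></div>\n'
page = ("<html><body>\n" + widget * 200 + "</body></html>\n").encode("utf-8")

# Compress the whole page in memory and compare sizes.
compressed = gzip.compress(page)
ratio = len(compressed) / len(page)

print(f"raw: {len(page)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.3f}")
```

Highly repetitive markup is the best case for gzip; already-compressed assets such as JPEG images will not shrink this way.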
20:57
🔗
|
GinhijiQu |
How well will that go with blogs that include stuff like the bloated Flickr widgets? |
20:58
🔗
|
GinhijiQu |
I'd prefer to just grab the images and deduplicate the images, but then again that would probably require some modifications to the web pages. |
20:59
🔗
|
PurpleSym |
Afaik grab-site implements deduplication, output is stored as WARC and can be played back with another piece of software. |
21:01
🔗
|
PurpleSym |
Have a look at http://archiveteam.org/index.php?title=The_WARC_Ecosystem |
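A rough sketch of the general idea behind the deduplication discussed above: store each distinct payload once, keyed by a content hash. This is not grab-site's actual implementation and does not read or write WARC; the function name `deduplicate` and the sample payloads are assumptions made for illustration.

```python
import hashlib

def deduplicate(payloads):
    """Keep one copy of each distinct payload, keyed by its SHA-256 digest.

    Returns (store, index): store maps digest -> bytes, and index holds
    one digest per input payload so the original sequence can be rebuilt.
    """
    store = {}
    index = []
    for data in payloads:
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # only the first copy is kept
        index.append(digest)
    return store, index

# Hypothetical example: the same image fetched from two different pages.
images = [b"image-A", b"image-B", b"image-A"]
store, index = deduplicate(images)
print(f"{len(images)} payloads fetched, {len(store)} stored")
```

Because the index keeps one digest per fetch, the pages themselves need no modification: each reference resolves back to the single stored copy.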
21:07
🔗
|
GinhijiQu |
Maybe I will run some tests tonight to see how well it works with the things I am most worried about. |
21:09
🔗
|
GinhijiQu |
I guess a perfect archive would include both the HTML pages and data from the API embedded as comments or stored alongside the other documents, so there would be a way to upload blogs to other platforms later. (But that would probably be really too much data scaled across all of Tumblr.) |
21:10
🔗
|
dashcloud |
apis are nice, but generally they have limits and such, which isn't terribly helpful when you're trying to save a sinking ship |
21:12
🔗
|
GinhijiQu |
Tumblr has an old API which they didn't seem to care about that much a while ago. I didn't go for an extreme stress test but I never hit any rate limits either... :-) But maybe if hundreds of clients started accessing that API it would overload the servers, idk. |
21:16
🔗
|
GinhijiQu |
Also it doesn't require authentication. |
21:51
🔗
|
|
tsr has joined #archiveteam-bs |
21:52
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
22:01
🔗
|
|
ndiddy has joined #archiveteam-bs |
22:01
🔗
|
|
GE has joined #archiveteam-bs |
22:40
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
22:49
🔗
|
|
GE has quit IRC (Quit: zzz) |
23:45
🔗
|
HCross2 |
It looks like that HDD deal I posted the other day was an accident. Couple friends reporting their orders cancelled |
23:50
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |