[00:15] *** spacedog has quit IRC (Ping timeout: 268 seconds)
[00:59] *** maelstrom has joined #archiveteam
[01:04] *** RichardG_ is now known as RichardG
[01:12] *** Asparagir has quit IRC (Asparagir)
[01:13] *** Asparagir has joined #archiveteam
[01:33] *** fie has joined #archiveteam
[01:36] *** yakfish has quit IRC (Operation timed out)
[02:40] *** Asparagir has quit IRC (Asparagir)
[03:04] *** i336__ has quit IRC (Read error: Operation timed out)
[03:08] *** yakfish has joined #archiveteam
[03:34] *** Asparagir has joined #archiveteam
[03:35] *** krazedkat has quit IRC (Quit: Leaving)
[03:36] *** maelstrom has left Leaving
[04:21] *** vOYtEC has quit IRC (Ping timeout: 370 seconds)
[04:30] *** ndiddy has quit IRC (Quit: Leaving)
[04:56] *** Asparagir has quit IRC (Asparagir)
[05:12] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:18] *** Sk1d has joined #archiveteam
[05:26] *** Asparagir has joined #archiveteam
[05:29] *** Asparagir has quit IRC (Client Quit)
[07:09] *** godane has quit IRC (Ping timeout: 250 seconds)
[07:17] *** VADemon has joined #archiveteam
[07:18] *** godane has joined #archiveteam
[07:21] *** vitzli has joined #archiveteam
[07:43] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[08:34] *** VADemon_ has joined #archiveteam
[08:40] *** VADemon has quit IRC (Ping timeout: 370 seconds)
[08:42] *** VADemon_ has quit IRC (Read error: Operation timed out)
[09:18] *** Honno has joined #archiveteam
[09:30] *** schbirid has joined #archiveteam
[09:46] *** mhazinsk has quit IRC (Read error: Operation timed out)
[09:49] *** mhazinsk has joined #archiveteam
[10:29] *** redlob_ has quit IRC (Quit: ZNC - http://znc.in)
[10:32] *** redlob has joined #archiveteam
[12:07] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:08] *** RichardG has quit IRC (Ping timeout: 244 seconds)
[12:10] *** RichardG has joined #archiveteam
[12:59] *** atomotic has joined #archiveteam
[13:56] *** godane has left
[13:57] *** godane has joined #archiveteam
[14:26] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:01] *** sep332 has joined #archiveteam
[15:06] *** vitzli has quit IRC (Leaving)
[15:31] *** atomotic has joined #archiveteam
[15:33] http://inside-ottensen.de/ is closing
[15:40] I'll add it to the bot
[15:41] *** Boppen has quit IRC (Ping timeout: 194 seconds)
[15:43] thanks
[16:08] *** Boppen has joined #archiveteam
[16:11] *** alterego has left
[16:13] *** Aranje has joined #archiveteam
[16:43] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:44] *** atomotic has joined #archiveteam
[16:52] *** VADemon has joined #archiveteam
[17:00] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[17:05] *** atomotic has joined #archiveteam
[17:07] -------------------------------------------------
[17:07] CONGRATULATIONS, FOS IS FULL
[17:07] -------------------------------------------------
[17:30] So, first, rsync is now down for a little on FOS while I straighten out the mess. (Load average was at 120)
[17:30] yeah, sorry I didn't notice until I woke up this morning
[17:31] didn't expect it to fill overnight
[17:31] It's not hard to triage.
[17:31] FTPGOV is taking FOS forever to make Megawarcs
[17:31] Meanwhile, Google Code went full bore into it
[17:32] I only do one MegaWarcing at a time
[17:34] Load average is now at 9
[17:34] I'm doing some mvs of items
[17:35] just to get it down to where it can do work
[17:52] *** maxogden has quit IRC (Ping timeout: 260 seconds)
[17:52] Clawing its miserable life to 213gb, I'm now doing some Google Code pushes
[17:52] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[17:52] *** maxogden has joined #archiveteam
[17:53] *** VonGuard has quit IRC (Ping timeout: 260 seconds)
[17:53] I doubt anything was lost, but maybe
[17:53] *** HCross2 has quit IRC (Ping timeout: 260 seconds)
[17:53] *** johtso has quit IRC (Ping timeout: 260 seconds)
[17:53] I think we pretty much have the system set up so that if rsync says to go to hell or fails, stuff doesn't end up there
[17:54] and none of the scripts delete anything until a successful push
[17:57] *** abartov__ has joined #archiveteam
[17:59] *** VonGuard has joined #archiveteam
[18:07] The Google Code and FTP stuff on the machine is being packed up and sent to IA; after I get a good toehold I'll turn on rsync again
[18:09] *** johtso has joined #archiveteam
[18:09] *** HCross2 has joined #archiveteam
[18:10] Thanks for the fix!
[18:19] *** Froggypwn has quit IRC (Ping timeout: 244 seconds)
[18:31] *** Froggypwn has joined #archiveteam
[18:46] *** bRick5772 has joined #archiveteam
[19:27] Google Code is working and uploading.
[19:28] FTPGOV same
[20:11] *** Boppen has quit IRC (Quit: Nettalk6 - www.ntalk.de)
[20:13] *** maelstrom has joined #archiveteam
[20:15] *** Boppen has joined #archiveteam
[20:38] *** GinhijiQu has joined #archiveteam
[20:44] Hi. I was wondering if anyone has a clear opinion on whether it is better to archive webpages (specifically Tumblr) as is, or to use APIs to get all the raw data available and store that in some kind of XML format?
[20:47] it depends on the specific case. in general, archiving "as is" is more useful, e.g. because it can be readily digested by ArchiveBot and put into the Wayback Machine, and usually you can derive all the data you'd like from it post-scrape
[20:48] but maybe the API gives off data that's not present on the website? anyway, let's continue this in #archiveteam-bs please
[21:52] *** BlueMaxim has joined #archiveteam
[22:01] *** ndiddy has joined #archiveteam
[22:09] GinhijiQu: both is best; whatever is higher-fidelity to the source data is preferred. HTML is a very important component, but it's much less parseable
[22:09] er, ok
[22:40] *** pizzaiolo has joined #archiveteam
[22:43] *** i336__ has joined #archiveteam
[23:01] *** bRick5772 has quit IRC (Quit: Leaving.)
[23:50] *** Honno has quit IRC (Read error: Operation timed out)