[00:22] SketchCow: you are only uploading the 'second' telenor archive as megawarcs right?
[00:23] *** Arkiver2 is now known as arkiver
[00:24] I like that NewsGrabber is getting some attention
[00:25] SketchCow: what's the manager thinking of how newsgrabber is going, vs. GDELT?
[00:27] Yeah. It is nice to see it going so well.
[00:34] yep
[00:34] arkiver: it is?
[00:34] (getting attention)
[00:34] yes
[00:35] arkiver: link?
[00:36] It's getting attention from the wayback machine manager
[00:36] Not like global
[00:36] yet
[00:36] oh, in that sense
[00:37] Yet being the key word
[00:39] Yes
[00:42] GitHub down \o/
[00:45] Atlassian Army strikes again
[00:45] I think service outages are more fun if you imagine them as overly simplistic battles between two major competitors
[00:48] or maybe Age of Empires games where you have that one asshole who types in the car cheat code
[00:53] "Click to select this one %s"
[00:53] and that cheat, wasn't it "how do you turn this on" or something like that
[00:54] (how the fuck do i know this. too much fucking about in aoe2 single player games i guess)
[00:58] lol
[00:59] downtimes are fun only when you can observe the ensuing chaos and then optionally know about the cause
[01:00] I want my TiddlyWiki Desktop :(
[01:30] *** vitzli has joined #archiveteam-bs
[01:54] *** JesseW has joined #archiveteam-bs
[02:16] "The odds of Github meeting a fate similar to that of the Library of Alexandria are slim."
[02:16] welp
[02:16] so much for taking that article seriously, I guess
[02:16] (ref http://www.wired.com/2015/06/problem-putting-worlds-code-github/ )
[02:17] wonderful morning
[02:31] still down is it?
[02:32] ah, apparently it's partially back up
[02:34] "Slim"? Really? I mean, I don't think it's going away especially *soon*, but it's not as though corporate digital data storage has a great long-term track record.
[02:48] *** VADemon has quit IRC (Quit: left4dead)
[03:11] *** DFJustin has quit IRC (Quit: IMHOSTFU)
[03:11] *** DFJustin has joined #archiveteam-bs
[03:47] heh
[03:48] ive got half of github in cloud storage
[03:48] totes reliable
[03:50] *** kyan has quit IRC (This computer has gone to sleep)
[03:52] *** kyan has joined #archiveteam-bs
[04:07] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[04:08] *** atlogbot has quit IRC (Ping timeout: 360 seconds)
[04:09] *** Zebranky_ has joined #archiveteam-bs
[04:12] *** Zebranky has quit IRC (Read error: Operation timed out)
[04:18] *** atlogbot has joined #archiveteam-bs
[04:18] *** beardicus has quit IRC (Read error: Operation timed out)
[04:18] *** kvieta has quit IRC (Read error: Operation timed out)
[04:18] *** dxrt has quit IRC (Read error: Operation timed out)
[04:20] *** slyphic has quit IRC (Read error: Operation timed out)
[04:20] *** toad1 has quit IRC (Read error: Operation timed out)
[04:20] *** kyan_ has joined #archiveteam-bs
[04:22] *** Sanqui has quit IRC (Read error: Operation timed out)
[04:24] *** kyan has quit IRC (Read error: Operation timed out)
[04:25] *** kyan_ is now known as kyan
[04:27] *** phuzion has quit IRC (Ping timeout: 633 seconds)
[04:27] *** toad1 has joined #archiveteam-bs
[04:28] *** kvieta has joined #archiveteam-bs
[04:28] *** dxrt has joined #archiveteam-bs
[04:31] *** phuzion has joined #archiveteam-bs
[04:32] *** slyphic has joined #archiveteam-bs
[04:37] *** Sanqui has joined #archiveteam-bs
[04:42] *** JesseW has quit IRC (Ping timeout: 246 seconds)
[04:53] *** chazchaz has quit IRC (Ping timeout: 260 seconds)
[04:54] *** chazchaz has joined #archiveteam-bs
[05:31] *** acridAxid has quit IRC (Quit: marauder)
[05:32] *** beardicus has joined #archiveteam-bs
[05:32] *** acridAxid has joined #archiveteam-bs
[05:36] *** kvieta has quit IRC (Read error: Operation timed out)
[05:36] *** phuzion has quit IRC (Read error: Operation timed out)
[05:37] *** beardicus has quit IRC (Read error: Operation timed out)
[05:39] *** JesseW has joined #archiveteam-bs
[05:39] *** phuzion has joined #archiveteam-bs
[05:42] *** SimpBrain has quit IRC (Ping timeout: 260 seconds)
[05:53] *** kyan has quit IRC (Ping timeout: 260 seconds)
[05:56] *** mistym has quit IRC (Ping timeout: 633 seconds)
[05:56] *** kyan has joined #archiveteam-bs
[05:56] *** mistym has joined #archiveteam-bs
[06:00] *** kyan_ has joined #archiveteam-bs
[06:02] *** robink has quit IRC (Ping timeout: 190 seconds)
[06:03] *** kyan has quit IRC (Ping timeout: 260 seconds)
[06:03] *** robink has joined #archiveteam-bs
[06:04] this is probably the most annoying thing about the S3 upload endpoint:
[06:04] uploading docstoc20151218032352.megawarc.warc.gz: [################################] 25625/25625 - 00:00:00
[06:04] error uploading docstoc20151218032352.megawarc.warc.gz to archiveteam_docstoc_20151218032352, 403 Client Error: Forbidden
[06:04] for some reason it will only return 4xx *after* you upload everything
[06:05] *** beardicus has joined #archiveteam-bs
[06:05] (also I don't know why my keys are still denied but that's a different problem)
[06:06] *** kvieta has joined #archiveteam-bs
[06:29] *** SimpBrain has joined #archiveteam-bs
[06:31] *** SimpBrain has quit IRC (Read error: Operation timed out)
[06:43] *** mismatchm has quit IRC (Ping timeout: 360 seconds)
[06:46] *** SimpBrain has joined #archiveteam-bs
[07:11] *** kyan_ has quit IRC (Ping timeout: 260 seconds)
[07:35] yipdw: are you surprised that an HTTP response can't come before the HTTP request is finished? ;)
[07:36] ivan`: well there's a 100-Continue response in there before the giant upload
[07:36] ah I see
[07:36] so I'd prefer to be told then, yes :P
[07:36] or at least curl reports one if you POST to the endpoint
[07:37] I'm not sure what ia upload does but it seems to use the same mechanism
[07:56] yipdw: I totally agree. With the NewsGrabber, by the time the IA has errored, we've uploaded 10GB
[07:58] i'm grabbing a show called Prime Time from americanarchive.org
[07:59] its from 1982/1983 on Rocky Mountain PBS
[08:16] *** vitzli has quit IRC (Leaving)
[08:17] *** JesseW has quit IRC (Leaving.)
[08:21] i had to stop uploading
[08:21] i have 735 items waiting on derive
[08:27] i'm at 629,870 items right now
[08:30] *** kyan has joined #archiveteam-bs
[11:35] *** vitzli has joined #archiveteam-bs
[12:24] *** beardicus has quit IRC (Read error: Operation timed out)
[12:26] *** kvieta has quit IRC (Read error: Operation timed out)
[12:28] *** SimpBrain has quit IRC (Ping timeout: 633 seconds)
[12:39] *** kvieta has joined #archiveteam-bs
[12:39] *** beardicus has joined #archiveteam-bs
[12:42] *** SimpBrain has joined #archiveteam-bs
[12:53] *** kvieta has quit IRC (Read error: Operation timed out)
[12:55] *** beardicus has quit IRC (Ping timeout: 961 seconds)
[13:13] *** kvieta has joined #archiveteam-bs
[13:14] *** beardicus has joined #archiveteam-bs
[13:28] *** logan has quit IRC (Remote host closed the connection)
[13:28] *** logan has joined #archiveteam-bs
[13:45] *** Fletcher has quit IRC (Ping timeout: 252 seconds)
[13:59] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[14:06] *** dashcloud has joined #archiveteam-bs
[14:46] *** Fletcher has joined #archiveteam-bs
[14:50] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:54] *** dashcloud has joined #archiveteam-bs
[15:20] *** VADemon has joined #archiveteam-bs
[15:29] *** RichardG_ has joined #archiveteam-bs
[15:30] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[15:41] 1. No idea if I screwed up Telenor.
[15:41] 2. Yes, this manager is trying to keep track of the work we do. He'll have fun. We're massive.
[15:43] and are only getting bigger
[15:45] i'm at 630,508 items now
[15:51] *** RichardG_ has quit IRC (Remote host closed the connection)
[15:51] *** RichardG has joined #archiveteam-bs
[15:54] *** HCross has quit IRC (Read error: Connection reset by peer)
[15:54] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[15:54] *** brayden has quit IRC (Read error: Connection reset by peer)
[15:54] *** Stiletto has joined #archiveteam-bs
[15:54] *** HCross has joined #archiveteam-bs
[15:54] *** brayden has joined #archiveteam-bs
[16:00] herding cats commences
[16:00] lol
[16:07] was looking at FTP project and found a few SimTel.net mirrors that include variations
[16:09] Always good to get.
[16:10] tbh i think the largest differences are things that got squashed by C&Ds (contents of /msdos/pcmag are still there)
[16:12] in case you're curious: ftp://ftp.sunet.se/pub/simtelnet/
[16:14] i might have uploaded that one
[16:14] the mirror on freenet.de also has some differences from the mirror currently stored on IA (from bu.edu), but not yet sure what that means (more/less, newer/older)
[16:15] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:15] *** dashcloud has joined #archiveteam-bs
[16:15] https://archive.org/details/ftp.sunet.se-simtelwin
[16:16] ftp.sunet.se/pub/simtelnet/msdos/pcmag/
[16:16] yup, nice
[16:16] complete dump was a couple of TB in size, so i've split that one up into a couple of parts
[17:03] I could try to do audit/check/whatever it is on telenor: download all the archives and check inside content for users/webpages inside
[17:07] looks like macros in TiddlyWiki work well: http://i.imgur.com/HsrfMrv.png
[17:07] syntax highlighting, not so much...
[17:10] *** Fletcher has quit IRC (Ping timeout: 252 seconds)
[17:14] *** Fletcher has joined #archiveteam-bs
[17:14] *** JesseW has joined #archiveteam-bs
[17:16] *** kristian_ has joined #archiveteam-bs
[17:16] hi, all
[17:22] I have a 28 page PDF that is 408 megs
[17:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:25] wow, what a silly way to make a PDF. :-)
[17:25] well
[17:26] it's a scanned document ... I got a pdf with borders (to make it fit a certain format) and a size of 240M
[17:27] I extracted with pdfimages
[17:27] to ppm, so I guess that's what was embedded
[17:27] *** dashcloud has joined #archiveteam-bs
[17:28] strange, ppm is less compressed than i would expect the pdf to be
[17:28] aha, pdfimages -list gives me "1 0 image 2456 3492 rgb 3 8 jpeg no 4 0" and so on
[17:28] "jpeg" is what's important there
[17:32] so I guess the original pdf has jpg files embedded after all
[17:35] image-000.jpg JPEG 2456x3492 2456x3492+0+0 8-bit DirectClass 14MB 0.650u 0:00.660
[17:38] what are you trying to do
[17:39] I'm trying to figure out what I should upload
[17:39] I guess I'll go with the 240 megs version, sans borders
[17:39] you scanned a document yourself?
[17:39] no
[17:40] Does IA do derive tasks on PDFs?
[17:40] upload whatever you received it as
[17:40] yes phuzion
[17:40] Ok yeah, then kristian_, you're gonna want to upload the original file.
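A minimal sketch of reading the `pdfimages -list` row quoted at 17:28, assuming the leading columns are page, num, type, width, height, color, comp, bpc, enc (the header order poppler's pdfimages prints). The names `ImageRow`, `parse_row`, and `raw_bytes` are hypothetical helpers for illustration, and the row in the log is truncated ("and so on"), so only the leading fields are parsed.

```python
from typing import NamedTuple


class ImageRow(NamedTuple):
    """Leading fields of one `pdfimages -list` row (assumed column order)."""
    page: int
    num: int
    kind: str
    width: int
    height: int
    color: str
    comp: int   # colour components per pixel
    bpc: int    # bits per component
    enc: str    # encoding of the embedded stream, e.g. "jpeg"


def parse_row(row: str) -> ImageRow:
    """Split a whitespace-separated row and keep the first nine fields."""
    f = row.split()
    return ImageRow(int(f[0]), int(f[1]), f[2], int(f[3]), int(f[4]),
                    f[5], int(f[6]), int(f[7]), f[8])


def raw_bytes(img: ImageRow) -> int:
    """Size of the decoded pixel data, ignoring any file header."""
    return img.width * img.height * img.comp * img.bpc // 8


# Row copied from the log; trailing columns were cut off there.
row = parse_row("1 0 image 2456 3492 rgb 3 8 jpeg no 4 0")
print(row.enc)          # jpeg
print(raw_bytes(row))   # 25729056
```

The decoded size, roughly 25.7 MB per page, is why extracting to PPM gives larger files than the 14MB JPEG reported at 17:35: the `enc` column says the PDF stores JPEG streams, and PPM is the uncompressed form of the same pixels.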
[17:41] I'll do the one without the artificial borders
[17:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:45] *** JesseW has quit IRC (Read error: Operation timed out)
[17:48] *** dashcloud has joined #archiveteam-bs
[17:49] hurm, nothing happens when I click "upload file"
[17:50] sorry, had to click "add"
[17:53] *** vitzli has quit IRC (Leaving)
[18:00] *** VADemon has quit IRC (Read error: Connection reset by peer)
[18:51] *** Aad has quit IRC (Read error: Connection reset by peer)
[18:54] *** chfoo has quit IRC (Read error: Operation timed out)
[19:23] *** chfoo has joined #archiveteam-bs
[19:48] *** kristian_ has quit IRC (Quit: Leaving)
[19:59] *** schbirid has joined #archiveteam-bs
[20:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:12] *** dashcloud has joined #archiveteam-bs
[20:19] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:22] *** dashcloud has joined #archiveteam-bs
[20:59] *** schbirid has quit IRC (Quit: Leaving)
[21:04] *** mismatchm has joined #archiveteam-bs
[21:09] *** mismatchm has quit IRC (Ping timeout: 252 seconds)
[21:09] *** mismatch has joined #archiveteam-bs
[21:48] *** brayden has quit IRC (Read error: Connection reset by peer)
[21:50] *** brayden has joined #archiveteam-bs
[21:55] *** Apathy has quit IRC (Read error: Operation timed out)
[21:55] *** Apathy has joined #archiveteam-bs
[22:05] *** VADemon has joined #archiveteam-bs
[22:09] *** slyphic is now known as slyphic|a
[22:44] *** RedType has joined #archiveteam-bs
[22:51] *** JetBalsa has joined #archiveteam-bs
[22:58] *** robink has quit IRC (Ping timeout: 190 seconds)
[23:00] *** robink has joined #archiveteam-bs
[23:03] *** wp494_ has joined #archiveteam-bs
[23:04] *** HCross3 has joined #archiveteam-bs
[23:04] *** alard has quit IRC (Ping timeout: 250 seconds)
[23:04] *** wp494 has quit IRC (Ping timeout: 250 seconds)
[23:04] *** zerkalo has quit IRC (Ping timeout: 250 seconds)
[23:05] *** HCross has quit IRC (Ping timeout: 250 seconds)
[23:05] *** SketchCow has quit IRC (Ping timeout: 250 seconds)
[23:05] *** mutoso has quit IRC (Ping timeout: 250 seconds)
[23:05] *** HCross3 is now known as HCross
[23:05] *** Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
[23:05] *** SketchCow has joined #archiveteam-bs
[23:06] *** zerkalo has joined #archiveteam-bs
[23:07] *** alard has joined #archiveteam-bs
[23:07] *** mutoso has joined #archiveteam-bs
[23:07] Where's the Newsgrabber list?
[23:08] http://newsgrabber.harrycross.me/services.html or https://github.com/ArchiveTeam/NewsGrabber/tree/master/services
[23:10] *** Lord_Nigh has joined #archiveteam-bs
[23:38] *** Stiletto has quit IRC (Read error: Operation timed out)