[00:47] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[00:47] *** dashcloud has joined #archiveteam-bs
[00:55] so apparently Sears in the US is facing some of the same things NCIX is going through, that being vendors are on edge with the possibility that they might not get paid
[00:56] http://money.cnn.com/2017/11/01/news/companies/sears-kmart-holiday-sales/index.html
[00:56] in typical fashion eddie says "not our fault"
[01:07] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[01:10] *** svchfoo3 has joined #archiveteam-bs
[01:11] *** svchfoo1 sets mode: +o svchfoo3
[01:11] *** svchfoo3 sets mode: +o PurpleSym
[01:23] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[01:24] *** BlueMaxim has joined #archiveteam-bs
[01:43] *** jrochkind has joined #archiveteam-bs
[01:51] *** balrog has quit IRC (Read error: Operation timed out)
[01:51] *** jrochkind has quit IRC (jrochkind)
[01:52] *** balrog has joined #archiveteam-bs
[01:52] *** swebb sets mode: +o balrog
[01:53] *** drumstick has quit IRC (Ping timeout: 360 seconds)
[01:54] *** drumstick has joined #archiveteam-bs
[01:59] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:06] *** balrog has quit IRC (Quit: Bye)
[02:09] *** balrog has joined #archiveteam-bs
[02:09] *** swebb sets mode: +o balrog
[03:13] *** tuluu has quit IRC (Read error: Operation timed out)
[03:14] *** tuluu has joined #archiveteam-bs
[03:43] *** jspiros has quit IRC (Ping timeout: 492 seconds)
[03:50] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[03:54] *** dashcloud has joined #archiveteam-bs
[04:16] *** qw3rty117 has joined #archiveteam-bs
[04:20] *** qw3rty116 has quit IRC (Read error: Operation timed out)
[04:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:40] *** dashcloud has joined #archiveteam-bs
[04:43] *** Pixi has quit IRC (Quit: Pixi)
[04:50] *** Pixi has joined #archiveteam-bs
[04:52] *** jspiros has joined #archiveteam-bs
[05:18] SketchCow: one of your tapes is damaged
[05:18] the clear plastic broke on the side of the tape
[05:18] no label on it
[05:23] also the top of the tape is cracked
[05:25] so i found this on tape: https://www.amazon.com/Life-Death-Peter-Sellers/dp/B0007R4SX6
[05:26] i'm going to skip it
[05:29] there is NY 1 on this tape
[05:30] so not a total loss
[05:49] *** superkuh has quit IRC (Read error: Operation timed out)
[06:04] *** superkuh has joined #archiveteam-bs
[06:26] so this tape is really random
[06:26] we have a bit of OC, SNL, then the Golden Globes
[06:40] *** schbirid has joined #archiveteam-bs
[08:08] *** Pixi has quit IRC (Quit: Pixi)
[08:47] *** odemg has quit IRC (Ping timeout: 250 seconds)
[08:53] *** superkuh has quit IRC (Read error: Operation timed out)
[09:08] *** superkuh has joined #archiveteam-bs
[09:17] *** twigfoot has quit IRC (Read error: Operation timed out)
[09:18] *** twigfoot has joined #archiveteam-bs
[09:18] *** Stilett0 has joined #archiveteam-bs
[09:18] *** antomati_ has joined #archiveteam-bs
[09:18] *** swebb sets mode: +o antomati_
[09:18] *** Mayonaise has quit IRC (Read error: Operation timed out)
[09:18] *** balrog has quit IRC (Read error: Operation timed out)
[09:21] *** logchfoo3 starts logging #archiveteam-bs at Thu Nov 02 09:21:08 2017
[09:21] *** logchfoo3 has joined #archiveteam-bs
[09:21] *** dxrt has joined #archiveteam-bs
[09:21] *** acridAxid has quit IRC (Read error: Operation timed out)
[09:21] *** svchfoo3 sets mode: +o dxrt
[09:21] *** acridAxid has joined #archiveteam-bs
[09:21] *** balrog has joined #archiveteam-bs
[09:21] *** swebb sets mode: +o balrog
[09:23] *** arkiver has joined #archiveteam-bs
[09:23] *** qw3rty117 has quit IRC (Read error: Operation timed out)
[09:24] *** C4K3 has joined #archiveteam-bs
[09:25] *** bwn has joined #archiveteam-bs
[09:26] *** rocode has joined #archiveteam-bs
[09:26] *** bsmith093 has joined #archiveteam-bs
[09:28] *** beardicus has joined #archiveteam-bs
[09:29] *** squires has joined #archiveteam-bs
[09:34] *** pizzaiolo has joined #archiveteam-bs
[10:25] *** PotcFdk has joined #archiveteam-bs
[10:26] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[10:26] *** zhongfu has joined #archiveteam-bs
[10:28] *** drumstick has quit IRC (Read error: Operation timed out)
[10:28] *** drumstick has joined #archiveteam-bs
[10:40] *** ja0Hai has quit IRC (Quit: leaving)
[10:54] JAA: image scrape is complete for ctrl-v, uploading now (without delete this time!)
[10:54] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:04] \o/
[11:05] 200 GB of images only
[11:05] They're not responding to my emails either
[11:37] so looks like the MTV it list party is hitting 10000k
[12:01] *** dboard2 has quit IRC (Ping timeout: 248 seconds)
[12:52] Hey, those who use the ia uploader
[12:52] Igloorandomidentifier122333441122223123123ctrlv: uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: [################################] 5121/5121 - 00:15:37
[12:53] error uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: Access Denied - You lack sufficient privileges to write to those collections
[12:53] Any idea what i've done wrong? :(
[12:53] ./ia upload --spreadsheet=upload.csv --header "x-amz-auto-make-bucket:1" --header "x-archive-meta-mediatype:web"
[12:55] What does your upload.csv look like?
[12:58] The -00001.warc.gz was uploaded correctly. Any difference in the CSV lines for 00000 and 00001?
[12:59] https://pastebin.com/ceEjSH6v
[12:59] 00001 doesn't have an identifier associated with it?
[12:59] Ah yeah.
[12:59] I used the example from the ia documents
[12:59] You specified "ctrl-v-nov-17-igloo" as the collection.
[13:00] Only IA people can create new collections.
[13:00] Ah, that's missing in the documentation for the ia uploader. Should that just be blank?
[13:01] I'm pretty sure it's mentioned somewhere.
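The missing-identifier problem diagnosed above can be caught locally before uploading. A minimal sketch, assuming an `upload.csv` whose first column is `identifier` (the file names and identifiers below are made up to resemble the ones discussed; the real spreadsheet is in the pastebin):

```shell
# Hypothetical upload.csv resembling the one discussed above;
# the second data row is missing its identifier.
cat > upload.csv <<'EOF'
identifier,file,collection
ctrl-v-image-grab-00000,grab-00000.warc.gz,ctrl-v-nov-17-igloo
,grab-00001.warc.gz,ctrl-v-nov-17-igloo
EOF

# Flag data rows whose identifier column (field 1) is empty.
awk -F, 'NR > 1 && $1 == "" { print "row " NR-1 ": missing identifier" }' upload.csv
```

Running this prints `row 2: missing identifier`, pointing at the bad line before `ia` ever sees it.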
[13:01] By default (if you don't specify a collection), it gets uploaded to the "opensource" collection, which is "Community Texts".
[13:02] This is my first time doing it this way, I normally use another script which does that
[13:02] I'll just strip the collection out
[13:02] I believe you have to ask Jason to move it to the ArchiveTeam collection if you want it there.
[13:03] A separate collection for just CtrlV probably doesn't make too much sense.
[13:03] I just want it to end up in the WBM, whatever's needed to make that happen really
[13:03] Nah, it's not big enough really
[13:03] For that, you have to set the mediatype to "web".
[13:04] --header "x-archive-meta-mediatype:web" I believe should do that?
[13:04] Yep.
[13:04] I think that's all that's necessary, but I'm not sure.
[13:04] The canonical way would be --metadata='mediatype:web', by the way.
[13:04] Don't think that matters though.
[13:05] I'll get them up. Once they're there they can be manhandled to the right place
[13:05] Yeah, I think pretty much everything except the mediatype can be changed easily.
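Putting the fixes above together, a corrected invocation might look like the sketch below. This is an assumption-laden example, not a verified recipe: the identifier is made up, it assumes the `ia` CLI from the internetarchive package, and it assumes `--metadata` composes with spreadsheet mode the same way the `x-archive-meta-*` header does:

```shell
# upload.csv with identifier + file columns only -- no "collection"
# column, since only IA admins can create collections and items
# can be moved into a collection later.
cat > upload.csv <<'EOF'
identifier,file
ctrl-v-image-grab-00000,var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz
EOF

# mediatype "web" is what makes the WARCs eligible for the Wayback
# Machine; --metadata is the canonical spelling of the header form.
./ia upload --spreadsheet=upload.csv \
    --header "x-amz-auto-make-bucket:1" \
    --metadata "mediatype:web"
```

With the collection column gone, the items land in the default location and can be moved by an admin afterwards.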
[13:35] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[13:35] *** zhongfu has joined #archiveteam-bs
[14:03] DTIC ADA078576: Operation GREENHOUSE.:
[14:03] https://archive.org/details/DTIC_ADA078576
[14:31] SketchCow: btw i found the first duplicate tape
[14:31] i think i have 2 tapes of the GDC 1999 keynote
[14:32] plus side is maybe the 2nd copy has better tracking
[14:41] *** drumstick has quit IRC (Ping timeout: 360 seconds)
[15:34] *** Stilett0 is now known as Stiletto
[16:03] godane: Totally understood
[16:06] tape 1 of kids in the hall may have white mold in it
[16:07] there are white parts on the tape so i think that's mold
[16:14] Possibly
[16:16] the tape doesn't look that bad but i will not risk it on this vcr
[16:17] also some of the tapes skip frames sometimes
[16:17] i noticed it on my other set of tapes i bought last week
[16:18] anyways i may have The OC 2003-09-16 episode broadcast
[16:21] there is also going to be one hour of random tv tape
[16:21] from 2003-11 to 2004-05 from what i can tell
[16:30] *** Pixi has joined #archiveteam-bs
[16:36] *** Pixi has quit IRC (Quit: Pixi)
[16:38] Uhm... https://archive.org/search.php?query=download&and[]=collection%3A%22users%22
[16:45] *** dboard2 has joined #archiveteam-bs
[17:01] *** pizzaiolo has quit IRC (pizzaiolo)
[18:21] so that last tape has some more random tv
[18:22] a bit of Last Call and Jay Leno
[18:26] also this tape has some of National Football League 75th Anniversary All-Time Team at the end
[18:45] *** Mateon1 has quit IRC (Read error: Operation timed out)
[18:45] *** Mateon1 has joined #archiveteam-bs
[20:02] How does IA decide which collection I can upload items to? Because, uh, I can create new ones: https://archive.org/details/test1509649174 But I can’t put items in there.
[20:07] JAA?
[20:08] stackoverflow seems to be self-combusting btw
[20:08] https://twitter.com/AndrewBrobston/status/926130048771469312
[20:09] I've been throwing stuff into 'opensource' because I couldn't make a collection _and_ put stuff in it.
[20:11] Yeah, I thought only admins could create new ones. Which seems to be true. Kind of.
[20:19] Interesting.
[20:19] SketchCow, astrid ^
[20:19] hm interesting
[20:19] i don't know much about how collections work
[20:20] what do you mean you can't put items in there
[20:20] what happens when you try
[20:22] astrid: Using `ia`: error uploading test.txt to test1509649864, 403 Client Error: Forbidden for url: https://s3.us.archive.org/test1509649864/test.txt
[20:23] hm, well, sounds like you can't then
[20:24] send email to info@ i guess :P
[20:32] *** jschwart has joined #archiveteam-bs
[20:34] SmileyG: Welp. At least SO is very open about their data and uploads regular data dumps to IA. I wonder how much of it is in the Wayback Machine though.
[20:38] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:46] *** RichardG has joined #archiveteam-bs
[20:58] *** jschwart has quit IRC (Read error: Connection reset by peer)
[20:59] *** jschwart has joined #archiveteam-bs
[21:25] *** jschwart has quit IRC (Read error: Connection reset by peer)
[21:25] *** jschwart has joined #archiveteam-bs
[21:32] *** schbirid has quit IRC (Quit: Leaving)
[21:37] *** Dimtree has quit IRC (Read error: Operation timed out)
[21:42] *** Pixi has joined #archiveteam-bs
[22:19] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:24] *** MadArchiv has joined #archiveteam-bs
[22:24] :-)
[22:25] Wow, this was easier than I thought. Anyway, where do we start?
[22:30] *** drumstick has joined #archiveteam-bs
[22:32] *** BlueMaxim has joined #archiveteam-bs
[22:35] does anyone know tools that spider google cache well? or is google too strict about automated requests?
[22:36] MadArchiv: I think we need a list of examples first. Ideally, it would cover the various ways web comics are typically distributed. For example: standalone websites vs. platforms where a large number of authors share their work; plain HTML with embedded images vs. script-heavy stuff or "PDF viewer"-like interfaces; single images vs. one image per pane; special stuff like hidden bonus panes, title texts, etc.
[22:37] Then, we need to figure out how to actually store them. We'll probably want WARCs, but we might also want to archive them in a different way.
[22:37] I mean, grab-site is probably just going to do the right thing for 99% of webcomics.
[22:38] nightpool: They are very strict. I haven't done systematic tests, but from ArchiveBot usage, I believe they block after about 100 requests within a few minutes.
[22:38] okay, thanks.
[22:38] Yes, but the idea is to have something that can discover and archive new comics as they are published etc., not just grab the entire archives repeatedly.
[22:38] Which might be a bit trickier with wpull/grab-site.
[22:39] It's basically NewsGrabber, but for comics.
[22:39] ah, sure.
[22:48] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:51] Comic listing sites such as Comic Rocket could also help us, right? Right?? Well, there's also TV Tropes, which also lists several webcomics and could be helpful if we were also to make a manual list, especially since they have indexes.
[22:56] Definitely.
[22:58] What about webcomics that have not been updated in a long time, like those that have reached an ending or have been abandoned? As far as I know, those are the ones that are most at risk, especially the latter.
[23:00] if they're complete then we can just hit them with archivebot and call it done
[23:01] Yeah, those don't need to go into that system of "retrieve new comics as they're published", obviously, but we should still archive them.
[23:01] ArchiveBot might not be a good idea at the moment though, we already have too little capacity.
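For the one-shot grabs of completed webcomics discussed above, a sketch of the two approaches mentioned follows. The URL is a placeholder, grab-site is assumed to be installed with its default settings (which already write WARCs), and the wget flags are the standard WARC-writing ones:

```shell
# grab-site: recursive crawl with WARC output and sensible defaults
grab-site 'https://example-webcomic.invalid/'

# roughly equivalent wget invocation, writing a WARC alongside the mirror
wget --mirror --page-requisites --wait=1 \
    --warc-file=example-webcomic --warc-cdx \
    'https://example-webcomic.invalid/'
```

Neither tool executes JavaScript, so script-heavy viewers will still need special handling, as noted below.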
[23:01] What about wget? That's what they recommended to me on reddit
[23:02] Yeah, wget or wpull or grab-site would work.
[23:02] Well, depends a bit on the website I guess. JavaScript-heavy sites won't work well with these tools.
[23:02] grab-site is a lot more powerful than wget
[23:03] *** Pixi has quit IRC (Quit: Pixi)
[23:04] *** Pixi has joined #archiveteam-bs
[23:09] DFJustin: Oh, nice to know! By the way, if we're gonna archive comics that have been completed, why don't we start looking for indexes of them? Hiveworks, for example, has a list (https://thehiveworks.com/completed) of all comics hosted by them that have been officially declared done by their creators, either because they bailed out of it (*coff coff* Clique Refresh) or because they simply completed it.
[23:22] *** Pixi has quit IRC (Quit: Pixi)
[23:59] *** YetAnothe has joined #archiveteam-bs
[23:59] *** Dimtree has joined #archiveteam-bs