#archiveteam-bs 2017-11-02,Thu


Time Nickname Message
00:47 πŸ”— dashcloud has quit IRC (Ping timeout: 250 seconds)
00:47 πŸ”— dashcloud has joined #archiveteam-bs
00:55 πŸ”— wp494 so apparently Sears in the US is facing some of the same things NCIX is going through, that being vendors are on edge with the possibility that they might not get paid
00:56 πŸ”— wp494 http://money.cnn.com/2017/11/01/news/companies/sears-kmart-holiday-sales/index.html
00:56 πŸ”— wp494 in typical fashion eddie says "not our fault"
01:07 πŸ”— svchfoo3 has quit IRC (Read error: Operation timed out)
01:10 πŸ”— svchfoo3 has joined #archiveteam-bs
01:11 πŸ”— svchfoo1 sets mode: +o svchfoo3
01:11 πŸ”— svchfoo3 sets mode: +o PurpleSym
01:23 πŸ”— BlueMaxim has quit IRC (Read error: Connection reset by peer)
01:24 πŸ”— BlueMaxim has joined #archiveteam-bs
01:43 πŸ”— jrochkind has joined #archiveteam-bs
01:51 πŸ”— balrog has quit IRC (Read error: Operation timed out)
01:51 πŸ”— jrochkind has quit IRC (jrochkind)
01:52 πŸ”— balrog has joined #archiveteam-bs
01:52 πŸ”— swebb sets mode: +o balrog
01:53 πŸ”— drumstick has quit IRC (Ping timeout: 360 seconds)
01:54 πŸ”— drumstick has joined #archiveteam-bs
01:59 πŸ”— pizzaiolo has quit IRC (Remote host closed the connection)
02:06 πŸ”— balrog has quit IRC (Quit: Bye)
02:09 πŸ”— balrog has joined #archiveteam-bs
02:09 πŸ”— swebb sets mode: +o balrog
03:13 πŸ”— tuluu has quit IRC (Read error: Operation timed out)
03:14 πŸ”— tuluu has joined #archiveteam-bs
03:43 πŸ”— jspiros has quit IRC (Ping timeout: 492 seconds)
03:50 πŸ”— dashcloud has quit IRC (Ping timeout: 250 seconds)
03:54 πŸ”— dashcloud has joined #archiveteam-bs
04:16 πŸ”— qw3rty117 has joined #archiveteam-bs
04:20 πŸ”— qw3rty116 has quit IRC (Read error: Operation timed out)
04:40 πŸ”— dashcloud has quit IRC (Read error: Operation timed out)
04:40 πŸ”— dashcloud has joined #archiveteam-bs
04:43 πŸ”— Pixi has quit IRC (Quit: Pixi)
04:50 πŸ”— Pixi has joined #archiveteam-bs
04:52 πŸ”— jspiros has joined #archiveteam-bs
05:18 πŸ”— godane SketchCow: one of your tapes is damaged
05:18 πŸ”— godane the clear plastic broke on side of the tape
05:18 πŸ”— godane no label on it
05:23 πŸ”— godane also the top of the tape is cracked
05:25 πŸ”— godane so i found this on tape: https://www.amazon.com/Life-Death-Peter-Sellers/dp/B0007R4SX6
05:26 πŸ”— godane i'm going to skip it
05:29 πŸ”— godane there is NY 1 on this tape
05:30 πŸ”— godane so not a total past
05:49 πŸ”— superkuh has quit IRC (Read error: Operation timed out)
06:04 πŸ”— superkuh has joined #archiveteam-bs
06:26 πŸ”— godane so this tape is really random
06:26 πŸ”— godane we have a bit of OC, SNL, then the Golden Globes
06:40 πŸ”— schbirid has joined #archiveteam-bs
08:08 πŸ”— Pixi has quit IRC (Quit: Pixi)
08:47 πŸ”— odemg has quit IRC (Ping timeout: 250 seconds)
08:53 πŸ”— superkuh has quit IRC (Read error: Operation timed out)
09:08 πŸ”— superkuh has joined #archiveteam-bs
09:17 πŸ”— twigfoot has quit IRC (Read error: Operation timed out)
09:18 πŸ”— twigfoot has joined #archiveteam-bs
09:18 πŸ”— Stilett0 has joined #archiveteam-bs
09:18 πŸ”— antomati_ has joined #archiveteam-bs
09:18 πŸ”— swebb sets mode: +o antomati_
09:18 πŸ”— Mayonaise has quit IRC (Read error: Operation timed out)
09:18 πŸ”— balrog has quit IRC (Read error: Operation timed out)
09:21 πŸ”— logchfoo3 starts logging #archiveteam-bs at Thu Nov 02 09:21:08 2017
09:21 πŸ”— logchfoo3 has joined #archiveteam-bs
09:21 πŸ”— dxrt has joined #archiveteam-bs
09:21 πŸ”— acridAxid has quit IRC (Read error: Operation timed out)
09:21 πŸ”— svchfoo3 sets mode: +o dxrt
09:21 πŸ”— acridAxid has joined #archiveteam-bs
09:21 πŸ”— balrog has joined #archiveteam-bs
09:21 πŸ”— swebb sets mode: +o balrog
09:23 πŸ”— arkiver has joined #archiveteam-bs
09:23 πŸ”— qw3rty117 has quit IRC (Read error: Operation timed out)
09:24 πŸ”— C4K3 has joined #archiveteam-bs
09:25 πŸ”— bwn has joined #archiveteam-bs
09:26 πŸ”— rocode has joined #archiveteam-bs
09:26 πŸ”— bsmith093 has joined #archiveteam-bs
09:28 πŸ”— beardicus has joined #archiveteam-bs
09:29 πŸ”— squires has joined #archiveteam-bs
09:34 πŸ”— pizzaiolo has joined #archiveteam-bs
10:25 πŸ”— PotcFdk has joined #archiveteam-bs
10:26 πŸ”— zhongfu has quit IRC (Ping timeout: 260 seconds)
10:26 πŸ”— zhongfu has joined #archiveteam-bs
10:28 πŸ”— drumstick has quit IRC (Read error: Operation timed out)
10:28 πŸ”— drumstick has joined #archiveteam-bs
10:40 πŸ”— ja0Hai has quit IRC (Quit: leaving)
10:54 πŸ”— Igloo JAA: image scrape is complete for ctrl-v uploading now (without delete this time!)
10:54 πŸ”— BlueMaxim has quit IRC (Quit: Leaving)
11:04 πŸ”— JAA \o/
11:05 πŸ”— Igloo 200Gb of images only
11:05 πŸ”— Igloo They're not responding to my emails either
11:37 πŸ”— godane so looks like the MTV it list party is hitting 10000k
12:01 πŸ”— dboard2 has quit IRC (Ping timeout: 248 seconds)
12:52 πŸ”— Igloo Hey, Those who use ia uploader
12:52 πŸ”— Igloo Igloorandomidentifier122333441122223123123ctrlv: uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: [################################] 5121/5121 - 00:15:37
12:53 πŸ”— Igloo error uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: Access Denied - You lack sufficient privileges to write to those collections
12:53 πŸ”— Igloo Any idea what i've done wrong? :(
12:53 πŸ”— Igloo ./ia upload --spreadsheet=upload.csv --header "x-amz-auto-make-bucket:1" --header "x-archive-meta-mediatype:web"
12:55 πŸ”— JAA What does your upload.csv look like?
12:58 πŸ”— JAA The -00001.warc.gz was uploaded correctly. Any difference in the CSV lines for 00000 and 00001?
12:59 πŸ”— Igloo https://pastebin.com/ceEjSH6v
12:59 πŸ”— Igloo 0001 doesn't have an identifier associated to it?
12:59 πŸ”— JAA Ah yeah.
12:59 πŸ”— Igloo I used the example from the ia documents
12:59 πŸ”— JAA You specified "ctrl-v-nov-17-igloo" as the collection.
13:00 πŸ”— JAA Only IA people can create new collections.
13:00 πŸ”— Igloo Ah, That's missing in the documentation for the ia uploader. Should that just be blank?
13:01 πŸ”— JAA I'm pretty sure it's mentioned somewhere.
13:01 πŸ”— JAA By default (if you don't specify a collection), it gets uploaded to the "opensource" collection, which is "Community Texts".
13:02 πŸ”— Igloo This is my first time doing it this way, I normally use another script which does that
13:02 πŸ”— Igloo I'll just strip the collection out
13:02 πŸ”— JAA I believe you have to ask Jason to move it to the ArchiveTeam collection if you want it there.
13:03 πŸ”— JAA A separate collection for just CtrlV probably doesn't make too much sense.
13:03 πŸ”— Igloo I just want it to end up in the WBM, whatever's needed to make that happen really
13:03 πŸ”— Igloo Nah, It's not big enough really
13:03 πŸ”— JAA For that, you have to set the mediatype to "web".
13:04 πŸ”— Igloo --header "x-archive-meta-mediatype:web" I believe should do that?
13:04 πŸ”— JAA Yep.
13:04 πŸ”— JAA I think that's all that's necessary, but I'm not sure.
13:04 πŸ”— JAA The canonical way would be --metadata='mediatype:web', by the way.
13:04 πŸ”— JAA Don't think that matters though.
13:05 πŸ”— Igloo I'll get them up, Once they're there they can be manhandled to the right place
13:05 πŸ”— JAA Yeah, I think pretty much everything except the mediatype can be changed easily.
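A minimal sketch of the fix Igloo describes above (stripping the collection out of the spreadsheet before retrying). The real upload.csv is only linked, not shown, so the column names (`identifier`, `file`, `collection`, `mediatype`), the `strip_collection` helper, and the sample rows are all assumptions made for illustration:

```python
import csv
import io


def strip_collection(csv_text: str) -> str:
    """Return the CSV text with the 'collection' column removed, if present."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every header except the (assumed) 'collection' column.
    fields = [name for name in (reader.fieldnames or []) if name != "collection"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" drops the 'collection' value from each row.
        writer.writerow(row)
    return out.getvalue()


# Hypothetical spreadsheet: the second row leaves the identifier blank,
# like the 00001 file discussed above.
example = (
    "identifier,file,collection,mediatype\n"
    "example-item,example-00000.warc.gz,example-collection,web\n"
    ",example-00001.warc.gz,,\n"
)
cleaned = strip_collection(example)
print(cleaned)
```

With no collection in the spreadsheet, the upload should fall back to the default collection JAA mentions, and the mediatype can still be passed on the command line as in the original invocation.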
13:35 πŸ”— zhongfu has quit IRC (Ping timeout: 260 seconds)
13:35 πŸ”— zhongfu has joined #archiveteam-bs
14:03 πŸ”— godane DTIC ADA078576: Operation GREENHOUSE.:
14:03 πŸ”— godane https://archive.org/details/DTIC_ADA078576
14:31 πŸ”— godane SketchCow: btw i found the first duplicate tape
14:31 πŸ”— godane i think i have 2 tapes of GDC 1999 keynote
14:32 πŸ”— godane plus side is maybe the 2nd copy has better tracking
14:41 πŸ”— drumstick has quit IRC (Ping timeout: 360 seconds)
15:34 πŸ”— Stilett0 is now known as Stiletto
16:03 πŸ”— SketchCow godane: Totally understood
16:06 πŸ”— godane tape 1 of kids in the hall may have white mold in it
16:07 πŸ”— godane there are white parts on the tape so i think that's mold
16:14 πŸ”— SketchCow Possibly
16:16 πŸ”— godane the tape doesn't look that bad but will not risk it on this vcr
16:17 πŸ”— godane also some of the tapes skip frames sometimes
16:17 πŸ”— godane i noticed on my other set of tapes i bought last week
16:18 πŸ”— godane anyways i may have The OC 2003-09-16 episode broadcast
16:21 πŸ”— godane there is also going to be one hour of random tv tape
16:21 πŸ”— godane from 2003-11 to 2004-05 from what i can tell
16:30 πŸ”— Pixi has joined #archiveteam-bs
16:36 πŸ”— Pixi has quit IRC (Quit: Pixi)
16:38 πŸ”— JAA Uhm... https://archive.org/search.php?query=download&and[]=collection%3A%22users%22
16:45 πŸ”— dboard2 has joined #archiveteam-bs
17:01 πŸ”— pizzaiolo has quit IRC (pizzaiolo)
18:21 πŸ”— godane so that last tape has some more random tv
18:22 πŸ”— godane a bit of last call and jay leno
18:26 πŸ”— godane also this tape has some of National Football League 75th Anniversary All-Time Team at the end
18:45 πŸ”— Mateon1 has quit IRC (Read error: Operation timed out)
18:45 πŸ”— Mateon1 has joined #archiveteam-bs
20:02 πŸ”— PurpleSym How does IA decide which collection I can upload items to? Because, uh, I can create new ones: https://archive.org/details/test1509649174 But I can't put items in there.
20:07 πŸ”— SmileyG JAA?
20:08 πŸ”— SmileyG stackoverflow seems to be self-combusting btw
20:08 πŸ”— SmileyG https://twitter.com/AndrewBrobston/status/926130048771469312
20:09 πŸ”— DrasticAc I've been throwing stuff into 'opensource' because I couldn't make a collection _and_ put stuff in it.
20:11 πŸ”— PurpleSym Yeah, I thought only admins could create new ones. Which seems to be true. Kind of.
20:19 πŸ”— JAA Interesting.
20:19 πŸ”— JAA SketchCow, astrid ^
20:19 πŸ”— astrid hm interesting
20:19 πŸ”— astrid i don't know much about how collections work
20:20 πŸ”— astrid what do you mean you can't put items in there
20:20 πŸ”— astrid what happens when you try
20:22 πŸ”— PurpleSym astrid: Using `ia`: error uploading test.txt to test1509649864, 403 Client Error: Forbidden for url: https://s3.us.archive.org/test1509649864/test.txt
20:23 πŸ”— astrid hm, well, sounds like you can't then
20:24 πŸ”— astrid send email to info@ i guess :P
20:32 πŸ”— jschwart has joined #archiveteam-bs
20:34 πŸ”— JAA SmileyG: Welp. At least SO is very open about their data and uploads regular data dumps to IA. I wonder how much of it is in the Wayback Machine though.
20:38 πŸ”— RichardG has quit IRC (Read error: Connection reset by peer)
20:46 πŸ”— RichardG has joined #archiveteam-bs
20:58 πŸ”— jschwart has quit IRC (Read error: Connection reset by peer)
20:59 πŸ”— jschwart has joined #archiveteam-bs
21:25 πŸ”— jschwart has quit IRC (Read error: Connection reset by peer)
21:25 πŸ”— jschwart has joined #archiveteam-bs
21:32 πŸ”— schbirid has quit IRC (Quit: Leaving)
21:37 πŸ”— Dimtree has quit IRC (Read error: Operation timed out)
21:42 πŸ”— Pixi has joined #archiveteam-bs
22:19 πŸ”— Stiletto has quit IRC (Read error: Operation timed out)
22:24 πŸ”— MadArchiv has joined #archiveteam-bs
22:24 πŸ”— JAA :-)
22:25 πŸ”— MadArchiv Wow, this was easier than I thought. Anyway, where do we start?
22:30 πŸ”— drumstick has joined #archiveteam-bs
22:32 πŸ”— BlueMaxim has joined #archiveteam-bs
22:35 πŸ”— nightpool does anyone know tools that spider google cache well? or is google too strict about automated requests?
22:36 πŸ”— JAA MadArchiv: I think we need a list of examples first. Ideally, it would cover the various different ways of how web comics are typically distributed. For example: standalone websites vs. platforms where a large number of authors share their work; plain HTML with embedded images vs. script-heavy stuff or "PDF viewer"-like interfaces; single images vs. one image per pane; special stuff like hidden bonus p
22:36 πŸ”— JAA anes, title texts, etc.
22:37 πŸ”— JAA Then, we need to figure out how to actually store them. We'll probably want WARCs, but we might also want to archive them in a different way.
22:37 πŸ”— nightpool I mean, grab-site is probably just going to do the right thing for 99% of webcomics.
22:38 πŸ”— JAA nightpool: They are very strict. I haven't done systematic tests, but from ArchiveBot usage, I believe they block after about 100 requests within a few minutes.
22:38 πŸ”— nightpool okay, thanks.
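To stay under a block threshold like the one JAA estimates, a client could pace itself with a sliding-window limiter. This is only a sketch: the `SlidingWindowLimiter` class is hypothetical, and the real limit is JAA's rough guess rather than a documented figure, so the numbers passed in below are placeholders:

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of recent requests, oldest first

    def wait_time(self, now: float) -> float:
        """Seconds to wait before a request at time `now` is allowed."""
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        # Otherwise wait until the oldest remaining request leaves the window.
        return self.window_seconds - (now - self._stamps[0])

    def record(self, now: float) -> None:
        """Note that a request was sent at time `now`."""
        self._stamps.append(now)


# Placeholder values: 2 requests per 10 seconds, with explicit timestamps.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
limiter.record(0.0)
limiter.record(1.0)
delay = limiter.wait_time(2.0)        # window is full, so a wait is needed
later_delay = limiter.wait_time(10.0)  # the request at t=0 has aged out
```

Passing the clock value in explicitly keeps the sketch easy to test; a real crawler would pass `time.monotonic()` and sleep for the returned delay.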
22:38 πŸ”— JAA Yes, but the idea is to have something that can discover and archive new comics as they are published etc., not just grab the entire archives repeatedly.
22:38 πŸ”— JAA Which might be a bit trickier with wpull/grab-site.
22:39 πŸ”— JAA It's basically NewsGrabber, but for comics.
22:39 πŸ”— nightpool ah, sure.
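The "only grab what is new" idea JAA describes can be sketched as keeping a set of already-seen page URLs and queueing just the ones that were not there on the previous check. The `new_pages` helper and the example URLs are hypothetical, and this is not how NewsGrabber itself is implemented:

```python
def new_pages(listed_urls, seen):
    """Return URLs not yet in `seen`, in listing order, and mark them as seen."""
    fresh = []
    for url in listed_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


seen = set()
# First check of a (made-up) comic's archive page: everything is new.
first = new_pages(
    ["https://comic.example/1", "https://comic.example/2"], seen
)
# Later check: only the newly published page needs to be fetched.
second = new_pages(
    [
        "https://comic.example/1",
        "https://comic.example/2",
        "https://comic.example/3",
    ],
    seen,
)
```

In practice the seen set would need to be persisted between runs, and the listing would come from each comic's archive page or feed.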
22:48 πŸ”— jschwart has quit IRC (Quit: Konversation terminated!)
22:51 πŸ”— MadArchiv Comic listing sites such as comic rocket could also help us, right? Right?? Well, there's also Tv Tropes, which also lists several webcomics and could be helpful if we were also to make a manual list, especially since they have indexes.
22:56 πŸ”— JAA Definitely.
22:58 πŸ”— MadArchiv What about webcomics that have not been updated in a long time, like those that have reached an ending or have been abandoned? As far as I know, those are the ones that are at most risk, especially the latter.
23:00 πŸ”— astrid if they're complete then we can just hit them with archivebot and call it done
23:01 πŸ”— JAA Yeah, those don't need to go into that system of "retrieve new comics as they're published", obviously, but we should still archive them.
23:01 πŸ”— JAA ArchiveBot might not be a good idea at the moment though, we already have too little capacity.
23:01 πŸ”— MadArchiv What about wget? That's what they recommended to me on reddit
23:02 πŸ”— JAA Yeah, wget or wpull or grab-site would work.
23:02 πŸ”— JAA Well, depends a bit on the website I guess. JavaScript-heavy sites won't work well with these tools.
23:02 πŸ”— DFJustin grab-site is a lot more powerful than wget
23:03 πŸ”— Pixi has quit IRC (Quit: Pixi)
23:04 πŸ”— Pixi has joined #archiveteam-bs
23:09 πŸ”— MadArchiv DFJustin: Oh, nice to know! By the way, if we're gonna archive comics that have been completed, why don't we start looking for indexes of them? Hiveworks, for example, has a list (https://thehiveworks.com/completed) of all comics hosted by them that have been officially declared done by their creators, either because they bailed out of it (*coff coff* Clique Refresh) or because they simply completed it.
23:22 πŸ”— Pixi has quit IRC (Quit: Pixi)
23:59 πŸ”— YetAnothe has joined #archiveteam-bs
23:59 πŸ”— Dimtree has joined #archiveteam-bs
