#archiveteam-bs 2017-11-02,Thu


***dashcloud has quit IRC (Ping timeout: 250 seconds)
dashcloud has joined #archiveteam-bs
[00:47]
<wp494> so apparently Sears in the US is facing some of the same things NCIX is going through, namely that vendors are on edge over the possibility that they might not get paid
http://money.cnn.com/2017/11/01/news/companies/sears-kmart-holiday-sales/index.html
in typical fashion eddie says "not our fault"
[00:55]
***svchfoo3 has quit IRC (Read error: Operation timed out)
svchfoo3 has joined #archiveteam-bs
svchfoo1 sets mode: +o svchfoo3
svchfoo3 sets mode: +o PurpleSym
[01:07]
BlueMaxim has quit IRC (Read error: Connection reset by peer)
BlueMaxim has joined #archiveteam-bs
[01:23]
.... (idle for 19mn)
jrochkind has joined #archiveteam-bs [01:43]
balrog has quit IRC (Read error: Operation timed out)
jrochkind has quit IRC (jrochkind)
balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
drumstick has quit IRC (Ping timeout: 360 seconds)
drumstick has joined #archiveteam-bs
[01:51]
pizzaiolo has quit IRC (Remote host closed the connection) [01:59]
balrog has quit IRC (Quit: Bye)
balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
[02:06]
............. (idle for 1h4mn)
tuluu has quit IRC (Read error: Operation timed out)
tuluu has joined #archiveteam-bs
[03:13]
...... (idle for 29mn)
jspiros has quit IRC (Ping timeout: 492 seconds) [03:43]
dashcloud has quit IRC (Ping timeout: 250 seconds)
dashcloud has joined #archiveteam-bs
[03:50]
..... (idle for 22mn)
qw3rty117 has joined #archiveteam-bs
qw3rty116 has quit IRC (Read error: Operation timed out)
[04:16]
..... (idle for 20mn)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
Pixi has quit IRC (Quit: Pixi)
[04:40]
Pixi has joined #archiveteam-bs
jspiros has joined #archiveteam-bs
[04:50]
...... (idle for 26mn)
<godane> SketchCow: one of your tapes is damaged
the clear plastic broke on the side of the tape
no label on it
[05:18]
also the top of the tape is cracked
so i found this on tape: https://www.amazon.com/Life-Death-Peter-Sellers/dp/B0007R4SX6
i'm going to skip it
there is NY 1 on this tape
so it's not a total loss
[05:23]
.... (idle for 19mn)
***superkuh has quit IRC (Read error: Operation timed out) [05:49]
.... (idle for 15mn)
superkuh has joined #archiveteam-bs [06:04]
..... (idle for 22mn)
<godane> so this tape is really random
we have a bit of OC, SNL, then the Golden Globes
[06:26]
***schbirid has joined #archiveteam-bs [06:40]
.................. (idle for 1h28mn)
Pixi has quit IRC (Quit: Pixi) [08:08]
........ (idle for 39mn)
odemg has quit IRC (Ping timeout: 250 seconds) [08:47]
superkuh has quit IRC (Read error: Operation timed out) [08:53]
.... (idle for 15mn)
superkuh has joined #archiveteam-bs [09:08]
twigfoot has quit IRC (Read error: Operation timed out)
twigfoot has joined #archiveteam-bs
Stilett0 has joined #archiveteam-bs
antomati_ has joined #archiveteam-bs
swebb sets mode: +o antomati_
Mayonaise has quit IRC (Read error: Operation timed out)
balrog has quit IRC (Read error: Operation timed out)
logchfoo3 starts logging #archiveteam-bs at Thu Nov 02 09:21:08 2017
logchfoo3 has joined #archiveteam-bs
dxrt has joined #archiveteam-bs
acridAxid has quit IRC (Read error: Operation timed out)
svchfoo3 sets mode: +o dxrt
acridAxid has joined #archiveteam-bs
balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
arkiver has joined #archiveteam-bs
qw3rty117 has quit IRC (Read error: Operation timed out)
C4K3 has joined #archiveteam-bs
bwn has joined #archiveteam-bs
rocode has joined #archiveteam-bs
bsmith093 has joined #archiveteam-bs
beardicus has joined #archiveteam-bs
squires has joined #archiveteam-bs
[09:17]
pizzaiolo has joined #archiveteam-bs [09:34]
........... (idle for 51mn)
PotcFdk has joined #archiveteam-bs
zhongfu has quit IRC (Ping timeout: 260 seconds)
zhongfu has joined #archiveteam-bs
drumstick has quit IRC (Read error: Operation timed out)
drumstick has joined #archiveteam-bs
[10:25]
ja0Hai has quit IRC (Quit: leaving) [10:40]
<Igloo> JAA: image scrape is complete for ctrl-v, uploading now (without delete this time!) [10:54]
***BlueMaxim has quit IRC (Quit: Leaving) [10:54]
<JAA> \o/ [11:04]
<Igloo> 200GB of images only
They're not responding to my emails either
[11:05]
....... (idle for 32mn)
<godane> so looks like the MTV it list party is hitting 10000k [11:37]
..... (idle for 24mn)
***dboard2 has quit IRC (Ping timeout: 248 seconds) [12:01]
........... (idle for 51mn)
<Igloo> Hey, those who use ia uploader
<Igloo> randomidentifier122333441122223123123ctrlv: uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: [################################] 5121/5121 - 00:15:37
error uploading var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz: Access Denied - You lack sufficient privileges to write to those collections
Any idea what i've done wrong? :(
./ia upload --spreadsheet=upload.csv --header "x-amz-auto-make-bucket:1" --header "x-archive-meta-mediatype:web"
[12:52]
<JAA> What does your upload.csv look like?
The -00001.warc.gz was uploaded correctly. Any difference in the CSV lines for 00000 and 00001?
[12:55]
<Igloo> https://pastebin.com/ceEjSH6v
0001 doesn't have an identifier associated with it?
[12:59]
<JAA> Ah yeah. [12:59]
<Igloo> I used the example from the ia documents [12:59]
<JAA> You specified "ctrl-v-nov-17-igloo" as the collection.
Only IA people can create new collections.
[12:59]
<Igloo> Ah, that's missing in the documentation for the ia uploader. Should that just be blank? [13:00]
<JAA> I'm pretty sure it's mentioned somewhere.
By default (if you don't specify a collection), it gets uploaded to the "opensource" collection, which is "Community Texts".
[13:01]
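A rough sketch of a collection-free upload.csv for `ia upload --spreadsheet`, assuming the identifier/file column layout from the ia documentation; the identifiers below are placeholders rather than the ones from the pastebin, and the mediatype column is just one way of setting that field:

# hypothetical upload.csv with no collection column, so the items fall into the default "opensource" collection
cat > upload.csv << 'EOF'
identifier,file,mediatype
ctrlv-grab-placeholder-00000,var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00000.warc.gz,web
ctrlv-grab-placeholder-00001,var-grabsites-ctrl-v-image-grab.txt-2017-10-30-6c30c598-00001.warc.gz,web
EOF

# same invocation as above, minus the now-redundant mediatype header
./ia upload --spreadsheet=upload.csv --header "x-amz-auto-make-bucket:1"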
<Igloo> This is my first time doing it this way, I normally use another script which does that
I'll just strip the collection out
[13:02]
<JAA> I believe you have to ask Jason to move it to the ArchiveTeam collection if you want it there.
A separate collection for just CtrlV probably doesn't make too much sense.
[13:02]
<Igloo> I just want it to end up in the WBM, whatever's needed to make that happen really
Nah, it's not big enough really
[13:03]
<JAA> For that, you have to set the mediatype to "web". [13:03]
<Igloo> --header "x-archive-meta-mediatype:web" I believe should do that? [13:04]
<JAA> Yep.
I think that's all that's necessary, but I'm not sure.
The canonical way would be --metadata='mediatype:web', by the way.
Don't think that matters though.
[13:04]
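For reference, the two equivalent forms discussed here, shown for a single-item upload with a placeholder identifier and filename:

# header form, as used in the spreadsheet invocation above
./ia upload placeholder-identifier item-00000.warc.gz --header "x-archive-meta-mediatype:web"
# canonical form
./ia upload placeholder-identifier item-00000.warc.gz --metadata="mediatype:web"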
<Igloo> I'll get them up; once they're there they can be manhandled to the right place [13:05]
<JAA> Yeah, I think pretty much everything except the mediatype can be changed easily. [13:05]
....... (idle for 30mn)
***zhongfu has quit IRC (Ping timeout: 260 seconds)
zhongfu has joined #archiveteam-bs
[13:35]
...... (idle for 28mn)
<godane> DTIC ADA078576: Operation GREENHOUSE.:
https://archive.org/details/DTIC_ADA078576
[14:03]
...... (idle for 28mn)
<godane> SketchCow: btw i found the first duplicate tape
i think i have 2 tapes of GDC 1999 keynote
plus side is maybe the 2nd copy has better tracking
[14:31]
***drumstick has quit IRC (Ping timeout: 360 seconds) [14:41]
........... (idle for 53mn)
Stilett0 is now known as Stiletto [15:34]
...... (idle for 29mn)
<SketchCow> godane: Totally understood [16:03]
<godane> tape 1 of kids in the hall may have white mold in it
there are white parts on the tape so i think that's mold
[16:06]
<SketchCow> Possibly [16:14]
<godane> the tape doesn't look that bad but i will not risk it on this vcr
also some of the tapes skip frames sometimes
i noticed that on my other set of tapes i bought last week
anyways i may have The OC 2003-09-16 episode broadcast
there is also going to be one hour of random tv tape
from 2003-11 to 2004-05 from what i can tell
[16:16]
***Pixi has joined #archiveteam-bs [16:30]
Pixi has quit IRC (Quit: Pixi) [16:36]
<JAA> Uhm... https://archive.org/search.php?query=download&and[]=collection%3A%22users%22 [16:38]
***dboard2 has joined #archiveteam-bs [16:45]
.... (idle for 16mn)
pizzaiolo has quit IRC (pizzaiolo) [17:01]
................. (idle for 1h20mn)
<godane> so that last tape has some more random tv
a bit of last call and jay leno
also this tape has some of National Football League 75th Anniversary All-Time Team at the end
[18:21]
.... (idle for 19mn)
***Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[18:45]
................ (idle for 1h17mn)
<PurpleSym> How does IA decide which collection I can upload items to? Because, uh, I can create new ones: https://archive.org/details/test1509649174 But I can’t put items in there. [20:02]
<SmileyG> JAA?
stackoverflow seems to be self-combusting btw
https://twitter.com/AndrewBrobston/status/926130048771469312
[20:07]
<DrasticAc> I've been throwing stuff into 'opensource' because I couldn't make a collection _and_ put stuff in it. [20:09]
<PurpleSym> Yeah, I thought only admins could create new ones. Which seems to be true. Kind of. [20:11]
<JAA> Interesting.
SketchCow, astrid ^
[20:19]
<astrid> hm interesting
i don't know much about how collections work
what do you mean you can't put items in there?
what happens when you try?
[20:19]
<PurpleSym> astrid: Using `ia`: error uploading test.txt to test1509649864, 403 Client Error: Forbidden for url: https://s3.us.archive.org/test1509649864/test.txt [20:22]
<astrid> hm, well, sounds like you can't then
send email to info@ i guess :P
[20:23]
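A quick, non-authoritative way to see what IA actually created for that identifier is to dump its metadata with the same ia tool; the jq filter is only for readability and is assumed to be installed:

# inspect the item created above; if its mediatype is not "collection",
# that would at least be consistent with the 403, though only IA staff can say for sure
ia metadata test1509649864 | jq '.metadata | {mediatype, collection}'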
***jschwart has joined #archiveteam-bs [20:32]
<JAA> SmileyG: Welp. At least SO is very open about their data and uploads regular data dumps to IA. I wonder how much of it is in the Wayback Machine though. [20:34]
***RichardG has quit IRC (Read error: Connection reset by peer) [20:38]
RichardG has joined #archiveteam-bs [20:46]
jschwart has quit IRC (Read error: Connection reset by peer)
jschwart has joined #archiveteam-bs
[20:58]
...... (idle for 26mn)
jschwart has quit IRC (Read error: Connection reset by peer)
jschwart has joined #archiveteam-bs
[21:25]
schbirid has quit IRC (Quit: Leaving) [21:32]
Dimtree has quit IRC (Read error: Operation timed out) [21:37]
Pixi has joined #archiveteam-bs [21:42]
........ (idle for 37mn)
Stiletto has quit IRC (Read error: Operation timed out) [22:19]
MadArchiv has joined #archiveteam-bs [22:24]
<JAA> :-) [22:24]
<MadArchiv> Wow, this was easier than I thought. Anyway, where do we start? [22:25]
***drumstick has joined #archiveteam-bs
BlueMaxim has joined #archiveteam-bs
[22:30]
<nightpool> does anyone know tools that spider google cache well? or is google too strict about automated requests? [22:35]
<JAA> MadArchiv: I think we need a list of examples first. Ideally, it would cover the various ways in which web comics are typically distributed. For example: standalone websites vs. platforms where a large number of authors share their work; plain HTML with embedded images vs. script-heavy stuff or "PDF viewer"-like interfaces; single images vs. one image per pane; special stuff like hidden bonus panes, title texts, etc.
Then, we need to figure out how to actually store them. We'll probably want WARCs, but we might also want to archive them in a different way.
[22:36]
<nightpool> I mean, grab-site is probably just going to do the right thing for 99% of webcomics. [22:37]
<JAA> nightpool: They are very strict. I haven't done systematic tests, but from ArchiveBot usage, I believe they block after about 100 requests within a few minutes. [22:38]
<nightpool> okay, thanks. [22:38]
<JAA> Yes, but the idea is to have something that can discover and archive new comics as they are published etc., not just grab the entire archives repeatedly.
Which might be a bit trickier with wpull/grab-site.
It's basically NewsGrabber, but for comics.
[22:38]
<nightpool> ah, sure. [22:39]
***jschwart has quit IRC (Quit: Konversation terminated!) [22:48]
<MadArchiv> Comic listing sites such as Comic Rocket could also help us, right? Right?? Well, there's also TV Tropes, which lists several webcomics and could be helpful if we were to make a manual list, especially since they have indexes. [22:51]
<JAA> Definitely. [22:56]
<MadArchiv> What about webcomics that have not been updated in a long time, like those that have reached an ending or have been abandoned? As far as I know, those are the ones that are most at risk, especially the latter. [22:58]
<astrid> if they're complete then we can just hit them with archivebot and call it done [23:00]
<JAA> Yeah, those don't need to go into that system of "retrieve new comics as they're published", obviously, but we should still archive them.
ArchiveBot might not be a good idea at the moment though, we already have too little capacity.
[23:01]
<MadArchiv> What about wget? That's what they recommended to me on reddit [23:01]
<JAA> Yeah, wget or wpull or grab-site would work.
Well, depends a bit on the website I guess. JavaScript-heavy sites won't work well with these tools.
[23:02]
<DFJustin> grab-site is a lot more powerful than wget [23:02]
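For a plain-HTML webcomic, a minimal grab-site invocation along these lines would probably suffice; the URL is a placeholder and the flags are standard grab-site options, nothing comic-specific:

# crawl the site into a WARC, don't follow links to other domains, keep the request rate modest
grab-site 'https://example-webcomic.example/' --no-offsite-links --concurrency=2

JavaScript-heavy sites would still need something else, as noted above.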
***Pixi has quit IRC (Quit: Pixi)
Pixi has joined #archiveteam-bs
[23:03]
<MadArchiv> DFJustin: Oh, nice to know! By the way, if we're gonna archive comics that have been completed, why don't we start looking for indexes of them? Hiveworks, for example, has a list (https://thehiveworks.com/completed) of all comics hosted by them that have been officially declared done by their creators, either because they bailed out of it (*coff coff* Clique Refresh) or because they simply completed it. [23:09]
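Assuming that Hiveworks page is a plain HTML index, a crude first pass at turning it into a URL list for ArchiveBot or grab-site could be as simple as this; the grep is deliberately naive and the output would need manual cleanup:

# pull every absolute URL out of the completed-comics index and deduplicate it
curl -s https://thehiveworks.com/completed | grep -oE 'https?://[^"<> ]+' | sort -u > hiveworks-completed.txt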
***Pixi has quit IRC (Quit: Pixi) [23:22]
........ (idle for 37mn)
YetAnothe has joined #archiveteam-bs
Dimtree has joined #archiveteam-bs
[23:59]
