#archiveteam-bs 2019-03-09,Sat

Time Nickname Message
00:01 🔗 schbirid has quit IRC (Remote host closed the connection)
00:46 🔗 BlueMax has joined #archiveteam-bs
00:48 🔗 ivan has quit IRC (Read error: Operation timed out)
00:49 🔗 kiska1 has quit IRC (Read error: Operation timed out)
00:49 🔗 ivan has joined #archiveteam-bs
00:49 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
00:49 🔗 Terbium has quit IRC (Read error: Operation timed out)
00:49 🔗 bobmcjr has quit IRC (Read error: Operation timed out)
00:49 🔗 HashbangI has quit IRC (Write error: Broken pipe)
00:49 🔗 dxrt_ has quit IRC (Write error: Broken pipe)
00:49 🔗 yano has quit IRC (Read error: Operation timed out)
00:49 🔗 paul2520 has quit IRC (Write error: Broken pipe)
00:50 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
00:50 🔗 PhrackD has quit IRC (Read error: Operation timed out)
00:51 🔗 sep332 has quit IRC (Read error: Operation timed out)
00:51 🔗 TigerbotH has quit IRC (Read error: Operation timed out)
00:51 🔗 Mayonaise has joined #archiveteam-bs
00:52 🔗 benjins has quit IRC (Read error: Operation timed out)
00:52 🔗 Odd0002 has joined #archiveteam-bs
00:53 🔗 Exairnous has quit IRC (Read error: Operation timed out)
00:54 🔗 Exairnous has joined #archiveteam-bs
00:54 🔗 yano has joined #archiveteam-bs
00:54 🔗 benjins has joined #archiveteam-bs
00:55 🔗 Terbium has joined #archiveteam-bs
00:58 🔗 BlueMax has quit IRC (Quit: Leaving)
00:59 🔗 lag__ has quit IRC (Read error: Connection reset by peer)
00:59 🔗 qw3rty111 has quit IRC (Read error: Operation timed out)
01:00 🔗 step has quit IRC (Ping timeout: 600 seconds)
01:05 🔗 PotcFdk has quit IRC (Ping timeout: 600 seconds)
01:05 🔗 SimpBrain has joined #archiveteam-bs
01:13 🔗 Somebody2 We should probably make a wiki page for https://www.ArtStation.com -- seems to have a bunch of neat art on it, and it will probably die eventually.
01:15 🔗 Somebody2 https://www.archiveteam.org/index.php?title=ArtStation&action=edit&redlink=1 <- if someone wants to add it
01:20 🔗 kiska1 has joined #archiveteam-bs
01:21 🔗 bobmcjr has joined #archiveteam-bs
01:21 🔗 qw3rty111 has joined #archiveteam-bs
01:24 🔗 step has joined #archiveteam-bs
01:24 🔗 TigerbotH has joined #archiveteam-bs
01:25 🔗 HashbangI has joined #archiveteam-bs
01:28 🔗 dxrt_ has joined #archiveteam-bs
01:28 🔗 dxrt sets mode: +o dxrt_
01:29 🔗 PhrackD has joined #archiveteam-bs
01:30 🔗 sep332 has joined #archiveteam-bs
01:35 🔗 godane i'm starting to have that sync issue again
01:35 🔗 godane SketchCow: i may need better capture hardware from you guys
01:36 🔗 godane it will have to be supported in linux
01:37 🔗 SimpBrain has quit IRC (Remote host closed the connection)
01:38 🔗 paul2520 has joined #archiveteam-bs
01:38 🔗 BlueMax has joined #archiveteam-bs
01:44 🔗 SimpBrain has joined #archiveteam-bs
01:54 🔗 Exairnous JAA: Thanks for the explanation. I tried my experiment and it didn't work. Youtube really is annoying :/
01:55 🔗 PotcFdk has joined #archiveteam-bs
01:55 🔗 Exairnous JAA: Leaving Youtube stuff for the moment. Does a job from archivebot get dumped into IA all at once or does it trickle in?
01:58 🔗 Flashfire At 5GB intervals I think
01:59 🔗 kiskabak has joined #archiveteam-bs
01:59 🔗 Albardin has joined #archiveteam-bs
02:01 🔗 kiska Depends on the pipeline. kiskaJP and kiskaSN ones do one every 2GiB, kiskaJDC does 5GiB and so does every other pipeline I know of
02:17 🔗 Exairnous what determines which language sites get archived in?
04:01 🔗 VerifiedJ has quit IRC (Ping timeout: 252 seconds)
04:06 🔗 odemgi_ has joined #archiveteam-bs
04:08 🔗 odemgi has quit IRC (Ping timeout: 252 seconds)
04:09 🔗 C4K3 has joined #archiveteam-bs
04:15 🔗 odemg has quit IRC (Ping timeout: 615 seconds)
04:21 🔗 VerifiedJ has joined #archiveteam-bs
04:21 🔗 odemg has joined #archiveteam-bs
04:26 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
04:43 🔗 qw3rty112 has joined #archiveteam-bs
04:49 🔗 qw3rty111 has quit IRC (Read error: Operation timed out)
05:21 🔗 kiska1 has quit IRC (Ping timeout (120 seconds))
05:21 🔗 kiska1 has joined #archiveteam-bs
05:30 🔗 Exairnous Fusl: do you know what determines which language sites get archived in?
06:36 🔗 n00b344_ has joined #archiveteam-bs
06:48 🔗 n00b344_ has quit IRC (Ping timeout: 265 seconds)
07:57 🔗 Despatche has joined #archiveteam-bs
08:14 🔗 Mateon1 has quit IRC (Ping timeout: 740 seconds)
08:15 🔗 Mateon1 has joined #archiveteam-bs
08:28 🔗 Flashfire And to think I started that list because I was pissed off about something
08:28 🔗 Flashfire also who is amerepheasant
08:28 🔗 Flashfire Do they go by a different name here?
08:40 🔗 schbirid has joined #archiveteam-bs
08:50 🔗 wp494 has quit IRC (Ping timeout: 255 seconds)
08:51 🔗 wp494 has joined #archiveteam-bs
08:57 🔗 Despatche has quit IRC (Read error: Connection reset by peer)
08:58 🔗 Despatche has joined #archiveteam-bs
09:02 🔗 PurpleSym Fusl: Will your ArchiveBot pipelines be back at some point?
09:04 🔗 Fusl not really. i'm not really into manually sshing to the instances and kicking the wpull processes with a stick to make them continue working when a connection gets stuck
09:04 🔗 Fusl and i wont really have the resources for debugging this code
09:04 🔗 kiska You can always ask JAA to manage your pipelines, that's what I do
09:05 🔗 Fusl the pipeline runs in docker containers
09:06 🔗 PurpleSym Yeah, the need for manual intervention really is a burden.
09:07 🔗 Fusl which is a bummer since i wanted to scale this up to ~100 pipelines eventually but hell no i'm not gonna poke them with a stick every now and then
09:07 🔗 Despatche has quit IRC (Read error: Operation timed out)
09:09 🔗 PurpleSym I guess we should throw a serious amount of manpower at fixing this SSL-related(?) issue first then.
10:47 🔗 BlueMax has quit IRC (Quit: Leaving)
12:16 🔗 bitBaron has joined #archiveteam-bs
12:16 🔗 JAA Scaling it up that much with the current setup will probably not work well anyway. The control node is already at its limits, and while we could of course replace that with a more powerful machine, we should rather look into optimising the setup instead I think. That will require significant changes though.
12:19 🔗 godane so i found out there is an option called rtbufsize for ffmpeg
12:19 🔗 godane so i set that to 100M
12:20 🔗 godane i hope that will fix my problem with audio sync issues forever cause to me that's like having anti-skip tech in a cd player
12:22 🔗 godane 100mb buffer is like a 2 and a half minute buffer with mpg video
12:22 🔗 Fusl have you tried playing with async or vsync option?
12:22 🔗 godane i have async set to 1
12:23 🔗 Fusl try -async 10
12:23 🔗 Fusl async 1 only corrects at the start of a stream
12:24 🔗 godane ok i changed it to 10
12:26 🔗 godane i get a 'click click click' when i start getting this out of sync issue
12:26 🔗 godane anyways i kept the rtbufsize setting
12:27 🔗 PurpleSym JAA: Do you have anything in particular that needs to be optimized?
12:27 🔗 PurpleSym I have some time on my hands and it would be nice if we could get wpull into a maintainable state again.
12:31 🔗 nataraj_ has joined #archiveteam-bs
12:32 🔗 godane i'm now questioning if that tape audio is just off no matter what i do
12:35 🔗 godane anyways i'm going to try another tape that's more live action video
12:37 🔗 godane it's doing a lot better right now with this tape
12:44 🔗 JAA PurpleSym: Actually, I was talking about the pipeline and control node, not wpull. The dashboard server has a serious memory leak and uses 100 % of a core constantly. wpull could also use some improvements though; bulk ignore handling instead of the current ignore system is a major one.
12:46 🔗 PurpleSym I know. Wrt wpull: I wasn’t talking about improvements, just general maintainability fixes like: Fixing the test suite, upgrading dependencies/Python version (3.6/3.7), dropping phantomjs, …
12:46 🔗 PurpleSym Is there anyone within the @ArchiveTeam GitHub organization who is willing/able to review pull requests?
12:47 🔗 JAA Ah, yeah, that would also be very good.
12:47 🔗 JAA Yes, ivan and myself for wpull PRs.
12:48 🔗 JAA I need to finally get my PR merged which contains a variety of bug fixes and some new features.
12:48 🔗 JAA Maybe I'll do that this weekend.
12:50 🔗 phuz has quit IRC (Remote host closed the connection)
12:50 🔗 PurpleSym Uh, the big one, #393?
12:51 🔗 JAA Yeah
12:58 🔗 PurpleSym The bugfix changes included look like we could cherry-pick them without merging the entire thing (i.e. URL prioritization).
12:59 🔗 schbirid has quit IRC (Remote host closed the connection)
12:59 🔗 PurpleSym Do we have access to wpull.readthedocs.io?
13:09 🔗 JAA I think that's rebuilt from the repo automatically, no?
13:10 🔗 JAA But I'm sure Chris has access to it otherwise.
13:31 🔗 godane so ffmpeg failed to capture again
13:31 🔗 godane time would stop randomly and just repeat
13:31 🔗 godane if this next capture doesn't work SketchCow may have to send me a vhs to dvd machine or something
13:54 🔗 SimpBrain has quit IRC (Remote host closed the connection)
14:21 🔗 Oddly has joined #archiveteam-bs
14:33 🔗 nataraj_ is now known as nataraj
15:07 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
15:38 🔗 bitBaron has joined #archiveteam-bs
16:09 🔗 godane SketchCow: ffmpeg keeps stopping randomly sometimes
16:10 🔗 godane i'm not going to be able to digitize tapes if this keeps happening
16:16 🔗 godane it was not even keeping in sync anyways
16:17 🔗 godane dashcloud: i'm fucked with capturing the rest of your tapes
16:17 🔗 godane i will upload what i got and make a post about my problem on my patreon
16:18 🔗 justas is now known as jut
16:30 🔗 SketchCow I'm confused how this started cropping up
16:36 🔗 godane i really don't know
16:36 🔗 godane my guess would be something changed between 2018-05-20 debian and 2018-12-15 debian
16:37 🔗 godane cause i have an old 20180520 copy of debian testing that i made a while back that i may try
16:37 🔗 godane that copy is the one that did all the tapes you sent me this past summer
16:38 🔗 godane in less at of those are out of sync in places too
16:39 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
16:41 🔗 godane in other news i'm close to getting all of 1972 french news from ina journal du uploaded
16:59 🔗 odemgi_ godane, what was on his tapes?
16:59 🔗 omarroth has joined #archiveteam-bs
17:09 🔗 nataraj has quit IRC (Read error: Operation timed out)
17:10 🔗 godane some woc movies and stuff
17:10 🔗 godane one of the simpsons ones had the ads pause out
17:12 🔗 godane i will see how things work in debian testing 20180520 soon
17:34 🔗 adarsh has joined #archiveteam-bs
17:44 🔗 nataraj has joined #archiveteam-bs
17:54 🔗 adarsh has quit IRC (adarsh)
17:54 🔗 godane so i think i fixed it
17:55 🔗 adarsh has joined #archiveteam-bs
17:55 🔗 godane see how it would randomly stop capture its fixed at least with this video until the full tape is digitized
17:55 🔗 godane good news is audio is in sync from what i can tell 12 minutes
17:55 🔗 godane *in
17:56 🔗 wp494 has quit IRC (Read error: Operation timed out)
17:57 🔗 godane spoke too soon
17:57 🔗 wp494 has joined #archiveteam-bs
17:57 🔗 godane stopped capturing at 15:40.71
17:57 🔗 godane :'(
18:02 🔗 godane has quit IRC (Quit: Leaving.)
18:03 🔗 adarsh has quit IRC (adarsh)
18:06 🔗 adarsh has joined #archiveteam-bs
18:07 🔗 Oddly has quit IRC (Read error: Operation timed out)
18:11 🔗 name__ has joined #archiveteam-bs
18:15 🔗 Ravenloft has joined #archiveteam-bs
18:16 🔗 adarsh has quit IRC (Ping timeout: 615 seconds)
18:24 🔗 phuzion has joined #archiveteam-bs
18:26 🔗 VerifiedJ has quit IRC (Ping timeout: 252 seconds)
18:36 🔗 godane has joined #archiveteam-bs
18:36 🔗 godane so i'm done with capturing tapes for now
18:36 🔗 godane cause i can't cause of ffmpeg failing on me
18:41 🔗 godane SketchCow: i uploaded my error log to your FOS
18:41 🔗 godane under godane-ffmpeg-error-logs
18:42 🔗 godane i'm also uploading the tape.mpg file that stopped capturing
18:43 🔗 godane there are a number of theories i have why stuff is not working anymore
18:43 🔗 godane one is my usb ports are crapping out
18:44 🔗 godane which is very bad
18:45 🔗 godane the other is ffmpeg is fucking with me cause of something that changed in the debian build of it
18:45 🔗 godane which is what i hope
18:53 🔗 yuitimoth has quit IRC (Read error: Operation timed out)
19:06 🔗 VerifiedJ has joined #archiveteam-bs
19:09 🔗 Oddly has joined #archiveteam-bs
19:12 🔗 m007a83 has quit IRC (Ping timeout: 252 seconds)
19:32 🔗 nataraj has quit IRC (Quit: Konversation terminated!)
19:32 🔗 nataraj has joined #archiveteam-bs
19:43 🔗 Jopik has quit IRC (Quit: Leaving)
19:47 🔗 Jopik has joined #archiveteam-bs
19:47 🔗 Jopik has quit IRC (Remote host closed the connection)
19:47 🔗 Jopik has joined #archiveteam-bs
19:50 🔗 HashbangI has quit IRC (Remote host closed the connection)
20:23 🔗 HashbangI has joined #archiveteam-bs
20:41 🔗 godane this is what i mean by time capturing stopping : https://pastebin.com/Aq7hULNk
20:48 🔗 godane so i think adding vsync may have fixed it?
20:48 🔗 godane i have async at 10 and vsync at 2
20:49 🔗 godane i can say this is a lot smoother playback also
20:50 🔗 godane there is normally a jitter to the videos i capture every few seconds
20:50 🔗 godane i'm not noticing it with this one
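A minimal sketch of how the options mentioned above might be combined into one ffmpeg command line. Only `-rtbufsize 100M`, `-async 10`, `-vsync 2`, and the `tape.mpg` file name come from the log; the v4l2 input format, the `/dev/video0` device path, and the helper name `build_capture_command` are assumptions for illustration, and godane's actual command (including any audio input) is not shown in the log.

```python
import shlex


def build_capture_command(device="/dev/video0", output="tape.mpg",
                          rtbufsize="100M", async_samples=10, vsync=2):
    """Return an ffmpeg argv list using the options discussed in the log.

    The device path and the v4l2 input format are hypothetical; adjust
    them to the real capture hardware.
    """
    return [
        "ffmpeg",
        "-f", "v4l2",                    # assumed Linux capture input format
        "-rtbufsize", rtbufsize,         # input option, so it goes before -i
        "-i", device,
        "-async", str(async_samples),    # audio sync method, as Fusl suggested
        "-vsync", str(vsync),            # video sync method godane added later
        output,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(build_capture_command()))
```

Building the argument list separately from running it makes it easy to compare one capture attempt with the next when only a single option changes.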
21:00 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
21:11 🔗 godane at least now its past 25 minutes captured
21:14 🔗 godane so i want to say this recording of the great race is from 1993-09
21:14 🔗 godane dashcloud: btw alot of your tapes are in bad shape
21:15 🔗 godane like i have tapes from 1993 that were not ghosting the picture
21:15 🔗 godane even stuff from 2004 looks very bad
21:20 🔗 yuitimoth has joined #archiveteam-bs
21:45 🔗 BlueMax has joined #archiveteam-bs
21:54 🔗 Despatche has joined #archiveteam-bs
21:54 🔗 nataraj has quit IRC (Ping timeout: 268 seconds)
21:58 🔗 tammy_ has joined #archiveteam-bs
22:01 🔗 Dj-Wawa has joined #archiveteam-bs
22:27 🔗 JAA tammy_: Yeah, let's talk here, the other channel's mainly for announcements.
22:28 🔗 JAA I honestly have no idea how to upload this with curl or similar. It's described in that abouts3.txt file I linked, but I have zero experience with it.
22:29 🔗 JAA If you can, I strongly recommend using the 'ia' tool instead. You just need the 'internetarchive' Python package. You can install that in the user's directory or a venv if you don't want to have it system-wide.
22:29 🔗 JAA Then you need to run "ia configure" to set up the tokens required for the upload.
22:30 🔗 tammy_ I have ia installed and logged in
22:30 🔗 JAA The upload itself is simply "ia upload --metadata=mediatype:web ITEMNAME FILES". You can also set the description etc. through there if you want, or you can do that through the web interface.
22:31 🔗 ivan the curl stuff in abouts3.txt is pretty easy
22:31 🔗 JAA For the item name, I recommend using something like example.com_YYYYMM, so interfacelift.com_201704 I think.
22:31 🔗 ivan https://gist.github.com/ivan/079530350ac94851d581b55b1d372440
22:32 🔗 tammy_ just no idea how to use and properly tag things in the ia tool
22:33 🔗 tammy_ I'll try it again later on I guess
22:33 🔗 JAA The files you need to upload are the WARCs (*.warc.gz). I'd also include the temporary log file (tmp-*.log.gz), which is probably not included in the meta WARC, and the script.py.
22:34 🔗 JAA Tagging, describing, etc. can be done later through the web interface. Probably easier than through the CLI if you only want to do this once.
22:34 🔗 JAA mediatype:web is important to include in the initial upload. Anything else (basically) can be modified later.
22:38 🔗 JAA I'm not the best person to ask regarding which item metadata you should provide since I'm really bad at that (my uploads basically only have a description of the contents, but no tags or other extra fields). But as a rule of thumb, the more info you include, the better.
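The upload steps JAA describes can be sketched roughly as follows. The `ia upload --metadata=mediatype:web ITEMNAME FILES` form and the `example.com_YYYYMM` naming pattern come from the log; the helper names and the example file names are hypothetical, and the sketch only builds the command rather than running it, so it assumes `ia configure` has already been done.

```python
import shlex
from datetime import date


def item_name(domain, when):
    """Build an item name in the example.com_YYYYMM pattern."""
    return f"{domain}_{when:%Y%m}"


def build_upload_command(identifier, files):
    """Return the argv for an 'ia upload' call with mediatype set to web."""
    return ["ia", "upload", "--metadata=mediatype:web", identifier, *files]


if __name__ == "__main__":
    name = item_name("interfacelift.com", date(2017, 4, 1))
    # Hypothetical file names: the WARCs, the temporary log, and script.py.
    files = ["site-00000.warc.gz", "tmp-example.log.gz", "script.py"]
    print(shlex.join(build_upload_command(name, files)))
```

Setting the mediatype in the initial upload matters because, as noted above, most other metadata can still be changed later through the web interface.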
22:41 🔗 killsushi has joined #archiveteam-bs
22:44 🔗 tammy_ has quit IRC (Ping timeout: 260 seconds)
22:46 🔗 tammy_ has joined #archiveteam-bs
22:55 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
22:55 🔗 Oddly has quit IRC (Ping timeout: 255 seconds)
22:56 🔗 BartoCH has joined #archiveteam-bs
22:57 🔗 Exairnous JAA: If I use Rhizome's Webrecorder to make a WARC of the youtube channel I'm trying to archive and somehow got it uploaded to IA/WBM do you think it would playback better?
22:58 🔗 JAA Exairnous: Didn't you already ask this or something very close to this yesterday or the day before?
22:58 🔗 JAA Part of the problem is WBM's playback, not the WARCs themselves. There's nothing we can do about that.
23:00 🔗 Exairnous JAA: Maybe, I'm new to all of this so, I'm just trying to get my head around stuff and end up with a good archive.
23:00 🔗 JAA In my opinion, the most important thing is to preserve the data that could disappear in some format that could potentially be played back, whether the current software supports it or not. As long as the data is there, the information at least isn't lost, even if it's partly or fully inaccessible for the time being.
23:01 🔗 JAA Of course, the ideal archive would be the actual server-side software and database/file storage, since that's the only way to actually archive the entire service.
23:02 🔗 JAA Realistically, we just need to try to save all requests that a browser makes to display a certain website and hope that someone figures out how to play that back in a browser at a later date.
23:03 🔗 JAA We'd actually have to make all possible requests that any browser version could make to really cover that though.
23:03 🔗 JAA Or we'd have to somehow fake the old browser at the playback date.
23:04 🔗 JAA I'm not sure what's easier, but in any case, I don't really worry about that. I just try to preserve all information, and it should be possible to play that back somehow, even if it requires a lot of awful hacks etc.
23:04 🔗 Exairnous The Rhizome software is actually interesting, because it does a lot of what you just described
23:07 🔗 Exairnous JAA: So since youtube won't playback right now, I may modify the site to include local copies of the videos as well and then ask for it to be re-run through archivebot. If that's alright with people here.
23:08 🔗 JAA Exairnous: You mean your website? Yeah, sure. Clean <video> tags without JS interference should playback just fine, I believe.
23:09 🔗 JAA Or if they don't, we can ask the WBM team to make sure that <source> tags get rewritten accordingly, but I'd be surprised if that didn't happen already.
23:10 🔗 Exairnous JAA: Yeah, everything on the site (including the videos) appears to have archived superbly. Although any external links (like Twitter, GoogleMaps, etc.) appear to have the main UI translated into German?
23:12 🔗 JAA Yeah, that happens when websites try to be clever and determine your locale through your IP. It was probably archived through a pipeline running on a server in Germany.
23:12 🔗 JAA (I.e. one of mine)
23:13 🔗 Exairnous Any chance of changing that to English now or would it have to be rearchived?
23:15 🔗 JAA It would have to be rearchived, and we'd have to hope that Twitter doesn't suddenly decide we want to see it in Arabic, Thai, or Catalan.
23:15 🔗 JAA (And no, there's no way to guarantee that.)
23:16 🔗 JAA I've seen Twitter, Facebook, and other snapshots on the WBM in just about every language.
23:16 🔗 Exairnous Actually the Facebook archive for my page appears to switch languages as you scroll down
23:17 🔗 JAA Ignore it and move on. The content should be in the original language, and that's what matters most.
23:17 🔗 Exairnous Yeah, it is. Just weird, especially Facebook's
23:18 🔗 name__ has quit IRC (Ping timeout: 260 seconds)
23:18 🔗 JAA There's a real simple explanation for that: Facebook sucks.
23:18 🔗 Exairnous :D
23:18 🔗 Exairnous Very, very, very, much true
23:20 🔗 JAA They aren't talking about their backend very much, but I know that a few years ago, they were running a monolithic PHP executable of over a GB for the entire site.
23:21 🔗 JAA And if you've ever looked at the code for their pages, you'll understand why I hate them just purely technologically speaking from the client perspective (not to mention all the ethical issues).
23:21 🔗 Exairnous I heard that their PHP gets auto-converted into C++
23:21 🔗 Exairnous Cause nothing could go wrong with that :P
23:22 🔗 JAA Yeah, HipHop it's called.
23:22 🔗 JAA https://en.wikipedia.org/wiki/HipHop_for_PHP
23:23 🔗 JAA The 1+ GB executable thing is mentioned there as well.
23:24 🔗 JAA They moved away from the PHP to C++ transpilation a while ago though and just wrote their own PHP engine/VM.
23:26 🔗 Exairnous Because when in doubt reinvent the whole wheel
23:36 🔗 bitBaron has joined #archiveteam-bs