[00:01] *** schbirid has quit IRC (Remote host closed the connection)
[00:46] *** BlueMax has joined #archiveteam-bs
[00:48] *** ivan has quit IRC (Read error: Operation timed out)
[00:49] *** kiska1 has quit IRC (Read error: Operation timed out)
[00:49] *** ivan has joined #archiveteam-bs
[00:49] *** Mayonaise has quit IRC (Read error: Operation timed out)
[00:49] *** Terbium has quit IRC (Read error: Operation timed out)
[00:49] *** bobmcjr has quit IRC (Read error: Operation timed out)
[00:49] *** HashbangI has quit IRC (Write error: Broken pipe)
[00:49] *** dxrt_ has quit IRC (Write error: Broken pipe)
[00:49] *** yano has quit IRC (Read error: Operation timed out)
[00:49] *** paul2520 has quit IRC (Write error: Broken pipe)
[00:50] *** Odd0002 has quit IRC (Read error: Operation timed out)
[00:50] *** PhrackD has quit IRC (Read error: Operation timed out)
[00:51] *** sep332 has quit IRC (Read error: Operation timed out)
[00:51] *** TigerbotH has quit IRC (Read error: Operation timed out)
[00:51] *** Mayonaise has joined #archiveteam-bs
[00:52] *** benjins has quit IRC (Read error: Operation timed out)
[00:52] *** Odd0002 has joined #archiveteam-bs
[00:53] *** Exairnous has quit IRC (Read error: Operation timed out)
[00:54] *** Exairnous has joined #archiveteam-bs
[00:54] *** yano has joined #archiveteam-bs
[00:54] *** benjins has joined #archiveteam-bs
[00:55] *** Terbium has joined #archiveteam-bs
[00:58] *** BlueMax has quit IRC (Quit: Leaving)
[00:59] *** lag__ has quit IRC (Read error: Connection reset by peer)
[00:59] *** qw3rty111 has quit IRC (Read error: Operation timed out)
[01:00] *** step has quit IRC (Ping timeout: 600 seconds)
[01:05] *** PotcFdk has quit IRC (Ping timeout: 600 seconds)
[01:05] *** SimpBrain has joined #archiveteam-bs
[01:13] We should probably make a wiki page for https://www.ArtStation.com -- seems to have a bunch of neat art on it, and it will probably die eventually.
[01:15] https://www.archiveteam.org/index.php?title=ArtStation&action=edit&redlink=1 <- if someone wants to add it
[01:20] *** kiska1 has joined #archiveteam-bs
[01:21] *** bobmcjr has joined #archiveteam-bs
[01:21] *** qw3rty111 has joined #archiveteam-bs
[01:24] *** step has joined #archiveteam-bs
[01:24] *** TigerbotH has joined #archiveteam-bs
[01:25] *** HashbangI has joined #archiveteam-bs
[01:28] *** dxrt_ has joined #archiveteam-bs
[01:28] *** dxrt sets mode: +o dxrt_
[01:29] *** PhrackD has joined #archiveteam-bs
[01:30] *** sep332 has joined #archiveteam-bs
[01:35] i'm starting to have that sync issue again
[01:35] SketchCow: i may need better capture hardware from you guys
[01:36] it will have to be supported in linux
[01:37] *** SimpBrain has quit IRC (Remote host closed the connection)
[01:38] *** paul2520 has joined #archiveteam-bs
[01:38] *** BlueMax has joined #archiveteam-bs
[01:44] *** SimpBrain has joined #archiveteam-bs
[01:54] JAA: Thanks for the explanation. I tried my experiment and it didn't work. Youtube really is annoying :/
[01:55] *** PotcFdk has joined #archiveteam-bs
[01:55] JAA: Leaving Youtube stuff for the moment. Does a job from archivebot get dumped into IA all at once or does it trickle in?
[01:58] At 5GB intervals I think
[01:59] *** kiskabak has joined #archiveteam-bs
[01:59] *** Albardin has joined #archiveteam-bs
[02:01] Depends on the pipeline. kiskaJP and kiskaSN ones do one every 2GiB, kiskaJDC does 5GiB and so does every other pipeline I know of
[02:17] what determines which language sites get archived in?
[04:01] *** VerifiedJ has quit IRC (Ping timeout: 252 seconds)
[04:06] *** odemgi_ has joined #archiveteam-bs
[04:08] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[04:09] *** C4K3 has joined #archiveteam-bs
[04:15] *** odemg has quit IRC (Ping timeout: 615 seconds)
[04:21] *** VerifiedJ has joined #archiveteam-bs
[04:21] *** odemg has joined #archiveteam-bs
[04:26] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[04:43] *** qw3rty112 has joined #archiveteam-bs
[04:49] *** qw3rty111 has quit IRC (Read error: Operation timed out)
[05:21] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[05:21] *** kiska1 has joined #archiveteam-bs
[05:30] Fusl: do you know what determines which language sites get archived in?
[06:36] *** n00b344_ has joined #archiveteam-bs
[06:48] *** n00b344_ has quit IRC (Ping timeout: 265 seconds)
[07:57] *** Despatche has joined #archiveteam-bs
[08:14] *** Mateon1 has quit IRC (Ping timeout: 740 seconds)
[08:15] *** Mateon1 has joined #archiveteam-bs
[08:28] And to think I started that list because I was pissed off about something
[08:28] also who is amerepheasant
[08:28] Do they go by a different name here?
[08:40] *** schbirid has joined #archiveteam-bs
[08:50] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[08:51] *** wp494 has joined #archiveteam-bs
[08:57] *** Despatche has quit IRC (Read error: Connection reset by peer)
[08:58] *** Despatche has joined #archiveteam-bs
[09:02] Fusl: Will your ArchiveBot pipelines be back at some point?
[09:04] not really. i'm not really into manually sshing to the instances and kicking the wpull processes with a stick to make them continue working when a connection gets stuck
[09:04] and i won't really have the resources for debugging this code
[09:04] You can always ask JAA to manage your pipelines, that's what I do
[09:05] the pipeline runs in docker containers
[09:06] Yeah, the need for manual intervention really is a burden.
[09:07] which is a bummer since i wanted to scale this up to ~100 pipelines eventually but hell no i'm not gonna poke them with a stick every now and then
[09:07] *** Despatche has quit IRC (Read error: Operation timed out)
[09:09] I guess we should throw a serious amount of manpower at fixing this SSL-related(?) issue first then.
[10:47] *** BlueMax has quit IRC (Quit: Leaving)
[12:16] *** bitBaron has joined #archiveteam-bs
[12:16] Scaling it up that much with the current setup will probably not work well anyway. The control node is already at its limits, and while we could of course replace that with a more powerful machine, we should rather look into optimising the setup instead I think. That will require significant changes though.
[12:19] so i found out there is an option called rtbufsize for ffmpeg
[12:19] so i set that to 100M
[12:20] i hope that will fix my problem with audio sync issues forever cause to me that's like having anti-skip tech in a cd player
[12:22] a 100mb buffer is like a 2 and a half minute buffer with mpg video
[12:22] have you tried playing with the async or vsync options?
[12:22] i have async set to 1
[12:23] try -async 10
[12:23] async 1 only corrects at the start of a stream
[12:24] ok i changed it to 10
[12:26] i get a 'click click click' when i start getting this out of sync issue
[12:26] anyways i kept the rtbufsize setting
[12:27] JAA: Do you have anything in particular that needs to be optimized?
[12:27] I have some time on my hands and it would be nice if we could get wpull into a maintainable state again.
[12:31] *** nataraj_ has joined #archiveteam-bs
[12:32] i'm now questioning if that tape's audio is just off no matter what i do
[12:35] anyways i'm going to try another tape that's more live action video
[12:37] it's doing a lot better right now with this tape
[12:44] PurpleSym: Actually, I was talking about the pipeline and control node, not wpull. The dashboard server has a serious memory leak and uses 100 % of a core constantly.
wpull could also use some improvements though; bulk ignore handling instead of the current ignore system is a major one.
[12:46] I know. Wrt wpull: I wasn’t talking about improvements, just general maintainability fixes like: Fixing the test suite, upgrading dependencies/Python version (3.6/3.7), dropping phantomjs, …
[12:46] Is there anyone within the @ArchiveTeam GitHub organization who is willing/able to review pull requests?
[12:47] Ah, yeah, that would also be very good.
[12:47] Yes, ivan and myself for wpull PRs.
[12:48] I need to finally get my PR merged which contains a variety of bug fixes and some new features.
[12:48] Maybe I'll do that this weekend.
[12:50] *** phuz has quit IRC (Remote host closed the connection)
[12:50] Uh, the big one, #393?
[12:51] Yeah
[12:58] The bugfix changes included look like we could cherry-pick them without merging the entire thing (i.e. URL prioritization).
[12:59] *** schbirid has quit IRC (Remote host closed the connection)
[12:59] Do we have access to wpull.readthedocs.io?
[13:09] I think that's rebuilt from the repo automatically, no?
[13:10] But I'm sure Chris has access to it otherwise.
[13:31] so ffmpeg failed to capture again
[13:31] time would stop randomly and just repeat
[13:31] if this next capture doesn't work SketchCow may have to send me a vhs to dvd machine or something
[13:54] *** SimpBrain has quit IRC (Remote host closed the connection)
[14:21] *** Oddly has joined #archiveteam-bs
[14:33] *** nataraj_ is now known as nataraj
[15:07] *** bitBaron has quit IRC (My computer has gone to sleep.
😴😪ZZZzzz…)
[15:38] *** bitBaron has joined #archiveteam-bs
[16:09] SketchCow: ffmpeg keeps stopping randomly sometimes
[16:10] i'm not going to be able to digitize tapes if this keeps happening
[16:16] it was not even keeping in sync anyways
[16:17] dashcloud: i'm fucked with capturing the rest of your tapes
[16:17] i will upload what i got and make a post about my problem on my patreon
[16:18] *** justas is now known as jut
[16:30] I'm confused how this started cropping up
[16:36] i really don't know
[16:36] my guess would be something changed between 2018-05-20 debian and 2018-12-15 debian
[16:37] cause i have an old 20180520 copy of debian testing that i made awhile back that i may try
[16:37] that copy is the one that did all the tapes you sent me this past summer
[16:38] in less at of those are out of sync in places too
[16:39] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[16:41] in other news i'm close to getting all of 1972 french news from ina journal du uploaded
[16:59] godane, what was on his tapes?
[16:59] *** omarroth has joined #archiveteam-bs
[17:09] *** nataraj has quit IRC (Read error: Operation timed out)
[17:10] some woc movies and stuff
[17:10] one of the simpsons ones had the ads pause out
[17:12] i will see how things work in debian testing 20180520 soon
[17:34] *** adarsh has joined #archiveteam-bs
[17:44] *** nataraj has joined #archiveteam-bs
[17:54] *** adarsh has quit IRC (adarsh)
[17:54] so i think i fixed it
[17:55] *** adarsh has joined #archiveteam-bs
[17:55] see how it would randomly stop capture its fixed at least with this video until the full tape is digitized
[17:55] good news is audio is in sync from what i can tell 12 minutes
[17:55] *in
[17:56] *** wp494 has quit IRC (Read error: Operation timed out)
[17:57] spoke too soon
[17:57] *** wp494 has joined #archiveteam-bs
[17:57] stopped capturing at 15:40.71
[17:57] :'(
[18:02] *** godane has quit IRC (Quit: Leaving.)
[18:03] *** adarsh has quit IRC (adarsh)
[18:06] *** adarsh has joined #archiveteam-bs
[18:07] *** Oddly has quit IRC (Read error: Operation timed out)
[18:11] *** name__ has joined #archiveteam-bs
[18:15] *** Ravenloft has joined #archiveteam-bs
[18:16] *** adarsh has quit IRC (Ping timeout: 615 seconds)
[18:24] *** phuzion has joined #archiveteam-bs
[18:26] *** VerifiedJ has quit IRC (Ping timeout: 252 seconds)
[18:36] *** godane has joined #archiveteam-bs
[18:36] so i'm done with capturing tapes for now
[18:36] cause i can't cause of ffmpeg failing on me
[18:41] SketchCow: i uploaded my error log to your FOS
[18:41] under godane-ffmpeg-error-logs
[18:42] i'm also uploading the tape.mpg file that stopped capturing
[18:43] there are a number of theories i have why stuff is not working anymore
[18:43] one is my usb ports are crapping out
[18:44] which is very bad
[18:45] the other is ffmpeg is fucking with me cause of something that changed in the debian build of it
[18:45] which is what i hope
[18:53] *** yuitimoth has quit IRC (Read error: Operation timed out)
[19:06] *** VerifiedJ has joined #archiveteam-bs
[19:09] *** Oddly has joined #archiveteam-bs
[19:12] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[19:32] *** nataraj has quit IRC (Quit: Konversation terminated!)
[19:32] *** nataraj has joined #archiveteam-bs
[19:43] *** Jopik has quit IRC (Quit: Leaving)
[19:47] *** Jopik has joined #archiveteam-bs
[19:47] *** Jopik has quit IRC (Remote host closed the connection)
[19:47] *** Jopik has joined #archiveteam-bs
[19:50] *** HashbangI has quit IRC (Remote host closed the connection)
[20:23] *** HashbangI has joined #archiveteam-bs
[20:41] this is what i mean by time capturing stopping : https://pastebin.com/Aq7hULNk
[20:48] so i think adding vsync may have fixed it?
[20:48] i have async at 10 and vsync at 2
[20:49] i can say this is a lot smoother playback also
[20:50] there is normally a jitter to the videos i capture every few seconds
[20:50] i'm not noticing it with this one
[21:00] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[21:11] at least now it's past 25 minutes captured
[21:14] so i want to say this recording of the great race is from 1993-09
[21:14] dashcloud: btw a lot of your tapes are in bad shape
[21:15] like i have tapes from 1993 that were not ghosting the image picture
[21:15] even stuff from 2004 looks very bad
[21:20] *** yuitimoth has joined #archiveteam-bs
[21:45] *** BlueMax has joined #archiveteam-bs
[21:54] *** Despatche has joined #archiveteam-bs
[21:54] *** nataraj has quit IRC (Ping timeout: 268 seconds)
[21:58] *** tammy_ has joined #archiveteam-bs
[22:01] *** Dj-Wawa has joined #archiveteam-bs
[22:27] tammy_: Yeah, let's talk here, the other channel's mainly for announcements.
[22:28] I honestly have no idea how to upload this with curl or similar. It's described in that abouts3.txt file I linked, but I have zero experience with it.
[22:29] If you can, I strongly recommend using the 'ia' tool instead. You just need the 'internetarchive' Python package. You can install that in the user's directory or a venv if you don't want to have it system-wide.
[22:29] Then you need to run "ia configure" to set up the tokens required for the upload.
[22:30] I have ia installed and logged in
[22:30] The upload itself is simply "ia upload --metadata=mediatype:web ITEMNAME FILES". You can also set the description etc. through there if you want, or you can do that through the web interface.
[22:31] the curl stuff in abouts3.txt is pretty easy
[22:31] For the item name, I recommend using something like example.com_YYYYMM, so interfacelift.com_201704 I think.
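(Editorial sketch, not part of the channel log.) The upload advice above could be scripted roughly as follows. This is a minimal sketch that only assembles the item name and the argument list for the "ia upload" command exactly as quoted in the log; it does not run anything or contact the Internet Archive. The helper names `item_name` and `ia_upload_command` and the file name `site.warc.gz` are hypothetical and chosen for illustration.

```python
import shlex
from datetime import date
from typing import List, Sequence


def item_name(domain: str, when: date) -> str:
    """Build an item name in the example.com_YYYYMM style suggested in the log."""
    return f"{domain}_{when:%Y%m}"


def ia_upload_command(identifier: str, files: Sequence[str]) -> List[str]:
    """Assemble the argv for the upload command quoted in the log.

    mediatype:web is passed on the initial upload, as the log recommends;
    other metadata can be edited later through the web interface.
    """
    if not files:
        raise ValueError("at least one file is required")
    return ["ia", "upload", "--metadata=mediatype:web", identifier, *files]


if __name__ == "__main__":
    # Hypothetical file name; real uploads would list the actual WARC files.
    name = item_name("interfacelift.com", date(2017, 4, 1))
    argv = ia_upload_command(name, ["site.warc.gz"])
    # Print a shell-safe command line for review before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the command rather than executing it keeps the sketch free of side effects; it assumes "ia configure" has already been run, as described above.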
[22:31] https://gist.github.com/ivan/079530350ac94851d581b55b1d372440
[22:32] just no idea how to use and properly tag things in the ia tool
[22:33] I'll try it again later on I guess
[22:33] The files you need to upload are the WARCs (*.warc.gz). I'd also include the temporary log file (tmp-*.log.gz), which is probably not included in the meta WARC, and the script.py.
[22:34] Tagging, describing, etc. can be done later through the web interface. Probably easier than through the CLI if you only want to do this once.
[22:34] mediatype:web is important to include in the initial upload. Anything else (basically) can be modified later.
[22:38] I'm not the best person to ask regarding which item metadata you should provide since I'm really bad at that (my uploads basically only have a description of the contents, but no tags or other extra fields). But as a rule of thumb, the more info you include, the better.
[22:41] *** killsushi has joined #archiveteam-bs
[22:44] *** tammy_ has quit IRC (Ping timeout: 260 seconds)
[22:46] *** tammy_ has joined #archiveteam-bs
[22:55] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[22:55] *** Oddly has quit IRC (Ping timeout: 255 seconds)
[22:56] *** BartoCH has joined #archiveteam-bs
[22:57] JAA: If I use Rhizome's Webrecorder to make a WARC of the youtube channel I'm trying to archive and somehow got it uploaded to IA/WBM do you think it would playback better?
[22:58] Exairnous: Didn't you already ask this or something very close to this yesterday or the day before?
[22:58] Part of the problem is WBM's playback, not the WARCs themselves. There's nothing we can do about that.
[23:00] JAA: Maybe, I'm new to all of this so, I'm just trying to get my head around stuff and end up with a good archive.
[23:00] In my opinion, the most important thing is to preserve the data that could disappear in some format that could potentially be played back, whether the current software supports it or not.
As long as the data is there, the information at least isn't lost, even if it's partly or fully inaccessible for the time being.
[23:01] Of course, the ideal archive would be the actual server-side software and database/file storage, since that's the only way to actually archive the entire service.
[23:02] Realistically, we just need to try to save all requests that a browser makes to display a certain website and hope that someone figures out how to play that back in a browser at a later date.
[23:03] We'd actually have to make all possible requests that any browser version could make to really cover that though.
[23:03] Or we'd have to somehow fake the old browser at the playback date.
[23:04] I'm not sure what's easier, but in any case, I don't really worry about that. I just try to preserve all information, and it should be possible to play that back somehow, even if it requires a lot of awful hacks etc.
[23:04] The Rhizome software is actually interesting, because it does a lot of what you just described
[23:07] JAA: So since youtube won't playback right now, I may modify the site to include local copies of the videos as well and then ask for it to be re-run through archivebot. If that's alright with people here.
[23:08] Exairnous: You mean your website? Yeah, sure. Clean