[00:12] hook54321: By the way, I'm finally looking into the Catalan webcam stuff. It's been sitting on my disk for a year; need to get those several tens of thousands of files cleaned up.
[00:19] oh nice
[00:20] also, was the lunduke-a-thon recording ever uploaded?
[00:27] Hmm, doesn't look like it.
[00:28] It was only a partial recording though, and I think there was a better one somewhere? Don't remember the details though.
[00:28] My youtube-dl crashed after 6 hours or so.
[00:31] I just discovered that the megawarc factory will probably break at the end of this year: https://github.com/ArchiveTeam/archiveteam-megawarc-factory/blob/923661847fd5a5dfa651467da09be56a4528f7aa/pack-one#L58
[00:32] oh my god ...
[00:33] Good catch
[00:38] *** Sk1d has quit IRC (Read error: Operation timed out)
[00:39] Filed as https://github.com/ArchiveTeam/archiveteam-megawarc-factory/issues/6
[00:40] To be precise, it will not break entirely, but it will be very unreliable. (It would work on the first of February, for example.)
[00:40] All of that should probably be solved differently.
[00:41] *** Sk1d has joined #archiveteam-bs
[00:42] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[00:42] *** wyatt8740 has joined #archiveteam-bs
[00:52] This is strange: https://pastebin.com/vPU7zUPS That file seems to be almost exactly the same size as mine. Can't check the duration/bitrate right now since the file is on a machine without ffprobe and similar tools, but still, really weird. I most certainly never uploaded my file anywhere.
[01:02] hook54321: I went through my logs, and apparently he was going to release the entire thing at some point? Did that ever happen? I can't find anything.
[01:10] hook54321: nope, he didn't. :/
[01:37] oops, didn't mean to tag myself
[01:46] Heh
[02:13] *** ndiddy has quit IRC ()
[02:32] 108634 WARCs being megawarc'd now. Never again.
[02:34] Should be 380 WARCs afterwards.
[02:53] Yeah never again xD
[02:54] The initial wait for the collection was hell on AWS, and then the constant waiting to see whether I needed to provision more instances. In the end it megawarc'd about 60k WARCs, I think.
[02:57] Never again until the next time I decide to grab webcam images every five minutes for months. :-P
[02:58] Let's hope there's something interesting in there at least.
[03:06] *** Mateon1 has quit IRC (Read error: Operation timed out)
[03:11] *** Mateon1 has joined #archiveteam-bs
[03:13] I've been working on a list of Venezuelan government sites that should probably be crawled. Is this page a good place to put them: https://www.archiveteam.org/index.php?title=ArchiveBot/Venezuela_politics/list?
[03:14] Or is that only for bots?
[03:30] One of my PC Computing magazines from 1993 is getting uploaded soon.
[03:32] *** Sk1d has quit IRC (Read error: Operation timed out)
[03:33] *** MagicHAck has joined #archiveteam-bs
[03:35] *** Sk1d has joined #archiveteam-bs
[03:56] *** MagicHAck has quit IRC (Quit: Page closed)
[04:12] *** RichardG has quit IRC (Ping timeout: 268 seconds)
[04:16] Is there a way to have wget decompress files after downloading, and do recursive downloading on that decompressed file?
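Setting aside whether wget itself can do this, the step being asked about is to decompress a gzip-encoded body and then look for links to recurse into. Python's standard library can sketch it. This is a minimal illustration, not wget's behaviour. `LinkCollector` and `links_from_body` are hypothetical names, only a `gzip` Content-Encoding is handled, and the page is assumed to be UTF-8 or ASCII-compatible.

```python
import gzip
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Relative links only make sense once resolved against the page URL.
                self.links.append(urljoin(self.base_url, value))


def links_from_body(body: bytes, base_url: str,
                    content_encoding: Optional[str]) -> List[str]:
    """Decompress a gzip-encoded response body (if needed) and list its links."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    parser = LinkCollector(base_url)
    # Assumes a UTF-8 (or ASCII-compatible) page; undecodable bytes are replaced.
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return parser.links
```

A recursive crawler would then queue each returned URL that falls within its scope and repeat the process on those pages.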
[04:18] (assuming you specify --header="accept-encoding: gzip")
[04:24] *** odemgi has joined #archiveteam-bs
[04:27] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[04:28] *** qw3rty114 has joined #archiveteam-bs
[04:28] *** odemg has quit IRC (Read error: Operation timed out)
[04:33] *** qw3rty113 has quit IRC (Read error: Operation timed out)
[04:40] *** Pixi` has joined #archiveteam-bs
[04:41] *** odemg has joined #archiveteam-bs
[04:43] *** Pixi has quit IRC (Read error: Operation timed out)
[04:44] *** ats has quit IRC (Read error: Operation timed out)
[04:45] *** ats has joined #archiveteam-bs
[04:58] *** ubahn_ has joined #archiveteam-bs
[05:03] *** omarroth has quit IRC (Remote host closed the connection)
[05:06] *** ubahn has quit IRC (Ping timeout: 615 seconds)
[05:07] *** Albardin has joined #archiveteam-bs
[05:07] *** SomeoneEl has joined #archiveteam-bs
[05:07] *** kiskabak has joined #archiveteam-bs
[05:15] *** icedice has quit IRC (Quit: Leaving)
[05:21] *** arbin has quit IRC (Quit: .)
[05:22] *** arbin has joined #archiveteam-bs
[05:36] *** wp494_ has joined #archiveteam-bs
[05:39] *** wp494 has quit IRC (Ping timeout: 364 seconds)
[07:24] *** t3 has quit IRC (Quit: Connection closed for inactivity)
[07:29] *** Exairnous has quit IRC (Read error: Operation timed out)
[08:38] *** RichardG has joined #archiveteam-bs
[08:50] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[09:10] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:12] *** Sk1d has joined #archiveteam-bs
[09:42] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[09:42] *** wyatt8740 has joined #archiveteam-bs
[10:09] *** PhrackD has quit IRC (Read error: Operation timed out)
[10:10] *** PhrackD has joined #archiveteam-bs
[10:57] *** schbirid has quit IRC (Remote host closed the connection)
[11:14] jodizzle: The /list is for manual additions (one URL per line). It's then managed by VoynichCr's bot to keep the list sorted and to list the ArchiveBot archival status on the non-list page. So that page is probably a good place to put it, unless you want to create a separate page for government sites as opposed to political campaigns etc.
[11:27] JAA: I see, thanks for the info.
[11:28] To be clear, the actual archival still has to be done by someone with voice on #archivebot, right? As in, the bot doesn't do any archiving itself?
[11:30] jodizzle: Correct.
[11:30] Alright, thanks.
[11:33] Latest scanned magazine: https://archive.org/details/pc-computing-magazine-v6i1
[13:57] *** schbirid has joined #archiveteam-bs
[14:34] *** sep332_ has joined #archiveteam-bs
[14:36] *** wp494 has joined #archiveteam-bs
[14:42] *** wp494_ has quit IRC (Read error: Operation timed out)
[14:43] *** exoire has joined #archiveteam-bs
[14:44] *** exoire has quit IRC (Client Quit)
[15:00] *** sep332_ has quit IRC (Read error: Operation timed out)
[15:05] *** exoire has joined #archiveteam-bs
[15:16] *** Sk1d has quit IRC (Read error: Operation timed out)
[15:19] *** Sk1d has joined #archiveteam-bs
[15:20] *** RichardG has quit IRC (Ping timeout: 268 seconds)
[15:27] hook54321: So in addition to those Catalan traffic cam WARCs that are currently uploading to IA, I also have a half-hour MJPEG of an airfield. The file's just the raw HTTP content, which means that it can't be played back very well. Firefox doesn't recognise it. VLC can open and decode it, but it really doesn't like the file, it seems. ffmpeg doesn't like it either ("Invalid data found when processing input"). Still worth keeping?
[15:32] It's about 380 MiB, and it looks like you made another recording of the same camera (at an earlier time) here: https://archive.org/details/vlc-record-2017-10-28-00h38m25s-http___87.111.199.123_85_mjpg_video.mjpg-
[15:36] Oh, looks like the part I downloaded is also contained in https://archive.org/details/87.111.199.123_85_mjpg_video.mjpg- (the sixth video, starting at around 5h35mn, to be precise). So I guess there's no real need to keep this messy file?
[15:36] Although the quality of my file seems to be a bit better.
[15:36] But if we are to keep it, we probably need to figure out how to convert it into a usable video file.
[15:38] if vlc will play it, you can probably just have it transcode into something else too
[15:39] It doesn't play it properly though. Throws all sorts of errors and randomly aborts the playback.
[15:39] ah, maybe not then
[15:43] I don't think the file itself is corrupt or anything, just a weird format, similar to a MIME multipart message.
[15:44] Maybe I can somehow decompose that into the individual JPEG frames and then combine those into a video again.
[15:44] Is there a way around sites that require you to accept a message before viewing content? In a live web save, https://uk.news.yahoo.com/queen-evacuated-case-brexit-unrest-media-001107410.html yields http://web.archive.org/web/20190203154302/https://guce.oath.com/collectConsent?sessionId=1_cc-session_8b518883-1a1b-42b3-b0cb-8dfbca2cb922&lang=en-GB&inline=false&jsVersion=null&experiment=null
[15:45] paul2520: Almost certainly not in WBM save, no. With other tools, you probably need to set the appropriate cookie.
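"Set the appropriate cookie" means sending the cookie that a browser receives after clicking through the consent page. A minimal sketch with Python's `urllib.request` follows. The cookie name the consent wall actually checks is not known from this conversation, so `cookie_name` and `cookie_value` are placeholders that would have to be copied from a browser session. The function name is hypothetical.

```python
import urllib.request


def request_with_consent_cookie(url: str, cookie_name: str,
                                cookie_value: str) -> urllib.request.Request:
    """Build a GET request that presents an already-accepted consent cookie.

    cookie_name and cookie_value are placeholders: the real pair has to be
    copied from a browser session that clicked through the consent page.
    """
    request = urllib.request.Request(url)
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request
```

The request would then be fetched with `urllib.request.urlopen(request)`. With wget, the same header can be passed via `--header`, or a browser-exported cookies file can be loaded with `--load-cookies`.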
[15:48] :-(
[16:42] *** Exairnous has joined #archiveteam-bs
[16:53] *** Sk1d has quit IRC (Read error: Operation timed out)
[16:56] *** Sk1d has joined #archiveteam-bs
[17:09] https://starfrosch.com/2019/01/31/the-free-music-archive-spam-engine/
[18:15] *** sep332_ has joined #archiveteam-bs
[18:30] *** adinbied has joined #archiveteam-bs
[18:31] Are there any archive.org admins around? I need some identifiers on my uploads changed -- I put in a request on the forums last week but haven't heard back. Thanks!
[18:35] *** sep332_ has quit IRC (Ping timeout: 600 seconds)
[18:43] *** sep332_ has joined #archiveteam-bs
[18:45] *** exoire has quit IRC (Client terminated!)
[18:45] *** exoire has joined #archiveteam-bs
[18:49] i have some sort of admin access, but no idea what i'm doing
[18:49] what's the item?
[18:50] https://archive.org/post/1098824/identifier-change-requests
[18:51] nah, not enough access for that by the looks of it
[18:53] *** sep332_ has quit IRC (Ping timeout: 600 seconds)
[19:21] *** cascode has joined #archiveteam-bs
[19:23] *** icedice has joined #archiveteam-bs
[19:29] *** bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
[19:40] *** sep332_ has joined #archiveteam-bs
[19:43] JAA: If we're able to convert it, we should keep it, I think; if not, I'm not sure. I'm surprised ffmpeg doesn't like it.
[19:44] I just played the stream in VLC and then pressed record, which is why it's lower quality.
[19:48] I think I have some more recordings on a hard drive somewhere; haven't uploaded them yet though because they're so big and take forever.
[20:05] *** sep332_ has quit IRC (Ping timeout: 600 seconds)
[20:29] hook54321: Looks like ffmpeg only supports plain MJPEGs, which are basically just a concatenation of JPEGs. This file has those MIME multipart boundaries in between.
[20:30] But according to Wikipedia, it's not exactly defined what "MJPEG" actually is. As in, there is no single standard for it.
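The 20:29 message distinguishes a "plain" MJPEG (JPEGs back to back) from this file, which has MIME multipart boundaries in between. Looking at the first bytes is usually enough to tell the two apart. A plain MJPEG begins with the JPEG Start Of Image marker `FF D8`, while a multipart body normally begins with a `--boundary` delimiter line. This is a minimal sketch with hypothetical function names. A multipart body that has a preamble before its first boundary would come back as "unknown".

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker that begins every JPEG


def classify_mjpeg_head(head: bytes) -> str:
    """Guess which kind of "MJPEG" a file is from its first few bytes.

    "plain"     - a bare concatenation of JPEGs (starts with the JPEG SOI marker)
    "multipart" - a MIME multipart body (starts with a "--boundary" delimiter line)
    "unknown"   - anything else
    """
    head = head.lstrip(b"\r\n")  # tolerate stray line breaks before the first part
    if head.startswith(JPEG_SOI):
        return "plain"
    if head.startswith(b"--"):
        return "multipart"
    return "unknown"


def classify_mjpeg_file(path: str) -> str:
    """Read only the first 64 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return classify_mjpeg_head(f.read(64))
```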
[20:34] I'm guessing there's no way to make a plain MJPEG
[20:34] ?
[20:35] *make it
[20:35] Certainly possible, but no idea how easy it is.
[20:37] *** RichardG has joined #archiveteam-bs
[20:42] *** Sk1d has quit IRC (Read error: Operation timed out)
[20:44] Well, should probably just be a matter of parsing that MIME stuff.
[20:45] *** Sk1d has joined #archiveteam-bs
[20:48] *** Stilett0 has joined #archiveteam-bs
[20:49] *** sep332_ has joined #archiveteam-bs
[20:50] *** Stiletto has quit IRC (Read error: Operation timed out)
[20:55] *** twigfoot has quit IRC (Read error: Operation timed out)
[20:57] *** twigfoot has joined #archiveteam-bs
[21:19] *** sep332_ has quit IRC (Ping timeout: 600 seconds)
[21:22] *** Sk1d has quit IRC (Read error: Operation timed out)
[21:25] *** Sk1d has joined #archiveteam-bs
[21:35] *** BlueMax has joined #archiveteam-bs
[21:36] *** tech234a has joined #archiveteam-bs
[21:36] *** bithippo has joined #archiveteam-bs
[22:14] *** Sk1d has quit IRC (Read error: Operation timed out)
[22:17] *** Sk1d has joined #archiveteam-bs
[22:30] *** schbirid has quit IRC (Remote host closed the connection)
[22:35] *** ubahn_ has quit IRC (Quit: ubahn_)
[22:57] *** sep332_ has joined #archiveteam-bs
[23:29] *** sep332_ has quit IRC (Read error: Operation timed out)
[23:37] *** wp494_ has joined #archiveteam-bs
[23:41] *** wp494 has quit IRC (Read error: Operation timed out)
[23:47] *** Sk1d has quit IRC (Read error: Operation timed out)
[23:51] *** Sk1d has joined #archiveteam-bs
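The 20:44 suggestion, "a matter of parsing that MIME stuff", could look roughly like the sketch below. It assumes the saved file begins with the boundary delimiter line (for example `--myboundary`) and then walks the parts. Each part's JPEG is taken either from its `Content-Length` header or, when that header is missing, by reading up to the next delimiter. Finally, the frames are written back to back, which gives the "concatenation of JPEGs" form that, according to the 20:29 message, ffmpeg accepts. The function names are hypothetical. The delimiter-scanning fallback could misfire if a frame happened to contain the boundary bytes, and a truncated final frame without a length header is passed through as-is.

```python
import re
from typing import Iterator

JPEG_SOI = b"\xff\xd8"
# Matches a Content-Length part header; the trailing \s* also eats a CR.
CONTENT_LENGTH = re.compile(rb"^content-length:\s*(\d+)\s*$",
                            re.IGNORECASE | re.MULTILINE)


def iter_multipart_jpegs(data: bytes) -> Iterator[bytes]:
    """Yield the JPEG payloads of a multipart (x-mixed-replace style) body.

    Assumes the first line of `data` is the boundary delimiter, e.g. b"--myboundary".
    """
    data = data.lstrip(b"\r\n")
    first_newline = data.find(b"\n")
    if first_newline == -1:
        return
    delimiter = data[:first_newline].rstrip(b"\r")
    pos = 0
    while True:
        start = data.find(delimiter, pos)
        if start == -1:
            return
        cursor = start + len(delimiter)
        if data[cursor:cursor + 2] == b"--":
            return  # closing delimiter "--boundary--"
        # Part headers end at the first blank line (CRLF CRLF, or bare LF LF).
        header_end, sep_len = data.find(b"\r\n\r\n", cursor), 4
        lf_end = data.find(b"\n\n", cursor)
        if header_end == -1 or (lf_end != -1 and lf_end < header_end):
            header_end, sep_len = lf_end, 2
        if header_end == -1:
            return  # recording was cut off inside a part header
        headers = data[cursor:header_end]
        body_start = header_end + sep_len
        match = CONTENT_LENGTH.search(headers)
        if match:
            body_end = body_start + int(match.group(1))
            if body_end > len(data):
                return  # last frame truncated
            frame = data[body_start:body_end]
        else:
            # No length given: the part runs until the next delimiter, and the
            # line break right before that delimiter belongs to the delimiter.
            body_end = data.find(delimiter, body_start)
            if body_end == -1:
                body_end = len(data)
            frame = data[body_start:body_end].rstrip(b"\r\n")
        if frame.startswith(JPEG_SOI):
            yield frame
        pos = body_end


def multipart_to_plain_mjpeg(src_path: str, dst_path: str) -> int:
    """Write all frames back to back (a "plain" MJPEG); returns the frame count."""
    with open(src_path, "rb") as src:
        data = src.read()  # ~380 MiB fits in memory; stream it for larger files
    count = 0
    with open(dst_path, "wb") as dst:
        for frame in iter_multipart_jpegs(data):
            dst.write(frame)
            count += 1
    return count
```

The output could then be fed to ffmpeg with `-f mjpeg`. A bare concatenation of JPEGs carries no timestamps, so the frame rate has to be chosen at that point rather than recovered from the recording.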