[00:08] @rolfoid: I don't see any CompuServ forums related grab scripts in https://github.com/ArchiveTeam; you could look at once of these as an example as how to create the pipeline code a warrior would run: https://github.com/ArchiveTeam?utf8=%E2%9C%93&q=grab&type=&language=
[00:08] yeah none have been made yet
[00:09] Have not had the human time to pull apart the forum URLs to determine how to iterate successfully :(
[00:10] CompuServe is a PITA unfortunately. Redirect and cookie hell.
[00:10] FFS
[00:11] Also, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it.
[00:11] So you either have to write a scraper beforehand or try all subforums for each thread ID.
[00:11] ಠ_ಠ
[00:12] I've said it before: I can definitely understand why they want to get rid of this awful forum software.
[00:12] *** Atom has joined #archiveteam-bs
[00:12] Yeah, no disagreement there.
[00:12] The code is probably similarly unusable/unmaintainable.
[00:13] Huh. There are RSS feeds.
[00:15] Sigh, I think its a lost cause unfortunately.
[00:17] "Also, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it" that would be really funny if it weren't so sad
[00:18] it's a fairly common webapp thing, but remains confusing
[00:19] I dumped a WARC archive I grabbed on 11/14/17 here: https://archive.org/details/member.compuserve.com-forum_center-2017-11-15-warc-archive, buuuuuuuut I'm sure I didn't grab everything. I only grabbed what grab-site/wpull could see.
[00:21] URL discovery is an unsolved problem at scale
[00:22] 190 MiB seems suspiciously small. The ArchiveBot job is at 306 GiB and not even close to done yet.
[00:22] i assume the archivebot job is grabbing outlinks, though, which tends to inflate that ... a lot
[00:23] @JAA: I admit my attempt was.....lacking.
[00:23] Yeah, but not by a factor of over 1000?
[00:23] bithippo: Any bit helps though. :-)
[00:24] Definitely curious what I was doing wrong considering the ArchiveBot job has 2.4MM requests so far.
[00:25] Ahh
[00:26] `docker exec warcfactory grab-site http://member.compuserve.com/forum_center/ `
[00:26] Ah, yeah.
[00:26] I constrained excessively.
[00:26] Lesson learned.
[00:28] Safe to nuke my item in IA then with ArchiveBot on the job?
[00:29] eh, leave it there
[00:29] it's not hurting anybody :)
[00:29] 10-4!
[00:46] *** prb has joined #archiveteam-bs
[00:57] *** icedice has joined #archiveteam-bs
[01:15] *** Mateon1 has quit IRC (Remote host closed the connection)
[01:19] *** Mateon1 has joined #archiveteam-bs
[01:23] *** MrDignity has quit IRC (Ping timeout: 248 seconds)
[01:25] i now know why the internet archive didn't take my uploads: https://twitter.com/textfiles/status/935139710686527491
[01:25] i went to bed at 930am hoping some my stuff to get upload
[01:26] anyways i fixed when i got up at around 530pm
[01:29] *** MrDignity has joined #archiveteam-bs
[01:45] *** dashcloud has joined #archiveteam-bs
[01:55] *** icedice2 has joined #archiveteam-bs
[01:57] *** icedice has quit IRC (Read error: Operation timed out)
[02:09] looks like i screwed up the names of Felicity episodes for S01E01 to S01E05
[02:10] thats cause i forgot to put in We WOC in the names
[02:15] anyways here is a list of tapes that have been uploaded: https://www.patreon.com/posts/digitize-tapes-15581150
[02:49] *** schbirid2 has joined #archiveteam-bs
[02:52] *** schbirid has quit IRC (Read error: Operation timed out)
[03:03] *** pizzaiolo has quit IRC (Remote host closed the connection)
[03:20] *** ndiddy has quit IRC ()
[03:25] *** icedice2 has quit IRC (Read error: Connection reset by peer)
[04:05] so ffmpeg 3.4 is not working for me
[04:06] *** wp494_ has joined #archiveteam-bs
[04:07] *** wp494 has quit IRC (Ping timeout: 245 seconds)
[04:07] so vlc works with so i don't know the problem with ffmpeg
[04:08] frame= 16 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed=
[04:08] normally i does that maybe twice before it works
[04:08] but i have tried 10 times
[04:12] *** godane has quit IRC (Quit: Leaving.)
[04:22] *** BlueMaxim has joined #archiveteam-bs
[04:32] *** username1 has joined #archiveteam-bs
[04:33] *** qw3rty111 has joined #archiveteam-bs
[04:33] *** godane has joined #archiveteam-bs
[04:34] so i fixed it
[04:34] turned out i had to unplug the vcr
[04:34] and plug it back in
[04:35] *** schbirid2 has quit IRC (Read error: Operation timed out)
[04:38] *** qw3rty119 has quit IRC (Read error: Operation timed out)
[04:41] [alsa @ 0x2287260] Thread message queue blocking; consider raising the thread_queue_size option (current value: 1024)
[04:41] [swscaler @ 0x22c5d80] Warning: data is not aligned! This can lead to a speed loss
[04:41] i'm getting that error now
[04:43] i put the thread_queue_size at 2048
[04:46] so this felicity tape start off with S01E16
[04:52] its called Felicity tape 4 in blue marker
[04:53] the screener of Uprising for home video was taped over for this Felicity tape
[05:38] *** ZexaronS has quit IRC (Quit: Leaving)
[07:00] *** bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
[07:12] *** wp494_ is now known as wp494
[07:27] *** fie has joined #archiveteam-bs
[08:03] *** Valentin- has joined #archiveteam-bs
[08:06] *** Valentine has quit IRC (Ping timeout: 506 seconds)
[08:23] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:23] *** dashcloud has joined #archiveteam-bs
[10:28] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[10:32] *** dashcloud has joined #archiveteam-bs
[10:45] *** jschwart has joined #archiveteam-bs
[10:52] i hope this felicity tape is 8 hours
[10:54] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:00] its really is 8 hour tape
[11:05] i'm editing the 6 hour block i currently have so i can start uploading
[11:06] i think there is a episode missing though
[11:06] i only say that cause this goes thur S01E16 to S01E22
[11:10] *** pizzaiolo has joined #archiveteam-bs
[11:12] *** pizzaiolo has quit IRC (Client Quit)
[11:13] *** pizzaiolo has joined #archiveteam-bs
[11:13] *** antomati_ has joined #archiveteam-bs
[11:13] *** swebb sets mode: +o antomati_
[11:16] *** antomatic has quit IRC (Ping timeout: 250 seconds)
[11:31] so from what i can tell there was episode recorded over that aired on at around 12pm
[11:32] this took up between 04:03:27 to 04:04:24 of the tape
[11:35] *04:07:24 i mean
[13:00] so the uploading of those Felicity tapes is happening now
[13:00] it been happen for the last hour plus
[13:08] *** icedice has joined #archiveteam-bs
[13:10] *** Mateon1 has quit IRC (Read error: Operation timed out)
[13:10] *** Mateon1 has joined #archiveteam-bs
[13:10] *** icedice has quit IRC (Client Quit)
[13:12] *** icedice has joined #archiveteam-bs
[13:34] Goodbye Archive Team. I'm out.
[13:34] *** JensRex has left
[13:37] Is something straining the wiki server? I'm getting 508 errors
[13:38] What's happened to JensRex?
[14:13] *** bithippo has joined #archiveteam-bs
[14:14] there maybe S02E03 on this tape also
[14:21] *** icedice has quit IRC (Quit: Leaving)
[14:30] *** icedice has joined #archiveteam-bs
[14:38] *** icedice has quit IRC (Quit: Leaving)
[14:41] *** jspiros has quit IRC (Read error: Operation timed out)
[14:47] *** jspiros has joined #archiveteam-bs
[15:01] so tape is done
[15:02] and complete episode of S02E03 was on this tape
[15:02] this must have been a T-180 tape
[15:24] *** jschwart has quit IRC (Ping timeout: 260 seconds)
[16:22] *** jschwart has joined #archiveteam-bs
[16:37] *** beardicus has quit IRC (bye)
[16:46] *** beardicus has joined #archiveteam-bs
[17:23] *** ZexaronS has joined #archiveteam-bs
[18:04] chfoo: you around
[18:05] I broke wpull in a new and exciting way
[18:07] Impressive.
[18:08] http://paste.debian.net/998029/
[18:08] I /think/ its closing the connection and then trying to check the connection, its is version 1.2.3
[18:08] Im testing if I can get it to do it on latest now
[18:11] Yeah, Sending headers -> Closing connection -> Reading header doesn't look good.
[18:11] Can you reproduce it?
[18:12] Works fine for me with that URL.
[18:12] ya
[18:12] happens every time on my plat (aarm64)
[18:12] Huh
[18:12] its in Newgrabber-warrior to boot
[18:14] *** username1 is now known as schbirid
[18:15] HA, latest version doesn't do it
[18:15] matter of fact wpull 2.0.1 runs a ton faster then 1.2.3
[18:15] arkiver: why are you still using 1.2.3 :P
[18:16] 2.0.1 is very crashy though. :-/
[18:16] FalconK's fork less so, but youtube-dl is completely broken there.
[18:16] ooh there is a fork?
[18:16] Yeah, it's used by ArchiveBot.
[18:18] 2.0.1 also sometimes gets stuck while retrieving a URL and may spontaneously decide to eat all your RAM and CPU.
[18:18] I wonder if any work has been done to update wget+lua
[18:18] or its all being put into wpull
[18:18] There is definitely a newer version of wget-lua than the one we're using in the warrior.
[18:20] And well, wpull development is also pretty much dead. (Cf. discussion some weeks ago about Open Collective etc.)
[18:20] Ya, I can't run the newsgrabber pipeline on AARM64 it just breaks
[18:20] since it can't use the pre-compiled version of wpull that ark is using
[18:21] also it runs concurrent 5 on wpull, so running the pipeline with 4 is about the best you want to do
[18:22] I've used 16 concurrent connections before, and it worked fine.
[18:22] Ya
[18:22] That's with 1.2.3.
[18:22] its not a huge deal
[18:22] mostly just threads for days
[18:23] Yeah
[18:26] is it on pip?
[18:31] schbirid: You mean PyPI?
[18:31] yeah
[18:31] No, only GitHub.
[18:34] how can i install that to ~/.local ? :}
[18:35] I don't know the syntax by heart, but pip supports installing from git repositories.
[18:35] oh nice
[18:35] thx
[18:36] pip install git+git://github.com/...
[18:36] Or git+https or git+ssh or whatever.
[18:36] pip install --user git+https://github.com/falconkirtaran/wpull.git
[18:37] yeah there was something with 2.0.1 that made us not use it
[18:37] It would be really nice if we could turn 2.0.x into something usable.
[18:37] anyway, I /was/ going to spin up like 50 scaleway ARM64 instances for shits
[18:38] but that got shot in the food
[18:38] foot*
[18:38] schbirid: pip install --user
[18:49] *** jschwart has quit IRC (Quit: Konversation terminated!)
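(Editor's sketch for the pip question at 18:34-18:38.) The install-from-GitHub command quoted above can also be assembled from a script. This is only an illustration: the helper name `pip_install_from_git` is hypothetical, and the function just builds the argument list rather than running it. The `git+https` form is used because GitHub no longer serves the unauthenticated `git://` protocol.

```python
import sys
from typing import List


def pip_install_from_git(repo_url: str, user: bool = True) -> List[str]:
    """Build a pip command that installs straight from a Git repository.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    # Using "python -m pip" ties the install to the running interpreter.
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        # --user installs into the per-user site (~/.local on Linux).
        cmd.append("--user")
    # pip's VCS syntax: the scheme is prefixed with "git+".
    cmd.append("git+" + repo_url)
    return cmd


if __name__ == "__main__":
    cmd = pip_install_from_git("https://github.com/falconkirtaran/wpull.git")
    print(" ".join(cmd))
    # To actually run it (needs git and network access):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing `user=False` drops the `--user` flag, for example inside a virtualenv where a per-user install is not wanted.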
[18:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:01] *** dashcloud has joined #archiveteam-bs
[19:22] *** jschwart has joined #archiveteam-bs
[19:28] *** MrDignity has quit IRC (Remote host closed the connection)
[19:28] *** MrDignity has joined #archiveteam-bs
[19:53] *** icedice has joined #archiveteam-bs
[20:01] *** Stilett0 has quit IRC (Read error: Operation timed out)
[20:22] *** Stilett0 has joined #archiveteam-bs
[20:45] *** LastNinja has joined #archiveteam-bs
[20:47] *** icedice has quit IRC (Quit: Leaving)
[20:48] *** icedice has joined #archiveteam-bs
[21:05] *** Stiletto has joined #archiveteam-bs
[21:06] *** Stiletto has quit IRC (Client Quit)
[21:16] *** icedice has quit IRC (Ping timeout: 506 seconds)
[21:45] *** icedice has joined #archiveteam-bs
[21:48] godane: thread message queueing is some black magic constant- 1024, 2048 or 4096 are what I've used
[21:49] I've often used 4096 as max sizes for single receives and then regretted it. I've started just going for a megabyte instead. The memory is no issue and it saves you a lot of headache.
[22:15] I wish IA spoke git natively
[22:16] ie upload git repo, be able to clone from it (but still download as zip or with torrent)
[22:17] bithippo: ia is sometimes almost impossible to use without the capability to calculate range offsets into zipfiles
[22:18] bithippo: but if one has it, it's a decent user experience for the downloader side of things
[22:19] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:23] *** BlueMaxim has joined #archiveteam-bs
[22:24] Makes sense.
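(Editor's sketch for the 22:17 remark about range offsets into zipfiles.) A minimal illustration of how such an offset could be computed, assuming the whole archive is available as bytes. The helper names `member_byte_range`, `range_header` and `build_sample_zip` are hypothetical and not part of any IA tooling. The idea: take the member's local-header offset from the central directory, read the fixed 30-byte local file header to learn the filename and extra-field lengths, and skip past them to reach the raw data.

```python
import io
import struct
import zipfile
from typing import Tuple

LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header


def member_byte_range(zip_bytes: bytes, name: str) -> Tuple[int, int]:
    """Return (start, end) offsets, end exclusive, of a member's raw data."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    header = zip_bytes[info.header_offset:info.header_offset + LOCAL_HEADER_SIZE]
    # Bytes 26-29 hold the filename length and extra-field length (uint16 LE).
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    return start, start + info.compress_size


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value; HTTP byte ranges are inclusive."""
    return "bytes={}-{}".format(start, end - 1)


def build_sample_zip() -> bytes:
    """Create a small in-memory archive with uncompressed (stored) members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"hello")
        zf.writestr("dir/b.txt", b"archive team")
    return buf.getvalue()


if __name__ == "__main__":
    data = build_sample_zip()
    start, end = member_byte_range(data, "dir/b.txt")
    print(range_header(start, end), data[start:end])
```

The sample archive uses stored members so the slice is the file content itself; for deflated members the same byte range would yield compressed data that still has to be inflated. Against a remote archive, the central directory and local header would each need their own ranged request first.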
[22:27] *** jtn2 has quit IRC (Remote host closed the connection)
[22:35] *** BlueMaxim has quit IRC (Quit: Leaving)
[22:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:39] *** jtn2 has joined #archiveteam-bs
[22:44] *** dashcloud has joined #archiveteam-bs
[22:45] *** jtn2_ has joined #archiveteam-bs
[22:48] *** jtn2 has quit IRC (Read error: Operation timed out)
[22:53] *** jtn2_ has quit IRC (Read error: Operation timed out)
[22:59] *** jtn2 has joined #archiveteam-bs
[23:06] *** jtn2 has quit IRC (Ping timeout: 250 seconds)
[23:07] *** jtn2 has joined #archiveteam-bs
[23:16] CoolCanuc: just an update on the Metro stuff, there wasn't even anything in the boxes today
[23:16] (in winnipeg that is)
[23:17] *** BlueMaxim has joined #archiveteam-bs
[23:19] so either they've already closed shop (which seems to be the case seeing as one of their political writers is gone, and a lady that hung out in Winnipeg Square underground wishing people well in the mornings who was apparently affiliated with them wasn't even there today) or it'll be less frequent publishing until january
[23:19] I'm inclined to think the former
[23:23] Ceryn: any idea what the past duration exceeded messages are, and if I should be worried about them?
[23:25] dashcloud: Oh no, I have no clue about the context. Hard-coded length limits just usually aren't as large as you think they are. :P
[23:25] okay- thanks
[23:26] *** icedice has quit IRC (Read error: Operation timed out)