#archiveteam-bs 2017-11-28,Tue

Time Nickname Message
00:08 🔗 bithippo @rolfoid: I don't see any CompuServe forum grab scripts in https://github.com/ArchiveTeam; you could look at one of these as an example of how to create the pipeline code a warrior would run: https://github.com/ArchiveTeam?utf8=%E2%9C%93&q=grab&type=&language=
00:08 🔗 astrid yeah none have been made yet
00:09 🔗 bithippo Have not had the human time to pull apart the forum URLs to determine how to iterate successfully :(
00:10 🔗 JAA CompuServe is a PITA unfortunately. Redirect and cookie hell.
00:10 🔗 bithippo FFS
00:11 🔗 JAA Also, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it.
00:11 🔗 JAA So you either have to write a scraper beforehand or try all subforums for each thread ID.
00:11 🔗 bithippo ಠ_ಠ
00:12 🔗 JAA I've said it before: I can definitely understand why they want to get rid of this awful forum software.
00:12 🔗 Atom has joined #archiveteam-bs
00:12 🔗 bithippo Yeah, no disagreement there.
00:12 🔗 JAA The code is probably similarly unusable/unmaintainable.
00:13 🔗 bithippo Huh. There are RSS feeds.
00:15 🔗 bithippo Sigh, I think it's a lost cause unfortunately.
00:17 🔗 rolfoid "Also, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it" that would be really funny if it weren't so sad
00:18 🔗 astrid it's a fairly common webapp thing, but remains confusing
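The "try all subforums for each thread ID" option JAA mentions could be sketched roughly as below. This is only an illustration: `locate_threads` and the `thread_exists` callback are hypothetical names, and the callback stands in for whatever HTTP request (with the redirect and cookie handling discussed above) would confirm that a thread lives in a given subforum. No real CompuServe URL layout is assumed.

```python
from typing import Callable, Dict, Iterable, List, Optional


def locate_threads(
    thread_ids: Iterable[int],
    subforums: List[str],
    thread_exists: Callable[[str, int], bool],
) -> Dict[int, Optional[str]]:
    """Map each thread ID to the first subforum that serves it, or None.

    thread_exists(subforum, thread_id) is a hypothetical probe supplied by
    the caller; in a real grab it would issue the request and check the
    response.
    """
    found: Dict[int, Optional[str]] = {}
    for thread_id in thread_ids:
        # Stop probing as soon as one subforum answers for this ID.
        found[thread_id] = next(
            (sf for sf in subforums if thread_exists(sf, thread_id)),
            None,
        )
    return found
```

In the worst case this makes one request per (thread ID, subforum) pair, which is why scraping the subforum listings beforehand is the other option named above.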
00:19 🔗 bithippo I dumped a WARC archive I grabbed on 11/14/17 here: https://archive.org/details/member.compuserve.com-forum_center-2017-11-15-warc-archive, buuuuuuuut I'm sure I didn't grab everything. I only grabbed what grab-site/wpull could see.
00:21 🔗 bithippo URL discovery is an unsolved problem at scale
00:22 🔗 JAA 190 MiB seems suspiciously small. The ArchiveBot job is at 306 GiB and not even close to done yet.
00:22 🔗 astrid i assume the archivebot job is grabbing outlinks, though, which tends to inflate that ... a lot
00:23 🔗 bithippo @JAA: I admit my attempt was.....lacking.
00:23 🔗 JAA Yeah, but not by a factor of over 1000?
00:23 🔗 JAA bithippo: Any bit helps though. :-)
00:24 🔗 bithippo Definitely curious what I was doing wrong considering the ArchiveBot job has 2.4MM requests so far.
00:25 🔗 bithippo Ahh
00:26 🔗 bithippo `docker exec warcfactory grab-site http://member.compuserve.com/forum_center/ `
00:26 🔗 JAA Ah, yeah.
00:26 🔗 bithippo I constrained excessively.
00:26 🔗 bithippo Lesson learned.
00:28 🔗 bithippo Safe to nuke my item in IA then with ArchiveBot on the job?
00:29 🔗 astrid eh, leave it there
00:29 🔗 astrid it's not hurting anybody :)
00:29 🔗 bithippo 10-4!
00:46 🔗 prb has joined #archiveteam-bs
00:57 🔗 icedice has joined #archiveteam-bs
01:15 🔗 Mateon1 has quit IRC (Remote host closed the connection)
01:19 🔗 Mateon1 has joined #archiveteam-bs
01:23 🔗 MrDignity has quit IRC (Ping timeout: 248 seconds)
01:25 🔗 godane i now know why the internet archive didn't take my uploads: https://twitter.com/textfiles/status/935139710686527491
01:25 🔗 godane i went to bed at 930am hoping some of my stuff would get uploaded
01:26 🔗 godane anyways i fixed it when i got up at around 530pm
01:29 🔗 MrDignity has joined #archiveteam-bs
01:45 🔗 dashcloud has joined #archiveteam-bs
01:55 🔗 icedice2 has joined #archiveteam-bs
01:57 🔗 icedice has quit IRC (Read error: Operation timed out)
02:09 🔗 godane looks like i screwed up the names of Felicity episodes for S01E01 to S01E05
02:10 🔗 godane thats cause i forgot to put in We WOC in the names
02:15 🔗 godane anyways here is a list of tapes that have been uploaded: https://www.patreon.com/posts/digitize-tapes-15581150
02:49 🔗 schbirid2 has joined #archiveteam-bs
02:52 🔗 schbirid has quit IRC (Read error: Operation timed out)
03:03 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
03:20 🔗 ndiddy has quit IRC ()
03:25 🔗 icedice2 has quit IRC (Read error: Connection reset by peer)
04:05 🔗 godane so ffmpeg 3.4 is not working for me
04:06 🔗 wp494_ has joined #archiveteam-bs
04:07 🔗 wp494 has quit IRC (Ping timeout: 245 seconds)
04:07 🔗 godane so vlc works with it so i don't know the problem with ffmpeg
04:08 🔗 godane frame= 16 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed=
04:08 🔗 godane normally it does that maybe twice before it works
04:08 🔗 godane but i have tried 10 times
04:12 🔗 godane has quit IRC (Quit: Leaving.)
04:22 🔗 BlueMaxim has joined #archiveteam-bs
04:32 🔗 username1 has joined #archiveteam-bs
04:33 🔗 qw3rty111 has joined #archiveteam-bs
04:33 🔗 godane has joined #archiveteam-bs
04:34 🔗 godane so i fixed it
04:34 🔗 godane turned out i had to unplug the vcr
04:34 🔗 godane and plug it back in
04:35 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
04:38 🔗 qw3rty119 has quit IRC (Read error: Operation timed out)
04:41 🔗 godane [alsa @ 0x2287260] Thread message queue blocking; consider raising the thread_queue_size option (current value: 1024)
04:41 🔗 godane [swscaler @ 0x22c5d80] Warning: data is not aligned! This can lead to a speed loss
04:41 🔗 godane i'm getting that error now
04:43 🔗 godane i put the thread_queue_size at 2048
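For reference, `-thread_queue_size` is an ffmpeg input option, so it has to appear before the `-i` of the input it applies to. The sketch below only builds an argument list; godane's actual command line is not shown in the log, so the v4l2 video input, the device names, and the `build_capture_cmd` helper are all assumptions for illustration.

```python
from typing import List


def build_capture_cmd(
    video_dev: str,
    audio_dev: str,
    out_path: str,
    queue_size: int = 2048,
) -> List[str]:
    """Build an ffmpeg argument list with a per-input thread_queue_size.

    The device names and the use of v4l2 for video are assumptions; only
    the ALSA input is suggested by the warning quoted in the log.
    """
    size = str(queue_size)
    return [
        "ffmpeg",
        # Input options must precede the -i they belong to.
        "-f", "v4l2", "-thread_queue_size", size, "-i", video_dev,
        "-f", "alsa", "-thread_queue_size", size, "-i", audio_dev,
        out_path,
    ]
```

The resulting list can be handed to `subprocess.run(...)`; for example `build_capture_cmd("/dev/video0", "hw:1", "tape.mkv")`, where all three values are placeholders.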
04:46 🔗 godane so this felicity tape start off with S01E16
04:52 🔗 godane its called Felicity tape 4 in blue marker
04:53 🔗 godane the screener of Uprising for home video was taped over for this Felicity tape
05:38 🔗 ZexaronS has quit IRC (Quit: Leaving)
07:00 🔗 bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
07:12 🔗 wp494_ is now known as wp494
07:27 🔗 fie has joined #archiveteam-bs
08:03 🔗 Valentin- has joined #archiveteam-bs
08:06 🔗 Valentine has quit IRC (Ping timeout: 506 seconds)
08:23 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:23 🔗 dashcloud has joined #archiveteam-bs
10:28 🔗 dashcloud has quit IRC (Ping timeout: 250 seconds)
10:32 🔗 dashcloud has joined #archiveteam-bs
10:45 🔗 jschwart has joined #archiveteam-bs
10:52 🔗 godane i hope this felicity tape is 8 hours
10:54 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:00 🔗 godane it really is an 8 hour tape
11:05 🔗 godane i'm editing the 6 hour block i currently have so i can start uploading
11:06 🔗 godane i think there is a episode missing though
11:06 🔗 godane i only say that cause this goes thru S01E16 to S01E22
11:10 🔗 pizzaiolo has joined #archiveteam-bs
11:12 🔗 pizzaiolo has quit IRC (Client Quit)
11:13 🔗 pizzaiolo has joined #archiveteam-bs
11:13 🔗 antomati_ has joined #archiveteam-bs
11:13 🔗 swebb sets mode: +o antomati_
11:16 🔗 antomatic has quit IRC (Ping timeout: 250 seconds)
11:31 🔗 godane so from what i can tell there was episode recorded over that aired on at around 12pm
11:32 🔗 godane this took up between 04:03:27 to 04:04:24 of the tape
11:35 🔗 godane *04:07:24 i mean
13:00 🔗 godane so the uploading of those Felicity tapes is happening now
13:00 🔗 godane it's been happening for the last hour plus
13:08 🔗 icedice has joined #archiveteam-bs
13:10 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
13:10 🔗 Mateon1 has joined #archiveteam-bs
13:10 🔗 icedice has quit IRC (Client Quit)
13:12 🔗 icedice has joined #archiveteam-bs
13:34 🔗 JensRex Goodbye Archive Team. I'm out.
13:34 🔗 JensRex has left
13:37 🔗 Muad-Dib Is something straining the wiki server? I'm getting 508 errors
13:38 🔗 Igloo What's happened to JensRex?
14:13 🔗 bithippo has joined #archiveteam-bs
14:14 🔗 godane there may be S02E03 on this tape also
14:21 🔗 icedice has quit IRC (Quit: Leaving)
14:30 🔗 icedice has joined #archiveteam-bs
14:38 🔗 icedice has quit IRC (Quit: Leaving)
14:41 🔗 jspiros has quit IRC (Read error: Operation timed out)
14:47 🔗 jspiros has joined #archiveteam-bs
15:01 🔗 godane so tape is done
15:02 🔗 godane and complete episode of S02E03 was on this tape
15:02 🔗 godane this must have been a T-180 tape
15:24 🔗 jschwart has quit IRC (Ping timeout: 260 seconds)
16:22 🔗 jschwart has joined #archiveteam-bs
16:37 🔗 beardicus has quit IRC (bye)
16:46 🔗 beardicus has joined #archiveteam-bs
17:23 🔗 ZexaronS has joined #archiveteam-bs
18:04 🔗 jrwr chfoo: you around
18:05 🔗 jrwr I broke wpull in a new and exciting way
18:07 🔗 JAA Impressive.
18:08 🔗 jrwr http://paste.debian.net/998029/
18:08 🔗 jrwr I /think/ it's closing the connection and then trying to check the connection, it is version 1.2.3
18:08 🔗 jrwr Im testing if I can get it to do it on latest now
18:11 🔗 JAA Yeah, Sending headers -> Closing connection -> Reading header doesn't look good.
18:11 🔗 JAA Can you reproduce it?
18:12 🔗 JAA Works fine for me with that URL.
18:12 🔗 jrwr ya
18:12 🔗 jrwr happens every time on my plat (aarch64)
18:12 🔗 JAA Huh
18:12 🔗 jrwr it's in Newsgrabber-warrior to boot
18:14 🔗 username1 is now known as schbirid
18:15 🔗 jrwr HA, latest version doesn't do it
18:15 🔗 jrwr matter of fact wpull 2.0.1 runs a ton faster than 1.2.3
18:15 🔗 jrwr arkiver: why are you still using 1.2.3 :P
18:16 🔗 JAA 2.0.1 is very crashy though. :-/
18:16 🔗 JAA FalconK's fork less so, but youtube-dl is completely broken there.
18:16 🔗 schbirid ooh there is a fork?
18:16 🔗 JAA Yeah, it's used by ArchiveBot.
18:18 🔗 JAA 2.0.1 also sometimes gets stuck while retrieving a URL and may spontaneously decide to eat all your RAM and CPU.
18:18 🔗 jrwr I wonder if any work has been done to update wget+lua
18:18 🔗 jrwr or its all being put into wpull
18:18 🔗 JAA There is definitely a newer version of wget-lua than the one we're using in the warrior.
18:20 🔗 JAA And well, wpull development is also pretty much dead. (Cf. discussion some weeks ago about Open Collective etc.)
18:20 🔗 jrwr Ya, I can't run the newsgrabber pipeline on AArch64, it just breaks
18:20 🔗 jrwr since it can't use the pre-compiled version of wpull that ark is using
18:21 🔗 jrwr also it runs concurrent 5 on wpull, so running the pipeline with 4 is about the best you want to do
18:22 🔗 JAA I've used 16 concurrent connections before, and it worked fine.
18:22 🔗 jrwr Ya
18:22 🔗 JAA That's with 1.2.3.
18:22 🔗 jrwr its not a huge deal
18:22 🔗 jrwr mostly just threads for days
18:23 🔗 JAA Yeah
18:26 🔗 schbirid is it on pip?
18:31 🔗 JAA schbirid: You mean PyPI?
18:31 🔗 schbirid yeah
18:31 🔗 JAA No, only GitHub.
18:34 🔗 schbirid how can i install that to ~/.local ? :}
18:35 🔗 JAA I don't know the syntax by heart, but pip supports installing from git repositories.
18:35 🔗 schbirid oh nice
18:35 🔗 schbirid thx
18:36 🔗 JAA pip install git+git://github.com/...
18:36 🔗 JAA Or git+https or git+ssh or whatever.
18:36 🔗 schbirid pip install --user git+https://github.com/falconkirtaran/wpull.git
18:37 🔗 arkiver yeah there was something with 2.0.1 that made us not use it
18:37 🔗 JAA It would be really nice if we could turn 2.0.x into something usable.
18:37 🔗 jrwr anyway, I /was/ going to spin up like 50 scaleway ARM64 instances for shits
18:38 🔗 jrwr but that got shot in the food
18:38 🔗 jrwr foot*
18:38 🔗 jrwr schbirid: pip install --user <package>
18:49 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
18:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:01 🔗 dashcloud has joined #archiveteam-bs
19:22 🔗 jschwart has joined #archiveteam-bs
19:28 🔗 MrDignity has quit IRC (Remote host closed the connection)
19:28 🔗 MrDignity has joined #archiveteam-bs
19:53 🔗 icedice has joined #archiveteam-bs
20:01 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
20:22 🔗 Stilett0 has joined #archiveteam-bs
20:45 🔗 LastNinja has joined #archiveteam-bs
20:47 🔗 icedice has quit IRC (Quit: Leaving)
20:48 🔗 icedice has joined #archiveteam-bs
21:05 🔗 Stiletto has joined #archiveteam-bs
21:06 🔗 Stiletto has quit IRC (Client Quit)
21:16 🔗 icedice has quit IRC (Ping timeout: 506 seconds)
21:45 🔗 icedice has joined #archiveteam-bs
21:48 🔗 dashcloud godane: thread message queueing is some black magic constant- 1024, 2048 or 4096 are what I've used
21:49 🔗 Ceryn I've often used 4096 as max sizes for single receives and then regretted it. I've started just going for a megabyte instead. The memory is no issue and it saves you a lot of headache.
22:15 🔗 bithippo I wish IA spoke git natively
22:16 🔗 bithippo ie upload git repo, be able to clone from it (but still download as zip or with torrent)
22:17 🔗 Harzilein bithippo: ia is sometimes almost impossible to use without the capability to calculate range offsets into zipfiles
22:18 🔗 Harzilein bithippo: but if one has it, it's a decent user experience for the downloader side of things
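As a rough illustration of "calculating range offsets into zipfiles": the byte range of one member's compressed data can be derived from the central directory plus that member's local file header. The sketch below uses Python's standard `zipfile` module on a local, seekable file object; `member_data_range` and `http_range_header` are hypothetical helper names, and nothing here is claimed to be how IA or Harzilein's tooling actually does it. Encrypted entries are not handled.

```python
import struct
import zipfile
from typing import BinaryIO, Tuple

LOCAL_HEADER_LEN = 30  # fixed-size part of a ZIP local file header


def member_data_range(fileobj: BinaryIO, name: str) -> Tuple[int, int]:
    """Return (first_byte, last_byte) of the compressed data for `name`.

    The central directory gives the local header offset and compressed
    size; the local header gives the lengths of the variable fields that
    sit between the header and the data.
    """
    with zipfile.ZipFile(fileobj) as zf:
        info = zf.getinfo(name)
    fileobj.seek(info.header_offset)
    header = fileobj.read(LOCAL_HEADER_LEN)
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    # Bytes 26-29: file name length and extra field length (little-endian).
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + LOCAL_HEADER_LEN + name_len + extra_len
    return start, start + info.compress_size - 1


def http_range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

For a deflate-compressed member, the bytes in that range are a raw deflate stream, so they can be decompressed with `zlib.decompress(data, -15)` after being fetched with a ranged request.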
22:19 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:23 🔗 BlueMaxim has joined #archiveteam-bs
22:24 🔗 bithippo Makes sense.
22:27 🔗 jtn2 has quit IRC (Remote host closed the connection)
22:35 🔗 BlueMaxim has quit IRC (Quit: Leaving)
22:36 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:39 🔗 jtn2 has joined #archiveteam-bs
22:44 🔗 dashcloud has joined #archiveteam-bs
22:45 🔗 jtn2_ has joined #archiveteam-bs
22:48 🔗 jtn2 has quit IRC (Read error: Operation timed out)
22:53 🔗 jtn2_ has quit IRC (Read error: Operation timed out)
22:59 🔗 jtn2 has joined #archiveteam-bs
23:06 🔗 jtn2 has quit IRC (Ping timeout: 250 seconds)
23:07 🔗 jtn2 has joined #archiveteam-bs
23:16 🔗 wp494 CoolCanuc: just an update on the Metro stuff, there wasn't even anything in the boxes today
23:16 🔗 wp494 (in winnipeg that is)
23:17 🔗 BlueMaxim has joined #archiveteam-bs
23:19 🔗 wp494 so either they've already closed shop (which seems to be the case seeing as one of their political writers is gone, and a lady that hung out in Winnipeg Square underground wishing people well in the mornings who was apparently affiliated with them wasn't even there today) or it'll be less frequent publishing until january
23:19 🔗 wp494 I'm inclined to think the former
23:23 🔗 dashcloud Ceryn: any idea what the past duration exceeded messages are, and if I should be worried about them?
23:25 🔗 Ceryn dashcloud: Oh no, I have no clue about the context. Hard-coded length limits just usually aren't as large as you think they are. :P
23:25 🔗 dashcloud okay- thanks
23:26 🔗 icedice has quit IRC (Read error: Operation timed out)
