#archiveteam-bs 2017-11-28,Tue


WhoWhatWhen
bithippo@rolfoid: I don't see any CompuServe forum-related grab scripts in https://github.com/ArchiveTeam; you could look at one of these as an example of how to create the pipeline code a warrior would run: https://github.com/ArchiveTeam?utf8=%E2%9C%93&q=grab&type=&language= [00:08]
astridyeah none have been made yet [00:08]
bithippoHave not had the human time to pull apart the forum URLs to determine how to iterate successfully :( [00:09]
JAACompuServe is a PITA unfortunately. Redirect and cookie hell. [00:10]
bithippoFFS [00:10]
JAAAlso, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it.
So you either have to write a scraper beforehand or try all subforums for each thread ID.
[00:11]
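JAA's point implies a brute-force lookup: because thread IDs are shared across subforums, every unknown ID has to be tried against each subforum until one resolves. The Python sketch below shows that loop. The `fetch_ok` callback is hypothetical; it stands in for requesting the thread URL and judging whether a real thread came back, including whatever redirect and cookie handling CompuServe needs. The IDs and subforum names in the usage are made up.

```python
from typing import Callable, Iterable, Optional

# Hypothetical callback: request the thread URL for (subforum, thread_id) and
# report whether a real thread came back (redirect/cookie handling not shown).
FetchOk = Callable[[str, int], bool]


def find_subforum(thread_id: int, subforums: Iterable[str], fetch_ok: FetchOk) -> Optional[str]:
    """Try each subforum in turn; return the first one where the thread resolves."""
    for subforum in subforums:
        if fetch_ok(subforum, thread_id):
            return subforum
    return None


def map_threads(thread_ids: Iterable[int], subforums: list[str], fetch_ok: FetchOk) -> dict[int, str]:
    """Brute-force thread -> subforum map; worst case len(ids) * len(subforums) requests."""
    found: dict[int, str] = {}
    for tid in thread_ids:
        subforum = find_subforum(tid, subforums, fetch_ok)
        if subforum is not None:
            found[tid] = subforum
    return found


# Usage with a fake fetcher; IDs and subforum names are made up.
KNOWN = {101: "retro-gaming", 102: "linux"}


def fake_fetch(subforum: str, thread_id: int) -> bool:
    return KNOWN.get(thread_id) == subforum


print(map_threads([101, 102, 103], ["linux", "retro-gaming"], fake_fetch))
```

In the worst case this costs one request per (thread ID, subforum) pair, which is why a scraper that records each thread's subforum up front is the cheaper of the two options JAA mentions.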
bithippoಠ_ಠ [00:11]
JAAI've said it before: I can definitely understand why they want to get rid of this awful forum software. [00:12]
***Atom has joined #archiveteam-bs [00:12]
bithippoYeah, no disagreement there. [00:12]
JAAThe code is probably similarly unusable/unmaintainable. [00:12]
bithippoHuh. There are RSS feeds.
Sigh, I think it's a lost cause unfortunately.
[00:13]
rolfoid"Also, thread IDs are shared among all subforums, but you have to know which subforum a thread is in to access it" that would be really funny if it weren't so sad [00:17]
astridit's a fairly common webapp thing, but remains confusing [00:18]
bithippoI dumped a WARC archive I grabbed on 11/14/17 here: https://archive.org/details/member.compuserve.com-forum_center-2017-11-15-warc-archive, buuuuuuuut I'm sure I didn't grab everything. I only grabbed what grab-site/wpull could see.
URL discovery is an unsolved problem at scale
[00:19]
JAA190 MiB seems suspiciously small. The ArchiveBot job is at 306 GiB and not even close to done yet. [00:22]
astridi assume the archivebot job is grabbing outlinks, though, which tends to inflate that ... a lot [00:22]
bithippo@JAA: I admit my attempt was.....lacking. [00:23]
JAAYeah, but not by a factor of over 1000?
bithippo: Any bit helps though. :-)
[00:23]
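A quick check of JAA's "factor of over 1000", using the two sizes quoted above (the ArchiveBot job was still running, so its final size would only make the ratio larger):

```python
# Sizes as quoted in the channel at 00:22
archivebot_gib = 306  # ArchiveBot job so far (still running)
warc_mib = 190        # bithippo's WARC item on IA

ratio = archivebot_gib * 1024 / warc_mib  # 1 GiB = 1024 MiB
print(f"ArchiveBot job is ~{ratio:.0f}x larger")  # ArchiveBot job is ~1649x larger
```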
bithippoDefinitely curious what I was doing wrong considering the ArchiveBot job has 2.4MM requests so far.
Ahh
`docker exec warcfactory grab-site http://member.compuserve.com/forum_center/ `
[00:24]
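One plausible reason a grab started at `/forum_center/` saw so little is directory scoping: wget-style crawlers, including wpull via its `--no-parent` option, can refuse to ascend above the start URL's directory. Whether grab-site applied exactly this rule to bithippo's job is an assumption. The sketch below approximates such a check with the standard library; the out-of-scope URL in the usage is a hypothetical path, not a known CompuServe URL.

```python
from urllib.parse import urlsplit


def in_no_parent_scope(start_url: str, candidate: str) -> bool:
    """Approximate a wget-style --no-parent rule: same host, and the candidate's
    path must sit at or below the start URL's directory."""
    start, cand = urlsplit(start_url), urlsplit(candidate)
    if start.hostname != cand.hostname:
        return False
    # Directory of the start URL: the path up to and including its last "/".
    start_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return (cand.path or "/").startswith(start_dir)


START_URL = "http://member.compuserve.com/forum_center/"
# The second URL is a hypothetical path outside /forum_center/, for illustration only.
for url in ("http://member.compuserve.com/forum_center/page", "http://member.compuserve.com/other/page"):
    print(url, in_no_parent_scope(START_URL, url))
```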
JAAAh, yeah. [00:26]
bithippoI constrained excessively.
Lesson learned.
Safe to nuke my item in IA then with ArchiveBot on the job?
[00:26]
astrideh, leave it there
it's not hurting anybody :)
[00:29]
bithippo10-4! [00:29]
.... (idle for 17mn)
***prb has joined #archiveteam-bs [00:46]
icedice has joined #archiveteam-bs [00:57]
.... (idle for 18mn)
Mateon1 has quit IRC (Remote host closed the connection)
Mateon1 has joined #archiveteam-bs
MrDignity has quit IRC (Ping timeout: 248 seconds)
[01:15]
godanei now know why the internet archive didn't take my uploads: https://twitter.com/textfiles/status/935139710686527491
i went to bed at 9:30am hoping some of my stuff would get uploaded
anyways i fixed it when i got up at around 5:30pm
[01:25]
***MrDignity has joined #archiveteam-bs [01:29]
.... (idle for 16mn)
dashcloud has joined #archiveteam-bs [01:45]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Read error: Operation timed out)
[01:55]
godanelooks like i screwed up the names of Felicity episodes for S01E01 to S01E05
that's because i forgot to put We WOC in the names
[02:09]
anyways here is a list of tapes that have been uploaded: https://www.patreon.com/posts/digitize-tapes-15581150 [02:15]
....... (idle for 34mn)
***schbirid2 has joined #archiveteam-bs
schbirid has quit IRC (Read error: Operation timed out)
[02:49]
pizzaiolo has quit IRC (Remote host closed the connection) [03:03]
.... (idle for 17mn)
ndiddy has quit IRC () [03:20]
icedice2 has quit IRC (Read error: Connection reset by peer) [03:25]
......... (idle for 40mn)
godaneso ffmpeg 3.4 is not working for me [04:05]
***wp494_ has joined #archiveteam-bs
wp494 has quit IRC (Ping timeout: 245 seconds)
[04:06]
godaneso vlc works with so i don't know the problem with ffmpeg
frame= 16 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed=
normally it does that maybe twice before it works
but i have tried 10 times
[04:07]
***godane has quit IRC (Quit: Leaving.) [04:12]
BlueMaxim has joined #archiveteam-bs [04:22]
username1 has joined #archiveteam-bs
qw3rty111 has joined #archiveteam-bs
godane has joined #archiveteam-bs
[04:32]
godaneso i fixed it
turned out i had to unplug the vcr
and plug it back in
[04:34]
***schbirid2 has quit IRC (Read error: Operation timed out)
qw3rty119 has quit IRC (Read error: Operation timed out)
[04:35]
godane[alsa @ 0x2287260] Thread message queue blocking; consider raising the thread_queue_size option (current value: 1024)
[swscaler @ 0x22c5d80] Warning: data is not aligned! This can lead to a speed loss
i'm getting that error now
i put the thread_queue_size at 2048
so this felicity tape starts off with S01E16
[04:41]
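For reference, `-thread_queue_size` is an ffmpeg per-input option, so it has to appear before the `-i` it applies to. The sketch below builds such a capture command in Python. Only the ALSA input is grounded in the log above; the v4l2 video input, device names, and output filename are hypothetical placeholders.

```python
import shlex


def capture_cmd(video_dev: str, audio_dev: str, out_path: str, queue_size: int = 2048) -> list[str]:
    """Build an ffmpeg capture command with a larger input thread queue.

    -thread_queue_size is an input option, so it is placed before each -i.
    """
    qs = str(queue_size)
    return [
        "ffmpeg",
        "-f", "v4l2", "-thread_queue_size", qs, "-i", video_dev,  # assumed V4L2 capture card
        "-f", "alsa", "-thread_queue_size", qs, "-i", audio_dev,  # ALSA input, as in the log
        out_path,
    ]


# Device names and output file are hypothetical placeholders.
cmd = capture_cmd("/dev/video0", "hw:1,0", "felicity_tape4.mkv")
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run()`, which avoids shell quoting issues; `shlex.join` is only used here to show the equivalent command line.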
it's called Felicity tape 4 in blue marker
the screener of Uprising for home video was taped over for this Felicity tape
[04:52]
.......... (idle for 45mn)
***ZexaronS has quit IRC (Quit: Leaving) [05:38]
................. (idle for 1h22mn)
bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…) [07:00]
wp494_ is now known as wp494 [07:12]
.... (idle for 15mn)
fie has joined #archiveteam-bs [07:27]
........ (idle for 36mn)
Valentin- has joined #archiveteam-bs
Valentine has quit IRC (Ping timeout: 506 seconds)
[08:03]
.... (idle for 17mn)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[08:23]
.......................... (idle for 2h5mn)
dashcloud has quit IRC (Ping timeout: 250 seconds)
dashcloud has joined #archiveteam-bs
[10:28]
jschwart has joined #archiveteam-bs [10:45]
godanei hope this felicity tape is 8 hours [10:52]
***BlueMaxim has quit IRC (Quit: Leaving) [10:54]
godaneit really is an 8 hour tape [11:00]
i'm editing the 6 hour block i currently have so i can start uploading
i think there is a episode missing though
i only say that because this goes through S01E16 to S01E22
[11:05]
***pizzaiolo has joined #archiveteam-bs
pizzaiolo has quit IRC (Client Quit)
pizzaiolo has joined #archiveteam-bs
antomati_ has joined #archiveteam-bs
swebb sets mode: +o antomati_
antomatic has quit IRC (Ping timeout: 250 seconds)
[11:10]
.... (idle for 15mn)
godaneso from what i can tell there was an episode recorded over that aired at around 12pm
this took up 04:03:27 to 04:04:24 of the tape
*04:07:24 i mean
[11:31]
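The correction to 04:07:24 changes the size of the recorded-over section considerably. Converting both timecodes to seconds gives the gap:

```python
def to_seconds(tc: str) -> int:
    """Convert an HH:MM:SS tape timecode to seconds."""
    h, m, s = (int(part) for part in tc.split(":"))
    return h * 3600 + m * 60 + s


gap = to_seconds("04:07:24") - to_seconds("04:03:27")
print(gap, divmod(gap, 60))  # 237 (3, 57)
```

So the recorded-over section runs 3 minutes 57 seconds, not the 57 seconds the first timecode would have implied.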
.................. (idle for 1h25mn)
so the uploading of those Felicity tapes is happening now
it's been happening for the last hour plus
[13:00]
***icedice has joined #archiveteam-bs
Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
icedice has quit IRC (Client Quit)
icedice has joined #archiveteam-bs
[13:08]
..... (idle for 22mn)
JensRexGoodbye Archive Team. I'm out. [13:34]
***JensRex has left [13:34]
Muad-DibIs something straining the wiki server? I'm getting 508 errors [13:37]
IglooWhat's happened to JensRex? [13:38]
........ (idle for 35mn)
***bithippo has joined #archiveteam-bs [14:13]
godanethere may be S02E03 on this tape also [14:14]
***icedice has quit IRC (Quit: Leaving) [14:21]
icedice has joined #archiveteam-bs [14:30]
icedice has quit IRC (Quit: Leaving)
jspiros has quit IRC (Read error: Operation timed out)
[14:38]
jspiros has joined #archiveteam-bs [14:47]
godaneso tape is done
and a complete episode of S02E03 was on this tape
this must have been a T-180 tape
[15:01]
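The T-180 guess can be checked with the usual NTSC VHS arithmetic: the T- number is the running time in minutes at SP, LP doubles it, and EP (also called SLP) triples it. Assuming the tape was recorded at EP (the log does not say), a T-160 tops out at 8 hours and a T-180 at 9, so a tape holding more than 8 hours of programming fits a T-180:

```python
# NTSC VHS "T-" ratings give the running time in minutes at SP speed.
SPEED_MULTIPLIER = {"SP": 1, "LP": 2, "EP": 3}  # EP is also called SLP


def record_hours(t_rating: int, speed: str) -> float:
    """Maximum recording time in hours for a T-rated tape at the given speed."""
    return t_rating * SPEED_MULTIPLIER[speed] / 60


for rating in (160, 180):
    print(f"T-{rating} at EP: {record_hours(rating, 'EP'):g} h")
# T-160 at EP: 8 h
# T-180 at EP: 9 h
```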
..... (idle for 22mn)
***jschwart has quit IRC (Ping timeout: 260 seconds) [15:24]
............ (idle for 58mn)
jschwart has joined #archiveteam-bs [16:22]
.... (idle for 15mn)
beardicus has quit IRC (bye) [16:37]
beardicus has joined #archiveteam-bs [16:46]
........ (idle for 37mn)
ZexaronS has joined #archiveteam-bs [17:23]
......... (idle for 41mn)
jrwrchfoo: you around
I broke wpull in a new and exciting way
[18:04]
JAAImpressive. [18:07]
jrwrhttp://paste.debian.net/998029/
I /think/ it's closing the connection and then trying to check the connection; it's version 1.2.3
I'm testing if I can get it to do it on latest now
[18:08]
JAAYeah, Sending headers -> Closing connection -> Reading header doesn't look good.
Can you reproduce it?
Works fine for me with that URL.
[18:11]
jrwrya
happens every time on my platform (aarch64)
[18:12]
JAAHuh [18:12]
jrwrit's in the NewsGrabber warrior, to boot [18:12]
***username1 is now known as schbirid [18:14]
jrwrHA, latest version doesn't do it
as a matter of fact, wpull 2.0.1 runs a ton faster than 1.2.3
arkiver: why are you still using 1.2.3 :P
[18:15]
JAA2.0.1 is very crashy though. :-/
FalconK's fork less so, but youtube-dl is completely broken there.
[18:16]
schbiridooh there is a fork? [18:16]
JAAYeah, it's used by ArchiveBot.
2.0.1 also sometimes gets stuck while retrieving a URL and may spontaneously decide to eat all your RAM and CPU.
[18:16]
jrwrI wonder if any work has been done to update wget+lua
or if it's all being put into wpull
[18:18]
JAAThere is definitely a newer version of wget-lua than the one we're using in the warrior.
And well, wpull development is also pretty much dead. (Cf. discussion some weeks ago about Open Collective etc.)
[18:18]
jrwrYa, I can't run the newsgrabber pipeline on aarch64; it just breaks
since it can't use the pre-compiled version of wpull that ark is using
also it runs wpull with a concurrency of 5, so running the pipeline with a concurrency of 4 is about the best you want to do
[18:20]
JAAI've used 16 concurrent connections before, and it worked fine. [18:22]
jrwrYa [18:22]
JAAThat's with 1.2.3. [18:22]
jrwrit's not a huge deal
mostly just threads for days
[18:22]
JAAYeah [18:23]
schbiridis it on pip? [18:26]
JAAschbirid: You mean PyPI? [18:31]
schbiridyeah [18:31]
JAANo, only GitHub. [18:31]
schbiridhow can i install that to ~/.local ? :} [18:34]
JAAI don't know the syntax by heart, but pip supports installing from git repositories. [18:35]
schbiridoh nice
thx
[18:35]
JAApip install git+git://github.com/...
Or git+https or git+ssh or whatever.
[18:36]
schbiridpip install --user git+https://github.com/falconkirtaran/wpull.git [18:36]
arkiveryeah there was something with 2.0.1 that made us not use it [18:37]
JAAIt would be really nice if we could turn 2.0.x into something usable. [18:37]
jrwranyway, I /was/ going to spin up like 50 scaleway ARM64 instances for shits
but that got shot in the food
foot*
schbirid: pip install --user <package>
[18:37]
***jschwart has quit IRC (Quit: Konversation terminated!) [18:49]
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[18:59]
..... (idle for 21mn)
jschwart has joined #archiveteam-bs [19:22]
MrDignity has quit IRC (Remote host closed the connection)
MrDignity has joined #archiveteam-bs
[19:28]
...... (idle for 25mn)
icedice has joined #archiveteam-bs [19:53]
Stilett0 has quit IRC (Read error: Operation timed out) [20:01]
..... (idle for 21mn)
Stilett0 has joined #archiveteam-bs [20:22]
..... (idle for 23mn)
LastNinja has joined #archiveteam-bs
icedice has quit IRC (Quit: Leaving)
icedice has joined #archiveteam-bs
[20:45]
.... (idle for 17mn)
Stiletto has joined #archiveteam-bs
Stiletto has quit IRC (Client Quit)
[21:05]
icedice has quit IRC (Ping timeout: 506 seconds) [21:16]
...... (idle for 29mn)
icedice has joined #archiveteam-bs [21:45]
dashcloudgodane: the thread message queue size is a black-magic constant: 1024, 2048 or 4096 are what I've used [21:48]
CerynI've often used 4096 as max sizes for single receives and then regretted it. I've started just going for a megabyte instead. The memory is no issue and it saves you a lot of headache. [21:49]
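Ceryn's point is about fixed per-call receive sizes; the original context isn't stated, so this assumes a stream socket. A minimal Python sketch of the usual robust pattern: `recv()` may return fewer bytes than requested, so keep reading until the peer closes, here with the 1 MiB chunk size Ceryn suggests as the default. The 6,000-byte demo payload is arbitrary, chosen only to be larger than a 4096-byte buffer.

```python
import socket


def recv_all(sock: socket.socket, chunk_size: int = 1 << 20) -> bytes:
    """Read from a stream socket until the peer closes its writing side.

    chunk_size is only an upper bound per call: recv() may return fewer
    bytes, so a loop is needed however large the buffer is.
    """
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # b"" signals an orderly shutdown by the peer
            break
        chunks.append(data)
    return b"".join(chunks)


# Demo over a local socket pair; the payload is larger than a 4096-byte buffer.
a, b = socket.socketpair()
payload = b"x" * 6_000
a.sendall(payload)  # small enough for the kernel buffer, so sending before reading is safe here
a.shutdown(socket.SHUT_WR)
received = recv_all(b, chunk_size=4096)
a.close()
b.close()
print(len(received))  # 6000
```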
...... (idle for 26mn)
bithippoI wish IA spoke git natively
i.e. upload a git repo and be able to clone from it (but still download it as a zip or via torrent)
[22:15]
Harzileinbithippo: ia is sometimes almost impossible to use without the capability to calculate range offsets into zipfiles
bithippo: but if one has it, it's a decent user experience for the downloader side of things
[22:17]
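Harzilein's point about range offsets can be made concrete: a ZIP member's compressed bytes start right after its 30-byte local file header plus that header's filename and extra fields, and the central directory records where each local header begins. The sketch below computes an HTTP `Range` for one member. For brevity it reads an in-memory archive; against a remote file you would first fetch the central directory from the end of the file with Range requests, then the local header. It assumes a plain archive without ZIP64 or encryption, and deflated members still need inflating after download.

```python
import io
import struct
import zipfile

# ZIP local file header: signature, 5 x uint16, 3 x uint32, then filename
# length and extra-field length (uint16 each): 30 bytes, little-endian.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def member_byte_range(zip_bytes: bytes, name: str) -> tuple[int, int]:
    """Return inclusive (start, end) offsets of a member's compressed data,
    i.e. the values for an HTTP 'Range: bytes=start-end' header."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)  # central directory entry
    sig, *_, fname_len, extra_len = LOCAL_HEADER.unpack_from(zip_bytes, info.header_offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at the recorded offset")
    # Use the local header's own lengths: its extra field can differ from the central copy.
    start = info.header_offset + LOCAL_HEADER.size + fname_len + extra_len
    return start, start + info.compress_size - 1


# Usage with a small in-memory archive (stored, so the range holds the raw bytes)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello, archive")
data = buf.getvalue()
start, end = member_byte_range(data, "hello.txt")
print(f"Range: bytes={start}-{end}")
```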
***jschwart has quit IRC (Quit: Konversation terminated!)
BlueMaxim has joined #archiveteam-bs
[22:19]
bithippoMakes sense. [22:24]
***jtn2 has quit IRC (Remote host closed the connection) [22:27]
BlueMaxim has quit IRC (Quit: Leaving)
dashcloud has quit IRC (Read error: Operation timed out)
jtn2 has joined #archiveteam-bs
[22:35]
dashcloud has joined #archiveteam-bs
jtn2_ has joined #archiveteam-bs
jtn2 has quit IRC (Read error: Operation timed out)
[22:44]
jtn2_ has quit IRC (Read error: Operation timed out) [22:53]
jtn2 has joined #archiveteam-bs [22:59]
jtn2 has quit IRC (Ping timeout: 250 seconds)
jtn2 has joined #archiveteam-bs
[23:06]
wp494CoolCanuc: just an update on the Metro stuff, there wasn't even anything in the boxes today
(in winnipeg that is)
[23:16]
***BlueMaxim has joined #archiveteam-bs [23:17]
wp494so either they've already closed shop (which seems to be the case seeing as one of their political writers is gone, and a lady that hung out in Winnipeg Square underground wishing people well in the mornings who was apparently affiliated with them wasn't even there today) or it'll be less frequent publishing until january
I'm inclined to think the former
[23:19]
dashcloudCeryn: any idea what the past duration exceeded messages are, and if I should be worried about them? [23:23]
Ceryndashcloud: Oh no, I have no clue about the context. Hard-coded length limits just usually aren't as large as you think they are. :P [23:25]
dashcloudokay- thanks [23:25]
***icedice has quit IRC (Read error: Operation timed out) [23:26]
