Time |
Nickname |
Message |
00:02
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
00:13
🔗
|
|
omarroth has joined #archiveteam-bs |
00:19
🔗
|
ayanami_ |
JAA How long did it take to archive? How much GB/TB/PB/whatever in data was saved? |
00:20
🔗
|
|
RichardG_ is now known as RichardG |
00:20
🔗
|
JAA |
ayanami_: Just a couple hours, 3 GiB or so. This is only the thread HTML and a few associated things though, no images, attachments, etc. |
00:21
🔗
|
JAA |
Don't have the exact size for just the thread URLs since it's all running in the same grab and writing to the same WARCs. |
00:21
🔗
|
JAA |
About 3 hours and 20 minutes for the thread URLs. |
00:22
🔗
|
JAA |
Post URLs will take a while since they go up to over 2.3 million (so it takes 2.3 million requests). Should still finish in time easily though. |
00:22
🔗
|
JAA |
I'm doing roughly 10k requests per minute at the moment. |
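Those two figures (over 2.3 million post URLs at roughly 10k requests per minute) give a rough runtime for the post-URL pass. This is a back-of-the-envelope sketch that assumes a constant rate. The rate did not stay constant (it dropped once an ArchiveBot job started), so the real figure would be somewhat longer.

```python
# Rough runtime estimate for the post-URL pass.
# Inputs come from the log: ~2.3 million requests at ~10k requests/minute.
def estimate_minutes(total_requests: int, requests_per_minute: float) -> float:
    """Return the estimated runtime in minutes, assuming a constant request rate."""
    if requests_per_minute <= 0:
        raise ValueError("request rate must be positive")
    return total_requests / requests_per_minute


minutes = estimate_minutes(2_300_000, 10_000)
hours, mins = divmod(minutes, 60)
print(f"{minutes:.0f} min ~ {int(hours)} h {int(mins)} min")  # 230 min ~ 3 h 50 min
```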
00:23
🔗
|
JAA |
Oh, that's ~3 GiB of compressed WARCs. I don't know the uncompressed size. |
00:24
🔗
|
JAA |
It's text, so it compresses very well, but it won't be huge since the forums aren't *that* large. |
00:27
🔗
|
|
bitBaron has joined #archiveteam-bs |
00:47
🔗
|
|
wyatt8740 has quit IRC (Ping timeout: 360 seconds) |
00:55
🔗
|
JAA |
Current ETA is around 10:00 UTC. |
01:06
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
01:07
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
01:08
🔗
|
|
tuluu has joined #archiveteam-bs |
01:10
🔗
|
|
ATrescue has quit IRC (Ping timeout: 260 seconds) |
02:03
🔗
|
|
ATrescue has joined #archiveteam-bs |
02:17
🔗
|
|
Anthony_ has joined #archiveteam-bs |
02:27
🔗
|
|
Anthony_ has quit IRC (Ping timeout: 262 seconds) |
02:33
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
02:42
🔗
|
|
omarroth has quit IRC (Remote host closed the connection) |
02:55
🔗
|
JAA |
Huh, my request rate plummeted the second I started the ArchiveBot job. I guess I shouldn't go much faster then. |
02:57
🔗
|
ayanami_ |
For what? 99? JAA |
03:01
🔗
|
JAA |
Yeah |
03:03
🔗
|
|
ayanami_ has quit IRC (Quit: Leaving) |
03:08
🔗
|
|
qw3rty114 has joined #archiveteam-bs |
03:14
🔗
|
|
BlueMax has joined #archiveteam-bs |
03:15
🔗
|
|
qw3rty113 has quit IRC (Read error: Operation timed out) |
03:16
🔗
|
|
odemgi has joined #archiveteam-bs |
03:16
🔗
|
|
RomeSilva has quit IRC (Ping timeout: 246 seconds) |
03:18
🔗
|
|
odemgi_ has quit IRC (Ping timeout: 252 seconds) |
03:18
🔗
|
marked |
How about #Etch-A-Sketch for Sony Sketch ? |
03:18
🔗
|
Flashfire |
Nah I think that is TradeMarked |
03:19
🔗
|
Flashfire |
not that we really cared about that in the past |
03:25
🔗
|
|
odemg has quit IRC (Ping timeout: 615 seconds) |
03:28
🔗
|
JAA |
#sketchy ? |
03:29
🔗
|
JAA |
Oh, occupied. |
03:29
🔗
|
Flashfire |
occupado |
03:30
🔗
|
JAA |
Who dares to steal a potential AT IRC channel?! |
03:30
🔗
|
Flashfire |
THE CHEEK OF THEM |
03:31
🔗
|
|
odemg has joined #archiveteam-bs |
03:35
🔗
|
marked |
#SketchyGrab |
03:40
🔗
|
marked |
#EraseASketch |
04:41
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
05:13
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
05:14
🔗
|
|
balrog has joined #archiveteam-bs |
05:15
🔗
|
|
Mayonaise has joined #archiveteam-bs |
05:18
🔗
|
|
d5f4a3622 has quit IRC (Quit: WeeChat 2.4) |
05:23
🔗
|
|
d5f4a3622 has joined #archiveteam-bs |
05:41
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
05:53
🔗
|
|
Zerote_ has joined #archiveteam-bs |
06:10
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
06:11
🔗
|
|
RomeSilva has joined #archiveteam-bs |
06:26
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
06:27
🔗
|
|
tuluu has joined #archiveteam-bs |
06:31
🔗
|
|
balrog has joined #archiveteam-bs |
06:36
🔗
|
|
ivan has quit IRC (Leaving) |
06:38
🔗
|
|
ivan has joined #archiveteam-bs |
06:56
🔗
|
|
RichardG_ has joined #archiveteam-bs |
06:56
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
07:16
🔗
|
|
Zerote_ has quit IRC (Ping timeout: 600 seconds) |
07:21
🔗
|
|
Zerote_ has joined #archiveteam-bs |
07:33
🔗
|
|
RichardG has joined #archiveteam-bs |
07:33
🔗
|
|
RichardG_ has quit IRC (Read error: Connection reset by peer) |
08:18
🔗
|
|
RomeSilva has quit IRC (Read error: Connection reset by peer) |
08:19
🔗
|
|
RomeSilva has joined #archiveteam-bs |
08:28
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
08:29
🔗
|
|
RichardG has joined #archiveteam-bs |
09:00
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
09:01
🔗
|
|
tuluu has joined #archiveteam-bs |
09:34
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
09:38
🔗
|
|
Verified_ has quit IRC (Ping timeout: 252 seconds) |
09:39
🔗
|
|
VerifiedJ has quit IRC (Ping timeout: 252 seconds) |
09:56
🔗
|
|
ColdIce has quit IRC (Remote host closed the connection) |
09:56
🔗
|
|
ColdIce has joined #archiveteam-bs |
10:10
🔗
|
|
Dallas has quit IRC (Quit: The Lounge - https://thelounge.chat) |
10:12
🔗
|
|
Dallas has joined #archiveteam-bs |
10:15
🔗
|
|
Dallas has quit IRC (Client Quit) |
10:16
🔗
|
|
Dallas has joined #archiveteam-bs |
10:22
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
10:29
🔗
|
JAA |
Oops, bug in my 99.se script causing an infinite loop. Welp. |
10:34
🔗
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
10:47
🔗
|
JAA |
Fixed and resumed. |
10:47
🔗
|
JAA |
All threads and posts pages have been retrieved, now doing the user profiles. |
10:48
🔗
|
|
arbin has quit IRC (Quit: .) |
10:52
🔗
|
|
arbin has joined #archiveteam-bs |
10:56
🔗
|
schbirid |
any ideas why http://web.archive.org/web/*/https://geoportal-hamburg.de/gdi3d/datasource-data/Schraegluftbilder2018/50_07032_lvl02-oblique-left/5/10/10.jpg might not want to archive? |
11:06
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
11:09
🔗
|
JAA |
schbirid: Not really, but I've had that issue before on kkl-luzern.ch. Perhaps a ban of IA's IP range? |
11:10
🔗
|
JAA |
By the way, it looks like 99.se is running backups at 03:00 UTC. That's the only time I got a few timeout errors during my grab. |
11:11
🔗
|
schbirid |
hm weird |
11:11
🔗
|
schbirid |
i had no issues getting images added earlier http://web.archive.org/web/*/https://geoportal-hamburg.de/gdi3d/datasource-data/Schraegluftbilder2018//* |
11:11
🔗
|
JAA |
Huh |
11:12
🔗
|
schbirid |
they had an expired certificate last weekend |
11:12
🔗
|
schbirid |
maybe that got cached somewhere? |
11:17
🔗
|
JAA |
Yeah, also sounds plausible. |
11:17
🔗
|
JAA |
Probably a case for info@archive.org, but I haven't had much luck getting through there lately. |
11:48
🔗
|
|
icedice has joined #archiveteam-bs |
11:50
🔗
|
|
Zerote_ has quit IRC (Ping timeout: 600 seconds) |
11:53
🔗
|
|
enowaldo has joined #archiveteam-bs |
12:02
🔗
|
|
enowaldo has quit IRC (Ping timeout: 252 seconds) |
12:21
🔗
|
JAA |
99.se is done. I covered a couple user profiles many times due to missing item deduplication in qwarc (coming soon). 35 GiB of WARCs, mostly from the posts pages I think. |
12:22
🔗
|
|
bitBaron has joined #archiveteam-bs |
12:25
🔗
|
|
bitBaron has quit IRC (Read error: Operation timed out) |
12:28
🔗
|
Igloo |
Awesome JAA |
12:28
🔗
|
marked |
dedup could be done post crawl if later needed |
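Post-crawl deduplication usually means grouping responses by payload digest and keeping only the first full copy. The following is a minimal in-memory sketch of that idea, assuming the records have already been extracted as (URI, payload) pairs. Reading them out of the WARCs and writing revisit records back is not shown, and the function names and example URLs are hypothetical. qwarc's planned deduplication may work differently.

```python
import base64
import hashlib
from typing import Dict, Iterable, List, Tuple


def payload_digest(payload: bytes) -> str:
    """SHA-1 of the payload in the 'sha1:<BASE32>' form commonly seen in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def dedupe(records: Iterable[Tuple[str, bytes]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split (uri, payload) records into first occurrences and duplicates.

    Returns (kept_uris, revisits). Each revisit is (uri, original_uri): a later
    record whose payload digest matches an earlier one, i.e. a candidate for a
    WARC 'revisit' record instead of a full copy.
    """
    first_seen: Dict[str, str] = {}  # digest -> URI of the first record with that payload
    kept: List[str] = []
    revisits: List[Tuple[str, str]] = []
    for uri, payload in records:
        digest = payload_digest(payload)
        if digest in first_seen:
            revisits.append((uri, first_seen[digest]))
        else:
            first_seen[digest] = uri
            kept.append(uri)
    return kept, revisits


# Hypothetical example: one user profile fetched twice during the crawl.
records = [
    ("https://example.invalid/profile/1", b"<html>user 1</html>"),
    ("https://example.invalid/profile/2", b"<html>user 2</html>"),
    ("https://example.invalid/profile/1", b"<html>user 1</html>"),
]
kept, revisits = dedupe(records)
print(kept)
print(revisits)
```

Keying on the digest alone also catches identical payloads served under different URIs. Whether that is wanted depends on how the revisit records will be replayed.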
12:33
🔗
|
|
enowaldo has joined #archiveteam-bs |
12:36
🔗
|
|
Madbrad has joined #archiveteam-bs |
12:40
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
|
odemgi_ has joined #archiveteam-bs |
12:45
🔗
|
|
odemgi has quit IRC (Ping timeout: 252 seconds) |
12:51
🔗
|
|
Damme has quit IRC (Read error: Connection reset by peer) |
12:56
🔗
|
|
bitBaron has joined #archiveteam-bs |
12:57
🔗
|
|
gilbahat has joined #archiveteam-bs |
13:00
🔗
|
marked |
Who/what is OTW? Do you have an idea of the size of bookcity.co.il ? |
13:00
🔗
|
gilbahat |
OTW is the organization for transformative works, mainly known for their flagship project 'Archive of our own' which is a fanfiction specific archive |
13:01
🔗
|
gilbahat |
one of their subprojects is called 'open doors' which specializes in rescuing and re-cataloguing fanfiction archives |
13:02
🔗
|
JAA |
~200k "books" (stories?) according to the homepage. |
13:03
🔗
|
|
a_spook_ has joined #archiveteam-bs |
13:03
🔗
|
gilbahat |
yes |
13:03
🔗
|
gilbahat |
by fanfiction sites, this is considered a decent amount |
13:03
🔗
|
JAA |
Each story has one page it seems, no pagination. |
13:03
🔗
|
gilbahat |
there are multiple episodes though |
13:04
🔗
|
gilbahat |
I do wonder if the 200k count really is 'books' or 'episodes' |
13:04
🔗
|
marked |
What's the semantic feature you mentioned? |
13:04
🔗
|
gilbahat |
they have a tag-based system |
13:05
🔗
|
JAA |
Do you have an example of such a multi-episode book? |
13:05
🔗
|
gilbahat |
yes, sec. it also has fan-out (multiple options for an episode) |
13:05
🔗
|
a_spook_ |
schbirid: removing https seemed to work: http://web.archive.org/web/20190430130123/http://geoportal-hamburg.de/gdi3d/datasource-data/Schraegluftbilder2018/50_07032_lvl02-oblique-left/5/10/10.jpg |
13:05
🔗
|
gilbahat |
http://bookcity.co.il/book.asp?id=206044 has fan-out: episode 3 / episode 3.1 |
13:05
🔗
|
|
m007a83 has quit IRC (Quit: Fuck you Comcast) |
13:08
🔗
|
|
m007a83 has joined #archiveteam-bs |
13:08
🔗
|
JAA |
Oh, I see now that there's a menu on the right with the pagination. Requires JS, so ArchiveBot will *not* cover it unless those pages are linked elsewhere. |
13:08
🔗
|
marked |
So the right side nav, it's flipping to different cgi get URLs |
13:08
🔗
|
JAA |
(Menu only appears with JS enabled as well.) |
13:09
🔗
|
|
Despatche has joined #archiveteam-bs |
13:10
🔗
|
SketchCow |
After the power outage, FOS came back without having the script that uploads archivebot uploads. |
13:10
🔗
|
SketchCow |
A lot of data comes in via archivebot. |
13:10
🔗
|
SketchCow |
Just wanted to pass along. Seems like terabytes a day |
13:11
🔗
|
gilbahat |
I think the mobile pages have a non-js nav |
13:11
🔗
|
gilbahat |
it has a different url scheme for mobile |
13:12
🔗
|
Igloo |
SketchCow: Ok, Noted. Do you have the script or is it just not running? |
13:13
🔗
|
JAA |
gilbahat: Where can I find the mobile site? Difficult to navigate not knowing Hebrew. :-) |
13:13
🔗
|
marked |
JAA: there's an additional nav in center of page that uses forms drop down |
13:15
🔗
|
|
omarroth has joined #archiveteam-bs |
13:15
🔗
|
JAA |
marked: Come again? |
13:15
🔗
|
marked |
(i'll be surprised if this works) where it says: רשימת הפרקים ("list of chapters") |
13:16
🔗
|
JAA |
Oh right, that's the one I was looking at before. |
13:16
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
13:16
🔗
|
JAA |
The one on the right side doesn't use JS. :-) |
13:17
🔗
|
JAA |
It only shows up with JS enabled it seems, but the links are plain HTML. |
13:17
🔗
|
|
RichardG has joined #archiveteam-bs |
13:22
🔗
|
gilbahat |
I can help with any hebrew issues |
13:24
🔗
|
marked |
woops, on Android, front page redirects to http://bookcity.co.il/mobile/ |
13:27
🔗
|
|
enowaldo has joined #archiveteam-bs |
13:29
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
13:31
🔗
|
marked |
the mobile version is pageable without JS: http://bookcity.co.il/mobile/page.asp?id=267514 |
13:33
🔗
|
JAA |
Hmm, it does that redirect for ArchiveBot as well. |
13:33
🔗
|
JAA |
Guess it's time to restart with a browser UA and a separate job for the mobile page. |
13:37
🔗
|
SketchCow |
Igloo: It's running now. |
13:37
🔗
|
SketchCow |
I'm just noting how much the machine uploads |
13:42
🔗
|
|
Oddly has joined #archiveteam-bs |
13:44
🔗
|
marked |
yeah, at least the mobile side will be sure to get all the content. on the desktop side, the only other thing I can think of is enumerating all the story ID's and figure it will come together on playback when that browser has javascript turned on |
13:44
🔗
|
marked |
^grab by sequential ID |
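Grabbing by sequential ID amounts to generating a URL list from a template and feeding it to the crawler. This sketch uses the desktop `book.asp?id=` pattern from the links above. The ID bounds in the example are placeholders, because the site's real ID range isn't known from the log. The mobile `page.asp?id=` IDs may belong to a different ID space (for example, per episode), so they would need their own range.

```python
from typing import Iterator

# URL pattern taken from the log; the ID range used below is a placeholder.
BOOK_URL = "http://bookcity.co.il/book.asp?id={id}"


def sequential_urls(template: str, start: int, stop: int) -> Iterator[str]:
    """Yield template URLs for every integer ID in [start, stop)."""
    for i in range(start, stop):
        yield template.format(id=i)


if __name__ == "__main__":
    # Hypothetical bounds. In practice the upper bound would come from the
    # newest ID visible on the site, plus some margin.
    for url in sequential_urls(BOOK_URL, 206040, 206045):
        print(url)
```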
13:44
🔗
|
JAA |
Yup |
13:45
🔗
|
JAA |
gilbahat: Any idea how urgent this is, i.e. when the site might disappear? |
13:46
🔗
|
JAA |
Or is this rather a "site has been unhealthy for a while, better safe than sorry"-type thing? |
13:47
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
13:49
🔗
|
|
bitBaron has joined #archiveteam-bs |
13:50
🔗
|
|
gilbahat has quit IRC (Ping timeout: 260 seconds) |
13:58
🔗
|
|
gilbahat has joined #archiveteam-bs |
13:58
🔗
|
gilbahat |
back, sorry |
14:08
🔗
|
|
Zerote_ has joined #archiveteam-bs |
14:13
🔗
|
|
Smiley has joined #archiveteam-bs |
14:20
🔗
|
|
Oddly has quit IRC (Read error: Operation timed out) |
14:28
🔗
|
marked |
gilbahat, the question to you was: what do you know about the urgency/when the site might disappear? |
14:29
🔗
|
|
omarroth has quit IRC (Read error: Connection reset by peer) |
14:29
🔗
|
gilbahat |
I know for sure that the site is doomed (confirmed in private chat with owner, still not publicly known) |
14:29
🔗
|
gilbahat |
but no shuttering date |
14:32
🔗
|
|
omarroth has joined #archiveteam-bs |
14:38
🔗
|
|
gilbahat has quit IRC (gilbahat) |
14:54
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
15:04
🔗
|
|
gilbahat has joined #archiveteam-bs |
15:05
🔗
|
arkiver |
did we make a decision on the channel for the sketch website? |
15:06
🔗
|
schbirid |
a_spook_: thx |
15:06
🔗
|
schbirid |
retch? |
15:12
🔗
|
|
a_spook_ has quit IRC (Quit: Connection closed for inactivity) |
15:20
🔗
|
|
gilbahat has quit IRC (gilbahat) |
15:23
🔗
|
|
gilbahat has joined #archiveteam-bs |
15:28
🔗
|
|
enowaldo has joined #archiveteam-bs |
15:35
🔗
|
JAA |
arkiver: Don't think so. |
15:40
🔗
|
|
gilbahat has quit IRC (Quit: gilbahat) |
15:41
🔗
|
|
gilbahat has joined #archiveteam-bs |
15:45
🔗
|
nyany |
what's the website called |
15:46
🔗
|
JAA |
nyany: Sketch: https://sketch.sonymobile.com/ |
15:46
🔗
|
nyany |
oh this |
15:46
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
15:47
🔗
|
nyany |
if you were looking for a name i was going to suggest something with the word sketchy in it, seems right for sony |
15:47
🔗
|
JAA |
Yeah, I suggested #sketchy, but that channel's occupied. |
15:49
🔗
|
nyany |
... #sketchedout? |
15:49
🔗
|
|
omarroth has quit IRC (Read error: Connection reset by peer) |
15:50
🔗
|
nyany |
i'm presently the only occupant of said channel |
15:50
🔗
|
|
gilbahat has quit IRC (Quit: gilbahat) |
15:53
🔗
|
nyany |
oh wow. the front page of their site is very depressing |
15:53
🔗
|
|
omarroth has joined #archiveteam-bs |
15:53
🔗
|
nyany |
https://sketch.sonymobile.com/explore/featured/sketch/1dddc116-ed8b-45a3-a042-40c96fd2de46 |
16:00
🔗
|
|
killsushi has joined #archiveteam-bs |
16:03
🔗
|
|
Madbrad has quit IRC (Quit: Madbrad) |
16:07
🔗
|
nyany |
i'll hold the channel until you get back to me. |
16:19
🔗
|
|
gilbahat has joined #archiveteam-bs |
16:21
🔗
|
|
enowaldo has joined #archiveteam-bs |
16:31
🔗
|
|
enowaldo has quit IRC (Ping timeout: 252 seconds) |
16:36
🔗
|
|
omarroth has quit IRC (Read error: Connection reset by peer) |
16:36
🔗
|
|
gilbahat has quit IRC (Quit: gilbahat) |
16:40
🔗
|
|
omarroth has joined #archiveteam-bs |
16:41
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
16:42
🔗
|
|
tuluu has joined #archiveteam-bs |
16:44
🔗
|
|
enowaldo has joined #archiveteam-bs |
16:48
🔗
|
|
enowaldo has quit IRC (Ping timeout: 265 seconds) |
16:58
🔗
|
|
omarroth has quit IRC (Read error: Connection reset by peer) |
17:07
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
17:08
🔗
|
|
tuluu has joined #archiveteam-bs |
17:21
🔗
|
|
enowaldo has joined #archiveteam-bs |
17:35
🔗
|
|
Verified_ has joined #archiveteam-bs |
17:37
🔗
|
|
martinlig has joined #archiveteam-bs |
17:43
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
17:52
🔗
|
marked |
sketched out is pretty good. Could we just recreate it with capitals? #SketchedOut |
18:03
🔗
|
nyany |
done, but in looking at the other channels on AT they all seem to be in lowercase? |
18:19
🔗
|
|
enowaldo has joined #archiveteam-bs |
18:19
🔗
|
JAA |
IRC channel names are treated as case-insensitive by almost all implementations I believe (although RFC 1459 doesn't say that). Anyway. |
18:19
🔗
|
JAA |
#SketchedOut it is. |
18:19
🔗
|
astrid |
they're case insensitive but case preserving |
18:20
🔗
|
JAA |
Probably depends on the server implementation. The specs don't say anything about it. |
18:22
🔗
|
JAA |
And depending on the client, it may also not show up with the "correct" capitalisation in your client. I know I've joined a channel with a different capitalisation than it was created as before in irssi, though I don't remember which it was. |
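In practice, servers that compare channel names case-insensitively usually use the casemapping they advertise in the `CASEMAPPING` token of `RPL_ISUPPORT` (numeric 005). This sketch assumes the common `rfc1459` mapping, where `[ ] \ ~` also fold to `{ } | ^` in addition to A-Z folding to a-z. It only covers comparison. Whether the name is displayed as created or as typed is up to the server and client, as discussed here.

```python
# "rfc1459" casemapping: A-Z plus [ ] \ ~ fold to a-z plus { } | ^.
_RFC1459_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\~"
_RFC1459_LOWER = "abcdefghijklmnopqrstuvwxyz{}|^"
_FOLD = str.maketrans(_RFC1459_UPPER, _RFC1459_LOWER)


def irc_casefold(name: str) -> str:
    """Fold a nick or channel name using the rfc1459 casemapping."""
    return name.translate(_FOLD)


def same_channel(a: str, b: str) -> bool:
    """True if two channel names refer to the same channel under rfc1459 casemapping."""
    return irc_casefold(a) == irc_casefold(b)


print(same_channel("#SketchedOut", "#sketchedout"))  # True
```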
18:22
🔗
|
nyany |
in irssi, [14:22] [@nyany_(+i)] [2:choopa/#sketchedout(+nt)] |
18:22
🔗
|
nyany |
shows with lowercase |
18:23
🔗
|
JAA |
Did you /join #sketchedout or #SketchedOut? |
18:23
🔗
|
astrid |
yes that may depend on how you typed it when you joined |
18:23
🔗
|
|
Mateon1 has quit IRC (Quit: Mateon1) |
18:23
🔗
|
nyany |
JAA: I joined sketchedout |
18:23
🔗
|
nyany |
but ah |
18:23
🔗
|
nyany |
yeah, that makes sense. forgive me. |
18:23
🔗
|
JAA |
Right, that's what I meant above. |
18:24
🔗
|
|
enowaldo has quit IRC (Ping timeout: 265 seconds) |
18:25
🔗
|
|
Tsuser has quit IRC (Ping timeout: 260 seconds) |
18:26
🔗
|
|
Mateon1 has joined #archiveteam-bs |
18:32
🔗
|
|
Tsuser_ has joined #archiveteam-bs |
18:40
🔗
|
Fusl |
it kaput NodePing: [AT] HTTPS tracker.archiveteam.org: HTTP is down |
18:41
🔗
|
Fusl |
> 100% iowait |
18:41
🔗
|
Fusl |
oof |
18:44
🔗
|
nyany |
ouch |
18:47
🔗
|
Fusl |
100% iowait almost always means that something is seriously fucked, as is probably the case here |
18:48
🔗
|
Fusl |
so uh, chfoo Kaz wanna take a look? |
18:48
🔗
|
Kaz |
I have nothing but a phone on me |
18:49
🔗
|
|
Fusl sets mode: +o Kaz |
18:51
🔗
|
Kaz |
I tried, I can't even log in |
18:53
🔗
|
Fusl |
fun |
18:53
🔗
|
Fusl |
yeah with 100% iowait you very certainly wont be able to do anything on it anyway |
18:53
🔗
|
Fusl |
oh look its slowly coming back |
18:55
🔗
|
nyany |
we use puppet in a production environment, one of the servers managed was a rather small kvm. there was an ensure set to make sure x application was running |
18:55
🔗
|
nyany |
problem is, the ensure was misconfigured, so every time it checked, it assumed the app wasnt running, and started it. results were similar to what just happened here. |
18:57
🔗
|
Fusl |
yeah, this looks like a hardware failure to me though |
18:57
🔗
|
Fusl |
disk ops dropped to 0, iowait went up to 100% |
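On Linux, the iowait percentage that monitoring tools report is derived from the aggregate `cpu` line of `/proc/stat`, sampled twice. This sketch takes the two samples as strings so it can be checked without a live system. The sample values are made up. A high iowait only says the CPUs are idle while waiting on I/O. By itself, it doesn't tell a hardware failure apart from an overloaded disk.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat samples.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest/guest_nice are already included in user/nice, so only the first
    eight fields are summed for the total.
    """
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [x - y for x, y in zip(a[:8], b[:8])]
    total = sum(delta)
    return 100.0 * delta[4] / total if total else 0.0


# Made-up samples taken some time apart on a box stuck waiting on its disk.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  110 0 55 810 1025 0 0 0 0 0"
print(iowait_percent(before, after))  # 97.5
```

On a live system, the two strings would be the first line of `/proc/stat` read a second or so apart.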
18:59
🔗
|
|
astrid has quit IRC (Ping timeout: 1212 seconds) |
19:02
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
19:03
🔗
|
Kaz |
Maybe we'll finally fix/replace it |
19:03
🔗
|
Fusl |
i dont think so |
19:03
🔗
|
Fusl |
it will just be a "i restarted it, its working again" |
19:14
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
19:24
🔗
|
|
astrid has joined #archiveteam-bs |
19:41
🔗
|
|
enowaldo has joined #archiveteam-bs |
19:49
🔗
|
jodizzle |
VoynichCr: I added a whole bunch of URLs to https://www.archiveteam.org/index.php/ArchiveBot/Educational_institutions/list, but they haven't been processed into the corresponding table yet. Any idea why? |
19:49
🔗
|
jodizzle |
Is it possible the list is too long? |
19:50
🔗
|
|
wyatt8740 has quit IRC (Ping timeout: 246 seconds) |
19:52
🔗
|
|
ayanami_ has joined #archiveteam-bs |
19:52
🔗
|
JAA |
Update on what I wrote in here a couple days ago about WARC payload digests being incorrect in WARCs produced by wpull and warcio, there was quite a bit of discussion about this on the warcio issue I opened yesterday: https://github.com/webrecorder/warcio/issues/74 TL;DR: "Someone" needs to do a comparison between existing toolery and identify which tools produce payload digests according to the |
19:52
🔗
|
JAA |
standard and which keep the transfer encoding. Then a decision can be made whether software or standard need to be fixed. |
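The disagreement is about which bytes `WARC-Payload-Digest` should cover when the server used chunked transfer coding: the bytes as sent on the wire, or the body after the chunking is removed. This sketch shows that the two choices give different digests for the same response. It doesn't settle which one the standard intends. The chunked body is a made-up example, and content encodings such as gzip, a separate layer, are not handled.

```python
import base64
import hashlib


def sha1_b32(data: bytes) -> str:
    """Digest in the 'sha1:<BASE32>' form commonly used in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-coded body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";", 1)[0], 16)  # drop any chunk extensions
        pos = eol + 2
        if size == 0:
            break
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


raw = b"5\r\nHello\r\n7\r\n, WARC!\r\n0\r\n\r\n"  # body as sent on the wire
payload = dechunk(raw)
print(payload)          # b'Hello, WARC!'
print(sha1_b32(raw))    # digest a tool "keeping the transfer encoding" would write
print(sha1_b32(payload))  # digest over the de-chunked body
```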
19:54
🔗
|
JAA |
chfoo, ivan, PurpleSym: ^ You might be interested in this. |
19:54
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
19:55
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
19:58
🔗
|
|
Ravenloft has quit IRC (Remote host closed the connection) |
20:39
🔗
|
Kaz |
hmm, are any projects active on the tracker atm? |
20:39
🔗
|
JAA |
Only URLTeam I think. |
20:41
🔗
|
JAA |
Well, when the tracker responds, that is. |
20:41
🔗
|
nyany |
i was going to ask something about the "active" projects. tumblr is there still. seems that when i was looking at the stats there's been next to no activity since early april |
20:41
🔗
|
nyany |
is that project still ongoing? |
20:42
🔗
|
JAA |
That's why I moved it to the Hiatus section on the wiki earlier. |
20:43
🔗
|
nyany |
no kidding. must've been just after i was looking |
20:43
🔗
|
nyany |
actually, that doesn't appear to be the case, JAA. |
20:44
🔗
|
JAA |
I did edit it, but the main page is cached. |
20:44
🔗
|
JAA |
Now it should be good. |
20:45
🔗
|
nyany |
jolly good |
20:48
🔗
|
|
enowaldo has joined #archiveteam-bs |
20:57
🔗
|
|
enowaldo has quit IRC (Ping timeout: 252 seconds) |
21:29
🔗
|
ivan |
I am looking for someone who wants to help with YouTube archiving by monitoring everything submitted in #youtubearchive and feeding in current events, so that I can focus on the software stuff for a bit |
21:34
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
21:35
🔗
|
|
tuluu has joined #archiveteam-bs |
21:36
🔗
|
ivan |
the first task involves clicking every link and making sure people aren't gumming up the works with boring gaming or 10 Hour videos |
21:37
🔗
|
ivan |
the second task involves thinking about what is on YouTube when Something Is Happening and should be saved |
21:39
🔗
|
marked |
How many WARC tools are going to be enough to resolve an answer to the digest controversy? |
21:44
🔗
|
JAA |
marked: All of them. Well, all common ones at least. Things in the WARC standard have been driven by implementations, so it should be as complete a picture as possible. As mentioned, my plan is to write a little HTTP server that every author/maintainer can run their tool against, and then the WARCs can be compared against each other. I won't do that immediately though since I also want to hear what |
21:44
🔗
|
JAA |
wumpus has found so far in his investigation to possibly also cover some other pitfalls than just this transfer encoding/payload digest thing. |
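The test-server plan is only described in outline here. One hypothetical minimal version serves a single known response with chunked transfer coding, so that each tool can be pointed at it and the resulting WARCs compared. The handler, the chunk contents, and the port choice are all assumptions, and a real comparison server would likely cover more cases.

```python
import http.server
import threading
import urllib.request

CHUNKS = [b"Hello", b", WARC!"]  # known payload, sent in two chunks


class ChunkedHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer coding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for chunk in CHUNKS:
            self.wfile.write(b"%x\r\n%s\r\n" % (len(chunk), chunk))
        self.wfile.write(b"0\r\n\r\n")  # last chunk, no trailers

    def log_message(self, format, *args):
        pass  # keep the console quiet


def serve(port: int = 0) -> http.server.ThreadingHTTPServer:
    """Start the server in a background thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), ChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    with urllib.request.urlopen(url) as resp:
        print(resp.read())  # b'Hello, WARC!' (urllib removes the chunking)
    server.shutdown()
    server.server_close()
```

A tool under test would fetch the same URL while writing a WARC. The digests it records could then be checked against both the raw chunked bytes and the de-chunked payload.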
21:53
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
22:12
🔗
|
|
enowaldo has joined #archiveteam-bs |
22:14
🔗
|
jodizzle |
ivan: For the first issue, have you considered just restricting access like is done with archivebot? |
22:14
🔗
|
jodizzle |
I guess that probably requires software work. |
22:19
🔗
|
ivan |
jodizzle: I kind of like the unrestricted access |
22:20
🔗
|
ivan |
and giving someone permission doesn't guarantee they'll behave anyway |
22:25
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
22:26
🔗
|
|
DashEqual has joined #archiveteam-bs |
22:47
🔗
|
|
Dj-Wawa has joined #archiveteam-bs |
22:55
🔗
|
|
Zerote_ has quit IRC (Read error: Connection reset by peer) |
23:01
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
23:03
🔗
|
|
tuluu has joined #archiveteam-bs |
23:04
🔗
|
|
astrid has quit IRC (Read error: Operation timed out) |
23:18
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:18
🔗
|
|
enowaldo has joined #archiveteam-bs |
23:22
🔗
|
|
astrid has joined #archiveteam-bs |
23:22
🔗
|
|
Fusl sets mode: +o astrid |
23:28
🔗
|
|
enowaldo has quit IRC (Ping timeout: 268 seconds) |
23:57
🔗
|
|
enowaldo has joined #archiveteam-bs |
23:57
🔗
|
|
PhrackD has quit IRC (Read error: Operation timed out) |
23:59
🔗
|
|
PhrackD has joined #archiveteam-bs |