| Time |
Nickname |
Message |
|
00:07
🔗
|
bluefoo_ |
anyone know if archive.org actually manages to archive twitter content or if it's just the javascript loading it |
|
00:08
🔗
|
JAA |
No idea what the WBM SPN does (though I'd assume it works), but the ArchiveBot grabs seem to play back fine. |
|
00:08
🔗
|
JAA |
Once Twitter rolls out the redesign, that will almost certainly change. |
|
00:11
🔗
|
bluefoo_ |
whats an SPN |
|
00:11
🔗
|
Kaz |
save page now |
|
00:15
🔗
|
kiiwii |
I found this other piece of software called gopherbot, I think I may wanna use this for archiving gopherspace but I don't know how to execute haskell code. gopher://gopherspace.de:70/1/menu/Downloads/Gopher_Querying/gopherbot/ |
|
00:19
🔗
|
bluefoo_ |
ahh |
|
00:19
🔗
|
bluefoo_ |
kiiwii: i dont even know how to open that link |
|
00:19
🔗
|
bluefoo_ |
kiiwii: is it just raw haskell code? |
|
00:19
🔗
|
bluefoo_ |
kiiwii: is there a setup.hs file? |
|
00:20
🔗
|
bluefoo_ |
if you need to compile it, you probably need to install cabal-install, which is the haskell build tool |
|
00:20
🔗
|
kiiwii |
There's a setup.lhs file |
|
00:21
🔗
|
kiiwii |
Config.hs COPYING.txt COPYRIGHT.txt DB.hs DBProcs.hs DirParser.hs gopherbot.cabal.txt gopherbot.hs Makefile.txt NetClient.hs RobotsTxt.hs Setup.lhs Types.hs Utils.hs |
|
00:21
🔗
|
|
jodizzle has quit IRC (Quit: ZNC 1.7.1 - https://znc.in) |
|
00:21
🔗
|
kiiwii |
Those are the files |
|
00:21
🔗
|
|
jodizzle has joined #archiveteam-bs |
|
00:42
🔗
|
godane |
SketchCow: film score monthly is getting uploaded: https://archive.org/details/Film_Score_Monthly_Volume_01_Issue_02_1990_06_Vineyard_Haven_US |
|
01:02
🔗
|
|
SoraUta has joined #archiveteam-bs |
|
01:09
🔗
|
|
superkuh_ has joined #archiveteam-bs |
|
02:46
🔗
|
kiiwii |
Turns out the gopherbot code is over a decade old meaning it won't compile on a current machine :/ |
|
02:46
🔗
|
kiiwii |
So I'm just gonna have to continue using the python code |
|
02:46
🔗
|
Wingy |
Can you use wget? |
|
02:46
🔗
|
Wingy |
Oh I think only curl supports gopher |
|
02:47
🔗
|
Wingy |
So are you trying to archive *every* gopher server? |
|
02:47
🔗
|
kiiwii |
Every gopherhole, yeah |
|
02:47
🔗
|
markedL |
why was gopherbot better? |
|
02:47
🔗
|
kiiwii |
I wasn't sure, I wanted to try it out |
|
02:48
🔗
|
kiiwii |
But with the way I'm doing it, I have to type in every site manually. |
|
02:48
🔗
|
kiiwii |
And there's ~300 gopherholes out there |
|
02:49
🔗
|
Wingy |
Can you recursively download gopher://gopher.quux.org/1/Software/Gopher/servers? |
|
02:49
🔗
|
kiiwii |
I can try. |
|
02:49
🔗
|
markedL |
we can put the python code into the warriors |
|
02:50
🔗
|
kiiwii |
How can I send you the file? Or should I send the link to the github repo? |
|
02:50
🔗
|
markedL |
though, really, could just script it also if you don't need more than a few IPs |
|
02:54
🔗
|
kiiwii |
Yeah, I can't download recursively from gopher://gopher.quux.org/1/Software/Gopher/servers |
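The recursive download being discussed comes down to fetching a menu, parsing its lines, and following the submenu entries. Below is a minimal sketch of that idea in Python, assuming the menu format described in RFC 1436 (a type character, then tab-separated display string, selector, host and port). The names `MenuItem`, `fetch`, `parse_menu` and `submenus` are made up for illustration; this is not kiiwii's Python code and not gopherbot.

```python
import socket
from typing import List, NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # single character: "0" text file, "1" submenu, "i" info line, ...
    display: str
    selector: str
    host: str
    port: int


def fetch(host: str, port: int = 70, selector: str = "", timeout: float = 30.0) -> str:
    """Send a selector to a Gopher server and return everything it sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_menu(raw: str) -> List[MenuItem]:
    """Parse the text of a Gopher menu into items, skipping malformed lines."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # a lone dot terminates the listing
            break
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].strip().isdigit():
            continue
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items


def submenus(items: List[MenuItem], host: str) -> List[MenuItem]:
    """Menu entries on the same host, i.e. what a recursive crawl would follow."""
    return [i for i in items if i.item_type == "1" and i.host == host]
```

A crawl would call `fetch` on each selector returned by `submenus`, keep a set of selectors already visited, and save the raw responses. Entries pointing at other hosts could be collected separately to build the list of gopherholes instead of typing them in by hand.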
|
03:24
🔗
|
|
cerca has quit IRC (Remote host closed the connection) |
|
04:08
🔗
|
Flashfire |
bellsouth.net should probably be scraped at some stage; a lot of websites are hosted on it |
|
04:17
🔗
|
kiiwii |
Starting to archive gopher.quux.org, this may be a big gopherhole |
|
04:17
🔗
|
|
odemgi_ has joined #archiveteam-bs |
|
04:22
🔗
|
|
odemgi has quit IRC (Read error: Operation timed out) |
|
04:30
🔗
|
|
kiska has quit IRC (Remote host closed the connection) |
|
04:30
🔗
|
|
Flashfire has quit IRC (Remote host closed the connection) |
|
04:31
🔗
|
|
Flashfire has joined #archiveteam-bs |
|
04:31
🔗
|
|
kiska has joined #archiveteam-bs |
|
04:31
🔗
|
Flashfire |
Kiska you reset it did you? |
|
04:31
🔗
|
|
svchfoo1 sets mode: +o kiska |
|
04:31
🔗
|
|
svchfoo3 sets mode: +o kiska |
|
04:32
🔗
|
kiska |
Huh? |
|
04:43
🔗
|
|
superkuh_ has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye) |
|
04:45
🔗
|
|
stapler11 has joined #archiveteam-bs |
|
04:47
🔗
|
|
odemgi has joined #archiveteam-bs |
|
04:51
🔗
|
|
odemgi_ has quit IRC (Read error: Connection reset by peer) |
|
04:55
🔗
|
|
qw3rty has joined #archiveteam-bs |
|
05:04
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
|
05:05
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
|
05:13
🔗
|
SketchCow |
I'm happy to report FOS is running "only" about 24 hours behind uploading Archivebot grabs. |
|
05:58
🔗
|
godane |
SketchCow: so more interesting cover art of japanese manuals are coming |
|
05:59
🔗
|
godane |
mostly cause of Hitachi Microwave ovens in 96xxx area |
|
06:09
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
|
06:11
🔗
|
|
Jopik has quit IRC (Read error: Connection reset by peer) |
|
06:11
🔗
|
|
Jopik has joined #archiveteam-bs |
|
06:30
🔗
|
|
SoraUta has quit IRC (Remote host closed the connection) |
|
06:31
🔗
|
|
SoraUta has joined #archiveteam-bs |
|
07:54
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
|
08:36
🔗
|
|
stapler11 has quit IRC (Read error: Connection reset by peer) |
|
08:46
🔗
|
|
LowLevelM has quit IRC (Read error: Operation timed out) |
|
08:47
🔗
|
|
LowLevelM has joined #archiveteam-bs |
|
09:05
🔗
|
|
LowLevelM has quit IRC (Read error: Operation timed out) |
|
09:34
🔗
|
|
LowLevelM has joined #archiveteam-bs |
|
09:47
🔗
|
|
trc has joined #archiveteam-bs |
|
10:15
🔗
|
|
kiska18 has quit IRC (Remote host closed the connection) |
|
10:15
🔗
|
|
Ryz has quit IRC (Remote host closed the connection) |
|
10:15
🔗
|
|
kiska18 has joined #archiveteam-bs |
|
10:16
🔗
|
|
Ryz has joined #archiveteam-bs |
|
10:16
🔗
|
|
svchfoo3 sets mode: +o kiska18 |
|
10:16
🔗
|
|
svchfoo1 sets mode: +o kiska18 |
|
10:48
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
|
10:51
🔗
|
|
Atom__ has joined #archiveteam-bs |
|
10:57
🔗
|
|
Atom-- has quit IRC (Read error: Operation timed out) |
|
11:01
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
|
11:41
🔗
|
|
cerca has joined #archiveteam-bs |
|
11:57
🔗
|
|
LeighR has joined #archiveteam-bs |
|
12:09
🔗
|
|
ephemer0l has joined #archiveteam-bs |
|
12:20
🔗
|
|
Wingy has quit IRC (Remote host closed the connection) |
|
12:42
🔗
|
|
ephemer0l has quit IRC (Ping timeout: 745 seconds) |
|
12:44
🔗
|
|
Ravenloft has quit IRC (Read error: Operation timed out) |
|
12:53
🔗
|
|
qwebirc60 has quit IRC (Quit: Page closed) |
|
12:53
🔗
|
|
InkArchiv has joined #archiveteam-bs |
|
13:21
🔗
|
|
SoraUta has quit IRC (Read error: Operation timed out) |
|
13:35
🔗
|
|
SoraUta has joined #archiveteam-bs |
|
13:41
🔗
|
|
chazchaz has quit IRC (Read error: Operation timed out) |
|
13:41
🔗
|
|
chazchaz has joined #archiveteam-bs |
|
13:46
🔗
|
|
SoraUta has quit IRC (Read error: Operation timed out) |
|
13:47
🔗
|
|
Sauce has joined #archiveteam-bs |
|
13:47
🔗
|
|
Sauce is now known as amdk6 |
|
14:10
🔗
|
|
amdk6 has quit IRC (C:\exit.exe) |
|
14:14
🔗
|
|
superkuh_ has joined #archiveteam-bs |
|
14:25
🔗
|
|
Wingy has joined #archiveteam-bs |
|
14:27
🔗
|
|
Wingy has quit IRC (Client Quit) |
|
14:27
🔗
|
|
Wingy has joined #archiveteam-bs |
|
14:37
🔗
|
|
ephemer0l has joined #archiveteam-bs |
|
15:15
🔗
|
SootBectr |
re; https://archiveteam.org/index.php?title=Warrior#Can_I_use_whatever_internet_access_for_the_warrior.3F I assume that Pi-Hole (or any other DNS blocklist setup) should not be used? If that's the case it could be made explicit there. |
|
15:19
🔗
|
JAA |
Correct, such things should not be used with workers. |
|
15:20
🔗
|
Wingy |
JAA: Do you know if anyone responded re: my wiki account? |
|
15:21
🔗
|
JAA |
Wingy: jrwr can't do it, so you'll have to wait for SketchCow. |
|
15:21
🔗
|
Wingy |
Okay thanks :) |
|
15:25
🔗
|
SootBectr |
Thanks JAA |
|
15:26
🔗
|
Wingy |
JAA: Should I change the warrior-dockerfile to always use 1.1.1.1 or 8.8.8.8? |
|
15:26
🔗
|
Wingy |
(and submit as PR ofc) |
|
15:30
🔗
|
LeighR |
Wingy: ooo, that's a good idea |
|
15:30
🔗
|
SootBectr |
Some people make their router redirect all DNS traffic to their chosen server, so it may be worth mentioning on the wiki. |
|
15:30
🔗
|
LeighR |
Wingy: at the very least, it should be an ENV, and set to 1.1.1.1 or 8.8.8.8 as the default |
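A rough sketch of the "ENV with a default" idea, in Python. Everything here is an assumption for illustration: the variable name `DNS_SERVERS` is hypothetical, and nothing in the log says how the warrior-dockerfile would actually apply the setting. The sketch only shows reading a comma-separated override and falling back to the two public resolvers mentioned above.

```python
import ipaddress
import os

# Defaults suggested in the discussion above.
DEFAULT_RESOLVERS = ("1.1.1.1", "8.8.8.8")


def resolvers_from_env(env=os.environ, var="DNS_SERVERS"):
    """Return the resolver list from a (hypothetical) env variable, or the defaults."""
    raw = env.get(var, "")
    candidates = [s.strip() for s in raw.split(",") if s.strip()]
    if not candidates:
        return list(DEFAULT_RESOLVERS)
    for candidate in candidates:
        ipaddress.ip_address(candidate)  # raises ValueError if it is not an IP address
    return candidates


def resolv_conf(servers):
    """Render the servers as resolv.conf-style 'nameserver' lines."""
    return "".join(f"nameserver {s}\n" for s in servers)
```

With the variable unset, `resolv_conf(resolvers_from_env({}))` yields two `nameserver` lines for 1.1.1.1 and 8.8.8.8. As noted in the channel, a router that redirects all DNS traffic would still override this.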
|
15:39
🔗
|
Kaz |
yeah, shoot it through |
|
15:42
🔗
|
|
InkArchiv has quit IRC (Quit: Page closed) |
|
16:07
🔗
|
|
trc has quit IRC (Quit: Goodbye) |
|
16:44
🔗
|
godane |
SketchCow: this may interest you : http://publ.lib.ru/ARCHIVES/ |
|
16:44
🔗
|
godane |
tons of russian stuff |
|
16:45
🔗
|
JAA |
Looks like there was an AB job for all of http://www.publ.lib.ru/ two years ago. |
|
17:09
🔗
|
|
i0npulse has quit IRC (Ping timeout: 248 seconds) |
|
17:31
🔗
|
godane |
ok |
|
17:54
🔗
|
|
i0npulse has joined #archiveteam-bs |
|
18:20
🔗
|
|
tech234a has joined #archiveteam-bs |
|
18:51
🔗
|
|
schbirid has joined #archiveteam-bs |
|
19:01
🔗
|
|
Ravenloft has joined #archiveteam-bs |
|
19:14
🔗
|
|
LeighR has quit IRC (Ping timeout: 260 seconds) |
|
19:21
🔗
|
|
Stilettoo has joined #archiveteam-bs |
|
19:22
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
|
19:39
🔗
|
|
kiiwii has quit IRC (Quit: Konversation terminated!) |
|
19:39
🔗
|
|
kiiwii has joined #archiveteam-bs |
|
19:55
🔗
|
|
Ravenloft has quit IRC (Read error: Operation timed out) |
|
20:30
🔗
|
|
SoraUta has joined #archiveteam-bs |
|
20:51
🔗
|
|
X-Scale` has joined #archiveteam-bs |
|
20:59
🔗
|
|
X-Scale has quit IRC (Ping timeout: 610 seconds) |
|
20:59
🔗
|
|
X-Scale` is now known as X-Scale |
|
21:03
🔗
|
|
BlueMax has joined #archiveteam-bs |
|
21:07
🔗
|
|
killsushi has joined #archiveteam-bs |
|
21:14
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
|
21:25
🔗
|
|
ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
|
21:46
🔗
|
|
ephemer0l has joined #archiveteam-bs |
|
21:50
🔗
|
|
mtntmnky has quit IRC (Remote host closed the connection) |
|
21:50
🔗
|
|
mtntmnky has joined #archiveteam-bs |
|
22:01
🔗
|
|
ShellyRol has quit IRC (Read error: Connection reset by peer) |
|
22:03
🔗
|
|
ShellyRol has joined #archiveteam-bs |
|
22:05
🔗
|
|
Stilettoo has quit IRC (Read error: Operation timed out) |
|
22:08
🔗
|
|
Stiletto has joined #archiveteam-bs |
|
22:08
🔗
|
|
kode54 has quit IRC (Quit: Ping timeout (120 seconds)) |
|
22:18
🔗
|
|
kode54 has joined #archiveteam-bs |
|
22:39
🔗
|
OrIdow6 |
http://www.freedb.org/en/download__database.10.html "Here you can download the freedb database." |
|
22:39
🔗
|
OrIdow6 |
1st mirror doesn't work for me, but 2nd does - updates as recently as half a month ago |
|
22:42
🔗
|
godane |
SketchCow: So i found more scans of macformat magazine |
|
22:42
🔗
|
godane |
they're on macintoshgarden.org |
|
22:49
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
|
22:52
🔗
|
JAA |
So yeah, we can throw freedb into AB, but I don't think it'll grab much. |
|
22:53
🔗
|
|
DFJustin has joined #archiveteam-bs |
|
22:53
🔗
|
JAA |
We'd need to generate all the possible URLs from the DB file probably. |
|
22:53
🔗
|
JAA |
And then grab those with !ao <. |
|
22:53
🔗
|
JAA |
(Or some other method if there are too many.) |
|
22:58
🔗
|
|
Stilettoo has joined #archiveteam-bs |
|
22:58
🔗
|
arkiver |
JAA: yeah, we can do a little project for it |
|
22:59
🔗
|
arkiver |
isn't there many many millions of URLs/ |
|
22:59
🔗
|
arkiver |
? |
|
22:59
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
|
22:59
🔗
|
OrIdow6 |
Assuming records are evenly distributed in the tars, there are something like 500 000 records |
|
23:00
🔗
|
OrIdow6 |
Hmm, that must be wrong, though |
|
23:01
🔗
|
OrIdow6 |
Wikipedia says 2 000 000 |
|
23:01
🔗
|
JAA |
... in 2006 |
|
23:01
🔗
|
OrIdow6 |
Yes |
|
23:04
🔗
|
JAA |
There are 2**32 possible CDDB IDs, so we could in theory bruteforce that, but let's not. |
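For context on where the 2**32 (4,294,967,296) comes from: a CDDB disc ID is a 32-bit value packed from a checksum byte, the playing time in seconds, and the track count. The sketch below follows the commonly documented disc ID algorithm; the function names are illustrative, and the sample numbers are the ones from the `cddb query` URL quoted later in this log.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))


def cddb_disc_id(offsets, total_seconds: int) -> int:
    """Compute a CDDB disc ID.

    offsets: track start offsets in frames (75 frames per second).
    total_seconds: position of the lead-out in seconds.
    """
    starts = [o // 75 for o in offsets]               # track starts in whole seconds
    checksum = sum(digit_sum(s) for s in starts) % 255
    playing_time = total_seconds - starts[0]          # measured from the first track
    return (checksum << 24) | (playing_time << 8) | len(offsets)


# Values taken from the query URL in this log: 3 tracks, lead-out at 889 s.
print(format(cddb_disc_id([150, 21592, 47662], 889), "08x"))  # prints 21037703
```

Because the low byte is the track count and the middle 16 bits are the playing time, real IDs cluster in a small part of the 32-bit space, which is one more reason a blind bruteforce would mostly produce misses.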
|
23:08
🔗
|
JAA |
Checking the latest .tar.bz2 now. |
|
23:14
🔗
|
JAA |
> lsar freedb-complete-20191203.tar.bz2 | grep -c '^data/[0-9a-f]\{8\}' |
|
23:14
🔗
|
JAA |
117667 |
|
23:14
🔗
|
JAA |
Uhm... |
|
23:17
🔗
|
|
Stiletto has joined #archiveteam-bs |
|
23:19
🔗
|
|
Stilettoo has quit IRC (Ping timeout: 258 seconds) |
|
23:23
🔗
|
JAA |
Oh, sections, nvm. |
|
23:24
🔗
|
JAA |
3923817 entries |
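The first count appears to have matched only the `data` section, since the grep pattern started with `^data/`. A sketch of counting entries across all sections in Python, assuming the archive stores one file per disc as `<section>/<8 hex digits>`; that layout is inferred from the grep above and is not confirmed anywhere in the log.

```python
import re
import tarfile
from collections import Counter

# Assumed member layout: "<section>/<8 lowercase hex digits>", optionally with "./" in front.
ENTRY = re.compile(r"^(?:\./)?([^/]+)/([0-9a-f]{8})$")


def count_entries(stream) -> Counter:
    """Count disc entries per section in a (possibly compressed) tar stream."""
    counts = Counter()
    # "r|*" reads the archive sequentially and detects the compression itself.
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:
            match = ENTRY.match(member.name)
            if match and member.isfile():
                counts[match.group(1)] += 1
    return counts
```

Usage would look like `with open("freedb-complete-20191203.tar.bz2", "rb") as f: counts = count_entries(f)`, after which `sum(counts.values())` gives the total and `counts` the per-section breakdown.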
|
23:24
🔗
|
JAA |
Easy |
|
23:25
🔗
|
JAA |
I'll qwarc this. |
|
23:26
🔗
|
OrIdow6 |
As I'm reading it, each request involves the client's username, hostname, & software name & version |
|
23:26
🔗
|
JAA |
Yeah, just found that as well. |
|
23:27
🔗
|
JAA |
arkiver: ^ That might need a patch in the WBM if we want it to be possible to just plug the WBM into tools to continue using the freedb database from there. |
|
23:35
🔗
|
JAA |
URLs look like this: http://freedb.freedb.org/~cddb/cddb.cgi?cmd=cddb+query+21037703+3+150+21592+47662+889&hello=user+host+application+v0.0&proto=6 http://freedb.freedb.org/~cddb/cddb.cgi?cmd=cddb+read+data+21037703&hello=user+host+application+v0.0&proto=6 |
|
23:35
🔗
|
JAA |
CDDB documentation available at http://ftp.freedb.org/pub/freedb/latest/CDDBPROTO |
|
23:36
🔗
|
JAA |
Essentially, the "hello" parameter would have to be ignored in the WBM. |
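To make the point about the `hello` parameter concrete, here is a small Python sketch that builds a `cddb read` URL in the shape shown above and then strips `hello` so that requests from different clients map to the same key. This only illustrates the idea; it says nothing about how the WBM canonicalizes URLs internally, and the function names are invented for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE = "http://freedb.freedb.org/~cddb/cddb.cgi"


def read_url(category: str, disc_id: str,
             hello: str = "user host application v0.0", proto: int = 6) -> str:
    """Build a 'cddb read' URL; urlencode turns the spaces into '+'."""
    query = urlencode({
        "cmd": f"cddb read {category} {disc_id}",
        "hello": hello,
        "proto": proto,
    })
    return f"{BASE}?{query}"


def without_hello(url: str) -> str:
    """Drop the client-specific 'hello' parameter, keeping everything else in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "hello"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


archived = read_url("data", "21037703")
replayed = read_url("data", "21037703", hello="alice laptop someplayer v1.2")
print(without_hello(archived) == without_hello(replayed))  # prints True
```

With the placeholder `hello` value, `read_url("data", "21037703")` reproduces the second URL quoted above, so a grab made with one fixed `hello` could in principle serve any client once that parameter is ignored on lookup.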
|
23:39
🔗
|
|
OrIdow6 has quit IRC (Remote host closed the connection) |
|
23:45
🔗
|
anarcat |
you guys rock |