[00:07] anyone know if archive.org actually manages to archive twitter content or if its just the javascript loading it
[00:08] No idea what the WBM SPN does (though I'd assume it works), but the ArchiveBot grabs seem to play back fine.
[00:08] Once Twitter rolls out the redesign, that will almost certainly change.
[00:11] whats an SPN
[00:11] save page now
[00:15] I found this other piece of software called gopherbot, I think I may wanna use this for archiving gopherspace but I don't know how to execute haskell code. gopher://gopherspace.de:70/1/menu/Downloads/Gopher_Querying/gopherbot/
[00:19] ahh
[00:19] kiiwii: i dont even know how to open that link
[00:19] kiiwii: is it just raw haskell code?
[00:19] kiiwii: is there a setup.hs file?
[00:20] if you need to compile it, you probably need to install cabal-install, which is the haskell build tool
[00:20] There's a setup.lhs file
[00:21] Config.hs COPYING.txt COPYRIGHT.txt DB.hs DBProcs.hs DirParser.hs gopherbot.cabal.txt gopherbot.hs Makefile.txt NetClient.hs RobotsTxt.hs Setup.lhs Types.hs Utils.hs
[00:21] *** jodizzle has quit IRC (Quit: ZNC 1.7.1 - https://znc.in)
[00:21] Those are the files
[00:21] *** jodizzle has joined #archiveteam-bs
[00:42] SketchCow: film score monthly is getting uploaded: https://archive.org/details/Film_Score_Monthly_Volume_01_Issue_02_1990_06_Vineyard_Haven_US
[01:02] *** SoraUta has joined #archiveteam-bs
[01:09] *** superkuh_ has joined #archiveteam-bs
[02:46] Turns out the gopherbot code is over a decade old meaning it won't compile on a current machine :/
[02:46] So I'm just gonna have to continue using the python code
[02:46] Can you use wget?
[02:46] Oh I think only curl supports gopher
[02:47] So are you trying to archive *every* gopher server?
[02:47] Every gopherhole, yeah
[02:47] why was gopherbot better?
[02:47] I wasn't sure, I wanted to try it out
[02:48] But with the way I'm doing it, I have to type in every site manually.
[02:48] And there's ~300 gopherholes out there
[02:49] Can you recursively download gopher://gopher.quux.org/1/Software/Gopher/servers?
[02:49] I can try.
[02:49] we can put the python code in to the warriors
[02:50] How can I send you the file? Or should I send the link to the github repo?
[02:50] though, really, could just script it also if you don't need more than a few IPs
[02:54] Yeah, I can't download recursively from gopher://gopher.quux.org/1/Software/Gopher/servers
[03:24] *** cerca has quit IRC (Remote host closed the connection)
[04:08] bellsouth.net should probably be scraped at some stage a lot of websites hosted on it
[04:17] Starting to archive gopher.quux.org, this may be a big gopherhole
[04:17] *** odemgi_ has joined #archiveteam-bs
[04:22] *** odemgi has quit IRC (Read error: Operation timed out)
[04:30] *** kiska has quit IRC (Remote host closed the connection)
[04:30] *** Flashfire has quit IRC (Remote host closed the connection)
[04:31] *** Flashfire has joined #archiveteam-bs
[04:31] *** kiska has joined #archiveteam-bs
[04:31] Kiska you reset it did you?
[04:31] *** svchfoo1 sets mode: +o kiska
[04:31] *** svchfoo3 sets mode: +o kiska
[04:32] Huh?
[04:43] *** superkuh_ has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye)
[04:45] *** stapler11 has joined #archiveteam-bs
[04:47] *** odemgi has joined #archiveteam-bs
[04:51] *** odemgi_ has quit IRC (Read error: Connection reset by peer)
[04:55] *** qw3rty has joined #archiveteam-bs
[05:04] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[05:05] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[05:13] I'm happy to report FOS is running "only" about 24 hours behind uploading Archivebot grabs.
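Note on the gopher crawl discussed at 02:46 to 02:54: listing every gopherhole by hand can in principle be avoided by following menu entries. The following is only a minimal sketch, not the Python code mentioned in the channel. It assumes menu lines follow the usual RFC 1436 layout (a type character, then display string, selector, host and port separated by tabs); the names `MenuItem`, `parse_menu`, `fetch` and `submenus` are made up for this illustration.

```python
import socket
from typing import List, NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # "1" = menu, "0" = text file, "i" = info line, ...
    display: str
    selector: str
    host: str
    port: int


def parse_menu(raw: bytes) -> List[MenuItem]:
    """Parse a gopher menu response into items, skipping malformed lines."""
    items = []
    for line in raw.decode("utf-8", errors="replace").split("\n"):
        line = line.rstrip("\r")
        if line == ".":  # a lone dot terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        items.append(MenuItem(line[0], fields[0], fields[1], fields[2], int(fields[3])))
    return items


def fetch(host: str, selector: str, port: int = 70, timeout: float = 30.0) -> bytes:
    """Send one selector to a gopher server and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def submenus(raw: bytes, host: str) -> List[str]:
    """Selectors of menu-type items that stay on the same host."""
    return [i.selector for i in parse_menu(raw) if i.item_type == "1" and i.host == host]
```

A crawler would call `fetch` on a starting selector, queue whatever `submenus` returns, and keep a set of visited selectors so that menus linking back to each other do not loop forever.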
[05:58] SketchCow: so more interesting cover art of japanese manuals are coming
[05:59] mostly cause of Hitachi Microwave ovens in 96xxx area
[06:09] *** HP_Archiv has joined #archiveteam-bs
[06:11] *** Jopik has quit IRC (Read error: Connection reset by peer)
[06:11] *** Jopik has joined #archiveteam-bs
[06:30] *** SoraUta has quit IRC (Remote host closed the connection)
[06:31] *** SoraUta has joined #archiveteam-bs
[07:54] *** killsushi has quit IRC (Quit: Leaving)
[08:36] *** stapler11 has quit IRC (Read error: Connection reset by peer)
[08:46] *** LowLevelM has quit IRC (Read error: Operation timed out)
[08:47] *** LowLevelM has joined #archiveteam-bs
[09:05] *** LowLevelM has quit IRC (Read error: Operation timed out)
[09:34] *** LowLevelM has joined #archiveteam-bs
[09:47] *** trc has joined #archiveteam-bs
[10:15] *** kiska18 has quit IRC (Remote host closed the connection)
[10:15] *** Ryz has quit IRC (Remote host closed the connection)
[10:15] *** kiska18 has joined #archiveteam-bs
[10:16] *** Ryz has joined #archiveteam-bs
[10:16] *** svchfoo3 sets mode: +o kiska18
[10:16] *** svchfoo1 sets mode: +o kiska18
[10:48] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:51] *** Atom__ has joined #archiveteam-bs
[10:57] *** Atom-- has quit IRC (Read error: Operation timed out)
[11:01] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[11:41] *** cerca has joined #archiveteam-bs
[11:57] *** LeighR has joined #archiveteam-bs
[12:09] *** ephemer0l has joined #archiveteam-bs
[12:20] *** Wingy has quit IRC (Remote host closed the connection)
[12:42] *** ephemer0l has quit IRC (Ping timeout: 745 seconds)
[12:44] *** Ravenloft has quit IRC (Read error: Operation timed out)
[12:53] *** qwebirc60 has quit IRC (Quit: Page closed)
[12:53] *** InkArchiv has joined #archiveteam-bs
[13:21] *** SoraUta has quit IRC (Read error: Operation timed out)
[13:35] *** SoraUta has joined #archiveteam-bs
[13:41] *** chazchaz has quit IRC (Read error: Operation timed out)
[13:41] *** chazchaz has joined #archiveteam-bs
[13:46] *** SoraUta has quit IRC (Read error: Operation timed out)
[13:47] *** Sauce has joined #archiveteam-bs
[13:47] *** Sauce is now known as amdk6
[14:10] *** amdk6 has quit IRC (C:\exit.exe)
[14:14] *** superkuh_ has joined #archiveteam-bs
[14:25] *** Wingy has joined #archiveteam-bs
[14:27] *** Wingy has quit IRC (Client Quit)
[14:27] *** Wingy has joined #archiveteam-bs
[14:37] *** ephemer0l has joined #archiveteam-bs
[15:15] re; https://archiveteam.org/index.php?title=Warrior#Can_I_use_whatever_internet_access_for_the_warrior.3F I assume that Pi-Hole (or any other DNS blocklist setup) should not be used? If that's the case it could be made explicit there.
[15:19] Correct, such things should not be used with workers.
[15:20] JAA: Do you know if anyone responded re: my wiki account?
[15:21] Wingy: jrwr can't do it, so you'll have to wait for SketchCow.
[15:21] Okay thanks :)
[15:25] Thanks JAA
[15:26] JAA: Should I change the warrior-dockerfile to always use 1.1.1.1 or 8.8.8.8?
[15:26] (and submit as PR ofc)
[15:30] Wingy: ooo, that's a good idea
[15:30] Some people make their router redirect all DNS traffic to their chosen server, so it may be worth mentioning on the wiki.
[15:30] Wingy: at the very least, it should be an ENV, and set to 1.1.1.1 or 8.8.8.8 as the default
[15:39] yeah, shoot it through
[15:42] *** InkArchiv has quit IRC (Quit: Page closed)
[16:07] *** trc has quit IRC (Quit: Goodbye)
[16:44] SketchCow: this may interest you : http://publ.lib.ru/ARCHIVES/
[16:44] tons of russian stuff
[16:45] Looks like there was an AB job for all of http://www.publ.lib.ru/ two years ago.
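Note on the 15:26 to 15:30 DNS suggestion: the idea of an environment variable with a public resolver as the default could look roughly like the sketch below. This is not taken from the warrior-dockerfile. The variable name `WARRIOR_DNS`, the comma-separated format and the choice to fall back to both 1.1.1.1 and 8.8.8.8 are assumptions made for illustration.

```python
import ipaddress
import os
from typing import List, Mapping

# Fallback resolvers named in the channel; using both is an assumption.
DEFAULT_DNS = ("1.1.1.1", "8.8.8.8")


def dns_servers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read a comma-separated resolver list from WARRIOR_DNS (hypothetical name)."""
    raw = env.get("WARRIOR_DNS", "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    for server in servers:
        ipaddress.ip_address(server)  # raises ValueError on anything that is not an IP
    return servers or list(DEFAULT_DNS)


def resolv_conf(servers: List[str]) -> str:
    """Render the resolver list as resolv.conf 'nameserver' lines."""
    return "".join(f"nameserver {s}\n" for s in servers)
```

As noted at 15:30, a router that redirects all DNS traffic would still override whatever the container is configured to use, so a setting like this cannot guarantee unfiltered resolution.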
[17:09] *** i0npulse has quit IRC (Ping timeout: 248 seconds)
[17:31] ok
[17:54] *** i0npulse has joined #archiveteam-bs
[18:20] *** tech234a has joined #archiveteam-bs
[18:51] *** schbirid has joined #archiveteam-bs
[19:01] *** Ravenloft has joined #archiveteam-bs
[19:14] *** LeighR has quit IRC (Ping timeout: 260 seconds)
[19:21] *** Stilettoo has joined #archiveteam-bs
[19:22] *** Stiletto has quit IRC (Read error: Operation timed out)
[19:39] *** kiiwii has quit IRC (Quit: Konversation terminated!)
[19:39] *** kiiwii has joined #archiveteam-bs
[19:55] *** Ravenloft has quit IRC (Read error: Operation timed out)
[20:30] *** SoraUta has joined #archiveteam-bs
[20:51] *** X-Scale` has joined #archiveteam-bs
[20:59] *** X-Scale has quit IRC (Ping timeout: 610 seconds)
[20:59] *** X-Scale` is now known as X-Scale
[21:03] *** BlueMax has joined #archiveteam-bs
[21:07] *** killsushi has joined #archiveteam-bs
[21:14] *** schbirid has quit IRC (Quit: Leaving)
[21:25] *** ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[21:46] *** ephemer0l has joined #archiveteam-bs
[21:50] *** mtntmnky has quit IRC (Remote host closed the connection)
[21:50] *** mtntmnky has joined #archiveteam-bs
[22:01] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[22:03] *** ShellyRol has joined #archiveteam-bs
[22:05] *** Stilettoo has quit IRC (Read error: Operation timed out)
[22:08] *** Stiletto has joined #archiveteam-bs
[22:08] *** kode54 has quit IRC (Quit: Ping timeout (120 seconds))
[22:18] *** kode54 has joined #archiveteam-bs
[22:39] http://www.freedb.org/en/download__database.10.html "Here you can download the freedb database."
[22:39] 1st mirror doesn't work for me, but 2nd does - updates as recently as half a month ago
[22:42] SketchCow: So i found more scans of macformat magazine
[22:42] they're on macintoshgarden.org
[22:49] *** DFJustin has quit IRC (Remote host closed the connection)
[22:52] So yeah, we can throw freedb into AB, but I don't think it'll grab much.
[22:53] *** DFJustin has joined #archiveteam-bs
[22:53] We'd need to generate all the possible URLs from the DB file probably.
[22:53] And then grab those with !ao <.
[22:53] (Or some other method if there are too many.)
[22:58] *** Stilettoo has joined #archiveteam-bs
[22:58] JAA: yeah, we can do a little project for it
[22:59] isn't there many many millions of URLs/
[22:59] ?
[22:59] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:59] Assuming records are evenly distributed in the tars, there are something like 500 000 records
[23:00] Hmm, that must be wrong, though
[23:01] Wikipedia says 2 000 000
[23:01] ... in 2006
[23:01] Yes
[23:04] There are 2**32 possible CDDB IDs, so we could in theory bruteforce that, but let's not.
[23:08] Checking the latest .tar.bz2 now.
[23:14] > lsar freedb-complete-20191203.tar.bz2 | grep -c '^data/[0-9a-f]\{8\}'
[23:14] 117667
[23:14] Uhm...
[23:17] *** Stiletto has joined #archiveteam-bs
[23:19] *** Stilettoo has quit IRC (Ping timeout: 258 seconds)
[23:23] Oh, sections, nvm.
[23:24] 3923817 entries
[23:24] Easy
[23:25] I'll qwarc this.
[23:26] As I'm reading it, each request involves the client's username, hostname, & software name & version
[23:26] Yeah, just found that as well.
[23:27] arkiver: ^ That might need a patch in the WBM if we want it to be possible to just plug the WBM into tools to continue using the freedb database from there.
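Note on the 23:14 to 23:24 count: the first `grep` only matched the `data/` section, which is why it came out far too low. A rough Python equivalent that tallies every section is sketched below. It assumes the archive stores each record as `<section>/<8 hex digit disc ID>`, which is inferred from the grep pattern and the "sections" remark rather than checked against the tarball; `count_entries` is a name made up for this sketch.

```python
import re
import tarfile
from collections import Counter

# Assumed member layout: "<section>/<8 hex digits>", optionally prefixed with "./".
ENTRY = re.compile(r"^(?:\./)?([a-z]+)/([0-9a-f]{8})$")


def count_entries(tar: tarfile.TarFile) -> Counter:
    """Count record files per section in an already opened tar archive."""
    counts = Counter()
    for member in tar:
        if not member.isfile():
            continue
        match = ENTRY.match(member.name)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # "r|bz2" streams the archive once instead of seeking around in it.
    with tarfile.open("freedb-complete-20191203.tar.bz2", "r|bz2") as archive:
        per_section = count_entries(archive)
    print(per_section)
    print(sum(per_section.values()), "entries in total")
```

The same loop could emit `(section, disc ID)` pairs instead of counts, which is the input needed to generate the list of URLs to grab.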
[23:35] URLs look like this: http://freedb.freedb.org/~cddb/cddb.cgi?cmd=cddb+query+21037703+3+150+21592+47662+889&hello=user+host+application+v0.0&proto=6 http://freedb.freedb.org/~cddb/cddb.cgi?cmd=cddb+read+data+21037703&hello=user+host+application+v0.0&proto=6
[23:35] CDDB documentation available at http://ftp.freedb.org/pub/freedb/latest/CDDBPROTO
[23:36] Essentially, the "hello" parameter would have to be ignored in the WBM.
[23:39] *** OrIdow6 has quit IRC (Remote host closed the connection)
[23:45] you guys rock
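Note on the 23:35 to 23:36 messages: the sketch below builds a `cddb read` URL in the shape quoted above and shows what ignoring the `hello` parameter could mean when comparing URLs. It only illustrates the idea and says nothing about how the WBM or qwarc actually work. The function names are made up here, and the placeholder `hello` value is copied from the example URL.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE = "http://freedb.freedb.org/~cddb/cddb.cgi"
HELLO = "user host application v0.0"  # placeholder client identification


def read_url(category: str, disc_id: str, hello: str = HELLO, proto: int = 6) -> str:
    """Build the URL for a 'cddb read <category> <disc_id>' request."""
    if not re.fullmatch(r"[0-9a-f]{8}", disc_id):
        raise ValueError(f"not an 8 hex digit disc ID: {disc_id!r}")
    params = [
        ("cmd", f"cddb read {category} {disc_id}"),
        ("hello", hello),
        ("proto", str(proto)),
    ]
    return BASE + "?" + urlencode(params)  # urlencode turns spaces into '+'


def without_hello(url: str) -> str:
    """Drop the client-specific 'hello' parameter so equivalent requests compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "hello"]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With a lookup keyed on `without_hello(url)`, two clients that send different user, host and application strings would resolve to the same archived record.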