[00:00] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[00:09] *** apache2 has quit IRC (Remote host closed the connection)
[00:09] *** apache2 has joined #archiveteam-bs
[00:12] Hrm, testing something to try and fix that, but I can't seem to hammer them very hard.
[00:13] Getting some "Bad Gateway" errors now as well, ew.
[00:17] Well, I guess I'm going to stop here. Feel free to ignore the /posts/ URLs on the AB jobs if you think that's better. I'm not sure I'll be around again before the deadline.
[00:18] *** bitBaron has quit IRC (Quit: Bye.)
[00:22] so i threw out most of the pc computing magazines that i have scanned
[00:23] this was just so i can have space for more magazines to scan later
[00:24] a lot of them were water damaged so i don't feel that bad about it
[00:34] *** Jopik has quit IRC (Remote host closed the connection)
[00:34] *** Jopik has joined #archiveteam-bs
[00:34] *** Zerote has quit IRC (Ping timeout: 260 seconds)
[00:55] *** BlueMax has joined #archiveteam-bs
[01:01] *** jut has quit IRC (Read error: Connection reset by peer)
[01:02] *** jut has joined #archiveteam-bs
[01:08] *** enowaldo has joined #archiveteam-bs
[01:20] *** enowaldo has quit IRC (Read error: Operation timed out)
[01:50] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[02:08] *** enowaldo has joined #archiveteam-bs
[02:17] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[02:27] *** dashcloud has joined #archiveteam-bs
[02:56] *** Dimtree has quit IRC ()
[03:00] *** m007a83_ has joined #archiveteam-bs
[03:00] *** drcd_ has joined #archiveteam-bs
[03:02] *** deevious has quit IRC (Ping timeout: 252 seconds)
[03:02] *** coderobe has quit IRC (Read error: Connection reset by peer)
[03:02] *** Flashfire has quit IRC (Read error: Connection reset by peer)
[03:02] *** ColdIce has quit IRC (Quit: Ping timeout (120 seconds))
[03:02] *** Terbium has quit IRC (Ping timeout: 252 seconds)
[03:02] *** deevious has joined #archiveteam-bs
[03:02] *** 
coderobe has joined #archiveteam-bs
[03:02] *** jut has quit IRC (Ping timeout: 252 seconds)
[03:02] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[03:02] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[03:02] *** odemgi_ has joined #archiveteam-bs
[03:03] *** Flashfire has joined #archiveteam-bs
[03:03] *** ColdIce has joined #archiveteam-bs
[03:03] *** kiska has quit IRC (Ping timeout: 252 seconds)
[03:03] *** drcd has quit IRC (Ping timeout: 252 seconds)
[03:03] *** kiska has joined #archiveteam-bs
[03:04] *** svchfoo3 sets mode: +o kiska
[03:04] *** svchfoo1 sets mode: +o kiska
[03:04] *** jut has joined #archiveteam-bs
[03:07] *** Terbium has joined #archiveteam-bs
[03:08] *** Dimtree has joined #archiveteam-bs
[03:15] *** odemgi has joined #archiveteam-bs
[03:17] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[03:24] *** odemg has quit IRC (Ping timeout: 615 seconds)
[03:30] *** odemg has joined #archiveteam-bs
[03:32] *** qw3rty119 has joined #archiveteam-bs
[03:36] *** qw3rty118 has quit IRC (Read error: Operation timed out)
[03:44] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[03:53] *** drcd_ is now known as drcd
[04:45] *** drcd has quit IRC (Read error: Connection reset by peer)
[05:01] *** m007a83_ is now known as m007a83
[05:12] *** enowaldo has joined #archiveteam-bs
[05:24] *** Frogging has quit IRC (Read error: Operation timed out)
[05:24] *** Frogging has joined #archiveteam-bs
[05:24] *** balrog has quit IRC (Read error: Operation timed out)
[05:24] *** closure has quit IRC (Read error: Operation timed out)
[05:24] *** ivan has quit IRC (Read error: Operation timed out)
[05:24] *** JAA has quit IRC (Read error: Operation timed out)
[05:24] *** closure has joined #archiveteam-bs
[05:25] *** wabu has quit IRC (Read error: Operation timed out)
[05:25] *** balrog has joined #archiveteam-bs
[05:25] *** ivan has joined #archiveteam-bs
[05:25] *** simon816 has quit IRC (Ping timeout: 246 seconds)
[05:25] *** 
svchfoo1 has quit IRC (Read error: Operation timed out)
[05:25] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:25] *** Exairnous has quit IRC (Read error: Operation timed out)
[05:25] *** SynMonger has quit IRC (Read error: Operation timed out)
[05:26] *** Exairnous has joined #archiveteam-bs
[05:26] *** fredgido has quit IRC (Ping timeout: 600 seconds)
[05:26] *** c4rc4s has quit IRC (Read error: Operation timed out)
[05:26] *** swebb has quit IRC (Read error: Operation timed out)
[05:26] *** SynMonger has joined #archiveteam-bs
[05:27] *** Hintswen has quit IRC (Ping timeout: 246 seconds)
[05:27] *** Hintswen has joined #archiveteam-bs
[05:28] *** wp494 has quit IRC (Read error: Operation timed out)
[05:28] *** swebb has joined #archiveteam-bs
[05:29] *** tech234a has joined #archiveteam-bs
[05:32] *** wp494 has joined #archiveteam-bs
[05:35] *** c4rc4s has joined #archiveteam-bs
[05:35] *** simon816 has joined #archiveteam-bs
[05:35] *** svchfoo1 has joined #archiveteam-bs
[05:35] *** Fusl sets mode: +o svchfoo1
[05:38] *** JAA has joined #archiveteam-bs
[05:38] *** Fusl sets mode: +o JAA
[05:39] *** bakJAA sets mode: +o JAA
[05:39] *** wabu has joined #archiveteam-bs
[05:53] *** JAA has quit IRC (Read error: Operation timed out)
[05:54] *** wabu has quit IRC (Read error: Operation timed out)
[05:55] *** svchfoo1 has quit IRC (Read error: Operation timed out)
[05:55] *** simon816 has quit IRC (Read error: Operation timed out)
[05:56] *** c4rc4s has quit IRC (Read error: Operation timed out)
[05:58] *** killsushi has quit IRC (Quit: Leaving)
[05:58] *** simon816 has joined #archiveteam-bs
[05:58] *** c4rc4s has joined #archiveteam-bs
[05:59] *** svchfoo1 has joined #archiveteam-bs
[06:01] *** JAA has joined #archiveteam-bs
[06:02] *** wabu has joined #archiveteam-bs
[06:09] *** d5f4a3622 has quit IRC (Read error: Connection reset by peer)
[06:12] *** d5f4a3622 has joined #archiveteam-bs
[06:18] JAA: arkiver: This is what I have, 
https://github.com/kiska3/sola-grab
[06:44] *** BlueMax has joined #archiveteam-bs
[06:52] *** Exairnous has quit IRC (Ping timeout: 265 seconds)
[07:06] *** Mata has quit IRC (Ping timeout: 600 seconds)
[07:13] *** enowaldo has joined #archiveteam-bs
[07:22] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[07:22] SketchCow: i think you need to fix this because these are russian magazines, not english ones: https://archive.org/details/magazines_russian?and[]=languageSorter%3A%22English%22
[07:38] Also someone may have put the rest of Byte Magazine here: https://vintageapple.org/byte/
[07:39] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[08:18] Hrm... I'll run the tracker on my domain, and I can start a crawl, hopefully of sola.ai. I am still trying to write the damn thing
[08:19] *** Reventlov has quit IRC (Quit: WeeChat 2.4)
[08:34] *** jesso has quit IRC (Quit: jesso)
[08:39] *** godane1 has joined #archiveteam-bs
[08:40] *** godane has quit IRC (Ping timeout: 615 seconds)
[08:43] *** jesso has joined #archiveteam-bs
[08:45] *** JAA has quit IRC (Reconnecting)
[08:45] *** JAA has joined #archiveteam-bs
[08:45] *** Fusl sets mode: +o JAA
[08:45] *** bakJAA sets mode: +o JAA
[09:01] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[09:03] *** RichardG has joined #archiveteam-bs
[09:07] *** BlueMax has quit IRC (Quit: Leaving)
[09:19] *** icedice has joined #archiveteam-bs
[09:20] *** PhrackD has quit IRC (Read error: Operation timed out)
[09:20] *** PhrackD has joined #archiveteam-bs
[09:21] *** icedice has quit IRC (Client Quit)
[09:24] JAA: I've put all the code I've written into my repo
[09:35] SketchCow: btw i think the macworld pdfs got redone/restored by vintageapple.org
[09:35] there is like an index in the pdfs now
[09:35] and they're smaller too
[09:39] *** netsound has joined #archiveteam-bs
[09:41] *** icedice has joined #archiveteam-bs
[09:44] *** Odd0002_ has joined #archiveteam-bs
[09:45] *** Odd0002 has quit 
IRC (Ping timeout: 252 seconds)
[09:45] *** Odd0002_ is now known as Odd0002
[09:48] *** deevious has quit IRC (Read error: Connection reset by peer)
[10:30] kiska: Aye, but we only have 1.5 hours left...
[10:37] At least my API job grabbed about 1.6 GB of data: https://archive.fart.website/archivebot/viewer/job/c46az
[10:37] And the others are currently grabbing images and stuff.
[10:38] L133: newurl = "https://api.solacore.net/users/" .. uuid .. "/posts/?limit=30&offset=30"
[10:38] kiska: don't you want to iterate through the offset to get the rest there?
[10:44] SketchCow: and those russian magazines have garbage OCR text because of lang=ENG
[10:55] VADemon: yes I do
[10:56] is it done elsewhere or do you need help?
[10:59] I am coding that right now
[11:05] I just pushed something, can you check if that looks ok?
[11:32] I am running it locally and it looks like it's grabbing the expected content
[11:32] Can someone check this?
[11:35] *** tomaspark has quit IRC (Read error: Operation timed out)
[11:38] *** enowaldo has joined #archiveteam-bs
[11:40] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[11:50] kiska: L142 -> to "local nextpage = string.match(html, "/users/[^/]+/posts/%?limit=(%d+)&offset=(%d+)")"
[11:51] Changing
[11:51] [%d] would be a pattern of digits, same as just %d. If you needed to make a better pattern you could use [%dabcdf]
[11:51] Hold on, just remove the brackets on %d altogether kiska
[11:52] I am on a train so can you give me that line again?
[11:53] https://github.com/kiska3/sola-grab/blob/master/sola.lua#L142
[11:53] remove [] around %d
[11:53] [] is a pattern definition, -+?* don't work inside it. 
You define a pattern [abc] then apply -+?* to it: [abc]+
[11:55] *** enowaldo has quit IRC (Read error: Operation timed out)
[11:59] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[11:59] *** kiska1 has joined #archiveteam-bs
[11:59] Looks like they've killed the servers already
[11:59] *** svchfoo3 sets mode: +o kiska1
[12:00] *** Zerote has joined #archiveteam-bs
[12:07] *** wyatt8740 has joined #archiveteam-bs
[12:15] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[12:20] *** VADemon has quit IRC (Quit: left4dead)
[12:21] *** Despatche has joined #archiveteam-bs
[12:22] *** wp494 has joined #archiveteam-bs
[12:22] F
[12:27] I started grabbing about 2 hrs ago, but I found many mistakes in my code
[12:28] Apparently it's not down. Can someone access sola.ai and see if the content is still there?
[12:29] That being said, wget isn't getting data
[12:29] Uhm yeah, was just about to say that.
[12:29] Still working for me.
[12:29] I see
[12:30] Can you clone my repo and run the warrior pipeline on the machines I gave you?
[12:31] I haven't specified an rsync target in the test tracker so rsync errors are expected. Hetzner instances don't ship with rsync installed so apt install that
[12:31] I've pre-packaged wget lua so there is no need to build it
[12:33] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[12:38] I fucking love tmux synchronize-panes.
[12:38] xD
[12:40] *** wyatt8740 has joined #archiveteam-bs
[12:41] Did you do something on one of the machines? They behave differently.
[12:41] .88 has a pip installed in /usr/local?
[12:42] Huh?
[12:42] I chucked an ao pipeline on it to clear the AB backlog
[12:43] Oh
[12:44] Up and running
[12:47] Actually, taking it down again because I forgot something.
[12:48] Ok, up again for real now.
[12:49] Huh?
[12:50] Had it running directly in SSH instead of in a tmux session.
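The Lua pattern advice from 11:50 to 11:53 and the offset question from 10:38 can be sketched together as follows. This is a minimal sketch, not the code in sola.lua: the sample HTML, the UUID and the helper name next_page_url are assumptions made up for illustration; only the URL layout and the pattern itself come from the log.

```lua
-- Hypothetical sample of what a profile page might link to.
local html = '<a href="/users/abc-123/posts/?limit=30&offset=30">more</a>'

-- %d is already a character class, so the quantifier goes directly on it.
-- "?" is a quantifier in Lua patterns, so the literal one is escaped as %?.
local limit, offset = string.match(html, "/users/[^/]+/posts/%?limit=(%d+)&offset=(%d+)")
print(limit, offset)

-- Brackets are only needed to combine several classes or characters into one set;
-- the quantifier is then applied to the whole set, as in [abc]+.
local id = string.match("id=3f2a-77bc;", "id=([%da-f%-]+)")
print(id)

-- Hypothetical helper: build the URL of the following page instead of
-- hard-coding offset=30, so every page of posts gets queued.
local function next_page_url(uuid, page_limit, page_offset)
  local nextoffset = tonumber(page_offset) + tonumber(page_limit)
  return "https://api.solacore.net/users/" .. uuid
    .. "/posts/?limit=" .. page_limit .. "&offset=" .. nextoffset
end

print(next_page_url("abc-123", limit, offset))
```

Whether the API keeps returning posts past the last page, and how the end of the list is signalled, is not stated in the log, so a real loop would still need a stop condition.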
[12:53] *** odemgi has quit IRC (Read error: Connection reset by peer)
[12:53] *** odemgi has joined #archiveteam-bs
[12:58] Ah I see
[13:05] SketchCow: i noticed the infoworld magazines i uploaded are dark now
[13:06] JAA: Is it grabbing anything?
[13:06] kiska: Nope, tracker rate limited.
[13:07] what i find funny is that they're from google books and the IA/American Libraries collection has their files up: https://archive.org/details/bub_gb_yjAEAAAAMBAJ
[13:07] I am trying, but wget isn't grabbing anything
[13:08] SketchCow: so my question is why can google books rips like those still be up while my rips, which add metadata and fixes, get taken down
[13:15] kiska: Same here, got some jobs now but it isn't grabbing anything.
[13:16] Ah no, it is grabbing stuff, just not printing anything to the console.
[13:16] It's very slow though.
[13:17] ... what have I done...
[13:19] Can you check the code and see if there are any issues with it? Otherwise it'll be down to parsing the html
[13:20] I'll have a look if I see anything obvious.
[13:22] kiska: There's a syntax error in the Lua script.
[13:22] lua: sola.lua:195: '=' expected near 'end'
[13:22] Missing return on line 194
[13:22] I typed this on a train....
[13:23] Also, what's the matter with api%/solacore%.net? Is that slash supposed to be a period?
[13:23] Yeah...
[13:24] an item is a profile?
[13:24] *** icedice has quit IRC (Leaving)
[13:24] Yes
[13:25] latest scan : https://archive.org/details/good-food-magazine-1987-07
[13:31] latest scan : https://archive.org/details/enjoy-your-cockatiel-pet-library
[13:31] latest scan : https://archive.org/details/fin-facts-aquarium-handbook-1992-wardley
[13:33] It looks like 30 connections from me is 502'ing their service xD
[13:34] Yeah, that started happening yesterday evening. I hammered them with 32 connections when I was scraping user profiles.
[13:35] So limit = ~60 connections
[13:37] I don't get how it's producing this url... 
https://api.solacore.net/users/items/posts/?limit=30&offset=30
[13:38] Check the referrer in the WARC.
[13:42] *** ayanami_ has joined #archiveteam-bs
[13:42] https://www.flogao.com.br Check this out. Brazilian site shutting down in June
[13:43] Ahhhhh
[13:44] ..? sorry
[13:45] Thanks for letting us know. I've put it on my list to investigate. kiska may be screaming about something unrelated.
[13:47] ... Referer: https://sola.ai/erenjager
[13:48] :-/
[13:48] Yeah, saw those on the AB job before as well (including &)
[13:49] *** enowaldo has joined #archiveteam-bs
[13:49] Away again for a bit.
[13:50] I am just going to use the ignore-list on that url
[14:17] *** enowaldo has quit IRC (Read error: Operation timed out)
[14:24] JAA: Can you update the machines I gave you with the latest commit? I think I fixed whatever was causing it to not function. I was basically recursing the entirety of sola.ai with this "^https?://sola.ai/" and not clamping down
[14:29] kiska: Yup, up and running again.
[14:31] I clamped it down to "^https?://sola.ai/" .. item_value so hopefully it'll grab things still
[14:34] How are the post URLs treated?
[14:34] Or rather, which post URLs are retrieved?
[14:34] /user/$slug or /posts/$postid ?
[14:35] I see some of the former, but I wonder if that's only those which are linked directly in the HTML of the profile page or also the pagination.
[14:36] I am going to use this in the httploop_result https://pastebin.com/RstQFpdm and in the allowed function I'll include string.match(url, "^https?://sola.ai/posts")
[14:37] But I haven't pushed out that change yet, since I don't know what will occur
[14:39] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[14:42] JAA: Can you check this commit and see if it does allow the /posts/ URLs https://github.com/kiska3/sola-grab/commit/80d3bc57d2062de1659574aea3bab91a17b67432
[14:43] This can never be true, can it? 
https://github.com/kiska3/sola-grab/blob/80d3bc57d2062de1659574aea3bab91a17b67432/sola.lua#L64-L68
[14:43] Yeah I am thinking about it
[14:44] I'd just add another 'or string.match' for /posts.
[14:44] So if I remove the inner if statement, it should become true, and grab the /posts/ url, but if it doesn't match the item_value it'll be rejected on the redirect
[14:46] So if "string.match(url, "^https?://sola%.ai/posts") or string.match(url, "^https?://sola%.ai/" .. item_value)" should be the statement
[14:48] Yeah that should now fix the issue of /posts/ not being grabbed
[14:49] *** wabu has quit IRC (Read error: Operation timed out)
[14:49] Yeah
[14:49] Well let's resume the tracker with the change, I didn't increment the pipeline version, I should probably do that
[14:50] Yes please
[14:51] Now resuming with 20190410.02
[14:52] Broken again
[14:52] lua: sola.lua:198: '=' expected near 'end'
[14:52] *** wabu has joined #archiveteam-bs
[14:53] ... I keep doing that
[14:54] *** icedice has joined #archiveteam-bs
[14:55] Test it with 'lua sola.lua'.
[14:56] If you get something like 'lua: sola.lua:6: bad argument #1 to 'gsub' (string expected, got nil)', at least the syntax's right. ;-)
[14:56] Should now work....
[14:56] Yup, doing something at least.
[14:57] At least we want something, cause it'll grab something xD
[14:57] And I do see /posts URLs in the output.
[14:57] That's the spirit almost 3 hours after the deadline.
[14:58] Well they did post their shutdown statement on April Fools...
[14:58] Yeah
[14:58] Let me requeue the ones without the /posts/ url
[14:59] But they didn't post any update afterwards.
[15:15] *sigh*
[15:15] What is it doing now...
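The URL filter settled on at 14:46 can be sketched roughly like this. It is an illustration under assumptions, not the function in sola.lua: the names escape_pattern and allowed, the trailing slash after /posts and the boundary check after the item name are all additions that the log does not confirm. The point is that "." matches any character in a Lua pattern unless written as %., and that an item name spliced into a pattern should be escaped the same way.

```lua
-- Hypothetical helper: prefix every Lua pattern magic character with "%"
-- so that a literal string can be embedded safely in a pattern.
local function escape_pattern(s)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Hypothetical version of the allow check: accept post URLs, plus URLs that
-- belong to the profile currently being grabbed (item_value).
local function allowed(url, item_value)
  if string.match(url, "^https?://sola%.ai/posts/") then
    return true
  end
  -- Capture whatever follows the item name so that "erenjager" does not
  -- also match "erenjager2".
  local rest = string.match(url, "^https?://sola%.ai/" .. escape_pattern(item_value) .. "(.*)$")
  if rest == nil then
    return false
  end
  return rest == "" or rest:find("^[/?#]") ~= nil
end

print(allowed("https://sola.ai/erenjager", "erenjager"))
print(allowed("https://sola.ai/posts/12345", "erenjager"))
print(allowed("https://sola.ai/someone-else", "erenjager"))
```

Running the file through the interpreter, as suggested at 14:55, or through luac -p, catches syntax errors such as the missing return before the script is pushed to the pipelines.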
[15:21] *** enowaldo has joined #archiveteam-bs
[15:25] *** Verified_ has quit IRC (Quit: Quit)
[15:25] *** Verified_ has joined #archiveteam-bs
[15:27] *** bitspill has quit IRC (Quit: Connection closed for inactivity)
[16:10] *** tech234a has joined #archiveteam-bs
[16:21] *** icedice2 has joined #archiveteam-bs
[16:25] *** icedice has quit IRC (Ping timeout: 252 seconds)
[16:42] *** enowaldo has quit IRC (Read error: Operation timed out)
[16:46] *** PhrackD- has joined #archiveteam-bs
[16:47] *** PhrackD has quit IRC (Read error: Operation timed out)
[16:47] *** PhrackD- is now known as PhrackD
[16:48] I wonder if that Sola shutdown is April Fools after all - let's hope not
[16:48] Might have just been bad timing
[16:52] *** icedice2 has quit IRC (Quit: Leaving)
[16:52] it's past the deadline so xD
[16:53] Also I have no clue what my script is doing now...
[16:53] *** icedice has joined #archiveteam-bs
[17:07] *** Hani111 has joined #archiveteam-bs
[17:07] *** Hani has quit IRC (Read error: Operation timed out)
[17:07] *** Hani111 is now known as Hani
[17:09] *** enowaldo has joined #archiveteam-bs
[17:14] *** Hani111 has joined #archiveteam-bs
[17:16] *** Hani has quit IRC (Ping timeout: 268 seconds)
[17:20] *** Hani111 has quit IRC (Read error: Operation timed out)
[17:21] *** enowaldo has quit IRC (Read error: Operation timed out)
[17:24] *** Hani has joined #archiveteam-bs
[17:29] *** Exairnous has joined #archiveteam-bs
[17:40] VoynichCr: Hi. I wanted to make a wiki page using HadeanEon. How do I do that?
[17:41] VoynichCr: I've already sent you a message on the other channel, but it would be easier to discuss it here.
[17:42] I've made https://www.archiveteam.org/index.php?title=ArchiveBot/Educational_institutions
[17:42] I will eventually make others too.
[17:43] bot just updated the table t3
[17:43] sure, create all pages you need
[17:44] VoynichCr: How do I make the bot create the page?
[17:45] you can't, you have to create the /list, and the mainpage like you did, and wait 1 day
[17:46] Is there a way I can make the bot add more items?
[17:46] modify /list and add more links there
[17:46] So there is no IRC bot to control it, I presume?
[17:46] no
[17:46] And when I archive a website, the bot will automatically update the list?
[17:47] it updates the table, yeah
[17:47] Oh okay! Thanks.
[17:48] you are welcome
[17:48] Does it deduplicate the list?
[17:48] yes
[17:49] Awesome!
[17:50] So how is it added to the ArchiveBot template table on the bottom?
[17:51] t3: you have to add it by hand, click the [e] link
[18:18] *** Exairnous has quit IRC (Ping timeout: 252 seconds)
[18:20] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[18:33] *** Exairnous has joined #archiveteam-bs
[18:48] *** Oddly has quit IRC (Read error: Operation timed out)
[18:51] *** VADemon has joined #archiveteam-bs
[18:54] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[18:58] *** PhrackD has quit IRC (Read error: Operation timed out)
[19:00] *** PhrackD has joined #archiveteam-bs
[19:00] *** enowaldo has joined #archiveteam-bs
[19:25] *** Exairnous has quit IRC (Read error: Operation timed out)
[19:28] *** killsushi has joined #archiveteam-bs
[19:41] VoynichCr: Thanks. So I've added the link to the Educational institutions wiki page. Thanks for the help! I'm just going to have to wait for the HadeanEon bot to update the page. A whole day seems like a long time.
[19:59] *** killsushi has quit IRC (Quit: Leaving)
[20:07] *** Hani has quit IRC (Ping timeout: 255 seconds)
[20:07] kiska JAA : https://sola.ai/ is 503-ing
[20:09] *** PhrackD has quit IRC (Read error: Operation timed out)
[20:09] ERROR 503: Service Unavailable: Back-end server is at capacity.
[20:09] marked: I'll slow it down.
[20:09] *** PhrackD has joined #archiveteam-bs
[20:11] *** Hani has joined #archiveteam-bs
[20:13] slow down what? 
the tracker's not moving
[20:14] oh wait, it's doing check-outs but not check-ins?
[20:15] *** Hani111 has joined #archiveteam-bs
[20:20] *** Hani has quit IRC (Read error: Operation timed out)
[20:20] *** Hani111 is now known as Hani
[21:21] *** wp494 has quit IRC (Ping timeout: 252 seconds)
[21:22] *** wp494 has joined #archiveteam-bs
[21:34] marked: I made it go faster.
[21:35] marked: It's supposed to shut down today. Maybe that's why it's sending out 503s.
[21:46] JAA: For 753f9wxjswuxuz1n687khv2cf, it seems like archive.fo URLs are not loading on the pipeline.
[21:46] But I don't think it should be archiving an archive.
[21:47] I will add `!ig 753f9wxjswuxuz1n687khv2cf ^https?://archive\.fo/`.
[21:52] JAA: I've increased the concurrency of the sola.ai jobs.
[22:19] *** enowaldo has quit IRC (Ping timeout: 265 seconds)
[22:28] So I guess Sola shut down by now?
[22:32] Yup
[22:32] 503
[22:32] "sola.ai is currently unable to handle this request."
[22:33] *** tech234a has joined #archiveteam-bs
[22:34] *** mgrytbak has joined #archiveteam-bs
[22:35] *** BlueMax has joined #archiveteam-bs
[23:08] *** enowaldo has joined #archiveteam-bs
[23:21] *** ndiddy has joined #archiveteam-bs
[23:22] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[23:32] screen -d