[00:21] *** BlueMax has joined #archiveteam-bs
[00:46] *** ndiddy has quit IRC ()
[02:00] *** icedice has quit IRC (Quit: Leaving)
[02:16] *** FlashBack is now known as AlbardinG
[02:17] *** coldice has joined #archiveteam-bs
[03:01] *** pikhq has quit IRC (Read error: Operation timed out)
[03:11] *** pikhq has joined #archiveteam-bs
[03:35] *** kiskabak2 has quit IRC (Quit: The Lounge - https://thelounge.github.io)
[03:40] *** kiskabak has joined #archiveteam-bs
[03:45] test
[03:45] boop
[03:46] :D
[04:01] *** odemg has quit IRC (Ping timeout: 260 seconds)
[04:13] *** odemg has joined #archiveteam-bs
[05:55] I'm going to send an email to the botbot.me guys - see if they are willing to give us a copy of the IRC logs before they shut down.
[06:39] dxrt maybe get them to give us a warc compatible copy?
[06:41] you want them to scrape themselves and produce the HTML fragments they serve?
[06:42] Well it would free up a pipeline slot on #archivebot
[07:03] Someone is uploading a lot of things to FOS via FTP and I'm deleting a lot of it
[07:03] So I hope this is all going to workout
[07:04] I guess you can always see what they are
[07:06] It's crap-warez
[07:06] Like an open-directory barfed
[07:06] Send it my way sketch
[07:06] Oh
[07:06] lol
[07:07] What sort of crapware? Also could it be someone trying to archive something?
[07:09] ffffff
[07:09] Whats up now? still going?
[07:12] I'm posting this in here so that the person responsible sees it and knows what happened.
[07:12] its not me
[07:12] Well not me, as I don't have your ftp details
[07:24] *** TigerbotH has quit IRC (ZNC - http://znc.in)
[07:29] *** TigerbotH has joined #archiveteam-bs
[07:41] i didnt even know sketch had an ftp
[07:50] *** plue has joined #archiveteam-bs
[08:07] *** Mateon1 has quit IRC (Ping timeout: 268 seconds)
[08:07] *** Mateon1 has joined #archiveteam-bs
[08:31] SketchCow: so DTIC archive is at 462k items now
[09:30] *** altlabel_ has quit IRC (Read error: Operation timed out)
[09:36] *** caff has quit IRC (Read error: Connection reset by peer)
[10:52] Not me either.
[10:54] dxrt: Good idea, I've been considering doing that as well. I figured they might not be willing to do it though since they're shutting down due to privacy issues/GDPR. But can't hurt to ask.
[12:28] *** Pixi has quit IRC (Quit: Pixi)
[12:36] *** Pixi has joined #archiveteam-bs
[12:44] Most of the time the FTP is used when I have people who want to upload but it would be a burden for them to learn the ia python interface, or the collection is very large, and I give them a help.
[12:44] They upload the given 30-300gb of material to the FTP cred I give them, and then I do the rest.
[12:44] It's a bustling little depot.
[13:13] *** BlueMax has quit IRC (Quit: Leaving)
[13:26] *** bitBaron has joined #archiveteam-bs
[13:30] *** Pixi has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** superkuh has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** decay has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** swebb has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** Atom__ has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** erin has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** zino has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** Somebody2 has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** phirephly has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** MrRadar has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** twigfoot has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** astrid has quit IRC (hub.efnet.us irc.servercentral.net)
[13:30] *** chirlu` has quit IRC (hub.efnet.us irc.servercentral.net)
[13:32] *** decay_ has joined #archiveteam-bs
[13:35] *** BartoCH_ has joined #archiveteam-bs
[13:37] *** BartoCH has quit IRC (Read error: Connection reset by peer)
[13:41] *** superkuh_ has joined #archiveteam-bs
[13:45] *** decay_ is now known as decay
[14:00] *** Atom__ has joined #archiveteam-bs
[14:00] *** Cameron_D has joined #archiveteam-bs
[14:00] *** Somebody2 has joined #archiveteam-bs
[14:00] *** twigfoot has joined #archiveteam-bs
[14:00] *** Pixi` has joined #archiveteam-bs
[14:00] *** swebb has joined #archiveteam-bs
[14:00] *** erin has joined #archiveteam-bs
[14:00] *** zino has joined #archiveteam-bs
[14:00] *** phirephly has joined #archiveteam-bs
[14:00] *** MrRadar has joined #archiveteam-bs
[14:00] *** astrid has joined #archiveteam-bs
[14:00] *** chirlu` has joined #archiveteam-bs
[14:00] *** irc.servercentral.net sets mode: +oo swebb astrid
[14:01] *** swebb sets mode: +o JAA
[14:01] *** swebb sets mode: +o SketchCow
[14:01] *** swebb sets mode: +o underscor
[14:01] *** swebb sets mode: +o brayden
[14:01] *** swebb sets mode: +o Jonimoose
[14:01] *** swebb sets mode: +o balrog
[14:01] *** swebb sets mode: +o antomatic
[14:01] *** swebb sets mode: +o bakJAA
[14:01] *** swebb sets mode: +o DFJustin
[14:09] *** Atom__ has quit IRC (Read error: Operation timed out)
[14:23] *** faolingfa has joined #archiveteam-bs
[14:24] *** Cameron_D has quit IRC (Read error: Operation timed out)
[14:32] *** zino has quit IRC (Read error: Connection reset by peer)
[14:33] *** zino has joined #archiveteam-bs
[14:36] *** Cameron_D has joined #archiveteam-bs
[14:54] and you're the depot despot
[14:55] *** schbirid has joined #archiveteam-bs
[15:02] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[15:08] *** eprillios has joined #archiveteam-bs
[15:09] *** bitBaron has joined #archiveteam-bs
[15:45] *** godane has quit IRC (Read error: Operation timed out)
[15:59] *** godane has joined #archiveteam-bs
[16:00] *** svchfoo1 sets mode: +o godane
[16:04] *** BartoCH_ has quit IRC (Quit: WeeChat 2.2)
[16:06] *** BartoCH has joined #archiveteam-bs
[16:31] *** wp494 has quit IRC (Ping timeout: 252 seconds)
[16:32] *** wp494 has joined #archiveteam-bs
[16:42] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[16:54] *** bitBaron has joined #archiveteam-bs
[17:05] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[17:06] *** bitBaron has joined #archiveteam-bs
[17:11] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[17:12] *** bitBaron has joined #archiveteam-bs
[17:35] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[17:40] *** Pixi has joined #archiveteam-bs
[17:42] *** bitBaron has joined #archiveteam-bs
[17:52] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[17:52] *** bitBaron has joined #archiveteam-bs
[17:54] *** Pixi` has quit IRC (Read error: Operation timed out)
[18:04] *** superkuh_ is now known as superkuh
[18:28] Are there any python devs who could help me out with some Warrior scripts? I'm stuck at figuring out how to implement a URL parser
[18:31] Hmmm... it might be better to do it on the Lua side....
[18:32] there's a bunch of url parsing functions in urllib.parse from the standard library
[18:34] My main issue is that I'm trying to grab the sitemap - store it as a temp file, then read the URLS from the sitemap and grab those
[18:36] At the moment, the script just grabs the sitemap as a WARC (https://github.com/adinbied/angelfire-grab/blob/master/pipeline.py#L197-L200), is there any way to store that as a temp file as non-warced?
[18:40] run wget for the sitemap in a separate step before WgetDownload?
[18:42] *** bitBaron_ has joined #archiveteam-bs
[18:45] How would I go about doing that? I.E. grabbing it, parsing, and then feeding the URLs back into WgetDownload
[18:47] Here's a shell version of roughly what I'm trying to do: https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.shhttps://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh
[18:47] https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh
[18:49] *** bitBaron has quit IRC (Ping timeout: 492 seconds)
[18:49] that's me btw :D
[18:49] thanks for doing this!
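The 18:32 to 18:45 exchange (fetch the sitemap outside the WARC, parse it, then grab the listed URLs) can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the sitemap is standard sitemaps.org XML already fetched as text, and `sitemap_urls`, `write_url_list` and the host filter are hypothetical names, not part of the angelfire-grab pipeline. Parsing the XML instead of scanning it line by line also avoids breaking when several URLs share one line.

```python
import tempfile
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def sitemap_urls(xml_text, allowed_host):
    """Return <loc> URLs from a sitemap, keeping only those on allowed_host.

    Hypothetical helper: assumes sitemaps.org-style XML already fetched as text.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        # Compare the local tag name so the sitemaps.org namespace is optional.
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            url = elem.text.strip()
            # urllib.parse splits the URL so the host can be checked directly.
            if urlsplit(url).hostname == allowed_host:
                urls.append(url)
    return urls


def write_url_list(urls):
    """Write one URL per line to a temp file and return its path.

    The file is kept (delete=False) so a later step can read it.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(urls) + "\n")
        return f.name
```

The returned path could then be handed to wget through its `--input-file` option in a separate step before WgetDownload, as suggested at 18:40; how such a step is wired into the seesaw pipeline is not shown here and would need checking against the linked `pipeline.py`.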
[19:19] *** schbirid has quit IRC (Read error: Operation timed out)
[19:21] looks like the way pipelines work makes the lua option easier, something like https://0x0.st/sxJI.lua
[19:23] (might break if some sitemaps have multiple urls per line or there are sitemap.xml files in subdirectories)
[19:31] *** Nicu` has joined #archiveteam-bs
[19:33] *** odemg has quit IRC (Ping timeout: 260 seconds)
[20:14] i got the last 2 star trek magazines scanned
[20:18] *** Nicu` has quit IRC (Ping timeout: 255 seconds)
[20:54] https://archive.org/details/star-trek-communicator-issue-142
[21:59] *** bitBaron_ has quit IRC (Quit: Bye.)
[22:04] https://archive.org/details/star-trek-communicator-issue-143
[22:22] *** bitBaron has joined #archiveteam-bs
[22:30] *** coldice has quit IRC (Quit: Leaving)
[23:05] *** antomati_ has joined #archiveteam-bs
[23:05] *** swebb sets mode: +o antomati_
[23:07] *** antomatic has quit IRC (Read error: Operation timed out)
[23:40] *** BlueMax has joined #archiveteam-bs
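For the second caveat raised at 19:23 (sitemap.xml files in subdirectories), one option is to treat sitemaps as a breadth-first queue instead of a single file. This is a hedged sketch, not the Lua approach linked at 19:21. It assumes standard sitemaps.org XML and assumes nested sitemaps are reachable from the starting one, either as entries of a `<sitemapindex>` or as `<loc>` entries ending in `sitemap.xml`. `crawl_sitemaps` and its injected `fetch` callback are hypothetical names. Sitemaps that nothing links to would still be missed.

```python
import xml.etree.ElementTree as ET
from collections import deque


def _local(tag):
    # Strip an optional "{namespace}" prefix from an element tag.
    return tag.rsplit("}", 1)[-1]


def crawl_sitemaps(start_url, fetch):
    """Follow sitemap indexes and nested sitemap.xml files breadth-first.

    `fetch` is a caller-supplied function url -> XML text (hypothetical hook;
    in a real grab it would wrap whatever download step the pipeline uses).
    Returns (page_urls in first-seen order, set of sitemap URLs visited).
    """
    queue = deque([start_url])
    seen = {start_url}
    pages = []
    while queue:
        sitemap_url = queue.popleft()
        root = ET.fromstring(fetch(sitemap_url))
        is_index = _local(root.tag) == "sitemapindex"
        for elem in root.iter():
            if _local(elem.tag) != "loc" or not elem.text:
                continue
            loc = elem.text.strip()
            # Entries of a <sitemapindex>, or any loc that itself names a
            # sitemap.xml, are queued as further sitemaps rather than pages.
            if is_index or loc.endswith("sitemap.xml"):
                if loc not in seen:
                    seen.add(loc)
                    queue.append(loc)
            else:
                pages.append(loc)
    # Drop duplicate pages listed by more than one sitemap, keeping order.
    return list(dict.fromkeys(pages)), seen
```

Injecting `fetch` keeps the traversal logic separate from the download step, so it can be tried against saved files before being connected to a real pipeline.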