#archiveteam-bs 2018-09-17,Mon


Time Nickname Message
00:21 🔗 BlueMax has joined #archiveteam-bs
00:46 🔗 ndiddy has quit IRC ()
02:00 🔗 icedice has quit IRC (Quit: Leaving)
02:16 🔗 FlashBack is now known as AlbardinG
02:17 🔗 coldice has joined #archiveteam-bs
03:01 🔗 pikhq has quit IRC (Read error: Operation timed out)
03:11 🔗 pikhq has joined #archiveteam-bs
03:35 🔗 kiskabak2 has quit IRC (Quit: The Lounge - https://thelounge.github.io)
03:40 🔗 kiskabak has joined #archiveteam-bs
03:45 🔗 w0rmhole test
03:45 🔗 astrid boop
03:46 🔗 w0rmhole :D
04:01 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
04:13 🔗 odemg has joined #archiveteam-bs
05:55 🔗 dxrt I'm going to send an email to the botbot.me guys - see if they are willing to give us a copy of the IRC logs before they shut down.
06:39 🔗 kiska dxrt maybe get them to give us a WARC-compatible copy?
06:41 🔗 ivan you want them to scrape themselves and produce the HTML fragments they serve?
06:42 🔗 kiska Well it would free up a pipeline slot on #archivebot
07:03 🔗 SketchCow Someone is uploading a lot of things to FOS via FTP and I'm deleting a lot of it
07:03 🔗 SketchCow So I hope this is all going to work out
07:04 🔗 kiska I guess you can always see what they are
07:06 🔗 SketchCow It's crap-warez
07:06 🔗 SketchCow Like an open-directory barfed
07:06 🔗 Flashfire Send it my way sketch
07:06 🔗 kiska Oh
07:06 🔗 Flashfire lol
07:07 🔗 kiska What sort of crapware? Also could it be someone trying to archive something?
07:09 🔗 SketchCow ffffff
07:09 🔗 Flashfire What's up now? Still going?
07:12 🔗 SketchCow I'm posting this in here so that the person responsible sees it and knows what happened.
07:12 🔗 Flashfire it's not me
07:12 🔗 kiska Well not me, as I don't have your ftp details
07:24 🔗 TigerbotH has quit IRC (ZNC - http://znc.in)
07:29 🔗 TigerbotH has joined #archiveteam-bs
07:41 🔗 w0rmhole I didn't even know sketch had an ftp
07:50 🔗 plue has joined #archiveteam-bs
08:07 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
08:07 🔗 Mateon1 has joined #archiveteam-bs
08:31 🔗 godane SketchCow: so DTIC archive is at 462k items now
09:30 🔗 altlabel_ has quit IRC (Read error: Operation timed out)
09:36 🔗 caff has quit IRC (Read error: Connection reset by peer)
10:52 🔗 JAA Not me either.
10:54 🔗 JAA dxrt: Good idea, I've been considering doing that as well. I figured they might not be willing to do it though since they're shutting down due to privacy issues/GDPR. But can't hurt to ask.
12:28 🔗 Pixi has quit IRC (Quit: Pixi)
12:36 🔗 Pixi has joined #archiveteam-bs
12:44 🔗 SketchCow Most of the time the FTP is used when I have people who want to upload but it would be a burden for them to learn the ia Python interface, or the collection is very large, so I give them a hand.
12:44 🔗 SketchCow They upload the given 30-300 GB of material to the FTP cred I give them, and then I do the rest.
12:44 🔗 SketchCow It's a bustling little depot.
13:13 🔗 BlueMax has quit IRC (Quit: Leaving)
13:26 🔗 bitBaron has joined #archiveteam-bs
13:30 🔗 Pixi has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 superkuh has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 decay has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 swebb has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 Atom__ has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 erin has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 zino has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 Somebody2 has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 phirephly has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 MrRadar has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 twigfoot has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 astrid has quit IRC (hub.efnet.us irc.servercentral.net)
13:30 🔗 chirlu` has quit IRC (hub.efnet.us irc.servercentral.net)
13:32 🔗 decay_ has joined #archiveteam-bs
13:35 🔗 BartoCH_ has joined #archiveteam-bs
13:37 🔗 BartoCH has quit IRC (Read error: Connection reset by peer)
13:41 🔗 superkuh_ has joined #archiveteam-bs
13:45 🔗 decay_ is now known as decay
14:00 🔗 Atom__ has joined #archiveteam-bs
14:00 🔗 Cameron_D has joined #archiveteam-bs
14:00 🔗 Somebody2 has joined #archiveteam-bs
14:00 🔗 twigfoot has joined #archiveteam-bs
14:00 🔗 Pixi` has joined #archiveteam-bs
14:00 🔗 swebb has joined #archiveteam-bs
14:00 🔗 erin has joined #archiveteam-bs
14:00 🔗 zino has joined #archiveteam-bs
14:00 🔗 phirephly has joined #archiveteam-bs
14:00 🔗 MrRadar has joined #archiveteam-bs
14:00 🔗 astrid has joined #archiveteam-bs
14:00 🔗 chirlu` has joined #archiveteam-bs
14:00 🔗 irc.servercentral.net sets mode: +oo swebb astrid
14:01 🔗 swebb sets mode: +o JAA
14:01 🔗 swebb sets mode: +o SketchCow
14:01 🔗 swebb sets mode: +o underscor
14:01 🔗 swebb sets mode: +o brayden
14:01 🔗 swebb sets mode: +o Jonimoose
14:01 🔗 swebb sets mode: +o balrog
14:01 🔗 swebb sets mode: +o antomatic
14:01 🔗 swebb sets mode: +o bakJAA
14:01 🔗 swebb sets mode: +o DFJustin
14:09 🔗 Atom__ has quit IRC (Read error: Operation timed out)
14:23 🔗 faolingfa has joined #archiveteam-bs
14:24 🔗 Cameron_D has quit IRC (Read error: Operation timed out)
14:32 🔗 zino has quit IRC (Read error: Connection reset by peer)
14:33 🔗 zino has joined #archiveteam-bs
14:36 🔗 Cameron_D has joined #archiveteam-bs
14:54 🔗 astrid and you're the depot despot
14:55 🔗 schbirid has joined #archiveteam-bs
15:02 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
15:08 🔗 eprillios has joined #archiveteam-bs
15:09 🔗 bitBaron has joined #archiveteam-bs
15:45 🔗 godane has quit IRC (Read error: Operation timed out)
15:59 🔗 godane has joined #archiveteam-bs
16:00 🔗 svchfoo1 sets mode: +o godane
16:04 🔗 BartoCH_ has quit IRC (Quit: WeeChat 2.2)
16:06 🔗 BartoCH has joined #archiveteam-bs
16:31 🔗 wp494 has quit IRC (Ping timeout: 252 seconds)
16:32 🔗 wp494 has joined #archiveteam-bs
16:42 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
16:54 🔗 bitBaron has joined #archiveteam-bs
17:05 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
17:06 🔗 bitBaron has joined #archiveteam-bs
17:11 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
17:12 🔗 bitBaron has joined #archiveteam-bs
17:35 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
17:40 🔗 Pixi has joined #archiveteam-bs
17:42 🔗 bitBaron has joined #archiveteam-bs
17:52 🔗 bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
17:52 🔗 bitBaron has joined #archiveteam-bs
17:54 🔗 Pixi` has quit IRC (Read error: Operation timed out)
18:04 🔗 superkuh_ is now known as superkuh
18:28 🔗 adinbied Are there any Python devs who could help me out with some Warrior scripts? I'm stuck at figuring out how to implement a URL parser
18:31 🔗 adinbied Hmmm... it might be better to do it on the Lua side....
18:32 🔗 moufu there's a bunch of url parsing functions in urllib.parse from the standard library
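A minimal sketch of the kind of helpers moufu is pointing at, using only `urllib.parse` from the standard library. The function names `describe` and `same_host` and the example path are hypothetical, chosen for illustration; they are not taken from the angelfire-grab repository.

```python
from urllib.parse import urljoin, urlsplit


def describe(url):
    """Split a URL into the parts a grab script usually cares about."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


def same_host(url, base):
    """Resolve url against base and report whether it stays on base's host."""
    absolute = urljoin(base, url)
    return urlsplit(absolute).hostname == urlsplit(base).hostname


# Hypothetical example URL, for illustration only.
info = describe("http://www.angelfire.com/example/user/sitemap.xml")
```

`urljoin` resolves relative links against the page they were found on, which is usually the first step before deciding whether a discovered URL belongs to the same site.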
18:34 🔗 adinbied My main issue is that I'm trying to grab the sitemap, store it as a temp file, then read the URLs from the sitemap and grab those
18:36 🔗 adinbied At the moment, the script just grabs the sitemap as a WARC (https://github.com/adinbied/angelfire-grab/blob/master/pipeline.py#L197-L200). Is there any way to store that as a plain, non-WARC temp file?
18:40 🔗 moufu run wget for the sitemap in a separate step before WgetDownload?
18:42 🔗 bitBaron_ has joined #archiveteam-bs
18:45 🔗 adinbied How would I go about doing that? I.e. grabbing it, parsing, and then feeding the URLs back into WgetDownload
18:47 🔗 adinbied Here's a shell version of roughly what I'm trying to do: https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh
18:47 🔗 adinbied https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh
18:49 🔗 bitBaron has quit IRC (Ping timeout: 492 seconds)
18:49 🔗 schbirid that's me btw :D
18:49 🔗 schbirid thanks for doing this!
19:19 🔗 schbirid has quit IRC (Read error: Operation timed out)
19:21 🔗 moufu looks like the way pipelines work makes the lua option easier, something like https://0x0.st/sxJI.lua
19:23 🔗 moufu (might break if some sitemaps have multiple urls per line or there are sitemap.xml files in subdirectories)
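The parsing half of what adinbied describes could be sketched in Python roughly as follows. This is not the Lua script linked above and not the pipeline's actual code; it assumes the sitemap is either XML with `<loc>` elements or plain text containing URLs, and the name `extract_sitemap_urls` is hypothetical. The regex fallback is meant to cope with the "multiple urls per line" case moufu mentions.

```python
import re
import xml.etree.ElementTree as ET

# Rough URL matcher for the plain-text fallback (an assumption, not a full URL grammar).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_sitemap_urls(text):
    """Return the URLs found in sitemap text, in order, without duplicates."""
    try:
        root = ET.fromstring(text)
        # Collect every <loc> element, ignoring any XML namespace prefix.
        found = [
            el.text.strip()
            for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
        ]
    except ET.ParseError:
        # Not well-formed XML: scan for URLs, even several on one line.
        found = URL_RE.findall(text)

    seen = set()
    ordered = []
    for url in found:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

The resulting list could then be written to a file, one URL per line, and handed to wget with `--input-file`; how that fits into the pipeline's WgetDownload step is left open here.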
19:31 🔗 Nicu` has joined #archiveteam-bs
19:33 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
20:14 🔗 godane I got the last 2 Star Trek magazines scanned
20:18 🔗 Nicu` has quit IRC (Ping timeout: 255 seconds)
20:54 🔗 godane https://archive.org/details/star-trek-communicator-issue-142
21:59 🔗 bitBaron_ has quit IRC (Quit: Bye.)
22:04 🔗 godane https://archive.org/details/star-trek-communicator-issue-143
22:22 🔗 bitBaron has joined #archiveteam-bs
22:30 🔗 coldice has quit IRC (Quit: Leaving)
23:05 🔗 antomati_ has joined #archiveteam-bs
23:05 🔗 swebb sets mode: +o antomati_
23:07 🔗 antomatic has quit IRC (Read error: Operation timed out)
23:40 🔗 BlueMax has joined #archiveteam-bs
