Time | Nickname | Message |
00:21 | | BlueMax has joined #archiveteam-bs |
00:46 | | ndiddy has quit IRC () |
02:00 | | icedice has quit IRC (Quit: Leaving) |
02:16 | | FlashBack is now known as AlbardinG |
02:17 | | coldice has joined #archiveteam-bs |
03:01 | | pikhq has quit IRC (Read error: Operation timed out) |
03:11 | | pikhq has joined #archiveteam-bs |
03:35 | | kiskabak2 has quit IRC (Quit: The Lounge - https://thelounge.github.io) |
03:40 | | kiskabak has joined #archiveteam-bs |
03:45 | w0rmhole | test |
03:45 | astrid | boop |
03:46 | w0rmhole | :D |
04:01 | | odemg has quit IRC (Ping timeout: 260 seconds) |
04:13 | | odemg has joined #archiveteam-bs |
05:55 | dxrt | I'm going to send an email to the botbot.me guys - see if they are willing to give us a copy of the IRC logs before they shut down. |
06:39 | kiska | dxrt maybe get them to give us a warc compatible copy? |
06:41 | ivan | you want them to scrape themselves and produce the HTML fragments they serve? |
06:42 | kiska | Well it would free up a pipeline slot on #archivebot |
07:03 | SketchCow | Someone is uploading a lot of things to FOS via FTP and I'm deleting a lot of it |
07:03 | SketchCow | So I hope this is all going to workout |
07:04 | kiska | I guess you can always see what they are |
07:06 | SketchCow | It's crap-warez |
07:06 | SketchCow | Like an open-directory barfed |
07:06 | Flashfire | Send it my way sketch |
07:06 | kiska | Oh |
07:06 | Flashfire | lol |
07:07 | kiska | What sort of crapware? Also could it be someone trying to archive something? |
07:09 | SketchCow | ffffff |
07:09 | Flashfire | Whats up now? still going? |
07:12 | SketchCow | I'm posting this in here so that the person responsible sees it and knows what happened. |
07:12 | Flashfire | its not me |
07:12 | kiska | Well not me, as I don't have your ftp details |
07:24 | | TigerbotH has quit IRC (ZNC - http://znc.in) |
07:29 | | TigerbotH has joined #archiveteam-bs |
07:41 | w0rmhole | i didnt even know sketch had an ftp |
07:50 | | plue has joined #archiveteam-bs |
08:07 | | Mateon1 has quit IRC (Ping timeout: 268 seconds) |
08:07 | | Mateon1 has joined #archiveteam-bs |
08:31 | godane | SketchCow: so DTIC archive is at 462k items now |
09:30 | | altlabel_ has quit IRC (Read error: Operation timed out) |
09:36 | | caff has quit IRC (Read error: Connection reset by peer) |
10:52 | JAA | Not me either. |
10:54 | JAA | dxrt: Good idea, I've been considering doing that as well. I figured they might not be willing to do it though since they're shutting down due to privacy issues/GDPR. But can't hurt to ask. |
12:28 | | Pixi has quit IRC (Quit: Pixi) |
12:36 | | Pixi has joined #archiveteam-bs |
12:44 | SketchCow | Most of the time the FTP is used when I have people who want to upload but it would be a burden for them to learn the ia python interface, or the collection is very large, and I give them a help. |
12:44 | SketchCow | They upload the given 30-300gb of material to the FTP cred I give them, and then I do the rest. |
12:44 | SketchCow | It's a bustling little depot. |
13:13 | | BlueMax has quit IRC (Quit: Leaving) |
13:26 | | bitBaron has joined #archiveteam-bs |
13:30 | | Pixi has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | superkuh has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | decay has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | swebb has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | Atom__ has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | erin has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | zino has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | Somebody2 has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | phirephly has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | MrRadar has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | twigfoot has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | astrid has quit IRC (hub.efnet.us irc.servercentral.net) |
13:30 | | chirlu` has quit IRC (hub.efnet.us irc.servercentral.net) |
13:32 | | decay_ has joined #archiveteam-bs |
13:35 | | BartoCH_ has joined #archiveteam-bs |
13:37 | | BartoCH has quit IRC (Read error: Connection reset by peer) |
13:41 | | superkuh_ has joined #archiveteam-bs |
13:45 | | decay_ is now known as decay |
14:00 | | Atom__ has joined #archiveteam-bs |
14:00 | | Cameron_D has joined #archiveteam-bs |
14:00 | | Somebody2 has joined #archiveteam-bs |
14:00 | | twigfoot has joined #archiveteam-bs |
14:00 | | Pixi` has joined #archiveteam-bs |
14:00 | | swebb has joined #archiveteam-bs |
14:00 | | erin has joined #archiveteam-bs |
14:00 | | zino has joined #archiveteam-bs |
14:00 | | phirephly has joined #archiveteam-bs |
14:00 | | MrRadar has joined #archiveteam-bs |
14:00 | | astrid has joined #archiveteam-bs |
14:00 | | chirlu` has joined #archiveteam-bs |
14:00 | | irc.servercentral.net sets mode: +oo swebb astrid |
14:01 | | swebb sets mode: +o JAA |
14:01 | | swebb sets mode: +o SketchCow |
14:01 | | swebb sets mode: +o underscor |
14:01 | | swebb sets mode: +o brayden |
14:01 | | swebb sets mode: +o Jonimoose |
14:01 | | swebb sets mode: +o balrog |
14:01 | | swebb sets mode: +o antomatic |
14:01 | | swebb sets mode: +o bakJAA |
14:01 | | swebb sets mode: +o DFJustin |
14:09 | | Atom__ has quit IRC (Read error: Operation timed out) |
14:23 | | faolingfa has joined #archiveteam-bs |
14:24 | | Cameron_D has quit IRC (Read error: Operation timed out) |
14:32 | | zino has quit IRC (Read error: Connection reset by peer) |
14:33 | | zino has joined #archiveteam-bs |
14:36 | | Cameron_D has joined #archiveteam-bs |
14:54 | astrid | and you're the depot despot |
14:55 | | schbirid has joined #archiveteam-bs |
15:02 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
15:08 | | eprillios has joined #archiveteam-bs |
15:09 | | bitBaron has joined #archiveteam-bs |
15:45 | | godane has quit IRC (Read error: Operation timed out) |
15:59 | | godane has joined #archiveteam-bs |
16:00 | | svchfoo1 sets mode: +o godane |
16:04 | | BartoCH_ has quit IRC (Quit: WeeChat 2.2) |
16:06 | | BartoCH has joined #archiveteam-bs |
16:31 | | wp494 has quit IRC (Ping timeout: 252 seconds) |
16:32 | | wp494 has joined #archiveteam-bs |
16:42 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
16:54 | | bitBaron has joined #archiveteam-bs |
17:05 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
17:06 | | bitBaron has joined #archiveteam-bs |
17:11 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
17:12 | | bitBaron has joined #archiveteam-bs |
17:35 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
17:40 | | Pixi has joined #archiveteam-bs |
17:42 | | bitBaron has joined #archiveteam-bs |
17:52 | | bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
17:52 | | bitBaron has joined #archiveteam-bs |
17:54 | | Pixi` has quit IRC (Read error: Operation timed out) |
18:04 | | superkuh_ is now known as superkuh |
18:28 | adinbied | Are there any python devs who could help me out with some Warrior scripts? I'm stuck at figuring out how to implement a URL parser |
18:31 | adinbied | Hmmm... it might be better to do it on the Lua side.... |
18:32 | moufu | there's a bunch of url parsing functions in urllib.parse from the standard library |
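A minimal sketch of the `urllib.parse` route moufu suggests, assuming the grab needs to resolve relative links found on a page and keep only URLs inside one user's directory. The helper names `normalize` and `in_scope` and the Angelfire-style example paths are hypothetical; `urljoin`, `urlsplit` and `urlunsplit` are the real standard-library functions.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(base: str, link: str) -> str:
    """Resolve `link` against the page URL `base` and drop any #fragment.

    Hypothetical helper: lowercases the network location, keeps path and query as-is.
    """
    parts = urlsplit(urljoin(base, link))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, parts.query, ""))


def in_scope(url: str, host: str, path_prefix: str) -> bool:
    """True if `url` is on `host` and its path starts with `path_prefix` (hypothetical scope rule)."""
    parts = urlsplit(url)
    # .hostname is already lowercased by urlsplit
    return parts.hostname == host and parts.path.startswith(path_prefix)
```

For example, `normalize("http://www.angelfire.com/example/user/index.html", "page2.html#top")` gives `http://www.angelfire.com/example/user/page2.html`, which `in_scope(..., "www.angelfire.com", "/example/user/")` accepts. Stripping fragments avoids queueing the same page several times under different anchors.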
18:34 | adinbied | My main issue is that I'm trying to grab the sitemap - store it as a temp file, then read the URLS from the sitemap and grab those |
18:36 | adinbied | At the moment, the script just grabs the sitemap as a WARC (https://github.com/adinbied/angelfire-grab/blob/master/pipeline.py#L197-L200), is there any way to store that as a temp file as non-warced? |
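One way to keep the sitemap out of the WARC, sketched under the assumption that the sitemap follows the sitemaps.org `<urlset>` format and that its bytes were obtained by a separate, non-WARC fetch. `save_temp` and `urls_from_sitemap` are hypothetical helpers, not part of the angelfire-grab `pipeline.py`.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace from the sitemaps.org protocol; assumed, not confirmed for Angelfire.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def save_temp(data: bytes) -> Path:
    """Write raw sitemap bytes to a temp file (not a WARC); the caller must delete it."""
    with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as fh:
        fh.write(data)
        return Path(fh.name)


def urls_from_sitemap(path: Path) -> list[str]:
    """Return every <url>/<loc> value in a <urlset> sitemap, in document order."""
    root = ET.parse(path).getroot()
    return [
        loc.text.strip()
        for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Parsing the XML properly, rather than reading it line by line, means it does not matter how the server wraps the `<loc>` elements.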
18:40 | moufu | run wget for the sitemap in a separate step before WgetDownload? |
18:42 | | bitBaron_ has joined #archiveteam-bs |
18:45 | adinbied | How would I go about doing that? I.E. grabbing it, parsing, and then feeding the URLs back into WgetDownload |
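A rough sketch of moufu's separate-step idea: fetch the sitemap with a plain wget call (no `--warc-file`, so nothing lands in the WARC), write the extracted URLs one per line, and hand that file to the WARC-writing wget through wget's `--input-file` option. The helper names are hypothetical, and how the extra arguments would be merged into `pipeline.py`'s existing wget argument list is an assumption; this does not use or describe the seesaw API.

```python
import subprocess
from pathlib import Path


def sitemap_fetch_cmd(sitemap_url: str, dest: Path) -> list[str]:
    """Plain wget call that saves only the sitemap to `dest` (no --warc-file)."""
    return ["wget", "--quiet", "--output-document", str(dest), sitemap_url]


def fetch_sitemap(sitemap_url: str, dest: Path) -> Path:
    """Run the fetch step; raises CalledProcessError if wget exits non-zero."""
    subprocess.run(sitemap_fetch_cmd(sitemap_url, dest), check=True)
    return dest


def write_url_list(urls: list[str], dest: Path) -> Path:
    """Write one URL per line, the plain-text format --input-file reads."""
    dest.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
    return dest


def with_input_file(wget_args: list[str], url_list: Path) -> list[str]:
    """Return a copy of `wget_args` that also reads start URLs from `url_list`."""
    return [*wget_args, "--input-file", str(url_list)]
```

The intended order is `fetch_sitemap`, then a parser such as one built on `urls_from_sitemap`-style logic, then `write_url_list`, and finally `with_input_file` on the arguments of the WARC-writing download.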
18:47 | adinbied | Here's a shell version of roughly what I'm trying to do: https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.shhttps://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh |
18:47 | adinbied | https://github.com/SpiritQuaddicted/angelfire/blob/master/downloaduser.sh |
18:49 | | bitBaron has quit IRC (Ping timeout: 492 seconds) |
18:49 | schbirid | that's me btw :D |
18:49 | schbirid | thanks for doing this! |
19:19 | | schbirid has quit IRC (Read error: Operation timed out) |
19:21 | moufu | looks like the way pipelines work makes the lua option easier, something like https://0x0.st/sxJI.lua |
19:23 | moufu | (might break if some sitemaps have multiple urls per line or there are sitemap.xml files in subdirectories) |
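A sketch addressing both caveats moufu raises, written in Python rather than Lua and assuming sitemaps.org-style XML: the regex scans the whole document instead of reading it line by line, and a queue follows child sitemaps. The rule for spotting a child sitemap (a `<sitemapindex>` document, or a `<loc>` path ending in `sitemap.xml`) and the injected `fetch` callable are assumptions for illustration; the linked `sxJI.lua` script is not reproduced here.

```python
import html
import re
from collections import deque
from collections.abc import Callable
from urllib.parse import urlsplit

# Matches every <loc>...</loc> pair in the text, so several on one line are fine.
LOC_RE = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.IGNORECASE | re.DOTALL)


def extract_locs(xml_text: str) -> list[str]:
    """Return all <loc> values in document order, decoding entities such as &amp;."""
    return [html.unescape(loc) for loc in LOC_RE.findall(xml_text)]


def collect_urls(root_url: str, fetch: Callable[[str], str]) -> list[str]:
    """Breadth-first walk from `root_url` through child sitemaps, returning page URLs.

    Assumed rule: a <loc> is a child sitemap if its document is a <sitemapindex>
    or if its path ends in "sitemap.xml"; anything else is a page to grab.
    """
    pages: list[str] = []
    seen: set[str] = set()
    queue = deque([root_url])
    while queue:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:  # guards against sitemaps that list each other
            continue
        seen.add(sitemap_url)
        text = fetch(sitemap_url)
        is_index = "<sitemapindex" in text
        for loc in extract_locs(text):
            if is_index or urlsplit(loc).path.endswith("sitemap.xml"):
                queue.append(loc)
            else:
                pages.append(loc)
    return pages
```

A regex is a deliberately lightweight choice here; a real XML parser would also cope with CDATA sections, at the cost of failing outright on malformed sitemaps.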
19:31 | | Nicu` has joined #archiveteam-bs |
19:33 | | odemg has quit IRC (Ping timeout: 260 seconds) |
20:14 | godane | i got the last 2 star trek magazines scanned |
20:18 | | Nicu` has quit IRC (Ping timeout: 255 seconds) |
20:54 | godane | https://archive.org/details/star-trek-communicator-issue-142 |
21:59 | | bitBaron_ has quit IRC (Quit: Bye.) |
22:04 | godane | https://archive.org/details/star-trek-communicator-issue-143 |
22:22 | | bitBaron has joined #archiveteam-bs |
22:30 | | coldice has quit IRC (Quit: Leaving) |
23:05 | | antomati_ has joined #archiveteam-bs |
23:05 | | swebb sets mode: +o antomati_ |
23:07 | | antomatic has quit IRC (Read error: Operation timed out) |
23:40 | | BlueMax has joined #archiveteam-bs |