05:15 *** zerkalo has quit IRC (Ping timeout: 260 seconds)
05:38 *** zerkalo has joined #warrior
06:41 *** Honno has joined #warrior
06:58 *** zerkalo has quit IRC (Read error: Operation timed out)
06:59 *** zerkalo has joined #warrior
10:06 *** zhongfu has quit IRC (Ping timeout: 260 seconds)
10:19 *** zhongfu has joined #warrior
19:22 *** scraping_ has joined #warrior
19:23 <scraping_> Hello - Are there any good examples of using Scrapy to build a warrior project?
19:23 <scraping_> Or, alternatively: any suggestions how I might contribute to add Scrapy support to Warrior?
19:24 <scraping_> ArchiveTeam's an amazing project, but I think y'all would get a lot more coder support if you could do Scrapy projects, because of the ecosystem
19:28 <arkiver> we're not using scrapy
19:28 <arkiver> and I'm not planning on doing anything with scrapy
19:29 <arkiver> we're doing fine on warrior projects
19:29 <arkiver> well, we're not using scrapy for warrior projects, not sure about other projects
19:30 <scraping_> I've seen projects on your GH that don't look like they're seesaw though
19:31 <scraping_> but they say they're "Warrior" compatible. So warrior must be pluggable? Or am I mistaken there?
20:19 *** arkiver is now known as arkiver2
21:12 <Kaz> scraping_: got a link to one of those?
21:14 *** SmileyG has quit IRC (Read error: Connection reset by peer)
21:19 *** Smiley has joined #warrior
21:36 <scraping_> kaz: https://github.com/ArchiveTeam/tinyback looks like it's pure python, no lua involved. And it specifies it works on Warrior. My initial impression was that lua was needed for seesaw
21:38 <Kaz> that uses seesaw
21:38 <Kaz> lua is usually used with wget-lua, for the actual grabbing (afaik), seesaw is irrelevant to that
21:38 <Kaz> that project is also 3+ years old, things have changed a bit since then
21:39 <scraping_> Ah, ok. Just wondering: why Seesaw? It seems to be a custom thing just for ArchiveTeam, but that sort of opts out of all the scraping ecosystems out there, doesn't it?
21:40 <Kaz> seesaw isn't just the scraping, it's the checking into the tracker, getting jobs, keeping on top of the status of those jobs, pushing the final output to a specified location, etc
21:41 <Kaz> Created well before I heard about AT, I'd assume mainly because making something custom purely for AT purposes was better than working through various other tools and bolting bits together
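[Editor's note: the loop Kaz describes (check into the tracker, claim a job, run the grab, upload the output, report done) can be sketched roughly as below. This is a plain-Python illustration only, not the real seesaw-kit API; the tracker URL and the function names are hypothetical stand-ins.]

```python
# Minimal sketch of the tracker work loop described above.
# NOT the actual seesaw-kit API: request_item, run_job,
# upload_result and the tracker URL are hypothetical.
import time

TRACKER = "https://tracker.example.org/example-project"  # hypothetical

def request_item(tracker):
    """Check in with the tracker and claim a work item (stub)."""
    return {"id": "item-001", "url": "http://example.com/page"}

def run_job(item):
    """Do the actual grabbing (the part a tool like wget-lua handles)."""
    return b"...archived bytes for %s..." % item["url"].encode()

def upload_result(tracker, item, data):
    """Push the finished output to the location the tracker specified."""
    return True

def work_loop(max_items=1):
    done = []
    for _ in range(max_items):
        item = request_item(TRACKER)            # 1. get a job from the tracker
        data = run_job(item)                    # 2. run the grab
        if upload_result(TRACKER, item, data):  # 3. push the output
            done.append(item["id"])             # 4. record it as done
        time.sleep(0)  # real pipelines rate-limit between items
    return done

print(work_loop())  # -> ['item-001']
```

The point of the sketch is the division of labour: the pipeline framework owns steps 1, 3 and 4 (tracker coordination and upload), while step 2 is project-specific and can be any grabbing tool.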
22:34 *** Honno has quit IRC (Ping timeout: 370 seconds)
23:25 <chfoo> seesaw used to be a bunch of bash scripts that downloaded and uploaded. it evolved from there.
23:28 <chfoo> it works well for most projects, but using it isn't strictly mandatory. i.e., urlteam doesn't fit in, so it calls its own code for managing the process
23:43 *** arkiver2 is now known as arkiver