[05:15] *** zerkalo has quit IRC (Ping timeout: 260 seconds)
[05:38] *** zerkalo has joined #warrior
[06:41] *** Honno has joined #warrior
[06:58] *** zerkalo has quit IRC (Read error: Operation timed out)
[06:59] *** zerkalo has joined #warrior
[10:06] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[10:19] *** zhongfu has joined #warrior
[19:22] *** scraping_ has joined #warrior
[19:23] Hello - Are there any good examples of using Scrapy to build a warrior project?
[19:23] Or, alternatively: Any suggestions how I might contribute to add Scrapy support to Warrior?
[19:24] ArchiveTeam's an amazing project, but I think yall would get a lot more coder support if you could do Scrapy projects because of the ecosystem
[19:28] we're not using scrapy
[19:28] and I'm planning on doing anything with scrapy
[19:29] we're doing fine on warrior projects
[19:29] well, we're not using scrapy for warrior project, not sure about other projects
[19:30] I've seen projects on your GH that don't look like they're seesaw though
[19:31] but they say they're "Warrior" compatible. So warrior must be pluggable? Or am I mistaken there?
[20:19] *** arkiver is now known as arkiver2
[21:12] scraping_: got a link to one of those?
[21:14] *** SmileyG has quit IRC (Read error: Connection reset by peer)
[21:19] *** Smiley has joined #warrior
[21:36] kaz: https://github.com/ArchiveTeam/tinyback looks like it's pure python, no lua involved. And it specifies it works on Warrior. My initial impression was that lua was needed for seesaw
[21:38] that uses seesaw
[21:38] lua is usually used with wget-lua, for the actual grabbing (afaik), seesaw is irrelevant to that
[21:38] that project is also 3+ years old, things have changed a bit since then
[21:39] Ah, ok. Just wondering; why Seesaw? it seems to be a custom thing just for ArchiveTeam, but that sort of opts out of all the scraping ecosystems out there, doesn't it?
[21:40] seesaw isn't just the scraping, it's the checking into tracker, getting jobs, keeping on top of the status of those jobs, pushing the final output to a specified location etc
[21:41] Created well before I heard about AT, i'd assume mainly because making something custom purely for AT purposes was better than working through various other tools and bolting bits together
[22:34] *** Honno has quit IRC (Ping timeout: 370 seconds)
[23:25] seesaw used to be a bunch of bash scripts that downloaded and uploaded. it evolved from there.
[23:28] it works well for most projects but using it isn't strictly mandatory. ie, urlteam doesn't fit in so it calls its own code for managing the process
[23:43] *** arkiver2 is now known as arkiver
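
Editor's sketch, not part of the log: the 21:40 message lists what seesaw handles besides the grab itself (checking into the tracker, getting jobs, following their status, pushing the output somewhere). The Python below is a rough illustration of that kind of loop only. It is not seesaw's API: `FakeTracker`, `run_worker`, `grab`, and `upload` are hypothetical names invented here, and the tracker is an in-memory stand-in rather than a real service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeTracker:
    """Hypothetical in-memory stand-in for a tracker (assumption, not a real API)."""
    pending: List[str]
    done: List[str] = field(default_factory=list)

    def request_item(self) -> Optional[str]:
        # Hand out the next queued item, or None when nothing is left.
        return self.pending.pop(0) if self.pending else None

    def report_done(self, item: str) -> None:
        # Record that a worker finished this item.
        self.done.append(item)


def run_worker(
    tracker: FakeTracker,
    grab: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Get a job, grab it, push the output, report back; repeat until no jobs remain."""
    status_log: List[str] = []
    while True:
        item = tracker.request_item()
        if item is None:
            break
        status_log.append(f"{item}: claimed")
        data = grab(item)  # placeholder for whatever does the actual grabbing
        status_log.append(f"{item}: grabbed {len(data)} bytes")
        upload(item, data)  # placeholder for pushing output to its destination
        status_log.append(f"{item}: uploaded")
        tracker.report_done(item)
        status_log.append(f"{item}: done")
    return status_log


if __name__ == "__main__":
    uploads = {}
    tracker = FakeTracker(pending=["item-1", "item-2"])
    for line in run_worker(tracker, lambda i: i.encode(), uploads.__setitem__):
        print(line)
```

The grab and upload steps are passed in as plain callables so the loop stays independent of how the fetching is done. That mirrors the point made at 21:38 that the grabbing tool and the job management are separate concerns, but it says nothing about how seesaw is actually structured.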