#warrior 2013-11-05,Tue

Time Nickname Message
00:17 🔗 n00b221 Hey guys, just a thought, but I noticed something today: archive.org now gives you the option to archive any page if you put it into the Wayback Machine and it is not already archived, and it also auto-archives various pages if they are linked to from within an archived page but not archived themselves. So would it be possible to write a script to use this to expand the archive.org archives as well, more or less a a
00:17 🔗 n00b221 them if the option to archive them pops up
00:27 🔗 n00b221 If there had been something like that during the isohunt bit, the entire site could have been archived in a few hours
00:30 🔗 odie5533 n00b221: I'd imagine at this point IA doesn't want to do that.
00:30 🔗 odie5533 and would perhaps rather we use archivebot or warrior projects
00:41 🔗 odie5533 Does anyone know who manages the Warrior project list and rsync targets?
06:57 🔗 ersi odie5533: Anyone else
06:58 🔗 ersi We are a bunch with access to the tracker/project list. And SketchCow usually provides rsync targets
15:39 🔗 odie5533 So how does one go about adding a Warrior project to the tracker?
15:44 🔗 GLaDOS All that needs to be done is editing projects.json (located at http://warriorhq.archiveteam.org/projects.json for AT)
15:47 🔗 odie5533 GLaDOS: Doesn't it need to be added to e.g. tracker.archiveteam.org/project/>
15:47 🔗 GLaDOS Warrior projects don't need a tracker at that location (e.g. URLTeam, and some other projects which had others hosting trackers)
15:50 🔗 odie5533 GLaDOS: Is the tracker.archiveteam.org tracker for only certain projects, or can others request their project be hosted on it?
15:52 🔗 GLaDOS You can request a project to be hosted on it
15:53 🔗 GLaDOS as long as it doesn't take too much RAM up ;)
15:57 🔗 odie5533 Is anyone around that has created a Seesaw pipeline before? chfoo, you there?
15:58 🔗 odie5533 In what dev environment is a pipeline created? Can I use the Warrior to create Seesaw pipelines?
15:59 🔗 GLaDOS You can use the warrior.
15:59 🔗 GLaDOS Hell, it's probably best to do so, seeing as that's where it'll be run from
15:59 🔗 odie5533 It has no gui, so I'd have to install one, or edit in vim/emacs
15:59 🔗 GLaDOS Just remember to have the script install any packages you may install!
16:00 🔗 GLaDOS Oh, right, people usually use IDEs.
16:00 🔗 odie5533 I've been liking using Eclipse for my Python development lately
16:00 🔗 GLaDOS You could just install the seesaw packages from pypi
16:00 🔗 GLaDOS That's all the code that the warrior runs
16:01 🔗 chfoo i just do it in ubuntu and assume it works in the warrior
16:01 🔗 GLaDOS WELL, NOW I SLEEP
16:01 🔗 odie5533 GLaDOS: thanks for the help. g'night
16:01 🔗 odie5533 chfoo: you ran pip install seesaw and that was it?
16:02 🔗 odie5533 also get-wget-lua.sh which seems to get and build the wget-lua branch
16:03 🔗 chfoo yeah, i don't recall having to do anything special
16:05 🔗 odie5533 chfoo: Are the Redis items that the Pipeline gets typically just urls?
16:06 🔗 chfoo i would try to get the item names as short as possible
16:06 🔗 odie5533 What do you mean?
16:08 🔗 chfoo the way redis has its data structures is that it's optimized for speed, which means it uses up a lot of memory
16:09 🔗 odie5533 So what is a typical item for the Pipeline? Not a URL?
16:11 🔗 chfoo it depends. for something like a blog, it's usually just the username, and then the pipeline interpolates that into username.example.com and wget will crawl the entire domain
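A minimal Python sketch of the interpolation chfoo describes: the tracker hands out a short item name and the pipeline expands it into the URL to crawl. The function name and the URL template are illustrative assumptions, not code taken from seesaw or from any Archive Team pipeline.

```python
# Hypothetical helper: expand a short tracker item name into a crawl URL.
# The template is an assumption modelled on the username.example.com
# example in the log; a real project would use its own site pattern.
URL_TEMPLATE = "http://{item_name}.example.com/"


def item_to_url(item_name, template=URL_TEMPLATE):
    """Return the start URL for a given item name."""
    return template.format(item_name=item_name)


if __name__ == "__main__":
    print(item_to_url("alice"))
```

Keeping only the variable part in the tracker and rebuilding the rest in the pipeline is what lets the item names stay short.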
16:12 🔗 odie5533 What does the blip.tv grabber use?
16:12 🔗 chfoo it's using a full url, but it shouldn't be
16:13 🔗 odie5533 should have parsed off at least the http://blip.tv/ part?
16:13 🔗 chfoo yeah, it's a bit redundant
16:13 🔗 chfoo but i guess since there aren't millions of items to load, it's not using up much memory
16:15 🔗 odie5533 At what number of urls would using full urls become excessive?
16:15 🔗 chfoo i'm not sure, i'm not a redis expert.
16:17 🔗 chfoo i can calculate the memory usage for the puush tracker though for a rough estimate
16:19 🔗 chfoo it looks like 11-character item names use about 94 bytes per name (568.86MB for 6376877 items)
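The per-name figure can be checked from the two numbers quoted, and the same ratio gives a rough answer to the "how many items is too many" question. This is only a sketch: it assumes the reported "MB" means MiB (1024 * 1024 bytes), which is the reading under which the quoted total rounds to 94 bytes per name, and it assumes memory grows linearly with item count. The variable and function names are illustrative.

```python
# Figures quoted in the log for the puush tracker.
TOTAL_MB = 568.86       # reported memory use; treated as MiB (assumption)
ITEM_COUNT = 6376877    # reported number of items

BYTES_PER_MIB = 1024 * 1024

# Average Redis memory per 11-character item name, in bytes.
bytes_per_item = TOTAL_MB * BYTES_PER_MIB / ITEM_COUNT


def estimate_mib(item_count, per_item=bytes_per_item):
    """Linear extrapolation of tracker memory use, in MiB."""
    return item_count * per_item / BYTES_PER_MIB


if __name__ == "__main__":
    print(round(bytes_per_item, 1))
    print(round(estimate_mib(1000000), 1))
```

The estimate only covers names of about this length; longer items such as full URLs would cost more per entry, by an amount the log does not measure.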
16:44 🔗 odie5533 chfoo: have you ever run the universal-tracker yourself?
16:50 🔗 chfoo odie5533: yeah, i'm running it for the puush project
16:50 🔗 odie5533 on a vps?
16:51 🔗 chfoo yeah
16:51 🔗 odie5533 For the puush items, are they just the img filename like 4ddxg ?
16:54 🔗 chfoo yeah, just prefix puu.sh/ to get the url. though currently the items are ranges of 13 images
17:56 🔗 odie5533 chfoo: Is the tracker hosted on your personal VPS?
18:59 🔗 odie5533 chfoo: what license, if any, is your blip.tv pipeline released under?
23:22 🔗 GLaDOS odie5533: generally, treat anything from Archive Team as being licenced under the WTFPL
