#warrior 2013-11-05,Tue


Time	Nickname	Message
00:17 ^🔗	n00b221	Hey guys, just a thought, but I noticed something today: archive.org now gives you the option to archive any page if you put it into the Wayback Machine and it is not already archived. It also auto-archives pages that are linked to from within an archived page but are not archived themselves. So would it be possible to write a script that uses this to expand the archive.org archives as well, more or less a a
00:17 ^🔗	n00b221	them if the option to archive them pops up
00:27 ^🔗	n00b221	If there had been something like that during the isohunt bit, the entire site could have been archived in a few hours
00:30 ^🔗	odie5533	n00b221: I'd imagine at this point IA doesn't want to do that.
00:30 ^🔗	odie5533	and would perhaps rather we use archivebot or warrior projects
00:41 ^🔗	odie5533	Does anyone know who manages the Warrior project list and rsync targets?
06:57 ^🔗	ersi	odie5533: Anyone else
06:58 ^🔗	ersi	We are a bunch with access to the tracker/project list. And SketchCow usually provides rsync targets
15:39 ^🔗	odie5533	So how does one go about adding a Warrior project to the tracker?
15:44 ^🔗	GLaDOS	All that needs to be done is editing projects.json (located at http://warriorhq.archiveteam.org/projects.json for AT)
15:47 ^🔗	odie5533	GLaDOS: Doesn't it need to be added to e.g. tracker.archiveteam.org/project/>
15:47 ^🔗	GLaDOS	Warrior projects don't need a tracker at that location (e.g URLTeam, and some other projects which had others hosting trackers)
15:50 ^🔗	odie5533	GLaDOS: Is the tracker.archiveteam.org tracker for only certain projects, or can others request their project be hosted on it?
15:52 ^🔗	GLaDOS	You can request a project to be hosted on it
15:53 ^🔗	GLaDOS	as long as it doesn't take too much RAM up ;)
15:57 ^🔗	odie5533	Is anyone around that has created a Seesaw pipeline before? chfoo, you there?
15:58 ^🔗	odie5533	In what dev environment is a pipeline created? Can I use the Warrior to create Seesaw pipelines?
15:59 ^🔗	GLaDOS	You can use the warrior.
15:59 ^🔗	GLaDOS	Hell, it's probably best to do so, seeing as that's where it'll be run from
15:59 ^🔗	odie5533	It has no gui, so I'd have to install one, or edit in vim/emacs
15:59 ^🔗	GLaDOS	Just remember to have the script install any packages you may install!
16:00 ^🔗	GLaDOS	Oh, right, people usually use IDEs.
16:00 ^🔗	odie5533	I've been liking using Eclipse for my Python development lately
16:00 ^🔗	GLaDOS	You could just install the seesaw packages from pypi
16:00 ^🔗	GLaDOS	That's all the code that the warrior runs
16:01 ^🔗	chfoo	i just do it in ubuntu and assume it works in the warrior
16:01 ^🔗	GLaDOS	WELL, NOW I SLEEP
16:01 ^🔗	odie5533	GLaDOS: thanks for the help. g'night
16:01 ^🔗	odie5533	chfoo: you ran pip install seesaw and that was it?
16:02 ^🔗	odie5533	also get-wget-lua.sh which seems to get and build the wget-lua branch
16:03 ^🔗	chfoo	yeah, i don't recall having to do anything special
16:05 ^🔗	odie5533	chfoo: Are the Redis items that the Pipeline gets typically just urls?
16:06 ^🔗	chfoo	i would try to get the item names as short as possible
16:06 ^🔗	odie5533	What do you mean?
16:08 ^🔗	chfoo	redis's data structures are optimized for speed, which means it uses up a lot of memory
16:09 ^🔗	odie5533	So what is a typical item for the Pipeline? Not a URL?
16:11 ^🔗	chfoo	it depends. for something like a blog, it's usually just the username, and then the pipeline interpolates that into username.example.com and wget will crawl the entire domain
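The interpolation chfoo describes can be sketched in plain Python. This is only an illustration and not Seesaw's own API: the function name `item_to_url`, the `http://` scheme, and the `example.com` domain are assumptions taken from the placeholder in the message above.

```python
def item_to_url(item_name, domain="example.com"):
    """Build the crawl start URL for a short item name (a blog username).

    ASSUMPTION: the scheme and domain are placeholders for illustration;
    a real project would hard-code its own target site here.
    """
    # The tracker only stores the short username; the full URL is
    # reconstructed on the worker just before the crawl starts.
    return "http://%(item_name)s.%(domain)s/" % {
        "item_name": item_name,
        "domain": domain,
    }


if __name__ == "__main__":
    print(item_to_url("alice"))
```

Keeping only the username in the tracker means the repeated scheme and domain are never stored once per item.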
16:12 ^🔗	odie5533	What does the blip.tv grabber use?
16:12 ^🔗	chfoo	it's using a full url, but it shouldn't be
16:13 ^🔗	odie5533	should have parsed off at least the http://blip.tv/ part?
16:13 ^🔗	chfoo	yeah, it's a bit redundant
16:13 ^🔗	chfoo	but i guess since there aren't millions of items to load, it's not using up much memory
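A minimal sketch of the shortening discussed here: strip the constant `http://blip.tv/` prefix before an item goes into the tracker, and add it back on the worker. The helper names and the sample path are hypothetical; this is not the code the blip.tv grabber actually uses.

```python
PREFIX = "http://blip.tv/"


def to_item_name(url):
    """Drop the constant site prefix so the tracker stores a shorter name."""
    if not url.startswith(PREFIX):
        raise ValueError("unexpected URL: %r" % url)
    return url[len(PREFIX):]


def to_url(item_name):
    """Rebuild the full URL from the short item name on the worker side."""
    return PREFIX + item_name


if __name__ == "__main__":
    # "some-show" is a made-up path used only for illustration.
    name = to_item_name("http://blip.tv/some-show")
    print(name, to_url(name))
```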
16:15 ^🔗	odie5533	At what number of urls would using full urls become excessive?
16:15 ^🔗	chfoo	i'm not sure, i'm not a redis expert.
16:17 ^🔗	chfoo	i can calculate the memory usage for the puush tracker though for a rough estimate
16:19 ^🔗	chfoo	it looks like 11-character item names use about 94 bytes per name (568.86MB for 6376877 items)
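The per-item figure can be checked with simple arithmetic. The sketch below assumes "MB" means 2**20 bytes, which is the reading that reproduces roughly 94 bytes per name; the `estimate_mib` helper is hypothetical and only scales that observed figure linearly.

```python
TOTAL_MIB = 568.86      # memory reported for the puush tracker
ITEM_COUNT = 6376877    # number of items reported

# ASSUMPTION: "MB" is read as 2**20 bytes (MiB).
bytes_per_item = TOTAL_MIB * 2**20 / ITEM_COUNT


def estimate_mib(item_count, per_item_bytes=bytes_per_item):
    """Rough linear estimate of tracker memory for a given item count."""
    return item_count * per_item_bytes / 2**20


if __name__ == "__main__":
    print("bytes per item: %.1f" % bytes_per_item)
    print("estimate for 1,000,000 items: %.1f MiB" % estimate_mib(1000000))
```

Longer item names, such as full URLs, would raise the per-item cost, so this is only a rough lower bound for such projects.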
16:44 ^🔗	odie5533	chfoo: have you ever run the universal-tracker yourself?
16:50 ^🔗	chfoo	odie5533: yeah, i'm running it for the puush project
16:50 ^🔗	odie5533	on a vps?
16:51 ^🔗	chfoo	yeah
16:51 ^🔗	odie5533	For the puush items, are they just the img filename like 4ddxg ?
16:54 ^🔗	chfoo	yeah, just prefix puu.sh/ to get the url. though currently the items are ranges of 13 images
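One way an item could stand for a range of 13 images is sketched below. The log does not say how the ranges are encoded, so everything here is an assumption: the base-62 alphabet and its ordering, the idea that an item is named by its first ID, and the helper names.

```python
import string

# ASSUMPTION: IDs are treated as base-62 numbers with this digit ordering.
# The real puu.sh ordering is not stated in the log.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def decode(name):
    """Interpret an ID such as '4ddxg' as an integer."""
    n = 0
    for ch in name:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n


def encode(n, width):
    """Turn an integer back into a fixed-width ID (no overflow handling)."""
    chars = []
    for _ in range(width):
        n, rem = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def expand_item(first_name, count=13):
    """List the URLs covered by an item named after its first ID."""
    start = decode(first_name)
    return ["puu.sh/" + encode(start + i, len(first_name)) for i in range(count)]


if __name__ == "__main__":
    for url in expand_item("4ddxg"):
        print(url)
```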
17:56 ^🔗	odie5533	chfoo: Is the tracker hosted on your personal VPS?
18:59 ^🔗	odie5533	chfoo: what license, if any, is your blip.tv pipeline released under?
23:22 ^🔗	GLaDOS	odie5533: generally, treat anything from Archive Team as being licenced under the WTFPL
