#archiveteam 2013-03-25,Mon


Time Nickname Message
00:01 omf_ Wow. Okay believe what you want. URL shortening services existed years before twitter and they did not get much use
00:04 namespace Well, I would think that they're useful now. To the extent that they are.
00:04 tev Personally, I hate them primarily because I can't tell where the link is actually going to take me
00:04 tev beforehand
00:05 omf_ Then there is the url service to url service to url service bullshit
00:05 omf_ how many companies did you just give you information to access a public link?
00:06 chronomex not enough, clearly
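You can check the "url service to url service" problem by hand. The sketch below follows HTTP redirects one hop at a time with HEAD requests and returns every URL visited, which shows where a short link really lands and which hosts saw the request along the way. It is a minimal sketch using only the Python standard library. The name `redirect_chain` and the 10-hop cap are illustrative assumptions, and some shorteners may answer HEAD differently from GET.

```python
import http.client
from typing import List
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def redirect_chain(url: str, max_hops: int = 10) -> List[str]:
    """Follow redirects one hop at a time (HEAD only) and return every URL seen."""
    chain = [url]
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("HEAD", target)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            break  # final destination (or an error page) reached
        url = urljoin(url, location)  # Location may be relative
        chain.append(url)
    return chain
```

To count the services involved, take `{urlsplit(u).hostname for u in redirect_chain(short_url)}`. Each distinct host in that set received the request.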
00:42 SketchCow Are we debating saving shorteners?
00:42 dashcloud debating the usefulness of them
00:43 balrog_ they're not gonna die
00:48 omf_ I think shortener serve no purpose but to turn the end user into a product
01:14 gevmage My take: primary reason for users is to shorten for use in twitter.
01:15 gevmage However, it's also occasionally useful for sites that use stupid 200-long character URLs that tend to mangle links in e-mails and other formatting.
01:15 gevmage If the link is shorter than a line, it's much more likely to work properly.
01:15 gevmage I agree, though, about their evils.
01:16 gevmage Using user as demographic information, hiding links behind something that's effectively a hash, etc.
01:19 trs80 they're useful in irc even
01:21 omf_ but they are not really rememberable
01:21 omf_ long urls can be
01:24 omf_ as I point back to this classic http://www.w3.org/Provider/Style/URI.html
01:25 trs80 sure
01:25 trs80 they are a transient thing (and irc is transient)
01:25 trs80 people using them in actual html deserve to burn
01:25 omf_ modern cms' have kinda borked that though
01:28 omf_ Do we have or know of any special software for mirroring forums
01:39 chronomex irc logs are not transient ...
03:21 DFJustin omf_: you could take a look at e.g. https://github.com/ArchiveTeam/cityofheroes-grab which is for vbulletin
03:28 chronomex I think there are some other forum-specific grabbers on the archiveteam github account
06:00 SketchCow ------------------------------
06:01 SketchCow Sorry for the lateness of this - can someone go after this?
06:01 SketchCow http://www.glitch.com/forum/general/30735/
06:01 SketchCow ------------------------------
06:01 SketchCow Someone mentions they have an archive, but it's 94mb, don't trust it
06:19 omf_ thanks DFJustin
06:27 omf_ Is someone else trying to grab glitch? I just started and it is slooow
12:13 omf_ the glitch grab is still going, very slowly
12:18 wwwtxt omf_: what's the size up to now?
12:25 omf_ The warc.gz is 12mb so far, 52mb uncompressed and it is still going.
12:25 Smiley http://www.glitch.com/profiles/PHVBB509QK42GQD/ << this person said they are archiving it.
12:28 wwwtxt gotcha, super slow then, glad there's some time
12:29 omf_ yeah I have no wait time between page fetches and it did not speed up
12:30 omf_ and it is running on a butt instance so the slow down is definitely not on my end
12:30 wwwtxt yeah, I figured as much, crazy
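To take a rough look inside a grab like the glitch one (12 MB of .warc.gz, 52 MB uncompressed), the sketch below walks a .warc.gz with the standard library. It counts records by WARC-Type and adds up the uncompressed bytes. It assumes the usual layout of one gzip member per record, which `gzip.open` reads back to back as a single stream. It does no validation beyond the version line and Content-Length. `summarize_warc_gz` is a hypothetical name, not part of any existing WARC tool.

```python
import gzip
from collections import Counter
from typing import Tuple

def summarize_warc_gz(path: str) -> Tuple[Counter, int]:
    """Count records per WARC-Type and total uncompressed bytes in a .warc.gz."""
    counts: Counter = Counter()
    total = 0
    with gzip.open(path, "rb") as f:  # multi-member gzip is read transparently
        while True:
            line = f.readline()
            if not line:
                break  # end of file
            total += len(line)
            if line in (b"\r\n", b"\n"):
                continue  # blank lines separating records
            if not line.startswith(b"WARC/"):
                raise ValueError(f"expected a WARC version line, got {line!r}")
            headers = {}
            while True:
                h = f.readline()
                total += len(h)
                if h in (b"\r\n", b"\n", b""):
                    break  # end of the header block
                name, _, value = h.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            block = f.read(int(headers["content-length"]))
            total += len(block)
            counts[headers.get("warc-type", "unknown")] += 1
    return counts, total
```

Watching the `response` count grow between runs is a cheap way to tell whether a slow crawl is still making progress.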
17:16 tef_ hrm is ytmnd shutting down ?
17:20 Zombywuf So there's a rumour ytmnd is shutting down
17:45 tcv I've been running warrior all weekend. Running Virtual Box on a Win7x64 workstation. Twice now the web interface has become partially unresponsive. Right now, for instance, I only see "Current Project", the "Stop" button," the Archive Team Warrior logo, and a blank yellow area. I've tried different browsers. Restarting the VM will get rid of the problem, but I don't really want to do that. How can I recover from this and/
17:45 tcv or fix the root problem?
17:49 tcv Whoops. I should probably switch over to the #Warrior channel.
19:35 omf_ ugh glitch is still slow going
19:56 bowman__ hello world
19:56 bowman__ Lua runtime error: formspring.lua:311: bad argument #1 to 'ipairs' (table expected, got nil).
19:56 bowman__ getting this while running the formspring project:
19:57 ersi Is wget continuing work? If so, nothing serious. You can log onto the console and check /data/data/data/ and the user's wget.log
19:59 bowman__ well - /data/data/data is entirely empty ^^
20:00 ersi mhh, nothing around there either?
20:00 bowman__ ah, /data/data/projects/formspring-../ etc. has stuff in it
20:00 bowman__ looks like things are being downloaded
20:01 ersi goodie :)
20:11 bowman__ I'll double check though - where is the wget.log you mentioned?
20:12 bowman__ ah, inside the data folders :)
20:13 ersi should be in each user/item folder in the data dir
20:13 ersi data dir = project data dir
20:14 bowman__ yeah, logs look nice. I'll let it run and see what happens
20:26 ersi goodie :)
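ersi describes one wget.log per user/item folder under the project data dir. Given that layout, a quick health check is to show the last few lines of every log. The sketch below is a minimal version in Python. It assumes that layout and takes the project data directory as an argument instead of hardcoding a path. `tail_wget_logs` is a hypothetical helper, not part of the warrior.

```python
import os
from collections import deque
from typing import Dict, List

def tail_wget_logs(project_data_dir: str, tail: int = 3) -> Dict[str, List[str]]:
    """Map every wget.log below project_data_dir to its last `tail` lines."""
    result: Dict[str, List[str]] = {}
    for root, _dirs, files in os.walk(project_data_dir):
        if "wget.log" not in files:
            continue
        path = os.path.join(root, "wget.log")
        # deque(maxlen=...) keeps only the final lines while streaming the file
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            result[path] = [line.rstrip("\n") for line in deque(f, maxlen=tail)]
    return result
```

If the newest lines keep changing between runs, wget is still working even when an occasional Lua error shows up in the web interface.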
20:45 balrog_ http://news.slashdot.org/story/13/03/25/1853201/yahoo-buys-uk-teens-smartphone-news-app -- wouldn't be posting this here, but ... yahoo
21:37 WiK welp 455221 repos cloned from githug so far
21:50 ivan` heh @ githug
21:51 ivan` are you grabbing repos with the same name, different user?
21:56 WiK no, im downloading all of the repos
21:57 WiK i started with id 1 and now im up to id 1296894
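WiK's own tooling isn't shown here. One way to walk public repositories by id, as he describes, is GitHub's REST endpoint `GET /repositories?since=<id>`. It lists public repositories in ascending id order and pages with the `since` cursor. The sketch below is minimal and leaves the actual cloning out. It assumes unauthenticated access, which is heavily rate-limited. The fetch function is injectable so the paging logic can be tested offline. Keying on `full_name` (owner/name) also keeps same-named repos from different users apart.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

API = "https://api.github.com/repositories?since={since}"

def fetch_page(since: int) -> List[dict]:
    """One page of public repositories with id > since (unauthenticated)."""
    req = urllib.request.Request(API.format(since=since),
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def iter_public_repos(fetch: Callable[[int], List[dict]] = fetch_page,
                      since: int = 0) -> Iterator[dict]:
    """Yield every public repo in ascending id order, following the since cursor."""
    while True:
        page = fetch(since)
        if not page:
            return  # no repositories beyond `since`
        yield from page
        since = page[-1]["id"]  # next page starts after the last id seen
```

For each repo, `f"https://github.com/{repo['full_name']}.git"` gives a clone URL. Recording the last `id` processed lets a long run resume where it stopped.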
21:57 ersi whoa
21:57 zwol wondering if there's an equivalent to yahoomessages-grab repo for "Work on whatever is most important right now" mode in the canned vm
21:59 ersi zwol: Yes, select "The ArchiveTeam's choice" :)
21:59 ersi zwol: Oh, well - it's in the VM/Warrior
21:59 zwol ersl: you misunderstand
21:59 ersi yea
21:59 zwol ersl: I have a cloud VM and I'm trying to duplicate what that does
22:00 zwol I can't just run a VM inside the VM, I need to pull out the code
22:00 zwol also I'm having weird problems with the VM right now but that's separate
22:00 ersi The warrior-code2 is basically just a wrapper around selecting a project and starting it
22:00 zwol "The warrior-code2"?
22:01 ersi A warrior VM = Pre-made template with http://github.com/ArchiveTeam/warrior-code2/ http://github.com/ArchiveTeam/seesaw-kit and project code + debian base
22:05 zwol ersi: It keeps saying "Reboot for Seesaw update".
22:06 zwol I think this is going to require more brain than I have right now
22:12 ersi I'm just saying, that's what a warrior is and yes, you probably need to work on it more
22:12 ersi there's a ready made AWS AMI btw
22:12 ersi if that's what you're doing
22:17 zwol that is not what i am doing
22:17 zwol where do I get wget-lua?
22:19 zwol also, how do i configure run-warrior from the command line and/or from a config file rather than from the embedded web server (which, for this use case, should be disabled)?
22:34 namespace ytmnd shutting down?
22:34 namespace What?
22:35 namespace I'm not surprised, but source?
22:40 zwol "/usr/bin/wget-lua: Incorrect Wget+Lua version (want GNU Wget 1.14.lua.20130120-8476)." (╯°□°)╯︵ ┻━┻
22:40 zwol I just compiled the damned thing from source
22:41 zwol also, that it doesn't look in $PATH is unacceptable
22:46 DFJustin source http://bitcoin.ytmnd.com/
22:47 zwol $ git describe HEAD
22:47 zwol fatal: No names found, cannot describe anything.
22:47 zwol RAAAAAAAGE
22:47 zwol HULK SMASH GIT
22:48 zwol ... srsly folks, someone help me out here. all i'm trying to do is get a wget-lua installed that the scripts will like.
22:48 zwol (which I really hope does not mean "the exact version that has been embedded in the per-project repo"
22:52 Cameron_D are you using the install script (it is in each project repo) - https://raw.github.com/ArchiveTeam/formspring-grab/master/get-wget-lua.sh
22:53 Cameron_D that'll leave a wget-lua executable in the current directory
22:55 zwol I was not, in fact
22:56 zwol aha
22:56 zwol one is *not* supposed to compile from git, eh?
22:57 Cameron_D I guess not, I've never had success with doing it either :P
22:58 alard Compiling from git is more complicated. You'll need to do stuff to get the gnulib files etc.
22:59 zwol alard: I followed all the instructions, I got stuck on git-version-gen always printing "UNKNOWN"
23:00 zwol there
23:00 alard Yes, I don't understand how that works. There's a file you can set (tar-version or something)
23:00 zwol that'll probably work until the next time the config changes :-/
23:02 zwol alard: The present setup is not very friendly for "I have a VM whose primary function is $OTHERTHING but I want it to run the warrior in the background under a dedicated user ID"
23:02 zwol I would send patches but I don't have time to do much more than complain
23:06 alard Yes. warrior-code2 isn't meant to be used for anything else than on the warrior VM. The seesaw-kit is a bit more.
23:07 zwol yeah so I can get 95% by downloading seesaw-kit, installing the dependencies manually, and running run-warrior with slightly tweaked arguments
23:07 zwol the other 5% is assumptions about the environment inside the per-project repos
23:08 alard Yes, that's true. You can get part of the way by placing a recent wget-lua build somewhere on your filesystem.
23:08 zwol I think only 3 changes needed: 1) never ever hardwire the name of the current user or the name of the current user's home-directory
23:08 alard We don't?
23:08 zwol `/home/warrior` appears in per-project repos sometimes
23:08 alard The recent ones?
23:09 zwol yeah, it was right there in formspring as of ... half an hour ago?
23:09 alard https://github.com/ArchiveTeam/formspring-grab/blob/master/pipeline.py#L38-L46
23:09 zwol It's not a problem if the correct wget-lua is in the project directory itself
23:09 alard What that part does is: look for a working Wget-Lua in one of those paths.
23:09 namespace Is there a list of at-risk websites?
23:09 zwol yeah that
23:10 alard It should skip /home/warrior/wget-lua and continue to /usr/bin/wget-lua
23:10 ersi namespace: not really, but there's some stale groups at the wiki
23:10 zwol alard: which brings me to part 2: os.environ("PATH").split(":") should replace the hardwired /usr/bin in that list
23:11 zwol 2a: responsibility for finding and updating wget-lua should be moved to seesaw-kit, which should assume it is running as an unprivileged user and do the install to $HOME/.local/bin
23:11 ersi zwol: Any help is appreciated. alard has done most/all of the work, which is pretty awesome - and the focus have been to get something up and running (in the supplied OVA environment)
23:11 InitHello zwol: why not use sys.path, which is already a list?
23:11 namespace ersi: Because I can't stop seeing stuff that looks like it could die any day.
23:11 namespace (Like say, blogger.)
23:11 ersi namespace: Welcome to the club :)
23:11 zwol ersi: I can list problems, but I don't have time to fix 'em myself
23:11 zwol InitHello: sys.path is for python modules
23:11 InitHello oh wait, that's the python path
23:12 InitHello yeah
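alard describes the lookup as trying a list of fixed paths, skipping ones that don't exist, and taking the first wget-lua whose version matches. zwol suggests also searching $PATH. Both ideas are sketched below. This is a hypothetical helper, not the code in formspring-grab's pipeline.py or in seesaw-kit. The expected version string is simply the one quoted in the 22:40 error. Note that `os.environ` is a mapping, so the PATH lookup is `os.environ["PATH"]` (or `.get`), not a call, and entries are split on `os.pathsep`.

```python
import os
import subprocess
from typing import Iterable, List, Optional

# Version string from the 22:40 error message; an example, not a pinned value.
EXPECTED_VERSION = "GNU Wget 1.14.lua.20130120-8476"

def candidate_paths(fixed: Iterable[str], path_env: Optional[str] = None) -> List[str]:
    """Fixed candidates first, then a wget-lua in every $PATH directory."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")  # mapping lookup, not os.environ("PATH")
    dirs = [d for d in path_env.split(os.pathsep) if d]  # skip empty entries
    return list(fixed) + [os.path.join(d, "wget-lua") for d in dirs]

def find_wget_lua(fixed: Iterable[str], expected: str = EXPECTED_VERSION,
                  path_env: Optional[str] = None) -> Optional[str]:
    """Return the first candidate whose --version output contains `expected`."""
    for path in candidate_paths(fixed, path_env):
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # missing or not executable, e.g. /home/warrior/wget-lua off the VM
        try:
            out = subprocess.run([path, "--version"], capture_output=True,
                                 text=True, timeout=10).stdout
        except (OSError, subprocess.TimeoutExpired):
            continue
        if expected in out:
            return path
    return None
```

For example, `find_wget_lua(["./wget-lua", os.path.expanduser("~/.local/bin/wget-lua")])` returns the first matching binary, or `None` if the fetch-and-build script still needs to run. A wrong-version build earlier in $PATH is skipped, not fatal.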
23:12 alard zwol: It disagree, it shouldn't be moved to seesaw-kit. seesaw-kit doesn't know about wget, let alone which version of wget.
23:12 alard (*I* disagree.)
23:12 ersi zwol: I was just saying, and inviting. Also explaining the environment around what has currently been done.
23:12 zwol 3: in general any binary that needs to be available should be built from source as necessary rather than embedded in project repos, because you don't know what architecture I've got
23:12 alard zwol: But perhaps we should do this in #warrior.
23:13 ersi alard: Thanks for all the great warrior/seesaw/project work by the way. Can't say that enough. :)
23:13 alard And not now, at least not by me. I'm going to bed.
23:13 alard And Github issues on the seesaw-kit repo would be nice. :)
23:13 zwol alard: fair enough, I need to go make dinner
23:13 zwol alard: I can do that for you, sure.
23:13 zwol After dinner :)
23:15 alard Good idea. Bye!
23:15 ersi o/
23:31 omf_ Anyone who worked on warc for wget or warcproxy still around?
23:32 omf_ or know of other tools besides wget, warcproxy, warcindex and heritrix
23:32 omf_ that have warc support
23:38 SketchCow That was alard's deal, as I recall.
23:38 SketchCow He's hit bedf
23:38 SketchCow BEDF
23:42 InitHello that's where you BAMF into BED
23:42 GLaDOS I do it all the time.
23:43 omf_ Nightcrawler BAMF, right?
23:46 omf_ The python warc library was written by two guys in India
23:47 omf_ I didn't realize the IA had people everywhere
23:51 InitHello india is pretty notable in software
23:52 InitHello calibre is written and maintained by an indian guy
23:53 omf_ and what is calibre?
23:54 InitHello ebook management software
23:54 InitHello interfaces with kindles, nooks, etc
23:56 WiK i use calibre all the time, its awesome
