[00:01] Wow. Okay believe what you want. URL shortening services existed years before twitter and they did not get much use
[00:04] Well, I would think that they're useful now. To the extent that they are.
[00:04] Personally, I hate them primarily because I can't tell where the link is actually going to take me
[00:04] beforehand
[00:05] Then there is the url service to url service to url service bullshit
[00:05] how many companies did you just give you information to access a public link?
[00:06] not enough, clearly
[00:42] Are we debating saving shorteners?
[00:42] debating the usefulness of them
[00:43] they're not gonna die
[00:48] I think shortener serve no purpose but to turn the end user into a product
[01:14] My take: primary reason for users is to shorten for use in twitter.
[01:15] However, it's also occasionally useful for sites that use stupid 200-long character URLs that tend to mangle links in e-mails and other formatting.
[01:15] If the link is shorter than a line, it's much more likely to work properly.
[01:15] I agree, though, about their evils.
[01:16] Using user as demographic information, hiding links behind something that's effectively a hash, etc.
[01:19] they're useful in irc even
[01:21] but they are not really rememberable
[01:21] long urls can be
[01:24] as I point back to this classic http://www.w3.org/Provider/Style/URI.html
[01:25] sure
[01:25] they are a transient thing (and irc is transient)
[01:25] people using them in actual html deserve to burn
[01:25] modern cms' have kinda borked that though
[01:28] Do we have or know of any special software for mirroring forums
[01:39] irc logs are not transient ...
[03:21] omf_: you could take a look at e.g. https://github.com/ArchiveTeam/cityofheroes-grab which is for vbulletin
[03:28] I think there are some other forum-specific grabbers on the archiveteam github account
[06:00] ------------------------------
[06:01] Sorry for the lateness of this - can someone go after this?
[06:01] http://www.glitch.com/forum/general/30735/
[06:01] ------------------------------
[06:01] Someone mentions they have an archive, but it's 94mb, don't trust it
[06:19] thanks DFJustin
[06:27] Is someone else trying to grab glitch? I just started and it is slooow
[12:13] the glitch grab is still going, very slowly
[12:18] omf_: what's the size up to now?
[12:25] The warc.gz is 12mb so far, 52mb uncompressed and it is still going.
[12:25] http://www.glitch.com/profiles/PHVBB509QK42GQD/ << this person said they are archiving it.
[12:28] gotcha, super slow then, glad there's some time
[12:29] yeah I have no wait time between page fetches and it did not speed up
[12:30] and it is running on a butt instance so the slow down is definitely not on my end
[12:30] yeah, I figured as much, crazy
[17:16] hrm is ytmnd shutting down ?
[17:20] So there's a rumour ytmnd is shutting down
[17:45] I've been running warrior all weekend. Running Virtual Box on a Win7x64 workstation. Twice now the web interface has become partially unresponsive. Right now, for instance, I only see "Current Project", the "Stop" button," the Archive Team Warrior logo, and a blank yellow area. I've tried different browsers. Restarting the VM will get rid of the problem, but I don't really want to do that. How can I recover from this and/
[17:45] or fix the root problem?
[17:49] Whoops. I should probably switch over to the #Warrior channel.
[19:35] ugh glitch is still slow going
[19:56] hello world
[19:56] Lua runtime error: formspring.lua:311: bad argument #1 to 'ipairs' (table expected, got nil).
[19:56] getting this while running the formspring project:
[19:57] Is wget continuing work? If so, nothing serious. You can log onto the console and check /data/data/data/ and the user's wget.log
[19:59] well - /data/data/data is entirely empty ^^
[20:00] mhh, nothing around there either?
[20:00] ah, /data/data/projects/formspring-../ etc. has stuff in it
[20:00] looks like things are being downloaded
[20:01] goodie :)
[20:11] I'll double check though - where is the wget.log you mentioned?
[20:12] ah, inside the data folders :)
[20:13] should be in each user/item folder in the data dir
[20:13] data dir = project data dir
[20:14] yeah, logs look nice. I'll let it run and see what happens
[20:26] goodie :)
[20:45] http://news.slashdot.org/story/13/03/25/1853201/yahoo-buys-uk-teens-smartphone-news-app -- wouldn't be posting this here, but ... yahoo
[21:37] welp 455221 repos cloned from githug so far
[21:50] heh @ githug
[21:51] are you grabbing repos with the same name, different user?
[21:56] no, im downloading all of the repos
[21:57] i started with id 1 and now im up to id 1296894
[21:57] whoa
[21:57] wondering if there's an equivalent to yahoomessages-grab repo for "Work on whatever is most important right now" mode in the canned vm
[21:59] zwol: Yes, select "The ArchiveTeam's choice" :)
[21:59] zwol: Oh, well - it's in the VM/Warrior
[21:59] ersl: you misunderstand
[21:59] yea
[21:59] ersl: I have a cloud VM and I'm trying to duplicate what that does
[22:00] I can't just run a VM inside the VM, I need to pull out the code
[22:00] also I'm having weird problems with the VM right now but that's separate
[22:00] The warrior-code2 is basically just a wrapper around selecting a project and starting it
[22:00] "The warrior-code2"?
[22:01] A warrior VM = Pre-made template with http://github.com/ArchiveTeam/warrior-code2/ http://github.com/ArchiveTeam/seesaw-kit and project code + debian base
[22:05] ersi: It keeps saying "Reboot for Seesaw update".
[22:06] I think this is going to require more brain than I have right now
[22:12] I'm just saying, that's what a warrior is and yes, you probably need to work on it more
[22:12] there's a ready made AWS AMI btw
[22:12] if that's what you're doing
[22:17] that is not what i am doing
[22:17] where do I get wget-lua?
[22:19] also, how do i configure run-warrior from the command line and/or from a config file rather than from the embedded web server (which, for this use case, should be disabled)?
[22:34] ytmnd shutting down?
[22:34] What?
[22:35] I'm not surprised, but source?
[22:40] "/usr/bin/wget-lua: Incorrect Wget+Lua version (want GNU Wget 1.14.lua.20130120-8476)." (╯°□°)╯︵ ┻━┻
[22:40] I just compiled the damned thing from source
[22:41] also, that it doesn't look in $PATH is unacceptable
[22:46] source http://bitcoin.ytmnd.com/
[22:47] $ git describe HEAD
[22:47] fatal: No names found, cannot describe anything.
[22:47] RAAAAAAAGE
[22:47] HULK SMASH GIT
[22:48] ... srsly folks, someone help me out here. all i'm trying to do is get a wget-lua installed that the scripts will like.
[22:48] (which I really hope does not mean "the exact version that has been embedded in the per-project repo"
[22:52] are you using the install script (it is in each project repo) - https://raw.github.com/ArchiveTeam/formspring-grab/master/get-wget-lua.sh
[22:53] that'll leave a wget-lua executable in the current directory
[22:55] I was not, in fact
[22:56] aha
[22:56] one is *not* supposed to compile from git, eh?
[22:57] I guess not, I've never had success with doing it either :P
[22:58] Compiling from git is more complicated. You'll need to do stuff to get the gnulib files etc.
[22:59] alard: I followed all the instructions, I got stuck on git-version-gen always printing "UNKNOWN"
[23:00] there
[23:00] Yes, I don't understand how that works. There's a file you can set (tar-version or something)
[23:00] that'll probably work until the next time the config changes :-/
[23:02] alard: The present setup is not very friendly for "I have a VM whose primary function is $OTHERTHING but I want it to run the warrior in the background under a dedicated user ID"
[23:02] I would send patches but I don't have time to do much more than complain
[23:06] Yes. warrior-code2 isn't meant to be used for anything else than on the warrior VM. The seesaw-kit is a bit more.
[23:07] yeah so I can get 95% by downloading seesaw-kit, installing the dependencies manually, and running run-warrior with slightly tweaked arguments
[23:07] the other 5% is assumptions about the environment inside the per-project repos
[23:08] Yes, that's true. You can get part of the way by placing a recent wget-lua build somewhere on your filesystem.
[23:08] I think only 3 changes needed: 1) never ever hardwire the name of the current user or the name of the current user's home-directory
[23:08] We don't?
[23:08] `/home/warrior` appears in per-project repos sometimes
[23:08] The recent ones?
[23:09] yeah, it was right there in formspring as of ... half an hour ago?
[23:09] https://github.com/ArchiveTeam/formspring-grab/blob/master/pipeline.py#L38-L46
[23:09] It's not a problem if the correct wget-lua is in the project directory itself
[23:09] What that part does is: look for a working Wget-Lua in one of those paths.
[23:09] Is there a list of at-risk websites?
[23:09] yeah that
[23:10] It should skip /home/warrior/wget-lua and continue to /usr/bin/wget-lua
[23:10] namespace: not really, but there's some stale groups at the wiki
[23:10] alard: which brings me to part 2: os.environ("PATH").split(":") should replace the hardwired /usr/bin in that list
[23:11] 2a: responsibility for finding and updating wget-lua should be moved to seesaw-kit, which should assume it is running as an unprivileged user and do the install to $HOME/.local/bin
[23:11] zwol: Any help is appreciated. alard has done most/all of the work, which is pretty awesome - and the focus have been to get something up and running (in the supplied OVA environment)
[23:11] zwol: why not use sys.path, which is already a list?
[23:11] ersi: Because I can't stop seeing stuff that looks like it could die any day.
[23:11] (Like say, blogger.)
[23:11] namespace: Welcome to the club :)
[23:11] ersi: I can list problems, but I don't have time to fix 'em myself
[23:11] InitHello: sys.path is for python modules
[23:11] oh wait, that's the python path
[23:12] yeah
[23:12] zwol: It disagree, it shouldn't be moved to seesaw-kit. seesaw-kit doesn't know about wget, let alone which version of wget.
[23:12] (*I* disagree.)
[23:12] zwol: I was just saying, and inviting. Also explaining the environment around what has currently been done.
[23:12] 3: in general any binary that needs to be available should be built from source as necessary rather than embedded in project repos, because you don't know what architecture I've got
[23:12] zwol: But perhaps we should do this in #warrior.
[23:13] alard: Thanks for all the great warrior/seesaw/project work by the way. Can't say that enough. :)
[23:13] And not now, at least not by me. I'm going to bed.
[23:13] And Github issues on the seesaw-kit repo would be nice. :)
[23:13] alard: fair enough, I need to go make dinner
[23:13] alard: I can do that for you, sure.
[23:13] After dinner :)
[23:15] Good idea. Bye!
[23:15] o/
[23:31] Anyone who worked on warc for wget or warcproxy still around?
[23:32] or know of other tools besides wget, warcproxy, warcindex and heritrix
[23:32] that have warc support
[23:38] That was alard's deal, as I recall.
[23:38] He's hit bedf
[23:38] BEDF
[23:42] that's where you BAMF into BED
[23:42] I do it all the time.
[23:43] Nightcrawler BAMF, right?
[23:46] The python warc library was written by two guys in India
[23:47] I didn't realize the IA had people everywhere
[23:51] india is pretty notable in software
[23:52] calibre is written and maintained by an indian guy
[23:53] and what is calibre?
[23:54] ebook management software
[23:54] interfaces with kindles, nooks, etc
[23:56] i use calibre all the time, its awesome
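
The wget-lua lookup proposed at 23:10 (search $PATH instead of a hardwired /usr/bin, with the project directory tried first) could look roughly like the Python sketch below. It is not the code in pipeline.py or seesaw-kit: the names candidate_paths, reported_version and find_wget_lua are made up for illustration, and it assumes the first line of `wget-lua --version` starts with the version string quoted in the 22:40 error message.

```python
import os
import subprocess

# Version string copied from the 22:40 error message in the log.
WANTED_VERSION = "GNU Wget 1.14.lua.20130120-8476"


def candidate_paths(name="wget-lua", project_dir=".", environ=None):
    """Return candidate executables: the project dir first, then each PATH entry."""
    environ = os.environ if environ is None else environ
    # os.environ is a mapping, so PATH is looked up with .get()/[] rather than called.
    path_dirs = [d for d in environ.get("PATH", "").split(os.pathsep) if d]
    return [os.path.join(d, name) for d in [project_dir] + path_dirs]


def reported_version(executable):
    """Return the first line of `<executable> --version`, or None if it cannot run."""
    try:
        out = subprocess.check_output([executable, "--version"],
                                      stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.decode("utf-8", "replace").splitlines()
    return lines[0].strip() if lines else None


def find_wget_lua(wanted=WANTED_VERSION, version_of=reported_version, **kwargs):
    """Return the first candidate whose version line starts with `wanted`, else None."""
    for path in candidate_paths(**kwargs):
        version = version_of(path)
        if version is not None and version.startswith(wanted):
            return path
    return None
```

Calling find_wget_lua(project_dir=".") returns the path of the first binary that reports the wanted version, or None when no candidate matches, so a missing /home/warrior/wget-lua is simply skipped.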