[03:06] SketchCow: ill take it ive got 8gb im not using
[03:07] mischecked, make that 13gb
[03:08] I'm kinda new to this archiving thing, in the sense of sharing what I got, how does this uploading to archive.org work?
[03:13] PatC: make an account on archive.org hit the upload button
[03:13] thats it?
[03:13] pretty much, just make sure its yours, or creative commons or public domain, or youve got authorization
[03:13] to upload whatever it is
[03:13] yea
[03:14] *checks license for the video's that 2600 is uploading to blip*
[03:31] they don't want you sharing those for some reason
[03:31] oh
[03:31] really? odd
[03:31] I think that's because you can buy their dvd's or something
[03:32] something like if you want a non-streaming copy, they want you to buy it
[03:32] yea
[03:32] why they'd use a service that readily allows downloads is anyone's guess though
[03:32] *cough /rss cough*
[05:46] i have the nickel weekly grab
[05:47] SketchCow:
[06:08] uploading to batcave under bsmith in the file nickel.7z
[06:37] trying to find ascii art for the golden arches
[06:37] McDonalds?
[06:37] yeah
[06:37] for this little static file server i'm writing
[06:37] "serves crap. fast"
[06:37] get it? :P
[06:37] /V\
[06:37] psh
[06:37] sorry
[06:37] I don't know where there is any
[06:37] something slightly fancier than that
[06:37] I know I know...
[06:38] I'm sorry for messing /w you
[06:38] haha
[06:38] might just end up doing that though
[06:38] or make my own
[06:38] oh, well in that case
[06:38] /V\ (c) #2pencil 2012
[06:38] ;)
[06:38] haha
[06:46] kennethre: https://gist.github.com/b1a44a06a18a4c4971bf
[06:46] those aren't the golden arches
[06:46] those are LIES
[06:48] fine, how about https://gist.github.com/b1a44a06a18a4c4971bf/ac953795609f62099ff2cc44f1c8a07072ac3808
[07:23] is batcave down?
[07:23] I can't get to it
[07:24] my rsync dies
[07:24] dies
[07:25] died**
[07:25] hey any new progress on the ffnet grabber?
[07:26] https://github.com/ArchiveTeam/ffnet-grab
[07:28] does the wget warc downloader always grab the latest copy of the source?
[07:29] cause if it doesnt you might as well just include a copy of the bin with the repo instead of a downloader
[07:45] so how do i get everything i need for this to work?
[07:46] cause ive done gem install connection_pool, and its still throwing that same cant find it error
[07:47] (1) yes, it does
[07:47] (2) no, that's a bad idea: think of different architectures
[07:47] (3) the code in that repo doesn't really work yet
[08:17] piratebay torrent file grab?
[08:19] yarr
[08:22] actually i'm trying to find a date when they'll stop hosting the torrent files
[08:23] coulda sworn they said a date somewhere
[08:26] "TorrentFreak was further informed that in “a month or so” the largest torrent site on the Internet will stop serving torrent files indefinitely." -- http://torrentfreak.com/the-pirate-bay-will-stop-serving-torrents-120112/
[08:26] so Feb 14th would be the date roughly
[08:39] heh, one neat side effect of running a DNS cache for your apartment's network
[08:39] you can see where everyone's going.
[08:39] I know this is no great revelation but it does feel a bit wrong on a visceral level
[08:42] hm. one thing i do wonder is how long it would take for a single machine to get a mirror of tpb torrents.
[08:42] how many are there?
[08:42] that is a good question
[08:42] i think they have that stat, one moment
[08:43] 4.001.575 torrents
[08:43] from the homepage https://thepiratebay.org/
[08:44] I guess just go ahead and start
[08:44] the task can be parallelized later if necessary
[08:44] each torrent has its own URL, right?
[08:47] yipdw: yes it does
[08:47] yeah
[08:47] quick look at the pages might mean they don't display more than 100 pages of a given category, which might make things tricky
[08:51] anyone else not able to rsync to batcave?
[08:51] rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
[08:51] rsync: failed to connect to batcave.textfiles.com: Connection timed out (110)
[08:58] SketchCow: textfiles.com going down according to a tweet? What's up with that?!
[09:02] bsmith094: yeah, it's just down again, just hold on
[09:08] based on this previous tpb_index torrent floating around that was made back in 2009, the thing is going to be at least 20 gigabytes
[09:09] rixard: no no no. the tweet is saying that he's NOT joining the internet blackout sopa protest movement
[09:10] Coderjoe: ah, phew! Thanks
[09:12] http://i.imgur.com/vuiZt.png
[09:12] uh what
[09:12] ;; ANSWER SECTION:
[09:12] api.twitter.com. 30 IN A 199.59.148.20
[09:13] am I reading that right when I say that api.twitter.com has a TTL of 30 seconds?
[09:13] because that seems obscenely short for a DNS record
[09:13] or is this part of some DNS load-balancing scheme that I have not seen before
[09:13] probably part of a dns load balancing
[09:13] there are four entries for api.twitter.com
[09:14] so yeah, I guess
[09:14] I've seen it with CDN stuff before... like the friendster CDN entries
[09:14] it's just weird seeing my dnscache logs be flooded with stuff like
[09:14] @400000004f1298db0d38d97c tx 0 1 api.twitter.com. twitter.com. d04e4622 d04e4722 cc0dfb22 cc0dfa22
[09:14] @400000004f1298db0dd16ffc rr d04e4622 30 1 api.twitter.com. c73b95c8
[09:14] well, any decent resolver should be able to handle round robin just fine
[09:14] it can
[09:15] it just really fucks with the cache motion
[09:19] yipdw: so is conventional wisdom to whip up a scraper per site? or is there a tool general enough, like i heard there's some IA crawler bot thing, but i never heard that brought up around ff.n
[09:20] for deep focused crawls, it's usually most effective to modify an existing custom scraper
[09:22] arrith: IA has heritrix, yeah
[09:22] I have not tried Heritrix on ff.net
[09:23] you may have to (at least) change its user-agent and its robots exclusion obey rules
[09:23] because ff.net is very very vocal about the noarchive thing
[09:23] chronomex: as in a scraper in the AT github?
[09:24] yipdw: well just you came up with that method in that ruby script pretty fast and that seemed pretty clever, i wonder if you can use the same method with an existing scraper
[09:24] the reason why I wrote a bunch of code for ff.net is because (1) people just wanted the reviews, stories, and profiles
[09:24] (2) it was something to do
[09:24] also, ff.net is kind of weird anyway
[09:24] b.fanfiction.net always serves gzipped JS and CSS even if the client doesn't state it can handle gzip transfer encoding
[09:24] I don't know if Heritrix can deal with that
[09:25] arrith: yeah, probably. I mean the method I used was a pretty standard "start at source vertices in a graph and follow all arcs" walk
[09:25] wget does that in recursive mode
[09:26] arrith: but it's probably good to note that most AT scripts are just wget wrappers
[09:26] augmenting wget with things like storage and link-generation logic
[09:27] hm. so i suppose the jury is still out. something to look into
[09:28] feel free to try Heritrix; I don't think anyone will mind
[09:28] one nice thing about using wget for a lot of things is that its dependencies are more common
[09:29] which helps out when it comes time to do the download-the-fuck-out-of-some-site-run-by-assholes thing
[09:29] to quote chronomex
[09:30] ...
[09:31] wait, you didn't say that?
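
The "start at source vertices in a graph and follow all arcs" walk mentioned at 09:25 can be sketched roughly as below. This is only an illustration, not the code from the ffnet-grab repo: the `walk` function and the `LINKS` table are hypothetical, and a real crawler would fetch each page and parse its links instead of reading them from a dictionary.

```python
from collections import deque


def walk(start_pages, links):
    """Visit every page reachable from start_pages by following links (arcs)."""
    seen = set()
    queue = deque()
    for page in start_pages:
        if page not in seen:
            seen.add(page)
            queue.append(page)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        # Follow every outgoing arc, skipping pages that are already queued or visited.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site layout standing in for fetched pages.
LINKS = {
    "/index": ["/story/1", "/story/2"],
    "/story/1": ["/profile/a", "/story/2"],
    "/story/2": ["/profile/a"],
    "/profile/a": ["/index"],
}

print(walk(["/index"], LINKS))
```

The `seen` set is what keeps the walk from looping forever when pages link back to each other, as `/profile/a` does here.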
[09:31] the TR article quotes you
[09:38] ah
[09:39] yeah having some standard set of tools
[09:39] sounds like cygwin to me
[09:42] also i really wonder why osx doesn't ship with wget
[09:42] GNUphobia on Apple's part, maybe
[09:43] FreeBSD doesn't come with wget
[09:44] well might be just as well if people use wget-warc
[09:46] no2pencil: yeah, FreeBSD is trimming all GPL dependencies from the base distribution, but I don't think that's a reason not to include wget -- it's not a dependency of anything
[09:47] What do you mean by doesn't come with? It's not like you can't install it...
[09:47] no2pencil: that said, at least wget is in ports
[09:47] Exactly. Ports ftw
[09:47] arrith: WARC support was rolled into GNU's official Wget; there will be a release at some point
[09:48] also I use curl more than I use wget :P
[09:48] ah fancy
[09:48] ports means xcode which means 2-4GB of random stuff people might use once to get a 2MB wget binary
[09:49] on OS X, maybe, but on FreeBSD ports is...really pretty much just there
[09:49] just thinking about that makes me want to punch sjobs' skeleton
[09:49] oh
[09:49] freebsd sure
[09:49] I mean, on FreeBSD you already have the developer tools
[09:49] more people running linux than freebsd, in terms of people to plan the 'AT tools package' for
[09:49] and on OS X...well, if that's your choice then you bear the consequences :P
[09:49] by 'doesn't come with it' I mean out of the box
[09:49] sure it's in ports
[09:50] yipdw: i want those peoples' bandwidth :(
[09:50] I'm just correlating that it isn't installed with OSx out of the box, no surprise to me, because it isn't installed with FreeBSD
[09:50] no2pencil: right yes, that sorta makes sense
[09:50] though i wonder if freebsd comes with emacs oob, since osx does
[09:50] I'm not really offering a solution, just a 'Don't feel bad' scenario. Not much help I know... I know...
[09:51] arrith: depends on what you mean by "out of the box", really
[09:52] the FreeBSD install procedure lets you install stuff from ports
[09:52] yipdw: stock install?
[09:52] oh
[09:52] any kind of default?
[09:52] well, if you're talking about just the kernel and base distribution packages, you don't get much of anything
[09:52] er
[09:52] oops!
[09:52] NOT PORTS
[09:52] packages
[09:52] ports are builds from source
[09:52] that is an important distinction that I usually trip up on
[09:53] freebsd has packages? i always thought it was only ports
[09:53] FreeBSD has both
[09:53] i am out of the loop in the bsds
[09:53] arrith: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ports.html#PORTS-SYNOPSIS
[09:54] if you're installing stuff like KDE, GNOME, Xorg, etc., it is best to try packages first unless you know what you're doing
[09:54] oh
[09:54] I've always gone to the ports :P
[09:54] I knew there was a difference, but never knew to try one first
[09:54] well
[09:55] I should have appended IMO to that
[09:55] ditto for ruby as i have found out the hard way?
[09:55] I'm running FreeBSD on an Acer Aspire Revo
[09:55] which does not have the computational power to do compilation very quickly and I am too lazy to maintain distcc
[09:55] seriously that is one really noob-unfriendly language
[09:55] Ruby?
[09:55] i thought distt was a myth
[09:56] I dunno, depends on how you learn it
[09:56] I thought Ruby was a myth
[09:56] j/k
[09:56] I'm tired... sorry
[09:56] yipdw: yes, Ruby, and i mean to help out with things like this running ruby scripts, when everything else was "here run this bash script for a week or two"
[09:57] bsmith094: I write my stuff in Ruby because I find it more predictable and portable than bash scripts
[09:57] meant*
[09:57] that's usually because I move between OS X and Ubuntu machines
[09:57] where the userland is frightfully different but the Ruby environment does a good job of papering over it
[09:57] YMMV
[09:57] and you are a ruby fanatic
[09:57] it probably is more portable but the dependency recognition and management sucks eggwater
[09:57] Coderjoe: nah, it's just convenient for some things
[09:58] bsmith094: distt?
[09:58] bsmith094: if you're referring to rubygems, yeah, it has some issues, but locating gem dependencies is not one of them
[09:59] well then *you* run his script and see if you don't get file not found errors, then. no Genfile complicates things
[09:59] Gemfile
[10:00] I think you're making a judgment that's based in inexperience with the programming environment vs. actual deficiencies with the environment
[10:00] I agree that it would be nice if gems could be more easily packaged up for use in different package management schemes
[10:00] and even then, it took me several hours of plowing through the docs to even figure out that i had the ubuntu ruby which sucks compared to the "real ruby" which is managed through rvm, which is not at all intuitive and really should have been mentioned at some point in the comments for the scripts
[10:00] no, it shouldn't
[10:00] the program should not depend on rvm
[10:00] or gems
[10:01] which is why I do not add such comments
[10:01] yipdw: yes i probably am, but wow, i have never had this much work to do, just to get something to run
[10:01] I suggest rvm only as a convenient way to install different Ruby versions under the assumption that you'd read about what it actually does
[10:01] " i thought distt was a myth" -- what is distt?
[10:02] anyway, about Rubygems: the "unable to find file (whatever)" message is the same sort of thing you'd get if you tried to run, say, a Python program for which you do not have the required library, a Perl program for which you do not have the required module, or even a binary executable in C for which you were missing a dynamically-loaded object
[10:02] etc.
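
For the Python case in the 10:02 comparison, a small sketch of what the interpreter actually knows when a library is missing. The `try_import` helper and the module name `connection_pool_example_missing` are made up for the illustration; `importlib.import_module` and `ModuleNotFoundError` are standard Python 3.

```python
import importlib


def try_import(name):
    """Return the module, or None after reporting which module could not be found."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        # err.name is the module the interpreter looked for and did not find.
        # It says nothing about which package manager could supply it.
        print(f"cannot load {err.name!r}: not installed for this interpreter")
        return None


try_import("json")  # ships with Python, so this succeeds
try_import("connection_pool_example_missing")  # hypothetical name, expected to fail
```

The exception carries the name that was looked up and nothing more, which is the same amount of information as the Ruby "unable to find file" message.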
[10:02] arrith: distcc
[10:02] distcc isn't a myth
[10:02] but of *course* you're going to get
[10:02] er
[10:02] http://code.google.com/p/distcc/
[10:02] well i know that *now*
[10:03] ok but the error really should tell you what to run or at least try, in order to fix it
[10:03] it should not, because there is no way for the Ruby environment to tell you what to do
[10:03] except "give me this file, I don't care how"
[10:03] clang does somehow
[10:03] gem bundle install, gem install, rubygems install, bundle install rubygems, all very confusing
[10:03] dunno about deps complaining
[10:04] arrith: what do you mean by "clang does somehow"
[10:04] bsmith094: python or perl's not much better heh
[10:04] i mean something like ubuntu terminal, blah not found suggestions, sudo apt-get install blah
[10:04] bsmith094: you're making it harder than it is
[10:04] yipdw: actually i'm not sure that applies here. i know clang does more than gcc, but i dunno about dependency stuff
[10:05] sticking to stuff in your repo is one way to stay clean/sane
[10:05] lot of people do that with python things
[10:06] whole pseudo-package-management of python is sad, especially compared to apt
[10:06] I guess one could come up with a Ruby environment that had a modified require mechanism that suggested Rubygems
[10:06] but that's not done right now because there are other ways to satisfy dependencies
[10:07] so this should grab everything i need: gem install connection_pool mechanize girl_friday, except it still throws the cant find connection_pool message
[10:07] (also because, as far as I can tell, nobody has really thought it to be such a big deal that it needs to be fixed)
[10:07] bsmith094: yeah, it'll throw that because Rubygems isn't loaded by default in Ruby 1.8, and you're probably not doing require "rubygems" or -rubygems
[10:07] also, connection_pool requires Ruby >= 1.9
[10:08] but that's a separate problem
[10:08] i already did rvm use 1.9.3
[10:08] you should confirm that you were using Ruby 1.9.3 at the time of gem installation
[10:08] Rubygems is not separate from the Ruby environment
[10:08] it is a Ruby program
[10:11] gem install rubygems found nothing but the others worked fine
[10:11] if you're installing Ruby via RVM, Rubygems is already present
[10:12] well now it ran ruby ffgrab, and i got a stacktrace
[10:13] paste the stacktrace
[10:13] not here, on some pastie site
[10:13] http://pastebin.com/XeqvGAbz and i even had a redis instance running and everything
[10:14] ok, that's a broken Nokogiri install
[10:14] unfortunately I don't know why that happens
[10:14] try 1.9.2
[10:14] 1.9.2-p290 that is
[10:15] you can also try Rubinius from its master branch in Ruby 1.9 mode, which is what I wrote and tested the code with
[10:15] running this rvm install ruby-1.9.2-p290
[10:16] I admit that the ff.net story discovery thing does require a fair number of moving parts
[10:17] probably some of them are not necessary
[10:17] for example, it probably does not need to be multithreaded (but I know it scales up quite well if it is)
[10:17] and it probably does not require Redis (but SADD vs. futzing with set and persistence logic? I'll take SADD)
[10:23] ok its running fine now
[10:25] to bed with me, its almost 530am est
[10:25] gnight /gmorning, all
[10:25] btw
[10:25] running ffgrab is not necessary
[10:25] if you want a copy of the dataset of story IDs and such, replicate the Redis instance that Coderjoe has
[10:29] I feel tempted to ask for sysop access on the AT wiki
[10:31] Hydriz: why?
[10:31] its like, filled with vandalism
[10:32] and BTW the Internet Archive just broke down
[10:32] like a huge backlog of errors: http://www.archive.org/catalog.php?history=1&limit=50
[10:34] i just got some earlier tonight, and earlier a subtitle site went down. i break everything i touch.
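
On the SADD remark at 10:17: Redis SADD adds members to a set and returns how many of them were not already present, so duplicate story IDs are dropped without extra bookkeeping. The sketch below imitates only that behaviour with an in-memory Python dictionary. It is not how ffgrab is written, it does not talk to Redis, and it has none of the persistence that makes the real thing attractive; the `sadd` function and the `stories` key are invented for the example.

```python
def sadd(store, key, *members):
    """Mimic Redis SADD: add members to the set at key, return how many were new."""
    target = store.setdefault(key, set())
    before = len(target)
    target.update(members)
    return len(target) - before


store = {}  # in-memory stand-in for a Redis instance
print(sadd(store, "stories", "101", "102"))  # both IDs are new
print(sadd(store, "stories", "102", "103"))  # only "103" is new
print(sorted(store["stories"]))
```

Several discovery workers could report the same story ID and the set would still hold it once, which is presumably the "set logic" half of what yipdw would rather not hand-roll.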
[10:35] yeah, all bup.php tasks broke
[10:35] from the errors, the backup server is down
[10:35] I got errors about slowing down my request rate
[10:35] maybe I should, I was like spam feeding the Archive for almost 3 days
[10:38] appointing trustworthy users as people that could cleanup spam on the wiki sounds good
[10:38] yow
[10:38] hi there
[10:38] that's a lot of red
[10:38] still need to get a good usable full backup dump system for the AT wiki site in place..
[10:39] yes, thats my worry
[10:39] (for both things)
[10:40] full dump system isn't of high priority now actually
[10:40] and yes, the red lines are worrying
[10:40] its a Sunday now, which makes things worse
[11:01] so, what about those vandalism now?
[11:03] I see nothing on that IA page :-\
[11:06] just see the amount of red lines
[11:06] click on a random log and you see errors regarding the backup server
[11:22] I don't have any red lines
[11:22] or tasks
[11:23] though I do have a "server readonly -- tasks waiting for harddrive fix"
[11:25] nonono, see the link above
[11:26] and the server readonly is just a legend
[12:24] lol urlte.am itself is offline
[12:25] any update about that site?
[19:01] i just finished replicating the redis db, now what do i run, hoard isnt returning any output whatsoever?
[19:05] hi all, what's going on?
[20:13] ruby hoard.rb is giving no output at all, not messages, no file creation, no nothing
[20:13] yipdw:
[20:19] yipdw: seriously, how do i run this now, what do i use, hoard, ffgrab, what, is hoard expecting a file with id numbers, because i have that
[20:19] i synced the redis db to Coderjoe's db so thats all set afaik
[23:22] repeat from earlier yipdw: seriously, how do i run this now, what do i use, hoard, ffgrab, what, is hoard expecting a file with id numbers, because i have that
[23:23] ive noticed there seem to be small bursts of activity from this channel, followed by hours of people entering and leaving without saying anything, how spread out are we, timezone wise?
[23:48] wayback machine seems constipated right now :(