[00:40] Hi gang
[00:41] I am not well, I'll be on here and there.
[00:43] alright
[00:44] Short-term not-well, hopefully? Regardless, hopefully things get better soon.
[00:48] hope things get better soon
[00:49] ditto
[00:57] hope things get better for you
[01:24] I am the wizard.
[01:29] yer a lizzard harry
[01:31] I am the robot zydel
[01:35] better than whiny brat Cindel
[02:18] SketchCow: Aww, why'd you ban zetathrustra?
[02:53] stupid irc
[03:22] bsmith093: that's not 80% full, that's 1%
[03:22] in any case, I made further modifications to the crawler to not hit the same page twice
[03:23] sorry must have missed a decimal place
[03:23] been running it for about two hours, it's pulled back 173,224 story IDs
[03:23] I inserted random 5-second sleeps
[03:23] in order to not be an asshole
[03:27] * yipdw also finally figured out his dash vault, woo
[03:35] underscor: because it talked. we've been over this already.
[03:38] I don't get it
[03:45] chronomex: I was more insightful than some people we allow to stay in here
[03:45] Plus, it was only once per day
[03:45] :(
[03:45] hahaha I suppose that's true
[04:08] s/I/It/
[04:08] my statement stands
[04:09] dnova: zetathrustra is underscor's bot. It made a smart-alec comment, and was then banned.
[04:09] but we thrive on smart-alec comments
[04:09] Yes, but not generally pre-programmed ones.
[04:10] we do not allow bots which speak in the channel
[04:10] ah, welp.
[04:10] it is that simple
[04:11] I should stick a MegaHAL-containing bot in here, and see what the resulting dict is somewhere else.
[04:11] Alternatively, teach it C++ and python, with bits of ruby
[04:14] you don't need to bring a bot in here, the logs are public
[04:15] Pff, wheres the fun in that?
[04:15] Alternatively, I've somehow managed to use 5TB of Traffic in 8 days, over my 1TB limit.
[04:15] I now have a 2000 dollar bill
[04:20] They're not preprogrammed
[04:20] They're word-delineated markov chains.
[04:21] Based on the occurrences in the channel
[04:21] GLaDOS: Uh oh
[04:22] wtf, there's Cirque du Soleil fanfiction
[04:22] how is that even possible
[04:22] there are no Cirque characters that even have names
[04:22] However, it spikes, so I'm going to say it's a DDoS and contact the host.
[04:41] Back, took some rest.
[04:41] Yeah, so fuck bots.
[04:51] anyone else scraping poevews, or is it just me?
[04:51] poe-news.com
[05:30] youre all over the world, you can't all be asleep, and or busy
[05:30] * GLaDOS is away: zzz
[05:51] scumbag LoC: maintains MARC 21 on just about everything. charges thousands of dollars for access.
[05:51] SketchCow: im getting 503 filename prohibited errors with my upload
[05:54] bsmith093: what filename are you using?
[05:54] Thief_and_the_Cobbler_Recobbled_Cut minus the underscores, for the filename
[05:55] ThiefandtheCobblerRecobbledCut , you mean?
[05:55] what is your upload command exactly?
[05:55] er, no the file has spaces, the identifier for the archive has underscores, could that be the problem?
[05:56] gftp
[05:56] what is the exact name of the file
[05:56] and where are you putting it
[05:56] --exact-- name
[05:56] "Thief and the Cobbler Recobbled Cut.iso"
[05:57] and what is the item name?
[05:57] I don't think IA likes spaces.
[05:57] the item name i gave IA is that but with underscores instead of spaces
[05:57] Thief_and_the_Cobbler_Recobbled_Cut is the folder ia gave me
[05:58] pretty sure IA does not like spaces in filenames
[05:58] juj ok easy to fix
[05:58] yeah their uploader replaces them with _
[05:58] also, did I miss something, or when did you get authorization to upload that
[05:59] as far as I can tell that was released in 1995
[05:59] actually i meant to ask about that, does anyone actually know if its pd or what
[05:59] uh
[05:59] it started production in 1964
[05:59] released in 1995
[05:59] no matter which way you slice it, no
[06:00] k then no upload for me, then.
[06:00] I suggest hunting down the rights to that first
[06:00] unless IA policy says otherwise
[06:00] (I'm not sure)
[06:00] its a boondoggle, like you wouldnt believe, this is a fanmade version that's as close as possible to the original plan
[06:01] is this too iffy?
[06:02] not my area of expertise; it just sounded weird
[06:02] if IA says do it, maybe it's best to let them sort it out
[06:02] i would say this is most definitely transformative, but im not really prepared to back that up, legally
[06:02] who would i talk to
[06:02] yipdw: we do not worry about ia policy. ia policy is ia does not worry until someone complains.
[06:02] ok
[06:02] upload is fine then, I guess
[06:03] k then thats what i thought, if they worry they can always make it dark, and just keep it backed up but offline, which is what they do anyway, right?
[06:05] yipdw: iirc, ia will also hold it, just dark, until the copyright expires (if that ever happens)
[06:05] (after a complaint at least)
[06:06] sounds fair
[06:07] hey another thing, how do i go about adding things already in the archive, to a collection, so they are more easily findable in one page rather than as each individual search result?
[06:07] i wonder if the lighttpd/nginx config has something that prevents access to dark items, even if you manage to somehow know the location of the files
[06:07] im talking about felix the cat, in case you're wondering
[06:07] bsmith093: that requires IA staffer intervention, afaik
[06:08] (as far as I know)
[06:08] ah ok then so should i just redownload and upload to a felix the cat folder?
[06:08] no. they would create a collection and then modify the items to add them to that collection
[06:09] so just put a message on the forum or something?
[06:11] i don't really know how you get their attention
[06:11] info@archive.org supposedly
[06:12] they should really have an irc channel
[06:13] oh I love you google
[06:13] "here's a breakdown of activity on group X". click on an item: "Cannot find x. There is no group named x."
[06:13] Coderjoe: for what exactly
[06:14] gotta love tat
[06:14] that
[06:14] ive run into that whenever my searches get **really** specific
[06:15] this is for a mailing list I was on at some point in the past
[06:15] which apparently no longer exists
[06:46] does anyone know of any ancient UNIX stuff which isn't in TUHS?
[06:47] how ancient and what is TUHS?
[06:49] I now own textfiles.xxx
[06:49] Go team
[06:49] haha
[06:49] good for u, another mirror or to keep out the cybersquatters?
[06:49] what registrar did you use?
[06:49] did you buy textfiles.co when colombia's landrush happened?
[06:49] No, that's bullshit
[06:50] Coderjoe: ancient, unix v6
[06:50] I want an .xxx or two but the price is a bit much
[06:50] TUHS is The Unix Heritage Society
[06:50] This is more of a move by me to protect against being ICE seized.
[06:50] why would ICE care?
[06:51] holy shit. $80.18 per year from my source.
[06:51] bsmith093: ICE usually goes after the copyright violation domain names, iirc
[06:52] yeah, and...? textfiles.com cant possibly be violating anything with domain names right?
[06:52] oh wait, yeah that makes much more sense
[06:53] the stuff on the site, wow im tired. :P
[06:57] gnight/gmorning, all
[06:59] g'night, bsmith093
[07:41] Once we decided that the US could "seize" domain names, and believe me, it's utterly untested in 1000 ways, it's just a matter of time.
[07:43] Boy, I am adding a ton of french magazines.
[07:43] zut alors!
[07:44] I need to do a weblog post about all the stuff I've added, followed by a request to give money to archive.org
[07:45] Also, my scripts have become a ton more flexible since I started this, with more error correction and clarity.
[07:48] It's just this thing has a ton of magazines with, like, 8 issues.
[07:48] So it takes me 2 minutes to set up, or 5 depending.
[07:48] then just 8 issues.
[07:48] But when it's 120... then we're cooking with gas.
[07:50] The fun one is http://www.archive.org/details/computermagazines-french-porte-revues
[09:20] http://www.youtube.com/watch?v=zWu0W1kGvsQ&t=5m15s
[09:20] I threw up in my mouth a bit
[09:25] oh well I'll be sure to watch that then
[09:25] Yeah, right on it
[09:25] it's an example transfer from 9.5mm done by some other company than the one that made the video.
[09:26] (film to dvd transfer)
[09:26] Know what's great? http://procatinator.com/
[09:26] does that use some kind of metadata to match up cats to songs?
[09:27] I would think all video and audio uses metadata
[09:27] I find it rather weird that it possibly randomly matched up "Walking on Sunshine" to a cat on a treadmill
[09:28] ok. this is not random matchups
[09:35] yeah they're not random
[09:39] I'm going to get in trouble for the new weblog posting.
[09:39] But fuck it
[09:43] this is funny
[09:43] the bigger ipod versions of crankygeeks are worse i think
[09:44] i'm backing up diggnation next
[09:44] or at least the first 100 episodes
[09:45] looks like i was right on that something was changed with crankygeeks ipod format before it ended
[09:46] jan 13 2010 show doesn't have pixel blocks
[09:46] but april 22 2010 show does
[09:49] Why do you think you'll get in trouble for that, SketchCow?
[09:49] I fucking love your slamposts. I just really, really hope I am never the subject of one
[09:50] "here are 5,000 cited reasons why dnova SUCKS"
[09:51] that's not a slam, that's an aggressive motivational speech
[09:51] haha
[09:54] It's a call to arms presented while standing on the corpse of a fat guy
[10:05] I'm now writing an entry with what I've been putting in the Archive these past few months.
[11:16] is this ever going to stop?
[11:16] 15787 ./tmpfs/it/perijulka
[11:16] that's megabytes
[11:19] Hey, I've uploaded all my Splinder files to the Batcave.
[11:20] bodacious
[11:20] did you say so on the wiki?
[11:20] What?
[11:21] I'll take care of it
[11:22] http://archiveteam.org/index.php?title=Splinder#Upload_status
[11:23] :)
[16:53] New front page looks great, dnova
[16:54] thanks! still have some ideas
[17:13] http://www.jorisvanhoboken.nl/?p=308
[19:53] anyone want to watch JAWS on CED? cuzzz I just got it. and 59 others. and 2 players. and that much less space in my house.
[21:03] good news my poenews scrape is done
[21:04] SketchCow: where do you want the Poe-news.com scrape to go?
[21:09] Coderjoe: Items are made dark by permissions of the actual files
[21:09] You can always see where an item's files are by going to
[21:09] archive.org/download/IDENTIFIER
[21:33] SketchCow: ive got some poenews scraped, if you want it
[21:50] http://inkdroid.org:3000/
[21:50] Realtime wikipedia edits
[21:57] underscor: wow.
[22:05] http://qaa.ath.cx/TheEmperorsNewClothes.html
[22:10] awesome, now I can see Edit Wars IV: A New Hope
[22:11] underscor: I know the guy that made that. He's a library tech guy
[22:22] cool!
[22:23] yipdw: anything on the ffnet script?
[22:23] bsmith094: it needs to be made more robust to deal with network failures
[22:23] also i have probably some and or all, of poenews if anyone wants it
[22:23] I don't know what happened to fanfiction.net last night
[22:23] but they were returning 503s for a while
[22:24] the discovery mechanism does not gracefully cope with those
[22:24] not pure bash. it relies on awk. though I haven't really observed any version problems with awk, but that's probably because we haven't really used awk here at AT
[22:25] uh, what
[22:25] that link from underscor, which was a bash json interpreter
[22:25] oh
[22:26] I would have been really impressed if it was pure bash
[22:26] I have a terrible way to do it
[22:26] write a bash backend for Ragel, write the JSON parser in Ragel
[22:26] farm it off to something else that has a json parser?
[22:27] yes, python2.6 -mjson.tool is one way to do it
[22:28] yipdw: I don't know about ff.net, but I have observed other sites that go down at regular intervals to do backups and stuff. (which to me screams "you're doing it wrong")
[22:28] or whatever version you've got
[22:28] Coderjoe: well, either way, the network is never reliable etc
[22:28] except those other sites just stop accepting on 80
[22:28] true
[22:28] it really felt like someone was just hammering ff
[22:28] wasn't me, I was just running two connections at a time
[22:29] perhaps some asshole that also wants a copy of everything?
[22:29] wasnt me either, i was scraping poenews
[22:29] maybe
[22:30] what kind of webserver cant take the load of two mirroring efforts at once, anyway?
[22:31] it's not that uncommon
[22:32] the problem is rarely the web server, though
[22:33] the application to which the server proxies is usually your bottleneck
[22:33] are there *any* sites left that are just html and links
[22:34] those are dead easy to save, this place has to write custom code for every job
[22:35] 4chan's only database and non-static content, last I knew, was the actual posting script. the thread pages and index pages were re-written as static html when a new post came in that affected them
[22:35] well, we're mostly interested in user-generated content
[22:35] (as an example)
[22:35] and if you let users upload arbitrary HTML, then things get zany
[22:36] consider, too, that the characteristics of a mirroring operation are not the same as what a human would do
[22:36] for one, mirroring will fuck your cache
[22:36] like there was no tomorrow
[22:37] because mirroring is going to request everything, including rarely-hit pages; and if it takes a lot of resources to generate HTML then that can bring an application down
[22:37] or it'll generate a lot of content to be dumped into cache and depending on the cache expiration policy that may shove hot data out
[22:38] (I mean, it shouldn't, but...)
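(editor's note) The cache point made at 22:36-22:38 can be sketched with a small simulation. This is only an illustration under stated assumptions: a plain LRU cache holding 100 pages, 50 popular pages that normal users keep requesting, and a mirror that requests 1,000 distinct rarely-hit pages once each. The class name, the page paths, and every number here are made up for the example and do not describe fanfiction.net or any other real site.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that only tracks which keys are resident (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return True on a cache hit; on a miss, load the key and evict if full."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return False


def hot_pages_resident(cache, hot_pages):
    """Count how many of the hot pages are still held in the cache."""
    return sum(1 for page in hot_pages if page in cache.entries)


cache = LRUCache(capacity=100)
hot_pages = [f"/story/{i}" for i in range(50)]

# Normal users keep hitting the same 50 popular pages.
for _ in range(10):
    for page in hot_pages:
        cache.get(page)
resident_before = hot_pages_resident(cache, hot_pages)

# A mirror requests 1,000 distinct rarely-hit pages, once each.
for i in range(1000):
    cache.get(f"/archive/{i}")
resident_after = hot_pages_resident(cache, hot_pages)

print(resident_before, resident_after)  # prints: 50 0
```

In this sketch the mirror runs alone, so every hot page is evicted. If normal traffic were interleaved with the mirror's requests, some hot pages would keep getting refreshed and survive, so the outcome depends on the mix of traffic.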
[22:49] even with a basic LRU policy, that will depend on the ratio of normal users to mirror users
[22:50] yeah, there's a lot of factors
[23:51] seriously does anybody want the poe-news wget-warc dump?
[23:52] SketchCow is probably just afk, chillax for a while
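(editor's note) The crawler behaviour described at 03:22-03:23 (never hit the same page twice, random sleeps between requests) and the robustness problem raised at 22:23-22:24 (the site returning 503s) can be sketched together. This is not the actual ffnet script, which is not shown in the log. The `crawl` function, its parameters, the `fetch` callable returning `(status, body, links)`, and the exponential backoff are all assumptions made for illustration, and "random 5-second sleeps" is read here as a random pause of up to 5 seconds.

```python
import random
import time


def crawl(start_urls, fetch, sleep=time.sleep, max_retries=3, max_delay=5.0):
    """Fetch each URL at most once, pausing between requests.

    `fetch(url)` is a hypothetical callable returning (status, body, links).
    """
    seen = set()
    queue = list(start_urls)
    pages = {}
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue  # never hit the same page twice
        seen.add(url)
        for attempt in range(max_retries):
            status, body, links = fetch(url)
            if status != 503:
                break
            sleep(2 ** attempt)  # server is overloaded: back off 1s, 2s, 4s
        if status == 200:
            pages[url] = body
            queue.extend(link for link in links if link not in seen)
        sleep(random.uniform(0, max_delay))  # random pause to be polite
    return pages


# Fake three-page site for demonstration: "/b" returns 503 once, then succeeds.
site = {"/a": ["/b", "/c"], "/b": ["/a"], "/c": ["/a", "/b"]}
calls = []


def fake_fetch(url):
    calls.append(url)
    if url == "/b" and calls.count("/b") == 1:
        return 503, "", []
    return 200, f"body of {url}", site[url]


# Pass a no-op sleep so the demonstration finishes instantly.
pages = crawl(["/a"], fake_fetch, sleep=lambda seconds: None)
print(sorted(pages), calls)
```

A page that still returns 503 after `max_retries` attempts is skipped in this sketch and not retried later. A real crawler would more likely record it for a later pass.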