#archiveteam 2011-12-08,Thu


Time Nickname Message
00:40 🔗 SketchCow Hi gang
00:41 🔗 SketchCow I am not well, I'll be on here and there.
00:43 🔗 bbot_ alright
00:44 🔗 Paradoks Short-term not-well, hopefully? Regardless, hopefully things get better soon.
00:48 🔗 dashcloud hope things get better soon
00:49 🔗 bsmith093 ditto
00:57 🔗 dashcloud hope things get better for you
01:24 🔗 GLaDOS I am the wizard.
01:29 🔗 Coderjoe yer a lizzard harry
01:31 🔗 rude___ I am the robot zydel
01:35 🔗 Coderjoe better than whiny brat Cindel
02:18 🔗 underscor SketchCow: Aww, why'd you ban zetathrustra?
02:53 🔗 db48x stupid irc
03:22 🔗 yipdw bsmith093: that's not 80% full, that's 1%
03:22 🔗 yipdw in any case, I made further modifications to the crawler to not hit the same page twice
03:23 🔗 bsmith093 sorry must have missed a decimal place
03:23 🔗 yipdw been running it for about two hours, it's pulled back 173,224 story IDs
03:23 🔗 yipdw I inserted random 5-second sleeps
03:23 🔗 yipdw in order to not be an asshole
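yipdw's crawler itself isn't in the log. Below is a minimal sketch of the two behaviours he describes: never requesting the same page twice, and sleeping between requests. It assumes a breadth-first crawl. `fetch_links` is a hypothetical callback that downloads a page and returns the URLs on it. It also reads "random 5-second sleeps" as "a random pause of up to 5 seconds", which is an assumption.

```python
import random
import time
from collections import deque
from typing import Callable, Iterable, List

def crawl(start_urls: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Breadth-first crawl that requests each URL at most once.

    fetch_links is a hypothetical callback: given a URL, it downloads the
    page and returns the URLs found on it.
    """
    seen = set()       # every URL ever queued, so nothing is queued twice
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited: List[str] = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Assumed reading of "random 5-second sleeps": pause a random
        # 0..max_delay seconds after every request to spread out the load.
        sleep(random.uniform(0, max_delay))
    return visited
```

Deduplicating at enqueue time, not at fetch time, keeps the queue from filling up with copies of heavily linked pages. Passing a fake `fetch_links` and `sleep=lambda s: None` lets you test the crawl order offline.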
03:27 🔗 * yipdw also finally figured out his dash vault, woo
03:35 🔗 chronomex underscor: because it talked. we've been over this already.
03:38 🔗 dnova I don't get it
03:45 🔗 underscor chronomex: I was more insightful than some people we allow to stay in here
03:45 🔗 underscor Plus, it was only once per day
03:45 🔗 underscor :(
03:45 🔗 chronomex hahaha I suppose that's true
04:08 🔗 underscor s/I/It/
04:08 🔗 chronomex my statement stands
04:09 🔗 Paradoks dnova: zetathrustra is underscor's bot. It made a smart-alec comment, and was then banned.
04:09 🔗 dnova but we thrive on smart-alec comments
04:09 🔗 Paradoks Yes, but not generally pre-programmed ones.
04:10 🔗 chronomex we do not allow bots which speak in the channel
04:10 🔗 dnova ah, welp.
04:10 🔗 chronomex it is that simple
04:11 🔗 GLaDOS I should stick a MegaHAL-containing bot in here, and see what the resulting dict is somewhere else.
04:11 🔗 GLaDOS Alternatively, teach it C++ and python, with bits of ruby
04:14 🔗 chronomex you don't need to bring a bot in here, the logs are public
04:15 🔗 GLaDOS Pff, wheres the fun in that?
04:15 🔗 GLaDOS Alternatively, I've somehow managed to use 5TB of Traffic in 8 days, over my 1TB limit.
04:15 🔗 GLaDOS I now have a 2000 dollar bill
04:20 🔗 underscor They're not preprogrammed
04:20 🔗 underscor They're word-delineated markov chains.
04:21 🔗 underscor Based on the occurrences in the channel
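underscor doesn't show zetathrustra's code. Below is a minimal sketch of a word-delineated Markov chain in the sense he describes, trained on channel lines. It assumes a first-order chain (one word of context) and plain whitespace tokenization. `build_chain` and `generate` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

def build_chain(lines: List[str]) -> Dict[str, List[str]]:
    """Map each word to every word observed directly after it.

    Duplicates are kept on purpose, so random.choice() picks followers
    in proportion to how often they occurred in the channel.
    """
    chain: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)

def generate(chain: Dict[str, List[str]], start: str, max_words: int = 20,
             rng: Optional[random.Random] = None) -> str:
    """Random-walk the chain from `start` until a dead end or max_words."""
    rng = rng or random.Random()
    words = [start]
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)
```

Because every emitted word pair was seen somewhere in the log, the output is built from channel text rather than pre-written replies. The sentence as a whole is still new.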
04:21 🔗 underscor GLaDOS: Uh oh
04:22 🔗 yipdw wtf, there's Cirque du Soleil fanfiction
04:22 🔗 yipdw how is that even possible
04:22 🔗 yipdw there are no Cirque characters that even have names
04:22 🔗 GLaDOS However, it spikes, so I'm going to say it's a DDoS and contact the host.
04:41 🔗 SketchCow Back, took some rest.
04:41 🔗 SketchCow Yeah, so fuck bots.
04:51 🔗 bsmith093 anyone else scraping poenews, or is it just me?
04:51 🔗 bsmith093 poe-news.com
05:30 🔗 bsmith093 youre all over the world, you can't all be asleep, and or busy
05:30 🔗 * GLaDOS is away: zzz
05:51 🔗 Coderjoe scumbag LoC: maintains MARC 21 on just about everything. charges thousands of dollars for access.
05:51 🔗 bsmith093 SketchCow: im getting 503 filename prohibited errors with my upload
05:54 🔗 chronomex bsmith093: what filename are you using?
05:54 🔗 bsmith093 Thief_and_the_Cobbler_Recobbled_Cut minus the underscores, for the filename
05:55 🔗 chronomex ThiefandtheCobblerRecobbledCut , you mean?
05:55 🔗 chronomex what is your upload command exactly?
05:55 🔗 bsmith093 er, no the file has spaces, the identifier for the archive has underscores, could that be the problem?
05:56 🔗 bsmith093 gftp
05:56 🔗 chronomex what is the exact name of the file
05:56 🔗 chronomex and where are you putting it
05:56 🔗 chronomex --exact-- name
05:56 🔗 bsmith093 "Thief and the Cobbler Recobbled Cut.iso"
05:57 🔗 chronomex and what is the item name?
05:57 🔗 chronomex I don't think IA likes spaces.
05:57 🔗 bsmith093 the item name i gave IA is that but with underscores instead of spaces
05:57 🔗 bsmith093 Thief_and_the_Cobbler_Recobbled_Cut is the folder ia gave me
05:58 🔗 Coderjoe pretty sure IA does not like spaces in filenames
05:58 🔗 bsmith093 juj ok easy to fix
05:58 🔗 DFJustin yeah their uploader replaces them with _
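If you upload over FTP instead of through IA's web uploader, nothing substitutes the underscores for you. Below is a minimal sketch that renames local files before upload. It assumes whitespace is the only thing that needs fixing, and IA may reject other characters too. `underscore_name` and `rename_for_upload` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import List

def underscore_name(name: str) -> str:
    """Replace each whitespace character with "_" (assumed to be the only fix needed)."""
    return re.sub(r"\s", "_", name)

def rename_for_upload(directory: str) -> List[str]:
    """Rename files in `directory` in place and return their new names, sorted."""
    new_names = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = underscore_name(path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            if target.exists():
                # Don't silently overwrite "a_b.iso" when renaming "a b.iso".
                raise FileExistsError(target)
            path.rename(target)
        new_names.append(new_name)
    return sorted(new_names)
```

For example, `underscore_name("Thief and the Cobbler Recobbled Cut.iso")` gives `Thief_and_the_Cobbler_Recobbled_Cut.iso`, which matches the item name IA assigned.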
05:58 🔗 yipdw also, did I miss something, or when did you get authorization to upload that
05:59 🔗 yipdw as far as I can tell that was released in 1995
05:59 🔗 bsmith093 actually i meant to ask about that, does anyone actually know if its pd or what
05:59 🔗 yipdw uh
05:59 🔗 yipdw it started production in 1964
05:59 🔗 yipdw released in 1995
05:59 🔗 yipdw no matter which way you slice it, no
06:00 🔗 bsmith093 k then no upload for me, then.
06:00 🔗 yipdw I suggest hunting down the rights to that first
06:00 🔗 yipdw unless IA policy says otherwise
06:00 🔗 yipdw (I'm not sure)
06:00 🔗 bsmith093 it's a boondoggle like you wouldn't believe; this is a fanmade version that's as close as possible to the original plan
06:01 🔗 bsmith093 is this too iffy?
06:02 🔗 yipdw not my area of expertise; it just sounded weird
06:02 🔗 yipdw if IA says do it, maybe it's best to let them sort it out
06:02 🔗 bsmith093 i would say this is most definitely transformative, but im not really prepared to back that up, legally
06:02 🔗 bsmith093 who would i talk to
06:02 🔗 chronomex yipdw: we do not worry about ia policy. ia policy is ia does not worry until someone complains.
06:02 🔗 yipdw ok
06:02 🔗 yipdw upload is fine then, I guess
06:03 🔗 bsmith093 k then, that's what i thought. if they worry they can always make it dark and just keep it backed up but offline, which is what they do anyway, right?
06:05 🔗 Coderjoe yipdw: iirc, ia will also hold it, just dark, until the copyright expires (if that ever happens)
06:05 🔗 Coderjoe (after a complaint at least)
06:06 🔗 yipdw sounds fair
06:07 🔗 bsmith093 hey another thing, how do i go about adding things already in the archive to a collection, so they are more easily findable on one page rather than as individual search results?
06:07 🔗 Coderjoe i wonder if the lighttpd/nginx config has something that prevents access to dark items, even if you manage to somehow know the location of the files
06:07 🔗 bsmith093 im talking about felix the cat, in case you're wondering
06:07 🔗 Coderjoe bsmith093: that requires IA staffer intervention, afaik
06:08 🔗 Coderjoe (as far as I know)
06:08 🔗 bsmith093 ah ok then so should i just redownload and upload to a felix the cat folder?
06:08 🔗 Coderjoe no. they would create a collection and then modify the items to add them to that collection
06:09 🔗 bsmith093 so just put a message on the forum or something?
06:11 🔗 Coderjoe i don't really know how you get their attention
06:11 🔗 DFJustin info@archive.org supposedly
06:12 🔗 bsmith093 they should really have an irc channel
06:13 🔗 Coderjoe oh I love you google
06:13 🔗 Coderjoe "here's a breakdown of activity on group X". click on an item: "Cannot find x. There is no group named x."
06:13 🔗 bsmith093 Coderjoe: for what exactly
06:14 🔗 bsmith093 gotta love tat
06:14 🔗 bsmith093 that
06:14 🔗 bsmith093 ive run into that whenever my searches get **really** specific
06:15 🔗 Coderjoe this is for a mailing list I was on at some point in the past
06:15 🔗 Coderjoe which apparently no longer exists
06:46 🔗 balrog does anyone know of any ancient UNIX stuff which isn't in TUHS?
06:47 🔗 Coderjoe how ancient and what is TUHS?
06:49 🔗 SketchCow I now own textfiles.xxx
06:49 🔗 SketchCow Go team
06:49 🔗 dnova haha
06:49 🔗 bsmith093 good for u, another mirror or to keep out the cybersquatters?
06:49 🔗 dnova what registrar did you use?
06:49 🔗 Coderjoe did you buy textfiles.co when columbia's landrush happened?
06:49 🔗 SketchCow No, that's bullshit
06:50 🔗 balrog Coderjoe: ancient, unix v6
06:50 🔗 dnova I want an .xxx or two but the price is a bit much
06:50 🔗 balrog TUHS is The Unix Historical Society
06:50 🔗 SketchCow This is more of a move by me to protect against being ICE seized.
06:50 🔗 bsmith093 why would ICE care?
06:51 🔗 Coderjoe holy shit. $80.18 per year from my source.
06:51 🔗 Coderjoe bsmith093: ICE usually goes after the copyright violation domain names, iirc
06:52 🔗 bsmith093 yeah, and...? textfiles.com cant possibly be violating anything with domain names right?
06:52 🔗 bsmith093 oh wait, yeah that makes much more sense
06:53 🔗 bsmith093 the stuff on the site, wow im tired. :P
06:57 🔗 bsmith093 gnight/gmorning, all
06:59 🔗 dnova g'night, bsmith093
07:41 🔗 SketchCow Once we decided that the US could "seize" domain names, and believe me, it's utterly untested in 1000 ways, it's just a matter of time.
07:43 🔗 SketchCow Boy, I am adding a ton of french magazines.
07:43 🔗 dnova zut alors!
07:44 🔗 SketchCow I need to do a weblog post about all the stuff I've added, followed by a request to give money to archive.org
07:45 🔗 SketchCow Also, my scripts have become a ton more flexible since I started this, with more error correction and clarity.
07:48 🔗 SketchCow It's just this thing has a ton of magazines with, like, 8 issues.
07:48 🔗 SketchCow So it takes me 2 minutes to set up, or 5 depending.
07:48 🔗 SketchCow then just 8 issues.
07:48 🔗 SketchCow But when it's 120... then we're cooking with gas.
07:50 🔗 SketchCow The fun one is http://www.archive.org/details/computermagazines-french-porte-revues
09:20 🔗 Coderjoe http://www.youtube.com/watch?v=zWu0W1kGvsQ&t=5m15s
09:20 🔗 Coderjoe I threw up in my mouth a bit
09:25 🔗 dnova oh well I'll be sure to watch that then
09:25 🔗 SketchCow Yeah, right on it
09:25 🔗 Coderjoe it's an example transfer from 9.5mm done by some other company than the one that made the video.
09:26 🔗 Coderjoe (film to dvd transfer)
09:26 🔗 SketchCow Know what's great? http://procatinator.com/
09:26 🔗 Coderjoe does that use some kind of metadata to match up cats to songs?
09:27 🔗 SketchCow I would think all video and audio uses metadata
09:27 🔗 Coderjoe I find it rather weird that it possibly randomly matched up "Walking on Sunshine" to a cat on a treadmill
09:28 🔗 Coderjoe ok. this is not random matchups
09:35 🔗 dnova yeah they're not random
09:39 🔗 SketchCow I'm going to get in trouble for the new weblog posting.
09:39 🔗 SketchCow But fuck it
09:43 🔗 godane this is funny
09:43 🔗 godane the bigger ipod versions of crankygeeks are worse i think
09:44 🔗 godane i'm backing up diggnation next
09:44 🔗 godane or at least the first 100 episodes
09:45 🔗 godane looks like i was right on that something was changed with crankygeeks ipod format before it ended
09:46 🔗 godane jan 13 2010 show doesn't have pixel blocks
09:46 🔗 godane but april 22 2010 show does
09:49 🔗 dnova Why do you think you'll get in trouble for that, SketchCow?
09:49 🔗 dnova I fucking love your slamposts. I just really, really hope I am never the subject of one
09:50 🔗 dnova "here are 5,000 cited reasons why dnova SUCKS"
09:51 🔗 RedType that's not a slam, that's an aggressive motivational speech
09:51 🔗 dnova haha
09:54 🔗 SketchCow It's a call to arms presented while standing on the corpse of a fat guy
10:05 🔗 SketchCow I'm now writing an entry with what I've been putting in the Archive these past few months.
11:16 🔗 dnova is this ever going to stop?
11:16 🔗 dnova 15787 ./tmpfs/it/perijulka
11:16 🔗 dnova that's megabytes
11:19 🔗 marceloan Hey, I've uploaded all my Splinder files to the Batcave.
11:20 🔗 dnova bodacious
11:20 🔗 dnova did you say so on the wiki?
11:20 🔗 marceloan What?
11:21 🔗 dnova I'll take care of it
11:22 🔗 dnova http://archiveteam.org/index.php?title=Splinder#Upload_status
11:23 🔗 marceloan :)
16:53 🔗 SketchCow New front page looks great, dnova
16:54 🔗 dnova thanks! still have some ideas
17:13 🔗 Schbirid http://www.jorisvanhoboken.nl/?p=308
19:53 🔗 dnova anyone want to watch JAWS on CED? cuzzz I just got it. and 59 others. and 2 players. and that much less space in my house.
21:03 🔗 bsmith094 good news my poenews scrape is done
21:04 🔗 bsmith094 SketchCow: where do you want the Poe-news.com scrape to go?
21:09 🔗 underscor Coderjoe: Items are made dark by permissions of the actual files
21:09 🔗 underscor You can always see where an item's files are by going to
21:09 🔗 underscor archive.org/download/IDENTIFIER
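underscor's URL pattern is easy to script. Below is a minimal sketch that builds the item listing URL, or one file's URL under it. It assumes the `https://archive.org/download/` prefix (the log gives only `archive.org/download/IDENTIFIER`). `download_url` is a hypothetical helper.

```python
from urllib.parse import quote

def download_url(identifier: str, filename: str = "") -> str:
    """Build the archive.org/download/IDENTIFIER listing URL, or a single file's URL."""
    # Assumption: https and no "www." prefix; the log only gives archive.org/download/IDENTIFIER.
    base = "https://archive.org/download/" + quote(identifier, safe="")
    if not filename:
        return base  # listing of the item's files
    return base + "/" + quote(filename)
```

Per underscor, a dark item's files still show up here, but their permissions block access.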
21:33 🔗 bsmith094 SketchCow: ive got some poenews scraped, if you want it
21:50 🔗 underscor http://inkdroid.org:3000/
21:50 🔗 underscor Realtime wikipedia edits
21:57 🔗 BlueMax underscor: wow.
22:05 🔗 underscor http://qaa.ath.cx/TheEmperorsNewClothes.html
22:10 🔗 yipdw awesome, now I can see Edit Wars IV: A New Hope
22:11 🔗 pberry underscor: I know the guy that made that. He's a library tech guy
22:22 🔗 underscor cool!
22:23 🔗 bsmith094 yipdw: anything on the ffnet script?
22:23 🔗 yipdw bsmith094: it needs to be made more robust to deal with network failures
22:23 🔗 bsmith094 also i have probably some and/or all of poenews if anyone wants it
22:23 🔗 yipdw I don't know what happened to fanfiction.net last night
22:23 🔗 yipdw but they were returning 503s for a while
22:24 🔗 yipdw the discovery mechanism does not gracefully cope with those
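yipdw's discovery code isn't in the log. Retrying transient errors with exponential backoff and jitter is one common way to make a fetch step survive a stretch of 503s like the one fanfiction.net returned; below is a minimal sketch of that technique, not necessarily what yipdw did. The set of retryable status codes, the delays, and the `fetch_with_retry` name are assumptions.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Status codes treated as transient (an assumption); 503 is what fanfiction.net returned.
RETRYABLE = {500, 502, 503, 504}

def _urlopen(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def fetch_with_retry(url: str, attempts: int = 5, base_delay: float = 2.0,
                     opener: Optional[Callable[[str], bytes]] = None,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch url, retrying transient failures with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    opener = opener or _urlopen
    for attempt in range(attempts):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be caught first.
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
        except urllib.error.URLError:
            # Connection refused, DNS failure, etc.
            if attempt == attempts - 1:
                raise
        # Waits 2, 4, 8, ... seconds plus up to 1s of jitter so parallel
        # workers don't retry in lockstep.
        sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

A 404 is raised immediately rather than retried, because waiting will not make a missing story appear.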
22:24 🔗 Coderjoe not pure bash. it relies on awk. though I haven't really observed any version problems with awk, but that's probably because we haven't really used awk here at AT
22:25 🔗 yipdw uh, what
22:25 🔗 Coderjoe that link from underscor, which was a bash json interpreter
22:25 🔗 yipdw oh
22:26 🔗 Coderjoe I would have been really impressed if it was pure bash
22:26 🔗 yipdw I have a terrible way to do it
22:26 🔗 yipdw write a bash backend for Ragel, write the JSON parser in Ragel
22:26 🔗 Coderjoe farm it off to something else that has a json parser?
22:27 🔗 yipdw yes, python2.6 -mjson is one way to do it
22:28 🔗 Coderjoe yipdw: I don't know about ff.net, but I have observed other sites that go down at regular intervals to do backups and stuff. (which to me screams "you're doing it wrong")
22:28 🔗 yipdw or whatever version you've got
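Below is a minimal sketch of "farm it off to something that has a JSON parser": a tiny Python filter a bash script can pipe JSON into. The script name `jsonget.py` and its dotted-path syntax are hypothetical. For plain pretty-printing, the standard library's command-line module is `python -m json.tool`.

```python
import json
import sys
from typing import Any

def extract(document: Any, path: str) -> Any:
    """Walk a dotted path like 'items.0.title' through parsed JSON."""
    value = document
    for key in path.split("."):
        # Numeric path segments index into arrays; others are object keys.
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value

def main(argv) -> int:
    if len(argv) != 2:
        print("usage: jsonget.py DOTTED.PATH < input.json", file=sys.stderr)
        return 2
    value = extract(json.load(sys.stdin), argv[1])
    # Print strings bare so bash can capture them; re-encode everything else as JSON.
    print(value if isinstance(value, str) else json.dumps(value))
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

From bash, that might look like `title=$(curl -s "$url" | python3 jsonget.py items.0.title)`, where `$url` stands in for whatever endpoint you are reading.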
22:28 🔗 yipdw Coderjoe: well, either way, the network is never reliable etc
22:28 🔗 Coderjoe except those other sites just stop accepting on 80
22:28 🔗 Coderjoe true
22:28 🔗 yipdw it really felt like someone was just hammering ff
22:28 🔗 yipdw wasn't me, I was just running two connections at a time
22:29 🔗 Coderjoe perhaps some asshole that also wants a copy of everything?
22:29 🔗 bsmith094 wasnt me either, i was scraping poenews
22:29 🔗 yipdw maybe
22:30 🔗 bsmith094 what kind of webserver cant take the load of two mirroring efforts at once, anyway?
22:31 🔗 yipdw it's not that uncommon
22:32 🔗 yipdw the problem is rarely the web server, though
22:33 🔗 yipdw the application to which the server proxies is usually your bottleneck
22:33 🔗 bsmith094 are there *any* sites left that are just html and links
22:34 🔗 bsmith094 those are dead easy to save, this place has to write custom code for every job
22:35 🔗 Coderjoe 4chan's only database and non-static content, last I knew, was the actual posting script. the thread pages and index pages were re-written as static html when a new post came in that affected them
22:35 🔗 bbot_ well, we're mostly interested in user-generated content
22:35 🔗 Coderjoe (as an example)
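Below is a minimal sketch of the design Coderjoe describes: posting is the only dynamic step, and it rewrites the affected thread and index pages as static HTML. A mirror then only has to fetch plain files. This is not 4chan's actual code. The in-memory `threads` dict stands in for the database, and the page layout and function names are hypothetical.

```python
import html
from pathlib import Path
from typing import Dict, List

def render_thread(thread_id: int, posts: List[str]) -> str:
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in posts)
    return f"<html><body><h1>Thread {thread_id}</h1>\n{body}\n</body></html>\n"

def render_index(threads: Dict[int, List[str]]) -> str:
    items = "\n".join(
        f'<li><a href="res/{tid}.html">Thread {tid}</a> ({len(posts)} posts)</li>'
        for tid, posts in sorted(threads.items()))
    return f"<html><body><ul>\n{items}\n</ul></body></html>\n"

def post(root: Path, threads: Dict[int, List[str]], thread_id: int, text: str) -> None:
    """Store the post, then regenerate only the pages it affects."""
    threads.setdefault(thread_id, []).append(text)  # stand-in for the database write
    (root / "res").mkdir(parents=True, exist_ok=True)
    (root / "res" / f"{thread_id}.html").write_text(
        render_thread(thread_id, threads[thread_id]), encoding="utf-8")
    (root / "index.html").write_text(render_index(threads), encoding="utf-8")
```

Reads never touch the application at all, so rendering cost is paid once per post rather than once per page view.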
22:35 🔗 bbot_ and if you let users upload arbitrary HTML, then things get zany
22:36 🔗 yipdw consider, too, that the characteristics of a mirroring operation are not the same as what a human would do
22:36 🔗 yipdw for one, mirroring will fuck your cache
22:36 🔗 Coderjoe like there was no tomorrow
22:37 🔗 yipdw because mirroring is going to request everything, including rarely-hit pages; and if it takes a lot of resources to generate HTML then that can bring an application down
22:37 🔗 yipdw or it'll generate a lot of content to be dumped into cache and depending on the cache expiration policy that may shove hot data out
22:38 🔗 yipdw (I mean, it shouldn't, but...)
22:49 🔗 Coderjoe even with a basic LRU policy, that will depend on the ratio of normal users to mirror users
22:50 🔗 yipdw yeah, there's a lot of factors
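Below is a toy simulation of yipdw's and Coderjoe's point about mirrors and a plain LRU cache. The cache size and page counts are made up. Normal users cycle through a small set of hot pages. The mirror touches each cold page exactly once, so its entries are never reused but still push older entries out.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[str, None]" = OrderedDict()

    def access(self, key: str) -> bool:
        """Return True on a hit; on a miss, insert key and evict the least recently used."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False

def hot_hit_rate(capacity: int, hot_pages: int, mirror_per_user: int,
                 rounds: int = 1000) -> float:
    """Hit rate for normal users when mirror_per_user crawl requests follow each user request."""
    cache = LRUCache(capacity)
    hits = 0
    cold = 0
    for i in range(rounds):
        if cache.access(f"hot-{i % hot_pages}"):
            hits += 1
        for _ in range(mirror_per_user):
            cache.access(f"cold-{cold}")  # a crawl never revisits a page
            cold += 1
    return hits / rounds
```

With a 100-entry cache and 50 hot pages, users see a 0.95 hit rate with no mirror and still 0.95 with one mirror request per user request. At two mirror requests per user request, the hit rate drops to 0.0. The reason is that a hot page is now followed by more than 100 other distinct keys before it comes round again. As Coderjoe says, it depends on the ratio, and with LRU the drop can be a cliff rather than a slope.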
23:51 🔗 bsmith094 seriously does anybody want the poe-news wget-warc dump?
23:52 🔗 DFJustin SketchCow is probably just afk, chillax for a while
