#archiveteam 2012-04-14,Sat

Time Nickname Message
01:52 🔗 Coderjoe uploading video #200
01:52 🔗 Coderjoe yay
01:58 🔗 winr4r excellent
02:07 🔗 dashcloud holy crap- this must be the finest job posting ever: http://blogs.valvesoftware.com/abrash/
02:08 🔗 underscor just got nfs access to the imagedump server for wikimedia foundation
02:08 🔗 underscor those will be going up soon on a.o
02:08 🔗 dashcloud congrats!
02:09 🔗 underscor thanks :)
02:09 🔗 Wyatt|Wor Holy carp, is THAT where Abrash wound up?
02:10 🔗 dashcloud after 14 years he came back there it looks like
02:12 🔗 underscor 204.9.55.82:/z/public/pub/wikimedia/dumps 157T 31T 126T 20% /mnt/dumps
02:12 🔗 underscor 204.9.55.82:/z/public/pub/wikimedia/images 143T 16T 126T 12% /mnt/images
02:12 🔗 underscor wheeee
02:18 🔗 winr4r underscor: excellent!
02:19 🔗 winr4r i've got hundreds of photos there
02:19 🔗 underscor :)
02:19 🔗 winr4r i was hoping they'd outlive wikimedia, and it seems they will
02:19 🔗 underscor next step it to write ingestion logic to get it all into archive.org
02:19 🔗 underscor s/it/is/
02:19 🔗 winr4r (hundreds on the commons, that is)
02:19 🔗 winr4r underscor: good luck, and well done on scoring those
02:20 🔗 underscor thanks
02:20 🔗 Wyatt|Wor Wow, that's...a big array.
02:21 🔗 Wyatt|Wor Two of them.
02:21 🔗 underscor yeah
02:21 🔗 underscor the box has something like 480TB on it
02:22 🔗 Wyatt|Wor Where box == rack, I'd imagine.
02:23 🔗 underscor it's all on one "machine"
02:23 🔗 underscor connected over fibrechannel disk enclosures
02:24 🔗 Wyatt|Wor So something like a Ceph cluster? Okay, makes sense.
02:59 🔗 shaqfu Well, my archives con today reinforced how awesome AT is
02:59 🔗 shaqfu So, go you guys o/
03:01 🔗 mistym shaqfu: marac?
03:01 🔗 shaqfu mistym: ...how'd you know?
03:02 🔗 mistym shaqfu: Seen a bunch of people I follow twittering it up today. Wasn't there myself.
03:02 🔗 shaqfu mistym: Yep, was there today/tomorrow
03:02 🔗 shaqfu mistym: Didn't know we had another traditional archivist in the room
03:04 🔗 mistym shaqfu: Yep! For a given value of "traditional" anyway, but yeah, did my masters in a traditional archives program 'n all.
03:04 🔗 shaqfu mistym: Spiffy; just got mine
03:05 🔗 mistym Congrats!
03:05 🔗 shaqfu Thanks :)
03:05 🔗 shaqfu Are you with a place now?
03:05 🔗 mistym Yeah, I work at a museum in Manitoba.
03:06 🔗 shaqfu Gotcha; bit distant for MARAC, then
03:07 🔗 mistym Yeah, not exactly in the area.
03:07 🔗 shaqfu Is there a regional one for central Canada?
03:08 🔗 mistym Not in my province, at least. Alberta's archivists are pretty active though.
03:08 🔗 shaqfu Oh, wow; that's a hike
03:11 🔗 shaqfu The digital object seminar was cool; the job one, harrowing; the DH one, dull
03:12 🔗 mistym I saw anarchivist tweeting about the job one. It sounded brutal.
03:12 🔗 shaqfu Yep
03:13 🔗 shaqfu Lots of "the system is totally fucking broken and we can't fix it"
03:14 🔗 shaqfu And more of the usual student/professional divide, but nobody discussed it :(
03:14 🔗 mistym :(
03:15 🔗 shaqfu For those of us stuck between them, we're SOL
03:18 🔗 mistym Hm, is there a url for the wayback machine to load the latest version of a page, rather than a list or specific revision?
03:24 🔗 shaqfu But yeah, after listening to a bunch of "real" archivist talk digital records for a day, I appreciate AT that much more
03:26 🔗 mistym shaqfu: I know, right? It's unfortunate that it's hard to have a constructive discussion in that environment.
03:27 🔗 shaqfu mistym: Yeah, lots of "we need to be doing something!" and nothing getting done
03:27 🔗 mistym Yeah...
03:27 🔗 Wyatt|Wor Aah, I was just about to ask about that...
03:28 🔗 mistym shaqfu: Giving a talk at this year's ACA that I'm hoping to balance with a little "we can do things! let's get things done!"
03:28 🔗 shaqfu mistym: C being Canadian or Certified?
03:28 🔗 mistym Canadian.
03:28 🔗 mistym ("Canuck")
03:29 🔗 mistym Wyatt|Wor: re: wayback machine?
03:29 🔗 shaqfu If a bunch of malcontents online can move mountains, imagine how much big institutions could do...
03:29 🔗 Wyatt|Wor mistym: Re: "listening to a bunch of "real" archivist talk digital records for a day"
03:29 🔗 mistym Wyatt|Wor: Ahh.
03:29 🔗 Wyatt|Wor Whoops, those quotes escaped.
03:30 🔗 mistym Need to be more careful escaping.
03:31 🔗 mistym shaqfu: I dunno, the longer I'm in big institutions, the more I worry institutional glacial workflows can't be made to work at the speed that's useful.
03:31 🔗 winr4r that archive team was necessary shows that there's a real problem with "real" archivists
03:31 🔗 mistym I'm being overly pessimistic there, but there is major reorientation that institutions, in the big-institution sense, are going to have to do.
03:31 🔗 winr4r as much as many of them do great work, a lot of them have a lot of great ideas about digital preservation and very little wget
03:32 🔗 Wyatt|Wor winr4r: I think there are more problems than simply archivists.
03:32 🔗 winr4r Wyatt|Wor: oh of course
03:32 🔗 winr4r i don't doubt that
03:32 🔗 shaqfu mistym: Yeah, it's hard to respond to "you have 30 days before we delete everything" at the speed of bureaucracy
03:32 🔗 mistym Mhm...
03:33 🔗 winr4r yes
03:33 🔗 Wyatt|Wor It's also hard to change the mentality of people who put their data up without a second thought.
03:33 🔗 shaqfu Yeah :(
03:33 🔗 Wyatt|Wor And harder still to change the businesses that will bean-count something into oblivion without so much as a half-hearted apology.
03:33 🔗 shaqfu And yeah, there needs to be a serious realignment if we're going to realistically deal with these records
03:34 🔗 shaqfu Christ Almighty, Geocities alone is bigger than most university library systems
03:34 🔗 shaqfu Good fucking luck doing it the old-fashioned way; I'll see you at the heat death of the universe
03:34 🔗 Wyatt|Wor And yes, the metadata problem is monstrous.
03:34 🔗 shaqfu Wyatt|Wor: It's solvable - you can work miracles with machine language processing
03:35 🔗 shaqfu Which can at least deal with text stuff
03:35 🔗 winr4r give it a decade or so
03:35 🔗 Wyatt|Wor shaqfu: Haha, you got me. I was just thinking about how to hack away at it with NLP.
03:35 🔗 shaqfu Wyatt|Wor: One panel today was about using topic modeling on newspapers; I'm sure, given time, it'll apply to messier collections
03:36 🔗 shaqfu But yeah, get a lot of processing power together, point it at Geocities, and you'll have something at least usable
03:36 🔗 dashcloud maybe the only way to handle the new workflows is to have a totally separate group inside that's connected in name only, so they can respond in the timeframes required
03:37 🔗 mistym shaqfu: I see Yahoo as being a pretty good analogue. Not data deletion Yahoo, but the old Yahoo web directory.
03:37 🔗 shaqfu Ah, yeah
03:37 🔗 mistym I've heard (unconfirmed) that they were among the biggest employers of library-school graduates at one point in time!
03:38 🔗 shaqfu Yeah, like DMOZ, except people used it
03:38 🔗 mistym Exactly.
03:38 🔗 mistym There was a point where people thought of the internet as a thing you could index by hand, with meticulous metadata.
03:38 🔗 mistym Then the dot-com boom came, the internet *exploded*, and that was never possible again.
03:38 🔗 shaqfu Yep
03:39 🔗 winr4r mistym: i still think there's a place for it
03:39 🔗 shaqfu winr4r: For hand curation?
03:39 🔗 winr4r shaqfu: yes
03:39 🔗 winr4r actually we have it already: it's called twitter
03:39 🔗 shaqfu winr4r: Possible, but things move too fast for that
03:39 🔗 winr4r you just distribute the task
03:40 🔗 mistym winr4r: For subject-specific stuff, etc. What I'm saying is that there will never again be a time where all the Internet that's fit to print is hand-curated to someone's professional standards.
03:40 🔗 shaqfu winr4r: That requires an established network
03:40 🔗 winr4r shaqfu: yes
03:40 🔗 dashcloud I think a better example is any of the social bookmarking sites (or any bookmarking site that has bookmarks open to the public)
03:40 🔗 dashcloud like delicious and pinboard
03:41 🔗 Wyatt|Wor dashcloud: Not enough. There's simply too much data to rely on that.
03:41 🔗 winr4r mistym: you might be right
03:41 🔗 winr4r on the other hand, is there actually more good stuff on the internet than there was in 1999? :P
03:41 🔗 dashcloud hell yes
03:42 🔗 dashcloud in quantity yes, percentage wise maybe, maybe not
03:42 🔗 SketchCow HiiiiIiIiiIIiiiI
03:42 🔗 SketchCow I saw "Detention"
03:42 🔗 SketchCow you must see detention.
03:43 🔗 mistym Detention?
03:43 🔗 aggro But I've done nothing wrong!
03:43 🔗 winr4r hi jason
03:43 🔗 shaqfu Movie about chewing gum in high school?
03:43 🔗 Wyatt|Wor Evening to you.
03:45 🔗 shaqfu SketchCow: Has there been talk at IA about using NLP to handle metadata for these McLargeHuge collections?
03:47 🔗 zgrant What does NLP = ?
03:48 🔗 zgrant Quick search shows Neuro Linguistic Programming, but that doesn't seem right.
03:48 🔗 aggro That's probably it.
03:48 🔗 winr4r natural language processing
03:48 🔗 aggro If you're trying to data mine metadata for relevant info to humans
03:48 🔗 zgrant winr4r: Thanks
03:51 🔗 shaqfu Letting machines figure out word associations, more or less
03:51 🔗 zgrant Interesting.
03:52 🔗 zgrant I'm reading the Stanford Natural Language Processing Group web page. Who knew? Well I guess you did. :)
03:53 🔗 SketchCow shaqfu: Absolutely no
03:54 🔗 shaqfu SketchCow: Really? Admittedly, I'm surprised
03:55 🔗 shaqfu Seems like the only reasonable solution - barring some miracle, I don't see there being enough humans to mark everything up
03:57 🔗 SketchCow Don't be the latest in a hundred people I've dealt with surprised that archive.org doesn't have much manpower.
03:58 🔗 shaqfu SketchCow: I'm not surprised IA has minimal staff; I knew that already :P I'm surprised there's been no talk about letting machines do the heavy markup lifting
03:59 🔗 mistym It's kind of easy to assume that archive.org is some all-powerful automaton. Then you stand in a room WITH THE INTERNET and suddenly you realize that it's not actually powered by elder gods or smth.
03:59 🔗 mistym (even if it *is* in a church)
04:01 🔗 SketchCow Again
04:01 🔗 SketchCow THERE'S NOBODY TO TALK
04:01 🔗 winr4r i'd say let someone in 20 or 30 years deal with it when they're widely recognised for being as important as they are
04:01 🔗 shaqfu SketchCow: Gotcha this time
04:01 🔗 winr4r you could worry about NLP now to find all those cat photos, or you could 'grep -ri "cat\.*photo" /geocities' a thousand times as fast in 20 years' time
04:02 🔗 shaqfu winr4r: For the amount of processing power it'd take, and how NLP isn't really at the point you'd need yet, yeah, may as well wait
04:03 🔗 Wyatt|Wor I don't think there's only one "right" approach.
04:03 🔗 shaqfu There rarely is
04:03 🔗 Wyatt|Wor Especially since the data is hierarchical that actually could help quite a bit.
04:04 🔗 Wyatt|Wor Well, up to a point.
04:05 🔗 Wyatt|Wor (I'm not familiar enough with the data set to know what proportion of neighbourhoods are just numbered with random stuff shoved in)
04:05 🔗 shaqfu Didn't they stop that system after a point?
04:05 🔗 winr4r shaqfu: yes, after around 1999 i believe
04:06 🔗 shaqfu So it's hard to regard that hierarchy for any serious use, unless you're limiting your work to 199x-1999
04:06 🔗 Wyatt|Wor One harpoon.
04:06 🔗 shaqfu Hm?
04:07 🔗 mistym Hooray, scraping script running. Hopefully will be successful!
04:07 🔗 winr4r mistym: what are you scraping?
04:07 🔗 mistym winr4r: digiplay.info
04:07 🔗 winr4r mistym: excellent
04:08 🔗 mistym Even though the data is kind of messy, it's not too much work to extract it into structured json.
04:08 🔗 SketchCow This is also why I keep bringing in assholes from outside to dump open-source solutions and leverage archive.org against it
04:09 🔗 SketchCow There's no dev space inside the company
04:09 🔗 chronomex leveraged synergies
04:09 🔗 chronomex wtf
04:09 🔗 chronomex SketchCow is talking about leveraged synergies
04:09 🔗 shaqfu Hunh; I knew it ran lean, but didn't expect it to run *that* lean
04:10 🔗 Wyatt|Wor SketchCow Clicker?
04:10 🔗 SketchCow The press says things like 200-300 employees
04:10 🔗 SketchCow But the vast vast vast majority of those people are scanners. Book scanners.
04:10 🔗 shaqfu Isn't it something like 20-30 core?
04:10 🔗 SketchCow I'd say, maybe, MAYBE, my observation is 20-30.
04:10 🔗 SketchCow Yes, 20, 30.
04:11 🔗 SketchCow Now, work that out.
04:11 🔗 SketchCow We have 5 people overseeing the book scanning centers.
04:11 🔗 SketchCow Boom, now we're 20-25% down
04:11 🔗 SketchCow etc
04:12 🔗 SketchCow I'm like hiring six new employees in terms of stuff and publicity and the rest
04:12 🔗 SketchCow But I can only do things that are being brought in, there's no way to make those poor devs do MORE work
04:12 🔗 chronomex 15 coder-librians is not enough
04:12 🔗 SketchCow And there we are.
04:12 🔗 Wyatt|Wor And those people are also jointly responsible for running all the servers and such?
04:12 🔗 SketchCow So if we do some sort of NLP smart tagging smartiness, great. Get on it.
04:12 🔗 SketchCow Free tour.
04:12 🔗 SketchCow Yes, there's a team of 5-10 dev/admin/network people
04:13 🔗 SketchCow OH LOOK AT ALL YOUR EYES GO WIDE
04:13 🔗 SketchCow Anyway, so yeah, get on it.
04:13 🔗 winr4r and there's got to be like six billion servers there
04:13 🔗 SketchCow I'll just use my universal access to ensure you get stuff to help you.
04:14 🔗 Ymgve you should get a rifle, some tranq darts, then go hang out outside one of google's datacenters
04:14 🔗 shaqfu Pity 'bout the 5-10 years math ed it'd take to do it; it'd be a badass project
04:14 🔗 SketchCow Go rape a gaduate program is my suggestion
04:15 🔗 SketchCow Anyway, unrelated, I need to go to bed now.
04:15 🔗 Wyatt|Wor Yeah, the perfect admin abduction is hard to pull off.
04:15 🔗 Wyatt|Wor Good night.
04:15 🔗 winr4r night jason
04:15 🔗 shaqfu G'nite
04:15 🔗 SketchCow Let's keep making amazing shit
04:17 🔗 SketchCow Ooo, one of batcave's two remaining mounted drive sets has been emptied out
04:17 🔗 SketchCow We're now down to one. 9gb.
04:17 🔗 winr4r in any case, i think there's a risk of over-complicating the "saving shit" strategy (which provably works very well) and turning it into a discussion about "how do we make sure that every single thing is categorised as well as books are in a library" and thereby getting very little done
04:17 🔗 winr4r SketchCow: s/g/t/ ?
04:17 🔗 winr4r i can't imagine you having only 9gb of *anything*
04:17 🔗 SketchCow ha ha
04:18 🔗 shaqfu winr4r: Yep, that's what happened on this end; it turned into an issue of description
04:18 🔗 SketchCow Did I write 9gb?
04:18 🔗 SketchCow I DO need a rest
04:18 🔗 SketchCow 9tb
04:18 🔗 shaqfu Which, really, people only care 'bout good-enough
04:18 🔗 winr4r SketchCow: did you actually sleep at all last night?
04:18 🔗 shaqfu Barring special cases - obviously shit like presidential letters need Awesome
04:19 🔗 chronomex meh, president is just another sack of meat
04:19 🔗 winr4r shaqfu: good enough and actually existing beats immaculately described archives that do not
04:19 🔗 shaqfu winr4r: You got it
04:20 🔗 mistym "less process more product" etc
04:20 🔗 shaqfu mistym: wrought grand - okay item-level description
04:21 🔗 mistym ~500 pages of 5000. This may take awhile.
04:24 🔗 winr4r good luck
04:30 🔗 Wyatt|Wor So as a baseline, any thoughts on what metadata should be given priority? Dublin Core and a mostly-flat ontology of descriptive tags?
04:34 🔗 SketchCow http://archive.org/details/stage6
04:35 🔗 Wyatt|Wor He even archives in his sleep. ;)
04:36 🔗 SketchCow zzzzzzreclassifyfuckdublincorzzzzzzmmzzzzz
04:39 🔗 winr4r haha
04:42 🔗 Wyatt|Wor That's fine. I'm not a particularly huge fan of DCMI, even though I live a stone's throw from Dublin.
04:42 🔗 Coderjoe mmm
04:42 🔗 Coderjoe i has a collection
04:43 🔗 mistym What has you a collection of?
04:43 🔗 Coderjoe i'm uploading the stage6 items
04:43 🔗 winr4r Coderjoe: how did you get them?
04:44 🔗 mistym Ahh.
04:44 🔗 Coderjoe winr4r: I downloaded them between the closing announcement and the shutdown
04:44 🔗 Coderjoe with metadata and everything
04:44 🔗 Coderjoe http://wegetsignal.org/stage6/
04:44 🔗 Coderjoe will probably be a little slow
04:45 🔗 winr4r 25 terabytes?
04:45 🔗 chronomex <3
04:45 🔗 Coderjoe winr4r: I don't have 25 TB of videos
04:45 🔗 Coderjoe only 290-ish GB
04:45 🔗 winr4r oh, nm, saw the percentage
04:45 🔗 winr4r but still, good work :)
04:46 🔗 Wyatt|Wor Good one.
04:46 🔗 Wyatt|Wor Is that 25TB before or after deriving?
04:46 🔗 Coderjoe that was the projected size of what was up on the stage6 servers
04:47 🔗 Wyatt|Wor Ah... :/
04:48 🔗 winr4r on an unrelated note, is there a big list of fortunecity sites that you guys have been using?
04:48 🔗 Coderjoe and this was just me with three network connections (home, work, and a server in california)
04:48 🔗 winr4r if i'm well within my bandwidth cap towards the end of the month, i will set my screenshot bot loose again
04:48 🔗 Coderjoe winr4r: I think it came from google results
04:49 🔗 winr4r http://archive.org/details/geocities-screengrabs-collection in case you didn't know
04:49 🔗 winr4r 4000+ from geocities
04:50 🔗 Wyatt|Wor winr4r: Does it run as a normal user? I can give you an account on my VPS, if you'd like.
04:51 🔗 winr4r Wyatt|Wor: yes, though it does need an xvfb to run on
04:51 🔗 winr4r and a bunch of dependencies that aren't normally on a server
04:51 🔗 Wyatt|Wor That doesn't necessarily mean it's not doable.
04:52 🔗 Wyatt|Wor (Though I've never messed with xvfb on a headless machine)
04:52 🔗 winr4r Wyatt|Wor: me neither
04:55 🔗 Coderjoe xvncserver wouldn't work?
04:55 🔗 Coderjoe (you should be able to take a shot of the desktop or the like, I would think)
04:55 🔗 winr4r Coderjoe: i'd expect it would
04:55 🔗 winr4r i just know for sure that Xvfb does
04:56 🔗 Coderjoe i mean, yes, it isn't a vfb, but still
04:56 🔗 Coderjoe hmm. 6 items i need to redo
04:57 🔗 Coderjoe i'm sure it will increase
04:57 🔗 Wyatt|Wor Okay, looks like xvfb will work on a headless box. That's what Google says.
04:58 🔗 Coderjoe btw, that "videos listed" stat is just the videos I had pulled into my database with my importer. the "total video count" at the bottom is my estimated total video count that stage6 hosted
04:58 🔗 winr4r Wyatt|Wor: splendid!
06:57 🔗 Nemo_bis underscor, what dump server?
06:59 🔗 Nemo_bis ah, your.org
09:15 🔗 Wyatt|Wor Curious, since this thing is _still_ grepping that file, the webdav-feed.json and .xml...what role do they serve, exactly?
09:15 🔗 winr4r how big is the file?
09:16 🔗 winr4r and how can it take that long to grep anything?
09:16 🔗 chronomex fgrep is much faster, for fixed strings
09:17 🔗 winr4r yeah but i grepped a 1.9gb file in seconds, earlier today
09:17 🔗 chronomex it was probably all sitting in ram
09:17 🔗 winr4r (getting a list of fortunecity sites from the ODP)
09:17 🔗 winr4r chronomex: nope, fresh from the disk
09:17 🔗 chronomex hm, ok
09:19 🔗 Wyatt|Wor The json is 35MB. The incantation is grep http://gallery.me.com/[^"<]+ data/p/pe/per/pertormod1/gallery.me.com/webdav-feed.json and it's accumulated 3766 CPU _Minutes_
09:19 🔗 winr4r okay, i would call that a bug
09:20 🔗 chronomex indeed
09:20 🔗 Deewiant Run it interactively and see what it outputs, if anything?
09:20 🔗 emijrp use [^"<]+? and grep -E
09:20 🔗 Wyatt|Wor Even assuming the worst case of grep's iconv locale performance, I'm inclined to agree.
09:20 🔗 Wyatt|Wor Sorry, there's an -oE in there I missed
09:21 🔗 Wyatt|Wor (It's the seesaw-s3.sh)
09:21 🔗 winr4r just timed a regex again on a copy of said 19gb file, 19.4 seconds
09:21 🔗 chronomex winr4r: 1.9 or 19?
09:21 🔗 winr4r 1.9gb*
09:21 🔗 emijrp +?
09:21 🔗 chronomex ah
09:21 🔗 emijrp +?
09:21 🔗 winr4r so yeah 3766 minutes for a 35mb file is a LITTLE excessive
09:22 🔗 chronomex Wyatt|Wor: hmmmm. I would try some cut | grep(not -E) action.
09:22 🔗 Deewiant That comes out to about 162 bytes per second (and dropping, if it's still going)
09:22 🔗 chronomex drooping
09:23 🔗 Wyatt|Wor That's my perspective too. I think the emulated ARM processor that booted Linux on the 8-bit MC had a data rate about a thousand times better than that.
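The "about 162 bytes per second" figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 35MB file is 35 × 1024 × 1024 bytes and treats the 3766 CPU minutes as the time spent on the whole file, which is the reading that reproduces the number quoted in the channel.

```python
# Assumption: "35MB" means 35 * 1024 * 1024 bytes (binary megabytes).
file_bytes = 35 * 1024 * 1024

# 3766 CPU minutes, as reported above, converted to seconds.
cpu_seconds = 3766 * 60

# Effective processing rate if the whole file took that long.
bytes_per_second = file_bytes / cpu_seconds
bits_per_second = bytes_per_second * 8

print(f"{bytes_per_second:.1f} bytes/s, {bits_per_second:.0f} bit/s")
```

Because grep was still running at the time, the real rate could only be lower than this.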
09:25 🔗 alard winr4r: I see you're asking about a list of FortuneCity sites. I can send you the list from which we've been downloading, if that helps.
09:25 🔗 Deewiant Hey, it's faster than some 600 baud modems, according to Wikipedia.
09:25 🔗 chronomex some?
09:25 🔗 winr4r alard: i'd appreciate it, we can compare notes too
09:25 🔗 Deewiant https://en.wikipedia.org/wiki/List_of_device_bandwidths#Modems_.E2.80.93_narrow_and_broadband
09:26 🔗 winr4r alard: i grabbed a list from ODP (hence grepping a 1.9gb file), did you guys try that?
09:26 🔗 Deewiant There are two 600 baud ones that're 1.2 kbit/s and one that's 2.4 kbit/s.
09:26 🔗 chronomex ah
09:26 🔗 alard winr4r: What's ODP?
09:26 🔗 winr4r alard: open directory project
09:26 🔗 alard Ah, I see. No, I just googled.
09:26 🔗 winr4r okay, one sec
09:27 🔗 winr4r http://dl.dropbox.com/u/57276499/sitelist.txt
09:28 🔗 winr4r is what i got from ODP
09:28 🔗 Wyatt|Wor I actually don't understand this regex, even. o is --only-matching -E is extended regex... how does this work? [^"<]+
09:29 🔗 chronomex I don't think you should need -E for that
09:29 🔗 alard winr4r: Okay, got it. I'm currently making my list.
09:29 🔗 alard Wyatt|Wor: The regex matches anything until " or <
09:29 🔗 alard I believe that's matching urls in the webdav file.
09:30 🔗 alard So it will match from http:// until the tag ends.
09:30 🔗 Wyatt|Wor alard: Ah, I thought the caret was an anchor to the beginning?
09:31 🔗 winr4r Wyatt|Wor: not within []s
09:31 🔗 alard Between [] it's a negation. So 'anything but " and < '
09:31 🔗 winr4r alard: thanks :)
09:31 🔗 Wyatt|Wor ooooooh, I see. Hmm, need to put more skill points in RegEx. And the +?
09:31 🔗 chronomex one or more instances of the preceding object
09:32 🔗 alard * means zero or more.
09:32 🔗 chronomex er, matching element, which in this case is the whole [] expression
09:32 🔗 Wyatt|Wor Ah, so [] create a single semantic unit. I see.
09:32 🔗 chronomex indeed
09:33 🔗 chronomex it matches exactly one character
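The regex explanation above can be tried out directly. The following is a minimal sketch using Python's `re` module rather than grep; the sample string and the `someuser` path are made up for illustration, and the dots in the hostname are escaped here, which the original grep pattern does not do.

```python
import re

# [^"<] is a negated character class: exactly one character that is
# neither a double quote nor a less-than sign.
# The trailing + repeats that class one or more times, so a match runs
# from "http://gallery.me.com/" up to (not including) the first " or <.
URL_PATTERN = re.compile(r'http://gallery\.me\.com/[^"<]+')


def extract_urls(text):
    """Return every matching URL in text, much like grep -o prints them."""
    return URL_PATTERN.findall(text)


# Hypothetical sample standing in for a fragment of webdav-feed.json.
sample = '{"url": "http://gallery.me.com/someuser/100001", "title": "<b>x</b>"}'
urls = extract_urls(sample)
print(urls)
```

Outside the brackets a caret would anchor the match to the start of the line; inside them it only negates the class, which is the point winr4r and alard make.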
09:33 🔗 alard winr4r: http://db.tt/PjVwK1A2 (a 3.7MB .txt.bzip2)
09:34 🔗 chronomex is that the file we're working on?
09:34 🔗 alard No, that's the list of all fortunecity sites.
09:34 🔗 winr4r alard: thanks!
09:35 🔗 alard winr4r: You'll have to expand the streets yourself, we've basically archived anything from number 0 to 2600.
09:37 🔗 winr4r alard: so com/campus/athena = campus.fortunecity.com/athena/<numbers here> ?
09:41 🔗 Wyatt|Wor Hm, so it's definitely finding things, though it seems awfully slow...
09:42 🔗 emijrp paste the entire command line
09:43 🔗 Wyatt|Wor grep -oE 'http://gallery.me.com/[^"<]+' data/p/pe/per/pertormod1/gallery.me.com/webdav-feed.json # Pretty much verbatim from seesaw-s3.sh
09:44 🔗 Wyatt|Wor Ah, I think I've got the problem. Who do I bug about a patch?
09:45 🔗 emijrp a bug in grep?
09:45 🔗 Wyatt|Wor Well, yes, to an extent. But it's a bug I think we can safely work around. export LANG=C And it's about three orders of magnitude faster
09:46 🔗 Wyatt|Wor I could have sworn the iconv bug was fixed though. :/
09:46 🔗 emijrp where is the file to parse? i want to make some grep tests
09:46 🔗 Wyatt|Wor emijrp: Let me stick it somewhere.
09:48 🔗 Wyatt|Wor Come to think of it , DCC would have been faster...
09:48 🔗 Wyatt|Wor radiusic.com/bigfeet.json
09:48 🔗 Wyatt|Wor "d" and "t" are totally right next to each other.
09:50 🔗 emijrp downloading
09:52 🔗 Wyatt|Wor Okay yeah, it hit me because my grep is old. It's apparently fixed in grep 2.9
09:53 🔗 Wyatt|Wor (Didn't realise I was still using grep 2.5.4)
09:53 🔗 emijrp what do you want, the entire url or just the domain + username?
09:53 🔗 chronomex grep 2.old
09:53 🔗 Wyatt|Wor The problem is, in this case, most distros in production are probably using old grep.
09:54 🔗 Wyatt|Wor CentOS 6 has grep 2.6
09:54 🔗 winr4r i'm on 2.5.4 too
09:55 🔗 winr4r but piping it to a file, it takes a few seconds
09:56 🔗 winr4r 8 seconds, to be precise
09:56 🔗 winr4r time grep -oE 'http://gallery.me.com/[^"<]+' bigfeet.json > what
09:56 🔗 winr4r is what i am using
09:57 🔗 Wyatt|Wor emijrp: The problem isn't that it doesn't work. The problem is that when you're using many versions of grep in the wild with LANG=en_US.utf8 (or any unicode locale, for that matter), it's fantastically slow.
09:59 🔗 Nemo_bis unicode comparisons are always very slow
09:59 🔗 Wyatt|Wor The good thing is, we can patch our scripts by explicitly setting LANG=C and LC_CTYPE=C and that should be safe.
10:00 🔗 winr4r i am en_GB.utf-8 and that grep still takes seconds rather than hours
10:00 🔗 Wyatt|Wor (Or just unset LC_CTYPE)
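The workaround described above (force the C locale for the grep call) could be sketched like this when grep is launched from a script. The helper names `c_locale_env` and `grep_urls` are hypothetical and are not taken from seesaw-s3.sh. Removing `LC_ALL` is an addition of this sketch: when set, it takes precedence over both `LANG` and `LC_CTYPE`.

```python
import os
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment with the C locale forced."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = "C"
    env["LC_CTYPE"] = "C"
    # LC_ALL, if present, would override the two settings above.
    env.pop("LC_ALL", None)
    return env


def grep_urls(path, pattern='http://gallery.me.com/[^"<]+'):
    """Run grep -oE on path under the C locale and return matched lines."""
    result = subprocess.run(
        ["grep", "-oE", pattern, path],
        env=c_locale_env(),
        capture_output=True,
        text=True,
        errors="replace",  # the feed may contain bytes that do not decode
        check=False,       # grep exits with status 1 when nothing matches
    )
    return result.stdout.splitlines()
```

Passing the environment only to the grep subprocess keeps the rest of the script in whatever locale it was started with.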
10:01 🔗 Nemo_bis In what format is the file you're grepping saved? Wouldn't this matter?
10:01 🔗 Nemo_bis I suppose grep should determine what charset is being used, but will do so only from the headers...
10:02 🔗 Wyatt|Wor Nemo_bis: It's just a JSON file from mobileme
10:03 🔗 alard winr4r: "so com/campus/athena = campus.fortunecity.com/athena/<numbers here> ?" Yes, or www.fortunecity.com/campus/athena/<number>. (The subdomain approach doesn't work with co.uk/it/es/se, I think.)
10:03 🔗 winr4r alard: gotcha
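Expanding one list entry into numbered street URLs, as alard describes, might look like the sketch below. It assumes entries have the form `<tld>/<neighbourhood>/<street>` (as in `com/campus/athena`), uses the `www.fortunecity.<tld>` form throughout since the subdomain form reportedly does not work for every TLD, and takes the 0 to 2600 range from the discussion. The function name `expand_street` is hypothetical.

```python
def expand_street(entry, first=0, last=2600):
    """Expand an entry like 'com/campus/athena' into numbered URLs.

    Assumption: the part before the first slash is the TLD and the
    rest is the path under www.fortunecity.<tld>.
    """
    tld, path = entry.split("/", 1)
    base = "http://www.fortunecity.%s/%s" % (tld, path)
    return ["%s/%d" % (base, n) for n in range(first, last + 1)]


urls = expand_street("com/campus/athena")
print(len(urls), urls[0], urls[-1])
```

Most of the generated numbers will not exist on any given street, so whatever consumes this list has to tolerate misses.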
10:48 🔗 emijrp netsplit
11:08 🔗 Wyatt|Wor Okay, updated grep and things are much speedier. I'll try to take a look at the memac scripts when I get home and figure out where to add that env.
11:09 🔗 winr4r it's weird though
11:10 🔗 winr4r that i can be running the same version also with a UTF-8 LANG and do in nineteen seconds what your grep didn't finish in hours
11:10 🔗 * winr4r isn't exactly on a speed-demon computer
11:14 🔗 Wyatt|Wor winr4r: What distro/version?
11:15 🔗 winr4r Wyatt|Wor: ubuntu 10.04
11:15 🔗 Wyatt|Wor Distro-specific patches will do that. Yeah, Debian patched it a while back.
11:15 🔗 winr4r ah :)
11:15 🔗 Wyatt|Wor Gentoo just stabled a newer version instead.
11:15 🔗 winr4r that explains that
11:16 🔗 Wyatt|Wor (But this is my work computer, so I don't exactly bother updating often)
11:17 🔗 * winr4r nods
11:17 🔗 winr4r 1507 screenshots ;D
11:18 🔗 Wyatt|Wor Ooh, going pretty fast.
11:27 🔗 oli 06:25:13 up 13:46, 6 users, load average: 54.70, 54.97, 58.34
11:27 🔗 oli hmm i think i started too many threads
11:27 🔗 Wyatt|Wor What, that's it?
11:27 🔗 oli hahaha
11:27 🔗 oli yeah that's it
11:28 🔗 Wyatt|Wor ;)
11:30 🔗 oli i got a box from softlayer and its not going over 100mbit :(
11:31 🔗 Wyatt|Wor Time to go home. Later.
11:32 🔗 winr4r bye Wyatt|Wor!
13:33 🔗 SketchCow See? I sleep like everyone else. Here I am, back up again.
13:34 🔗 undersco2 lies
13:40 🔗 winr4r haha
13:41 🔗 winr4r hey cow, i'll have fortunecity screenshots for you soon
13:41 🔗 winr4r i'll email you when i'm done, it's not urgent
13:44 🔗 SketchCow Sounds fun
13:49 🔗 winr4r Wyatt|Wor is letting me use his VPS for it
13:49 🔗 winr4r i'm at nearly 2000 now
13:49 🔗 winr4r how are you? :)
13:49 🔗 SketchCow Just blew another bulk of mobileme off batcave.
13:49 🔗 SketchCow The machine is now down to 8.8tb of data.
13:50 🔗 SketchCow Which is good, it's down from roughly 28tb
13:52 🔗 SketchCow Mostly, I'm stunned, I'm finding additional pieces of friendster
13:52 🔗 SketchCow And everything else.
13:52 🔗 SketchCow Also, our Berlios grab
13:53 🔗 winr4r that's the archive team equivalent of finding loose change in your sofa?
13:53 🔗 SketchCow Yeah
13:54 🔗 Nemo_bis What about splinder?
13:54 🔗 Nemo_bis SketchCow, I think chronomex needed a place to upload his last pieces of Splinder.
13:55 🔗 SketchCow Next, I need to start shoving splinder into archive.org proper.
14:01 🔗 SketchCow Hey there, I'm James. I'm from Australia.
14:01 🔗 SketchCow I have the site bookmarked again, and will read a few files when I get the time.
14:01 🔗 SketchCow I'm nearly seventeen, and I remember coming across textfiles at 12 or 13..
14:01 🔗 SketchCow You are fucking straight up, and I respect it..
14:01 🔗 SketchCow Thanks for doing what you do!
14:02 🔗 winr4r :)
14:02 🔗 winr4r doesn't that sort of thing just make your day?
14:04 🔗 SketchCow Well, I get a lot of them.
14:04 🔗 SketchCow But I do appreciate them.
14:04 🔗 winr4r mhm
14:29 🔗 SketchCow 2.2T mobileme-03
14:29 🔗 SketchCow 2.8G mobileme-05
14:29 🔗 SketchCow 413G mobileme-06
14:29 🔗 SketchCow 7.5G mobileme-04
14:29 🔗 SketchCow See just kind of lying around there
14:30 🔗 winr4r 2.2 terabytes sounds like a small figure then you see "413G" and then it's like "oh, that is actually a big number"
14:35 🔗 oli haha
14:35 🔗 oli are there any other projects apart from mobileme i can be helping with? i have bandwidth to spare
14:39 🔗 SketchCow Check the wiki?
14:40 🔗 SketchCow I don't actually know offhand which need bandwidth OTHER than mobileme
14:41 🔗 SketchCow I'm about to dump a pile of Polish shareware CDs onto the cdbbscollection.
14:42 🔗 oli yeah i looked, theres not really anything else to do from what i can see :(
14:43 🔗 Deewiant Can't you simply use more bandwidth on mobileme, or is mobileme at its limit or something?
14:44 🔗 SketchCow Mobileme is a cancer eating all our attention - after I finish with batcave's decommission I will start regarding other things we can do.
14:45 🔗 oli i cant seem to get more than about 100mbit out of mobileme from my box at softlayer even though its on a gige connection
14:45 🔗 oli and im running a lot of threads, running more just bogs the system down and doesnt get anything downloading faster
14:50 🔗 SketchCow Someone has sent me 4gb (or thereabouts) of mid 1990s Spanish demoscene stuff.
14:52 🔗 undersco2 SketchCow: that message from james is really cool
14:53 🔗 SketchCow Yeah, and he's still a young nubile 17 year old and not some busted old mare like you
14:53 🔗 * SketchCow turns undersco2
14:53 🔗 SketchCow Come back when you've earned three fiddy
14:53 🔗 oli i cant resolve textfiels.com :/
14:53 🔗 oli textfiles.com rather
14:53 🔗 SketchCow Record expires on 07-Oct-2021.
14:53 🔗 undersco2 SketchCow: <3
14:53 🔗 undersco2 hahaha
14:53 🔗 SketchCow It ain't that!
14:53 🔗 SketchCow 2021!
14:53 🔗 winr4r same here
14:53 🔗 SketchCow Bitches!
14:54 🔗 undersco2 ;; ANSWER SECTION:
14:54 🔗 undersco2 textfiles.com. 3600 IN A 208.86.224.90
14:54 🔗 undersco2 fine here
14:54 🔗 SketchCow Well, I'm ON textfiles.com, so it's not the machine.
14:54 🔗 oli cant get it from my box in australia or here in budapest
14:54 🔗 SketchCow Likely, someone is assfucking the apache.
14:54 🔗 SketchCow One moment.
14:54 🔗 undersco2
14:54 🔗 undersco2 The connection was reset
14:54 🔗 undersco2
14:54 🔗 undersco2
14:54 🔗 undersco2 The connection to the server was reset while the page was loading.
14:54 🔗 oli yep same as undersco2
14:54 🔗 undersco2 yep
14:54 🔗 undersco2 oh fuck
14:55 🔗 undersco2 that's a lot of returns
14:55 🔗 undersco2 sorry
14:55 🔗 SketchCow Ah, here we are.
14:55 🔗 SketchCow Someone has 480 simultaneous connections to the machine.
14:55 🔗 SketchCow That might be a factor.
14:55 🔗 winr4r jesus
14:55 🔗 oli anyone know a way/system for a redundant multi node filesystem i can run between many computers?
14:55 🔗 oli w./ linux
14:55 🔗 SketchCow ha ha.
14:56 🔗 SketchCow Someone's about to meet my old friend mister soft firewall
14:56 🔗 undersco2 hahaha
14:56 🔗 undersco2 oli: ceph
14:56 🔗 winr4r can't you put in a rewrite rule for his IP so he downloads 480 goatses every time?
14:56 🔗 undersco2 ^
14:56 🔗 undersco2 hahahahaha
14:57 🔗 oli undersco2: thx will look into it
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42364 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42365 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42591 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33080 208.86.224.90.80 189.19.142.212.42506 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33080 208.86.224.90.80 189.19.142.212.42566 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33080 208.86.224.90.80 189.19.142.212.42328 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33079 208.86.224.90.80 189.19.142.212.42257 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42238 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42239 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42129 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33080 208.86.224.90.80 189.19.142.212.42126 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33077 208.86.224.90.80 189.19.142.212.42127 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33078 208.86.224.90.80 189.19.142.212.42128 LAST_ACK
14:59 🔗 SketchCow tcp4 0 33079 208.86.224.90.80 189.19.142.212.42080 LAST_ACK
14:59 🔗 SketchCow It's like that all the way down.
14:59 🔗 SketchCow Just blocked him AND turned off the website for a moment
14:59 🔗 SketchCow I blocked his subnet, because it feels good man
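The paste above is what one host holding hundreds of sockets open looks like. A minimal sketch of tallying connections per remote address from that kind of output, assuming BSD-style `netstat -an` lines where address and port are joined by a dot; the function name `count_remote_hosts` is made up for illustration and is not what was actually run on the machine:

```python
from collections import Counter

def count_remote_hosts(netstat_text):
    """Count connections per remote IP in BSD-style `netstat -an` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local-addr foreign-addr state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # BSD netstat writes address and port as 189.19.142.212.42364,
        # so everything before the last dot is the IP.
        remote_ip, _, _port = fields[4].rpartition(".")
        counts[remote_ip] += 1
    return counts

# Three lines copied from the paste above.
sample = """\
tcp4 0 33078 208.86.224.90.80 189.19.142.212.42364 LAST_ACK
tcp4 0 33078 208.86.224.90.80 189.19.142.212.42365 LAST_ACK
tcp4 0 33080 208.86.224.90.80 189.19.142.212.42506 LAST_ACK
"""

for ip, n in count_remote_hosts(sample).most_common(5):
    print(ip, n)
```

Sorting by count puts the heaviest client first, which is usually enough to decide what to firewall.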
15:00 🔗 winr4r haha
15:01 🔗 SketchCow It's getting there.
15:01 🔗 SketchCow Another 3-4 minutes, it'll be down to normal, then I'll restart.
15:01 🔗 SketchCow I love these toolbags.
15:01 🔗 oli thanks
15:02 🔗 SketchCow WOAH SHIT THIS WEBSITE IS MIRRORED IN 15 LOCATIONS AND HAS BEEN ON THE NET FOR 14 YEARS I BETTER OPEN THREE BILLION CONNECTIONS AND SUCK IT DOWN NOW
15:02 🔗 SketchCow AAAAAHHHH COULD GO ANY MORE
15:02 🔗 SketchCow ANY SECOND NOW IT MIGHT DIE
15:02 🔗 SketchCow AIIREEEEEEE I ATE SUGER FROSTED SUGAR THIS MORNING WHILE DRINKING QUIK
15:03 🔗 winr4r ^ this is the exact same conversation being had in fortunecity's secret IRC channel
15:03 🔗 SketchCow AAAARRRGGGGIGIGIGIGIGIGG
15:04 🔗 winr4r haha.
15:05 🔗 SketchCow Well, they're still at 200 connections, but bringing textfiles.com back.
15:06 🔗 BlueMax whoops, was that me?
15:07 🔗 SketchCow Hooray, my methamphetamine textfile is up
15:07 🔗 SketchCow http://www.textfiles.com/drugs/himet1.txt
15:07 🔗 SketchCow INTERNET SAVED
15:07 🔗 BlueMax lol
15:07 🔗 BlueMax I can't help but wonder how many textfiles we missed.
15:10 🔗 SketchCow In terms of what.
15:10 🔗 SketchCow When you say we missed, do you mean me?
15:10 🔗 mistym_ Erk, looks like scraping errors in my digiplay.info data. At least I have the html cached now.
15:10 🔗 SketchCow Because as far as I can tell, believe it or not, I got most of them, ultimately.
15:10 🔗 BlueMax No, I miss you whenever I sleep.
15:11 🔗 BlueMax Most of them?
15:11 🔗 SketchCow Well, nearly all that were passed from BBS to BBS.
15:11 🔗 BlueMax I wonder if the BBSes that are up today still have any we don't.
15:11 🔗 BlueMax Or you don't.
15:12 🔗 BlueMax However you want to say it.
15:12 🔗 Wyatt So if someone is willfully trying to immortalise their community/content on IA, how best to go about that?
15:12 🔗 winr4r hiya Wyatt :)
15:13 🔗 winr4r Wyatt: going strong btw, over 2000 now!
15:14 🔗 Wyatt Feel free to hammer on it all week if you want. Start a couple in parallel, even.
15:14 🔗 winr4r :D
15:15 🔗 SketchCow SCREENSHOT ALL THE THINGS
15:15 🔗 SketchCow Ha ha, this isn't spanish demo scene.
15:15 🔗 SketchCow This is Spanish ATARI demo scene
15:15 🔗 Wyatt Oh hawt
15:16 🔗 BlueMax mmm, tilt that joystick
15:22 🔗 SketchCow http://archive.org/details/spanish-demoscene-collection
15:23 🔗 emijrp Finally some Spanish content.
15:24 🔗 SketchCow Por último, todo el mundo puede ser un idiota!
15:25 🔗 BlueMax Everyone can be an idiot!
15:25 🔗 emijrp ú
15:25 🔗 emijrp lólólólól
15:25 🔗 emijrp TACO.
15:31 🔗 emijrp A weird thing about IA items is that they don't show who the uploader is, so you can't search for similar stuff using the uploader's contributions list.
15:33 🔗 SketchCow Agreed
15:35 🔗 SketchCow root@teamarchive-0:/2/FRIENDSTER# du -sh .
15:35 🔗 SketchCow 1.7T .
15:36 🔗 winr4r :D
15:40 🔗 emijrp 15 years later
15:40 🔗 emijrp 17:35:42 <SketchCow> root@teamarchive-0:/2/FACEBOOK# du -sh .
15:40 🔗 emijrp 17:35:42 <SketchCow> 1.7P .
15:40 🔗 winr4r yes
15:40 🔗 Wyatt Only 1.7?
15:40 🔗 BlueMax Onl- DAMNIT
15:40 🔗 nitro2k01 Haha
15:41 🔗 winr4r https://www.google.co.uk/search?hl=en&site=webhp&q=define+gigabyte
15:41 🔗 winr4r also
15:41 🔗 Wyatt Facebook has got to be past 100PB by now, right?
15:41 🔗 nitro2k01 All of this is nothing compared to when YouTube goes down
15:41 🔗 winr4r will someone click on the audio icon there and tell me that google is just trolling us
15:41 🔗 BlueMax YouTube Collection
15:42 🔗 winr4r JYGABYTE
15:42 🔗 winr4r nitro2k01: let's not even think about that :<
15:44 🔗 nitro2k01 I wonder if there's a real risk that Flickr goes down. I think not, since it seems to be one of the Yahoo services that are actually profitable.
15:44 🔗 nitro2k01 Or, I would imagine so
15:44 🔗 Wyatt Didn't they lay off most of the people working on it?
15:45 🔗 nitro2k01 I don't know, just speaking of the data retention here
15:45 🔗 nitro2k01 Seems unlikely they would just kill it like Geocities
15:45 🔗 nitro2k01 Since quite a few people actually have those pro badges that means they pay up every year
15:48 🔗 Wyatt I'd say Delicious is definitely likely to be amputated first among their remaining high-profile sites.
15:49 🔗 winr4r Wyatt: it already was
15:49 🔗 winr4r delicious is not owned by yahoo anymore
15:49 🔗 Wyatt ...what, someone actually bought it?!
15:49 🔗 nitro2k01 And myspace... Just think of all the flashy designs that were trashed overnight
15:50 🔗 emijrp http://longbets.org
15:50 🔗 winr4r Wyatt: yup, i believe it was the founder of youtube
15:50 🔗 Wyatt I thought that was a joke. I...am rather surprised.
15:50 🔗 Wyatt Somehow I missed the reality of it.
15:50 🔗 SketchCow You did.
15:50 🔗 SketchCow Hard.
15:51 🔗 SketchCow And it was two of the founders of youtube
15:51 🔗 SketchCow In April.
15:51 🔗 SketchCow Of 2011.
15:51 🔗 winr4r yeah, just looked that up and found that
15:51 🔗 winr4r delicious was actually one of the best services that i have seen
15:52 🔗 winr4r all together now: "fuck yahoo!"
15:53 🔗 emijrp fuck the internet
15:53 🔗 nitro2k01 Fuck everything
15:53 🔗 * nitro2k01 omniphile
15:53 🔗 SmileyG http://i.imgur.com/PHzN9.jpg
15:54 🔗 SmileyG FUCK IT WITHOUT DEPS!
15:54 🔗 nitro2k01 Heh
15:54 🔗 winr4r haha
15:54 🔗 Wyatt Right, now I have to google for what Yahoo even owns anymore.
16:01 🔗 SketchCow shhh, we're fucking
16:02 🔗 undersco2 lol
16:03 🔗 undersco2 SketchCow: textfiles still down?
16:03 🔗 undersco2 curl textfiles.com
16:03 🔗 undersco2 curl: (7) couldn't connect to host
16:03 🔗 Wyatt If it helps I just described AT as bearers of a "titanic black strap-on archival dildocannon"...
16:04 🔗 ersi nitro2k01: so, what says they won't delete all of the free users content? :P
16:04 🔗 undersco2 ...bahaha
16:04 🔗 ersi nitro2k01: I mean, don't give Yahoooooo too much credit
16:05 🔗 nitro2k01 They're stupid if they shut down a service that brings in the green stuff (this includes even free accounts)
16:05 🔗 nitro2k01 Hey something is actually making money in our empire. LET'S SHUT IT DOWN! MOAHAHAHAHA!
16:05 🔗 ersi They've demonstrated exactly how they work over and over again
16:05 🔗 ersi Just because they get income on a project, doesn't mean they'll let it be
16:05 🔗 ersi We do have #flickrfckr you know, just not grabbing everything continuously ;p
16:06 🔗 nitro2k01 Worst case scenario, it'll branch off, or Yahoo will go bankrupt and be split
16:06 🔗 ersi You're now on the lulz list
16:06 🔗 nitro2k01 Come back in 5 years when you've discovered I was right :p
16:07 🔗 SketchCow textfiles.com is down.
16:07 🔗 SketchCow I put the firewall block in the wrong place
16:07 🔗 SketchCow LOVE the nerds all backseat humping me on how I should run my website
16:07 🔗 SketchCow LOVE LOVE LOOOOOOOOOOOOOOVE IT
16:07 🔗 SketchCow Love that.
16:07 🔗 ersi put it in the cloud maaan
16:07 🔗 undersco2 I'm not trying to tell you how to do it
16:07 🔗 undersco2 I just wanted to read the meth textfile
16:07 🔗 Wyatt Put it on the moooon!
16:08 🔗 SketchCow Can I say that a lot? Can I say it in a way that just echoes in the back of your mind for days and days? Aspy nerds guffawing and saying what I should do, a and b and c and d, and why it's better and so on?
16:08 🔗 SketchCow Love it
16:08 🔗 SketchCow I want to fuck it and make 10 of it and fuck those and make 100 of it
16:08 🔗 nitro2k01 Sometimes aspie nerds are right
16:08 🔗 nitro2k01 SOMETIMES
16:08 🔗 ersi sometimes they're just loud fucking assholes though
16:08 🔗 SketchCow A broken clock is right twice a day and also doesn't flip out when you move its juice box
16:09 🔗 nitro2k01 And sometimes even both
16:09 🔗 nitro2k01 in the same time
16:09 🔗 undersco2 I think SketchCow just likes to fuck
16:09 🔗 undersco2 regardless of what sentiment it is
16:09 🔗 ersi SketchCow: :D
16:10 🔗 nitro2k01 SketchCow likes to fuck because he's a dick <3
16:11 🔗 ersi point is I don't give a fuck if you're right in five years or not
16:12 🔗 ersi because it doesn't matter
16:12 🔗 nitro2k01 Right. What matters is say something positive about Yahoo -> lulz list
16:12 🔗 nitro2k01 Or even just neutral common sense
16:13 🔗 nitro2k01 Yahoo must be bashed
16:13 🔗 nitro2k01 It's the rite of passage
16:14 🔗 ersi right, totally
16:14 🔗 ersi We'll leave it at that
16:15 🔗 nitro2k01 inb4 someone highlights me in two hours and goes like "Well you see the real point is..."
16:17 🔗 winr4r SketchCow: holy shit that was brilliant
16:19 🔗 SketchCow Me: Blue hair, silver tube top, fishnets, Knee high black biker boots.
16:19 🔗 SketchCow You: Red mohawk, black pentagram gauges, viper piercings.
16:19 🔗 SketchCow I was grinding on you in the pit, then we went to the bathroom, and got f***ed up. You had a nice c**k and I was wasted so I let [you] raw dog it in the stall. You were really good and you had to gag me so I would make too much noise.
16:19 🔗 SketchCow Anyway I'm pregnant. It's yours. contact me if you want to be part of your child's life.
16:20 🔗 SketchCow What's brilliant.
16:20 🔗 nitro2k01 I came and farted.
16:24 🔗 undersco2 SketchCow: hot
16:29 🔗 SketchCow Oh here we go
16:29 🔗 SketchCow Gigabytes of polish cd-roms
16:30 🔗 SketchCow First one in!
16:30 🔗 SketchCow http://archive.org/details/chip-cds will get them as they go
16:31 🔗 BlueMax Good luck with that :P
16:31 🔗 SketchCow http://archive.org/details/chip-cds-1997-0 added
16:31 🔗 undersco2 yay
17:04 🔗 SketchCow http://archive.org/details/chip-cds
17:04 🔗 SketchCow awwww yeah
17:05 🔗 emijrp language attribute is wrong
17:06 🔗 SketchCow Yes
17:06 🔗 SketchCow That's the ingestor.
17:06 🔗 SketchCow After it's done, I'll fix them like THAT
17:15 🔗 * SmileyG is so confused
17:15 🔗 SmileyG so you run a website SketchCow?
17:16 🔗 SmileyG or were you quoting someone? :D
17:19 🔗 SketchCow I run a website
17:23 🔗 SketchCow Almost done with the CDs!
17:23 🔗 SmileyG aye
17:23 🔗 SmileyG Wyatt is updating me elsewhere ;)
17:23 🔗 SketchCow More and more things I can put to bed on batcave.
17:23 🔗 SmileyG Nice video of you at Defcon.
17:26 🔗 SketchCow Whoops, forgot 1998, fixing.
17:28 🔗 SmileyG Have to say, you're very good at presenting
17:35 🔗 Nemo_bis For a moment I thought SketchCow had already ripped all my discs.
17:35 🔗 emijrp This is sad, but I don't understand most of English spoken talks.
17:35 🔗 emijrp That includes SketchCow presentations.
17:36 🔗 emijrp Language is a fucking barrier.
17:40 🔗 emijrp https://www.universalsubtitles.org/es/videos/NE0VZdfk5yzP/info/archive-team-a-distributed-preservation-of-service-attack/
17:41 🔗 emijrp #subtitlesteam spread the word about backups writing subtitles in any language
17:48 🔗 emijrp who can help?
17:49 🔗 SmileyG WTF IS UP WITH THAT
17:50 🔗 SmileyG Bulgey McFishhat?
17:50 🔗 chronomex <3
17:51 🔗 SmileyG emijrp: have you tried googles auto translate+cc stuff?
17:51 🔗 emijrp SmileyG: sucks
17:51 🔗 SmileyG Ah :S
17:55 🔗 SmileyG Hnnnm
17:55 🔗 SmileyG you guys got room to grab megaupload? :/
17:56 🔗 Wyatt Can't be "got" in its current state.
17:56 🔗 Wyatt (Last I heard, at least)
17:56 🔗 Wyatt EFF is fighting that fight.
18:00 🔗 SmileyG yeah
18:01 🔗 SmileyG Just, if someone turned up and went "Ok, you don't wanna pay for it, we can store it. Hand it over"..... once the legal issues are done, I think the hosting company would jump at the chance by the sound of things...
18:01 🔗 SmileyG (funny how they were happy to take the money up until then ;))
18:02 🔗 SmileyG heh
18:02 🔗 SmileyG one day, the archive will end up larger than the entire world's current info :/
18:06 🔗 SmileyG I think that'll be a proud moment
18:14 🔗 SmileyG hmm
18:14 🔗 SmileyG I think I have a new found respect, and a bit of a man crush on SketchCow :O
18:16 🔗 winr4r SmileyG: HANDS OFF HE'S MINE
18:16 🔗 SmileyG hehe
18:16 🔗 SmileyG i wish I had the..... well, money he must have :D
18:16 🔗 SmileyG I have like £4 spare a month :(
18:18 🔗 emijrp gayteam
18:18 🔗 SmileyG not much I can do with £4 heh :D
18:21 🔗 ersi £4/mo? That's not much
18:21 🔗 ersi but it's something
18:22 🔗 mistym Ergh. digiplay.info uses completely different html encoding for its different content types, looks like I'll have to special case a bunch of stuff manually. Oh well.
18:23 🔗 winr4r mistym: wonderful!
18:23 🔗 winr4r i love that so much <3
18:23 🔗 winr4r random content encoding <3
18:23 🔗 winr4r p.s. if i see another UnicodeDecodeError i will actually burn an orphanage
18:23 🔗 mistym e.g. journal articles (at least that I've seen so far) use a bunch of unambiguously named divs. Whereas proceedings articles use tables with unnamed elements.
18:24 🔗 mistym winr4r: Oh yeah, that's the other thing I love - random unicode fails.
18:24 🔗 winr4r mistym: oh sorry i thought you meant content encoding
18:24 🔗 mistym winr4r: I was ambiguous, my fault
18:24 🔗 mistym My absolute FAVOURITE case is Excel for Mac.
18:25 🔗 mistym It can export CSV that IT ITSELF cannot read because it uses some crazy text encoding.
18:25 🔗 winr4r haha!
18:25 🔗 winr4r that's beautiful
18:26 🔗 mistym I thought I was doing something wrong when I couldn't figure out what encoding to give it in Ruby's CSV.parse. But no, Excel itself couldn't open the data it produced. Brilliant.
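When the encoding of scraped or exported text is unknown, one common workaround is to try a short list of candidates in order. A minimal sketch in Python, since that is where `UnicodeDecodeError` comes from. The candidate list is an assumption: the log never says which encoding the Excel export actually used, `mac_roman` is only a plausible guess for old Mac software, and `decode_with_fallback` is a hypothetical helper name:

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "mac_roman")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
    # latin-1 maps every byte to a character, so this last resort never raises.
    return raw.decode("latin-1"), "latin-1"

text, used = decode_with_fallback(b"caf\xc3\xa9")
print(used, text)
```

The first encoding that decodes without error is not necessarily the right one, only one that does not crash, so the result still needs a human look when accented characters matter.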
18:27 🔗 chronomex not relevant: http://youtube.com/watch?v=7odAbL3Ygts
18:27 🔗 winr4r mistym: it's an achievement of sorts
18:27 🔗 SmileyG one way encoding \o/
18:40 🔗 shaqfu Well, that was encouraging
18:41 🔗 shaqfu "We haven't seen names named, but the literature mentions companies working to provide DRM-free software for long-term preservation"
18:41 🔗 winr4r i'll believe that when i see it
18:41 🔗 winr4r i.e. never
18:41 🔗 shaqfu winr4r: "encouraging" not "fucking awesome"
18:42 🔗 winr4r shaqfu: hi, btw :)
18:42 🔗 shaqfu winr4r: ohai o/
18:43 🔗 SmileyG http://www.flickr.com/photos/djsmiley2k/4548258767/
18:43 🔗 SmileyG :O
18:43 🔗 SmileyG my cat
18:43 🔗 SmileyG is like
18:43 🔗 SmileyG his cat
18:43 🔗 SmileyG :O
18:43 🔗 SmileyG http://www.flickr.com/photos/djsmiley2k/4488022986/
18:44 🔗 shaqfu You store your soap on the roof?!
18:44 🔗 mistym SmileyG: Aww, your cat's a cutey
18:44 🔗 SmileyG i got 4
18:44 🔗 SmileyG :O
18:44 🔗 SmileyG he talks
18:44 🔗 SmileyG :D
18:44 🔗 shaqfu Same litter?
18:44 🔗 SmileyG or at least tries to. He thanks you if you open the door for him.
18:44 🔗 SmileyG lol Is Jason from Norwich, UK?
18:45 🔗 ersi getting creepy
18:45 🔗 winr4r SmileyG: you're about six million miles out
18:45 🔗 winr4r SmileyG: are you from norwich?
18:46 🔗 SmileyG No, but the cat was :D
18:46 🔗 winr4r SmileyG: oh!
18:46 🔗 SmileyG It was originally my.... wife's brother's girlfriend's gran's
18:46 🔗 SmileyG she couldn't look after it, we lived with him and my wife's parents, and so he came with us
18:46 🔗 winr4r oh my god, sockington has figured out self-replication
18:46 🔗 SmileyG :D
18:46 🔗 emijrp You talked some hours ago about YouTube closing. That closing has been happening for ages. Using a sample of 6500 videos about SpanishRevolution, 3.59% of them were deleted (or their accounts closed) after 6 months. You can extrapolate to YouTube's age and the millions of videos uploaded.
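The extrapolation can be sketched in a few lines. This assumes, purely for illustration, that the 3.59% six-month loss measured on that one sample applies at a constant rate to every later six-month period; nothing in the log supports that beyond the single measurement, and `surviving_fraction` is a made-up name:

```python
def surviving_fraction(loss_per_period, periods):
    """Fraction still online after `periods` periods at a constant loss rate."""
    return (1.0 - loss_per_period) ** periods

loss = 0.0359  # 3.59% gone after 6 months, the figure from the sample above

for years in (1, 5, 10):
    frac = surviving_fraction(loss, 2 * years)  # two 6-month periods per year
    print(f"{years:>2} years: {1 - frac:.1%} of videos gone")
```

Real deletion rates probably vary by topic and by video age, so treat the output as a rough order of magnitude.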
18:46 🔗 * winr4r just saw the photo
18:47 🔗 SmileyG winr4r: It's freaky ain't it?
18:47 🔗 SmileyG their face is slightly different
18:47 🔗 winr4r SmileyG: it really is!
18:47 🔗 SmileyG and apollo has smaller eyes
18:47 🔗 SmileyG But the markings, wow.
18:47 🔗 SmileyG Sorry, larger eyes, smaller iris
18:47 🔗 winr4r SmileyG: i wondered about norwich, i'm from king's lynn
18:47 🔗 SmileyG winr4r: :D
18:47 🔗 winr4r norfolk best county in world
18:47 🔗 SmileyG I have an old school friend who moved to king's lynn
18:48 🔗 winr4r SmileyG: on purpose?!
18:48 🔗 SmileyG his family moved when I was..... 13?
18:48 🔗 SmileyG moved to hunstantington?
18:48 🔗 winr4r ah
18:48 🔗 SmileyG (I've spelt that wrong).
18:48 🔗 winr4r hunstanton
18:48 🔗 SmileyG Yah.
18:48 🔗 shaqfu mistym: In other news, Archivematica is really damn cool
18:48 🔗 SmileyG Wild cats there... heh
18:48 🔗 winr4r hunstanton is nice, king's lynn is a massive shithole
18:49 🔗 mistym shaqfu: Isn't it? Those guys are awesome.
18:49 🔗 mistym shaqfu: They have an IRC channel over on Freenode, though it's not usually too busy.
18:49 🔗 mistym Wait, no. Not Freenode, it was some other server.
18:49 🔗 shaqfu mistym: I hadn't heard of it before this weekend, but it came up at nearly every talk this weekend
18:49 🔗 winr4r SmileyG: fortunately i'm about 6 miles south of it
18:49 🔗 SmileyG who will log all of irc :S
18:50 🔗 mistym SmileyG: Who will bug every public space ;o
18:50 🔗 shaqfu Anyway, AFK, lunch
18:50 🔗 SmileyG winr4r: ah
18:50 🔗 SmileyG I'm
18:50 🔗 SmileyG I'm in coventry...
18:50 🔗 SmileyG don't suppose you also heard the sonic booms?
18:50 🔗 winr4r SmileyG: things will get better
18:50 🔗 winr4r and nope!
18:51 🔗 SmileyG I really quite like cov :d
18:51 🔗 SmileyG D:
18:52 🔗 * winr4r adores apollo
18:52 🔗 mistym Sigh. Twitter is making me jealous. Not only are there tons of #marac tweets, but now Capy games are showing off the ridiculous 25-foot screen installation of Super TIME Force in LA.
18:53 🔗 SmileyG winr4r: hehe
18:53 🔗 SmileyG there's some pics on there of my other cats too
18:54 🔗 SmileyG I don't think i've actually done a "cats" set tho ¬_¬ failure by me there
18:54 🔗 SmileyG anyway, dad's birthday meal tonight :/
18:54 🔗 SmileyG laters
18:54 🔗 winr4r bye!
19:05 🔗 shaqfu mistym: I'll refrain from posting beach pics, then
19:06 🔗 mistym ;o
19:06 🔗 shaqfu Nothing like capping a conference with a beach trip
19:08 🔗 winr4r OH GOD
19:08 🔗 winr4r IS JASON GOING TO GO CLEAN SHAVED AGAIN
19:08 🔗 winr4r https://twitter.com/#!/textfiles/status/191239290158710785/photo/1
19:09 🔗 winr4r *suspense*
19:09 🔗 winr4r (talking of twitter)
19:13 🔗 shaqfu It feels surreal seeing him hatless
19:14 🔗 winr4r haha
19:16 🔗 chronomex that will pass
19:17 🔗 emijrp the hat is inside his hair, you will see it after the cut
19:17 🔗 winr4r emijrp: hahaha
19:17 🔗 shaqfu Rofl
19:18 🔗 chronomex yes, I think jason will be bald in 1 hour
19:18 🔗 winr4r no way
19:19 🔗 winr4r i'm going with clean-shaven
19:19 🔗 winr4r no beard
19:19 🔗 chronomex no hair
19:19 🔗 winr4r it was important enough to announce on twitter, so i'm guessing it's the beard
19:26 🔗 winr4r oh, i was wrong!
19:27 🔗 shaqfu Phew; balance of nature not disturbed
19:29 🔗 chronomex I was right
20:10 🔗 SketchCow Oh good
20:10 🔗 SketchCow hairblogging
20:15 🔗 SketchCow Looking good
20:23 🔗 winr4r SketchCow: yes you do! :D
20:24 🔗 DFJustin who wants some cp/m http://archive.org/download/cdrom-rlee-peters-cpm-archive/rlee_peters_cpm_archive.zip/
20:27 🔗 SketchCow Whoops, deleted two cds by mistake
20:28 🔗 SketchCow Shiiiiiit happens
20:28 🔗 SketchCow Ironically shoving it into archive.org
20:28 🔗 SketchCow and I killed it
20:28 🔗 SketchCow I make mistakes too!
20:28 🔗 chronomex wat no
20:28 🔗 SketchCow Pretty commercial CDs, no worries, they'll show again.
20:30 🔗 SketchCow OK, all those CDs have Polish as the language now
20:30 🔗 winr4r excellent!
20:57 🔗 SketchCow I haven't done bald recently
21:03 🔗 winr4r you haven't done no-beard in a while either
21:03 🔗 winr4r (you shouldn't, you're a whole lot less scary and Jason Scott without one)
21:05 🔗 SketchCow ha ha
21:05 🔗 SketchCow thanks for the fashion advice
21:06 🔗 emijrp be careful, here are more gays than archivists
21:08 🔗 winr4r i'm not gay!
21:09 🔗 winr4r the whole public bathroom thing was a misunderstanding
21:11 🔗 SketchCow Whoops, fucked up AGAIN
21:11 🔗 SketchCow Where's my hug
21:11 🔗 * winr4r hugs SketchCow!
21:24 🔗 alard winr4r: There were about ten FortuneCity sites on your list that we didn't have, but I have now downloaded those too.
21:25 🔗 winr4r alard: yay!
21:28 🔗 winr4r http://members.fortunecity.com/aaronsmom/
21:29 🔗 winr4r :/
21:29 🔗 winr4r found that while flicking through screenshots earlier
21:32 🔗 emijrp nice
21:34 🔗 winr4r it's a tribute for someone, by people who loved them, done as well as they could in the late 90s
21:37 🔗 emijrp curious, first image fail http://web.archive.org/web/20090203061353/http://members.fortunecity.com/aaronsmom/
21:38 🔗 DFJustin for a while fortunecity was doing referer blocking such that the wayback machine got their placeholder image for everything
21:38 🔗 winr4r DFJustin: ah
21:39 🔗 SketchCow aaronsmom has got it going on
21:39 🔗 winr4r well among other things, that's where i hope the screenshot collection will be useful
21:40 🔗 emijrp this guy http://awt.ancestry.com/cgi-bin/igm.cgi?op=GET&db=lockard-park&id=I52598&ti=5541
21:41 🔗 winr4r the downside: i had to disable javascript in the script i'm using, because their ads had a hilarious "slide up out of nowhere and cover up all the content" thing going on
21:41 🔗 SketchCow Well fuck, I did it FUCKING AGAIN
21:41 🔗 winr4r emijrp: yes
21:41 🔗 SketchCow Well, of 80 CD-ROMs, I murdered 4 in their beds
21:41 🔗 winr4r SketchCow: hey remember that thing you did three times?
21:41 🔗 SketchCow Made a few choices with the scripting I shouldn't have.
21:41 🔗 winr4r i think it'd be a good idea to not do that
21:42 🔗 winr4r seriously though, what happened?
21:42 🔗 SketchCow Pressing control-c during a zip-up makes it go "OK, stop running the zip, but keep running the script that calls it."
21:42 🔗 SketchCow booooo
21:43 🔗 winr4r oh shit :<
21:43 🔗 SketchCow Again, I'm not too worried
21:43 🔗 winr4r and "keep running" means "rm -rf"?
21:43 🔗 SketchCow I can get these
21:43 🔗 SketchCow Well keep running means rm that thing being zipped, yes
21:43 🔗 SketchCow Normally I don't do that, got lazy, made mistake.
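The failure described here is a delete step that runs whether or not the archive step finished. A minimal sketch of the safer ordering, assuming the job is "zip a directory, then remove it". It uses Python's `zipfile` rather than whatever script and zip tool were actually involved, assumes `zip_path` lies outside `src_dir`, and `archive_then_delete` is a hypothetical name:

```python
import shutil
import zipfile
from pathlib import Path

def archive_then_delete(src_dir, zip_path):
    """Zip src_dir, verify the archive, and only then remove src_dir."""
    src_dir, zip_path = Path(src_dir), Path(zip_path)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    tmp_path = zip_path.with_name(zip_path.name + ".partial")

    # Write under a temporary name so an interrupted run never leaves
    # something that looks like a finished archive.
    with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, p.relative_to(src_dir).as_posix())

    # Re-open the archive and check CRCs and the member list.
    with zipfile.ZipFile(tmp_path) as zf:
        if zf.testzip() is not None:
            raise RuntimeError("archive failed CRC check; source kept")
        expected = {p.relative_to(src_dir).as_posix() for p in files}
        if set(zf.namelist()) != expected:
            raise RuntimeError("archive is missing files; source kept")

    tmp_path.replace(zip_path)
    shutil.rmtree(src_dir)  # reached only if every step above succeeded
```

A Ctrl-C during the zip raises `KeyboardInterrupt`, which propagates out of the function before `rmtree` is reached, so the source directory survives. In a shell script the equivalent is checking the zip exit status before running `rm`.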
21:43 🔗 winr4r ah
21:43 🔗 winr4r so you didn't actually lose anything
21:43 🔗 SketchCow Anyway, I'll just tell that guy we need to re-upload.
21:43 🔗 SketchCow No, I definitely lost stuff that was at arm's reach
21:43 🔗 SketchCow Dude must re-send
21:44 🔗 winr4r bummer
21:44 🔗 SketchCow It's OK, we have a billion of these things going
21:46 🔗 winr4r somewhere, in my loft or piled under other piles of shit i have a couple of magazine cover CDs from the late 1990s
21:46 🔗 SketchCow I just moved the shareware cd collection to the title bar of archive.org's software section.
21:46 🔗 SketchCow It was time to do it.
21:46 🔗 SketchCow Oh my god, I have so many cds, I am considering having someone come over or who is local to me to do it.
21:46 🔗 winr4r from the late 1990s where it's like "hey you don't have the INTERNET but here is SOME OF IT"
21:46 🔗 winr4r i need to get those to you some time
21:47 🔗 winr4r SketchCow: i can imagine, i have a *few*, at most like 5, but i think it was an interesting time
21:47 🔗 emijrp man, read this http://www.chron.com/CDA/archives/archive.mpl/1998_3052078/man-jailed-after-friend-shot.html
21:48 🔗 winr4r emijrp: moral of the story: don't be friends with stupid people
21:50 🔗 winr4r emijrp: WAIT HOLD ON AARON
21:50 🔗 mistym Also: if a friend says "Hey, watch this!" and pulls out a gun, don't stick around.
21:50 🔗 SketchCow Wow, one of the CD-ROMs has been downloaded 1,1442
21:50 🔗 SketchCow 1,442
21:51 🔗 winr4r holy shit
21:53 🔗 winr4r how do archive.org do backups anyway?
21:53 🔗 winr4r i mean that is just an unbelievably huge amount of shit
21:55 🔗 chronomex they duplicate off-site
21:59 🔗 Coderjoe winr4r: they have the data on two nodes locally, and try to duplicate it off-site (like in alexandria)
22:00 🔗 emijrp And the question is, have they lost data?
22:00 🔗 Coderjoe i have to wonder if they regularly scrub items
22:01 🔗 Coderjoe (verify hashes against those in the files.xml file)
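A scrub of that kind could look roughly like the following. This is a sketch, not how archive.org does it: it assumes the item's files.xml lists each file as a `<file name="...">` element with an `<md5>` child, which should be checked against a real item first, and `scrub_item` is a hypothetical name:

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

def scrub_item(item_dir, files_xml):
    """Return names that are missing or whose MD5 differs from files_xml."""
    item_dir = Path(item_dir)
    bad = []
    for entry in ET.parse(files_xml).getroot().iter("file"):
        name, expected = entry.get("name"), entry.findtext("md5")
        if not name or not expected:
            continue  # no recorded hash to compare against
        path = item_dir / name
        if not path.is_file():
            bad.append(name)
            continue
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            # Hash in 1 MiB chunks so large files are not read into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.strip().lower():
            bad.append(name)
    return bad
```

An empty return value means every file with a recorded hash matched. Anything in the list would need re-fetching from the other copy.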
22:02 🔗 emijrp TOP SECRET.
22:02 🔗 SketchCow They do things.
22:03 🔗 Coderjoe oh boy. some of these items are crap like full anime episodes
22:06 🔗 BlueMax I just imagine a giant-arse RAID array
22:06 🔗 BlueMax dunno why
22:06 🔗 Coderjoe ugh
22:06 🔗 Coderjoe no
22:09 🔗 BlueMax has this been shared here yet? http://www.masswerk.at/googleBBS/
22:11 🔗 winr4r "ERROR: Quota Exceeded. Please see http://code.google.com/apis/websearch" :<
22:11 🔗 emijrp Typical BBS error.
22:12 🔗 BlueMax lol
22:21 🔗 SketchCow Poor google
22:21 🔗 SketchCow Getting DDOS
22:21 🔗 SketchCow Coderjoe: You realize a lot of this is likely to go dark.
22:25 🔗 SketchCow OK, CDs done
22:25 🔗 SketchCow Going out to see the Comic-Con Documentary... with Morgan Spurlock presenting! And Q&A.
22:25 🔗 SketchCow http://archive.org/search.php?query=collection%3Achip-cds&sort=-publicdate
22:26 🔗 Coderjoe SketchCow: yes. unfortunately
23:30 🔗 Wyatt|Wor This is just morbid curiosity at this point, but that grep is still going.
