[01:14] http://netpreserve.org/projects/warc-tools-project - libwarc never got developed
[01:15] Sturgeon's Law is in full effect
[02:56] so i just found 2 files with the same names
[02:56] good thing i'm keeping the file tree
[04:26] So wget has had 3 different html parsers
[04:28] looking at the one they are using now I wonder if the programmers ever heard of this concept of reusable code. We build these things called libraries
[04:28] and the CSS parser is not a parser, it is a fucking guesser
[04:30] omf_: I looked at wget some time ago
[04:30] what I saw did not please me
[04:31] lftp is cleaner, though still pretty messy
[04:31] might just be better off writing a clean alternative in python
[04:31] There is some good C code in there but the technical debt is through the roof
[04:31] For example the warc support that was added in 1.14 has zero tests in the test suite
[04:32] and looking at the string of bugs they already fixed in it via the changelog is just fucked
[04:33] many tools that have history this long are like that
[04:33] balrog, what is the killer feature you use lftp for?
[04:33] foreign FTP charsets
[04:33] there's no other CLI tool I've found that reliably supports that.
[04:33] curl
[04:34] also the mirror command is handy, and supports both multifile and multipart
[04:34] curl will recursively download a cp1251 ftp server?
[04:34] pretty sure I tried and failed
[04:35] gotta say, lftp works reliably
[04:36] omf_: have you ever had to download a utf16 http site?
[04:36] not in years
[04:36] 2004/5 ish
[04:36] wget can't handle it
[04:36] lftp can't handle it
[04:36] of course
[04:36] I don't think curl can handle it
[04:37] philpem wrote some custom python crawlers to do it :p
[04:37] https://bitbucket.org/philpem/grabbers
[04:38] seriously why does wget's codebase have to be so shitty -.0
[04:38] -.- *
[04:39] because there have been too many people who worked on it with no coding standards or unified direction about the application
[04:40] *sigh*
[04:40] why does that feel like a thing with GNU projects?
[04:40] it's the culture
[04:40] hacks are cool
[04:40] etc
[04:40] yes it is a serious culture problem
[04:40] if you consider it a problem :P
[04:41] not being able to update the code without fixing bugs to make it happen is junk
[04:41] yes, that's a problem
[04:41] incomplete test suite = your software is shit
[04:41] a complete test suite won't prevent code duplication
[04:41] gnu focused on little cli apps not libraries as such
[04:42] glibc has only recently gotten better since they changed governance
[04:42] and kicked that mega-asshole out
[04:42] yeah and eglibc pushed them
[04:42] no it didn't
[04:42] ask them
[04:42] since debian and friends got tired of him
[04:42] debian never went forward
[04:42] pretty sure debian uses eglibc
[04:43] nope
[04:43] read the follow up blogs they made years later
[04:43] the big push was uclibc
[04:43] that and the arm guys
[04:43] uclibc is different
[04:43] they created forks, put on pressure and got code upstream
[04:44] they are all sister projects now
[04:44] yeah
[04:44] before they were not
[04:45] gcc has the same design problem. It was not modular at all, hence LLVM getting support since it started modular and design ideas
[04:45] yeo
[04:45] yep*
[04:46] but that was in part RMS being afraid that modular would cause proprietary plugins, which I think is totally BS
[04:46] the gcc guys know this is a problem and talk about it every now and again on the mailing list looking for a way to make it happen
[04:46] people tried it though
[04:46] and what can you do about proprietary plugins with clang? nothing
[04:46] since clang allows that by design
[04:46] but the world is different now
[04:47] Companies see the value of open source
[04:47] rewind 15 years and shit
[04:47] I remember when libxml2 came out and it killed all the closed source versions except MSXML
[04:48] People want to share libraries and collaborate now which is best for everyone
[04:48] balrog, you a programmer or a system admin?
[04:48] both :p
[04:48] I'm a comp-sci student
[04:48] graduating in may
[04:48] and I do sysadmin stuff for myself and on the side
[04:49] why are you asking? just curious? :)
[04:50] I was asking earlier to find out where the technical field is for the archiveteam. I am a programming and system admin.
[04:50] ahh
[04:50] programming since 1986 and admin since 1997
[04:50] I see
[04:50] I mess with old computers as a hobby
[04:50] I'm somewhat involved with MAME/MESS
[04:51] The original or the JSMESS version?
[04:51] that's another project where ... well the code isn't as bad as much as it's impenetrable due to lack of good documentation of how the core stuff works
[04:51] jsmess isn't a "version" per se
[04:51] jsmess uses Emscripten, an LLVM to JS backend, to compile the mainline version into JS
[04:51] pretty amazing, eh?
[04:52] it's stripped down (one system per binary, rather than all) for size reasons, and some other changes are made, but it's pretty much the same code
[04:52] Actually I think emscripten is a terrible idea. Lets build a cross compiler to a programming language that is really not that good
[04:53] You ever watch 'Code Rush'?
[04:53] JS is not that good but it's ubiquitous
[04:53] When they made javascript, they talk about how bad it is and the hopes it could be fixed in the future
[04:53] if something else becomes ubiquitous, a cross compiler may be written for that
[04:53] but good luck with it
[04:54] that is the quandry JS is in
[04:54] Google built dart to get around it
[04:54] coffeescript
[04:54] that shit MS made
[04:54] dart still compiles to js
[04:54] so does the MS thing
[04:54] they all do
[04:55] but to get support in all the browsers is impossible mainly because of IE
[04:55] Everything else is open source
[04:55] yeah
[04:55] now that Opera is switching to Webkit
[04:55] it's still mostly all Webkit
[04:56] you have three engines, Webkit, Gecko, and Trident (IE)
[04:56] and that's it
[04:56] the syntax for closures in JS is so clunky
[04:58] There was nothing ready for the browser when they had to make JS
[04:58] you're sure debian didn't switch to eglibc? http://packages.debian.org/search?keywords=libc&searchon=names&suite=stable&section=all shows "Embedded GNU C Library"
[04:58] for squeeze
[04:59] blah
[04:59] I have to take off
[04:59] later
[05:03] debian debian debian, they finally made it work
[05:05] I remember the big problem was flash didn't work with eglibc but eglibc was updated to fix that
[05:13] wget should really use libxml2, then again modern needs are beyond things like wget, httrack, curl
[05:26] It takes 1/100th of a second to parse a 1mb html file
[06:44] and the parser is thread safe, we truly live in the future
[09:10] the archive team logo
[09:10] the sword one, whats that based off?
[09:11] Adventure time
[09:11] lol ok
[09:11] and they got it from dungeon siege?
[09:41] i found a lost linus interview
[10:24] cool
[10:34] http://hackaday.com/2011/08/21/this-glados-potato-is-a-lie/
[10:34] I hope that highlighted GLaDOS :D
[10:34] It did
[10:36] So I had the opportunity to rant about Yahoo to one of my teachers today, and I sure did.
[10:37] Also showed the messages tracker.
[10:37] Nice.
[10:37] Any students interested?
[10:37] Nah
[10:37] My class is filled with technological dimwits :c
[10:37] what do you study?
[10:37] I'm just in Year 10! (We don't get to choose course)
[10:37] D:
[10:38] Jeez bud
[10:38] I thought you were far older, congratz
[10:38] Next year, I swear..
[10:38] Heh, everyone does.
[10:39] Although, the IT teacher here is rather fascinated by the Posterous situation..
[10:40] always good :)
[10:40] how old are you then?
[10:40] 15
[10:41] we decided on gcse's at year 10
[10:41] so yr 10/11 weren't so bad as the others.
[10:42] All we get to choose in Year 10 here is, 1. Do you want to do a higher or lower pathway, and 2. Home Economics or Design Technology?
[10:48] we had higher lower math, english,
[10:48] choice of science or double science
[10:48] and thne between design tech (graphic design basically), cooking, or woodwork
[10:49] Ah
[10:49] Our Design Tech is woodwork, metalwork, furnishing, etc.
[10:50] And hell, we get a Cert 1 in Furnishing at the end of it, so I'm not complaining.
[10:56] nice.
[13:03] GLaDOS: Is "Year 10" similar to our 10th grade?
[13:03] (ie, are you like 16/17?)
[13:03] Oh, 15
[13:03] I'm 15.
[13:04] Holy shit
[13:04] YEah
[13:04] Congrats, dude :)
[13:04] That's fantastic.
[13:04] Lurking here since I was 12, actually started partaking at 13..
[13:05] That's so cool :D
[13:05] I wish my 12 year old brother was as cool :P
[13:05] Yeah, that's what happens when you have literally nothing else to do.
[13:05] (for 3 years, I was in isolated places)
[13:12] what underscor said GLaDOS, 15, "Holy shit"
[13:56] Wow, I thought you were 14-15 like 2 years ago (when you were using my home server)
[14:27] http://imgur.com/rbXnD3m
[14:38] nooneyb: https://www.youtube.com/watch?v=MxVdU2eVYSg much?
[14:47] dude your in australia
[14:47] the whole place is isolated.
[15:41] Schbirid a little :P
[15:54] https://twitter.com/internetarchive
[16:12] WAIT WHAT
[16:12] 15
[16:12] I had you pegged at 30
[16:14] tell that to the judge
[16:20] heh
[16:23] also that means there are geocities pages older than you :o
[16:49] :D
[16:50] I just realized the keyboard I am using is 14 years old
[17:04] that's like almost GLaDOS's age
[17:04] wewt :)
[17:04] It is the oldest part of my computer. I never got a new one since it just works
[17:04] Model M?
[17:05] I have my model M from my 80s IBM on my media center. I gave an ergonomic keyboard a shot and it really has helped improve my typing
[17:06] Its so weird now because typing on a laptop is hard. It feels so cramped
[17:06] hehe, a bit
[17:06] Unless you get a laptop with a bigger keyboard layout
[17:07] You know there is a talk about designing and building keyboards at OSCON this year
[17:29] Some of the spam on the wiki right now is *amazing*
[17:29] As i seated straight down at the stand, a pair of connected with our food pets enquired in unison, using eye-opening seems to be on their faces, “Did you notice what is this great? ”
[17:29] “Yes, ” My partner and i reacted as i shuffled the couch in as well as unfurled the paper napkin. “They harvested a brand new pope, by Latina America. ”

“No, certainly not which, ” they will reacted. “Google can be closing along Yahoo and google Reader in Come early july 1. ”
[17:29] mmmmh, harvest a brand new pope
[17:30] What's the purpose of this sort of spam?
[17:33] Honestly not sure. The only copy I have is text-only but maybe it was laden with links.
[17:35] I can undersatnd "click here for knock off viagra" and spam our product name to make it look popular/trending type spam, but there seems to be a disproportionate amount of "We're testing our random sentence generator and spambot combo" stuff out there.
[17:40] I also see it as a way to make a site look bad and drive people away from a project. Who wants to use a site that is mostly SPAM?
[17:41] But if that was the aim, wouldn't something more inflamatory work better?
[17:43] All that matters is raising the signal to noise ratio. More inflammatory content might get cleaned up faster
[17:43] Think of spam as an information war
[17:45] Anything that degrades the quality of your data means you are losing
[17:46] Also, loads of spam make it way more work to back up thinks, like we're seeing with posterous.
[18:42] I've seen tons and tons of markov chains spam and I don't understand why
[18:42] not just here, but all over the web in the weirdest places, and it has no links
[18:42] THE SKYNET IS COMING :O
[18:49] some of it is steganographic messages
[18:49] and no this is not me being paranoid, I have evidence
[18:49] I think I read a paper on that once
[18:50] do you have a link handy?
[18:50] I'd love to read that paper
[18:51] Searching for it, but I only vaguely remember it so I might not be able to find it
[18:51] sure
[18:54] chronomex: http://arxiv.org/abs/1101.0350
[18:55] nice work, thanks!
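The [18:42] remarks about Markov-chain spam refer to text produced by a random walk over word-transition statistics. Below is a minimal word-level sketch of that technique, assuming plain whitespace tokenization and a toy corpus; `build_chain` and `generate` are hypothetical helper names and are not taken from any spambot or from the paper linked at [18:54].

```python
from __future__ import annotations

import random
from collections import defaultdict

# Hypothetical helpers illustrating a word-level Markov chain text generator.


def build_chain(text: str, order: int = 1) -> dict[tuple[str, ...], list[str]]:
    """Map every run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain: defaultdict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])  # duplicates preserve observed frequencies
    return dict(chain)


def generate(chain: dict[tuple[str, ...], list[str]], length: int = 20,
             seed: int | None = None) -> str:
    """Random walk over `chain` until `length` words are emitted or a dead end is hit."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain))       # random starting state
    out = list(state)
    while len(out) < length:
        successors = chain.get(state)
        if not successors:                # dead end: the corpus's final word
            break
        out.append(rng.choice(successors))
        state = tuple(out[-len(state):])  # slide the window forward by one word
    return " ".join(out)


corpus = "the cat sat on the mat the cat ran off the mat"
print(generate(build_chain(corpus), length=12, seed=42))
```

With order 1, every adjacent word pair in the output also occurs somewhere in the corpus, which is why this kind of spam tends to read as locally plausible but globally meaningless, much like the wiki spam quoted at [17:29].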
[18:56] someone has emailed me asking for removal of a comment on a message board that mentions her, because it ranks highly in google and that's not good for this person
[18:56] 1) this person has no google results other than this comment
[18:56] 2) this comment is ten years old
[18:58] answer that the law forbid you to change 10+ years old comments
[18:59] I said that we don't remove comments except by request of the original poster
[18:59] which is true, if a bit evasive
[18:59] find a way to blame MPAA and RIAA
[18:59] who let you in here?
[19:00] the door was open
[19:00] o ok
[19:00] this its not the AA meeting?
[19:00] I came for the cookies
[19:00] chronomex: wait for the DMCA
[19:00] so apparently if you annoy someone on irc, the new CFAA might make it a felony
[19:01] https://twitter.com/internetarchive/status/315281157841354752
[19:02] I could use a couple of those cases
[19:02] sad that I am too far
[19:02] i need moar drives
[19:03] So the IA is shucking externla HDDs?
[19:03] they get the bundle, but only use the HDDs
[19:04] heh, empty hard drive enclosures
[19:04] the Hitachi Touro ones have to be pried open, voiding warranty
[19:06] I gotta buy another 12tb in the next month, le sigh
[19:07] I'm going to decomission my old < 2 TB HDDs in a few weeks :D
[19:08] soultcer, that is part of the reason for me getting more drives
[19:08] It is rotation time
[19:09] I love rotation time
[19:11] I xcurrently have 2x 1gb :
[19:11] D:
[19:12] I had a 1.2GB back in the day
[19:12] big foot
[19:12] o back then I had 800mb i think
[20:01] undersco2, are you awake?
[20:01] underscor would be fine too
[20:03] * underscor is a cat or something?
[20:03] ;D
[20:04] lol
[20:04] (@joepie91)
[20:04] let me PM you
[20:04] mk
[20:04] also meow
[20:04] :3
[20:05] https://twitter.com/ab2525/status/281100349165670401
[20:05] (cc joepie91)
[20:05] :P
[20:05] underscor: haha
[21:07] http://jakonrath.blogspot.com/2013/03/obsolete-anonymous.html
[21:13] too true
[22:12] hey Famicoman
[22:12] i'm starting to find all the lost techtv video ids
[22:13] does anyone know of a way to find out what warc wayback machine is uing
[22:13] *using
[22:14] i want to know so i can download it and zcat it
[22:21] https://twitter.com/joepie91/status/316676034932133890
[22:56] hi folks, if you're living in the US, please read this and at least consider calling your Congress people: http://www.techdirt.com/articles/20130324/14342822435/rather-than-fix-cfaa-house-judiciary-committee-planning-to-make-it-worse-way-worse.shtml
[23:03] From a tweet: Price of 1 gigabyte of storage over time: 1981 $300,000 1987 $50,000 1990 $10,000 1994 $1,000 1997 $100 2000 $10 2004 $1 2012 $0.10
[23:14] so, the folks at the Mister Wong bookmarking site have applied all your favorite things about freemiums and DLC to a bookmarking site: http://www.mister-wong.com/plans/
[23:16] I wonder how many customers they have
[23:16] "bookmarks you can save in total: 10"
[23:16] That's cute.
[23:24] I didnt know that Triumph of the Nerds has a sequel
[23:24] http://www.pbs.org/opb/nerds2.0.1/
[23:31] so, according to this tweet: https://twitter.com/Pinboard/status/316623260714414080 the site wasn't always like that
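The [23:03] storage-price series is quoted from a tweet, and the figures are not verified here. It can still be reduced to a compound yearly rate. Below is a minimal sketch that assumes a constant exponential decline between the 1981 and 2012 endpoints; `annual_factor` and `halving_time` are illustrative helper names.

```python
import math

# Price per gigabyte in USD, as quoted in the [23:03] tweet (not independently verified).
PRICES = {1981: 300_000, 1987: 50_000, 1990: 10_000, 1994: 1_000,
          1997: 100, 2000: 10, 2004: 1, 2012: 0.10}


def annual_factor(y0: int, p0: float, y1: int, p1: float) -> float:
    """Constant yearly multiplier f such that p0 * f ** (y1 - y0) == p1."""
    return (p1 / p0) ** (1 / (y1 - y0))


def halving_time(factor: float) -> float:
    """Years for the price to halve at a constant yearly multiplier `factor` < 1."""
    return math.log(0.5) / math.log(factor)


f = annual_factor(1981, PRICES[1981], 2012, PRICES[2012])
print(f"yearly multiplier: {f:.3f} (about {(1 - f) * 100:.0f}% cheaper per year)")
print(f"halving time: {halving_time(f):.2f} years")
```

With these endpoints the yearly multiplier is about 0.62. That means prices fell roughly 38% a year and halved about every 1.4 years. The intermediate points do not fit a single rate exactly; for example, 1981 to 1987 is slower than 1997 to 2000. This is therefore only a coarse summary of the series.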