[01:48] Moo.
[03:29] SketchCow: I have a 1.2M five-and-some-change floppy drive sitting wrapped up in a box. Want it?
[04:07] Sure.
[06:04] I'm planning on borrowing my employer's old, no-longer-used 3.5" autoloader to image a bunch of floppies, if I can figure out the control commands (just need to snoop on the serial data the software sends). I'm thinking of imaging the floppy, ejecting it, and then having a camera take a picture of the label, for when I go back through them
[06:05] I also have a 5.25" floppy I need to install and start going through old floppies
[06:06] my parents had a few 8" floppy disks, but I suspect they are in a landfill by now
[06:09] LOVING MY JOB
[06:09] Kind of hungry, though.
[06:10] I went low-carb for July.
[06:11] I might be lucky. a large number of the 5.35" disks are DD rather than HD
[06:14] er, 5.25
[06:16] I am putting the scare into people.
[06:16] It needs to be done.
[06:22] http://abc80databasen.blinkenshell.org/bilder/ABCdisk.JPG heres my super rig, i'm dumping abc80 ("swedish trash80") disks which i've gotten from our "MIT". If you think it's bad in the US then you should check Sweden, here i'm like the goddamn last of the mohicans when it comes to this sort of stuff ;)
[06:23] i can imagine theres a ongoing information genocide in various small countries that had their own computer systems.
[06:34] Yes
[06:35] I wonder if I can get my 520ST working again
[07:45] I'll have something neat to show soon.
[07:45] Just something I'm doing for work, not archiveteamy
[07:45] But preservationy
[07:46] We've been given a new archiveteam machine, too
[07:46] So that'll be happening
[08:07] choopa appears to hate life tonight
[11:47] so... who's got an up to date copy of astronautix.com?
[11:47] I've got a wget -r rip from Jun 2010 if anyone wants it, but I'd love it if someone has a fresher copy
[11:48] archive.org seems to have it from jan 2010...
[11:59] ex-parrot: I have a copy
[11:59] that's cool.
glad some other folks have copies now that it seems to be gone :(
[12:00] denial of service?
[12:00] nope, he's pulled it
[12:01] yea, but the page says there was a DOS, so he pulled it
[12:01] I got the impression it was pulled in a "not coming back" kind of way
[12:01] but fingers crossed I guess
[12:03] * ex-parrot requires sleep
[12:06] I'm essentially out of disk space
[14:55] So, what projects do we still have going on right now?
[14:56] There's that list you gave me, sir
[14:57] The WGET/WARC thing has really amazed archive.org
[14:57] I'm about to start pulling friendster over to the new machine
[14:57] I need to name the new machine
[14:58] The new machine has 27 terabytes
[14:58] Which sounds like a fraction of the other, but this is a pooled drive, also better maintained/mirrored
[14:58] We're using a smaller amount.
[14:58] And the other machines are available for video and similar emergencies.
[14:59] Aha
[14:59] Myspace is now bought out, so I'm not 100% worried of it
[14:59] That's good
[14:59] Call it deuterium
[14:59] :D
[14:59] (Heavy hydrogen)
[15:00] Me and two buddies here just ordered some from united nuclear, so we can make sinking ice cubes
[15:00] Unfortunately, it causes sterility in high concentrations
[15:01] SketchCow: I know there's the list, but what is the next priority?
[15:02] Wiki
[15:02] I'd like the wiki to be better and cleaned
[15:02] I'm working on finalizing the starwars forums stuff, but that's mostly autonomous
[15:02] I've been working on away from keyboard.
[15:02] Since it's sorting like 35k threads and 115 mil profiles
[15:02] Can I get admin on the wiki?
[15:02] http://www.archive.org/details/awayfromkeyboard
[15:03] That collection looks like a really neat idea
[15:04] Needs some refinement, but it will be.
[15:04] This is related to Len Sassaman
[15:06] I see
[15:13] Ugh, class is so boring today >:|
[15:18] Damn wifi >:|
[15:57] Educate yourself
[15:57] Class shouldn't be boring
[16:15] Fuck, time got away from me today.
[16:37] SketchCow: It is boring, it's just people presenting stuff
[16:37] s/is/was/
[16:37] Now we get to go bowling!
[18:24] Bowling!
[19:07] soultcer: what's the matter with your ggroups discovery instance(s)?
[19:22] I'm about to a do a massive purge of shitty users on the archiveteam wiki.
[19:26] http://www.archiveteam.org/index.php?title=Special:ListUsers&limit=500
[19:26] Watch me purge purge purge, baby
[19:27] wow. lots of spammers
[19:27] "breast enhancement gum"?
[19:28] ndurner: your tracker seems to be working quicker now.
[19:28] I've downloaded 44GB in the last few days.
[19:29] of google-groups archives.
[19:29] Good!
[19:31] ndurner: out of curiousity, have you made any changes to make things quicker?
[19:31] or did things just speed up at google?
[19:31] not yet.
[19:31] hmm
[19:32] There seems to be an issue on soultcer's side
[19:32] which means we have more resources available for processing downloads
[19:41] ndurner: Checking right now
[19:42] The instances seem to be running fine on a total of 7 vServers
[19:44] hrm, ok, thanks
[19:44] We're currently processing 99 dirs/hr. That used to be 10x as much.
[19:44] Would it be of any interest to make a generic tracker for these kind of things that anyone could just start using for projects?
[19:44] yes
[19:45] I have been thinking of that
[19:45] tomorrow i will have 50/5 mbit connection at home, 10x my current upload rate
[19:45] Hmm. Ok. I'll keep that in mind.
[19:45] ~600kb/s
[19:45] One thing that could help is Apache Zookeeper
[19:46] after some test, i will try to upload Jamendo to Internet Archive
[19:46] disadvantage: Java
[19:47] Can someone help me with the user purge?
[19:47] I am going after the obvious ones, of course.
[19:47] people called 20393 fuf dslfhlkfjdf buy cialis 20202
[19:47] But I think there are other people, with no contributions, who should be purged.
[19:50] I HAVE GOOD NEWS FOR YOU.
[19:50] What do you want SketchCow? block them?
[19:51] I am merging and deleting.
[19:53] I have purged all the users of spammishness, I believe, up to "7"
[19:53] In terms of first letter.
[19:53] I'm using the 500 view to see what's there.
[19:53] It goes down to T.
[19:53] I think I can get it under 500, once spam is gone.
[19:53] There was a rapefest of spam users in sept. of 2009
[19:53] So that's helping.
[19:53] It appears that I don't have permissions to delete/merge users
[19:54] Also, why am I excess flooding?
[19:54] No, just msg me usernames.
[19:55] SketchCow: http://archiveteam.org/index.php?title=Special:BlockList also contains users that have been blocked, but not merged and deleted
[19:58] I could possibly hack something up to pull the userlist and then spit out a list of people with a 0 editcount
[19:59] using the mediawiki api
[19:59] OK, all the users fit on one page.
[20:04] Userlist coming under control.
[20:07] he, i developing a tool for statistical analysis of mediawikis
[20:07] look at this graph of AT wiki http://img232.imageshack.us/img232/6559/usereditsnetwork2.jpg
[20:07] nodes = users, edges = how many pages were edited by both users
[20:11] activity http://img703.imageshack.us/img703/5333/archiveteamactivity.jpg
[20:12] http://www.archiveteam.org/index.php?title=Special:ListUsers&limit=500
[20:12] Much nicer.
[20:12] which day is day 0?
[20:13] good question, looking at the code
[20:13] i guess sunday
[20:14] hmm
[20:14] yep
[20:14] %w http://www.somacon.com/p370.php
[20:17] if you think about other interesting graphs, tell me
[20:19] Coderjoe gave me a list of zero contribution users, killing now
[20:19] If only this worked at office buildings
[20:22] ha ha vat a bloodbath
[20:24] OK! Totally cleaned.
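The zero-editcount idea floated at 19:58 can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the `api.php` endpoint path is an assumption (the log shows only `index.php` URLs), the function names are hypothetical, and the sketch filters an already-fetched JSON response from the MediaWiki `list=allusers` query rather than doing any network I/O.

```python
from urllib.parse import urlencode

# Assumed endpoint: the log only shows index.php URLs, so api.php is a guess.
API_BASE = "http://www.archiveteam.org/api.php"


def build_allusers_url(api_base, limit=500):
    """Build a MediaWiki API URL that lists users along with their edit counts."""
    params = {
        "action": "query",
        "list": "allusers",
        "auprop": "editcount",  # ask the API to include each user's edit count
        "aulimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def zero_edit_users(api_response):
    """Return names of users whose reported editcount is exactly 0.

    Users with no editcount field are skipped rather than assumed to be idle.
    """
    users = api_response.get("query", {}).get("allusers", [])
    return [u["name"] for u in users if u.get("editcount") == 0]


if __name__ == "__main__":
    print(build_allusers_url(API_BASE))
```

A real run would fetch that URL, parse the JSON, follow the API's continuation values until the user list is exhausted, and hand the resulting names to an admin for review before anything is merged or deleted.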
[20:24] http://www.archiveteam.org/index.php?title=Special:ListUsers&limit=500
[20:25] Probably a few assy ones stuck in there but we no longer look totally owned
[20:27] congrats
[20:27] Great job, thanks Coderjoe, that helped a ton.
[20:27] If you guys still stumble on some assery, let me know.
[20:27] Pages or other things.
[20:28] i think recaptcha scares spambots
[20:29] and disturb legit users : P
[20:29] i dont see by being an admin, but i remember while trying to post links, enter a captcha a tiem
[20:30] i've had some weird recaptchas before
[20:31] like http://img411.imageshack.us/img411/9711/recaptcawhat.png and http://img839.imageshack.us/img839/1474/recaptchawha.jpg
[20:33] he
[20:33] look this http://img844.imageshack.us/img844/6449/smostresilientbittorren.png
[20:34] close to this one http://www.gully.org/~mackys/lj/captcha-reimann.png
[20:35] Oh jesus, my floppy post caught fire
[20:35] and then the FIRE CAUGHT FIRE
[20:35] It's on Digg
[20:35] jwz has said something
[20:35] It's crazy
[20:50] no description, 199mb .doc, IA has weird stuff http://www.archive.org/details/wetyuaw964839
[20:52] another one http://www.archive.org/details/uaiwtybiway396793
[20:52] i think they are not books, but chunks of a large file
[20:52] movies/warez?
[20:52] look at the download counter
[20:53] 7000+
[20:53] yep http://taifon.net/video/saymove/5f86728f48060d39/
[20:53] fuckers
[20:53] The second file you linked is a mpeg file renamed to .doc, I assume the other one is a mpeg file too
[20:57] another video as doc, 55,000+ downloads http://www.archive.org/details/account-text0023es
[20:58] there is no report link
[20:58] abuse@archive.org?
[20:59] Yeah, this is a known problem and archive.org works hard to....
[20:59] * SketchCow keeps watching the anime
[20:59] WILL SHE CONFESS HER LOVE TO HIM????
[20:59] WILL HE KNOW BEFORE THE SAKURA FESTIVAL??????
[20:59] * SketchCow hugs his otaku pillow
[21:00] I bet there is no Dragonball anime on IA archive servers.
I am pretty sure one fight would take enough storage to fill a whole internet archive rack.
[21:02] I just let archive.org know about the animes.
[21:02] It's an ongoing problem.
[21:02] did you hear about Captain Tsubasa matchs?
[21:03] SETI@home is a project to recode Captain Tsubasa episodes to Xvid.
[21:04] SketchCow: request a report link
[21:04] hehe
[21:05] info@archive.org
[21:05] It would be best if you just compiled up a list instead of 3,000 letters
[21:06] Man, look all that wionywioyuwowiwrionuwprnpy random on http://www.archive.org/search.php?query=%28collection%3Atexts%20OR%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-week
[21:08] http://www.google.es/#sclient=psy&hl=es&safe=off&source=hp&q=%2Bdoc+%2B199.8M+site:http%3A%2F%2Fwww.archive.org%2Fdetails%2F&aq=f&aqi=&aql=&oq=&pbx=1&bav=on.2,or.r_gc.r_pw.&fp=4036e66a30b4edf1&biw=1320&bih=600
[21:10] Internet Archive may offer a full dump of their metadata, to scan all this shit.
[21:15] OK, e-mail sent.
[21:54] go jason go http://www.archiveteam.org/index.php?title=Special:RecentChanges
[21:54] oh wait it was scrolled up
[21:58] not sure why they bother when you can just straight up upload anime and nobody catches it http://www.archive.org/details/DeadmanWonderland
[21:58] for those who may think I'm dead, I'm actually working on a project involving scanning and sharing about a half-million pages of ring-bound documents.
[21:59] It's a good fraction of these: http://en.wikipedia.org/wiki/Bell_System_Practices
[22:00] ooh
[22:00] DFJustin: hahah
[22:01] figuring out how to separate and metadataify documents scanned from the same hopper has been interesting.
we're choosing to go with separator pages with fill-in-the-dots "this next document is doc.nr:" spaces, and bingo-marker the shit out of it
[22:03] how are you identifying those pages so the processing software sees them, in the (rather odd) case that some document has something that looks similar
[22:04] I fucking love the BSPs
[22:04] going to be some sort of registration/identification marks (maybe a qr code?), and also a thick black rectangle enclosing all the metadata marks
[22:04] I had a strike manual stolen from a CO
[22:04] had to return it to the stealer, he was caught
[22:04] But I was transcribing
[22:05] SketchCow: nice. current hosts of these documents have graciously agreed to share these scans on archive.org.
[22:05] we have four pallets worth.
[22:05] The strike manual is the best, it tells you how to barracade the CO and how much food to buy
[22:05] !
[22:05] Let me know if you need me to facilitate the collection
[22:05] hahahaha, do you know the document number? I can jump it up in the queue :)
[22:05] I totally would appreciate that, I'm estimating this will be about 1T of imagery.
[22:06] http://pdfs.telephonearchive.com/bsps/
[22:06] Then yes, I am your guy.
[22:07] We'll make it happen.
[22:07] Do it in e-mail.
[22:07] jscott@archive.org
[22:07] we actually intend to scan every page of ring-bound paper in the telephone museum, including operation and mtce manuals for the panel switch
[22:07] ok
[22:08] grand, don't expect anything to be done immediately. we did get a scanner this week though.
[22:08] Do you want to wait until the new DIY book scanner is finished, and I can send that in.
[22:09] we're removing these from the ring binders and putting them through a sheetfeeder; they're all looseleaf.
[22:09] is that from Dan?
[22:10] DIY Bookscanner Dan's friend Andy is working with me on this
[22:10] Yeah
[22:10] I kinda wish I hadn't thrown out my boxes of old computer shoppers something like 9 years ago.
these were the old telephone-book-thick ones
[22:10] Yeah, those computer shoppers need scanning.
[22:10] Oh, don't worry, we'll get it all!
[22:10] :D
[22:10] I need more metadata warriors, I need them constantly.
[22:11] The current batch is doing well, always worried about burnout.
[22:11] But come on, arcade manuals!
[22:11] BSP project will generate a fuckload of nearly metadataless scans; I'm considering outsourcing to mechanical turk
[22:12] There's an argument for this.
[22:13] Cost is an issue.
[22:13] I'm also working on scanning and OCRing the code for the #3 ESS: https://plus.google.com/118060174030033503719/posts/Hi7J7hfpCsv
[22:13] Yeah. I'm willing to pay for some, and the museum does have some budget.
[22:13] I'd love to discuss this, but I have to go. Birthday party. On a boat. In NYC.
[22:13] farewell; have fun!
[22:14] And I'm 1.5 hours north, and 45 minutes until party starts.
[22:14] go!
[22:14] But yeah, bring me in on this, you'll get a collection, etc.
[22:14] rad
[22:20] SketchCow
[22:20] did you want zoink.it archive?
[22:31] woah
[22:31] http://www.kryoflux.com/
[22:55] Well, it's a good thing I bought a 3-pack of motherboards off eBay last time -- looks like that storm fried the old box's moboard.
[22:55] 'Cause what I wanted to do this weekend was drive out to MicroCenter for thermal paste.
[22:55] Stupid lightning.
[22:56] At least you have a microcenter nearby. If I wanted to go to microcenter, I would have to drive 4-8 hours round-trip depending on which store and traffic conditions.
[22:57] True 'nuff. Only a half-hour out of my way, but still an annoyance.
[22:57] there are ratshacks and small computer stores nearby, though
[22:58] Hm. Would a Radio Shack carry that? It's not a cellphone so I'm dubious.
[22:58] "Web only". Jerks.
[22:59] one of the local stores still had a small collection of components, last time I was in
[22:59] nowhere near as extensive as when I was a kid, though
[23:00] Oh no, not at all.
I can't remember the last time I went into a Radio Shack and knew less than the person "helping" me.
[23:00] I'll probably check on Saturday anyway, since there's one pretty much on my way to anywhere around here I'd want to go.
[23:01] yeah... I say their slogan (is it still the current one) as "You've got questions? So do we."
[23:01] I usually used "You've got questions, we've got cell phones."
[23:02] haha
[23:02] Though in fairness, back around 2003 or so when I needed a peizo buzzer for my old Civic (it wouldn't beep at you when you left your lights on) they did have the exact part I needed.
[23:02] they did give me a box of blank 5 1/4" floppies once because it wasn't even in inventory
[23:03] Heh. I could imagine the current crop of them being confronted with 5.25s. "WHat kind of messed-up frisbee is this?"
[23:04] Cool about the disks though. I sure wish I'd gotten a deal like that back when 3.5s were a buck a pop.
[23:18] it was such a bummer when aol switched to cd, no more free disks
[23:19] I used a few that way too. Then started using the DVD-type cases they mailed the CDs in for a while.
[23:19] Then they finally stopped sending them.
[23:52] hi folks, is GPT partition tables only for 3 TB or larger drives, or is there some benefit I would get over the standard partition table for formatting an external drive?
[23:58] wow- I've never seen wikis merge their spam users into one user before- usually they just get deleted & perma-banned
[23:58] we're archivists.
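The 23:52 GPT question goes unanswered in the log. The arithmetic behind the usual "3 TB" rule of thumb can be sketched as below, assuming the common 512-byte logical sector: a classic MBR partition table stores sector addresses as 32-bit values, which caps it at 2 TiB, so a 3 TB drive is simply the first common size past that limit. The names here are illustrative only.

```python
SECTOR_BYTES = 512   # assumed logical sector size; some drives expose 4096
MBR_LBA_BITS = 32    # classic MBR stores sector start and count in 32 bits


def mbr_max_bytes(sector_bytes=SECTOR_BYTES):
    """Largest capacity addressable through a classic MBR partition table."""
    return (2 ** MBR_LBA_BITS) * sector_bytes


def needs_gpt(drive_bytes, sector_bytes=SECTOR_BYTES):
    """True when the drive is larger than MBR can address at this sector size."""
    return drive_bytes > mbr_max_bytes(sector_bytes)


if __name__ == "__main__":
    print(mbr_max_bytes())             # 2199023255552 bytes, i.e. 2 TiB
    print(needs_gpt(3 * 10 ** 12))     # a "3 TB" drive: True
    print(needs_gpt(2 * 10 ** 12))     # a "2 TB" drive: False
```

Below that limit GPT is optional rather than required; its usual advantages there are more than four primary partitions and a backup copy of the table at the end of the disk.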