[00:17] I noticed archiveteam.org is 508ing. Is there somewhere we can donate to help pay the bills?
[00:23] hmmm it's loading here, that's weird
[00:23] Loading again for me
[01:11] !list
[02:31] I'm not afraid of being donated cash, who cares about the legal.
[02:31] But the archive.org advantages to donating there instead are huge.
[02:52] Is there a way to have 1 warrior instance work on multiple projects?
[03:02] mutoso: you can switch back and forth to get different jobs in the warrior's queue, but there's no permanent way to say "allocate N workers to this, M to that"
[03:04] Darn, alright.
[03:06] 495470 welp perhaps tomorrow ill hit 500k
[04:04] new small shortener i just heard about http://dft.ba
[04:14] if anyone wants all, or at least everything up to the very recent, of ao3, here http://archive.org/details/Ao3ArchiveCrawl done with this https://code.google.com/p/fanficdownloader/ . i don't know if i posted this already, but i thought it would be good anyway.
[04:15] btw, that's just the stories
[07:21] http://xteensx.info/mia-manarote-a-spanish-afternoon-hot-hardcore-scene/
[08:41] Why did people add last.fm info to the wiki? I thought this was supposed to be an under-our-hat project
[08:42] well it's out in the public now...
[08:45] Nothing on the wiki page gives away any important info so that is good.
[08:46] Please review the IRC channel list and status to see if anything is missing http://www.archiveteam.org/index.php?title=IRC
[09:02] omf_: #archiveteam-twitter isn't archiving twitter. It's tweets about us on twitter.
[09:02] All outputted by swebb bot
[09:03] aah
[09:03] I will fix that
[09:04] Also get rid of #ArchiveMeme, it doesn't exist anymore
[09:06] I'd rename "In use channels" to "General channels"
[09:06] And put -twitter up in there
[09:08] ugh "Resource Limit Is Reached" again and again
[09:11] Got the changes saved.
[09:14] Oh boy, people who develop websites for case-insensitive filesystems!
[09:14] FUCK YOU DEVELOPER
[09:14] Is it at least unicode aware
[09:15] Not sure.
[09:16] What language is it in?
[09:16] GLaDOS: opz
[09:16] me, omf
[09:16] in all channels.
[09:17] Plz '_'
[09:17] \o/
[09:17] ty bud.
[09:17] Helps to identify us to people asking questions
[09:17] and epeen, ofc
[09:17] It's all about epeen
[09:18] Also, let's all go to the #archiveteam-bs
[09:19] http://xteensx.info/italian-amateur-slut/
[09:20] (╯°□°)╯︵ Ɫʎoquɯɹǝƃ
[12:23] Here is a well thought out and extensive 74-page paper on web crawling https://research.microsoft.com/pubs/121136/1500000017.pdf
[12:23] I am reading it now.
[12:46] glitch is still going at 830mb
[13:25] http://www.archiveteam.org/index.php?title=Rescuing_Floppy_Disks is good stuff.
[14:29] SketchCow: there are some things I wrote on actually archiving DOS/Windows 3.5'' floppies on the discussion page - if they're any good, could you move them to the main page?
[15:56] so if yahoo was to announce that they're shutting down some portion of yahoo groups, what would we do?
[15:57] cry havoc and let slip the warriors of archive?
[15:57] well I have this yahoo group archiver script working
[15:58] problem is that you have to be a member of a yahoo group to access all the data on it
[15:58] and many groups require approval
[15:58] oh, that would complicate matters
[16:06] On the other hand, if the messages weren't publicly visible in the first place, they probably shouldn't be in a public archive.
[16:24] dashcloud: I trust you, integrate them
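The members-only access problem described above is the real difficulty with archiving Yahoo Groups. As a rough illustration (not the actual script mentioned in the log), here is a minimal Python sketch of that approach, assuming you already have an approved member account and can copy its session cookies out of a logged-in browser; the group name, cookie file, and message URL pattern are hypothetical placeholders.

    import time
    import pathlib
    import requests

    GROUP = "example-group"   # hypothetical group name
    # Hypothetical URL pattern -- the real script's endpoints aren't shown in the log.
    MESSAGE_URL = "https://groups.yahoo.com/neo/groups/%s/conversations/messages/%d"
    # Session cookies copied from a browser logged in as an approved member,
    # stored in cookies.txt as a single "name=value; name=value" header line.
    COOKIE_HEADER = pathlib.Path("cookies.txt").read_text().strip()

    def fetch_message(session, msg_id):
        """Fetch one message page; return its HTML, or None if it isn't there."""
        r = session.get(MESSAGE_URL % (GROUP, msg_id), timeout=60)
        return r.text if r.status_code == 200 else None

    def main():
        out = pathlib.Path("yahoo-group-" + GROUP)
        out.mkdir(exist_ok=True)
        s = requests.Session()
        # This is the crux of the membership problem: without an approved
        # member's cookies, most of these pages simply aren't served.
        s.headers["Cookie"] = COOKIE_HEADER
        for msg_id in range(1, 1001):               # first 1000 message ids
            html = fetch_message(s, msg_id)
            if html is not None:
                (out / ("%06d.html" % msg_id)).write_text(html, encoding="utf-8")
            time.sleep(1)                            # be polite to the server

    if __name__ == "__main__":
        main()

Even with working cookies, this only helps for groups you have actually been approved to join, which is exactly the bottleneck raised at [15:58].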
[16:26] say, does the tracker assign items based on size and past performance? E.g. larger items to clients that have previously completed large items?
[16:26] (if that metric is even known to the tracker)
[16:28] InitHello: No, it assigns items randomly. We generally don't know the size of an item before it's downloaded.
[16:29] yeah, I figured the size would be hard to pre-determine
[16:29] I just noticed that the longer a warrior runs, the bigger the datasets tend to get
[16:48] Well, that's not entirely true.
[16:49] But what does happen is that longer sets take longer, while little sets get chewed through crazily.
[16:49] right, that's a more reasonable explanation
[16:49] So if you have enough people in, all the 1k files get slammed through in seconds, meanwhile the 250mb mofos sit there and ruin someone's day
[16:49] Eventually, it's all 250mb mofos
[16:50] Or, and we've seen this on some runs, 5gb mofos
[16:50] I'm running it on one of my servers, so nothing is being ruined for me :D
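Since the tracker hands out items without knowing their sizes, the dynamic described above falls out of a very simple model: warriors pull the next item as they free up, tiny items vanish almost immediately, and the tail of the run is nothing but the huge ones. The Python sketch below illustrates this with made-up numbers; the item mix, warrior count, and download speed are invented for illustration, not real tracker data.

    import random

    # Made-up item mix: lots of tiny items plus a few huge ones (sizes in MB).
    items = [0.001] * 5000 + [250.0] * 40 + [5000.0] * 5
    random.shuffle(items)                  # the tracker hands items out in random order

    workers = [0.0] * 100                  # time at which each of 100 warriors is next free
    rate = 2.0                             # assumed download speed, MB/s per warrior

    last_small = last_big = 0.0
    for size in items:
        w = workers.index(min(workers))    # the next warrior to free up takes the item
        workers[w] += size / rate          # how long this item occupies that warrior
        if size < 1:
            last_small = max(last_small, workers[w])
        else:
            last_big = max(last_big, workers[w])

    print("last tiny item finished at %8.1f s" % last_small)
    print("last huge item finished at %8.1f s" % last_big)

With these numbers the last tiny item finishes within a second or two, while the last 5 GB item grinds on for the better part of an hour: the "eventually it's all 250mb mofos" effect.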
[17:01] GIF for "ARCHIVE TEAM WARRIORS RUNNING AT FULL CAPACITY" http://i.imgur.com/M8Ul6p5.gif
[17:13] I am chewing some 5GB mofo atm
[17:13] or well two actually, humbug
[17:17] 742MPH
[17:18] 250Mb mofos? We are getting 8Gb mofos from formspring XD
[17:18] Someone needs to poke alard and tell him to reassign the formspring user. I have failed due to a dodgy desk :*(
[18:05] It'll work out.
[18:11] thanks SketchCow - I added the info to the main page on dumping DOS/Win floppies
[18:11] feel free to change or adjust the page as needed
[18:19] Just a quick note
[18:19] The rest of the article is written in a third-person, "here's how it is done" style.
[18:20] Yours is written like HAI GUYS THIS IS HOW I DOS
[18:20] Easily fixable, and I will, but keep it in mind in the future
[18:20] Also, got italics wrong. Use preview next time and do repairs before it's clicked in.
[18:23] dashcloud: in short: write protect all your disks FIRST (since windows likes to write media ids to the boot sector which will 'brick' certain hp boot disks for logic analyzers etc)
[18:23] and use dd or winimage to do a full cooked sector dump, OR BETTER YET use teledisk or imagedisk to create a .td0 or .imd of the disk, which contains more useful metadata
[18:24] imagedisk and teledisk should be able to correctly image weird-sectored formats like dmf, 2m-f, 2m-m and that funky linux fdutil format
[18:25] So much nerd
[18:25] please add that to the wiki page - I wrote what I know, no more
[18:25] I'm ripping this out, dashcloud
[18:25] that's okay
[18:25] I think this sort of intense how-to needs to be a different page linked from this one.
[18:25] And if Lord_Nigh wants to out-nerd on this process, that should be on the broken-out page too.
[18:28] SketchCow, I just updated http://www.archiveteam.org/index.php?title=How_to_use_our_wiki with your points.
[18:31] dashcloud: the info you posted was correct though could use some expanding upon
[18:37] Oh jesus, someone sent me 500 3.5" disks
[18:37] dashcloud: Make a new page, called RAWRITE
[18:37] Put your stuff there
[18:37] Wait
[18:37] No.
[18:37] Make a new page called DOS Floppies
[18:37] Put it there
[18:37] Then Lord Nigh can come in and make it, apparently perfect.
[18:37] Sound good?
[18:38] We'll then link from the Floppies page.
[18:38] I'm actually leaving shortly for dinner - if no one gets to it before then, I'll do it when I get back
[18:38] thanks for looking at it
[18:42] i can't make it perfect; linux doesn't allow low enough level access to the floppy controller to image some weird disks which teledisk and imagedisk can, so you actually need an old pc running dos to use those effectively
[18:42] Lord_Nigh, not even with dd
[18:42] for most NORMAL disks with 512 byte sectors of the usual number per track, dd or ddrescue in linux works fine
[18:43] dd will sort of choke on 2m formatted disks iirc since the sector size per track can vary
[18:43] though it might work. maybe i'm wrong
[18:44] "normal" 1.44mb floppies have 18 sectors per track, two sides, 80 tracks and 512 bytes per sector for a total of 1474560 bytes of storage space
[18:45] dd can read those fine
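For the standard case just described (80 tracks x 2 sides x 18 sectors x 512 bytes = 1,474,560 bytes), a plain sector dump is enough: on Linux, something like dd if=/dev/fd0 of=floppy.img bs=512 conv=noerror,sync, or ddrescue for flakier media. The Python sketch below does the same thing by hand, zero-filling unreadable sectors so the image keeps the right size; the device path and output name are just examples, and the odd formats mentioned above (dmf, 2m and friends) still need ImageDisk or Teledisk on real DOS hardware.

    SECTOR = 512
    TRACKS, SIDES, SECTORS_PER_TRACK = 80, 2, 18
    # 80 tracks x 2 sides x 18 sectors x 512 bytes = 1,474,560 bytes, as noted above.
    EXPECTED = TRACKS * SIDES * SECTORS_PER_TRACK * SECTOR

    def image_floppy(device="/dev/fd0", out_path="floppy.img"):
        """Dump a standard-format 1.44MB floppy sector by sector, zero-filling read errors."""
        bad = 0
        with open(device, "rb", buffering=0) as dev, open(out_path, "wb") as out:
            for n in range(EXPECTED // SECTOR):
                try:
                    dev.seek(n * SECTOR)
                    chunk = dev.read(SECTOR)
                    if len(chunk) != SECTOR:
                        raise OSError("short read")
                except OSError:
                    chunk = b"\x00" * SECTOR   # same spirit as dd's conv=noerror,sync
                    bad += 1
                out.write(chunk)
        print("wrote %d bytes to %s (%d unreadable sectors zero-filled)"
              % (EXPECTED, out_path, bad))

    if __name__ == "__main__":
        image_floppy()

As with dd, this only works for ordinary 512-byte-per-sector layouts; anything with per-track sector-size changes is exactly the case where the flux- and metadata-aware imagers are the right tool.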
[18:51] 502455 and im over 500k
[19:30] SketchCow: sounds like you need a trace machine
[19:33] electron microscope ftw.
[19:38] Smiley: http://www.ebay.com/itm/Trace-Tracer-ST-3-5-Standalone-Automatic-Floppy-Diskette-Duplicator-/140775279176
[19:38] something like that
[19:38] :P
[19:47] anyone else have ideas of more content to add to this page http://www.archiveteam.org/index.php?title=How_to_use_our_wiki
[19:49] same seller - steel bars..... wat
[19:50] Smiley: that was a recycler I think
[20:25] Downloaded: 17298 files, 175G in 2h 52m 11s (17.3 MB/s)
[20:25] FINISHED --2013-03-31 14:28:08--
[20:27] wow...
[20:27] Asimov Apple archive.
[20:28] ah
[20:28] be warned, stuff keeps getting added to it
[20:29] You don't say
[20:29] I better come up with a brand new way of handling this terrible new problem, like putting a date on it or something.
[20:29] ;)
[20:29] Of the 175gb, 150 of it is documentation and emulators, by the way
[20:35] Also, jesus, Formspring.
[20:37] I WISH I'D BEEN INFORMED ABOUT FORMSPRING
[20:38] Because even though I'd cleared that drive nicely, we're up to 7.3gb of downloaded formspring.
[20:38] ugh...
[20:39] 7.3tb already, shit that is a big site
[20:39] God, I hope that they don't close it down to the public today.
[20:40] SketchCow: I've been screwing with yahoo group archiving
[20:40] god forbid yahoo ever shuts groups down
[20:40] so much is only accessible to approved group members :/
[20:40] id offer some hints, but its been 3 years or so since ive been in the labs
[20:40] grrr
[20:40] my bad
[20:41] doing too much at the same time
[20:42] glitch is up to 863mb warc.gz and still chugging
[20:45] 3370765453 2013-03-31 22:45 formspring.me-MadzNasri-20130330-072030.warc.gz
[20:45] 3.2GB -.-
[20:45] got another 1.2GB one going too
[20:46] - Downloaded 193540 URLs, found 19813 usernames
[20:46] chug chug ... zzz
[20:57] balrog_ | so much is only accessible to approved group members :/
[20:57] a robot that asks for access to all the groups?
[21:00] chronomex: a lot of groups require a reason to join
[21:00] that a moderator reviews
[21:01] sure
[21:01] I know
[21:18] > Content-Length: 27771838382
[21:18] Fuck yesssssssss
[21:19] that is big
[21:19] what is that?
[21:20] Asimov Archive
[21:21] Minus documentation and emulator sections.
[21:22] Are the docs all scans?
[21:25] They range. Greatly.
[21:28] wow
[21:44] http://archive.org/details/asimov.apple.archive.2013.03
[21:44] There it is, ready to hate life.
[21:45] (It's duping in the 27gb zip file)
[22:33] https://vimeo.com/61059533 just came up.
[22:34] This talk is a very interesting one for one main reason: Waza was held in a large, long space, with other things going on, and so the divisions between the speaking areas and the chatting and the food areas were not very present. As a result, you are seeing what I do when I am getting ZERO feedback from the audience because I simply can't hear them.
[22:35] SketchCow, That is rough
[22:42] Mostly, I think it makes me seem a bit pushy, because I'm not feeding off the audience
[22:43] I got a few laughs I could hear, but not much else.
[22:49] At certain points it is slightly faster than what you show in previous talks, but it is still good.
[22:49] downloading it now
[22:49] Also their camera work, lighting and sound is quality
[22:49] your talk
[22:49] Was it bright on that stage?
[23:54] https://archive.org/details/don_maslin_archive is another candidate for disk drives