#archiveteam 2013-03-31,Sun

Time Nickname Message
00:17 daxelrod I noticed archiveteam.org is 508ing. Is there somewhere we can donate to help pay the bills?
00:23 Smiley hmmm it's loading here, thats weird
00:23 daxelrod Loading again for me
01:11 Antimac !list
02:31 SketchCow I'm not afraid of being donated cash, who cares about the legal.
02:31 SketchCow But the archive.org advantages to donating there instead are huge.
02:52 mutoso Is there a way to have 1 warrior instance work on multiple projects?
03:02 yipdw mutoso: you can switch back and forth to get different jobs in the warrior's queue, but there's no permanent way to say "allocate N workers to this, M to that"
03:04 mutoso Darn, alright.
03:06 WiK 495470 welp perhaps tomorrow ill hit 500k
04:04 bsmith094 new small shortener i just heard about http://dft.ba
04:14 bsmith094 if anyone wants all, or at least everything up to the very recent, of ao3, here http://archive.org/details/Ao3ArchiveCrawl done with this https://code.google.com/p/fanficdownloader/ . i dont know if i posted this already, but i thought it would be good anyway.
04:15 bsmith094 btw, thats just the stories
07:21 mansgrf http://xteensx.info/mia-manarote-a-spanish-afternoon-hot-hardcore-scene/
08:41 omf_ Why did people add last.fm info to the wiki? I thought this was supposed to be an under our hat project
08:42 BlueMax well it's out in the public now...
08:45 omf_ Nothing on the wiki page gives away any important info so that is good.
08:46 omf_ Please review the IRC channel list and status to see if anything is missing http://www.archiveteam.org/index.php?title=IRC
09:02 GLaDOS omf_: #archiveteam-twitter isn't archiving twitter. It's tweets about us on twitter.
09:02 GLaDOS All outputted by swebb bot
09:03 omf_ aah
09:03 omf_ I will fix that
09:04 BlueMax Also get rid of #ArchiveMeme, it doesn't exist anymore
09:06 GLaDOS I'd rename "In use channels" to "General channels"
09:06 GLaDOS And put -twitter up in there
09:08 omf_ ugh "Resource Limit Is Reached" again and again
09:11 omf_ Got the changes saved.
09:14 GLaDOS Oh boy, people who develop websites for case-insensitive filesystems!
09:14 GLaDOS FUCK YOU DEVELOPER
09:14 omf_ Is it at least unicode aware
09:15 GLaDOS Not sure.
09:16 omf_ What language is it in?
09:16 Smiley GLaDOS: opz
09:16 Smiley me, omf
09:16 Smiley in all channels.
09:17 Smiley Plz '_'
09:17 omf_ \o/
09:17 Smiley ty bud.
09:17 Smiley Helps to identify us to people asking questions
09:17 Smiley and epeen, ofc
09:17 GLaDOS It's all about epeen
09:18 GLaDOS Also, let's all go to the #archiveteam-bs
09:19 germnboy7 http://xteensx.info/italian-amateur-slut/
09:20 omf_ (╯°□°)╯︵ ㄥʎoquɯɹǝƃ
12:23 omf_ Here is a well thought out and extensive 74 page paper on web crawling https://research.microsoft.com/pubs/121136/1500000017.pdf
12:23 omf_ I am reading it now.
12:46 omf_ glitch is still going at 830mb
13:25 SketchCow http://www.archiveteam.org/index.php?title=Rescuing_Floppy_Disks is good stuff.
14:29 dashcloud SketchCow: there's some things I wrote on actually archiving DOS/Windows 3.5'' floppies on the discussion page- if they're any good, could you move them to the main page?
15:56 balrog_ so if yahoo was to announce that they're shutting down some portion of yahoo groups, what would we do?
15:57 InitHello cry havoc and let slip the warriors of archive?
15:57 balrog_ well I have this yahoo group archiver script working
15:58 balrog_ problem is that you have to be a member of a yahoo group to access all the data on it
15:58 balrog_ and many groups require approval
15:58 InitHello oh, that would complicate matters
16:06 creature_ On the other hand, if the messages weren't publicly visible in the first place, they probably shouldn't be in a public archive.
16:24 SketchCow dashcloud: I trust you, integrate them
16:26 InitHello say, does the tracker assign items based on size and past performance? E.g. larger items to clients that have previously completed large items?
16:26 InitHello (if that metric is even known to the tracker)
16:28 alard InitHello: No, it assigns items randomly. We generally don't know the size of an item before it's downloaded.
16:29 InitHello yeah, I figured the size would be hard to pre-determine
16:29 InitHello I just noticed that the longer a warrior runs, the bigger the datasets tend to get
16:48 SketchCow Well, that's not entirely true.
16:49 SketchCow But what does happen is that longer sets take longer, while little sets get chewed through crazily.
16:49 InitHello right, that's a more reasonable explanation
16:49 SketchCow So if you have enough people in, all the 1k files get slammed through in seconds, meanwhile the 250mb mofos sit there and ruin someone's day
16:49 SketchCow Eventually, it's all 250mb mofos
16:50 SketchCow Or, and we've seen this on some runs, 5gb mofos
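
The effect SketchCow describes can be reproduced with a toy simulation. The sketch below is not the tracker's code: the item counts, the costs (1 tick for a small item, 250 ticks for a large one) and the worker count are made-up numbers, and random assignment is modelled as handing out a shuffled queue. It counts how many busy workers hold a large item right after the first hand-outs and again at the last hand-out.

```python
import heapq
import random


def simulate(n_small=1000, n_large=50, small_cost=1, large_cost=250,
             workers=20, seed=0):
    """Hand out a shuffled queue to a fixed pool of workers.

    Returns one (time, large_items_in_flight) sample per hand-out.
    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    queue = [small_cost] * n_small + [large_cost] * n_large
    rng.shuffle(queue)  # random assignment: item size is unknown up front

    in_flight = []      # min-heap of (finish_time, cost), one entry per busy worker
    samples = []
    now = 0
    for cost in queue:
        if len(in_flight) == workers:
            # Every worker is busy: wait until the earliest one finishes.
            now, _ = heapq.heappop(in_flight)
        heapq.heappush(in_flight, (now + cost, cost))
        large = sum(1 for _, c in in_flight if c == large_cost)
        samples.append((now, large))
    return samples


samples = simulate()
early = samples[19][1]   # just after all 20 workers got their first item
late = samples[-1][1]    # at the moment the last item is handed out
print("large items in flight early:", early)
print("large items in flight late: ", late)
```

With these assumed numbers the queue keeps roughly the same mix throughout; only the set of items in flight drifts toward the large ones, because a small item leaves its worker almost as soon as it arrives.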
16:50 InitHello I'm running it on one of my servers, so nothing is being ruined for me :D
17:01 SketchCow GIF for "ARCHIVE TEAM WARRIORS RUNNING AT FULL CAPACITY" http://i.imgur.com/M8Ul6p5.gif
17:13 Trancer I am chewing some 5GB mofo atm
17:13 Trancer or well two actually, humbug
17:17 Smiley 742MPH
17:18 Smiley 250Mb mofos? We are getting 8Gb mofos from formspring XD
17:18 Smiley Someone needs to poke alard and tell him to reassign the formspring user. I have failed due to a dodgy desk :*(
18:05 SketchCow It'll work out.
18:11 dashcloud thanks SketchCow - I added the info to the main page on dumping DOS/Win floppies
18:11 dashcloud feel free to change or adjust the page as needed
18:19 SketchCow Just a quick note
18:19 SketchCow The rest of the article is written in a third person, here's how it is done style.
18:20 SketchCow Yours is written like HAI GUYS THIS IS HOW I DOS
18:20 SketchCow Easily fixable, and I will, but keep it in mind in the future
18:20 SketchCow Also, got italics wrong. Use preview next time and do repairs before it's clicked in.
18:23 Lord_Nigh dashcloud: in short: write protect all your disks FIRST (since windows likes to write media ids to the boot sector which will 'brick' certain hp boot disks for logic analyzers etc)
18:23 Lord_Nigh and use dd or winimage to do a full cooked sector dump, OR BETTER YET use teledisk or imagedisk to create a .td0 or .imd of the disk, which contains more useful metadata
18:24 Lord_Nigh imagedisk and teledisk should be able to correctly image weird-sectored formats like dmf, 2m-f, 2m-m and that funky linux fdutil format
18:25 SketchCow So much nerd
18:25 dashcloud please add that to the wiki page- I wrote what I know, no more
18:25 SketchCow I'm ripping this out, dashcloud
18:25 dashcloud that's okay
18:25 SketchCow I think this sort of intense how-to needs to be a different page linked from this one.
18:25 SketchCow And if Lord_Nigh wants to out-nerd on this process, that should be on the broken-out page too.
18:28 omf_ SketchCow, I just updated http://www.archiveteam.org/index.php?title=How_to_use_our_wiki with your points.
18:31 Lord_Nigh dashcloud: the info you posted was correct though could use some expanding upon
18:37 SketchCow Oh jesus, someone sent me 500 3.5" disks
18:37 SketchCow dashcloud: Make a new page, called RAWRITE
18:37 SketchCow Put your stuff there
18:37 SketchCow Wait
18:37 SketchCow No.
18:37 SketchCow Make a new page called DOS Floppies
18:37 SketchCow Put it there
18:37 SketchCow Then Lord Nigh can come in and make it, apparently perfect.
18:37 SketchCow Sound good?
18:38 SketchCow We'll then link from the Floppies page.
18:38 dashcloud I'm actually leaving shortly for dinner- if no one gets to it before then, I'll do it when I get back
18:38 dashcloud thanks for looking at it
18:42 Lord_Nigh i can't make it perfect; linux doesn't allow low enough level access to the floppy controller to image some weird disks which teledisk and imagedisk can, so you actually need an old pc running dos to use those effectively
18:42 omf_ Lord_Nigh, not even with dd
18:42 Lord_Nigh for most NORMAL disks with 512 byte sectors of the usual number per track, dd or ddrescue in linux works fine
18:43 Lord_Nigh dd will sort of choke on 2m formatted disks iirc since the sector size per track can vary
18:43 Lord_Nigh though it might work. maybe i'm wrong
18:44 Lord_Nigh "normal" 1.44mb floppies have 18 sectors per track, two sides, 80 tracks and 512 bytes per sector for a total of 1474560 bytes of storage space
18:45 Lord_Nigh dd can read those fine
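
Lord_Nigh's byte count follows directly from the geometry he lists. The sketch below only redoes that arithmetic and uses it as a rough sanity check on a raw sector dump; the function names and the idea of checking the file size are illustrative assumptions, not part of any tool mentioned in the channel.

```python
import os


def floppy_capacity(sectors_per_track=18, sides=2, tracks=80,
                    bytes_per_sector=512):
    """Raw capacity in bytes for a fixed-geometry floppy."""
    return sectors_per_track * sides * tracks * bytes_per_sector


def looks_like_standard_image(path):
    """True if a raw dump has exactly the size of a 'normal' 1.44mb disk.

    This is a size check only; it says nothing about whether the
    sectors were read correctly.
    """
    return os.path.getsize(path) == floppy_capacity()


print(floppy_capacity())  # 18 * 2 * 80 * 512 = 1474560
```

A dump of a different size does not have to be bad: formats with a non-standard sector layout, such as the 2m disks mentioned above, will not match this geometry.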
18:51 WiK 502455 and im over 500k
19:30 balrog_ SketchCow: sounds like you need a trace machine
19:33 Smiley electron microscope ftw.
19:38 balrog_ Smiley: http://www.ebay.com/itm/Trace-Tracer-ST-3-5-Standalone-Automatic-Floppy-Diskette-Duplicator-/140775279176
19:38 balrog_ something like that
19:38 balrog_ :P
19:47 omf_ anyone else have ideas of more content to add to this page http://www.archiveteam.org/index.php?title=How_to_use_our_wiki
19:49 Smiley same seller - steel bars..... wat
19:50 balrog_ Smiley: that was a recycler I think
20:25 SketchCow Downloaded: 17298 files, 175G in 2h 52m 11s (17.3 MB/s)
20:25 SketchCow FINISHED --2013-03-31 14:28:08--
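
The rate in that summary line can be checked by hand. The sketch below assumes the "G" and "MB" in the output are 1024-based units; that is an assumption about the tool's reporting, and the function name is made up for illustration.

```python
def rate_mib_per_s(gib, hours, minutes, seconds):
    """Average transfer rate in MiB/s for a download of `gib` GiB."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return gib * 1024 / elapsed


rate = rate_mib_per_s(175, 2, 52, 11)
print(f"{rate:.1f} MB/s")  # 179200 MiB / 10331 s, prints "17.3 MB/s"
```

Under decimal units the same figures would give about 16.9 MB/s, so the 1024-based reading is the one that matches the reported 17.3 MB/s.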
20:27 balrog_ wow...
20:27 SketchCow Asimov Apple archive.
20:28 balrog_ ah
20:28 balrog_ be warned, stuff keeps getting added to it
20:29 SketchCow You don't say
20:29 SketchCow I better come up with a brand new way of handling this terrible new problem, like putting a date on it or something.
20:29 Smiley ;)
20:29 SketchCow Of the 175gb, 150 of it is documentation and emulators, by the way
20:35 SketchCow Also, jesus, Formspring.
20:37 SketchCow I WISH I'D BEEN INFORMED ABOUT FORMSPRING
20:38 SketchCow Because even though I'd cleared that drive nicely, we're up to 7.3tb of downloaded formspring.
20:38 balrog_ ugh...
20:39 omf_ 7.3tb already, shit that is a big site
20:39 SketchCow God, I hope that they don't close it down to the public today.
20:40 balrog_ SketchCow: I've been screwing with yahoo group archiving
20:40 balrog_ god forbid yahoo ever shuts groups down
20:40 balrog_ so much is only accessible to approved group members :/
20:40 WiK id offer some hints, but its been 3 years or so since ive been in the labs
20:40 WiK grrr
20:40 WiK my bad
20:41 WiK doing too much at the same time
20:42 omf_ glitch is up to 863mb warc.gz and still chugging
20:45 Trancer 3370765453 2013-03-31 22:45 formspring.me-MadzNasri-20130330-072030.warc.gz
20:45 Trancer 3.2GB -.-
20:45 Trancer got another 1.2GB one going too
20:46 Trancer - Downloaded 193540 URLs, found 19813 usernames
20:46 Trancer chug chug ... zzz
20:57 chronomex balrog_ | so much is only accessible to approved group members :/
20:57 chronomex a robot that asks for access to all the groups?
21:00 balrog_ chronomex: a lot of groups require a reason to join
21:00 balrog_ that a moderator reviews
21:01 chronomex sure
21:01 chronomex I know
21:18 SketchCow > Content-Length: 27771838382
21:18 SketchCow Fuck yesssssssss
21:19 omf_ that is big
21:19 yipdw what is that?
21:20 SketchCow Asimov Archive
21:21 SketchCow Minus documentation and emulator sections.
21:22 omf_ Are the docs all scans?
21:25 SketchCow They range. Greatly.
21:28 yipdw wow
21:44 SketchCow http://archive.org/details/asimov.apple.archive.2013.03
21:44 SketchCow There it is, ready to hate life.
21:45 SketchCow (It's duping in the 27gb zip file)
22:33 SketchCow https://vimeo.com/61059533 just came up.
22:34 SketchCow This talk is a very interesting one for one main reason: Waza was held in a large, long space, with other things going on, and so the divisions between the speaking areas and the chatting and the food areas were not very present. As a result, you are seeing what I do when I am getting ZERO feedback from the audience because I simply can't hear them.
22:35 omf_ SketchCow, That is rough
22:42 SketchCow Mostly, I think it makes me seem a bit pushy, because I'm not feeding off the audience
22:43 SketchCow I got a few laughs I could hear, but not much else.
22:49 omf_ At certain points it is slightly faster than what you show in previous talks, but it is still good.
22:49 godane downloading it now
22:49 omf_ Also their camera work, lighting and sound is quality
22:49 godane your talk
22:49 omf_ Was it bright on that stage?
23:54 DFJustin https://archive.org/details/don_maslin_archive is another candidate for disk drives
