#archiveteam 2012-02-25,Sat


Time Nickname Message
00:04 🔗 Coderjoe ah. just random stuff
00:14 🔗 SketchCow They're uploading 320gb of new stuff soon
00:17 🔗 Coderjoe "just random stuff" referring to the DDR and traffic crash slides
00:18 🔗 Coderjoe I think, this weekend, I will locate and prepare my stage6 data for upload
00:33 🔗 soultcer haha I just noticed underscor's t-shirt and that he is standing in front of what appears to be the IA building
00:38 🔗 Coderjoe doh
00:38 🔗 Coderjoe http://games.slashdot.org/story/12/02/25/0011254/inventor-of-the-modern-pinball-machine-dies-at-100?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29
00:41 🔗 Coderjoe GAH
00:41 🔗 Coderjoe missed out on the floppy diskette shirt.woot
00:42 🔗 Coderjoe tried ordering it last night only to run into trouble. made a deposit today that would allow me to use a different card, but now it is sold out
01:14 🔗 SketchCow Oh no! What did the shirt look like?
01:16 🔗 arrith http://sale.images.woot.com/1971-Floppy_Diskette2wwDetail.png
01:16 🔗 arrith from http://shirt.woot.com/Forums/ViewPost.aspx?PostID=4880141&PageIndex=1&ReplyCount=179
01:18 🔗 undersco2 Schbirid: Aww, thanks <3
01:18 🔗 undersco2 hahahahahaha
01:18 🔗 undersco2 Coderjoe_: They'll have it again tomorrow for $15
01:19 🔗 undersco2 SketchCow: Relevant to your interests? http://techcrunch.com/2012/02/23/and-now-theres-a-kickstarter-for-porn/
01:19 🔗 undersco2 :D
01:19 🔗 arrith undersco2: after 2girlsetc i'm not sure the internet needs to have much of a voice in exploring new areas for porn
01:20 🔗 undersco2 hahahah
01:23 🔗 Coderjoe_ hey now. the internet didn't make 2girls or swap.avi
01:23 🔗 Coderjoe they just happened to stumble upon them
01:27 🔗 undersco2 hello.jpg
01:29 🔗 DFJustin the internet funded swap.avi
01:29 🔗 undersco2 hahaha
01:36 🔗 SketchCow OK, heading out again. But I'll see stuff in here.
01:37 🔗 SketchCow pda12 talk went well!!!!!!
01:47 🔗 shaqfu So I'm an unemployed archivist, and while I don't have the heavy-duty bandwidth/storage most of your projects seem to need, I'd like to contribute
02:03 🔗 arrith HOLY
02:03 🔗 arrith Bibliotik has shutdown all operations. We are no longer able to assume the risks involved. The staff would like to apologize for the sudden (but necessary) decision and thank everyone that participated and made Bibliotik such a great place for so long. We love you guys!
02:03 🔗 arrith http://bibliotik.org/
02:03 🔗 arrith wohamagod
02:04 🔗 Coderjoe what was it?
02:04 🔗 arrith ebook/elearning private torrent tracker site
02:04 🔗 arrith best besides library.nu i think
02:05 🔗 Coderjoe brace for "library of alexandria" comments
02:06 🔗 arrith :(
02:12 🔗 dashcloud shaqfu: the archive team motto is "We are going to rescue your shit!" http://archiveteam.org/index.php?title=File:Archiveteam.jpg so start looking out for user-content sites shutting down, and back them up
02:15 🔗 dashcloud here are some smaller projects where people who saw something happening (or about to happen) decided to archive it: http://www.archive.org/search.php?query=collection%3Aarchiveteam-fire&sort=-publicdate&page=1
02:17 🔗 shaqfu dashcloud: Sounds good; thanks
02:17 🔗 dashcloud and if you have lots of time, but not much storage or bandwidth, there's vast amounts of items that could use metadata- magazines and shareware CDs
02:19 🔗 shaqfu Gotcha; is there a project list of those, or should I go through and start marking up anything that needs it?
02:21 🔗 dashcloud email SketchCow at metadata@textfiles.com and ask for one- he's good at getting back to you
02:22 🔗 shaqfu Got it
02:22 🔗 dashcloud actually- http://archiveteam.org/index.php?title=Metadata_warriors
02:22 🔗 dashcloud that gives you a better idea of the whole thing, and some suggestions
02:23 🔗 dashcloud you'll want to do the online version- it works a lot better than the PDF in my experience
02:28 🔗 shaqfu Seems straightforward enough
02:31 🔗 shaqfu I gotta head out in a bit, but I'll send off an email tonight
02:31 🔗 dashcloud if you have any questions, just ask
02:34 🔗 shaqfu Are we preferring human or machine readability?
02:35 🔗 dashcloud take a look at one of the examples in the wiki page- it should have some structure, and be consistent thru all the issues you do
02:35 🔗 shaqfu Gotcha
02:38 🔗 dashcloud it's really fascinating, though, because you get to see the evolution of a magazine the way not many have: you can look at the issues one after another and immediately notice how it's changing
02:39 🔗 shaqfu Yeah
02:39 🔗 shaqfu I've done a lot of manuscript processing/metadata before; that's the best part about it
02:42 🔗 shaqfu Zzap64 sounds fun - ten years of C64 talk
03:36 🔗 Coderjoe_ mmm
03:39 🔗 Coderjoe hey at&t: while I am thankful that you're not contributing to buffer bloat hell, I would also appreciate it if you would expand capacity on the first three hops from my house so you're not giving 90% packet loss. thanks.
06:03 🔗 undersco2 arrith: http://gen.lib.rus.ec
06:03 🔗 undersco2 I'm almost halfway done downloading the entire contents
06:03 🔗 arrith undersco2: nice
06:03 🔗 undersco2 Then I'll probably make it available for a short time to people who want a mirror
06:04 🔗 arrith i've heard people linking that. people are trying to port stuff
06:04 🔗 undersco2 It's the best 'tik replacement I've found so far
06:04 🔗 arrith the hydra needs to kick in, as in a bunch of bibliotik clones and library.nu clones need to spring up
06:04 🔗 arrith it'd be nice if one of those sites at least released their db/index
06:04 🔗 undersco2 Yeah
06:05 🔗 undersco2 Well, I'm friends with some of the 'tik admins, trying to see what they'll share
06:05 🔗 undersco2 But something big must have happened
06:05 🔗 undersco2 They're very shaken
06:11 🔗 arrith dang
06:11 🔗 arrith i figured if tpb and demonoid are still up why wouldn't these ebook sites do fine
06:11 🔗 arrith i mean the RIAA has been the big threatening body and what.cd is still doing *fine*
06:12 🔗 arrith undersco2: yeah at least a list of the books/formats would be nice. so there's at least something for a new site to build on
06:25 🔗 undersco2 what.cd takes a LOT of precautions though
06:26 🔗 Coderjoe_ i've not used what.cd, but don't they have some fake wiki-looking front or something?
06:27 🔗 Coderjoe getting tired of at&t's routers losing 90-100% of packets
06:29 🔗 undersco2 Coderjoe: That's waffles
06:30 🔗 undersco2 what.cd just has a poem
06:31 🔗 Coderjoe ah
06:37 🔗 DFJustin good thing no one's blowing their cover in a publicly logged irc channel
06:38 🔗 yipdw that would be terrible
06:40 🔗 Coderjoe never used waffles or what
06:40 🔗 undersco2 what cover
06:40 🔗 undersco2 8D
07:25 🔗 Coderjoe woot. floppy shirt ordered
07:25 🔗 Coderjoe http://shirt.woot.com/friends.aspx?k=24671
07:31 🔗 undersco2 ggfdggesfgdfds
07:31 🔗 undersco2 I need to go find $10
07:31 🔗 undersco2 er, $15
07:31 🔗 undersco2 That is so awesome
07:31 🔗 undersco2 Perfect for my collection
07:31 🔗 undersco2 (I've started to get the reputation that I'm the awesome-shirt-guy)
07:32 🔗 undersco2 Random people I don't know will seek me out to see what I'm wearing
07:32 🔗 undersco2 (in school)
07:32 🔗 undersco2 It's awesome
07:34 🔗 kennethre undersco2: haha, i used to do that
07:34 🔗 kennethre undersco2: far too many dollars spent at thinkgeek
07:34 🔗 undersco2 Yeah, there, and woot
07:35 🔗 undersco2 I honestly could probably go 2 or 3 months without doing laundry, shirt-wise
07:35 🔗 undersco2 One of my friends wrote this https://github.com/habnabit/shurtapp
07:35 🔗 undersco2 Been meaning to set it up for myself
08:25 🔗 kennethre i just go with "a shirt i didn't wear yesterday"
08:25 🔗 kennethre assuming i went somewhere and saw somebody
08:31 🔗 undersco2 http://escism.net/omfgdogs/
08:32 🔗 kennethre wtf happened to hampsterdance.com
08:32 🔗 undersco2 best site ever
08:32 🔗 kennethre they sold out
08:32 🔗 undersco2 I remember discovering that song in 5th grade
08:33 🔗 kennethre haha
08:33 🔗 kennethre i think it was like 4th or 5th for me
08:33 🔗 kennethre word of mouth meme
08:34 🔗 kennethre it had to have been one of the first
08:34 🔗 undersco2 yeah
08:34 🔗 undersco2 :D
08:34 🔗 Coderjoe damn. make me feel old
08:36 🔗 undersco2 btw, loving that hershey's censor
08:39 🔗 Coderjoe my boss was messing around with the camera, threatening to take a picture if i took my hand away from my face
08:39 🔗 Coderjoe had the bar on my desk, decided to see if i could get it to stick
08:39 🔗 undersco2 haha
08:40 🔗 kennethre So, I got a full backup of Convore's data
08:40 🔗 kennethre how do I package this up for an acceptable archive?
08:41 🔗 kennethre i'm planning to make it a fully browsable website
08:42 🔗 yipdw undersco2: omfgdogs on a 30" screen is awesome
08:42 🔗 undersco2 :D
08:42 🔗 undersco2 and max sound
08:42 🔗 undersco2 ;)
08:42 🔗 arrith and f11
08:43 🔗 undersco2 oh man
08:43 🔗 undersco2 this is so trippy
08:43 🔗 undersco2 :D
08:43 🔗 undersco2 I wonder what it's like stoned
08:43 🔗 undersco2 hahahaha
08:43 🔗 kennethre it froze my browser
08:44 🔗 undersco2 It's probably a good pixel massager
08:44 🔗 undersco2 kennethre: Need better browser? :D
08:44 🔗 yipdw I'm gonna wrap it in a Quartz Composer setup
08:44 🔗 undersco2 haha
08:44 🔗 yipdw and use it as my screen saver on my work OS X machine
08:44 🔗 yipdw it will be genius
08:45 🔗 kennethre undersco2: unfortunately they don't get much better :)
08:45 🔗 arrith aw uses flash for sound
08:45 🔗 undersco2 :(
08:45 🔗 undersco2 If only FF played MP3
08:45 🔗 undersco2 yipdw: omg
08:45 🔗 undersco2 That's awesome
08:45 🔗 undersco2 :D
08:46 🔗 yipdw well
08:46 🔗 yipdw I think that's possible
08:46 🔗 yipdw if not I'm sure there's a way to reconstruct that from omfgdogs.{gif,mp3}
08:47 🔗 kennethre man i forgot about screensavers in general
08:47 🔗 * kennethre downloads electric sheep
08:47 🔗 Coderjoe don't forget after dark and more after dark
08:48 🔗 yipdw FLYING TOASTERS
08:49 🔗 undersco2 http://29.media.tumblr.com/tumblr_lzoh9mh6Bd1qhhhaco1_500.gif Did I already link this?
08:49 🔗 undersco2 Oh man
08:49 🔗 undersco2 After Dark
08:49 🔗 undersco2 That brings back memories
08:51 🔗 undersco2 Oh maaaaan
08:51 🔗 undersco2 I love holiday lights
08:51 🔗 undersco2 Any of you use that?
08:51 🔗 undersco2 I had that running on the G3AIO in my room in like 4th grade
08:51 🔗 undersco2 Thought I was the coolest shit
08:51 🔗 undersco2 haha
08:51 🔗 undersco2 http://downloads.yahoo.com/software/macintosh-knick-knacks-holiday-lights-s10668
08:54 🔗 DFJustin http://www.archive.org/details/tucows_204921_Holiday_Lights
08:55 🔗 yipdw http://www.youtube.com/watch?v=StA81MNuqB8
08:55 🔗 undersco2 http://support.tigertech.net/old-software
08:55 🔗 undersco2 <3
15:45 🔗 Schbirid SketchCow mentioned a way to browse iso files on archive.org by adding a slash or so. how would that work eg for http://www.archive.org/details/CdZoneIssue48march1997 ? if it is already public
15:45 🔗 Schbirid oh i am so smart
15:45 🔗 Schbirid http://www.archive.org/download/CdZoneIssue48march1997/cdzone48_march1997.iso/
15:45 🔗 Schbirid tried it on the direct http link before
15:46 🔗 Schbirid fantastic
15:55 🔗 Coderjoe http://www.reddit.com/r/compsci/comments/q4e57/help_save_the_worlds_first_webserver_we_need_to/
15:55 🔗 Coderjoe updates: found copies of versions 0.0, 0.1, and 0.2 on an MIT mirror of the cern ftp server.
15:57 🔗 Coderjoe someone should possibly contact MIT about getting a mirror into IA. (or something. yes, we're assholes, but I would rather not have MIT see tons of traffic and pull the files because of many redditors plus others attempting to mirror it)
15:59 🔗 Coderjoe heh
15:59 🔗 Coderjoe http://www.reddit.com/r/compsci/comments/q4e57/help_save_the_worlds_first_webserver_we_need_to/c3uq8w5
16:02 🔗 Coderjoe O_O
16:02 🔗 Coderjoe http://www.openafs.org/
16:09 🔗 Schbirid damn, that iso viewer serves files with no proper timestamp on download
16:09 🔗 Coderjoe cool idea, but i'd rather keep something at arms-length from kernelspace
16:10 🔗 Coderjoe er, something like that
16:43 🔗 alard SketchCow: I'm currently doing a test run with the direct-to-the-archive mobileme script, but I think it may take too long to fill 10GB.
17:46 🔗 undersco2 alard: Do you want me to try it? I've got gbit.
17:48 🔗 alard undersco2: Well, to be honest, I tried one on batcave. :) It's just the slowness of mobileme.
17:48 🔗 undersco2 Oh, alrighty, hehe
17:49 🔗 undersco2 That's odd, though
17:49 🔗 undersco2 I can fill 10GB in around an hour
17:49 🔗 alard Well, if you start wget --mirror it takes a long time.
17:49 🔗 undersco2 Yeah, some of those do
17:49 🔗 alard It's just one at a time, not multiple.
17:49 🔗 undersco2 Ohhhh
17:49 🔗 undersco2 Nevermind then ^^;
17:50 🔗 alard The idea is that kennethre will run heroku instances that run a seesaw script, and each instance would download 10GB, create a tar file and upload it to the archive.
17:51 🔗 alard But these instances are killed every 24 hours, so if it takes too long to finish 10GB you'll lose a lot of data.
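[Editor's note] The flow alard describes here — pull users until a size threshold is reached, tar the lot, upload, then exit so Heroku spawns a fresh dyno — can be sketched as below. This is a hypothetical reconstruction, not alard's actual script; the tracker, wget, and archive.org calls are injected as stubs so only the loop's shape is shown.

```python
import tarfile
import tempfile

# 10 GB threshold from the discussion above; dynos get recycled roughly
# daily, so one chunk has to finish well inside that window.
SIZE_LIMIT = 10 * 1024**3

def run_once(claim_user, download_user, upload, size_limit=SIZE_LIMIT):
    """One seesaw cycle: fetch users until size_limit, tar, upload, stop.

    claim_user() -> username, or None when the queue is empty
    download_user(user, dest_dir) -> bytes downloaded
    upload(tar_path) -> None
    All three are injected, keeping the tracker/wget/archive.org details
    out of this sketch.
    """
    workdir = tempfile.mkdtemp(prefix="seesaw-")
    total, users = 0, []
    while total < size_limit:
        user = claim_user()
        if user is None:
            break
        total += download_user(user, workdir)
        users.append(user)
    tar_path = workdir + ".tar"  # sibling of workdir, not inside it
    with tarfile.open(tar_path, "w") as tar:
        tar.add(workdir, arcname="data")
    upload(tar_path)
    return users, total
    # the real process exits here; Heroku starts a fresh instance
```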
17:53 🔗 alard The upload was successful, by the way: http://www.archive.org/details/archiveteam-mobileme-hero-1
17:54 🔗 alard (They wouldn't let me add it to the archiveteam-mobileme collection.)
17:55 🔗 Nemo_bis Cute!
17:55 🔗 Nemo_bis The permissions can be fixed I hope.
17:57 🔗 alard Permissions?
18:07 🔗 alard SketchCow: I think I've got a solution for the 10GB-is-a-bit-much problem. So if you like the upload (and perhaps find a way to fix the collection) I think we're ready.
18:19 🔗 undersco2 Do the instances have 10GB of local disk?
18:19 🔗 undersco2 (Oh, I suppose that's a better question for kennethre, heh)
18:44 🔗 Nemo_bis undersco2, he said it's basically unlimited
18:44 🔗 undersco2 oh okay, cool
20:16 🔗 DFJustin well you could say upload 10x 1gb files to each item
20:21 🔗 yipdw huh
20:21 🔗 yipdw heroku is down
20:21 🔗 yipdw kennethre killed it
20:21 🔗 yipdw :P
20:22 🔗 yipdw well, at least i can't get to www.heroku.com
20:30 🔗 undersco2 lol
20:39 🔗 yipdw for more lol, see https://twitter.com/#!/search/heroku
21:06 🔗 DFJustin looks like more shareware is on the way :) http://www.archive.org/details/classicpcgames http://demu.org/news-detailed/11
21:07 🔗 undersco2 Ooh
21:07 🔗 undersco2 Make sure to tell SketchCow so he can connect them
21:15 🔗 * Nemo_bis preparing to send 100 kg of stuff to him
21:37 🔗 Schbirid i wish http://www.archive.org/details/F-15StrikeEagleIiiDemo would include version numbers
21:37 🔗 Schbirid err
21:37 🔗 Schbirid http://www.archive.org/details/classicpcgames i mean
21:39 🔗 Nemo_bis Schbirid, what do you mean?
21:39 🔗 Nemo_bis He's doing a good job with metadata it seems.
21:39 🔗 Schbirid sometimes there are multiple versions/demos of games
21:40 🔗 Nemo_bis hm
21:40 🔗 Nemo_bis Some items on http://www.archive.org/details/open_source_software look slightly offtopic.
21:41 🔗 Schbirid heh
21:52 🔗 chronomex well, you know, there's a lot of software in the world.
22:14 🔗 kennethre yipdw: yikes
22:15 🔗 yipdw seems to be alive again
22:15 🔗 kennethre yeah we're back in action
22:15 🔗 kennethre i slept through it
22:15 🔗 yipdw heh
22:18 🔗 SketchCow Wjaaaaa
22:21 🔗 godane just found out how to get forks in github i think
22:22 🔗 godane curl -s http://github.com/api/v2/yaml/repos/show/cjdelisle/cjdns/network
22:22 🔗 SketchCow I just mailed swizzle.
22:23 🔗 kennethre godane: github v3
22:24 🔗 kennethre godane: curl https://api.github.com/repos/kennethreitz/requests/forks
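[Editor's note] The v3 endpoint kennethre shows returns a JSON array of repository objects; pulling the fork names out of such a response is a one-liner (a sketch using the v3 API's documented `full_name` field; fetching and `?page=N` pagination are left to curl as above).

```python
import json

def fork_names(api_json):
    """Extract owner/repo names from a GitHub v3 /forks response body."""
    return [repo["full_name"] for repo in json.loads(api_json)]
```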
22:25 🔗 alard SketchCow/kennethre: The script is ready, I think. http://www.archive.org/details/archiveteam-mobileme-hero-1
22:25 🔗 kennethre fantastic
22:25 🔗 kennethre does anything need to happen before I start using it?
22:26 🔗 alard A quick check by SketchCow, perhaps a fix to get the items in the archiveteam-mobileme collection (I wasn't allowed to do that).
22:26 🔗 kennethre alard: where's the actual script?
22:27 🔗 alard The general idea is: instance downloads at least 10GB of data, makes a tar file, uploads to archive.org, dies. Heroku starts again.
22:27 🔗 kennethre perfect
22:27 🔗 kennethre exactly what i need
22:27 🔗 alard So that in theory the instance will never be running for more than 24 hours, and will not be recycled by heroku.
22:27 🔗 kennethre any dyno can die at any time
22:28 🔗 kennethre but typically it's once every 24 hours
22:28 🔗 kennethre yeah they get 10GB within an hour or two
22:28 🔗 kennethre so it won't be a problem
22:28 🔗 alard Yeah, so the probability of dying increases over time, I thought.
22:29 🔗 alard The script is still with me, perhaps there'll be a few changes. I used your Heroku repository and am currently running one instance.
22:30 🔗 alard What are the scrape, scrape2, scrape3, scrape4 for?
22:30 🔗 kennethre alard: i can only scale each process to 100
22:30 🔗 alard Does that mean that four scripts running in the same dyno?
22:30 🔗 alard Ah, good.
22:30 🔗 kennethre nah, all separate dynos
22:30 🔗 kennethre every process is isolated
22:31 🔗 kennethre alard: how long did it take to run the one instance?
22:31 🔗 alard Since you can run multiple seesaw scripts in the same directory, but the new script assumes it's on its own.
22:31 🔗 alard I did a test run on batcave, that took five to six hours.
22:32 🔗 alard There was one user with a very large web.me.com site, so that took a while.
22:32 🔗 alard The heroku instance is up for 1 hour, says heroku ps, and is at 4% of 10GB.
22:33 🔗 kennethre hmm
22:33 🔗 kennethre i feel like making it smaller (e.g. 5GB) would make things easier
22:33 🔗 kennethre but perhaps not
22:33 🔗 alard It really depends on the selection of users. My first run needed 8 users to reach 10GB, but the current run already has more than that at 4%
22:36 🔗 alard We should ask SketchCow about the file size. I think 5GB might be better, but 10GB will probably work too.
22:37 🔗 kennethre the smaller, the better the chance that everything goes smoothly
22:37 🔗 kennethre but whatever works best for the archive
22:37 🔗 kennethre that's a lot of data
22:38 🔗 kennethre alard: does this still interact with the tracker dashboard? :)
22:39 🔗 alard Of course. And you'll get credit as soon as you've downloaded something, even if you don't upload. (That's because that's easiest. We should do a clean-up step later.)
22:40 🔗 kennethre oh i thought the tracker only did uploads
22:40 🔗 kennethre awesome
22:40 🔗 kennethre not that it matters, obviously, but it makes things more fun
22:41 🔗 Jkid Evening everyone
22:41 🔗 Schbirid hello
22:42 🔗 Jkid I run Yotsuba Society
22:42 🔗 SketchCow HEre's deal.
22:42 🔗 SketchCow Oof
22:42 🔗 Jkid ...
22:42 🔗 Jkid Here's deal?
22:42 🔗 SketchCow Here's the deal. My concern is not that the individual filesizes are anything, be they 5gb or 10gb or whatever. It's item-size I care about.
22:43 🔗 SketchCow So all I care about is making them, say, 100-200gb apiece.
22:43 🔗 SketchCow So we're only dealing with a few thousand items or less, instead of tens of thousands of items.
22:43 🔗 SketchCow 200gb an item, that's going to be, like, 1200 items.
22:44 🔗 SketchCow So that's mostly it. Injecting files into the same items until it's known the items are at something near 200gb, then moving onto the next item.
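[Editor's note] The packing rule SketchCow describes — keep injecting files into one item until it sits near the cap, then move to the next — is a simple greedy loop. A sketch (the cap and file list are parameters; names are made up):

```python
# ~200 GB per archive.org item, per the discussion above
ITEM_LIMIT = 200 * 1024**3

def pack_items(file_sizes, limit=ITEM_LIMIT):
    """Greedily group uploads into items of at most `limit` bytes.

    file_sizes: list of (name, size_in_bytes) tuples, in upload order.
    Returns a list of items, each a list of file names.
    """
    items, current, current_size = [], [], 0
    for name, size in file_sizes:
        # close the current item once the next file would overflow it
        if current and current_size + size > limit:
            items.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        items.append(current)
    return items
```

With 5 GB tarballs this fills 40 files per item, which is the "40 files of 5GB" figure alard floats a bit later; at roughly 240 TB total, 200 GB items land near SketchCow's estimate of about 1200 items.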
22:45 🔗 kennethre sounds simple enough
22:47 🔗 SketchCow Yes.
22:47 🔗 Schbirid anyone got tips for simplemachines forum mirroring?
22:48 🔗 SketchCow The needs of the archive's backend are a little different than the needs of our running scripts, I just mostly try and deal with those personally in my ingestion scripts. So since we have to load kenneth with the ingestion, that's how.
22:48 🔗 SketchCow I'd REALLY like us to get on forum mirroring. Does the wiki have something on that?
22:48 🔗 DFJustin Jkid: hi there, you've walked into the middle of a conversation but what's up
22:49 🔗 arrith forum mirroring is sad since it's like mirroring a wiki: if you could get a db dump it's so much cleaner. otherwise it's all this grabbing pages.
22:50 🔗 Jkid Well I'm Ndee "Jkid" Okeh
22:50 🔗 Jkid The Co-Founder and head admin of Yotsuba Society.
22:50 🔗 arrith ideally there'd be something to scan a bunch of static pages from a dump of a site and try to insert them into a db that can be used with mediawiki, but i don't know of anything
22:50 🔗 Schbirid :o
22:50 🔗 SketchCow He's not going to like that.
22:50 🔗 SketchCow Me however Me, I like that very, very, very much.
22:51 🔗 SketchCow Anyway, back on it.
22:51 🔗 SketchCow I agree, I'd prefer the db method, but we're not able to, so getting some sort of functional ripper would be good.
22:51 🔗 SketchCow We have all these dying PHPbb things.
22:51 🔗 Schbirid hm, for SMF the print function seems nice. you get one html page per thread, no pagination
22:52 🔗 Schbirid if you are just after the posts' contents
22:52 🔗 Schbirid avatars, signatures etc would be missing
22:52 🔗 Schbirid in-post links are written with the url behind the text
22:53 🔗 DFJustin a big thing with forums is getting all the off-site images from imageshack etc
22:53 🔗 Schbirid hm, images are not embedded but only shown as url
22:53 🔗 Schbirid yeah
22:54 🔗 DFJustin since getting 14 pages of "LOL LOOK AT THAT CAT" is not too useful :)
22:54 🔗 kennethre So, I got a full backup of convore
22:54 🔗 kennethre it's an api dump essentially
22:54 🔗 kennethre how do we make this a legit archive
22:56 🔗 SketchCow How big
22:57 🔗 kennethre small
22:57 🔗 kennethre less than 100MB for the text
22:57 🔗 kennethre 200MB of avatars
22:58 🔗 kennethre i'm working on reconstructing the whole thing into a browsable website
22:58 🔗 kennethre it's pretty piecemeal, as is, but it's every single piece of public information on the site
22:58 🔗 arrith a forum-software-specific ripper would probably be best. since phpbb, smf, etc each put stuff in special places. would make maintaining the crawler/ripper easier.
22:59 🔗 DFJustin I bet someone's written such a thing
22:59 🔗 arrith then a separate thing could be done later to reform it into a usable db, but i guess that's quite lower in priority to just getting the sites mirrored
22:59 🔗 arrith DFJustin: that would be neat
23:00 🔗 SketchCow So I am for putting a dump on archiveteam, and then a separate project to make archive.
23:00 🔗 SketchCow Fast action first, backed up and there, followed by fix-up
23:01 🔗 kennethre SketchCow: awesome, I have a tarball
23:02 🔗 Schbirid recommended wget settings for a "brute force" forum mirror? how about: -m -nv -nH --append-output=my.log --no-parent --no-clobber --continue --timeout=10 --adjust-extension --convert-links --page-requisites --keep-session-cookies
23:02 🔗 DFJustin kinda old but http://www.phpbbhacks.com/forums/phpbb-mirror-grabbs-boards-get-the-code-here-now-vt64073.html
23:02 🔗 Schbirid last time i tried --span-hosts to also get embedded external images iirc i ended up with much more than that
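[Editor's note] Schbirid's flag list, spread out for readability (host names are placeholders). One correction: `--mirror` implies timestamping, which stock GNU wget refuses to combine with `--no-clobber`, so that flag is dropped here; and `--domains` pins `--span-hosts` to listed hosts, so embedded off-site images come along without the runaway crawl mentioned above. This is a sketch, not a tested recipe for any particular forum.

```shell
# Hypothetical SMF/phpBB forum mirror; adjust hosts to taste.
wget --mirror --no-parent \
     --continue --timeout=10 \
     --adjust-extension --convert-links --page-requisites \
     --keep-session-cookies \
     --span-hosts --domains=forum.example.com,imageshack.us \
     -nv --append-output=mirror.log \
     http://forum.example.com/index.php
```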
23:05 🔗 SketchCow kennethre: Just upload it to archive.org, I'll put it in the right place after you do. archiveteam-convore-panic-grab or whatever you want, then tell me.
23:07 🔗 alard SketchCow: So items with 40 files of 5GB would be OK?
23:07 🔗 SketchCow Yes.
23:08 🔗 alard And is there a way to add the items to the right collection?
23:08 🔗 kennethre alard: i'd love to start running this today
23:08 🔗 alard kennethre: Nearly there, I think.
23:08 🔗 kennethre alard: let me know if there's anything i can do to help
23:09 🔗 alard This 'community texts' looks a bit weird: http://www.archive.org/details/archiveteam-mobileme-hero-1
23:11 🔗 Schbirid i added --header="accept-encoding: gzip"
23:13 🔗 Schbirid now it stops after fetching index
23:13 🔗 Schbirid wicked
23:14 🔗 Schbirid http://lists.debian.org/debian-user/2003/12/msg02235.html
23:15 🔗 Schbirid well, this is a shame
23:16 🔗 alard SketchCow: I saw you moved the item to the right collection, but is there a way to do that in the s3 upload?
23:16 🔗 SketchCow Yes, I need to know the account doing the uploads (e-mail it to me) and an admin will change it in the next day or two.
23:17 🔗 arrith Schbirid: this page has a guy mentioning hearing about a patch to wget to support gzip: http://unix.stackexchange.com/questions/15176/using-wget-what-is-the-right-command-to-get-gzipped-version-instead-of-the-actu
23:17 🔗 alard Okay. So we'll start downloading and correct/upgrade later, is that OK?
23:18 🔗 arrith Schbirid: googling for wget gzip patch gives: http://lists.gnu.org/archive/html/bug-wget/2010-01/msg00032.html
23:20 🔗 Schbirid cheers
23:22 🔗 arrith definitely not pretty. could do a big convoluted wrapper script around wget but it makes so much more sense to just have wget support that. --decompress-gzip or something
23:22 🔗 alard kennethre: I'll send you the script.
23:22 🔗 kennethre alard: i'll be waiting :)
23:22 🔗 arrith i wonder if that patch is going to be merged and if not, why not
23:24 🔗 SketchCow A MUCH better idea than a gzip patch to wget, to be honest, is if wget can export the downloads to stdout
23:26 🔗 alard Isn't that just wget -O - ?
23:28 🔗 SketchCow Is it? If so, then why spend time with .gz when you can just export to - and do gz, bz, 7z, ferretz or mozarella-z
23:28 🔗 kennethre SketchCow: how's the availability of the s3 api?
23:28 🔗 SketchCow How is it? Mostly good.
23:28 🔗 SketchCow Sometimes it shits itself.
23:28 🔗 kennethre so it shouldn't be a concern?
23:28 🔗 SketchCow I would always have error checking in place.
23:28 🔗 SketchCow No, I use it constantly.
23:28 🔗 SketchCow As do others.
23:28 🔗 kennethre awesome
23:28 🔗 Schbirid i am getting quite a lot "WARNING: Upload failed: /myfile ([Errno 32] Broken pipe)" but s3cmd retries automatically and then it usually works
23:29 🔗 arrith "wget -O -" combines all output, but in the case of this gzip thing it would just feed out that one gzipped page, which is the same place you'd be in if you just saved it to disk.
23:29 🔗 alard The curl command in the script is configured to retry the uploads until it succeeds (or at least stops returning errors).
23:29 🔗 arrith still have to go into that file and grab the resources and dl other pages
23:29 🔗 arrith curl might work better with gzipped stuff, dunno about warc support and curl though
23:29 🔗 SketchCow I think it's underfunded and I'm about to make it less underfunded
23:30 🔗 alard But curl doesn't do recursive downloads.
23:30 🔗 dnova I knew someone here would want that floppy shirtwoot
23:30 🔗 arrith ah, didn't know curl didn't recurse
23:31 🔗 Schbirid http://www.youtube.com/watch?v=l9n_TkFIttk AWT Cybertag VR Demo from 1994
23:31 🔗 Schbirid awesome video
23:37 🔗 kennethre back in action!
23:37 🔗 kennethre \o/
23:37 🔗 kennethre thanks, alard / SketchCow
23:38 🔗 alard Let's hope the script works.
23:38 🔗 SketchCow alard: We should get that script running.
23:38 🔗 SketchCow So you want to edit it, or me edit it?
23:39 🔗 alard Well, there's not much to edit save a few filesystem references and the name of the redis queue.
23:39 🔗 alard I can edit it if you give me the new locations, or you can follow the instructions and do it yourself.
23:39 🔗 alard Up to you.
23:40 🔗 kennethre let me write a thing that lets you see this real quick
23:40 🔗 kennethre it's pretty amazing
