[00:00] hmmmm, I don't know exactly what the warcs do, I can't answer that
[00:00] ok
[00:00] Well the warcs contain http headers, the download parameters and a logfile
[00:01] sorry, what is "warcs"?
[00:02] It's a file format that archive.org uses to store archived websites. We use it as well for our projects
[00:02] ok thanks
[00:05] Do you always find enough time and resources to rescue 100% of the website before it is gone?
[00:05] not often
[00:05] maybe 40-60% of the time
[00:05] (as much as one can measure a count of widely-varying sized objects)
[00:06] ok
[00:06] Do you find objection from site owners when you start archiving?
[00:06] depends.
[00:07] some people really take ownership of a site
[00:07] others say "nope, done with that, whatever you want" and don't do anything
[00:07] many site owners object to the higher-than-normal load we offer them
[00:08] for example the posterous project we're running right now is offering significant load to their systems
[00:09] posterous the website has been running a bit slower than usual (responding in ~150% of normal time), to cite a current event
[00:10] and how do you react for the cases when they object? do you continue your mission or stop archiving?
[00:10] we're really having to lay it on them, though, it's a tight time window and a whole bunch of content that costs cpu/database time for them to render
[00:10] fuck no do we stop, this is archiveteam
[00:11] ok :)
[00:11] if it can be had and we think it has a fair chance of being valuable to someone now or in the future, we'll do anything we can to overcome blocks
[00:12] we go to extra lengths to get the actual content, which in some cases (complex dynamic pages, flash) may be much more than any simple web spider will ever get
[00:13] WARC (Web ARCHive) is an ISO Standard File Format by the way.
[00:13] oh ok! Thanks
[00:14] How many are the archive team members?
[00:14] Hard to say, there's no real membership
[00:15] but everyone who helps out, which can be quite many to just a few
[00:15] 15-50 depending on how you count activity
[00:15] there's 105 people in this channel right now
[00:16] at his defcon 19 talk, jason said onstage "fuck you, you are ALL in archiveteam!"
[00:16] :)
[00:16] one question that I've never seen asked is "is it archive team, archiveteam, Archive Team, Archiveteam, or ArchiveTeam?"
[00:16] I tend to go for archiveteam or Archiveteam
[00:17] ok
[00:18] RubyDo: Just curious, for what class is your project?
[00:18] Course: Management of Electronic Documents
[00:19] Master program of Information management in university of Ottawa in Canada
[00:19] library sciences/informatics?
[00:19] ah
[00:20] yes
[00:22] We're like library visitors, saving a burning library
[00:22] we're not as quiet though
[00:23] For your archival plan, you divide the website into many torrents and you use the warrior to download the website,
[00:24] but how do you know the size of a website and the number of parts in order to download all of it?
[00:24] We don't
[00:24] We usually save sites with user-generated content, so we divide the work as "1 item per user on that site"
[00:25] and since it's usually like that, after a while we start seeing averages on users
[00:27] what do you mean by "1 item per user"?
[00:30] I mean how do you guarantee that 2 members are not downloading the same files?
[00:31] There is a tracker that assigns each warrior a list of users
[00:31] ok
[00:31] Thank you very much everybody!
[00:32] np
[00:32] good luck.
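The tracker arrangement described above (one item per site user, each item handed to a single warrior) can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the actual ArchiveTeam tracker: the `Tracker` class, its method names, the warrior ids and the usernames are all hypothetical.

```python
from collections import deque


class Tracker:
    """Hypothetical tracker: hands out each item (one per site user) at most once."""

    def __init__(self, usernames):
        self.todo = deque(usernames)  # items nobody has claimed yet
        self.claimed = {}             # username -> id of the warrior working on it
        self.done = set()             # usernames whose download finished

    def request_items(self, warrior_id, count=1):
        """Give a warrior up to `count` unclaimed users; no user is handed out twice."""
        batch = []
        while self.todo and len(batch) < count:
            user = self.todo.popleft()
            self.claimed[user] = warrior_id
            batch.append(user)
        return batch

    def mark_done(self, warrior_id, username):
        """Record a finished item, but only for the warrior that claimed it."""
        if self.claimed.get(username) != warrior_id:
            raise ValueError(f"{username!r} is not claimed by {warrior_id!r}")
        del self.claimed[username]
        self.done.add(username)


if __name__ == "__main__":
    tracker = Tracker(["alice", "bob", "carol"])
    print(tracker.request_items("warrior-1", count=2))
    print(tracker.request_items("warrior-2", count=2))
```

Because every username leaves the `todo` queue the moment it is handed out, two warriors never receive the same user. A real tracker would also need to re-queue items from warriors that disappear, which this sketch leaves out.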
[00:32] thanks :)
[00:36] somehow I was expecting more questions
[00:36] yah
[00:36] ah well, always nice to help
[00:37] chronomex: it's possible they've done their research and just needed a few things clearing up
[00:37] most of those questions aren't clear from the warrior, even if you're running it, to be fair
[00:37] true
[10:32] I added some more info to our clown hosting page
[19:29] The first Posterous batch is in. But this is a difficult one.
[19:29] We need to talk about all the parallel projects, I'm worried we're going to lose one.
[19:33] yeah we have Punchfork (done), Posterous (getting the most attention but going too slow), Yahoo Message Boards (important), opensolaris (needs to be finished)
[19:33] chronomex: any chance you can look at the other osol repos?
[19:33] Don't forget storylane
[19:34] oh yes, that's another
[19:34] *sigh*
[19:34] balrog_: what other osol repos? sorry I'm losing it
[19:34] s/losing it/forgot/
[19:34] chronomex: see http://www.archiveteam.org/index.php?title=Closedsolaris
[19:34] Actually, storylane tracker says it's almost done
[19:34] there's src. and repo.
[19:34] thx
[19:34] aha
[19:34] you did src., but there are more (and some svn repos) on repo.
[19:34] if you need help archiving svn repos, let me know
[19:34] for those you'd use svnsync
[19:34] right
[19:35] there's also hub and static
[19:35] jfc
[19:35] static probably can be just done with wget/warc
[19:35] ideally, be logged in when doing so
[19:36] is there a list of things on repo. ?
[19:36] repo requires login
[19:36] use bugmenot or so
[19:36] ok
[19:36] http://www.bugmenot.com/view/opensolaris.org
[19:36] https://repo.opensolaris.org/info/projects.action
[19:37] probably need to wget, then compile list of repos, and compare
[19:37] ah, cool.
[19:37] also not all are anonymous
[19:37] those will be lost, oh well
[19:37] :\
[20:20] looks like i'm uploading ces press conf
[20:52] so i have access to thebox.bz again
[20:59] they let you back?
[20:59] i login using a proxy
[20:59] but i can log back in with my own ip
[21:14] so wget is banned
[21:15] from thebox.bz
[21:24] i just fixed it
[21:25] i had the wrong forum id number
[21:35] CES '09: LG Electronics Press Conference: http://archive.org/details/g4tv.com-video35862
[21:36] Dead Rising: Chop Til You Drop Japanese Music Video: http://archive.org/details/g4tv.com-video35837
[22:26] SketchCo1, balrog_ You guys also forgot the 30+ sites in the gamespy, ugo, ign, 1up deal
[22:26] ugh
[22:26] and Poland
[22:34] I have a question to the Team. Since there has never been an emulator for the LaserActive to play Mega LD games could a KickStarter be created to get funding for whatever equipment needed to make one?
[22:34] I'm not knowledgeable enough to make one but i'd toss in a hundred bucks to the KickStarter.
[22:35] And I'd even donate my CLD-A100 and S10 Sega Pac.
[22:36] I just think it'd be cool to have an emu and also have the very few Mega LD games/software that were released/leaked to be dumped for use.
[22:36] Let me know your opinions and such. Thanks :)
[22:41] afaik the only thing standing in the way of laseractive emulation is dumps of the discs
[22:41] aaron giles of the mame team set up a method of dumping laserdiscs properly including all the off-screen info but so far nobody else has done so to my knowledge
[22:42] * ersi read "meme team"
[22:43] heh, meme team
[22:43] archiveteam subcommittee on cat macros
[22:45] the bios runs in mess already http://imageshack.us/a/img543/8372/laseract.png
[22:45] cool cool
[22:50] LaserActive emulation? Awesome
[23:06] http://en.wikipedia.org/wiki/BBC_Domesday_Project
[23:07] more specifically: http://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation
[23:08] DFJustin: is it actively or somewhat actively being worked on?
[23:09] * ersi glares in Dr Who's general direction
[23:21] We only have 20 days left for MessageBoards
[23:25] Bah
[23:25] ...
[23:34] no it's not actively being worked on
[23:38] I think the current page has all the active projects listed now