[00:23] who's in charge of the facebook and twitter pages?
[00:56] Which
[00:57] http://twitter.com/archiveteam http://facebook.com/ArchiveTeam
[00:57] those
[01:35] Right here baby
[01:35] k
[01:35] I've been spending time on and off trying to figure out how to delegate that without risking bad actors.
[01:35] Probably easier with facebook than twitter at this point.
[01:36] * xmc nods
[01:36] There was fatnest, but fatnest seems kind of stupid
[01:37] But if someone has suggestions, I'm up for it.
[01:37] fatnest?
[01:37] Yeah, exactly
[01:37] FATNEST
[01:37] you could use buffer
[01:37] fatnest.com is parked now
[01:37] Is it
[01:37] ha ha ha
[01:37] surprise surprise
[01:39] I just poked the creator of fatnest, just to be a werb
[01:42] did that guy on twitter who thought archiveteam was too aggressive ever tweet anything to you again?
[01:42] No
[01:42] And you shouldn't have responded to him.
[01:43] sorry!
[01:43] He has nothing to contribute, just a negative, just an opinion that things should happen
[01:43] That's what, I'd say, 30% caused my statement today
[01:43] I realized that was going to keep happening.
[01:43] I'm sure lots of people are annoyed about twitpic
[01:43] however I'm not sure what can be done about this and I fear more parties will act like they are
[01:43] :/
[01:44] Also, did you just send me a huge list of crazy stuff you're sending
[01:44] You maniac
[01:49] what's going on?
[01:50] someone is a maniac, film at 11
[01:51] that's just the boxed software that's too old for us - other boxes have the rest of 15 years of computing in them
[02:01] did someone say boxed software?
[02:28] whoo boxed software
[02:28] :P
[02:45] copies of Norton Antivirus from when Peter Norton was still featured on the box!
[08:34] Balrog
[08:35] what's the problem?
[08:35] ...
[11:53] never having used easel, not sure if there's anything that needs archiving or can be archived: http://blog.easel.io/blog/2014/09/17/easel-is-shutting-down/
[15:37] Has anyone considered grabbing the minecraft wiki since microsoft just acquired them?
[15:38] We don't know what impending changes might be coming
[15:40] wikiteam probably already did
[15:40] https://archive.org/search.php?query=wikiteam%20minecraft
[15:44] I checked that last night, only 1 of those is the official wiki and it says it hasn't been updated since 2014-01-26
[15:53] ok
[15:54] This is the right one, just outdated: https://archive.org/details/wiki-minecraftgamepediacom
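For reference, refreshing a dump of a MediaWiki site like the one above is normally done with WikiTeam's dumpgenerator script. A rough sketch follows; the api.php location for the Gamepedia wiki is an assumption, so check where the wiki actually exposes its API before running it:

    # Sketch only: clone WikiTeam and point dumpgenerator at the wiki's API.
    # The --api URL below is a guess at the endpoint, not a confirmed value.
    git clone https://github.com/WikiTeam/wikiteam.git
    cd wikiteam
    python dumpgenerator.py --api=http://minecraft.gamepedia.com/api.php --xml --images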
[17:38] Hello. I'm new to using "warc" output. Do I need to keep just the warc file, or the warc file AND the directory of downloaded files wget makes?
[17:40] the warc has everything in it
[17:41] Okay cool, thanks. I wasn't sure because it didn't seem to store images? Thank you for explaining. Is there any way to make wget not make a directory of files then, if the warc has everything? (Avoiding having two copies of the same data taking up space?)
[17:41] doesn't seem to store images?
[17:42] I'm not sure how to make wget do that, but I think there's an option
[17:42] I tested it on a site, and the directory wget made was 18mb but the warc file was only 400kb. The site had images and pdf files which didn't seem to be stored into the warc file
[17:42] --page-requisites, --truncate-output
[17:42] ok that's weird
[17:42] we have a wiki article about this, one moment
[17:43] yeah i wrote it ¬_¬
[17:44] http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[17:44] you want to keep the cdx file
[17:46] The command I used was: wget "http://antnest.co.uk/" --mirror --warc-file="file"
[17:46] directory = 18.2 MB, file.warc.gz = 459 KB. It seems like it isn't saving everything to warc. Maybe it's a problem with using wget on Windows?
[17:47] b334: be aware that .warc.gz is gzip-compressed
[17:47] if it's primarily text, high compression ratios are not strange
[17:47] Yes, however it's still much smaller when uncompressed
[17:47] hm. strange
[17:47] And the site has lots of images and pdf that don't compress well
[17:47] The uncompressed warc file is only 1.08 MB
[17:49] see the link i gave
[17:51] and those extra flags yipdw mentioned
[17:51] for small sites such as this, you can also use the archivebot in #archivebot
[17:51] I did read it but I cannot see anything there that explains what is going wrong. I'm sorry to be a bother, it's probably something I am doing wrong, I know. Wget is clearly downloading all the images and pdf because the directory is the right size. Anyway thanks for your help
[17:52] you might also try wpull
[17:52] wget \ -e robots=off --mirror --page-requisites \ --waitretry 5 --timeout 60 --tries 5 --wait 1 \ --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \ -U "$USER_AGENT" "$SAVE_HOST"
[17:53] that's a full command...
[17:53] to be honest I don't know what's wrong either, might be robots.txt screwing things up i guess D:
[17:54] nope :/
[17:55] I just used that command and it still results in an 18.2MB directory but only a 458 KB warc file. The cdx file is 55.8 KB
[17:57] Anyway thank you for helping me. I was just wondering if it was only the warc file that was needed to upload an archived site, or the warc AND the wget directory. It seems both are needed? Or maybe I have a badly compiled version of wget not working properly on Windows. Anyway thanks again and sorry for bothering you all
[17:57] b334: only the WARC
[17:57] it should in theory contain /all/ the requests and responses
[17:57] what you're describing sounds like a bug
[17:57] Even binary? When I look at it in a hex editor I only see text/html, no binary for images or anything
[17:57] I've also noticed WARCs created in windows with wget don't contain images properly
[17:58] b334: yes; as far as WARC is concerned, the payload is an opaque blob
[17:58] it doesn't care what filetype it is
[17:58] well, this definitely sounds like a bug
[17:58] Ah okay. Then clearly something is wrong on my end. Thanks for all your help
[17:58] not sure where to file wget bugs
[17:58] b334: be sure to file a bug
[17:58] ... wherever the bug filing place may be :)
[17:59] cc sankin
[18:07] oh
[18:07] we should have told him to try archivebot
[18:08] we did
[18:08] oh
[18:09] !ex auy5juosi5nkvht0zxodl3iz4 site requested in #archiveteam
[18:09] argh feck
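For readability, here is the same wget invocation yipdw pasted above, broken onto separate lines. WARC_NAME, USER_AGENT and SAVE_HOST are placeholders to fill in, not fixed values:

    # Cleaned-up copy of the WARC command quoted in the log above.
    WARC_NAME="antnest.co.uk"            # example value, pick your own
    USER_AGENT="ArchiveTeam test crawl"  # example value
    SAVE_HOST="http://antnest.co.uk/"

    wget \
      -e robots=off --mirror --page-requisites \
      --waitretry 5 --timeout 60 --tries 5 --wait 1 \
      --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
      -U "$USER_AGENT" "$SAVE_HOST"

If a Windows wget build still produces a WARC that is missing the image and PDF payloads, as described above, wpull is the usual fallback; it is designed to accept largely wget-compatible options.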
[19:27] -- Grab will be starting very soon
[19:27] -- Join #fizzilla
[19:27] -- Saving everything from http://quizilla.teennick.com/
[19:27] ------------------------------------------------------------------
[19:29] Hi, could someone help me?
[19:29] I'm trying to open this file from https://archive.org/details/Ao3ArchiveCrawl but 7Zip can't seem to open it as an archive
[19:30] I've tried the torrent and direct download, in both Windows and Linux, but no luck.
[19:33] If someone had individual files, that would work too. I only need the fanfics by a certain author.
[19:35] Dorj: moment
[19:40] Dorj: that's strange
[19:40] I can't open it either
[19:40] with p7zip
[19:42] yeah, wtf
[19:42] me either
[19:42] Dorj: looking at it in a hex editor, it certainly seems to be a 7z file
[19:43] that's worrying...
[19:43] trid says it is too
[19:43] magic number matches
[19:43] Dorj: it's definitely a 7z file
[19:43] looks like it may have been corrupted somehow...
[19:45] does the torrent work?
[19:47] I'm downloading the torrent again; it should be done in 5 mins
[19:50] oh, they web seed.
[19:55] I don't know what is wrong. hashes match. It's beyond my ability.
[19:57] oh well... I hope the original files are still available somewhere.
[19:58] I really need those fics back. x_x
[19:58] Dorj: you may want to look into archive recovery software
[19:58] there's probably some corruption somewhere in the archive
[19:58] I'm honestly not sure what's up with it
[19:59] brainkidabc@gmail.com
[19:59] no idea who this is
[20:00] I've tried communicating with them, they said they could try sending me the file over rsync later and also suggested I try here
[20:03] Dorj: you've already emailed them?
[20:04] might be worth asking for an uncompressed copy
[20:06] can https://code.google.com/p/theunarchiver/ extract anything from it?
[20:12] holy shit it supports the NDS FS format
[20:12] i spent hours trying to pick stuff out of those using shitty command line tools that all did the same thing but slightly differently depending on the version of the nds rom, the header, etc
[20:13] lol
[20:15] i even wrote a few myself but they mostly just clipped the header down to a size for other tools.
[21:22] Hey.
[21:22] So, we were all discussing stuff in the Twitpic fiasco, and my question is about a possible thing to hunt down.
[21:23] Basically, we have this situation where we have stuff that is being threatened, and it's huge, and then it's either not so threatened or it's in a weird quantum state.
[21:23] And we have, say, 100tb of it. Luckily, this only seems to be happening a few times a year.
[21:24] So, this really stretches the bounds of what IA does. It's a huge amount of data, it's not likely to be overly touched if the originals are up, and IA will spend/lose a lot of money pulling it into their infrastructure.
[21:24] i suspect a lot of "it's threatened" messages end up actually being "we had to say we were shutting down but if you'd watched our back door you would have seen a lot of google badges going in and out"
[21:24] Right now, Kenshin is donating a ton of space to handle TwitPic, and he can't just "keep" it.
[21:25] So maybe we can discuss actual, not pie-in-the-sky possibilities of what we can do to have some sort of not-IA pile of storage.
[21:25] amazon glacier?
[21:25] OK, so. Amazon Glacier.
[21:25] Here's what I've discovered about them.
[21:25] 1 cent per gig.
[21:25] .....per month.
[21:25] that's like, the easy, obvious solution, but it probably has a catch
[21:25] Now, start calculating it.
[21:25] oh there's the catch
[21:25] and it costs more when you want it back
[21:26] RETRIEVAL Requests $0.050 per 1,000 requests
[21:26] So that's $120 a year per terabyte.
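The arithmetic behind that figure, using the rates quoted just above (these are the channel's napkin numbers, not a pricing quote), scaled to the ~100 TB size mentioned earlier:

    # Napkin math: 1 cent per GB per month, for a 100 TB grab.
    awk 'BEGIN {
      per_gb_month = 0.01                  # USD per GB per month, as quoted above
      tb = 100                             # rough size of a Twitpic-scale grab
      gb = tb * 1024
      per_year = gb * per_gb_month * 12
      printf "%d TB: $%.0f/month, $%.0f/year (about $%.0f per TB per year)\n", tb, per_year / 12, per_year, per_year / tb
    }'
    # prints: 100 TB: $1024/month, $12288/year (about $123 per TB per year)

Retrieval requests are billed on top of that, which is the catch mentioned above.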
[21:26] yow
[21:26] not possible for us, I guess
[21:27] tahoe grid is another possibility if there's a few seriously interested people here
[21:27] we'd need a sponsor
[21:27] I mean, like, really serious
[21:27] Well, if Archive Team had a sponsor, yes.
[21:27] I am completely for someone researching this.
[21:27] Or making some calls, or gathering something.
[21:27] Since I like naming things, I'd call it the Valhalla Option
[21:28] Some sort of idea of Happy Hunting Grounds where we have these actual backups of things.
[21:28] Another possibility is tape.
[21:28] Get Kenshin a tape drive, back the crap off, pay for tapes.
[21:28] joepie91: I know you've done tahoe; do you have any interest in helping to coordinate this sort of thing
[21:29] * joepie91 awakens
[21:29] * joepie91 reads
[21:29] tape costs per GB, as I understand it, aren't significantly cheaper than desktop hard drives
[21:30] antomatic: longer lifespan, no?
[21:30] I was looking at updating my personal file server, so could probably throw in a dozen TB
[21:30] http://www.newegg.com/Product/Product.aspx?Item=40-998-097
[21:30] if i get my ass in gear i could contribute 30ish T
[21:30] it's not much but if you find ten people like that
[21:30] joepie91: depends what's done with it, I guess
[21:30] i remember them being 1.5tb to 3.0tb for like $40
[21:30] the drives are what cost the most
[21:30] antomatic: tape doesn't require having disks spinning at thousands of revolutions per minute with reading equipment hovering above it 24/7
[21:31] :)
[21:31] also if you shove a tape in a good place it'll last for a nice long time
[21:31] desktop hard drives don't have to be mounted and online either :)
[21:31] $28 for (roughly) 1.2tb of material.
[21:31] antomatic: desktop hard drives decay fairly quickly still, afaik
[21:31] anyway
[21:31] Yeah, hard drives stored away die.
[21:31] Period.
[21:31] glacier: you're fucked when you need the data
[21:31] Guess that rules out optical media too then
[21:32] i mean obv Parchive + tape
[21:32] yes
[21:32] optical media and HDDs are not an option for long term storage
[21:32] interestingly enough, glacier probably uses optical media cartridges
[21:32] Optical's not even close for long term.
[21:32] do current tape formats have good reps for long term storage, these days?
[21:32] been a while since I investigated it
[21:32] RedType: they use tape, no?
[21:32] afaik tape is reasonable when kept in a climate-controlled environment
[21:32] LTO 4 is (TOTALLY NAPKIN CALCULATED) about $25/tb
[21:32] for long-term storage
[21:33] joepie91: no one knows officially what they use, but they spent a whooole lot of money on optical and magneto optical tech that seemingly went nowhere and then glacier popped up
[21:33] okay, so, about tahoe: I am available for any technical questions / troubleshooting / etc., dev stuff on an as-needed-and-possible basis
[21:33] cool
[21:33] I know it fairly well, but I'm not a wizard
[21:33] :
[21:33] :P *
[21:33] better than me, I only know the name
[21:33] that said, tahoe-lafs people are also very helpful
[21:33] so, factor in the costs of a climate-controlled tape library... :)
[21:33] tapes etc aside, I think hard drives are the most immediate bet we have
[21:33] RedType: heh
[21:34] How many files are we looking at for the quitpic archive?
[21:34] I personally think that for this kind of storage, tape is a better option than a tahoe setup
[21:34] I think tapes are best.
[21:34] since a tahoe setup still requires a live setup
[21:34] Let me work this number out.
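For anyone curious what the tahoe-lafs option looks like in practice, a very rough sketch of a storage node joining an existing grid follows. The introducer fURL, nickname and reserved space are made-up placeholders, and the config keys are quoted from memory, so treat this as an outline rather than a recipe:

    # Sketch: create a tahoe-lafs storage node and point it at a grid's introducer.
    tahoe create-node ~/.tahoe
    # edit ~/.tahoe/tahoe.cfg -- placeholder values below:
    #   [node]     nickname = at-storage-01
    #   [client]   introducer.furl = pb://EXAMPLE@grid.example.org:12345/introducer
    #   [storage]  enabled = true
    #              reserved_space = 50G
    tahoe start ~/.tahoe
    # an upload returns a capability string that has to be recorded somewhere safe:
    tahoe put twitpic-chunk-0001.warc.gz

As noted above, this only works as live storage: the nodes have to stay online, which is the main trade-off against tape.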
[21:34] it's great when you actually /need/ data with some frequency or when you're pooling together space
[21:34] but it's not magical storage fairy dust
[21:34] it isn't, but neither are tapes
[21:34] I mean where are you going to put them :P
[21:34] tapes are pretty close to magical fairy dust for longterm
[21:34] Tapes I can help.
[21:35] In terms of storage, I can do storage.
[21:35] as long as you don't /need/ the data
[21:35] then it becomes a bit icky
[21:35] SketchCow: would IA store tapes?
[21:35] Yeah, it would.
[21:35] that'd help
[21:35] but do we need 'long term storage' or 'short-to-medium-term EASILY REUSABLE storage'
[21:35] they're already doing climate control etc
[21:35] SketchCow: is climate control for tapes similar to that for books
[21:35] or different parameters?
[21:35] Is it a case of write-once-keep-forever, or "we just need somewhere to keep this safe for now"
[21:35] then would IA consider dumping less used data onto tapes and archiving it?
[21:36] Fun Fact: Brewster hates tapes
[21:36] we need a tape-like solution that can have random access speeds of bluray
[21:36] godane: so.... wizards
[21:36] Kenshin: I'm not sure I'd be in favour of that, since moving to tapes decreases accessibility to basically 0
[21:36] for a public archive
[21:36] since this isn't the first time we're looking at stupid amounts of data which likely may not be touched
[21:36] but tapes are great for "we should have a backup of this probably just in case, but it's not gone from the internet /yet/"
[21:36] I can also talk to other places willing to store it.
[21:36] For example: Living Computer Museum in Seattle could partner.
[21:36] joepie91: well, having all the deleted Twitch data somewhere would have been better than them nuking it
[21:37] They have a massive warehouse they're building in eastern washington state.
[21:37] Kenshin: yes, but ideally you'd still want to somehow move it into active storage
[21:37] eventually
[21:37] maybe tahoe/bittorrent sync, wrap it in a parchive, have a set rule that if it languishes for more than a year, tape it?
[21:37] Kenshin: That's what I said, but I got shouted at. :)
[21:37] Kenshin: perhaps even just have transcoded versions in active storage
[21:37] originals on tape
[21:37] If LoC was to accept this data, what format would they expect it in? Over the wire to their equipment, or a big box of tapes or hard drives?
[21:37] RedType: I would very very strongly oppose anything to do with btsync
[21:37] OK, calling it.
[21:38] I do like the idea of redundant distributed semi-online storage
[21:38] we don't need /more/ shit to be locked up in proprietary everything
[21:38] well it's proprietary so obv not btsync
[21:38] but there are foss equivalents
[21:38] but the technology doesn't seem to exist for it yet
[21:38] none that are production-ready with the same model
[21:38] afaik
[21:38] there's seafile but that's centralizerd
[21:38] centralized *
[21:38] #huntinggrounds
[21:38] #-bs
[21:38] or that
[21:38] Someone owns #valhalla and is kicking out joins, so be it
[21:39] lol
[21:39] vallhallOfIt
[21:40] I heard there were rumors they were going to make btsync open source
[21:40] but that's just rumours
[21:43] Muad-Dib: probably, eventually
[21:43] until then, I avoid it like the plague
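Since Parchive plus tape keeps coming up, here is roughly what that looks like on a Linux box with par2cmdline and a SCSI tape drive. The filenames, the /dev/nst0 device path and the 10% redundancy figure are just example choices:

    # Sketch: add recovery data to a finished grab, then write it all to LTO.
    tar -cf twitpic-chunk-0001.tar warc/                               # bundle the WARCs
    par2 create -r10 twitpic-chunk-0001.par2 twitpic-chunk-0001.tar    # 10% parity data
    mt -f /dev/nst0 rewind                                             # first tape drive (assumed)
    tar -cvf /dev/nst0 twitpic-chunk-0001.tar twitpic-chunk-0001*.par2

The par2 volumes mean a few bad blocks on the tape, or a bit-rotted copy elsewhere, can be fixed later with 'par2 repair'.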
[21:44] joepie91, the Gentoo mentality :p
[21:45] ?
[21:46] The 4chan /g/entoo mentality to IT security: "If I can't compile it myself, it's probably a botnet"
[21:46] it's a joke
[21:47] lol
[21:47] but now I'm going to bed
[21:47] that 4:00 AM of yesterday is cutting into my rhythm
[21:48] anyways, nn people
[22:26] I have to go to a retirement party
[22:26] She said "dress to impress". Challenge accepted
[22:26] Can someone go into #huntinggrounds and shout "PUT IT IN THE FUCKING WIKI" every three minutes
[22:39] SketchCow: done
[22:52] lol
[23:02] twitpic acquired
[23:02] https://twitter.com/TwitPic/status/512705809696837632
[23:02] We're happy to announce we've been acquired and Twitpic will live on! We will post more details as we can disclose them
[23:03] dickpic
[23:03] shitpic
[23:03] nitpic
[23:04] fucknoaheverettpic
[23:46] i just randomly went from 17gb free to 4.4?!?!? WTF!?
[23:48] always be saving
[23:48] lol