#archiveteam 2014-09-18,Thu

Time Nickname Message
00:23 xmc who's in charge of the facebook and twitter pages?
00:56 SketchCow Which
00:57 xmc http://twitter.com/archiveteam http://facebook.com/ArchiveTeam
00:57 xmc those
01:35 SketchCow Right here baby
01:35 xmc k
01:35 SketchCow I've been spending time on and off trying to figure how to delegate that without risking bad actors.
01:35 SketchCow Probably easier with facebook than twitter at this point.
01:36 * xmc nods
01:36 SketchCow There was fatnest, but fatnest seems kind of stupid
01:37 SketchCow But if someone has suggestions, I'm up for it.
01:37 xmc fatnest?
01:37 SketchCow Yeah, exactly
01:37 SketchCow FATNEST
01:37 xmc you could use buffer
01:37 xmc fatnest.com is parked now
01:37 SketchCow Is it
01:37 SketchCow ha ha ha
01:37 xmc surprise surprise
01:39 SketchCow I just poked the creator of fatnest, just to be a werb
01:42 dashcloud did that guy on twitter who thought archiveteam was too aggressive ever tweet anything to you again?
01:42 SketchCow No
01:42 SketchCow And you shouldn't have responded to him.
01:43 dashcloud sorry!
01:43 SketchCow He has nothing to contribute, just a negative, just an opinion that things should happen
01:43 SketchCow That's what, I'd say, 30% caused my statement today
01:43 SketchCow I realized that was going to keep happening.
01:43 balrog I'm sure lots of people are annoyed about twitpic
01:43 balrog however I'm not sure what can be done about this and I fear more parties will act like they are
01:43 balrog :/
01:44 SketchCow Also, did you just send me a huge list of crazy stuff you're sending
01:44 SketchCow You maniac
01:49 raylee what's going on ?
01:50 xmc someone is a maniac, film at 11
01:51 dashcloud that's just the boxed software that's too old for us- other boxes have the rest of 15 years of computing in them
02:01 raylee did someone say boxed software?
02:28 joepie91 whoo boxed software
02:28 joepie91 :P
02:45 dashcloud copies of Norton Antivirus from when Peter Norton was still featured on the box!
08:34 ASD Balrog
08:35 ASD what the problem :
08:35 midas ...
11:53 dashcloud never having used easel, not sure if there's anything that needs archiving or can be archived: http://blog.easel.io/blog/2014/09/17/easel-is-shutting-down/
15:37 Diesel__ Has anyone considered grabbing the minecraft wiki since microsoft just acquired them?
15:38 Diesel__ We don't know what impending changes might be coming
15:40 xmc wikiteam probably already did
15:40 xmc https://archive.org/search.php?query=wikiteam%20minecraft
15:44 Diesel__ I checked that last night, only 1 of those is the official wiki and it says it hasn't been updated since 2014-01-26
15:53 xmc ok
15:54 Diesel__ This is the right one, just outdated: https://archive.org/details/wiki-minecraftgamepediacom
17:38 b334 Hello. I'm new to using "warc" output. Do I need to keep just the warc file or the warc file AND the directory of downloaded files wget makes?
17:40 xmc the warc has everything in it
17:41 b334 Okay cool, thanks. I wasn't sure because it didn't seem to store images? Thank you for explaining. Is there any way to make wget not make a directory of files then if the warc has everything? (Avoiding having two copies of the same data taking up space?)
17:41 xmc doesn't seem to store images?
17:42 xmc I'm not sure how to make wget do that, but I think there's an option
17:42 b334 I tested it on a site, and the directory wget made was 18mb but the warc file was only 400kb. The site had images and pdf files which didn't seem to be stored into the warc file
17:42 yipdw --page-requisites, --truncate-output
17:42 xmc ok that's weird
17:42 yipdw we have a wiki article about this, one moment
17:43 Smiley yeah i wrote it ¬_¬
17:44 Smiley http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
17:44 Smiley you want to keep the cdx file
17:46 b334 The command I used was: wget "http://antnest.co.uk/" --mirror --warc-file="file"
17:46 b334 directory = 18.2 MB, file.warc.gz = 459 KB. It seems like it isn't saving everything to warc. Maybe it's a problem with using wget on Windows?
17:47 joepie91 b334: be aware that .warc.gz is gzip-compressed
17:47 joepie91 if it's primarily text, high compression ratios are not strange
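joepie91's point about gzip is easy to check, and for this site it cuts the other way: repetitive markup shrinks dramatically, while JPEGs and most PDFs are already compressed and barely shrink at all. A minimal sketch, using zlib's DEFLATE (the algorithm gzip wraps) on a made-up HTML snippet and on random bytes standing in for image data; the sample content is hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by DEFLATE-compressed size (higher means it shrank more)."""
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical stand-in for crawled HTML: highly repetitive markup.
html = b"<tr><td class='post'>Reply</td><td>Quote</td></tr>\n" * 2000
# Random bytes behave like already-compressed payloads (JPEG, most PDFs).
noise = os.urandom(len(html))

html_ratio = compression_ratio(html)
noise_ratio = compression_ratio(noise)
print(f"html:  {html_ratio:.1f}x smaller")
print(f"noise: {noise_ratio:.2f}x (no real gain)")
```

So an 18.2 MB mirror that is mostly images and PDFs should not turn into a 1.08 MB uncompressed WARC; compression alone does not explain b334's numbers.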
17:47 b334 Yes, however it's still much smaller when uncompressed
17:47 joepie91 hm. strange
17:47 b334 And the site has lots of images and pdf that don't compress well
17:47 b334 The uncompressed warc file is only 1.08 MB
17:49 Smiley see the link i gave
17:51 Smiley and those extra flags yipdw mentioned
17:51 Smiley for small sites such as this, you can also use the archivebot in #archivebot
17:51 b334 I did read it but I cannot see anything there that explains what is going wrong. I'm sorry to be a bother, it's probably something I am doing wrong, I know. Wget is clearly downloading all the images and pdf because the directory is the right size. Anyway thanks for your help
17:52 DFJustin you might also try wpull
17:52 Smiley wget -e robots=off --mirror --page-requisites --waitretry 5 --timeout 60 --tries 5 --wait 1 --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$SAVE_HOST"
17:53 Smiley thats a full command...
17:53 Smiley to be honest I don't know what's wrong either, might be robots.txt screwing things i guess D:
17:54 Smiley nope :/
17:55 b334 I just used that command and it still results in an 18.2MB directory but only a 458 KB warc file. The cdx file is 55.8 KB
17:57 b334 Anyway thank you for helping me. I was just wondering if it was only the warc file that was needed to upload an archived site, or the warc AND the wget directory. It seems both are needed? Or maybe I have a badly compiled version of wget not working properly on Windows. Anyway thanks again and sorry for bothering you all
17:57 joepie91 b334: only the WARC
17:57 joepie91 it should in theory contain /all/ the requests and responses
17:57 joepie91 what you're describing sounds like a bug
17:57 b334 Even binary? When I look at it in a hex editor I only see text/html, no binary for images or anything
17:57 sankin I've also noticed WARCs created in windows with wget don't contain images properly
17:58 joepie91 b334: yes; as far as WARC is concerned, the payload is an opaque blob
17:58 joepie91 it doesn't care what filetype it is
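Rather than eyeballing a hex editor, one way to check b334's file against joepie91's point is to walk the WARC records and tally the HTTP Content-Type of each response. The sketch below is a minimal stdlib reader, assuming WARC 1.0 input that is either uncompressed or gzipped per record; it ignores most of the spec (no digest checks, no segmented records), and a maintained library such as warcio is the better choice for real files. The helper names and the synthetic records at the bottom are hypothetical, and those records omit mandatory headers such as WARC-Record-ID and WARC-Date.

```python
import gzip
import io
from collections import Counter


def iter_warc_records(stream):
    """Yield (warc_headers, content_block) for each record in a decompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of file
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers = {}
        for raw in iter(stream.readline, b""):
            if not raw.strip():
                break  # blank line ends the WARC header block
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the block, binary or not.
        yield headers, stream.read(int(headers["content-length"]))


def count_response_types(stream):
    """Count the HTTP Content-Type of every 'response' record in the stream."""
    counts = Counter()
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        http_head = block.split(b"\r\n\r\n", 1)[0]
        ctype = "unknown"
        for line in http_head.split(b"\r\n")[1:]:  # skip the HTTP status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-type":
                ctype = value.split(b";")[0].strip().decode("ascii", "replace")
        counts[ctype] += 1
    return counts


def make_record(warc_type, uri, block):
    """Build one gzip member holding a minimal WARC/1.0 record (illustrative headers only)."""
    head = (
        f"WARC/1.0\r\nWARC-Type: {warc_type}\r\nWARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(block)}\r\n\r\n"
    ).encode("ascii")
    return gzip.compress(head + block + b"\r\n\r\n")


# Tiny synthetic .warc.gz: one HTML response and one JPEG response, each its own gzip member.
fake_warc_gz = make_record(
    "response", "http://example.com/",
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n<html></html>",
) + make_record(
    "response", "http://example.com/a.jpg",
    b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0\x00\x10JFIF",
)

with gzip.GzipFile(fileobj=io.BytesIO(fake_warc_gz)) as f:
    demo_counts = count_response_types(f)
print(demo_counts)
```

For a real file, pass `gzip.open(path, "rb")` to `count_response_types` instead of the in-memory demo. If a crawl of an image-heavy site reports only text/html, the image payloads never made it into the WARC, which would support the bug theory.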
17:58 joepie91 well, this definitely sounds like a bug
17:58 b334 Ah okay. Then clearly something is wrong on my end. Thanks for all your help
17:58 joepie91 not sure where to file wget bugs
17:58 joepie91 b334: be sure to file a bug
17:58 joepie91 ... wherever the bug filing place may be :)
17:59 joepie91 cc sankin
18:07 yipdw oh
18:07 yipdw we should have told him to try archivebot
18:08 xmc we did
18:08 yipdw oh
18:09 yipdw !ex auy5juosi5nkvht0zxodl3iz4 site requested in #archiveteam
18:09 yipdw argh feck
19:27 arkiver -- Grab will be starting very soon
19:27 arkiver -- Join #fizzilla
19:27 arkiver -- Saving everything from http://quizilla.teennick.com/
19:27 arkiver ------------------------------------------------------------------
19:29 Dorj Hi, could someone help me?
19:29 Dorj I'm trying to open this file from https://archive.org/details/Ao3ArchiveCrawl but 7Zip can't seem to open it as an archive
19:30 Dorj I've tried the torrent and direct download, in both Windows and Linux, but no luck.
19:33 Dorj If someone had individual files, that would work too. I only need the fanfics by a certain author.
19:35 joepie91 Dorj: moment
19:40 joepie91 Dorj: that's strange
19:40 joepie91 I can't open it either
19:40 joepie91 with p7zip
19:42 joepie91 yeah, wtf
19:42 aaaaaaaaa me either
19:42 joepie91 Dorj: looking at it in a hex editor, it certainly seems to be a 7z file
19:43 joepie91 that's worrying...
19:43 aaaaaaaaa trid says it is too
19:43 joepie91 magic number matches
19:43 joepie91 Dorj: it's definitely a 7z file
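A matching magic number only vouches for the first six bytes. According to 7-Zip's format notes (7zFormat.txt), the 32-byte signature header also stores a CRC32 of its last 20 bytes plus the offset and size of the archive's main header, so a short check can distinguish a damaged or truncated file from one that should open. This is a sketch of that check only, and `check_7z_signature_header` and `inspect_7z` are hypothetical helper names; a pass does not prove the compressed streams themselves are intact.

```python
import os
import struct
import zlib

SEVEN_Z_MAGIC = b"7z\xbc\xaf\x27\x1c"


def check_7z_signature_header(header: bytes) -> dict:
    """Check the first 32 bytes of a .7z file: magic number plus start-header CRC."""
    if len(header) < 32:
        raise ValueError("need at least 32 bytes")
    major, minor = header[6], header[7]
    (stored_crc,) = struct.unpack_from("<I", header, 8)
    start_header = header[12:32]  # NextHeaderOffset, NextHeaderSize, NextHeaderCRC
    next_offset, next_size, next_crc = struct.unpack("<QQI", start_header)
    return {
        "magic_ok": header[:6] == SEVEN_Z_MAGIC,
        "version": (major, minor),
        "start_header_crc_ok": zlib.crc32(start_header) == stored_crc,
        "next_header_offset": next_offset,
        "next_header_size": next_size,
        "next_header_crc": next_crc,
    }


def inspect_7z(path):
    """Run the header check on a file and flag a main header that points past EOF."""
    with open(path, "rb") as f:
        info = check_7z_signature_header(f.read(32))
    # The next-header offset is counted from the end of the 32-byte signature header.
    end = 32 + info["next_header_offset"] + info["next_header_size"]
    info["truncated"] = end > os.path.getsize(path)
    return info
```

A failing CRC, or a main header that points past the end of the file, would indicate damage in the file itself rather than a problem with 7-Zip or p7zip.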
19:43 joepie91 looks like it may have been corrupted somehow...
19:45 aaaaaaaaa does the torrent work?
19:47 Dorj I'm downloading the torrent again it should be done in 5 mins
19:50 aaaaaaaaa oh, they web seed.
19:55 aaaaaaaaa I don't know what is wrong. hashes match. Its beyond my ability.
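On "hashes match": BitTorrent (v1) splits the payload into fixed-size pieces and publishes a SHA-1 digest for each, so a clean hash check shows the downloaded bytes are identical to what the web seed serves. It cannot say whether those bytes were a good archive to begin with, which is how a verified torrent and an unopenable 7z can coexist. A minimal sketch of per-piece checking, assuming a single-file torrent whose expected digests are already available as a list (parsing the .torrent file is left out); the function names are hypothetical.

```python
import hashlib
import io


def piece_hashes(stream, piece_length):
    """SHA-1 of each fixed-size piece, the way a BitTorrent v1 'pieces' list is built."""
    hashes = []
    while piece := stream.read(piece_length):
        hashes.append(hashlib.sha1(piece).digest())
    return hashes


def mismatched_pieces(stream, piece_length, expected):
    """Indices of pieces that differ from the expected list, plus any missing or extra ones."""
    actual = piece_hashes(stream, piece_length)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # A length mismatch means the local copy is truncated or has trailing junk.
    bad.extend(range(min(len(actual), len(expected)), max(len(actual), len(expected))))
    return bad


# Demo: digests taken from a "published" copy, then checked against an identical and a damaged copy.
original = bytes(range(256)) * 10           # 2560 bytes, i.e. 10 pieces of 256
published = piece_hashes(io.BytesIO(original), 256)
damaged = bytearray(original)
damaged[700] ^= 0xFF                        # flip one byte inside piece 2
print(mismatched_pieces(io.BytesIO(original), 256, published))
print(mismatched_pieces(io.BytesIO(bytes(damaged)), 256, published))
```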
19:57 Dorj oh well... I hope the original files are still available somewhere.
19:58 Dorj I really need those fics back. x_x
19:58 joepie91 Dorj: you may want to look into archive recovery software
19:58 joepie91 there's probably some corruption somewhere in the archive
19:58 joepie91 I'm honestly not sure what's up with it
19:59 joepie91 <uploader>brainkidabc@gmail.com</uploader>
19:59 joepie91 no idea who this is
20:00 Dorj I've tried communicating with them, they said they could try sending me the file over rsync later and also suggested I try here
20:03 joepie91 Dorj: you've already emailed them?
20:04 joepie91 might be worth asking for an uncompressed copy
20:06 DFJustin can https://code.google.com/p/theunarchiver/ extract anything from it?
20:12 RedType holy shit it supports the NDS FS format
20:12 RedType i spent hours trying to pick stuff out of those using shitty command line tools that all did the same thing but slightly differently depending on the version of the nds rom, the header, etc
20:13 xmc lol
20:15 RedType i even wrote a few myself but they mostly just clipped the header down to a size for other tools.
21:22 SketchCow Hey.
21:22 SketchCow So, we were all discussing stuff in the Twitpic fiasco, and my question is about a possible thing to hunt down.
21:23 SketchCow Basically, we have this situation where we have stuff that is being threatened, and it's huge, and then it's either not so threatened or it's in a weird quantum state.
21:23 SketchCow And we have, say, 100tb of it. Luckily, this only seems to be happening a few times a year.
21:24 SketchCow So, this really stretches the bounds of what IA does. It's a huge amount of data, it's not likely to be overly touched if the originals are up, and IA will spend/lose a lot of money pulling it into their infrastructure.
21:24 RedType i suspect a lot of "it's threatened" messages end up actually being "we had to say we were shutting down but if you'd watched our back door you would have seen a lot of google badges going in and out"
21:24 SketchCow Right now, Kenshin is donating a ton of space to handle TwitPic, and he can't just "keep" it.
21:25 SketchCow So maybe we can discuss actual, not pie-in-the-sky possibilities of what we can do to have some sort of not-IA pile of storage.
21:25 RedType amazon glacier?
21:25 SketchCow OK, so. Amazon Glacier.
21:25 SketchCow Here's what I've discovered about them.
21:25 SketchCow 1 cent per gig.
21:25 SketchCow .....per month.
21:25 RedType that's like, the easy, obvious solution, but it probably has a catch
21:25 SketchCow Now, start calculating it.
21:25 RedType oh there's the catch
21:25 antomatic and it costs more when you want it back
21:26 Diesel__ RETRIEVAL Requests $0.050 per 1,000 requests
21:26 SketchCow So that's $120 a year per terabyte.
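SketchCow's figure follows from the prices quoted in channel, assuming decimal units (1 TB = 1,000 GB): 1 cent × 1,000 GB × 12 months = $120 per TB per year. The sketch below reruns that arithmetic for the 100 TB example from earlier and adds the $0.05-per-1,000 retrieval request fee Diesel__ quoted. The million-request count is a made-up illustration, and any per-GB retrieval charges (antomatic's "costs more when you want it back") are left out because no figure was given. These are the 2014 prices as stated in the log, not current pricing.

```python
def glacier_storage_usd(terabytes, years=1, usd_per_gb_month=0.01, gb_per_tb=1000):
    """Storage cost at a flat per-GB monthly rate (rate as quoted in channel)."""
    return terabytes * gb_per_tb * usd_per_gb_month * 12 * years


def glacier_request_usd(requests, usd_per_1000_requests=0.05):
    """Retrieval *request* fees only; per-GB retrieval/transfer charges are not modelled."""
    return requests / 1000 * usd_per_1000_requests


print(f"1 TB for a year:   ${glacier_storage_usd(1):,.2f}")
print(f"100 TB for a year: ${glacier_storage_usd(100):,.2f}")
print(f"1,000,000 retrieval requests (hypothetical): ${glacier_request_usd(1_000_000):,.2f}")
```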
21:26 antomatic yow
21:26 arkiver not possible for us I guess,
21:27 yipdw tahoe grid is another possibility if there's a few seriously interested people here
21:27 RedType we'd need a sponsor
21:27 yipdw I mean, like, really serious
21:27 SketchCow Well, if Archive Team had a sponsor, yes.
21:27 SketchCow I am completely for someone researching this.
21:27 SketchCow Or making some calls, or gathering something.
21:27 SketchCow Since I like naming things, I'd call it the Valhalla Option
21:28 SketchCow Some sort of idea of Happy Hunting Grounds where we have these actual backups of things.
21:28 SketchCow Another possibility is tape.
21:28 SketchCow Get Kenshin a tape drive, backing the crap off, pay for tapes.
21:28 yipdw joepie91: I know you've done tahoe; do you have any interest in helping to coordinate this sort of thing
21:29 * joepie91 awakens
21:29 * joepie91 reads
21:29 antomatic tape costs per GB, as I understand it, aren't significantly cheaper than desktop hard drives
21:30 joepie91 antomatic: longer lifespan, no?
21:30 yipdw I was looking at updating my personal file server, so could probably throw in a dozen TB
21:30 SketchCow http://www.newegg.com/Product/Product.aspx?Item=40-998-097
21:30 xmc if i get my ass in gear i could contribute 30ish T
21:30 yipdw it's not much but if you find ten people like that
21:30 antomatic joepie91: depends what's done with it, I guess
21:30 godane i remember them being 1.5tb to 3.0tb for like $40
21:30 godane the drives are where the cost the most
21:30 joepie91 antomatic: tape doesn't require having disks spinning at thousands of rounds per minute with reading equipment hovering above it 24/7
21:31 joepie91 :)
21:31 RedType also if you shove a tape in a good place it'll last for a nice long time
21:31 antomatic desktop hard drives don't have to be mounted and online either :)
21:31 SketchCow $28 for (roughly) 1.2tb of material.
21:31 joepie91 antomatic: desktop hard drives decay fairly quickly still, afaik
21:31 joepie91 anyway
21:31 SketchCow Yeah, hard drives stored away die.
21:31 SketchCow Period.
21:31 joepie91 glacier: you're fucked when you need the data
21:31 antomatic Guess that rules out optical media too then
21:32 RedType i mean obv Parchive + tape
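The idea behind "Parchive + tape" is to write recovery data alongside the payload so that one unreadable block does not cost the whole set. PAR2 itself uses Reed-Solomon codes, which can repair as many damaged blocks as recovery blocks were created; the sketch below swaps in the simplest possible scheme, a single XOR parity block (RAID-5 style), which can rebuild exactly one missing block. It illustrates the principle only and is not Parchive's algorithm; the block contents and function names are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must be the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover_one(blocks, parity):
    """Rebuild the single block given as None from the surviving blocks plus parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    rebuilt = list(blocks)
    rebuilt[missing[0]] = xor_blocks(survivors + [parity])
    return rebuilt


data = [b"tape-one", b"tape-two", b"tape-3!!"]   # three equal-length data blocks
parity = xor_blocks(data)                         # would be written to a fourth tape
damaged = [data[0], None, data[2]]                # block 1 unreadable
print(recover_one(damaged, parity)[1])
```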
21:32 joepie91 yes
21:32 joepie91 optical media and HDDs are not an option for long term storage
21:32 RedType interestingly enough, glacier probably uses optical media cartridges
21:32 SketchCow Optical's not even close for long term.
21:32 antomatic do current tape formats have good reps for long term storage, these days?
21:32 antomatic been a while since I investigated it
21:32 joepie91 RedType: they use tape, no?
21:32 joepie91 afaik tape is reasonable when kept in a climate-controlled environment
21:32 SketchCow LTO 4 is (TOTALLY NAPKIN CALCULATED) about $25/tb
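The per-TB tape figure can be reproduced from the cartridge price SketchCow linked: $28 for roughly 1.2 TB works out to about $23/TB, in line with his $25/TB napkin number. One caveat is that LTO-4 cartridges are rated at 800 GB native and 1.6 TB assuming 2:1 drive compression, and already-compressed data (image-heavy WARCs, .7z files) will land near the native figure. A minimal sketch, using decimal GB and the $28 price as assumptions; it counts media only, not the drive (godane's point that drives are the expensive part) or climate-controlled storage.

```python
def tape_plan(total_gb, gb_per_tape, usd_per_tape):
    """Cartridges needed and media cost; integer GB keeps the ceiling exact."""
    tapes = -(-total_gb // gb_per_tape)  # ceiling division without floats
    return tapes, tapes * usd_per_tape


for label, capacity_gb in [("~1.2 TB per tape (channel estimate)", 1200),
                           ("800 GB per tape (LTO-4 native)", 800)]:
    tapes, cost = tape_plan(100_000, capacity_gb, 28)
    print(f"100 TB at {label}: {tapes} tapes, ${cost:,}")
```

Either way, the one-time media cost for 100 TB is a small fraction of the roughly $12,000 per year implied by the $120/TB/year Glacier figure, which is the comparison behind the channel leaning toward tape.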
21:32 joepie91 for long-term storage
21:33 RedType joepie91: no one knows officially what they use, but they spent a whooole lot of money on optical and magneto optical tech that seemingly went nowhere and then glacier popped up
21:33 joepie91 okay, so, about tahoe: I am available for any technical questions / troubleshooting / etc., dev stuff on an as-needed-and-possible basis
21:33 yipdw cool
21:33 joepie91 I know it fairly well, but I'm not a wizard
21:33 joepie91 :
21:33 joepie91 :P *
21:33 yipdw better than me, I only know the name
21:33 joepie91 that said, tahoe-lafs people are also very helpful
21:33 antomatic so, factor in the costs of a climate-controlled tape library... :)
21:33 yipdw tapes etc I think hard drives are the most immediate bet we have
21:33 joepie91 RedType: heh
21:34 Diesel__ How many files are we looking at for the quitpic archive?
21:34 joepie91 I personally think that for this kind of storage, tape is a better option than a tahoe setup
21:34 SketchCow I think tapes are best.
21:34 joepie91 since a tahoe setup still requires a live setup
21:34 SketchCow Let me work this number out.
21:34 joepie91 it's great when you actually /need/ data with some frequency or when you're pooling together space
21:34 joepie91 but it's not magical storage fairy dust
21:34 yipdw it isn't, but neither are tapes
21:34 yipdw I mean where are you going to put them :P
21:34 joepie91 tapes are pretty close to magical fairy dust for longterm
21:34 SketchCow Tapes I can help.
21:35 SketchCow In terms of storage, I can do storage.
21:35 joepie91 as long as you don't /need/ the data
21:35 joepie91 then it becomes a bit icky
21:35 Kenshin SketchCow: would IA store tapes?
21:35 SketchCow Yeah, it would.
21:35 yipdw that'd help
21:35 antomatic but do we need 'long term storage' or 'short-to-medium-term EASILY REUSABLE storage'
21:35 yipdw they're already doing climate control etc
21:35 joepie91 SketchCow: is climate control for tapes similar to that for books
21:35 joepie91 or different parameters?
21:35 antomatic Is it a case of write-once-keep-forever, or "we just need somewhere to keep this safe for now"
21:35 Kenshin then would IA consider dumping less used data into tapes and archive?
21:36 SketchCow Fun Fact: Brewster hates tapes
21:36 godane we need a tape like solution that can have random access speeds of bluray
21:36 RedType godane: so.... wizards
21:36 joepie91 Kenshin: I'm not sure I'd be in favour of that, since moving to tapes decreases accessibility to basically 0
21:36 joepie91 for a public archive
21:36 Kenshin since this isn't the first time we're looking at stupid amount of data which likely may not be touched
21:36 joepie91 but tapes are great for "we should have a backup of this probably just in case, but it's not gone from the internet /yet/"
21:36 SketchCow I can also talk to other places willing to store it.
21:36 SketchCow For example: Living Computer Museum in Seattle could partner.
21:36 Kenshin joepie91: well, having all the twitch deleted data somewhere would have been better than them nuking it
21:37 SketchCow They have a massive warehouse they're building in eastern washington state.
21:37 joepie91 Kenshin: yes, but ideally you'd still want to somehow move it into active storage
21:37 joepie91 eventually
21:37 RedType maybe tahoe/bittorrent sync, wrap it in a parchive, have a set rule if it languishes for more than a year tape it?
21:37 antomatic Kenshin: That's what I said, but I got shouted at. :)
21:37 joepie91 Kenshin: perhaps even just have transcoded versions in active storage
21:37 joepie91 originals on tape
21:37 erazmus If LoC was to accept this data, what format would they expect it in? Over the wire to their equipment, or a big box of tapes or hard drives?
21:37 joepie91 RedType: I would very very strongly oppose anything to do with btsync
21:37 SketchCow OK, calling it.
21:38 antomatic I do like the idea of redundant distributed semi-online storage
21:38 joepie91 we don't need /more/ shit to be locked up in proprietary everything
21:38 RedType well its proprietary so obv not btsync
21:38 RedType but there are foss equivalents
21:38 antomatic but the technology doesn't seem to exist for it yet
21:38 joepie91 none that are production-ready with the same model
21:38 joepie91 afaik
21:38 joepie91 there's seafile but that's centralizerd
21:38 joepie91 centralized *
21:38 SketchCow #huntinggrounds
21:38 yipdw #-bs
21:38 yipdw or that
21:38 SketchCow Someone owns #valhalla and is kicking out joins, so be it
21:39 joepie91 lol
21:39 RedType vallhallOfIt
21:40 Muad-Dib I heard there were rumors they were going to make btsync open source
21:40 Muad-Dib but thats just rumours
21:43 joepie91 Muad-Dib: probably, eventually
21:43 joepie91 until then, I avoid it like the plague
21:44 Muad-Dib joepie91, the Gentoo mentality :p
21:45 joepie91 ?
21:46 Muad-Dib The 4chan /g/entoo mentality to IT security ¨If I cant compile it myself, its probably a botnet¨
21:46 Muad-Dib its a joke
21:47 joepie91 lol
21:47 Muad-Dib but now I´m going to bed
21:47 Muad-Dib that 4:00 AM of yesterday is cutting into my rhythm
21:48 Muad-Dib anyways, nn people
22:26 SketchCow I have to go to a retirement party
22:26 SketchCow She said "dress to impress". Challenge accepted
22:26 SketchCow Can someone go into #huntinggrounds and shout "PUT IT IN THE FUCKING WIKI" every three minutes
22:39 yipdw SketchCow: done
22:52 joepie91 lol
23:02 RedType twitpic acquired
23:02 RedType https://twitter.com/TwitPic/status/512705809696837632
23:02 RedType We're happy to announce we've been acquired and Twitpic will live on! We will post more details as we can disclose them
23:03 xmc dickpic
23:03 yipdw shitpic
23:03 RedType nitpic
23:04 garyrh fucknoaheverettpic
23:46 bsmith093 i just randomly went from 17gb free to 4.4?!?!? WTF!?
23:48 xmc always be saving
23:48 joepie91 lol
