#archiveteam-bs 2014-11-25,Tue

Time Nickname Message
00:59 πŸ”— LordNigh2 has joined #archiveteam-bs
01:07 πŸ”— Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
01:07 πŸ”— LordNigh2 is now known as Lord_Nigh
01:27 πŸ”— godane i'm now at 296,230 items uploaded
01:28 πŸ”— godane at this rate i will hit the 300k mark by thinksgiving
01:28 πŸ”— godane *thanksgiving
01:28 πŸ”— Rickster has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— beardicus has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— arkiver has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— joepie91 has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— Zebranky has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— slash` has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— Rallias has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— balrog has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— nico has quit IRC (ircd.choopa.net irc.eversible.com)
01:28 πŸ”— yipdw has quit IRC (ircd.choopa.net irc.eversible.com)
01:29 πŸ”— dashcloud do remember that the archive.org staff would like to have a nice uninterrupted Thanksgiving week, and (I hope) would be slow to respond because they are on vacation
01:29 πŸ”— godane ok
01:30 πŸ”— godane i will be backing off of the ERIC items during thanksgiving
01:31 πŸ”— godane i may start uploading more funny or die videos
01:31 πŸ”— godane during that time
01:31 πŸ”— godane it thanks longer to upload and there are less items so less stress on server load
01:32 πŸ”— godane *takes longer
01:33 πŸ”— Rickster has joined #archiveteam-bs
01:33 πŸ”— beardicus has joined #archiveteam-bs
01:33 πŸ”— arkiver has joined #archiveteam-bs
01:33 πŸ”— joepie91 has joined #archiveteam-bs
01:33 πŸ”— Rallias has joined #archiveteam-bs
01:33 πŸ”— Zebranky has joined #archiveteam-bs
01:33 πŸ”— slash` has joined #archiveteam-bs
01:33 πŸ”— yipdw has joined #archiveteam-bs
01:33 πŸ”— balrog has joined #archiveteam-bs
01:33 πŸ”— nico has joined #archiveteam-bs
01:33 πŸ”— irc.eversible.com sets mode: +o balrog
01:41 πŸ”— primus104 has quit IRC (Leaving.)
01:42 πŸ”— mistym has quit IRC (Remote host closed the connection)
02:01 πŸ”— Boppen has quit IRC (Ping timeout: 198 seconds)
02:05 πŸ”— mistym has joined #archiveteam-bs
02:13 πŸ”— Boppen has joined #archiveteam-bs
02:25 πŸ”— TFGBD has quit IRC (( www.nnscript.com :: NoNameScript 4.22 :: www.esnation.com ))
02:54 πŸ”— joepie91 wow
02:54 πŸ”— joepie91 I think urpad just won the award for worst fucking hosting company ever
02:54 πŸ”— joepie91 jesus christ
02:54 πŸ”— joepie91 one of my VMs disappeared
02:55 πŸ”— joepie91 just, there was a different VM on the IP with a different SSH host key
02:55 πŸ”— joepie91 and WHMCS couldn't connect to mine anymore either
02:55 πŸ”— joepie91 I file a support ticket
02:55 πŸ”— joepie91 they continue to reboot the *wrong* VM (the other one I have with them that DID work)
02:56 πŸ”— joepie91 then respond with "okay can you try now" and "If there persists the issue, please update us with the ssh login details that you are trying to access the server so that we can have a detailed on it."
02:56 πŸ”— joepie91 ??!???!
02:56 πŸ”— joepie91 and on top of all that I can't get to the goddamn ticket in the panel because none of the items in the ticket list are actually clickable
02:56 πŸ”— joepie91 holy tits batman
02:56 πŸ”— joepie91 (this is why I don't run anything production there)
03:23 πŸ”— danneh_ ew, that sounds fun
03:24 πŸ”— danneh_ also, not worried but any response on that imageboard archiving stuff joepie91?
03:25 πŸ”— joepie91 danneh_: um, remind me?
03:28 πŸ”— joepie91 danneh_: what specifically are you referring to? :P
03:28 πŸ”— danneh_ all good, I'm writing an imageboard archiving specification (4chan, etc), was wondering whether anyone who does 'proper' archiving or who's written similar sorts of specs could take a look over it and give me some pointers
03:29 πŸ”— joepie91 unrelated, holy shit bitcasa: https://twitter.com/CloudStorageBuz/status/536950580930158592
03:29 πŸ”— joepie91 .tw https://twitter.com/BTapdicky/status/536963205046607872
03:29 πŸ”— botpie91 @CloudStorageBuz @Bitcasa what were they thinking? 900k a month in hosting charges with only 250k in revenue. Their CEO failed (@BTapdicky)
03:29 πŸ”— joepie91 .tw https://twitter.com/BTapdicky/status/536954071244996608
03:29 πŸ”— botpie91 @CloudStorageBuz @Bitcasa they are 6 million in debt. They are done for (@BTapdicky)
03:29 πŸ”— danneh_ just curious, if not it should be alright regardless, thought I'd ask around and see, since first time writing a proper filetype specification
03:30 πŸ”— joepie91 danneh_: ah, haven't really done anything on it ye
03:30 πŸ”— joepie91 yet *
03:30 πŸ”— joepie91 standard WARC should suffice, though?
03:30 πŸ”— joepie91 a custom script that can just append what it finds to a WARC
03:30 πŸ”— danneh_ wow, nice job bitcasa
03:31 πŸ”— joepie91 oh, apparently I'm hosting the opposition docs
03:31 πŸ”— joepie91 lol
03:32 πŸ”— danneh_ not really unfortunately, more aimed at sites that backup 4chan threads and host them like http://archive.moe and programs that archive threads on home users' systems
03:32 πŸ”— danneh_ because it's being read into other databases and software systems, need to extract lots of data and store it differently, though the spec does have a folder for warc dumps
03:33 πŸ”— joepie91 danneh_: not sure what exactly you're trying to accomplis
03:33 πŸ”— joepie91 accomplish *
03:33 πŸ”— joepie91 you can replay threads from a WARC archive, no?
03:39 πŸ”— danneh_ that's fair enough, though none of the proper archivers actually use warc files and the home archivers need to rewrite the html files and move all the files into different folders
03:39 πŸ”— danneh_ I'll probably go have a good think about it
03:41 πŸ”— yipdw WARC would be nice because you have the force multiplier effect
03:42 πŸ”— yipdw alternatively if you don't like WARC for some reason, HAR
03:42 πŸ”— yipdw the worst decision is to come up with yet another format
03:43 πŸ”— joepie91 :P
03:43 πŸ”— danneh_ nah, WARC is awesome
03:43 πŸ”— joepie91 danneh_: write your own tool!
03:44 πŸ”— danneh_ but it's never gonna happen, especially for that old data where it's already captured and imported into their system
03:44 πŸ”— danneh_ joepie91: I am, this is why I got into doing this!
03:44 πŸ”— joepie91 danneh_: what language are you using?
03:44 πŸ”— danneh_ sick of every single imageboard thread saver having its own folder structure and layout and all, nothing being compatible with each other
03:45 πŸ”— danneh_ using Python
03:45 πŸ”— joepie91 warc tools readily available, then :)
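[editor's note] The "custom script that can just append what it finds to a WARC" idea can be sketched roughly as below. This is only an illustration of the WARC/1.0 record layout using the Python standard library; the function names, the example URLs and the choice of a 'resource' record are assumptions, not anything the channel agreed on. A real crawl would normally store full HTTP 'response' records, and a dedicated WARC library is the sensible choice in practice.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialise one WARC/1.0 'resource' record as raw bytes."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header block, blank line, content block, then two CRLFs end the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def append_to_warc(path: str, target_uri: str, payload: bytes) -> None:
    """Append one record to `path` as its own gzip member."""
    with open(path, "ab") as fh:
        fh.write(gzip.compress(build_warc_record(target_uri, payload)))
```

Calling append_to_warc("threads.warc.gz", "http://example.org/thread/1", html_bytes) once per saved page keeps adding records to the same file; writing each record as a separate gzip member is the usual convention for .warc.gz files.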
03:47 πŸ”— danneh_ 'course, and my tool will save the warcs, but the big guys'll never do WARC, just doesn't make sense for them
03:48 πŸ”— joepie91 danneh_: I don't really see why not?
03:48 πŸ”— Ravenloft has joined #archiveteam-bs
03:49 πŸ”— godane my reuters.com 2007 pages grab is almost done
03:49 πŸ”— godane :-D
03:51 πŸ”— ionpulse hmm, anyone know how crawling is being prevented by wget --page-requisites, despite setting custom user agent, bind address, and disabling robots?
03:51 πŸ”— ionpulse scratching my head trying to think of what could be done at the server level to detect and prevent the traffic
03:51 πŸ”— yipdw accept header
03:51 πŸ”— danneh_ aha, they're focused on being a user-friendly archive (searching of metadata and thread content and all sorts of filtering and junk) rather than saving everything perfectly, might be able to eventually convince them if they get some bigger servers down the line
03:52 πŸ”— danneh_ but they mostly just download the page, read data into their db and then throw the page itself away
03:52 πŸ”— ionpulse ah, will try that yipdw
03:52 πŸ”— yipdw ionpulse: also request timing
03:52 πŸ”— danneh_ I'll have a look through, see what I can do with them
03:52 πŸ”— yipdw also the IP you're using may already be flagged
03:52 πŸ”— ionpulse yea I tried that, set random wait, and high wait time, but it's not having an impact
03:52 πŸ”— ionpulse i tried a completely different ip
03:53 πŸ”— joepie91 ionpulse: could just be a bot trap
03:53 πŸ”— joepie91 hidden <a href> that blocks the IP
03:53 πŸ”— joepie91 when followed
03:54 πŸ”— ionpulse ooo sneaky
03:55 πŸ”— ionpulse wonder if regex reject would solve for that
03:56 πŸ”— joepie91 if you can find the bot trap, then yes
03:56 πŸ”— joepie91 :P
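[editor's note] One way to go looking for the kind of bot trap joepie91 describes is to scan a saved page for links a human would never see. The sketch below is a guess at what such a check could look like, using only the Python standard library; the class and function names are made up for illustration, and it only catches links hidden through inline attributes. A trap hidden by an external stylesheet or by script would not be found this way.

```python
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags that carry common inline 'invisible' markers."""

    def __init__(self):
        super().__init__()
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # Normalise the inline style so "display: none" and "display:none" match.
        style = (attr.get("style") or "").replace(" ", "").lower()
        if ("hidden" in attr
                or "display:none" in style
                or "visibility:hidden" in style):
            self.hidden_links.append(href)


def find_hidden_links(html: str) -> list:
    """Return the hrefs of all inline-hidden links in `html`, in document order."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links
```

Any paths it reports could then be excluded from the crawl, for example with wget's --reject-regex option, which is presumably what "regex reject" refers to.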
03:56 πŸ”— ionpulse I have code to compare from before and after
03:56 πŸ”— ionpulse I got all of the data i needed from the resource for now. The clamp down was most likely a reaction to my activity on the site.
03:57 πŸ”— ionpulse But I was doubling back and running a quick test, and noticed the change in wget's ability to introspect the site.
04:25 πŸ”— mistym has quit IRC (Remote host closed the connection)
04:51 πŸ”— joepie91 well, this is a new one
04:52 πŸ”— joepie91 phone of a Ferguson livestreamer (that he was using to record) was apparently stolen mid-broadcast
04:52 πŸ”— joepie91 in front of 80k viewers
04:52 πŸ”— joepie91 wat.avi
05:01 πŸ”— aaaaaaaaa has quit IRC (Leaving)
05:02 πŸ”— mistym has joined #archiveteam-bs
06:04 πŸ”— BlueMaxim has joined #archiveteam-bs
06:17 πŸ”— godane uploaded: http://archive.org/details/www.reuters.com-2007-pages-20141124
06:45 πŸ”— ex-parro1 has joined #archiveteam-bs
07:05 πŸ”— ivan` has quit IRC (Read error: Operation timed out)
07:08 πŸ”— ivan` has joined #archiveteam-bs
07:09 πŸ”— amerrykan has quit IRC (Quit: Quitting)
07:12 πŸ”— amerrykan has joined #archiveteam-bs
07:20 πŸ”— primus104 has joined #archiveteam-bs
07:39 πŸ”— primus104 has quit IRC (Leaving.)
07:46 πŸ”— human39 has quit IRC (Read error: Operation timed out)
08:01 πŸ”— human39 has joined #archiveteam-bs
08:06 πŸ”— mistym has quit IRC (Leaving...)
08:58 πŸ”— midas can we grab this? http://abc7news.com/live/
08:58 πŸ”— schbirid has joined #archiveteam-bs
08:58 πŸ”— midas crap, ended
09:06 πŸ”— Lord_Nigh has quit IRC (Read error: Operation timed out)
09:10 πŸ”— Lord_Nigh has joined #archiveteam-bs
09:37 πŸ”— midas has quit IRC (Quit: WeeChat 0.4.3)
09:38 πŸ”— midas has joined #archiveteam-bs
10:43 πŸ”— primus104 has joined #archiveteam-bs
12:24 πŸ”— BlueMaxim has quit IRC (Quit: Leaving)
12:57 πŸ”— arkiver 1200 TB added to IA again :)
13:41 πŸ”— schbirid how much is that in failed yc startups?
13:52 πŸ”— ersi like 500 startups
13:52 πŸ”— ersi bazing
14:12 πŸ”— BiggieJo1 has joined #archiveteam-bs
14:15 πŸ”— BiggieJon has quit IRC (Read error: Operation timed out)
14:16 πŸ”— sankin has joined #archiveteam-bs
14:33 πŸ”— primus104 has quit IRC (Leaving.)
15:16 πŸ”— DFJustin heh that should last us through thanksgiving hopefully
15:26 πŸ”— aaaaaaaaa has joined #archiveteam-bs
15:26 πŸ”— ersi or well, this day or tomorrow
16:03 πŸ”— schbirid http://www.ebay.com/itm/301380770152
16:04 πŸ”— schbirid 5TB for 130usd or something
16:04 πŸ”— schbirid price does not display for me
16:07 πŸ”— balrog schbirid: seagate ;(
16:07 πŸ”— schbirid so?
16:11 πŸ”— yipdw my ZFS pool is six Barracudas, they're doing fine
16:27 πŸ”— brayden_ has quit IRC (Ping timeout: 606 seconds)
16:40 πŸ”— balrog their consumer end drives are lousy
16:40 πŸ”— balrog (I don't trust wd green drives very much either)
16:58 πŸ”— primus104 has joined #archiveteam-bs
17:04 πŸ”— primus104 has quit IRC (Leaving.)
17:30 πŸ”— mistym has joined #archiveteam-bs
17:37 πŸ”— joepie91 !a https://www.alchemistowl.org/pocorgtfo/
17:37 πŸ”— joepie91 eh, wrong channek
17:37 πŸ”— joepie91 channel
17:49 πŸ”— godane uploaded: https://archive.org/details/www.reuters.com-2008-pages-20141125
18:14 πŸ”— primus104 has joined #archiveteam-bs
18:41 πŸ”— ete has joined #archiveteam-bs
18:45 πŸ”— Pamela24 has joined #archiveteam-bs
18:45 πŸ”— Pamela24 has quit IRC (Read error: Connection reset by peer)
19:10 πŸ”— dashcloud has quit IRC (Ping timeout: 265 seconds)
19:10 πŸ”— dashcloud has joined #archiveteam-bs
19:30 πŸ”— joepie91 !!!!!
19:30 πŸ”— joepie91 there's a talk at 31c3 about geocities and "one terabyte of kilobyte age"!
19:31 πŸ”— schbirid already marked as "i want to see this" :)
19:39 πŸ”— ex-parro1 has quit IRC (Leaving.)
19:40 πŸ”— BlueMaxim has joined #archiveteam-bs
19:40 πŸ”— SketchCow Probably olia
19:42 πŸ”— schbirid yeah
19:42 πŸ”— schbirid http://halfnarp.events.ccc.de/
19:43 πŸ”— SketchCow Yeah, Olia
19:43 πŸ”— ete has quit IRC (Remote host closed the connection)
19:43 πŸ”— SketchCow that darling
19:48 πŸ”— * joepie91 selects ALL THE TALKS
19:55 πŸ”— garyrh http://www.americanradiohistory.com/Popular-Electronics-Guide.htm is pretty cool
20:31 πŸ”— bsmith093 has quit IRC (Read error: Operation timed out)
20:33 πŸ”— schbirid joepie91: have you looked at art & culture? :x
20:40 πŸ”— joepie91 schbirid: mmm?
20:40 πŸ”— joepie91 yes?
20:40 πŸ”— schbirid you select all of them? :P
20:41 πŸ”— joepie91 schbirid: nah, not all of them
20:41 πŸ”— joepie91 :P
20:41 πŸ”— schbirid :)
20:41 πŸ”— joepie91 only the English ones
20:41 πŸ”— schbirid i managed to select one
20:41 πŸ”— joepie91 :D
20:41 πŸ”— schbirid aaaaah
20:41 πŸ”— joepie91 well to be fair
20:41 πŸ”— joepie91 I picked one German talk
20:41 πŸ”— joepie91 because I need to work on my understanding of spoken german
20:41 πŸ”— joepie91 lol
20:42 πŸ”— joepie91 unrelated, for anybody who wasn't aware yet
20:42 πŸ”— joepie91 https://twitter.com/CloudStorageBuz/status/536950580930158592
20:42 πŸ”— joepie91 Bitcasa is 6 million in debt, 900k/mo hosting charges and 250k/mo revenue
20:42 πŸ”— balrog they're fucked
20:43 πŸ”— joepie91 yep.
20:43 πŸ”— joepie91 this is a very good example of why not to entrust your data to a "cloud storage" company
20:43 πŸ”— joepie91 (or any company, really)
20:43 πŸ”— joepie91 that it's a company doesn't mean it's sustainable...
20:43 πŸ”— mistym has quit IRC (Remote host closed the connection)
20:44 πŸ”— schbirid the only safe place is your own butt
20:49 πŸ”— schbirid One Infinite user, in particular, used Bitcasa to store 82TB of data
20:49 πŸ”— schbirid ^ identify yourself :)
20:51 πŸ”— primus104 has quit IRC (Leaving.)
20:52 πŸ”— bsmith093 has joined #archiveteam-bs
20:55 πŸ”— Kazzy 'For most of the company's life, Bitcasa has had no way to identify data on the site that was abandoned by users who cancelled their accounts.'
20:55 πŸ”— antomatic ** siren **
20:55 πŸ”— antomatic "Fail! Fail!"
20:58 πŸ”— joepie91 it's amazing, isn't it
20:58 πŸ”— joepie91 antomatic: your siren immediately made me check whether we were in -bs >.>
20:59 πŸ”— antomatic The fail siren is cross-border. :)
20:59 πŸ”— antomatic How can they not know who owned what data?
21:00 πŸ”— antomatic How can they not identify all files from paying customers, then delete the rest?
21:00 πŸ”— schbirid how can they see how much data a customer uses but not where it is
21:01 πŸ”— antomatic that's like... even if you DON'T know, you must be able to find out.
21:02 πŸ”— antomatic I don't know where my phone is, but I don't just give up and say "oh well, phone gone."
21:02 πŸ”— schbirid on the other hand, apparently bitcasa was a really good host in terms of privacy and security from spies
21:02 πŸ”— joepie91 [22:02] <antomatic> I don't know where my phone is, but I don't just give up and say "oh well, phone gone."
21:02 πŸ”— joepie91 hehehe
21:02 πŸ”— joepie91 jason scott analogy, I see :)
21:03 πŸ”— joepie91 schbirid: they're so unaware of what you're hosting, they couldn't help the agencies if they tried!
21:03 πŸ”— joepie91 :p
21:03 πŸ”— schbirid exactly
21:03 πŸ”— schbirid win win!
21:03 πŸ”— antomatic was that on purpose or by accident? :)
21:04 πŸ”— joepie91 hehe
21:04 πŸ”— schbirid dammit, still no working fuse for weiyun or kuaipan :(
21:06 πŸ”— mistym has joined #archiveteam-bs
21:24 πŸ”— joepie91 ?
21:27 πŸ”— balrog what was bitcasa even using for storage?
21:27 πŸ”— balrog did they have their own servers?
21:28 πŸ”— balrog btw I'm surprised larger media outlets haven't reported on this
21:30 πŸ”— joepie91 balrog: S3
21:30 πŸ”— joepie91 (yes, really)
21:30 πŸ”— lytv has quit IRC (Ping timeout: 272 seconds)
21:32 πŸ”— schbirid balrog: Since the start of 2014, Bitcasa has incurred approximately $9 million in hosting charges to AWS and is now over $6 million in debt.
21:36 πŸ”— aaaaaaaaa has quit IRC (Leaving)
21:38 πŸ”— aaaaaaaaa has joined #archiveteam-bs
21:39 πŸ”— DFJustin wooow
21:40 πŸ”— aaaaaaaaa Wonder how much VC funding they lost. You can only spend that kind of money when you are used to burning it.
21:41 πŸ”— mistym has quit IRC (Remote host closed the connection)
21:43 πŸ”— aaaaaaaaa Just went to look that up and saw someone vandalized their wikipedia page.
21:43 πŸ”— balrog wtf, "unlimited storage" on S3?
21:43 πŸ”— balrog that's just... stupid
21:44 πŸ”— Kazzy someone somewhere funded that.
21:51 πŸ”— deathy so.. AWS seems to be a good business
21:54 πŸ”— schbirid has quit IRC (Leaving)
21:55 πŸ”— sankin has quit IRC (Leaving.)
21:57 πŸ”— xmc being aws seems like it would be profitable, yes
21:58 πŸ”— mistym has joined #archiveteam-bs
21:59 πŸ”— primus104 has joined #archiveteam-bs
22:02 πŸ”— deathy interesting, the 82 TB user, "singlehandedly costing Bitcasa approximately $3,000 or more per month in server storage fees"
22:02 πŸ”— deathy what was the IA calculation per TB? (think I saw one during twitch discussions)
22:03 πŸ”— DFJustin $2,000 for forever
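[editor's note] The numbers thrown around here can be put side by side with a little arithmetic. The sketch below only uses figures quoted in the channel (82 TB at roughly $3,000 a month, $2,000 per TB "forever" at IA, 900k a month in hosting against 250k in revenue); the variable names are invented, and treating the $3,000 as a flat per-TB rate is a simplifying assumption.

```python
# All inputs are figures quoted in the channel, not measured values.
USER_TB = 82                     # data stored by the single heavy user
USER_COST_PER_MONTH = 3000       # USD, "approximately $3,000 or more"
IA_COST_PER_TB_FOREVER = 2000    # USD, one-off figure given by DFJustin
MONTHLY_HOSTING = 900_000        # USD, from the quoted tweet
MONTHLY_REVENUE = 250_000        # USD, from the quoted tweet

# Implied hosting cost per TB per month for that one user.
hosted_per_tb_month = USER_COST_PER_MONTH / USER_TB

# Months after which the recurring cost passes the one-off IA figure.
breakeven_months = IA_COST_PER_TB_FOREVER / hosted_per_tb_month

# Gap between hosting bill and revenue each month.
monthly_shortfall = MONTHLY_HOSTING - MONTHLY_REVENUE

print(f"{hosted_per_tb_month:.2f} USD per TB per month")
print(f"{breakeven_months:.1f} months to reach the one-off IA figure")
print(f"{monthly_shortfall} USD monthly shortfall")
```

On these inputs the recurring cost works out to about $36.59 per TB per month, which overtakes the one-off $2,000 figure after roughly 55 months.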
22:08 πŸ”— deathy "Bitcasa's estimates suggest that 1TB of data could be migrated in approximately 5 hours, and that up to 10TB of data could be migrated in two days."
22:08 πŸ”— deathy did users have to download / re-upload the data?
22:12 πŸ”— joepie91 [22:43] <aaaaaaaaa> Just went to look that up and saw someone vandalized their wikipedia page.
22:12 πŸ”— joepie91 linky!
22:12 πŸ”— joepie91 deathy: their estimates are a lie
22:14 πŸ”— lytv has joined #archiveteam-bs
22:14 πŸ”— aaaaaaaaa joepie91: https://en.wikipedia.org/w/index.php?title=Bitcasa&oldid=635325758
22:14 πŸ”— deathy yeah, one of their estimates is like 3 days, 7 hours each day at continuous 100 Mbps ... and they say average user can download in 3 days..
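[editor's note] A quick sanity check on those migration estimates: the sketch below works out the sustained line rate each claim would need, assuming decimal terabytes (10^12 bytes) and no protocol overhead. The helper names are made up for illustration.

```python
def required_mbps(terabytes: float, hours: float) -> float:
    """Sustained rate in Mbit/s needed to move `terabytes` within `hours`."""
    bits = terabytes * 1e12 * 8
    return bits / (hours * 3600) / 1e6


def terabytes_moved(mbps: float, hours: float) -> float:
    """Data volume in TB moved at a sustained `mbps` for `hours`."""
    return mbps * 1e6 * hours * 3600 / 8 / 1e12


print(f"1 TB in 5 h needs   {required_mbps(1, 5):.1f} Mbit/s")
print(f"10 TB in 48 h needs {required_mbps(10, 48):.1f} Mbit/s")
print(f"100 Mbit/s for 3 x 7 h moves {terabytes_moved(100, 21):.3f} TB")
```

Both quoted estimates need well over 400 Mbit/s sustained, while 21 hours at a continuous 100 Mbit/s moves a little under 1 TB.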
22:15 πŸ”— joepie91 aaaaaaaaa: ouch
22:16 πŸ”— joepie91 The company has a patent pending for an "infinite storage"[12] algorithm designed to reduce the actual storage space by identifying duplicate content and providing encryption of the stored data.
22:16 πŸ”— joepie91 lol
22:16 πŸ”— joepie91 how's that working out for you, bitcasa :P
22:20 πŸ”— godane is that not called deduping, which zfs and other filesystems do?
22:23 πŸ”— aaaaaaaaa no, it's convergent encryption
22:23 πŸ”— aaaaaaaaa i.e. gaping security hole.
22:29 πŸ”— godane ok
22:30 πŸ”— godane i figured the data would be checked against the unencrypted one when uploaded
22:31 πŸ”— godane it may be encrypted, but with some sort of plain text md5sum/sha256sum in it
22:36 πŸ”— aaaaaaaaa Just read the patent
22:37 πŸ”— aaaaaaaaa they take the data and split it into chunks
22:37 πŸ”— aaaaaaaaa hash the chunk and use that hash as the encryption key
22:37 πŸ”— ete has joined #archiveteam-bs
22:37 πŸ”— joepie91 aaaaaaaaa: sorry, they do what?
22:37 πŸ”— aaaaaaaaa They then make a manifest of all the chunks owned by a person and encrypt that with the user key.
22:38 πŸ”— joepie91 okay...
22:38 πŸ”— joepie91 and they keep a quota separately, but don't keep an unencrypted list of who owns what chunks since that could break confidentiality aspect of the crypto
22:39 πŸ”— joepie91 thus unable to figure out which data belongs to whom
22:39 πŸ”— joepie91 itallmakessensenow.avi
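[editor's note] The scheme as aaaaaaaaa summarises it (split into chunks, hash each chunk, use the hash as that chunk's key, keep a per-user manifest) can be sketched as follows. This is a toy model based on the chat description, not on Bitcasa's actual code: the chunk size, the function names and the dict used as a "store" are all assumptions, and the XOR keystream cipher is a stand-in that must not be used for real encryption. In the described scheme the manifest would additionally be encrypted under the user's own key, which is left out here.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration only


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def store_file(data: bytes, store: dict) -> list:
    """Encrypt each chunk under its own hash; return the (chunk_id, key) manifest."""
    manifest = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).digest()       # key derived from the content
        ciphertext = _keystream_xor(key, chunk)
        chunk_id = hashlib.sha256(ciphertext).hexdigest()
        store[chunk_id] = ciphertext               # same chunk, same slot: dedup
        manifest.append((chunk_id, key))
    return manifest


def load_file(manifest: list, store: dict) -> bytes:
    """Rebuild a file from its manifest; XOR with the same keystream decrypts."""
    return b"".join(_keystream_xor(key, store[cid]) for cid, key in manifest)
```

Two users storing the same chunk end up pointing at one stored ciphertext, which is where the space saving comes from. The store itself holds no owner information, which fits joepie91's reading, and anyone who already has a plaintext chunk can compute its chunk id, which is the usual objection to convergent encryption.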
22:40 πŸ”— aaaaaaaaa https://www.google.com/patents/US20130305039?dq=inassignee:%22Bitcasa,+Inc.%22&ei=AAR1VKWTPNLesASoroLoBw&cl=en
