[00:59] *** LordNigh2 has joined #archiveteam-bs
[01:07] *** Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
[01:07] *** LordNigh2 is now known as Lord_Nigh
[01:27] i'm now at 296,230 items uploaded
[01:28] at this rate i will get the hit the 300k mart by thinksgiving
[01:28] *thanksgiving
[01:28] *** Rickster has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** beardicus has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** arkiver has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** joepie91 has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** Zebranky has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** slash` has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** Rallias has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** balrog has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** nico has quit IRC (ircd.choopa.net irc.eversible.com)
[01:28] *** yipdw has quit IRC (ircd.choopa.net irc.eversible.com)
[01:29] do remember that the archive.org staff would like to have a nice uninterrupted Thanksgiving week, and (I hope) would be slow to respond because they are on vacation
[01:29] ok
[01:30] i will be backing off of the ERIC items during thanksgiving
[01:31] i may start uploading more funny or die videos
[01:31] during that time
[01:31] it thanks longer to upload and there are less items so less stress on server load
[01:32] *takes longer
[01:33] *** Rickster has joined #archiveteam-bs
[01:33] *** beardicus has joined #archiveteam-bs
[01:33] *** arkiver has joined #archiveteam-bs
[01:33] *** joepie91 has joined #archiveteam-bs
[01:33] *** Rallias has joined #archiveteam-bs
[01:33] *** Zebranky has joined #archiveteam-bs
[01:33] *** slash` has joined #archiveteam-bs
[01:33] *** yipdw has joined #archiveteam-bs
[01:33] *** balrog has joined #archiveteam-bs
[01:33] *** nico has joined #archiveteam-bs
[01:33] *** irc.eversible.com sets mode: +o balrog
[01:41] *** primus104 has quit IRC (Leaving.)
[01:42] *** mistym has quit IRC (Remote host closed the connection)
[02:01] *** Boppen has quit IRC (Ping timeout: 198 seconds)
[02:05] *** mistym has joined #archiveteam-bs
[02:13] *** Boppen has joined #archiveteam-bs
[02:25] *** TFGBD has quit IRC (( www.nnscript.com :: NoNameScript 4.22 :: www.esnation.com ))
[02:54] wow
[02:54] I think urpad just won the award for worst fucking hosting company ever
[02:54] jesus christ
[02:54] one of my VMs disappeared
[02:55] just, there was a different VM on the IP with a different SSH host key
[02:55] and WHMCS couldn't connect to mine anymore either
[02:55] I file a support ticket
[02:55] they continue to reboot the *wrong* VM (the other one I have with them that DID work)
[02:56] then respond with "okay can you try now" and "If there persists the issue, please update us with the ssh login details that you are trying to access the server so that we can have a detailed on it."
[02:56] ??!???!
[02:56] and on top of all that I can't get to the goddamn ticket in the panel because none of the items in the ticket list are actually clickable
[02:56] holy tits batman
[02:56] (this is why I don't run anything production there)
[03:23] ew, that sounds fun
[03:24] also, not worried but any response on that imageboard archiving stuff joepie91?
[03:25] danneh_: um, remind me?
[03:28] danneh_: what specifically are you refering to? :P
[03:28] all good, I'm writing an imageboard archiving specification (4chan, etc), was wondering whether anyone who does 'proper' archiving or who's written similar sorts of specs could take a look over it and give me some pointers
[03:29] unrelated, holy shit bitcasa: https://twitter.com/CloudStorageBuz/status/536950580930158592
[03:29] .tw https://twitter.com/BTapdicky/status/536963205046607872
[03:29] @CloudStorageBuz @Bitcasa what were they thinking? 900k a month in hosting charges with only 250k in revenue.
Their CEO failed (@BTapdicky)
[03:29] .tw https://twitter.com/BTapdicky/status/536954071244996608
[03:29] @CloudStorageBuz @Bitcasa they are 6 million in debt. They are done for (@BTapdicky)
[03:29] just curious, if not it should be alright regardless, thought I'd ask around and see, since first time writing a proper filetype specification
[03:30] danneh_: ah, haven't really done anything on it ye
[03:30] yet *
[03:30] standard WARC should suffice, though?
[03:30] a custom script that can just append what it finds to a WARC
[03:30] wow, nice job bitcasa
[03:31] oh, apparently I'm hosting the opposition docs
[03:31] lol
[03:32] not really unfortunately, more aimed at sites that backup 4chan threads and host them like http://archive.moe and programs that archive threads on home users' systems
[03:32] because it's being read into other databases and software systems, need to extract lots of data and store it differently, though the spec does have a folder for warc dumps
[03:33] danneh_: not sure what exactly you're trying to accomplis
[03:33] accomplish *
[03:33] you can replay threads from a WARC archive, no?
[03:39] that's fair enough, though none of the proper archivers actually use warc files and the home archivers need to rewrite the html files and move all the files into different folders
[03:39] I'll probably go have a good think about it
[03:41] WARC would be nice because you have the force multiplier effect
[03:42] alternatively if you don't like WARC for some reason, HAR
[03:42] the worst decision is to come up with yet another format
[03:43] :P
[03:43] nah, WARC is awesome
[03:43] danneh_: write your own tool!
[03:44] but it's never gonna happen, especially for that old data where it's already captured and imported into their system
[03:44] joepie91: I am, this is why I got into doing this!
[03:44] danneh_: what language are you using?
[03:44] sick of every single imageboard thread saver having its own folder structure and layout and all, nothing being compatible with each other
[03:45] using Python
[03:45] warc tools readily available, then :)
[03:47] 'course, and my tool will save the warcs, but the big guys'll never do WARC, just doesn't make sense for them
[03:48] danneh_: I don't really see why not?
[03:48] *** Ravenloft has joined #archiveteam-bs
[03:49] my reuters.com 2007 pages grab is almost done
[03:49] :-D
[03:51] hmm, anyone know how crawling is being prevented by wget --page-requisites, despite setting custom user agent, bind address, and disabling robots?
[03:51] scratching my head trying to think of what could be done at the server level to detect and prevent the traffic
[03:51] accept header
[03:51] aha, they're focused on being a user-friendly archive (searching of metadata and thread content and all sorts of filtering and junk) than saving everything perfectly, might be able to eventually convince them if they get some bigger servers down the line
[03:52] but they mostly just download the page, read data into their db and then throw the page itself away
[03:52] ah, will try that yipdw
[03:52] ionpulse: also request timing
[03:52] I'll have a look through, see what I can do with them
[03:52] also the IP you're using may already be flagged
[03:52] yea I tried that, set random wait, and high wait time, but it's not having an impact
[03:52] i tried a completely different ip
[03:53] ionpulse: could just be a bot trap
[03:53] hidden that blocks the IP
[03:53] when followed
[03:54] ooo sneaky
[03:55] wonder if regex reject would solve for that
[03:56] if you can find the bot trap, then yes
[03:56] :P
[03:56] I have code to compare from before and after
[03:56] I got all of the data i needed from the resource for now. The clamp down was most likely a reaction to my activity on the site.
[03:57] But I was doubling back and running a quick test, and noticed the change in wget's ability to introspect the site.
[04:25] *** mistym has quit IRC (Remote host closed the connection)
[04:51] well, this is a new one
[04:52] phone of a Ferguson livestreamer (that he was using to record) was apparently stolen mid-broadcast
[04:52] in front of 80k viewers
[04:52] wat.avi
[05:01] *** aaaaaaaaa has quit IRC (Leaving)
[05:02] *** mistym has joined #archiveteam-bs
[06:04] *** BlueMaxim has joined #archiveteam-bs
[06:17] uploaded: http://archive.org/details/www.reuters.com-2007-pages-20141124
[06:45] *** ex-parro1 has joined #archiveteam-bs
[07:05] *** ivan` has quit IRC (Read error: Operation timed out)
[07:08] *** ivan` has joined #archiveteam-bs
[07:09] *** amerrykan has quit IRC (Quit: Quitting)
[07:12] *** amerrykan has joined #archiveteam-bs
[07:20] *** primus104 has joined #archiveteam-bs
[07:39] *** primus104 has quit IRC (Leaving.)
[07:46] *** human39 has quit IRC (Read error: Operation timed out)
[08:01] *** human39 has joined #archiveteam-bs
[08:06] *** mistym has quit IRC (Leaving...)
[08:58] can we grab this? http://abc7news.com/live/
[08:58] *** schbirid has joined #archiveteam-bs
[08:58] crap, ended
[09:06] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[09:10] *** Lord_Nigh has joined #archiveteam-bs
[09:37] *** midas has quit IRC (Quit: WeeChat 0.4.3)
[09:38] *** midas has joined #archiveteam-bs
[10:43] *** primus104 has joined #archiveteam-bs
[12:24] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:57] 1200 TB added to IA again :)
[13:41] how much is that in failed yc startups?
[13:52] like 500 startups
[13:52] bazing
[14:12] *** BiggieJo1 has joined #archiveteam-bs
[14:15] *** BiggieJon has quit IRC (Read error: Operation timed out)
[14:16] *** sankin has joined #archiveteam-bs
[14:33] *** primus104 has quit IRC (Leaving.)
[15:16] heh that should last us through thanksgiving hopefully
[15:26] *** aaaaaaaaa has joined #archiveteam-bs
[15:26] or well, this day or tomorrow
[16:03] http://www.ebay.com/itm/301380770152
[16:04] 5TB for 130usd or something
[16:04] price does not display for me
[16:07] schbirid: seagate ;(
[16:07] so?
[16:11] my ZFS pool is six Barracudas, they're doing fine
[16:27] *** brayden_ has quit IRC (Ping timeout: 606 seconds)
[16:40] their consumer end drives are lousy
[16:40] (I don't trust wd green drives very much either)
[16:58] *** primus104 has joined #archiveteam-bs
[17:04] *** primus104 has quit IRC (Leaving.)
[17:30] *** mistym has joined #archiveteam-bs
[17:37] !a https://www.alchemistowl.org/pocorgtfo/
[17:37] eh, wrong channek
[17:37] channel
[17:49] uploaded: https://archive.org/details/www.reuters.com-2008-pages-20141125
[18:14] *** primus104 has joined #archiveteam-bs
[18:41] *** ete has joined #archiveteam-bs
[18:45] *** Pamela24 has joined #archiveteam-bs
[18:45] *** Pamela24 has quit IRC (Read error: Connection reset by peer)
[19:10] *** dashcloud has quit IRC (Ping timeout: 265 seconds)
[19:10] *** dashcloud has joined #archiveteam-bs
[19:30] !!!!!
[19:30] there's a talk at 31c3 about geocities and "one terabyte of kilobyte age"!
[19:31] already marked as "i want to see this" :)
[19:39] *** ex-parro1 has quit IRC (Leaving.)
[19:40] *** BlueMaxim has joined #archiveteam-bs
[19:40] Probably olia
[19:42] yeah
[19:42] http://halfnarp.events.ccc.de/
[19:43] Yeah, Olia
[19:43] *** ete has quit IRC (Remote host closed the connection)
[19:43] that darling
[19:48] * joepie91 selects ALL THE TALKS
[19:55] http://www.americanradiohistory.com/Popular-Electronics-Guide.htm is pretty cool
[20:31] *** bsmith093 has quit IRC (Read error: Operation timed out)
[20:33] joepie91: have you looked at art & culture? :x
[20:40] schbirid: mmm?
[20:40] yes?
[20:40] you select all of them?
:P
[20:41] schbirid: nah, not all of them
[20:41] :P
[20:41] :)
[20:41] only the English ones
[20:41] i managed to select one
[20:41] :D
[20:41] aaaaah
[20:41] well to be fair
[20:41] I picked one German talk
[20:41] because I need to work on my understanding of spoken german
[20:41] lol
[20:42] unrelated, for anybody who wasn't aware yet
[20:42] https://twitter.com/CloudStorageBuz/status/536950580930158592
[20:42] Bitcasa is 6 million in debt, 900k/mo hosting charges and 250k/mo revenue
[20:42] they're fucked
[20:43] yep.
[20:43] this is a very good example of why not to entrust your data to a "cloud storage" company
[20:43] (or any company, really)
[20:43] that it's a company doesn't mean it's sustainable...
[20:43] *** mistym has quit IRC (Remote host closed the connection)
[20:44] the only safe place is your own butt
[20:49] One Infinite user, in particular, used Bitcasa to store 82TB of data
[20:49] ^ identify yourself :)
[20:51] *** primus104 has quit IRC (Leaving.)
[20:52] *** bsmith093 has joined #archiveteam-bs
[20:55] 'For most of the company's life, Bitcasa has had no way to identify data on the site that was abandoned by users who cancelled their accounts.'
[20:55] ** siren **
[20:55] "Fail! Fail!"
[20:58] it's amazing, isn't it
[20:58] antomatic: your siren immediately made me check whether we were in -bs >.>
[20:59] The fail siren is cross-border. :)
[20:59] How can they not know who owned what data?
[21:00] How can they not identify all files from paying customers, then delete the rest?
[21:00] how can they see how much data a customer uses but not where it is
[21:01] that's like... even if you DON'T know, you must be able to find out.
[21:02] I don't know where my phone is, but I don't just give up and say "oh well, phone gone."
[21:02] on the other hand, apparently bitcasa was a really good host in terms of privacy and security from spies
[21:02] [22:02] I don't know where my phone is, but I don't just give up and say "oh well, phone gone."
[21:02] hehehe
[21:02] jason scott analogy, I see :)
[21:03] schbirid: they're so unaware of what you're hosting, they couldn't help the agencies if they tried!
[21:03] :p
[21:03] exactly
[21:03] win win!
[21:03] was that on purpose or by accident? :)
[21:04] hehe
[21:04] dammit, still no working fuse for weiyun or kuaipan :(
[21:06] *** mistym has joined #archiveteam-bs
[21:24] ?
[21:27] what was bitcasa even using for storage?
[21:27] did they have their own servers?
[21:28] btw I'm surprised larger media outlets haven't reported on this
[21:30] balrog: S3
[21:30] (yes, really)
[21:30] *** lytv has quit IRC (Ping timeout: 272 seconds)
[21:32] balrog: Since the start of 2014, Bitcasa has incurred approximately $9 million in hosting charges to AWS and is now over $6 million in debt.
[21:36] *** aaaaaaaaa has quit IRC (Leaving)
[21:38] *** aaaaaaaaa has joined #archiveteam-bs
[21:39] wooow
[21:40] Wonder how much VC funding they lost. You can only spend that kind of money when you are used to burning it.
[21:41] *** mistym has quit IRC (Remote host closed the connection)
[21:43] Just went to look that up and saw someone vandalized their wikipedia page.
[21:43] wtf, "unlimited storage" on S3?
[21:43] that's just... stupid
[21:44] someone somewhere funded that.
[21:51] so.. AWS seems to be a good business
[21:54] *** schbirid has quit IRC (Leaving)
[21:55] *** sankin has quit IRC (Leaving.)
[21:57] being aws seems like it would be profitable, yes
[21:58] *** mistym has joined #archiveteam-bs
[21:59] *** primus104 has joined #archiveteam-bs
[22:02] interesting, the 82 TB user, "singlehandedly costing Bitcasa approximately $3,000 or more per month in server storage fees"
[22:02] what was the IA calculation per TB? (think I saw one during twitch discussions)
[22:03] $2,000 for forever
[22:08] "Bitcasa’s estimates suggest that 1TB of data could be migrated in approximately 5 hours, and that up to 10TB of data could be migrated in two days."
[22:08] did users have to download / re-upload the data?
[22:12] [22:43] Just went to look that up and saw someone vandalized their wikipedia page.
[22:12] linky!
[22:12] deathy: their estimates are a lie
[22:14] *** lytv has joined #archiveteam-bs
[22:14] joepie91: https://en.wikipedia.org/w/index.php?title=Bitcasa&oldid=635325758
[22:14] yeah, one of their estimates is like 3 days, 7 hours each day at continuous 100 Mbps ... and they say average user can download in 3 days..
[22:15] aaaaaaaaa: ouch
[22:16] The company has a patent pending for an "infinite storage"[12] algorithm designed to reduce the actual storage space by identifying duplicate content and providing encryption of the stored data.
[22:16] lol
[22:16] how's that working out for you, bitcasa :P
[22:20] is that not called deduping that zfs and other filesystems do
[22:23] no, it's convergent encryption
[22:23] i.e. gaping security hole.
[22:29] ok
[22:30] i figured that data would be checked against the un-encrypted one when uploaded
[22:31] it may be encrypted but as some sort of plain text with md5sum/sha256sum in it
[22:36] Just read the patent
[22:37] they take the data and split it into chunks
[22:37] hash the chunk and use that hash as the encryption key
[22:37] *** ete has joined #archiveteam-bs
[22:37] aaaaaaaaa: sorry, they do what?
[22:37] They then make a manifest of all the chunks owned by a person and encrypt that with the user key.
[22:38] okay...
[22:38] and they keep a quota separately, but don't keep an unencrypted list of who owns what chunks since that could break confidentiality aspect of the crypto
[22:39] thus unable to figure out which data belongs to whom
[22:39] itallmakessensenow.avi
[22:40] https://www.google.com/patents/US20130305039?dq=inassignee:%22Bitcasa,+Inc.%22&ei=AAR1VKWTPNLesASoroLoBw&cl=en
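A minimal Python sketch of the scheme as summarised in the chat above: split the data into chunks, hash each chunk, use that hash as the chunk's encryption key, and keep a per-user manifest encrypted with the user key. It is not taken from the patent text or from Bitcasa's code. `keystream_xor`, `store_file`, `load_file`, the dict-based `blob_store` and the 4-byte chunk size are hypothetical names and values chosen for illustration, and the SHA-256 counter-mode XOR is a toy stand-in for a real cipher that should not be used for actual encryption.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || offset) blocks.

    Assumption: stands in for a real cipher so the sketch needs only the
    standard library. Applying it twice with the same key decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def store_file(data: bytes, user_key: bytes, blob_store: dict,
               chunk_size: int = 4) -> bytes:
    """Store data as convergently encrypted chunks; return the encrypted manifest."""
    manifest = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        chunk_key = hashlib.sha256(chunk).digest()   # key derived from the content itself
        ciphertext = keystream_xor(chunk_key, chunk)
        blob_id = hashlib.sha256(ciphertext).hexdigest()
        blob_store[blob_id] = ciphertext             # identical chunks land on the same id
        manifest.append([blob_id, chunk_key.hex()])
    # Only the manifest ties blobs to a user, and it is encrypted with the user key.
    return keystream_xor(user_key, json.dumps(manifest).encode())


def load_file(encrypted_manifest: bytes, user_key: bytes, blob_store: dict) -> bytes:
    """Decrypt the manifest with the user key, then decrypt and join the chunks."""
    manifest = json.loads(keystream_xor(user_key, encrypted_manifest))
    return b"".join(
        keystream_xor(bytes.fromhex(key_hex), blob_store[blob_id])
        for blob_id, key_hex in manifest
    )
```

Because each key comes from the chunk's content, identical chunks uploaded by different users encrypt to the same ciphertext and are stored once. The store holds no record of which user references which blob; only each user's encrypted manifest does, which matches the "unable to figure out which data belongs to whom" point. The same property means anyone who already has a given plaintext can check whether it is stored.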