[00:07] Links are pretty much dead when searching 'wakoopa profile'. Google has some cached versions though.
[00:18] Yahoo is shutting down Koprol on August 28th, 2012. Koprol is a location-based social networking site.
[00:18] "...over the coming quarters, we are shutting down or transitioning a number of products..."
[00:19] Koprol isn't the only product Yahoo! is shutting down.
[00:19] soon yahoo's only business will be selling dialup internet and renting rotary-dial phones
[00:20] You can only stand so tall for so long
[00:20] (I guess)
[00:21] Oh, and here is the announcement: http://www.koprolblog.com/2012/06/bye/
[00:32] Oh, and last December there was an article on Reuters about how Yahoo is shutting down four of their entertainment blogs: The Set, The Famous, The Amplifier, The Projector. http://www.reuters.com/article/2011/12/02/idUS162033995420111202
[00:34] But for some reason they are back up.
[01:08] a site that I think was archived is now basically dead: http://www.artknowledgenews.com/
[01:08] note that's a single image driving the whole website. not sure if the remaining content is still there
[07:06] iWork is going byebyes. I've never used it; is there anything we can grab?
[08:33] Any idea how large the whole Usenet archive stored at Google Groups (excluding the binaries groups) is?
[08:51] does google's mangled usenet archive even cover binaries?
[08:52] someone give me a 4TB drive and a time machine. (I can use my own laptop to interface with the drive). I can then go infiltrate deja before google screwed everything up.
[08:54] it doesn't
[08:54] i wish it did
[08:54] i may be able to find old TechTV shows on there if it did
[08:55] http://www.zeropaid.com/bbs/threads/14705-Tech-TV-Music-Wars-Special-on-eDonkey-Network
[08:55] it was pirated
[08:56] but the razorback server was taken down back in 2006: http://www.slyck.com/news.php?story=1102
[09:24] It's just that Usenet archives are an impressive treasure trove of information imprisoned inside google servers. It's a very uncomfortable situation.
[09:25] wasn't google groups one of AT's first projects?
[09:27] godane: the commercial usenet providers have very long binaries retention
[09:27] first projects? no
[09:27] there was some focus, but you're confusing google groups with google groups.
[09:28] (usenet vs mailing lists)
[09:28] ah, i was thinking of google groups
[09:30] The Usenet part of Google Groups
[09:31] i know you meant that, X-Scale, but Schbirid was thinking of AT's google groups (mailing list) files effort
[09:31] i was not aware it was "just" mailing lists :(
[09:31] that sucks
[09:32] and I really want to hurt the people who made the decision to name their ML project the same as an existing project
[09:33] at least they did not name a programming language go
[09:33] has anyone made a language named "stop" yet?
[09:40] we only grabbed the files and pages sections of mailing lists (which is the only stuff that was being taken down)
[09:41] no list messages or usenet posts
[09:49] I see. But is there any estimate of how large the whole Usenet archive (since 1980) is?
[09:51] (excluding all kinds of binaries, of course)
[11:07] well, X-Scale, I got a partial answer
[11:11] here is 1981 to June 1991: http://archive.org/details/utzoo-wiseman-usenet-archive
[11:11] I have been working on a scraper for google groups
[11:11] first I am trying to get as much of usenet as possible from other places.
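For the archive.org item linked just above, a minimal sketch of listing and fetching its files through the public metadata API might look like the following. This is not any official ArchiveTeam tooling; the use of requests and the commented-out download loop are illustrative assumptions.

```python
# Sketch: list the files in the utzoo-wiseman-usenet-archive item via
# archive.org's metadata API. The identifier comes from the link above;
# the download loop is an illustrative assumption.
import requests

ITEM = "utzoo-wiseman-usenet-archive"

meta = requests.get(f"https://archive.org/metadata/{ITEM}").json()
for entry in meta.get("files", []):
    print(entry["name"], entry.get("size", "?"), "bytes")
    # Uncomment to actually fetch a file (these tars are large):
    # url = f"https://archive.org/download/{ITEM}/{entry['name']}"
    # with requests.get(url, stream=True) as resp, open(entry["name"], "wb") as out:
    #     for chunk in resp.iter_content(chunk_size=1 << 20):
    #         out.write(chunk)
```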
[11:12] it is going to take a huge amount of time to get all of usenet back
[11:15] has anyone reached out to someone in the google groups division and asked for it?
[11:21] or has anyone asked on usenet?
[11:26] i'm close to halfway on dl.tv
[11:45] hi
[11:46] this is
[11:47] hi
[11:47] i have an issue with suse 11.3
[11:48] Who doesn't, haha
[11:48] I'd personally like to launch SUSE into space, or an incinerator
[11:48] Ok, enough ranting. What's the problem?
[11:49] i want to enable xscreensaver to run when the user is not logged in
[11:51] lol
[11:51] i guess he wanted to demonstrate the state of the user not being logged in
[11:51] lol whaat, I thought he had some archivist-related problem T_T
[11:52] well, suse 11.3... :D
[11:52] Yeaah.. *shrug*.
[12:49] I have been asked to give a talk at a tech conference again this year. I was thinking of talking about archiving, big data, and open source
[12:51] I am tired of giving my other talks
[13:36] DO IT
[13:41] SketchCow, you ever do Ohio LinuxFest? I know you have been in the area with Notacon before
[14:00] Nah.
[14:00] And I don't go to Notacon anymore.
[14:00] But the OLF people, oh they do love the zesty life
[14:03] All the Fan Fiction is now stored safely inside archive.org's walls.
[14:03] I think OLF is getting worse, not better. People skip out on the speakers' dinner, and there seem to be more equipment problems, not fewer
[14:04] they never have enough organizing stuff because the internal politics are totally not worth it
[14:05] I am all for open source conferences
[17:14] SmileyG: Yes. What's the best way though?
[17:19] SmileyG: use Wget with a tracker?
[17:20] (I'm still learning)
[17:35] arkhive: sorry I'm lost....
[17:35] as am I
[17:35] Oh RE: iWork?
[17:36] I don't know how it works other than spotting the news that it's closing for these guys who do the clever stuff.
[17:38] oh
[17:38] ya iWork
[18:52] http://www.independent.co.uk/life-style/gadgets-and-tech/news/web-hits-delete-on-magazines-12year-archive-7920565.html
[18:56] amerrykan: Geeze.
[18:57] Hope it's on the wayback machine...
[18:58] how do you go 12 years without ever once switching hosts or backends
[18:58] "we're still waiting for the first cloud disaster" I doubt that very much
[18:58] isn't the tsunami of disappearing society disaster enough?
[19:08] so the various AWS outages don't count? (particularly ones like the one last year (iirc) where there were cascading network failures leading to corruption and data loss)
[19:08] i guess that didn't affect anything anyone really cares about
[19:09] they're waiting for gmail to tank
[19:10] or facebook, twitter, etc...
[19:11] but both of those companies at least are known for their backups
[19:11] the google infrastructure has a minimum of triple redundancy for all live data, all of it hot
[19:11] iirc, netflix was caught with their pants down, and that led to the development of the chaos monkey
[19:12] chaos monkey rules
[19:12] how long was psn down?
[19:12] three weeks?
[19:12] I use that idea when I test websites. I built up this big test suite to just fuck things up
[19:12] i guess that's not a disaster because video games
[19:13] that wasn't backups
[19:13] that was "wtf we didn't even know they got in!"
[19:13] Also, this is veering off topic.
[19:13] indeed
[19:14] just out of curiosity, why weren't the ff.net files compressed before upload? It is going to take forever to download
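Stepping back to the earlier question in this log about grabbing a closing site such as iWork with Wget and a tracker: purely as an illustrative sketch (the target URL, WARC name, and flag choices are assumptions, and a real Warrior/tracker project wraps this in work-item claiming and uploading), a single WARC-producing wget run driven from Python could look like this.

```python
# Illustrative sketch only: one WARC-producing wget run for a single site.
# The URL and output name are placeholders, not a real project configuration.
import subprocess

target = "http://www.example.com/"   # placeholder for the site to grab

subprocess.run([
    "wget",
    "--mirror",                  # recurse through the site
    "--page-requisites",         # also fetch CSS, images, and other page assets
    "--warc-file=example-grab",  # write everything into example-grab.warc.gz
    "--wait=1",                  # be polite between requests
    target,
], check=True)
```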
[19:22] archive.org can't browse solid archives; I guess you could do zip, but that wouldn't have the insane compression ratio
[19:22] and would take a fuckass long time to convert
[19:22] omf_: because the contents are already compressed
[19:22] oh right
[19:22] gzipping gzipped archives doesn't get you very much
[19:23] and ungzipping + repacking might get you more
[19:23] but it's not worth the time
[19:23] I downloaded one of the tars in about nine hours
[19:23] just the other day i saw something about compressing log files twice
[19:23] it isn't fast, but it also isn't forever
[19:23] Coderjoe: I think "disaster" in their mind means data loss
[19:23] one file might not have a lot of redundancy, but daily logs do
[19:25] omf_: also, if you're planning on loading the files up for viewing, be aware that the cooked WARCs are what you want for that
[19:25] the non-cooked WARCs contain gzipped CSS despite wget not asking for it
[19:25] so the request is a bit fucked up
[19:27] I will make a note of that
[19:29] chronomex: I thought I said "data loss" in the description of last year's massive AWS network failure.
[19:32] indeed, sometimes gzipping a gzipped file does wind up with noticeable gains. IIRC, nzb files are such a situation. I also have a couple of multi-gig Apache error log files that compressed down to a few hundred megs on the first pass of gzip, which I suspect would compress even more on a second pass
[19:34] Were they originally compressed with -9?
[19:34] yes
[19:35] 433285184 log.gz
[19:35] 221859643 log.gz.gz
[19:35] woopwoop
[19:35] Oh man
[19:35] this is about archives
[19:35] Sirens
[19:35] not off-topic
[19:36] (should move to -bs)
[19:36] ok, fine
[19:36] And I'm op, somehow
[19:36] even with -9, a highly repetitive file will wind up with highly repetitive bit patterns in the compressed output
[19:37] especially if the repetitions are the same number of bytes from each other over and over
[19:43] sure, the high-count backreference will wind up with a short Huffman code, but the extra bits are not encoded in any way, and the Huffman code can only get so short.
[19:43] and for the third pass of gz
[19:43] 221728979 log.gz.gz.gz
[19:44] diminishing returns at that point
[21:13] Coderjoe: that's why you use lzma as a first pass sometimes
[21:13] .lz.gz
[21:24] or use bz2 instead of gz
[21:26] different tools for different uses
[21:37] bz2 has different strengths. I don't think highly repetitive log files are one of them
[21:38] and I needed to get the log file compressed and cleared up as soon as I could (piping the output over ssh to a different system)
[21:39] I was getting someone else's server back up and running after that log file filled the disk
[21:39] (while they were on vacation)
[21:39] and yes, I did tell them about it
[21:40] bz2 is best for data with similar but not repeating patterns, like English text or source code
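A small, self-contained demonstration of the double-gzip effect discussed above. The synthetic log line is an assumption standing in for the real Apache error logs, so the exact ratios will differ, but the second pass visibly shrinks the output while a third pass gains little.

```python
# Demonstration: a highly repetitive "log" shrinks again on a second gzip pass,
# because the first pass's repeated backreference codes form their own
# repetitive byte pattern. The data here is synthetic, not the real logs.
import gzip

line = b"[error] client 10.0.0.1 denied by server configuration: /some/path\n"
log = line * 1_000_000                  # roughly 68 MB of near-identical lines

once = gzip.compress(log, compresslevel=9)
twice = gzip.compress(once, compresslevel=9)
thrice = gzip.compress(twice, compresslevel=9)

print("raw:     ", len(log))
print("gz:      ", len(once))
print("gz.gz:   ", len(twice))
print("gz.gz.gz:", len(thrice))        # diminishing returns by this point
```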
[21:43] is the fortunecity archived stuff actually uploaded anywhere?
[21:44] i'm trying very hard to get 'decfnt.zip' and 'vt_fonts.zip', which are linked to from http://npj.netangels.ru/shattered/inventory/fonts
[21:44] but no luck
[21:45] google shows that vt_fonts.zip was once on a fortunecity site at http://members.fortunecity.com/vsons/sib/russify/vt-terminals/index.htm
[21:45] i've written encoders or decoders for each format, so I do realize what strengths each has
[21:46] (well, my bz2 decoder isn't 100% complete yet, but that's beside the point)
[21:46] er
[21:46] ENcode
[21:46] r
[22:00] Lord_Nigh: http://ia601202.us.archive.org/3/items/test-memac-index-test/fortunecity.html
[22:01] thanks!
[22:03] hmm, both vsons files are 0 bytes
[22:53] I think the first version of the second-generation ArchiveTeam Warrior is more or less ready.
[22:57] If anyone wants to try it out: http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20120707.ova
[22:57] (There's only an example project at the moment.)
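As a postscript to the earlier point about the raw (non-cooked) WARCs containing gzip-encoded CSS: a sketch of how one could scan a WARC for such responses. The third-party warcio library and the filename are assumptions for illustration, not anything mentioned in the log.

```python
# Sketch: flag response records in a WARC whose bodies came back gzip-encoded.
# warcio is a third-party library chosen for illustration; the path is a placeholder.
from warcio.archiveiterator import ArchiveIterator

with open("ffnet-example.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != "response" or record.http_headers is None:
            continue
        encoding = record.http_headers.get_header("Content-Encoding") or ""
        if "gzip" in encoding.lower():
            print("gzip-encoded body:", record.rec_headers.get_header("WARC-Target-URI"))
```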