#archiveteam 2012-07-06,Fri

Time Nickname Message
00:07 🔗 arkhive Links are pretty much dead when searching 'wakoopa profile'. Google has some cached versions though.
00:18 🔗 arkhive Yahoo is shutting down Koprol August 28th 2012. Koprol is a location-based social networking site.
00:18 🔗 arkhive "...over the coming quarters, we are shutting down or transitioning a number of products..."
00:19 🔗 arkhive Koprol isn't the only product Yahoo! is shutting down.
00:19 🔗 chronomex soon yahoo's only business will be selling dialup internet and renting rotary-dial phones
00:20 🔗 arkhive You can only stand so tall for so long
00:20 🔗 arkhive (I guess)
00:21 🔗 arkhive Oh and here is the announcement http://www.koprolblog.com/2012/06/bye/
00:32 🔗 arkhive Oh and last December there was an article on Reuters about how Yahoo is shutting down 4 of their entertainment blogs: The Set, The Famous, The Amplifier, The Projector. http://www.reuters.com/article/2011/12/02/idUS162033995420111202
00:34 🔗 arkhive But for some reason they are back up.
01:08 🔗 antonrojo a site that I think was archived is now basically dead: http://www.artknowledgenews.com/
01:08 🔗 antonrojo note that's a single image driving the whole website. not sure if the remaining content is still there
07:06 🔗 SmileyG iWork is going byebyes, I've never used it, is there anything we can grab?
08:33 🔗 X-Scale Any idea how large the whole Usenet archive stored at Google Groups (excluding the binaries groups) is?
08:51 🔗 Coderjoe does google's mangled usenet archive even cover binaries?
08:52 🔗 Coderjoe someone give me a 4TB drive and a time machine. (I can use my own laptop to interface with the drive). I can then go infiltrate deja before google screwed everything up.
08:54 🔗 DFJustin it doesn't
08:54 🔗 godane i wish it did
08:54 🔗 godane i may be able to find old techtv shows on there if it did
08:55 🔗 godane http://www.zeropaid.com/bbs/threads/14705-Tech-TV-Music-Wars-Special-on-eDonkey-Network
08:55 🔗 godane it was pirated
08:56 🔗 godane but the razorback server was taken down back in 2006: http://www.slyck.com/news.php?story=1102
09:24 🔗 X-Scale It's just that Usenet archives are an impressive treasure trove of information imprisoned inside google servers. It's a very uncomfortable situation.
09:25 🔗 Schbirid wasn't google groups one of AT's first projects?
09:27 🔗 DFJustin godane: the commercial usenet providers have very long binaries retention
09:27 🔗 Coderjoe first projects? no
09:27 🔗 Coderjoe there was some focus at one point, but you're confusing google groups with google groups.
09:28 🔗 Coderjoe (usenet vs mailing lists)
09:28 🔗 Schbirid ah, i was thinking of google groups
09:30 🔗 X-Scale The Usenet part of Google Groups
09:31 🔗 Coderjoe i know you meant that X-Scale, but Schbirid was thinking of AT's google groups (mailing list) files effort
09:31 🔗 Schbirid i was not aware it was "just" mailing lists :(
09:31 🔗 Schbirid that sucks
09:32 🔗 Coderjoe and I really want to hurt the people that made the decision to name their ML project the same as an existing project
09:33 🔗 Schbirid at least they did not name a programming language go
09:33 🔗 Coderjoe has anyone made a language named "stop" yet?
09:40 🔗 DFJustin we only grabbed the files and pages sections of mailing lists (which is the only stuff that was being taken down)
09:41 🔗 DFJustin no list messages or usenet posts
09:49 🔗 X-Scale I see. But is there any estimate of how large the whole Usenet archive (since 1980) is?
09:51 🔗 X-Scale (excluding all kinds of binaries, of course)
11:07 🔗 omf_ well X-Scale I got a partial answer
11:11 🔗 omf_ here is 1981 to june 1991: http://archive.org/details/utzoo-wiseman-usenet-archive
11:11 🔗 omf_ I have been working on a scraper for google groups
11:11 🔗 omf_ first I am trying to get as much of usenet as I can from other places.
11:12 🔗 omf_ it is going to take a huge amount of time to get all of usenet back
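For anyone scripting against items like the utzoo archive linked above, a minimal sketch, assuming only archive.org's public metadata API (the identifier is the item omf_ linked):

    import json
    import urllib.request

    # archive.org's metadata API returns item metadata plus a file
    # listing as JSON for any item identifier.
    url = "https://archive.org/metadata/utzoo-wiseman-usenet-archive"
    with urllib.request.urlopen(url) as resp:
        item = json.load(resp)

    for f in item["files"]:
        # each entry carries a name and usually a size in bytes
        print(f["name"], f.get("size", "?"))

Individual files can then be fetched from https://archive.org/download/<identifier>/<name>.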
11:15 🔗 omf_ has anyone reached out to someone in the google groups division and asked for it?
11:21 🔗 C-Keen or has anyone asked on usenet?
11:26 🔗 godane i'm close to half way on dl.tv
11:45 🔗 hatman hi
11:46 🔗 hatman this is
11:47 🔗 Schbirid hi
11:47 🔗 hatman i have an issue with suse 11.3
11:48 🔗 ersi Who doesn't, haha
11:48 🔗 ersi I'd personally like to launch SUSE into space, or an incinerator
11:48 🔗 ersi Ok, enough ranting. What's the problem?
11:49 🔗 hatman i want to enable xscreensaver to run when the user is not logged in
11:51 🔗 Schbirid lol
11:51 🔗 Schbirid i guess he wanted to demonstrate the state of the user not being logged in
11:51 🔗 ersi lol whaat, I thought he has some archivist related problem T_ T
11:52 🔗 Schbirid well, suse 11.3... :D
11:52 🔗 ersi Yeaah.. *shrug*.
12:49 🔗 omf_ I have been asked to give a talk at a tech conference again this year. I was thinking of talking about archiving, big data and open source
12:51 🔗 omf_ I am tired of giving my other talks
13:36 🔗 SketchCow DO IT
13:41 🔗 omf_ SketchCow, you ever do Ohio LinuxFest? I know you have been in the area with notacon before
14:00 🔗 SketchCow Nah.
14:00 🔗 SketchCow And I don't go to Notacon anymore.
14:00 🔗 SketchCow But the OLF people, oh they do love the zesty life
14:03 🔗 SketchCow All the Fan Fiction is now stored safely inside archive.org's walls.
14:03 🔗 omf_ I think OLF is getting worse not better. People skip out on the speakers dinner, there seem to be more equipment problems not less
14:04 🔗 omf_ they never have enough people organizing things because the internal politics make it totally not worth it
14:05 🔗 omf_ I am all for open source conferences
17:14 🔗 arkhive SmileyG: Yes. What's the best way though?
17:19 🔗 arkhive SmileyG: use Wget with a tracker?
17:20 🔗 arkhive (I'm still learning)
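A minimal sketch of the kind of grab arkhive is asking about, assuming a wget build with WARC support (ArchiveTeam used a patched wget-warc around this time); the target URL and WARC name are placeholders:

    import subprocess

    # Hypothetical single-site grab. --warc-file assumes a wget
    # built with WARC support; it writes iwork-grab.warc.gz
    # alongside the normal mirror tree.
    subprocess.run([
        "wget",
        "--mirror",                # recurse, honoring timestamps
        "--page-requisites",       # fetch the CSS/JS/images pages need
        "--warc-file=iwork-grab",  # placeholder WARC basename
        "--wait=1",                # be polite to the server
        "http://example.com/",
    ], check=True)

A tracker only enters the picture when the work is split across many downloaders; for a single site, plain wget is enough.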
17:35 🔗 SmileyG arkhive: sorry I'm lost....
17:35 🔗 Coderjoe as am I
17:35 🔗 SmileyG Oh RE: iWork?
17:36 🔗 SmileyG I don't know how it works other than spotting the news that it's closing; the clever stuff is for you guys.
17:38 🔗 arkhive oh
17:38 🔗 arkhive ya iWork
18:52 🔗 amerrykan http://www.independent.co.uk/life-style/gadgets-and-tech/news/web-hits-delete-on-magazines-12year-archive-7920565.html
18:56 🔗 mistym amerrykan: Geeze.
18:57 🔗 mistym Hope it's on the wayback machine...
18:58 🔗 amerrykan how do you go 12 years without ever once switching hosts or backends
18:58 🔗 chronomex "we're still waiting for the first cloud disaster" I doubt that very much
18:58 🔗 chronomex isn't the tsunami of disappearing society disaster enough?
19:08 🔗 Coderjoe so the various aws outages don't count? (particularly ones like the one last year (iirc) where there were cascading network failures leading to corruption and data loss)
19:08 🔗 amerrykan i guess that didn't affect anything anyone really cares about
19:09 🔗 amerrykan they're waiting for gmail to tank
19:10 🔗 omf_ or facebook, twitter, etc...
19:11 🔗 omf_ but both of those companies at least are known for their backups
19:11 🔗 omf_ the google infrastructure has a minimum of triple redundancy for all live data, all of it kept hot
19:11 🔗 Coderjoe iirc, netflix was caught with their pants down, and that led to the development of the chaos monkey
19:12 🔗 omf_ chaos monkey rules
19:12 🔗 amerrykan how long was psn down?
19:12 🔗 amerrykan three weeks?
19:12 🔗 omf_ I use that idea when I test websites. I built up this big test suite to just fuck things up
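In the same chaos-monkey spirit, a sketch of what randomized fault-injection against a website under test might look like; everything here (the base URL, the junk-path generator) is a made-up stand-in, not omf_'s actual suite:

    import random
    import string
    import urllib.error
    import urllib.request

    BASE = "http://localhost:8080"  # placeholder: point at a server you own

    def junk_path(n=12):
        # random path segment built from URL-safe characters
        return "".join(random.choices(string.ascii_letters + string.digits + "._-", k=n))

    for _ in range(100):
        url = f"{BASE}/{junk_path()}"
        try:
            urllib.request.urlopen(url, timeout=5).close()
        except urllib.error.HTTPError as e:
            if e.code >= 500:  # 4xx is expected for junk input; 5xx is a bug
                print("server error:", url, e.code)
        except urllib.error.URLError as e:
            print("connection problem:", url, e.reason)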
19:12 🔗 amerrykan i guess that's not a disaster because bideo games
19:13 🔗 SmileyG that wasn't backups
19:13 🔗 SmileyG that was "wtf we didn't even know they got in!"
19:13 🔗 SmileyG Also, this is veering off topic.
19:13 🔗 omf_ indeed
19:14 🔗 omf_ just out of curiosity why weren't the ff.net files compressed before upload? It is going to take forever to download
19:22 🔗 DFJustin archive.org can't browse solid archives, I guess you could do zip but that wouldn't have the insane compression ratio
19:22 🔗 DFJustin and would take a fuckass long time to convert
19:22 🔗 yipdw omf_: because the contents are already compressed
19:22 🔗 DFJustin oh right
19:22 🔗 yipdw gzipping gzipped archives doesn't get you very much
19:23 🔗 yipdw and ungzipping + repacking might get you more
19:23 🔗 yipdw but is not worth the time
19:23 🔗 yipdw I downloaded one of the tars in about nine hours
19:23 🔗 Schbirid just the other day i saw something about compressing log files twice
19:23 🔗 yipdw it isn't fast, but it also isn't forever
19:23 🔗 chronomex Coderjoe: I think "disaster" in their mind means data loss
19:23 🔗 Schbirid one file might not have a lot of redundancy. but daily logs do
19:25 🔗 yipdw omf_: also, if you're planning on loading the files up for viewing, be aware that the cooked WARCs are what you want for that
19:25 🔗 yipdw the non-cooked WARCs contain gzipped CSS despite wget not asking for it
19:25 🔗 yipdw so the request is a bit fucked up
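yipdw's point is checkable: a sketch of scanning a WARC for responses whose bodies were stored still gzip-encoded, using warcio (a present-day WARC library, not something that existed in 2012; the filename is a placeholder):

    from warcio.archiveiterator import ArchiveIterator

    # Flag response records stored with Content-Encoding: gzip intact,
    # i.e. bodies a viewer would have to decompress itself.
    with open("grab.warc.gz", "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            enc = record.http_headers.get_header("Content-Encoding")
            if enc and "gzip" in enc:
                print(record.rec_headers.get_header("WARC-Target-URI"), enc)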
19:27 🔗 omf_ I will make a note of that
19:29 🔗 Coderjoe chronomex: I thought I said "data loss" in the description of last year's massive aws network failure.
19:32 🔗 Coderjoe indeed, sometimes gzipping a gzipped file does wind up with noticeable gains. IIRC, nzb files are such a situation. I also have a couple of multi-gig apache error log files that compressed down to a few hundred meg on the first pass of gzip, and I suspect they would compress even more on a second pass
19:34 🔗 nitro2k01 Were they originally compressed with -9?
19:34 🔗 Coderjoe yes
19:35 🔗 Coderjoe 433285184 log.gz
19:35 🔗 Coderjoe 221859643 log.gz.gz
19:35 🔗 Coderjoe woopwoop
19:35 🔗 nitro2k01 Oh man
19:35 🔗 yipdw this is about archives
19:35 🔗 nitro2k01 Sirens
19:35 🔗 yipdw not off-topic
19:36 🔗 Coderjoe (should move to -bs)
19:36 🔗 Coderjoe ok, fine
19:36 🔗 nitro2k01 And I'm op, somehow
19:36 🔗 Coderjoe even with -9, a highly repetitive file will wind up with highly repetitive bit patterns in the compressed output
19:37 🔗 Coderjoe especially if the repetitions are the same number of bytes from each other over and over
19:43 🔗 Coderjoe sure, the high-count backreference will wind up with a short huffman code, but the extra bits are not encoded in any way, and the huffman code can only get so short.
19:43 🔗 Coderjoe and for the third pass of gz
19:43 🔗 Coderjoe 221728979 log.gz.gz.gz
19:44 🔗 Coderjoe diminishing returns at that point
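Coderjoe's numbers are reproducible in miniature: when back-references recur at fixed distances, the first gzip pass emits its own repetitive byte patterns, which a second pass can squeeze again. A self-contained sketch, with a synthetic log standing in for the real one:

    import gzip

    # synthetic repetitive log: the same 50 lines cycling over and over,
    # so matches sit at fixed distances, as described above
    log = b"".join(b"[error] worker %d: connection refused\n" % (i % 50)
                   for i in range(200_000))

    once = gzip.compress(log, 9)
    twice = gzip.compress(once, 9)
    thrice = gzip.compress(twice, 9)
    print(len(log), len(once), len(twice), len(thrice))
    # the second pass shrinks the output again; the third pass shows
    # the same diminishing returns Coderjoe saw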
21:13 🔗 chronomex Coderjoe: that's why you use lzma as a first pass sometimes
21:13 🔗 chronomex .lz.gz
21:24 🔗 omf_ or use bz2 instead of gz
21:26 🔗 chronomex different tools for different uses
21:37 🔗 Coderjoe bz2 has different strengths. I don't think highly repetitive log files are it
21:38 🔗 Coderjoe and I needed to get the log file compressed and cleared up as soon as I could (piping the output over ssh to a different system)
21:39 🔗 Coderjoe I was getting someone else's server back up and running after that log file filled the disk
21:39 🔗 Coderjoe (while they were on vacation)
21:39 🔗 Coderjoe and yes, I did tell them about it
21:40 🔗 chronomex bz2 is best for data with similar but not repeating patterns, like english text or source code
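That division of labor is easy to measure from Python's standard library alone; a quick sketch comparing the three codecs on a repetitive input versus more varied text (both inputs are synthetic stand-ins, not the actual files discussed):

    import bz2
    import gzip
    import inspect
    import lzma

    repetitive = b"GET /index.html 200 0.013s\n" * 100_000
    varied = inspect.getsource(gzip).encode() * 5  # English-ish prose plus code

    for name, compress in [("gzip", lambda d: gzip.compress(d, 9)),
                           ("bz2",  lambda d: bz2.compress(d, 9)),
                           ("lzma", lzma.compress)]:
        print(name, len(compress(repetitive)), len(compress(varied)))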
21:43 🔗 Lord_Nigh is the fortunecity archived stuff actually uploaded anywhere?
21:44 🔗 Lord_Nigh i'm trying very hard to get 'decfnt.zip' and 'vt_fonts.zip' which are linked to from http://npj.netangels.ru/shattered/inventory/fonts
21:44 🔗 Lord_Nigh but no luck
21:45 🔗 Lord_Nigh google shows that vt_fonts.zip was once on a fortunecity site at http://members.fortunecity.com/vsons/sib/russify/vt-terminals/index.htm
21:45 🔗 Coderjoe i've written encoders or decoders for each format, so I do realize what strengths each has
21:46 🔗 Coderjoe (well, my bz2 decoder isn't 100% complete yet, but that's beside the point)
21:46 🔗 Coderjoe er, ENcoder
22:00 🔗 alard Lord_Nigh: http://ia601202.us.archive.org/3/items/test-memac-index-test/fortunecity.html
22:01 🔗 Lord_Nigh thanks!
22:03 🔗 Lord_Nigh hmm both vsons files are 0 bytes
22:53 🔗 alard I think the first version of the second generation ArchiveTeam Warrior is more or less ready.
22:57 🔗 alard If anyone wants to try it out: http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20120707.ova
22:57 🔗 alard (There's only an example project at the moment.)
