00:07 <arkhive> Links are pretty much dead when searching 'wakoopa profile'. Google has some cached versions though.
00:18 <arkhive> Yahoo is shutting down Koprol August 28th 2012. Koprol is a location-based social networking site.
00:18 <arkhive> "...over the coming quarters, we are shutting down or transitioning a number of products..."
00:19 <arkhive> Koprol isn't the only product Yahoo! is shutting down.
00:19 <chronomex> soon yahoo's only business will be selling dialup internet and renting rotary-dial phones
00:20 <arkhive> You can only stand so tall for so long
00:20 <arkhive> (I guess)
00:21 <arkhive> Oh and here is the announcement http://www.koprolblog.com/2012/06/bye/
00:32 <arkhive> Oh and last December there was an article on Reuters about how Yahoo is shutting down 4 of their entertainment blogs: The Set, The Famous, The Amplifier, The Projector. http://www.reuters.com/article/2011/12/02/idUS162033995420111202
00:34 <arkhive> But for some reason they are back up.
01:08 <antonrojo> a site that I think was archived is now basically dead: http://www.artknowledgenews.com/
01:08 <antonrojo> note that's a single image driving the whole website. not sure if the remaining content is still there
07:06 <SmileyG> iWork is going byebyes, I've never used it, is there anything we can grab?
08:33 <X-Scale> Any idea how large the whole Usenet archive stored at Google Groups (excluding the binaries groups) is?
08:51 <Coderjoe> does google's mangled usenet archive even cover binaries?
08:52 <Coderjoe> someone give me a 4TB drive and a time machine. (I can use my own laptop to interface with the drive). I can then go infiltrate deja before google screwed everything up.
08:54 <DFJustin> it doesn't
08:54 <godane> i wish it did
08:54 <godane> i may be able to find old techtv shows on there if it did
08:55 <godane> http://www.zeropaid.com/bbs/threads/14705-Tech-TV-Music-Wars-Special-on-eDonkey-Network
08:55 <godane> it was pirated
08:56 <godane> but the razorback server was taken down back in 2006: http://www.slyck.com/news.php?story=1102
09:24 <X-Scale> It's just that Usenet archives are an impressive treasure trove of information imprisoned inside google servers. It's a very uncomfortable situation.
09:25 <Schbirid> wasnt google groups one of AT's first projects?
09:27 <DFJustin> godane: the commercial usenet providers have very long binaries retention
09:27 <Coderjoe> first projects? no
09:27 <Coderjoe> but there was some focus, but you're confusing google groups with google groups.
09:28 <Coderjoe> (usenet vs mailing lists)
09:28 <Schbirid> ah, i was thinking of google groups
09:30 <X-Scale> The Usenet part of Google Groups
09:31 <Coderjoe> i know you meant that X-Scale, but Schbirid was thinking of AT's google groups (mailing list) files effort
09:31 <Schbirid> i was not aware it was "just" mailing lists :(
09:31 <Schbirid> that sucks
09:32 <Coderjoe> and I really want to hurt the people that made the decision to name their ML project the same as an existing project
09:33 <Schbirid> at least they did not name a programming language go
09:33 <Coderjoe> has anyone made a language named "stop" yet?
09:40 <DFJustin> we only grabbed the files and pages sections of mailing lists (which is the only stuff that was being taken down)
09:41 <DFJustin> no list messages or usenet posts
09:49 <X-Scale> I see. But is there any estimate how large the whole Usenet archive (since 1980) is?
09:51 <X-Scale> (excluding all kinds of binaries, of course)
11:07 <omf_> well X-Scale I got a partial answer
11:11 <omf_> here is 1981 - june 1991 http://archive.org/details/utzoo-wiseman-usenet-archive
11:11 <omf_> I have been working on a scraper for google groups
11:11 <omf_> first I am trying to get as much of usenet as I can from other places.
11:12 <omf_> it is going to take a huge amount of time to get all of usenet back
11:15 <omf_> has anyone reached out to someone in the google groups division and asked for it?
11:21 <C-Keen> or has anyone asked on usenet?
11:26 <godane> i'm close to half way on dl.tv
11:45 <hatman> hi
11:46 <hatman> this is
11:47 <Schbirid> hi
11:47 <hatman> i have an issue with suse 11.3
11:48 <ersi> Who doesn't, haha
11:48 <ersi> I'd personally like to launch SUSE into space, or an incinerator
11:48 <ersi> Ok, enough ranting. What's the problem?
11:49 <hatman> i want to enable xscreensaver to run when user is not logged in
11:51 <Schbirid> lol
11:51 <Schbirid> i guess he wanted to demonstrate the state of the user not being logged in
11:51 <ersi> lol whaat, I thought he had some archivist related problem T_T
11:52 <Schbirid> well, suse 11.3... :D
11:52 <ersi> Yeaah.. *shrug*.
12:49 <omf_> I have been asked to give a talk at a tech conference again this year. I was thinking of talking about archiving, big data and open source
12:51 <omf_> I am tired of giving my other talks
13:36 <SketchCow> DO IT
13:41 <omf_> SketchCow, you ever do Ohio LinuxFest? I know you have been in the area with notacon before
14:00 <SketchCow> Nah.
14:00 <SketchCow> And I don't go to Notacon anymore.
14:00 <SketchCow> But the OLF people, oh they do love the zesty life
14:03 <SketchCow> All the Fan Fiction is now stored safely inside archive.org's walls.
14:03 <omf_> I think OLF is getting worse not better. People skip out on the speakers dinner, there seem to be more equipment problems not less
14:04 <omf_> they never have enough organizing staff because the internal politics are totally not worth it
14:05 <omf_> I am all for open source conferences
17:14 <arkhive> SmileyG: Yes. What's the best way though?
17:19 <arkhive> SmileyG: use Wget with a tracker?
17:20 <arkhive> (I'm still learning)
17:35 <SmileyG> arkhive: sorry I'm lost....
17:35 <Coderjoe> as am I
17:35 <SmileyG> Oh RE: iWork?
17:36 <SmileyG> I don't know how it works other than spotting the news it's closing for these guys who do the clever stuff.
17:38 <arkhive> oh
17:38 <arkhive> ya iWork
18:52 <amerrykan> http://www.independent.co.uk/life-style/gadgets-and-tech/news/web-hits-delete-on-magazines-12year-archive-7920565.html
18:56 <mistym> amerrykan: Geeze.
18:57 <mistym> Hope it's on the wayback machine...
18:58 <amerrykan> how do you go 12 years without ever once switching hosts or backends
18:58 <chronomex> "we're still waiting for the first cloud disaster" I doubt that very much
18:58 <chronomex> isn't the tsunami of disappearing society disaster enough?
19:08 <Coderjoe> so the various aws outages don't count? (particularly ones like the one last year (iirc) where there were cascading network failures leading to corruption and data loss)
19:08 <amerrykan> i guess that didn't affect anything anyone really cares about
19:09 <amerrykan> they're waiting for gmail to tank
19:10 <omf_> or facebook, twitter, etc...
19:11 <omf_> but both of those companies at least are known for their backups
19:11 <omf_> the google infrastructure has a minimum of triple redundancy for all live data, all of it hot
19:11 <Coderjoe> iirc, netflix was caught with their pants down, and that led to the development of the chaos monkey
19:12 <omf_> chaos monkey rules
19:12 <amerrykan> how long was psn down?
19:12 <amerrykan> three weeks?
19:12 <omf_> I use that idea when I test websites. I built up this big test suite to just fuck things up
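[The fault-injection idea behind omf_'s test suite and Netflix's Chaos Monkey can be sketched in a few lines. This is a minimal illustration, not omf_'s actual suite or Netflix's tool; all names here (`chaotic`, `fetch_page`, `fetch_with_retry`) are invented:

```python
import random

def chaotic(fn, failure_rate=0.3, rng=random.Random(42)):
    """Wrap a callable so it randomly raises, simulating an outage."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return fn(*args, **kwargs)
    return wrapper

def fetch_page(url):
    return "<html>ok</html>"  # stand-in for a real HTTP fetch

safe_fetch = chaotic(fetch_page)

def fetch_with_retry(url, attempts=5):
    """A resilient caller retries instead of dying on an injected failure."""
    for _ in range(attempts):
        try:
            return safe_fetch(url)
        except ConnectionError:
            continue
    return None

print(fetch_with_retry("http://example.com/"))
```

The test suite then asserts the system degrades gracefully under the injected failures rather than crashing.]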
19:12
🔗
|
amerrykan |
i guess that's not a disaster because bideo games |
19:13
🔗
|
SmileyG |
that wasn't backups |
19:13
🔗
|
SmileyG |
that was "wtf we didn't even know they got in!" |
19:13
🔗
|
SmileyG |
Also, this is verring off topic. |
19:13
🔗
|
omf_ |
indeed |
19:14
🔗
|
omf_ |
just out of curiosity why weren't the ff.net files compressed before upload? It is going to take forever to download |
19:22 <DFJustin> archive.org can't browse solid archives, I guess you could do zip but that wouldn't have the insane compression ratio
19:22 <DFJustin> and would take a fuckass long time to convert
19:22 <yipdw> omf_: because the contents are already compressed
19:22 <DFJustin> oh right
19:22 <yipdw> gzipping gzipped archives doesn't get you very much
19:23 <yipdw> and ungzipping + repacking might get you more
19:23 <yipdw> but is not worth the time
19:23 <yipdw> I downloaded one of the tars in about nine hours
19:23 <Schbirid> just the other day i saw something about compressing log files twice
19:23 <yipdw> it isn't fast, but it also isn't forever
19:23 <chronomex> Coderjoe: I think "disaster" in their mind means data loss
19:23 <Schbirid> one file might not have a lot of redundancy. but daily logs do
19:25 <yipdw> omf_: also, if you're planning on loading the files up for viewing, be aware that the cooked WARCs are what you want for that
19:25 <yipdw> the non-cooked WARCs contain gzipped CSS despite wget not asking for it
19:25 <yipdw> so the request is a bit fucked up
19:27 <omf_> I will make a note of that
19:29 <Coderjoe> chronomex: I thought I said "data loss" in the description of last year's massive aws network failure.
19:32 <Coderjoe> indeed, sometimes gzipping a gzipped file does wind up with noticeable gains. IIRC, nzb files are such a situation. I also have a couple of multi-gig apache error log files that compressed down to a few hundred meg on the first level of gzip which I suspect would compress even more on a second pass
19:34 <nitro2k01> Were they originally compressed with -9?
19:35 <Coderjoe> yes
19:35 <Coderjoe> 433285184 log.gz
19:35 <Coderjoe> 221859643 log.gz.gz
19:35 <Coderjoe> woopwoop
19:35 <nitro2k01> Oh man
19:35 <yipdw> this is about archives
19:35 <nitro2k01> Sirens
19:35 <yipdw> not off-topic
19:36 <Coderjoe> (should move to -bs)
19:36 <Coderjoe> ok, fine
19:36 <nitro2k01> And I'm op, somehow
19:37 <Coderjoe> even with -9, a highly repetitive file will wind up with highly repetitive bit patterns in the compressed output
19:43 <Coderjoe> especially if the repetitions are the same number of bytes from each other over and over
19:43 <Coderjoe> sure, the high-count backreference will wind up with a short huffman code, but the extra bits are not encoded in any way, and the huffman code can only get so short.
19:43 <Coderjoe> and for the third pass of gz
19:43 <Coderjoe> 221728979 log.gz.gz.gz
19:44 <Coderjoe> diminishing returns at that point
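[Coderjoe's numbers (433285184 → 221859643 → 221728979 across successive gzip -9 passes) come from a real log file, but the effect can be explored on synthetic data. A minimal sketch; the log line is invented, and how much a second pass gains depends on how byte-periodic the deflate output of the first pass happens to be:

```python
import gzip

# Fake a highly repetitive apache-style error log: the same line over
# and over with a slowly cycling client address.
line = b"[error] client 10.0.0.%d: File does not exist: /favicon.ico\n"
data = b"".join(line % (i % 256) for i in range(50_000))

# Three successive gzip -9 passes, mirroring log.gz / log.gz.gz / log.gz.gz.gz.
pass1 = gzip.compress(data, compresslevel=9)
pass2 = gzip.compress(pass1, compresslevel=9)
pass3 = gzip.compress(pass2, compresslevel=9)

print(len(data), len(pass1), len(pass2), len(pass3))
```

The first pass does almost all the work; any second-pass gain comes from the deflate stream itself containing repeated backreference bit patterns, as Coderjoe describes above.]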
21:13 <chronomex> Coderjoe: that's why you use lzma as a first pass sometimes
21:13 <chronomex> .lz.gz
21:24 <omf_> or use bz2 instead of gz
21:26 <chronomex> different tools for different uses
21:37 <Coderjoe> bz2 has different strengths. I don't think highly repetitive log files are it
21:38 <Coderjoe> and I needed to get the log file compressed and cleared up as soon as I could (piping the output over ssh to a different system)
21:39 <Coderjoe> I was getting someone else's server back up and running after that log file filled the disk
21:39 <Coderjoe> (while they were on vacation)
21:39 <Coderjoe> and yes, I did tell them about it
21:40 <chronomex> bz2 is best for data with similar but not repeating patterns, like english text or source code
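[chronomex's rule of thumb (different tools for different data) is easy to check empirically with Python's gzip, bz2, and lzma modules. A small sketch; both inputs are invented stand-ins, so the exact ratios will not match any real corpus:

```python
import gzip, bz2, lzma

# A log-like input (one line repeated verbatim) and a prose-like input
# (similar wording, never exactly repeating).
log_like = b"[error] File does not exist: /favicon.ico\n" * 30_000
prose_like = b" ".join(
    b"message %d: the quick brown fox jumps over the lazy dog" % i
    for i in range(20_000))

# Compress both inputs with each tool and compare output sizes.
for name, compress in (("gzip", gzip.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    print(name, len(compress(log_like)), len(compress(prose_like)))
```

]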
21:43 <Lord_Nigh> is the fortunecity archived stuff actually uploaded anywhere?
21:44 <Lord_Nigh> i'm trying very hard to get 'decfnt.zip' and 'vt_fonts.zip' which are linked to from http://npj.netangels.ru/shattered/inventory/fonts
21:44 <Lord_Nigh> but no luck
21:45 <Lord_Nigh> google shows that vt_fonts.zip was once on a fortunecity site at http://members.fortunecity.com/vsons/sib/russify/vt-terminals/index.htm
21:45 <Coderjoe> i've written encoders or decoders for each format, so I do realize what strengths each has
21:46 <Coderjoe> (well, my bz2 decoder isn't 100% complete yet, but that's beside the point)
21:46 <Coderjoe> er
21:46 <Coderjoe> ENcode
21:46 <Coderjoe> r
22:00 <alard> Lord_Nigh: http://ia601202.us.archive.org/3/items/test-memac-index-test/fortunecity.html
22:01 <Lord_Nigh> thanks!
22:03 <Lord_Nigh> hmm both vsons files are 0 bytes
22:53 <alard> I think the first version of the second generation ArchiveTeam Warrior is more or less ready.
22:57 <alard> If anyone wants to try it out: http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20120707.ova
22:57 <alard> (There's only an example project at the moment.)