Time |
Nickname |
Message |
02:10
π
|
kim__ |
Not on freenode? |
02:10
π
|
* |
kim__ scratches head |
02:11
π
|
kim__ |
I spotted folks wanting to archive knol, call for help on wikimedia-l |
02:11
π
|
kim__ |
Anyone in atm, or should I come back during .eu or .us daytime? |
02:11
π
|
dnova |
there's always people around |
02:14
π
|
kim__ |
hmm, knol seems to already be down |
02:14
π
|
kim__ |
what can still be done to help? |
02:15
π
|
kim__ |
The call for help was posted on April 30 |
02:16
π
|
kim__ |
Anyone I can talk with or so? |
02:16
π
|
dnova |
if I understand correctly, only owners of the content can download the content |
02:16
π
|
dnova |
until oct 1 |
02:17
π
|
kim__ |
hmm, http://web.archive.org/web/20110722190349/http://knol.google.com/k |
02:20
π
|
kim__ |
apparently google let IA crawl knol a-ok. What's the value-add of #archiveteam? Are you with IA, or are you separate? |
02:20
π
|
kim__ |
And even if I can't help with this, I can always run a torrent box or etc if that's any use? |
02:20
π
|
kim__ |
(or run wget or do ftp operations, and I don't mind poking relevant site operators first ;-) |
02:20
π
|
dnova |
we are separate from IA |
02:21
π
|
* |
kim__ listens |
02:21
π
|
dnova |
www.archiveteam.org |
02:21
π
|
kim__ |
I'm already reading there, hence the question :-) |
02:21
π
|
dnova |
a lot of or most of what we grab ends up at IA |
02:22
π
|
kim__ |
fair enough |
02:22
π
|
kim__ |
also, define "a lot" of online storage? |
02:23
π
|
dnova |
in what context |
02:23
π
|
kim__ |
in any context that you might find useful |
02:23
π
|
* |
kim__ isn't sure. just reading http://archiveteam.org/index.php?title=Who_We_Are |
02:24
π
|
kim__ |
"People with Lots of Hosted Disk Space" |
02:24
π
|
dnova |
oh |
02:24
π
|
dnova |
not sure how great a need there is for that right this moment |
02:24
π
|
kim__ |
if you have a NAS sitting out someplace at a hosting provider, that'll work fine |
02:25
π
|
dnova |
if you'd like to mirror mobile.me then you'll need around 250-300 terabytes usable space. |
02:25
π
|
dnova |
(for ridiculous example) |
02:27
π
|
kim__ |
That's about Eur 14400 at current HDD prices |
02:27
π
|
* |
kim__ scratches head. I'm not quite that rich ;-) |
02:28
π
|
kim__ |
by a bit of a margin :-P |
02:28
π
|
dnova |
yeah. |
02:28
π
|
dnova |
Generally what happens is a few dozen of us each grab however many gb/tb we can and eventually upload it to IA for digestion |
02:29
π
|
kim__ |
Okay, but IA does their own crawls too. What's the advantage to doing it this way? (is there a webpage about that, so I can RTFM and stop asking silly questions? ;) |
02:30
π
|
dnova |
when we spring into action we're making a comprehensive archive of an entire site |
02:30
π
|
kim__ |
http://archiveteam.org/index.php?title=Frequently_Asked_Questions <- short |
02:30
π
|
kim__ |
How long has AT existed? |
02:30
π
|
kim__ |
There's 99 lurkers on IRC right now, so I figure it's been a while :-) |
02:31
π
|
dnova |
3ish years |
02:31
π
|
kim__ |
IA has 2 locations, 1 mirror at the new library of alexandria |
02:31
π
|
* |
kim__ says, checking http://archiveteam.org/index.php?title=Fire_Drill |
02:32
π
|
kim__ |
oh wait, listed. Also stating it's b0rked. That Can't Be Good (tm) |
02:33
π
|
kim__ |
Ok, one thing I could propose a joint project with AT on is the recovery of WP dumps. |
02:34
π
|
kim__ |
preferably including the photos. (they should be available via commons.wikimedia.org) |
02:34
π
|
dnova |
hmm |
02:34
π
|
dnova |
I'm not sure if wikipedia falls under wikiteam's purview (there is a sub-group here who archives any wikimedia wiki they come across) |
02:35
π
|
kim__ |
interesting. I wonder where those archives go |
02:35
π
|
* |
kim__ <- wikimedia-ish vonlunteer |
02:35
π
|
kim__ |
volunteer too. |
02:36
π
|
dnova |
http://archiveteam.org/index.php?title=Wikiteam |
02:36
π
|
kim__ |
I could have found that one ^^;; |
02:36
π
|
kim__ |
But there is no image dump available, only the image descriptions |
02:36
π
|
kim__ |
Okay |
02:36
π
|
dnova |
yeah that's too bad |
02:36
π
|
kim__ |
Is anyone from wikiteam online? |
02:37
π
|
dnova |
the wikimedia foundation is a bit... ehh, not the best with these things. |
02:37
π
|
kim__ |
geh, please don't tell me that |
02:37
π
|
kim__ |
fortunately, there's a bit of a solution |
02:37
π
|
kim__ |
I'd ask on freenode #wikimedia-tech , possibly |
02:38
π
|
dnova |
I've never seen it this quiet in here when there is already some talking going on |
02:38
π
|
kim__ |
WP is not the oldest wiki in the world btw. It's a young whipper-snapper |
02:38
π
|
dnova |
people with their lives and what-not |
02:38
π
|
kim__ |
it's 4:30 AM local |
02:38
π
|
dnova |
it's evening in the US |
02:38
π
|
kim__ |
on a "sunday" |
02:38
π
|
dnova |
I'm pretty sure it's thursday where you are |
02:39
π
|
dnova |
where are you by the way? I'm moving to your time zone soon. |
02:39
π
|
kim__ |
Ascension day today |
02:39
π
|
kim__ |
and lots of people are taking a long weekend besides |
02:40
π
|
kim__ |
(me three!) |
02:40
π
|
kim__ |
I'm in .nl |
02:40
π
|
dnova |
ah |
02:40
π
|
* |
dnova moving to .at |
02:40
π
|
kim__ |
That' |
02:40
π
|
kim__ |
s a VERY pretty country |
02:41
π
|
kim__ |
where are you moving from? :-) |
02:41
π
|
DFJustin |
jason scott is archiving wikimedia images as we speak http://archive.org/details/2012-04-30-wikimedia-images-snapshot |
02:42
π
|
kim__ |
we have just got to be able to make that easier |
02:43
π
|
dnova |
the us |
02:43
π
|
kim__ |
dnova, Interesting, what part of the US? Is your part very different or very similar? |
02:43
π
|
kim__ |
;-) |
02:43
π
|
dnova |
Buffalo, NY |
02:43
π
|
kim__ |
The buffalo where buffalo buffalo buffalo buffalo buffalo |
02:44
π
|
kim__ |
? |
02:44
π
|
dnova |
s/where// and some of those need capitalization |
02:44
π
|
dnova |
if you'd like it to be grammatically correct :) |
02:44
π
|
kim__ |
http://en.wikipedia.org/wiki/Buffalo_buffalo |
02:44
π
|
* |
kim__ sniggers |
02:45
π
|
chronomex |
kim__: the "value-add" of archiveteam is we have multiple people focusing on a single site |
02:45
π
|
chronomex |
IA has like 5 people for the whole internet |
02:45
π
|
chronomex |
we do deeply focused crawls |
02:46
π
|
kim__ |
fair enough |
02:46
π
|
dnova |
and sometimes we piss people right the hell off. |
02:46
π
|
kim__ |
dnova, how so? |
02:46
π
|
chronomex |
but I resent the term "value-add" |
02:47
π
|
kim__ |
the famous (C) brigade? ;-) |
02:47
π
|
dnova |
it turns out that people don't want you to download stuff that they put on the internet |
02:47
π
|
* |
dnova shrugs |
02:47
π
|
chronomex |
weird |
02:47
π
|
kim__ |
chronomex, fair enough. |
02:47
π
|
chronomex |
besides |
02:47
π
|
chronomex |
overlap is better than gappiness |
02:47
π
|
dnova |
yes |
02:47
π
|
kim__ |
sounds like you folks know what you're doing. |
02:47
π
|
chronomex |
you should know that wbm is gappy, if you've ever used it |
02:48
π
|
chronomex |
e.g. it skips non-small files |
02:48
π
|
kim__ |
wbm stands for? Oh wayback machine. And if you mean gappy in time, or gappy in ... |
02:48
π
|
kim__ |
oh. That Can't Be Good. |
02:48
π
|
chronomex |
gappy in all ways |
02:48
π
|
chronomex |
time, breadth, depth |
02:48
π
|
kim__ |
well shoot |
02:48
π
|
DFJustin |
another value-add is that people write scrapers that can navigate javascripty or login-based sites like friendster or google video |
02:49
π
|
chronomex |
yes |
02:49
π
|
DFJustin |
IA is more set up for straightforward html sites |
02:49
π
|
kim__ |
okay, well, I'd like for wmf sites to be archived properly. One can get mediawiki and database dumps |
02:50
π
|
kim__ |
I'm not entirely sure how to transfer over commons though. if you want I can ask. Or do you already have contacts with wmf/volunteers/techs? |
02:51
π
|
kim__ |
(in which case, I'd just be in the way) |
02:51
π
|
dnova |
we can always use more warm bodies |
02:51
π
|
dnova |
and in some cases warm optional |
02:51
π
|
kim__ |
Grrr, Argh? |
02:53
π
|
kim__ |
hmm, emijrp (might use a different nick at AT) is apparently an AT-er |
02:53
π
|
dnova |
he is emijrp here |
03:00
π
|
kim__ |
are they any good with helping with wikis? ;-) |
03:01
π
|
kim__ |
http://lists.wikimedia.org/pipermail/wikimedia-l/2012-May/120200.html |
03:02
π
|
kim__ |
this helpful? |
03:03
π
|
kim__ |
and if it's any use at any time, I've got a small server at hetzner.de and know how to use it |
03:03
π
|
dnova |
it is any use |
03:03
π
|
dnova |
stick around |
03:05
π
|
kim__ |
(ps, the Grr Argh: http://www.youtube.com/watch?v=NCLPHSVtvmU involves warm-optional bodies ;-) |
03:08
π
|
dnova |
never saw the show |
03:08
π
|
kim__ |
hence the no-laughing ;-P |
03:09
π
|
dnova |
sorry :P |
03:09
π
|
kim__ |
you're excused :-P |
03:09
π
|
chronomex |
STAND BACK |
03:09
π
|
chronomex |
KIM__ HAS A SERVER AND HE KNOWS HOW TO USE IT |
03:09
π
|
kim__ |
actually, no wait, you haven't seen buffy? That's inexcusable! :-P |
03:10
π
|
yipdw^ |
I use all my servers as clients |
03:10
π
|
kim__ |
chronomex, I know regex too https://www.xkcd.com/208/ |
03:10
π
|
kim__ |
yipdw, too much X11? ;-) |
03:11
π
|
chronomex |
o_o_o_o |
03:11
π
|
kim__ |
chronomex, Of course, now I have 2 problems. http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html |
03:13
π
|
* |
kim__ figures AT to be more beautiful soup fans , though http://www.crummy.com/software/BeautifulSoup/ |
03:14
π
|
yipdw^ |
that or a dozen other things |
03:15
π
|
kim__ |
*nod* |
03:15
π
|
yipdw^ |
https://github.com/ArchiveTeam has a bunch of code, so you can check that out if you'd like |
03:19
π
|
underscor |
http://a4.sphotos.ak.fbcdn.net/hphotos-ak-ash4/318277_10150947081920605_42214640604_12269339_963561368_n.jpg |
03:22
π
|
dnova |
heh |
03:24
π
|
underscor |
kim__: SketchCow's been doing dumps and transfers, and I've been working with Ariel Glenn at WMF and Kevin Day at your.org to get a more useable archive up and running on IA |
03:24
π
|
underscor |
(if you know either of them) |
03:25
π
|
* |
kim__ copy-pastes from a mail I'm about to send: |
03:25
π
|
kim__ |
==Fire Drill== |
03:25
π
|
kim__ |
Has anyone recently set up a full-external-duplicate of (for instance) en.wp? |
03:25
π
|
kim__ |
This includes all images, all discussions, all page history (excepting the user |
03:25
π
|
kim__ |
accounts and deleted pages) |
03:26
π
|
underscor |
nope |
03:26
π
|
underscor |
that would be an interesting project |
03:26
π
|
underscor |
lol |
03:26
π
|
underscor |
oh, I just saw that it was a mail |
03:26
π
|
kim__ |
Ok, then I won't make a fool of myself by sending that |
03:26
π
|
kim__ |
Consider yourself (almost) volunteered then ;-) |
03:27
π
|
kim__ |
or volunteerized |
03:27
π
|
underscor |
haha |
03:27
π
|
kim__ |
we need a word |
03:28
π
|
yipdw^ |
FUCK YOU, YOU ARE ALL IN ARCHIVE TEAM |
03:28
π
|
yipdw^ |
I DEPUTIZE ALL OF YOU |
03:29
π
|
underscor |
hahaha |
03:29
π
|
underscor |
I loved that |
03:29
π
|
underscor |
I don't think he said the second, but the first was at a DEFCON talk, wasn't it? |
03:29
π
|
kim__ |
yipdw^, I try that all the time. The en.wp regulars cottoned on long ago |
03:30
π
|
yipdw^ |
underscor: yeah |
03:30
π
|
kim__ |
http://lists.wikimedia.org/pipermail/wikimedia-l/2012-May/120203.html |
03:30
π
|
yipdw^ |
kim__: that's weird, I'd expect Wikipedia people to be interested in duplication |
03:30
π
|
kim__ |
yipdw, the "I deputize all of you" trick ;-) |
03:30
π
|
kim__ |
And WP should indeed be interested in duplication |
03:31
π
|
yipdw^ |
I need to see if CouchDB replicates attachments |
03:31
π
|
yipdw^ |
I think it does |
03:31
π
|
yipdw^ |
if it does, then maybe I can just dump Wikipedia in a couch |
03:31
π
|
dnova |
when I see "WP" I think wordpress :/ |
03:31
π
|
dnova |
or write protect |
03:31
π
|
dnova |
or word perfect |
03:31
π
|
yipdw^ |
like |
03:31
π
|
yipdw^ |
a CouchDB instance that is, uh |
03:32
π
|
yipdw^ |
how big is Wikipedia, text, images, discussions, and all? |
03:32
π
|
kim__ |
btw, here's the mailing list page https://lists.wikimedia.org/mailman/listinfo/wikimedia-l , in case anyone wants to sign up, answer my mail, and take ownership from the AT side ;-) |
03:32
π
|
yipdw^ |
just say the English one for no |
03:32
π
|
yipdw^ |
w |
03:32
π
|
kim__ |
oh, I was about to say there's 800 wikimedia wikis |
03:32
π
|
kim__ |
I'm not actually entirely sure anymore |
03:32
π
|
yipdw^ |
yeah, let's just do English WP |
03:32
π
|
kim__ |
the text is a few gigs |
03:32
π
|
kim__ |
but english wikipedia links out to a different wiki called commons.wikimedia.org |
03:33
π
|
yipdw^ |
right |
03:33
π
|
kim__ |
which contains a lot of the images and multimedia |
03:33
π
|
yipdw^ |
do you know how big that is? |
03:33
π
|
kim__ |
and commons is Very Very Large |
03:33
π
|
yipdw^ |
hm |
03:33
π
|
yipdw^ |
maybe I should do a Kickstarter |
03:33
π
|
yipdw^ |
ask for $10,000 for a hundred TB |
03:33
π
|
yipdw^ |
and just try to suck in all WMF sites |
03:33
π
|
kim__ |
http://commons.wikimedia.org/wiki/Main_Page 12,819,893 |
03:34
π
|
underscor |
yipdw^: do it! |
03:34
π
|
kim__ |
yipdw^, Orrrr, I convince wmf that this is essential |
03:34
π
|
underscor |
money to create an "archival copy of wikipedia" |
03:34
π
|
underscor |
that's synced like once a month or something |
03:34
π
|
kim__ |
or you convince them |
03:34
π
|
kim__ |
like "AT can do this, but we'd need storage" |
03:34
π
|
yipdw^ |
or we use fuckloads of bandwidth :D |
03:34
π
|
underscor |
:D |
03:34
π
|
yipdw^ |
Archive Team is good at that |
03:34
π
|
yipdw^ |
if nothing else |
03:34
π
|
kim__ |
or All Of The Above |
03:34
π
|
underscor |
1-800-BW-SUCKR |
03:35
π
|
dnova |
plz don't post my direct line |
03:35
π
|
kim__ |
http://commons.wikimedia.org/wiki/Special:Statistics |
03:35
π
|
yipdw^ |
huh |
03:35
π
|
yipdw^ |
no size |
03:35
π
|
yipdw^ |
that is a lot of pages, though |
03:35
π
|
underscor |
It's 18 TB |
03:36
π
|
yipdw^ |
that's it? |
03:36
π
|
underscor |
for media? |
03:36
π
|
underscor |
yes |
03:36
π
|
yipdw^ |
huh |
03:36
π
|
yipdw^ |
neat |
03:36
π
|
underscor |
204.x.x.x:/z/public/pub/wikimedia/dumps 152T 33T 118T 22% /mnt/dumps |
03:36
π
|
underscor |
204.x.x.x:/z/public/pub/wikimedia/images 136T 17T 118T 13% /mnt/images |
03:36
π
|
underscor |
(I have a box with them nfs mounted) |
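The two df lines above can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the usual df convention that Use% is used / (used + avail), rounded up; the figures are the already-rounded terabyte values from the paste, so the result is only approximate.

```python
import math

# (mount, size, used, avail) in terabytes, copied from the df lines above.
MOUNTS = [
    ("/mnt/dumps", 152, 33, 118),
    ("/mnt/images", 136, 17, 118),
]


def use_percent(used: float, avail: float) -> int:
    """Assumed df-style Use%: used / (used + avail), rounded up."""
    return math.ceil(100 * used / (used + avail))


percents = {mount: use_percent(used, avail) for mount, _size, used, avail in MOUNTS}
for mount, pct in percents.items():
    print(f"{mount}: {pct}% used")
```

Under that assumption the computed values agree with the 22% and 13% shown in the paste.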
03:36
π
|
kim__ |
underscor, heh, useful |
03:37
π
|
yipdw^ |
does that also include audio and video clips? |
03:37
π
|
kim__ |
underscor, buut, that's not really 100% public, is it? |
03:37
π
|
underscor |
yes |
03:37
π
|
kim__ |
yipdw, should do, it's the same dir |
03:37
π
|
yipdw^ |
oh ok |
03:37
π
|
underscor |
kim__: yeah, should be. I mean, it's what's available on ftpmirror.your.org |
03:37
π
|
underscor |
yipdw^: :D |
03:37
π
|
underscor |
800-439-2978 In Disconnect 800-HEY-BWSUCKR |
03:37
π
|
underscor |
800-932-9782 In Disconnect 800-WE-BWSUCKR |
03:37
π
|
underscor |
888-225-5297 In Disconnect 888-CALL-BWSUCKR |
03:37
π
|
underscor |
888-237-8297 In Disconnect 888-BEST-BWSUCKR |
03:37
π
|
yipdw^ |
awesome |
03:38
π
|
yipdw^ |
Archive Team direct line |
03:38
π
|
yipdw^ |
CALL THE A-TEAM |
03:38
π
|
underscor |
hahaha |
03:38
π
|
underscor |
I think I know what to get SketchCow for his birthday |
03:38
π
|
kim__ |
underscor, so we've got all the data now we need to do the firedrill |
03:38
π
|
yipdw^ |
1-800-MAH-DICK |
03:38
π
|
kim__ |
yipdw, the AT-TEAM? |
03:38
π
|
underscor |
kim__: us |
03:38
π
|
yipdw^ |
nah A-Team |
03:38
π
|
underscor |
800-627-2448 In Disconnect 800-6-ARCHIVETEAM |
03:38
π
|
underscor |
866-727-2448 In Disconnect 866-7-ARCHIVETEAM |
03:38
π
|
underscor |
866-825-8327 In Disconnect 866-VALUE-ARCHIVETEAM |
03:38
π
|
underscor |
877-367-2724 In Disconnect 877-FOR-ARCHIVETEAM |
03:38
π
|
underscor |
888-665-9272 In Disconnect 888-ONLY-ARCHIVETEAM |
03:38
π
|
underscor |
800-743-2724 800-743-ARCHIVETEAM |
03:38
π
|
underscor |
ha |
03:39
π
|
* |
kim__ plays the theme http://www.youtube.com/watch?v=_MVonyVSQoM |
03:39
π
|
kim__ |
we need a new intro blurb though |
03:42
π
|
underscor |
Lol |
03:43
π
|
kim__ |
http://meta.wikimedia.org/wiki/Data_dump_torrents |
03:43
π
|
kim__ |
Also useful |
03:44
π
|
kim__ |
you probably already knew that one |
03:44
π
|
kim__ |
Anyway, can I recruit some of you fine folks for fire-drill kind of things? |
03:44
π
|
kim__ |
it would leave you with a fully functional archival copy of $wmf-wiki and all |
03:45
π
|
yipdw^ |
sounds like fun, but I only have a couple of terabytes free on my personal machines |
03:45
π
|
kim__ |
so I'd need to find some TBs? |
03:45
π
|
yipdw^ |
I think underscor has TB out his as |
03:45
π
|
yipdw^ |
s |
03:46
π
|
kim__ |
does he loan them to you? ;-) |
03:46
π
|
yipdw^ |
I don't want tuberculosis |
03:46
π
|
kim__ |
:-p |
03:49
π
|
yipdw^ |
though, you did give me an idea |
03:49
π
|
yipdw^ |
I have been on a CouchDB kick for a while, mostly because the replication system is so damn smooth nowadays |
03:50
π
|
yipdw^ |
so I think I'll just load the text of en.wp and see how that works out |
03:50
π
|
mistym |
Hey, what's the status on that radio site SketchCow was posting earlier today? Is someone on that? |
03:50
π
|
yipdw^ |
see if I can reconstruct a hyperlinked, textual WP from that |
03:50
π
|
kim__ |
right. That *is* interesting. But I was wondering if it was possible to recreate a Fully Operational Battlest^W I mean copy of wikipedia |
03:51
π
|
yipdw^ |
yeah, that was my second stage plan |
03:51
π
|
underscor |
yipdw^: :D |
03:51
π
|
yipdw^ |
incorporate all multimedia as CouchDB document attachments |
03:51
π
|
yipdw^ |
at that point, if you want a copy of Wikipedia, you replicate the DB |
03:51
π
|
kim__ |
and then write back out? |
03:51
π
|
yipdw^ |
and the logic needed to render that data out goes with it |
03:51
π
|
yipdw^ |
if you store it in e.g. CouchDB design documents |
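The "replicate the DB" idea can be sketched against CouchDB's HTTP replication endpoint. This is only an illustration: the host names and the `enwiki` database name are hypothetical, authentication is left out, and the request is built but never sent. It assumes a CouchDB instance whose `/_replicate` endpoint accepts a JSON body with `source` and `target`.

```python
import json
import urllib.request


def build_replication_request(
    local_couch: str, source_db: str, target_db: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {
        "source": source_db,     # remote database to pull from
        "target": target_db,     # local database to write into
        "create_target": True,   # create the local database if it is missing
    }
    return urllib.request.Request(
        url=f"{local_couch.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name, for illustration only.
request = build_replication_request(
    "http://localhost:5984",
    "http://couch.example.org:5984/enwiki",
    "enwiki",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually start the replication: urllib.request.urlopen(request)
```

Design documents are ordinary documents in the database, which is why the rendering logic would travel along with the data in this scheme.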
03:52
π
|
yipdw^ |
I mean, yes, it is a shitload of data, and you will need a corresponding shitload of throughput |
03:52
π
|
kim__ |
The mediawiki engine is fully open source, right? :-) |
03:52
π
|
yipdw^ |
there is no US consumer-grade ISP package that will let you do this |
03:52
π
|
kim__ |
yipdw^, how is this a problem? |
03:52
π
|
yipdw^ |
it isn' |
03:52
π
|
yipdw^ |
t |
03:52
π
|
yipdw^ |
I'm just saying |
03:53
π
|
kim__ |
*nod* |
03:53
π
|
yipdw^ |
and yes, mediawiki is open source |
03:54
π
|
yipdw^ |
ha |
03:54
π
|
yipdw^ |
http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia |
03:55
π
|
yipdw^ |
I like how "multiple terabytes" is a link to en.wp/Terabyte |
03:55
π
|
kim__ |
Currently Wikipedia does not allow or provide facilities to download all images. As of 17 May 2007, Wikipedia disabled or neglected all viable bulk downloads of images including torrent trackers. Therefore, there is no way to download image dumps other than scraping Wikipedia pages up or using Wikix, which converts a database dump into a series of scripts to fetch the images. |
03:55
π
|
yipdw^ |
LIES |
03:56
π
|
kim__ |
well, ICK |
03:56
π
|
kim__ |
yipdw, this is fixed? |
03:56
π
|
yipdw^ |
no |
03:56
π
|
underscor |
it is fixed now |
03:56
π
|
yipdw^ |
I just stopped reading at 'viable' |
03:56
π
|
underscor |
ftpmirror.your.org has a copy |
03:56
π
|
yipdw^ |
and was like 'well, we could just scrape everything' |
03:56
π
|
kim__ |
underscor, that's something |
03:56
π
|
yipdw^ |
then I realized that they permitted that option |
03:59
π
|
yipdw^ |
http://dumps.wikimedia.org/enwiki/20120502/ |
03:59
π
|
yipdw^ |
huh |
03:59
π
|
yipdw^ |
why are the 7z metahistory dumps so much smaller than the bz2s? |
03:59
π
|
yipdw^ |
is 7z actually just that good? |
03:59
π
|
yipdw^ |
or are the 7z dumps broken |
03:59
π
|
underscor |
it's that good |
03:59
π
|
underscor |
well, at least for the set before that |
04:00
π
|
underscor |
I haven't checked 5/2's |
04:00
π
|
underscor |
but it's the same ratio-ish with the previous set |
04:00
π
|
yipdw^ |
damn |
04:00
π
|
kim__ |
ok, I just found me a new compressor O:-) |
04:00
π
|
yipdw^ |
I've known about 7zip for a while, but never really used it regularly |
04:01
π
|
yipdw^ |
I am wondering how it manages to kick the shit out of bzip2 like this |
04:01
π
|
underscor |
ugh |
04:01
π
|
underscor |
I have a networking proposal to finish tonight |
04:02
π
|
underscor |
and I don't want to work on it >:I |
04:02
π
|
underscor |
I need motivation :'( |
04:02
π
|
yipdw^ |
what is the proposal? |
04:02
π
|
underscor |
it's the final project of our CCNA class |
04:02
π
|
underscor |
we have to propose the network installation for a company that just bought a new office space |
04:03
π
|
yipdw^ |
lots of WRT54Gs |
04:03
π
|
underscor |
everything, from PCs and printer models, to what ISPs we'll peer with, where in the building WAPs will go |
04:03
π
|
underscor |
everything |
04:03
π
|
underscor |
hahahahahahaha |
04:03
π
|
yipdw^ |
you know that that is the way most companies start |
04:03
π
|
underscor |
that's what they have |
04:03
π
|
yipdw^ |
or at least most companies that don't have an overbearing IT staff |
04:03
π
|
underscor |
two 2960 switches |
04:03
π
|
underscor |
and 8 wrt54gs |
04:04
π
|
underscor |
we have to fix it according to Cisco Best Practices (tm) |
04:04
π
|
yipdw^ |
propose a layout that uses equipment that is not produced by Cisco |
04:04
π
|
underscor |
I did, originally |
04:04
π
|
yipdw^ |
heh |
04:04
π
|
underscor |
I was told that was not allowed. |
04:04
π
|
yipdw^ |
what |
04:04
π
|
underscor |
It's a cisco networking academy course |
04:04
π
|
underscor |
the judges are all from cisco |
04:04
π
|
underscor |
so |
04:05
π
|
underscor |
They like to pretend Juniper, force10, et al. don't exist |
04:05
π
|
yipdw^ |
that seems to be doing them favors in the market |
04:05
π
|
kim__ |
alright, I'm off to bed |
04:05
π
|
kim__ |
the day star is rising already |
04:05
π
|
yipdw^ |
'night |
04:05
π
|
kim__ |
'day! ;-) |
04:05
π
|
underscor |
ha |
04:10
π
|
underscor |
Unable to Add an Account: You are only allowed to be signed into 8 accounts simultaneously |
04:10
π
|
underscor |
I guess that's google telling me I have too many google profiles |
04:11
π
|
dnova |
you're really wearing their resources thin |
04:11
π
|
underscor |
lol |
04:12
π
|
underscor |
I have like 17, between my personal, work, school, other school, third school, other work, other other work |
04:12
π
|
underscor |
lol |
04:13
π
|
dnova |
and they all have the same 4 digit numeric password |
04:13
π
|
yipdw^ |
5 digi |
04:13
π
|
yipdw^ |
t |
04:13
π
|
underscor |
actually, no, they're all random alphanumeric strings |
04:13
π
|
yipdw^ |
underscor is security-conscious |
04:13
π
|
underscor |
12 characters |
04:13
π
|
yipdw^ |
why do you have 17 google profiles? |
04:14
π
|
yipdw^ |
that seems excessive, even in that case |
04:14
π
|
underscor |
it's actually 14 |
04:14
π
|
underscor |
I hyperbolized |
04:14
π
|
mistym |
Oh, *only* 14. |
04:14
π
|
yipdw^ |
why do you have 14 google profiles? |
04:14
π
|
yipdw^ |
that seems excessive, even in that case |
04:14
π
|
underscor |
uh |
04:14
π
|
underscor |
I don't know |
04:14
π
|
dnova |
it does seem excessive but it's not improbable these days |
04:14
π
|
dnova |
everyone using google apps |
04:15
π
|
underscor |
yeah |
04:15
π
|
underscor |
that |
04:15
π
|
underscor |
I don't regularly USE all of them |
04:15
π
|
underscor |
these are just sessions that slowly accumulate |
04:15
π
|
underscor |
since I haven't logged off on here for like 3 or 4 weeks |
04:16
π
|
dnova |
if only you could combine them and gather up all the storage |
04:16
π
|
yipdw^ |
I guess that makes sense |
04:16
π
|
underscor |
It's not bad, I have 4x10GB, and the remaining 10 are 25GB |
04:16
π
|
yipdw^ |
You are using 1% of your 290GB |
04:16
π
|
dnova |
was there ever any official statement about their jump from 7.xgb to 10.xgb? |
04:16
π
|
dnova |
I noticed it one day but never saw anything about it |
04:16
π
|
underscor |
that's a lot of email :o |
04:17
π
|
yipdw^ |
I just use my gmail account as a spam trap |
04:17
π
|
underscor |
I don't think so |
04:17
π
|
dnova |
they went shopping, got some hard drives |
04:17
π
|
dnova |
shared the space with everyone |
04:17
π
|
underscor |
my friend was laughing because he had 40 mb left, and was going to have to purchase space |
04:17
π
|
underscor |
then they changed it |
04:17
π
|
dnova |
my friend is PISSED because he JUST bought a year of storage upgrade |
04:18
π
|
dnova |
but it's like $50 so I laugh at him |
04:18
π
|
underscor |
haha |
04:18
π
|
dnova |
google recently got me, too |
04:18
π
|
underscor |
teacher sends me a screenshot |
04:19
π
|
underscor |
11MB bmp |
04:19
π
|
dnova |
I had to make a call to austria and had the free 10 cents gvoice credit, but figured 5 minutes wasn't going to be enough, so I paid $10 to add more |
04:19
π
|
underscor |
ugh |
04:19
π
|
dnova |
then I only used 4 minutes |
04:19
π
|
dnova |
so now I have $10.02 |
04:19
π
|
underscor |
aww |
04:19
π
|
underscor |
lol |
04:19
π
|
yipdw^ |
call someone random in Austria |
04:19
π
|
dnova |
haha |
04:19
π
|
dnova |
the credit will get used eventually I'm sure |
04:19
π
|
underscor |
do they still have free us calls? |
04:19
π
|
dnova |
yep |
04:20
π
|
underscor |
I should get a headset for this machine, so I can call from it |
04:20
π
|
dnova |
I use it a lot |
04:20
π
|
underscor |
when I need to order chinese food or something |
04:20
π
|
yipdw^ |
I use a phone, you hipster bastards |
04:20
π
|
underscor |
:D |
04:20
π
|
dnova |
I doo too, but it's tied to gvoice also |
04:20
π
|
dnova |
s/doo/do/ |
04:29
π
|
yipdw^ |
oh, damn |
04:29
π
|
yipdw^ |
40091648 enwiki-20120502-pages-meta-history1.xml-p000000010p000002979 268240 enwiki-20120502-pages-meta-history1.xml-p000000010p000002979.7z |
04:29
π
|
yipdw^ |
ls -1s |
04:29
π
|
yipdw^ |
total 46308376 |
04:29
π
|
yipdw^ |
er |
04:29
π
|
yipdw^ |
oops |
04:29
π
|
yipdw^ |
well, anyway, yeah, that's like two orders of magnitude compression |
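The "two orders of magnitude" estimate can be checked from the two `ls -1s` block counts pasted above. The block size does not matter here, since only the ratio is used.

```python
import math

# Block counts as printed by `ls -1s` in the messages above.
uncompressed_blocks = 40_091_648
compressed_blocks = 268_240

ratio = uncompressed_blocks / compressed_blocks
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.1f}x, about {orders_of_magnitude:.2f} orders of magnitude")
```

The ratio comes out a little under 150x, so "like two orders of magnitude" is a fair summary for this one file.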
05:01
π
|
Zebranky |
SketchCow: Project funded. Only three weeks left! |
05:52
π
|
ariana |
hi all |
05:52
π
|
mdupont |
there is a problem with the mail server |
05:53
π
|
mdupont |
archiveteam@archiveteam.org is not working |
05:59
π
|
dnova |
good. fucking. lord. |
06:00
π
|
dnova |
also, hi |
06:15
π
|
mdupont |
hi dnova |
06:16
π
|
mdupont |
also the register new user page does not work |
06:16
π
|
dnova |
on the wiki? |
06:17
π
|
dnova |
that was a thing; I thought it got fixed |
06:19
π
|
mdupont |
dnova, i ran into it today |
06:19
π
|
mdupont |
and then i could not even report it |
06:19
π
|
dnova |
well you've done so in here |
06:29
π
|
mdupont |
:D |
06:49
π
|
SketchCow |
Excellent, Zebranky |
06:49
π
|
SketchCow |
And I guess I have things to fix!! |
06:51
π
|
Coderjoe |
yipdw: 7zip's secret is essentially LZMA: it uses an LZ77, but has a larger window, uses markov chains, and an arithmetic encoder. and several different chains. it is rather crazy. |
06:52
π
|
Coderjoe |
yipdw: whereas bzip2 operates on however much data fits in a 900000 byte RLE block per output block. |
06:55
π
|
Coderjoe |
s/arithmetic/range/ |
06:55
π
|
Coderjoe |
(which are close to identical concepts) |
06:58
π
|
Coderjoe |
plus, it adapts to the data continuously, while bzip2 is per block and deflate (gzip) is just also per block (but the block length is arbitrary) |
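The window-size point can be illustrated with Python's standard compression modules. This is a synthetic sketch, not a measurement of the real dumps: it assumes a "page history" made of one incompressible 1,000,000-byte revision repeated three times, so the only redundancy sits further apart than deflate's 32 KiB window or a single 900 kB bzip2 block, while LZMA's default dictionary is several MiB.

```python
import bz2
import lzma
import random
import zlib


def make_revisions(revision_size: int = 1_000_000, copies: int = 3) -> bytes:
    """Synthetic stand-in for a page history: one incompressible
    'revision' repeated verbatim, so all redundancy is long-range."""
    rng = random.Random(0)  # fixed seed so runs are repeatable
    revision = rng.getrandbits(8 * revision_size).to_bytes(revision_size, "little")
    return revision * copies


def compressed_sizes(data: bytes) -> dict:
    """Compress the same bytes with deflate, bzip2 and LZMA (xz container)."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),  # deflate: 32 KiB window
        "bz2": len(bz2.compress(data, 9)),    # independent 900 kB blocks
        "lzma": len(lzma.compress(data)),     # default preset, multi-MiB dictionary
    }


sizes = compressed_sizes(make_revisions())
for name, size in sizes.items():
    print(f"{name:>4}: {size:>9,d} bytes")
```

On this input only LZMA can refer back to the earlier copies, so its output should stay close to the size of a single revision while the other two stay close to the raw size. Real MediaWiki XML has much more short-range redundancy, so the gap there will look different.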
06:58
π
|
yipdw |
oh |
06:58
π
|
yipdw |
that'd explain it |
06:58
π
|
yipdw |
I noticed a shitload of redundancy in the uncompressed XML |
06:59
π
|
yipdw |
that and the decompression took forever |
07:11
π
|
Coderjoe |
yeah, LZ77-based algorithms love redundancy |
07:12
π
|
mdupont |
so i have been just reading about the firedrill and the wikipedia |
07:12
π
|
mdupont |
anyone have access to the binlogs of the wp mysql? |
07:13
π
|
yipdw |
probably not here, but I'm not sure why they're necessary |
07:31
π
|
* |
SmileyG ponders if you want the history for the changes and if they are stored there |
07:32
π
|
mdupont |
SmileyG, yipdw i would like to get the "not notable" articles |
07:32
π
|
mdupont |
for some place and topics, almost everything is not notable |
07:32
π
|
mdupont |
and a lot of good stuff is deleted |
07:34
π
|
SmileyG |
yah sux |
07:34
π
|
SmileyG |
that's why I don't contribute to wikipedia :/ |
07:34
π
|
SmileyG |
Something which no one knows about, never gets to be known about. |
07:35
π
|
mdupont |
yes |
07:36
π
|
mdupont |
so i think the binlogs would be good for catching deletions |
07:36
π
|
mdupont |
and supposedly they are on the toolserver, i will have to go searching |
07:46
π
|
SmileyG |
ah |
08:05
π
|
mdupont |
https://wiki.toolserver.org/view/Database_access |
08:05
π
|
mdupont |
it looks like the toolserver is a replica already |
08:12
π
|
SmileyG |
lolk |
08:29
π
|
alard |
I think the end of MobileMe is in sight: I've now searched with English, Spanish, French, German, Dutch and finally Italian dictionaries. (The 350,000 Italian words produced only 400 more items.) |
08:29
π
|
alard |
Unless I've missed a really important language I'll stop searching for more. There are 21,000 items left on the to do list. |
08:32
π
|
ersi |
alard: Good job :] |
08:47
π
|
mdupont |
also people, i am working on putting the osm/fosm.org cc data on archive.org http://osmopenlayers.blogspot.de/2012/05/s3-buckets-for-fosm-in-progress.html |
08:58
π
|
SmileyG |
hmmm how many more items to grab alard ? (I don't have the url handy to check :P ) |
08:59
π
|
ersi |
SmileyG: 10:30 <@alard> Unless I've missed a really important language I'll stop searching for more. There are 21,000 items left on the to do list. |
08:59
π
|
ersi |
Try reading the last part |
09:02
π
|
alard |
SmileyG: http://memac.heroku.com/ |
09:08
π
|
SmileyG |
:D |
09:08
π
|
SmileyG |
I thought he grabbed 21000 more items ¬_¬ |
09:09
π
|
ersi |
Oh, heh |
09:10
π
|
* |
SmileyG fails reading when tired :( |
09:11
π
|
ersi |
^_^ |
09:14
π
|
SmileyG |
and its 10:15am :( |
09:16
π
|
schbiridi |
SmileyG: did you finish your fileplanet chunk? :> |
09:16
π
|
schbiridi |
i would just re-grab it otherwise, no biggie |
09:17
π
|
SmileyG |
i never got one.... |
09:17
π
|
SmileyG |
time got the better of me :( |
09:17
π
|
schbiridi |
ah, ok |
09:17
π
|
schbiridi |
np |
09:51
π
|
chronomex |
Coderjoe, yipdw: mediawiki xml -> 7zip gets about the same size reduction as mediawiki -> RCS file, no compression |
10:19
π
|
Schbirid |
http://archive.org/post/419916/old-friendster-blog |
11:34
π
|
godane |
getting some techtv stuff from news groups |
11:35
π
|
godane |
i hope to find music wars special on newsgroups |
11:56
π
|
visitro |
hi there ! I would like to contribute to project Gutenberg. I want to digitalize a book containing drawings, figures, ... what kind of file format should I use for this book to be compliant with your guidelines ? Where should I submit it for approval ? |
11:59
π
|
ersi |
There's nothing like approval, or guidelines |
12:00
π
|
ersi |
and this isn't #gutenberg :P But feel free to digitalize the book anyhow, as large scanning resolution as possible with open formats.. should do fine |
12:01
π
|
visitro |
I got the link here http://archiveteam.org/index.php?title=Project_Gutenberg :) |
12:03
π
|
ersi |
hm~ |
12:04
π
|
ersi |
well, we're not gutenberg anyhow :) |
12:04
π
|
ersi |
we want to download and archive everything from gutenberg |
12:04
π
|
visitro |
and for the kind of book I'm speaking about, using LaTeX should be the best way to do, but that will break compatibility with ePub and other kindle file formats, no ? |
12:04
π
|
visitro |
humm ok |
12:06
π
|
visitro |
and what's your goal ? I mean, what's the point of copying gutenberg.org ? |
12:06
π
|
ersi |
we like to store things, we're digital pack rats |
12:06
π
|
ersi |
Copying is good. Copying makes things possibly last longer and be around longer |
12:07
π
|
Schbirid |
visitro: this is http://archiveteam.org/ . |
12:08
π
|
ersi |
Trying to make sure things don't disappear into the vast nothingness |
12:08
π
|
visitro |
haha ok nice :) |
12:08
π
|
visitro |
then, sorry for the noise :) I fear there is nothing like an official gutenberg irc channel |
12:08
π
|
visitro |
have a nice day ;) |
12:09
π
|
alard |
visitro: Maybe you should check with the project gutenberg digital proofreaders project. |
12:09
π
|
alard |
I think they have a wiki full of tips about scanning etc. |
12:10
π
|
alard |
http://www.pgdp.net/ (digital proofreaders should be distributed proofreaders, obviously) |
13:46
π
|
godane |
i'm trying to find this file: TECHTV The Screen Savers Cable in The Classroom - December 2001.avi |
13:46
π
|
godane |
there are torrents but no one is seeding |
13:46
π
|
godane |
i was hoping to find it on newsbin |
15:43
π
|
SketchCow |
Morning. |
15:44
π
|
SketchCow |
I've got the mobileme transfers pretty sewn up right now, so that's going well, again. |
15:48
π
|
balrog_ |
morning, SketchCow |
16:36
π
|
SketchCow |
Trying to fix the mail issues. |
16:36
π
|
SketchCow |
Updating from Mon Feb 14 18:03:51 EST 2011 to Thu May 17 12:20:32 EDT 2012. |
16:36
π
|
SketchCow |
WELL GUESS IT'S BEEN A WHILE FOR THAT SERVER HUH |
16:51
π
|
mistym |
Hey SketchCow, what's the haps with that radio site you posted yesterday? Is someone on that? |
16:53
π
|
godane |
SketchCow: uploading a lost episode of the screen savers |
16:54
π
|
SketchCow |
I don't know |
16:54
π
|
SketchCow |
Archive.org is grabbing a copy as we speak, I didn't see anyone else here indicate they'd be don. |
16:57
π
|
balrog_ |
one vt100 manual == 10gb of scans (before processing) |
16:59
π
|
SketchCow |
Broome County Sheriff's Deputies say a tractor trailer hauling Chobani Yogurt got on the ramp to Interstate 81 too fast. When it rounded a curve, the trailer slid over the embankment and spilled 36,000 pounds of yogurt on the shoulder and down the hillside. |
17:02
π
|
chronomex |
*slime* |
17:04
π
|
SketchCow |
Applying patches... done. |
17:04
π
|
SketchCow |
Fetching 22713 new ports or files... |
17:04
π
|
SketchCow |
Oh yeah, this is going to be quite the fixup. |
17:12
π
|
godane |
whats a good newsgroup search index? |
17:13
π
|
godane |
having a hard time finding techtv in classroom |
17:20
π
|
yipdw |
http://www.businessweek.com/articles/2012-05-16/is-google-plus-a-ghost-town-and-does-it-matter |
17:20
π
|
yipdw |
google+ archive time |
17:28
π
|
godane |
best to start the archive now |
17:51
π
|
SmileyG |
Hmmm |
17:52
π
|
SmileyG |
G+ is a CDN |
17:52
π
|
SmileyG |
I read _lots_ of posts, all day long in fact |
17:52
π
|
SmileyG |
yet I post... ~once a week |
17:52
π
|
SmileyG |
the problem is the metric is measured by how much people post.... |
17:52
π
|
SmileyG |
because you know, twitter isn't anything unless every single one of the X members is posting constantly like zomg? |
17:53
π
|
aggro |
I know I'm on the dark side with G+ here... but I love its interface and the ability to keep up with different groups and the like. |
17:53
π
|
SmileyG |
agreed |
17:53
π
|
SmileyG |
To put things in perspective, he points to a recent Lady Gaga post that received 570 “+1s” on Google+. The exact same post on Facebook got 133,539 “Likes.” |
17:53
π
|
SmileyG |
*I* Have got +15 on a post bashing anon, on their own thread. |
17:53
π
|
SmileyG |
:D |
17:53
π
|
* |
SmileyG is 1/10th as popular as lady gaga now? |
17:54
π
|
Schbirid |
well, maybe the g+ audience is not interested in lady gaga |
17:54
π
|
Schbirid |
SmileyGaga |
17:54
π
|
SmileyG |
oh wait, that said 570, not 150, but you know what I mean ;) |
17:54
π
|
aggro |
P P P Popular P P Popular |
17:54
π
|
SmileyG |
Schbirid: damnit they figured me out ¬_¬ |
17:54
π
|
Schbirid |
ha |
17:54
π
|
Schbirid |
you look like a horse! |
17:54
π
|
SmileyG |
.o_O I do? |
17:56
π
|
Schbirid |
:) |
18:34
π
|
SketchCow |
I'm heading down to NYC to see a movie and do things later today. Anybody need anything? |
18:34
π
|
SketchCow |
IUMA's going well, we have disk space for the remainder of mobileme, etc. |
18:34
π
|
Schbirid |
and fileplanet breached its first terabyte today |
18:38
π
|
Schbirid |
http://i.imgur.com/Dp1AM.png |
18:38
π
|
Schbirid |
ignore the title |
18:38
π
|
Schbirid |
green is done |
18:40
π
|
DFJustin |
damn that's more data than cdbbsarchive |
18:40
π
|
SketchCow |
For now. |
18:41
π
|
Schbirid |
including a fantastically useless encrypted 8GB unreal tournament 3 installer |
18:41
π
|
SketchCow |
Like, I have a 303gb pack of CD-ROM images waiting to go |
18:41
π
|
Schbirid |
cute! |
18:45
π
|
Nemo_ter |
:D |
18:46
π
|
Schbirid |
http://www.tested.com/news/44376-16_bit-time-capsule-how-emulator-bsnes-makes-a-case-for-software-preservation/ |
19:21
π
|
nitro2k01 |
Ah, didn't expect a link to bsnes here |
19:21
π
|
nitro2k01 |
That guy has the attitude that all optimizations are evil |
19:22
π
|
nitro2k01 |
Or rather, he wants the source code to be readable. He's aiming for BSNES to be a reference implementation and a documentation of the hardware |
19:23
π
|
Schbirid |
and that is fantastic! |
19:23
π
|
mistym |
nitro2k01: pretty significant difference between "speedhacks" and "optimizations" ;) |
19:24
π
|
SketchCow |
e's cute. |
19:24
π
|
SketchCow |
I like the little shoutout |
19:24
π
|
mistym |
I figured you were well known enough by now they'd have used your name. |
19:53
π
|
shaqfu |
Pity some systems are nigh-impossible to accurately emulate, barring throwing some serious brainpower at it |
19:53
π
|
shaqfu |
Saturn and its eight processors... |
20:33
π
|
mistym |
shaqfu: Yeah, the Saturn is eccentric for sure. |
20:37
π
|
SmileyG |
8 o_O |
20:39
π
|
SmileyG |
schbiridi: you're inspiring me, maybe one day.... |
21:01
π
|
nitro2k01 |
Hahaha! http://pouet.net/topic.php?which=4792#c174835 |
21:01
π
|
nitro2k01 |
"Jason Scott once replaced some picture people ripped from his site to use on MySpace with Goatse. |
21:01
π
|
nitro2k01 |
So he is ok in my book." |
21:13
π
|
SmileyG |
LOL |
21:35
π
|
godane |
uploaded it: http://archive.org/details/TechTVSept2002 |
21:35
π
|
godane |
:-D |