Time |
Nickname |
Message |
03:08
|
BlueMax |
rather quiet tonight. no-one must be drinking. |
03:18
|
omf_ |
I am working on my new backup |
03:20
|
Aranje |
I'm drinking... which is why I'm quiet... dumb enough to try and play musical chairs with linux partitions |
03:24
|
godane |
i just bought 180gb of credit for my usenet account |
03:24
|
* |
Aranje can't watch as the new / is expanded to cover the old data |
03:25
|
Aranje |
It took, now to see if the fucker boots! |
03:51
|
GLaDOS |
Damn people who get their traceroutes in order.. |
04:26
|
BlueMax |
godane, I get completely free Usenet from my ISP but I've never gotten around to actually looking at what Usenet is. Do you mind explaining it to me? |
04:35
|
omf_ |
usenet is the precursor to what we know as forums |
04:36
|
omf_ |
made up of tens of thousands of groups ranging from mysql to cooking |
04:36
|
omf_ |
It supports threaded conversations and attachments |
04:36
|
omf_ |
in 1980 |
04:37
|
omf_ |
stackoverflow is the next iteration. A moderated Q&A targeted at a small market |
04:41
|
omf_ |
The problem with usenet is that most was unmoderated so the signal to noise sucked if you didn't know how to navigate it |
04:42
|
BlueMax |
sure sounds like it'd be that way |
04:44
|
omf_ |
many of the usenet groups had concentrations of expertise that far eclipse any modern forum |
04:44
|
omf_ |
linux kernel development started there |
04:44
|
omf_ |
gnu |
04:44
|
omf_ |
emacs |
04:44
|
omf_ |
imdb |
04:44
|
omf_ |
fanfiction.net |
04:44
|
omf_ |
the servers traded news feeds for free |
04:45
|
omf_ |
We see stuff from comp.lang.c and comp.lang.lisp still mentioned in blog posts today |
04:45
|
omf_ |
When the internet got too popular is when some say usenet started to lose importance |
04:47
|
BlueMax |
is it anywhere near relevant today? |
04:47
|
omf_ |
I can honestly say as a whole the internet was more civilized. |
04:47
|
omf_ |
well most of it is sucked in to build google groups |
04:47
|
omf_ |
and a ton of people still use that |
04:48
|
omf_ |
many universities still run newsgroups internally like a bbs |
04:49
|
omf_ |
It is all trade offs |
04:50
|
BlueMax |
google groups? |
04:50
|
omf_ |
yeah they bought a usenet provider to build that |
04:50
|
omf_ |
trying to compete with yahoo groups |
04:50
|
omf_ |
which are both just more controlled versions of usenet |
04:52
|
BlueMax |
fair enough |
04:52
|
BlueMax |
is it worth hopping on usenet these days? |
04:52
|
omf_ |
it really depends on which groups you follow |
04:53
|
omf_ |
there are currently 111,000 groups |
04:54
|
BlueMax |
jesus |
04:57
|
BlueMax |
how would I take a look around? client? |
05:00
|
Aranje |
played musical chairs with linux partitions... grub held a temporary victory, but I won the war. |
05:10
|
DFJustin |
BlueMax: usenet was originally for discussion as explained above, but at some point it was figured out that you could post files on it as well if you encoded them properly |
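Usenet articles carry text, so "encoded properly" means turning bytes into printable characters before posting; uuencode was the classic scheme and yEnc later became the common one for binaries. A minimal sketch using Python's standard `binascii.b2a_uu`, which encodes at most 45 bytes per output line. `encode_for_post` and `decode_post` are hypothetical helper names, and real binary posts also add headers and split large files across many articles:

```python
import binascii

def encode_for_post(data: bytes) -> list[str]:
    """Split binary data into 45-byte chunks and uuencode each chunk into one text line."""
    lines = []
    for i in range(0, len(data), 45):
        # b2a_uu returns ASCII bytes ending in a newline; strip only the newline.
        lines.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n"))
    return lines

def decode_post(lines: list[str]) -> bytes:
    """Reverse the encoding one line at a time."""
    return b"".join(binascii.a2b_uu(line) for line in lines)

payload = bytes(range(256))  # arbitrary binary content, including non-printable bytes
text_lines = encode_for_post(payload)
assert decode_post(text_lines) == payload
print(len(text_lines), "lines of printable text")
```

Every character in the encoded lines falls in the printable ASCII range, which is what lets a text-only system relay the file unchanged.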
05:11
|
DFJustin |
nowadays the file sharing side of it is much more popular than the rest |
05:11
|
omf_ |
except the dmca is taking that apart slowly |
05:12
|
BlueMax |
damn americans. |
05:12
|
omf_ |
I fucking hate that law |
05:12
|
DFJustin |
it's been traditional for pirate groups of tv shows, movies, games, etc. to do the initial posting on usenet, after which it gets reposted to torrent sites etc. |
05:12
|
omf_ |
some groups did irc first |
05:12
|
Aranje |
ftp too |
05:13
|
DFJustin |
ISPs that provide free usenet access generally have a poor service in comparison with the paid services like godane is talking about |
05:13
|
DFJustin |
e.g. slow download speeds and only keeping a month's worth of history |
05:13
|
omf_ |
Are there even any pay usenet services outside the US? |
05:14
|
BlueMax |
I think the provider my ISP uses is GigaNews |
05:14
|
BlueMax |
they any good? |
05:14
|
DFJustin |
that's one of the big name ones yes |
05:14
|
DFJustin |
I wonder how much they provide through the ISP deal though |
05:15
|
BlueMax |
apparently it's infinite. |
05:15
|
dashcloud |
probably not the bin groups |
05:15
|
omf_ |
yeah that goes back to the asshat AG from New York |
05:15
|
omf_ |
up till that lawsuit every isp carried everything |
05:16
|
BlueMax |
Asshat AG? |
05:16
|
Aranje |
attorney general probably |
05:16
|
omf_ |
the Attorney General |
05:16
|
omf_ |
fucker did the case to get in the news which it did |
05:17
|
omf_ |
had a good half year run of news coverage |
05:17
|
omf_ |
Like most things that fuck the internet it comes down to a clown trying to make a name |
05:17
|
omf_ |
Jack Thompson |
05:17
|
omf_ |
for example |
05:18
|
omf_ |
The history of the internet can be summed up as a large stack of lawsuits |
05:19
|
omf_ |
and terrible IP law in the usa |
05:19
|
BlueMax |
I hate US copyright law, it fucks the rest of us more than them |
05:19
|
omf_ |
I am an EFF, creative commons person if it is not blaringly obvious |
05:19
|
omf_ |
No it fucks us hard |
05:20
|
omf_ |
it fucks everyone else trying to force them to be like us |
05:20
|
omf_ |
and no lube |
05:20
|
DFJustin |
well it was only a matter of time before someone noticed really, if you look at even just the list of group names they're listing all manner of illegal things like even child porn |
05:21
|
omf_ |
That was the argument used to try and shut usenet down |
05:21
|
DFJustin |
figures |
05:22
|
omf_ |
and it will be used again to try and shut other things down because it sounds good in the media |
05:22
|
omf_ |
we are STOPPING child porn |
05:22
|
omf_ |
all they did was move it somewhere else |
05:23
|
omf_ |
I wonder how "file sharing" is going to work now with the six strikes law and vpns |
05:23
|
omf_ |
I find the whole thing fascinating since pirates bring the high tech |
05:23
|
omf_ |
just like porn sites |
05:24
|
omf_ |
look at the current online tv offerings |
05:24
|
omf_ |
there is nowhere to buy hd files that I can just play |
05:24
|
omf_ |
I gotta stream or get some drm non-hd crap |
05:25
|
omf_ |
movies are only marginally better |
05:25
|
omf_ |
music has somewhat caught up. I bought the soundtrack to TDKR in 24bit flac online |
05:26
|
omf_ |
open source file format and encoder, patent free format, no drm and high quality |
05:27
|
BlueMax |
that sounds like heaven. |
05:27
|
omf_ |
oh I was so happy to find that |
05:27
|
omf_ |
Hans Zimmer loves computers, he is all about online |
05:28
|
BlueMax |
I'm guessing the movie has nothing like that though |
05:29
|
omf_ |
I didn't look, got the bluray for xmas |
05:29
|
omf_ |
It might be possible |
05:29
|
omf_ |
but the file format will still suck |
05:30
|
BlueMax |
Don't you wish all the traditional media marketplaces would just die |
05:30
|
omf_ |
This issue is complex |
05:30
|
omf_ |
Since I cannot buy shows in HD |
05:30
|
omf_ |
I have to get them in bluray |
05:31
|
BlueMax |
1920x1080 vs 1280x720? |
05:31
|
omf_ |
and if that goes away then there is no place to buy stuff you can keep |
05:31
|
omf_ |
no like |
05:31
|
omf_ |
standard def |
05:31
|
omf_ |
I learned all about it reading through the amazon prime docs |
05:32
|
omf_ |
Anyone here heard of oink? |
05:32
|
BlueMax |
the sound a pig makes. |
05:32
|
Aranje |
yep |
05:32
|
Aranje |
there's a few people here that're whatters |
05:33
|
omf_ |
I thought that group was the future of the scene but even they got busted |
05:34
|
Aranje |
those sites are the only way to get any good amount of music anymore |
05:34
|
Aranje |
it pisses me off |
05:34
|
Aranje |
Like, I love it, but I do really want to give money to people making music to continue their craft |
05:35
|
BlueMax |
I dunno what Oink was |
05:35
|
Aranje |
big fuckin music tracker |
05:35
|
omf_ |
it was high quality |
05:36
|
Aranje |
and had a good community to boot |
05:36
|
omf_ |
yeah the oink community is what made it so good |
05:37
|
omf_ |
I wonder if we are going to go back to just mailing each other physical media |
05:37
|
Aranje |
mail me a tb hd and find out |
05:38
|
Aranje |
actually, preferably 2-3tb hd so others can add stuff too |
05:38
|
Aranje |
haha |
05:38
|
db48x |
heh |
05:39
|
Aranje |
I have a friend who does that |
05:39
|
Aranje |
she drives from SF to LA often enough that she drops a drive on the way down and picks it back up when she goes back north |
05:44
|
omf_ |
Are there any apps that can copy hulu, amazon, or netflix streams? I haven't looked in over a year. Most stuff I find interesting is on youtube which is easy |
05:45
|
omf_ |
but you know those media companies. Gotta have turf wars |
05:47
|
Aranje |
obvious solution: clandestinely produce a script or app that does exactly that, but for your competitors |
05:47
|
db48x |
heh |
05:47
|
Aranje |
seriously, our corps suck |
05:47
|
db48x |
I read a good book where it was part of the background of the world that MMOs did things like that |
05:47
|
omf_ |
Well I have some ideas about it on Linux |
05:48
|
omf_ |
just stick some middleware in to copy the ram |
05:48
|
omf_ |
but since flash is mostly a blackbox it can get tricky |
05:48
|
db48x |
omf_: that sounds complicated. why not just record the traffic? |
05:49
|
omf_ |
I haven't tested to see if the payloads are encrypted or not |
05:49
|
omf_ |
or if delivered out of order and then pieced together |
05:49
|
omf_ |
I would do both of those things if I was trying to make a secure streaming service |
05:52
|
omf_ |
I am also going on the theory that if it was as simple as capturing the network traffic there would already be apps out there to do it |
05:52
|
omf_ |
take get_flash_video |
05:52
|
omf_ |
it got figured out and then turned into an open source project |
05:58
|
Aranje |
fixed my boot shit, and blasted away my encrypted homedir. the linux, it is MINE |
05:59
|
db48x |
Aranje: what was the problem? |
06:01
|
Aranje |
dunno, grub couldn't find its ass |
06:01
|
Aranje |
but it boots now, so it's whatever :D |
06:01
|
omf_ |
did you switch to grub 2? |
06:01
|
Aranje |
I was always on grub2 |
06:02
|
omf_ |
okay quick sanity check. My new backup scheme is a set of raid 1 drives. No optical media anymore |
06:02
|
Aranje |
I was playing musical chairs with linux and windows partitions... it didn't work out so well... some UUIDs got changed... it was a pain |
06:03
|
Aranje |
but I lost no data, and removed windows from my drive without reinstalling a damn thing |
06:04
|
db48x |
heh |
06:05
|
db48x |
omf_: you are going to take one of the mirrored drives out and store it, replacing it with a blank? |
06:09
|
db48x |
if so, be aware of the BER of the drives |
06:09
|
db48x |
consumer drives have a bit error rate of 1 in 10^14 |
06:10
|
db48x |
that's only 11tb |
06:10
|
db48x |
so if you resilver a 4tb drive, you have a 35% chance that the other drive will misread a bit, and you will lose a sector with no way to recover |
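The 35% figure is the linear estimate: 4 TiB read divided by the roughly 11.4 TiB that corresponds to 10^14 bits. Treating every bit read as an independent trial at the spec rate, the expected number of errors is bits × BER and the chance of at least one error is 1 - (1 - BER)^bits, which comes out somewhat lower, at roughly 27 to 30 percent depending on whether "4tb" means decimal TB or TiB. A quick check, assuming the quoted 10^-14 spec and independent errors (real drive failures are correlated, so this is only a rough model):

```python
import math

BER = 1e-14  # quoted consumer-drive spec: one unrecoverable read error per 10^14 bits read

def p_read_error(bytes_read: float, ber: float = BER) -> float:
    """Probability of at least one unrecoverable read error, assuming independent bit errors."""
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, computed with log1p/expm1 to avoid precision loss
    return -math.expm1(bits * math.log1p(-ber))

TIB = 2 ** 40
for label, size in [("4 TB (decimal)", 4e12), ("4 TiB", 4 * TIB)]:
    linear = size * 8 * BER  # the back-of-envelope estimate
    print(f"{label}: linear {linear:.1%}, independent-trials {p_read_error(size):.1%}")
```

The linear estimate is only a good approximation while the expected error count is much smaller than 1; at these capacities it already overstates the risk by a few points.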
06:11
|
db48x |
I recommend using ZFS instead. it'll let you do a three-way mirror |
06:11
|
db48x |
you'll still have the same chance of failure, but a minuscule chance that both drives will exhibit an error in the same sector at the same time |
06:11
|
omf_ |
no the whole raid unit is the backup. Once it is full it gets turned off and put on a shelf |
06:11
|
db48x |
and since ZFS keeps separate hashes of every sector it stores, it'll know which drive goofed |
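Keeping the checksum apart from the data is what lets a checksumming filesystem pick the good copy in a mirror: a read verifies each copy against the stored checksum and can rewrite the copy that fails. A toy model in Python, assuming SHA-256 as the checksum and one checksum per block; real ZFS checksums per block (not per sector), stores them in the parent block pointers, and defaults to fletcher4. `ToyMirror` is a hypothetical class, not ZFS code:

```python
import hashlib

class ToyMirror:
    """N-way mirror that stores each block's checksum separately from the data copies."""

    def __init__(self, copies: int = 2):
        self.disks = [dict() for _ in range(copies)]  # one {block_id: bytes} map per disk
        self.checksums = {}                            # block_id -> expected digest

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id: int) -> bytes:
        expected = self.checksums[block_id]
        good, bad = None, []
        for i, disk in enumerate(self.disks):
            data = disk.get(block_id)
            if data is not None and hashlib.sha256(data).digest() == expected:
                if good is None:
                    good = data
            else:
                bad.append(i)  # missing or silently corrupted copy
        if good is None:
            raise OSError(f"block {block_id}: no copy matches its checksum")
        for i in bad:  # self-heal: overwrite failed copies with a verified one
            self.disks[i][block_id] = good
        return good

m = ToyMirror(copies=3)
m.write(7, b"archive block")
m.disks[1][7] = b"archive blocK"  # simulate a silent bit flip on disk 1
assert m.read(7) == b"archive block"
assert m.disks[1][7] == b"archive block"  # repaired from a verified copy
```

A plain mirror without separate checksums sees two different copies and has no way to tell which one is right; the stored checksum breaks the tie.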
06:12
|
db48x |
ah |
06:12
|
db48x |
in that case, make doubly sure that the drives don't get mixed up :) |
06:13
|
omf_ |
I thought about just raid 10 for extra redundancy but I need to earn more money |
06:13
|
db48x |
also, if you're doing hardware raid, remember that different raid controllers can't read each other's raid drives |
06:13
|
db48x |
so you'll need spare controllers as well |
06:14
|
omf_ |
That is the advantage to software |
06:14
|
db48x |
good answer :) |
06:15
|
omf_ |
I want everything to be modular and I found that helps keep the cost down |
06:15
|
omf_ |
a 5 disk enclosure w/ raid 10 costs way more than 3 2 disk enclosures which also support software raid |
06:16
|
db48x |
yea |
06:16
|
db48x |
I think I'm about to pull the trigger on a 32tb setup |
06:16
|
omf_ |
home made? |
06:17
|
db48x |
yea |
06:17
|
omf_ |
for normal usage or backup |
06:17
|
db48x |
archiving |
06:18
|
omf_ |
a stand alone unit or built into a computer |
06:18
|
db48x |
a 4u case with 24 drives |
06:19
|
omf_ |
how much power is this thing going to use? |
06:21
|
db48x |
300-400 watts |
06:22
|
Aranje |
what kinda drives |
06:22
|
db48x |
probably needs 750 watt supply to handle the peak load at startup |
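The gap between 300-400 W running and a 750 W supply comes from spin-up, when every motor pulls its peak 12 V current at once. Rough arithmetic, assuming about 2 A per 3.5-inch drive on the 12 V rail at start (a typical datasheet figure; check the actual drive's spec sheet) and a placeholder 150 W for the board, CPU, HBAs and fans:

```python
DRIVES = 24
SPINUP_AMPS_12V = 2.0   # assumed per-drive 12 V start current; varies by model
REST_OF_SYSTEM_W = 150  # assumed budget for everything except the drives

def peak_watts(drives_starting_together: int) -> float:
    """Worst-case draw while the given number of drives spin up simultaneously."""
    return drives_starting_together * SPINUP_AMPS_12V * 12 + REST_OF_SYSTEM_W

print(peak_watts(DRIVES))  # all 24 drives spinning up at once
print(peak_watts(4))       # staggered spin-up, 4 drives at a time
```

Under these assumptions an all-at-once start lands just under 750 W. If the controller or backplane supports staggered spin-up, the peak drops sharply and a smaller supply may suffice.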
06:22
|
db48x |
I'm thinking of getting Hitachi Ultrastars |
06:22
|
omf_ |
consumer or enterprise grade |
06:23
|
Aranje |
yeah |
06:23
|
Aranje |
cause if you're running any kind of hardware raid, you'll get fucked by consumer drives now or later |
06:23
|
Aranje |
fuckin things time out of the array |
06:24
|
db48x |
Ultrastar is enterprise, they call the consumer ones Deskstar |
06:24
|
db48x |
and I'll be using ZFS, so no hardware raid |
06:25
|
Aranje |
heh, cool |
06:25
|
Aranje |
<3 zfs |
06:25
|
omf_ |
solaris or freebsd |
06:25
|
db48x |
linux |
06:25
|
omf_ |
ooh living on the edge |
06:25
|
db48x |
well, I might give IllumOS a try |
06:26
|
omf_ |
I am thinking about building up a firewall with a raspberry pi |
06:26
|
db48x |
ZFS on Linux is coming along |
06:26
|
db48x |
I've had a few problems with it, but nothing that would risk data loss |
06:27
|
db48x |
well, except for the bug where chown causes it to lose all the permission information |
06:27
|
omf_ |
sounds like the crap netapp devices used to do |
06:28
|
db48x |
heh |
06:28
|
omf_ |
then again zfs is just a poor man's https://en.wikipedia.org/wiki/Write_Anywhere_File_Layout |
06:28
|
omf_ |
which netapp owns the patents on |
06:29
|
* |
Aranje dances around in linux |
06:30
|
Aranje |
even ubuntu is good if you fuck it hard enough |
06:30
|
Aranje |
:D |
06:30
|
db48x |
ZFS has always seemed more capable than netapp |
06:30
|
omf_ |
it isn't |
06:30
|
omf_ |
there are dozens of things netapp does that zfs doesn't |
06:30
|
db48x |
such as? |
06:31
|
Aranje |
this is my storage wet dream fs: http://en.wikipedia.org/wiki/HAMMER |
06:31
|
omf_ |
All I know is my college roommate who is a Linux kernel hacker works for netapp writing freebsd kernel shit and he says it is way better. |
06:31
|
omf_ |
in terms of hardware management |
06:32
|
omf_ |
I mean it is a whole stack |
06:32
|
db48x |
oh, yea |
06:32
|
omf_ |
you get a netapp device you get an embedded freebsd and all this management software |
06:32
|
omf_ |
the in built samba + NFS support |
06:32
|
omf_ |
tuning capability and doing shit while it is online |
06:32
|
db48x |
ZFS is just the filesystem and raid |
06:33
|
omf_ |
don't get me wrong, I love zfs because it is pushing this stuff more into the consumer market where it is just as useful |
06:33
|
omf_ |
on solaris they have it dressed up with more apps and shit like a netapp device |
06:33
|
omf_ |
which is why I asked which os you were going to use |
06:33
|
omf_ |
have you given any thought to freenas or that other one |
06:34
|
omf_ |
I always forget the name |
06:34
|
db48x |
truenas? |
06:36
|
shaqfu |
db48x: ZFS via FUSE, or is there actual support for it? |
06:36
|
db48x |
shaqfu: not ZFS FUSE, ZFS on Linux |
06:36
|
db48x |
which is an actual kernel module |
06:36
|
shaqfu |
!!! |
06:36
|
db48x |
developed by LLNL |
06:36
|
* |
shaqfu googles |
06:36
|
db48x |
http://zfsonlinux.org/ |
06:37
|
shaqfu |
I knew they were working on it, but I didn't know it was useful yet |
06:38
|
db48x |
they have a huge setup already: http://insidehpc.com/2012/04/24/video-sequoias-55pb-lustrezfs-filesystem/ |
06:39
|
shaqfu |
That's dynamite |
06:40
|
shaqfu |
ZFS on a useful OS; it's like a dream come true |
06:43
|
db48x |
indeed |
06:43
|
db48x |
I feel like the hardware isn't quite there though |
06:43
|
db48x |
there are only so many JBODs I can plug into a single server |
06:44
|
db48x |
infiniband/fiber channel seem like the better way to go |
06:44
|
db48x |
just plugging more drives into routers, rather than into HBAs |
06:46
|
Aranje |
that looks like a ragequit |
06:48
|
db48x |
heh |
06:48
|
db48x |
I'd like to be able to expand a single set for the rest of my life |
06:49
|
db48x |
I want to be 5000 years old, capture an asteroid, and use nanotech to convert it into storage and have my zfs pool expand as that comes online |
06:49
|
Aranje |
is that a challenge? :D |
06:50
|
shaqfu |
db48x: If only RAIDZ supported expansion :( |
06:50
|
db48x |
shaqfu: you can replace each drive in the vdev with a larger one until they're all larger |
06:50
|
db48x |
Aranje: :) |
06:51
|
shaqfu |
db48x: True, but as a poor college student, slowly growing my array one drive at a time wasn't an option :) |
06:51
|
shaqfu |
Until one day, I slot in the final drive, and poof, 4TB more space |
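Growing a RAIDZ vdev by swapping drives works because usable space tracks the smallest member: roughly (drives - parity) × smallest drive. Nothing changes until the last small drive is replaced (and the pool is allowed to grow, for example with its autoexpand property on), which is exactly the "poof" moment. A sketch that ignores metadata and padding overhead; the nine-drive RAIDZ1 layout is an assumption chosen because 8 data drives × 0.5 TB reproduces the 4 TB jump:

```python
def raidz_usable_tb(drive_sizes_tb: list[float], parity: int) -> float:
    """Approximate usable capacity of one RAIDZ vdev (ignores metadata/padding overhead)."""
    data_drives = len(drive_sizes_tb) - parity
    return data_drives * min(drive_sizes_tb)

sizes = [1.5] * 9  # hypothetical 9-drive RAIDZ1 of 1.5 TB disks
before = raidz_usable_tb(sizes, parity=1)
for i in range(len(sizes)):
    sizes[i] = 2.0  # each RMA replacement arrives as a 2 TB drive
    print(i + 1, "replaced:", raidz_usable_tb(sizes, parity=1), "TB")
print("gain:", raidz_usable_tb(sizes, parity=1) - before, "TB")
```

Capacity stays flat for the first eight replacements and only jumps when the ninth drive goes in.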
06:55
|
shaqfu |
(that actually would've worked very slowly - Samsung stopped producing 1.5TB drives in that line, so they were sending me 2TB as RMA replacements) |
06:57
|
db48x |
heh |
07:01
|
omf_ |
yeah here is the new shit |
07:01
|
omf_ |
long term one write media |
07:01
|
omf_ |
http://www.techspot.com/news/50313-hitachi-unveils-quartz-based-storage-data-may-last-100-million-years.html |
07:02
|
omf_ |
1,000 degrees C for 2 hours and it still works afterward |
07:02
|
omf_ |
This might be the archival storage medium we have all waited for |
07:04
|
shaqfu |
omf_: Sure, so long as the devices to read it also last 100M years ;) |
07:04
|
db48x |
reminds me of http://newscenter.lbl.gov/feature-stories/2009/06/03/billion-year-ultra-dense-memory-chip/ |
07:04
|
omf_ |
the base encoding is binary dots |
07:04
|
omf_ |
so that is good |
07:04
|
omf_ |
here is the funny part |
07:05
|
omf_ |
Los Alamos national lab already built this crystal stuff in 2009 |
07:05
|
omf_ |
they use it now |
07:05
|
db48x |
nice |
07:05
|
db48x |
I'd like to buy some |
07:05
|
shaqfu |
For what? Long-term storage? |
07:05
|
omf_ |
the really high end research places for hardware get to do some crazy shit |
07:05
|
omf_ |
A medium that actually does not fail |
07:06
|
omf_ |
I have gold cdrs I burned in the 90s that still work |
07:06
|
omf_ |
but I cannot buy that media today |
07:06
|
omf_ |
I got 14 year old burned dvdrs that still work and they kinda still sell that quality today |
07:06
|
shaqfu |
Oh man, remember when CD-ROM was said to last 500 years |
07:06
|
omf_ |
of course tape drives last a few decades |
07:07
|
omf_ |
in reality the best disks will make it over 100 years and that is about it |
07:07
|
shaqfu |
omf_: Long before then, you'll have a loss of reading mechanisms |
07:07
|
omf_ |
yep |
07:07
|
shaqfu |
You could have an 8" disk made of adamantium but good luck using it |
07:08
|
omf_ |
See that is the thing |
07:08
|
db48x |
heh, 40 megs per square inch |
07:08
|
omf_ |
my bluray drive can still play cds |
07:08
|
omf_ |
on the prototype |
07:08
|
omf_ |
they are giving themselves 2 more years |
07:08
|
db48x |
yea |
07:08
|
shaqfu |
omf_: Yeah; backwards compatibility lengthens media life considerably |
07:09
|
omf_ |
and that has been intentional |
07:09
|
omf_ |
at least from the computer side |
07:09
|
omf_ |
fuck the media companies |
07:09
|
omf_ |
I am thinking a real long term media like this will get Library of Congress and other bodies approval for use |
07:09
|
omf_ |
meaning there will be continual business for equipment |
07:10
|
shaqfu |
But there are still media that were used on dead product genealogies, so to speak - good luck finding GD-ROM drives that aren't built into Sega products |
07:10
|
shaqfu |
LC is dealing more with scale than length atm |
07:10
|
omf_ |
was that the sega saturn disks |
07:10
|
shaqfu |
Dreamcast |
07:11
|
shaqfu |
Length is kinda taken care of, since you can just throw redundancy at a problem |
07:11
|
shaqfu |
But moving huge data, that's hard |
07:11
|
shaqfu |
I know they're working to figure out how to make Twitter useful |
07:12
|
omf_ |
it depends |
07:12
|
omf_ |
researchers already find twitter useful |
07:12
|
shaqfu |
omf_: I mean, getting their dataset to researchers |
07:13
|
omf_ |
yeah cause the dump is too fucking big |
07:13
|
shaqfu |
Yep |
07:13
|
omf_ |
shit you cannot even get a copy from the LoC |
07:13
|
shaqfu |
I think they'll mail you disks for small (30TB and under?) requests |
07:14
|
omf_ |
You supply the disks |
07:14
|
omf_ |
and then how do they cut the data up for you |
07:14
|
shaqfu |
Who knows |
07:14
|
omf_ |
On a smaller scale this is the same kinda problem reddit has |
07:15
|
shaqfu |
I wouldn't be surprised if they offer a random section of 1% of Twitter once it's closed |
07:15
|
omf_ |
they have so much "dark" data |
07:15
|
shaqfu |
That's still so much fucking data that it'll be statistically significant |
07:15
|
omf_ |
but will it be on topic |
07:15
|
shaqfu |
omf_: Most people asking for all of Twitter want to do things like track language development |
07:16
|
omf_ |
well I have interest in doing ngram analysis on that too |
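At its simplest, n-gram analysis means counting runs of n consecutive tokens; at Twitter scale the counting is the same, only spread across many machines. A minimal single-machine sketch, assuming naive lowercase whitespace tokenization (real tweet text needs proper handling of URLs, mentions and punctuation); the sample strings are made up:

```python
from collections import Counter

def ngrams(tokens: list[str], n: int):
    """Yield each run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))

def count_ngrams(texts, n: int = 2) -> Counter:
    """Count n-grams across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts

tweets = ["usenet is not dead", "usenet is still around", "is usenet dead"]
print(count_ngrams(tweets).most_common(3))
```

Because the counts from separate chunks of text can simply be added together, the same function works as the per-shard step of a larger distributed count.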
07:16
|
shaqfu |
I suppose you could ask for certain subsets, like everything on Election Day, for example |
07:16
|
omf_ |
but to run a dataset like that is going to take a computer bigger than what you probably got |
07:16
|
shaqfu |
Yeah, you need to be a university |
07:16
|
omf_ |
Take freebase |
07:17
|
omf_ |
I got a whole backup of that locally |
07:17
|
omf_ |
I can only partially load it on a 16gb RAM machine because the server software is fucking java |
07:17
|
omf_ |
and then it runs slow as balls |
07:17
|
shaqfu |
You were able to rebuild its functionality? |
07:17
|
omf_ |
largest known graph database in the world |
07:18
|
omf_ |
what do you mean functionality? The graph database does all the work |
07:18
|
shaqfu |
Gotcha |
07:18
|
omf_ |
shit they wondered for decades when they would need a graph database |
07:18
|
omf_ |
think of how long the RDBMs have lasted |
07:19
|
omf_ |
csv files, spreadsheets, mysql |
07:19
|
omf_ |
now we got neo4j, graphd, and titan |
07:20
|
omf_ |
multi-threaded, graph systems designed to start at millions of rows and scale in every direction |
07:20
|
omf_ |
sorry it is not even rows |
07:21
|
omf_ |
millions of entities and edges |
07:21
|
omf_ |
Freebase is hundreds of millions of facts and categories |
07:21
|
omf_ |
and billions of edges |
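A graph store holds facts as edges between entities, commonly written as subject-predicate-object triples (the form Freebase's RDF dumps use). A toy in-memory index shows why such stores index both directions of every edge, so "what did X direct" and "who directed X" are both cheap lookups. The `TripleIndex` class and the identifiers are hypothetical; loading billions of edges needs a purpose-built store rather than Python dictionaries:

```python
from collections import defaultdict

class TripleIndex:
    """Stores (subject, predicate, object) facts, indexed in both edge directions."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # subject -> {(predicate, object)}
        self.in_edges = defaultdict(set)   # object  -> {(predicate, subject)}

    def add(self, s: str, p: str, o: str) -> None:
        self.out_edges[s].add((p, o))
        self.in_edges[o].add((p, s))

    def objects(self, s: str, p: str) -> set:
        """Follow edges forward: all o such that (s, p, o) is stored."""
        return {o for pred, o in self.out_edges.get(s, ()) if pred == p}

    def subjects(self, p: str, o: str) -> set:
        """Follow edges backward: all s such that (s, p, o) is stored."""
        return {s for pred, s in self.in_edges.get(o, ()) if pred == p}

g = TripleIndex()
g.add("film:batman_1989", "directed_by", "person:tim_burton")
g.add("film:beetlejuice", "directed_by", "person:tim_burton")
g.add("film:batman_1989", "score_by", "person:danny_elfman")
print(sorted(g.subjects("directed_by", "person:tim_burton")))
```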
07:22
|
omf_ |
Google and Bing both use it to power search now |
07:22
|
omf_ |
it is fun stuff |
07:23
|
omf_ |
the downside is database loading can be measured in days |
07:24
|
omf_ |
I would say there are at least 10 years worth of new apps that can be built upon freebase |
07:24
|
shaqfu |
Sheesh |
07:25
|
omf_ |
yeah some jackass did a full freebase load into mysql and it took 2 weeks to load |
07:25
|
omf_ |
while that was a few years ago it is still crazy |
07:25
|
omf_ |
the data is just so complicated but that is the power |
07:26
|
omf_ |
you got metadata galore |
07:26
|
shaqfu |
I'd imagine, if you have that much data |
07:26
|
omf_ |
Google bought freebase and all of metaweb |
07:26
|
omf_ |
and it took them a decade to get it viable |
07:27
|
omf_ |
And only in the last year have we seen the results in google and other search engines |
07:28
|
omf_ |
freebase is so big it can identify facebook accounts |
07:28
|
omf_ |
I tried that out and it is crazy |
07:28
|
omf_ |
which means soon it could possibly map everyone on earth |
07:29
|
db48x |
sweet: scan: scrub repaired 2K in 88h49m with 0 errors on Thu Feb 7 17:39:17 2013 |
07:30
|
omf_ |
I am glad all the data is staying CC |
07:30
|
omf_ |
I am not sure if the data design or the license astounds me more. |
07:31
|
shaqfu |
It's good that it's CC - at least it's available to know how you're being mined :P |
07:32
|
omf_ |
I am not sure if the facebook thing is going to be permanent or not |
07:32
|
omf_ |
freebase is about fact based knowledge of the world |
07:32
|
omf_ |
not social people bullshit |
07:32
|
omf_ |
plus the data cannot go in without being CC |
07:33
|
omf_ |
freebase hooks into a bunch of services. They are trying some crazy stuff out |
07:33
|
shaqfu |
Wait, is that what Wolfram hooks into? |
07:34
|
omf_ |
They might now |
07:34
|
omf_ |
They never mentioned it when they launched |
07:35
|
shaqfu |
I should give it the same test I gave it a year ago |
07:36
|
shaqfu |
Hunh, that's unusual |
07:38
|
Aranje |
better response? |
07:38
|
omf_ |
https://www.wolframalpha.com/input/?i=batman&a=*C.batman-_*Movie- |
07:39
|
shaqfu |
No - it still doesn't track baseball stats - but the Ty Cobb wiki page had an insane number of hits in January |
07:39
|
Aranje |
no showtimes nearby :( |
07:40
|
omf_ |
yeah stats are tricky because the leagues claim to own them |
07:42
|
shaqfu |
That's what I figured |
07:42
|
omf_ |
shit has been going on for years |
07:42
|
omf_ |
it was the first big data thing I was interested in |
07:42
|
omf_ |
Now I am consumed by freebase |
07:43
|
db48x |
yea, I don't see how they can claim that |
07:43
|
omf_ |
I am building/expanding some existing schemas so I can load a few million facts I collected via running web scrapers for 10+ years |
07:43
|
db48x |
you can't copyright a fact in the US |
07:43
|
db48x |
in a phone book, for instance, only the ads have any copyright |
07:43
|
omf_ |
they got a whole racket between the baseball cards and the sports almanac books |
07:44
|
omf_ |
I know that |
07:44
|
omf_ |
It is what it is |
07:44
|
db48x |
still, logic doesn't enter into it :) |
07:44
|
omf_ |
indeed |
07:44
|
omf_ |
^.0 |
07:45
|
omf_ |
I guess that is a Teal'C eyebrow raise |
07:45
|
db48x |
hahaha |
07:46
|
omf_ |
See I think usenet and irc have helped us evolve the written word |
07:46
|
omf_ |
using emoticons we can transfer feelings and concepts that are far too verbose to be written out constantly |
07:47
|
omf_ |
I mean look at this |
07:47
|
omf_ |
http://www.emojidick.com/ |
07:47
|
omf_ |
It is Moby Dick translated into emoji |
07:49
|
db48x |
it's not very readable though |
07:52
|
omf_ |
I didn't see a sample chapter on that page |
07:55
|
db48x |
oh, I was thinking of this: http://languagelog.ldc.upenn.edu/nll/?p=4399 |
07:56
|
db48x |
but I can't imagine it being much better |
07:57
|
omf_ |
yeah I have seen things like that before |
08:29
|
omf_ |
Still haven't found a good 2 drive enclosure |
08:29
|
omf_ |
reviews are somewhat lacking for this class of device |
08:30
|
db48x |
heh |
08:36
|
chronomex |
why stop at 2 bays when you can have 16? http://www.ebay.com/itm/SGI-3U-Omnistor-SE3016-SATA-SAS-Expander-Media-Storage-Server-16-Hard-Drive-Bay-/150990090853 |
08:43
|
omf_ |
because I would never want more than a fraction of the drives on anyway |
08:43
|
omf_ |
this is for a cold backup |
08:43
|
omf_ |
since optical media has failed me |
08:43
|
omf_ |
and tape is just not cost effective enough |
08:45
|
chronomex |
aye |
08:45
|
chronomex |
do you need an enclosure, or is a top-loading dock sufficient? |
08:45
|
omf_ |
a multibay top loading dock? |
08:47
|
chronomex |
sure, something like http://www.newegg.com/Product/Product.aspx?Item=N82E16817153112 |
08:49
|
omf_ |
yeah I had considered it since I have two 1 port docks usb 2.0 at present |
08:49
|
omf_ |
and software raid |
08:49
|
omf_ |
I think I will need to access this backup maybe 4 times a year |
08:49
|
omf_ |
so 5 years on the shelf easy |
08:50
|
omf_ |
does that sound reasonable? |
08:51
|
* |
chronomex shrugs |
08:53
|
omf_ |
At least I have the data backed up in 2 places |
08:54
|
chronomex |
cool |
08:55
|
omf_ |
well I will |
08:55
|
omf_ |
bluray media is completely untrustworthy. I got 400 disks to speak to that |
08:55
|
chronomex |
yeah |
08:55
|
SmileyG |
hmmm |
08:55
|
chronomex |
I remember |
08:56
|
omf_ |
I am estimating at least 2 more weeks just to figure out what still works |
11:07
|
godane |
you know whats funny about being banned from thebox.bz |
11:07
|
godane |
i'm still able to access theshow.bz with the same user name |
11:08
|
ersi |
Yeah, so it's only an IP-ban.. that's pretty.. amateurish |
11:09
|
godane |
how do you change the ip then? |
11:11
|
db48x |
hrm: 32tb @ $3088+$3768=$6856 or 64tb @ $3088+$9120=$12208 |
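Both totals check out (3088 + 3768 = 6856 and 3088 + 9120 = 12208). Working them through per terabyte, taking the quotes at face value and assuming the $3088 is a fixed cost shared by both options and the second figure is the drive cost (the log does not say what each figure covers):

```python
BASE = 3088                     # common to both quotes
options = {32: 3768, 64: 9120}  # capacity in TB -> second figure (assumed: drive cost), dollars

for tb, drives in options.items():
    total = BASE + drives
    print(f"{tb} TB: ${total} total, ${total / tb:.2f}/TB")

# Cost of each extra terabyte when stepping up from 32 TB to 64 TB
marginal = (options[64] - options[32]) / (64 - 32)
print(f"extra 32 TB costs ${marginal:.2f}/TB")
```

The larger build is cheaper per terabyte, but the decision in the log turns on the absolute outlay rather than the unit price.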
11:13
|
ersi |
godane: Depends on your ISP. But most often, just rebooting your modem/router will do the trick. Sometimes, you need to let it be off for a period of time (timeout of around 5-30min seems to be common) - some cases, you got static IP |
11:14
|
db48x |
buy a $5-a-month vps and proxy through it |
11:15
|
godane |
i have seen that the $5 vps get banned |
11:15
|
db48x |
delete it and recreate it from your snapshot on a new ip |
11:16
|
db48x |
I think I will stick with the 32tb array |
11:16
|
GLaDOS |
I have 14 IPs you could use.. |
11:16
|
godane |
oh |
11:17
|
godane |
ok |
11:17
|
db48x |
64tb is a sweet number, but it would start to affect the retirement equation |
11:18
|
godane |
i'm on thegeeks.bz |
11:20
|
db48x |
hmm, hadn't seen that one before |
11:36
|
godane |
i may be able to get the original techtv big thingers episodes |
11:36
|
godane |
*big thinkers |
11:49
|
godane |
this is very good thing i found this torrent |
11:49
|
godane |
i only have 7 of the 11 episodes so far |
12:30
|
asiekierk |
When I'm mirroring a site with wget, is there a way to have it ignore all the ?showComment= links? |
12:30
|
asiekierk |
there are 1250 comments in one post and 1250 times 200KB is a bit too much when the same info is on the main page |
12:31
|
godane |
--reject-regex='(showComment=)' |
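Wget 1.14 matches `--reject-regex` against the complete URL, so any URL containing `showComment=` anywhere is skipped. Before a long mirror run, a pattern can be sanity-checked against sample URLs. A rough preview in Python, with the caveat that wget defaults to POSIX regular expressions (`--regex-type posix`) while this uses Python's `re`; the two agree for a plain literal like this one but can differ for fancier patterns. The URLs and the `would_fetch` helper are made up for illustration:

```python
import re

REJECT = re.compile(r"(showComment=)")  # same pattern as the --reject-regex value

def would_fetch(url: str) -> bool:
    """Mimic a reject pattern: skip any URL where the regex matches anywhere."""
    return REJECT.search(url) is None

urls = [
    "http://blog.example.com/2013/02/post.html",
    "http://blog.example.com/2013/02/post.html?showComment=1360000000000",
]
for url in urls:
    print("fetch" if would_fetch(url) else "skip ", url)
```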
12:31
|
db48x |
alard's wget-lua branch can exclude urls based on a regex |
12:31
|
asiekierk |
the Arch Linux default one can, too |
12:31
|
db48x |
godane: is that in the... awesome |
12:32
|
asiekierk |
ok, thanks |
12:32
|
db48x |
I thought that was specific to that one version |
12:32
|
godane |
its in there |
12:32
|
db48x |
you're welcome |
12:32
|
godane |
warc support is also in wget 1.14 |
12:34
|
db48x |
indeed |
12:40
|
godane |
getting 1993 the emperor's new mind |
12:43
|
db48x |
the Penrose book? |
12:43
|
godane |
its a video |
12:43
|
db48x |
ah |
12:43
|
db48x |
what's it about? |
12:44
|
godane |
it does say Roger Penrose in the title |
12:44
|
db48x |
ah |
12:44
|
db48x |
a video about Roger Penrose, and probably his book of the same title |
12:44
|
* |
db48x yawns |
12:45
|
db48x |
I should have gone to sleep ages ago |
12:50
|
godane |
5787 videos in my g4tv.com video dump |
12:51
|
godane |
tons of them are broken in 14k and 15k |
13:16
|
ersi |
It really is a shame that the IA Liveweb doesn't handle SSL |
17:15
|
godane |
it's starting to look like 15700s was not that bad |
17:35
|
godane |
i'm almost at 6000 videos in my g4tv.com video dump |
17:35
|
godane |
also i may be close to getting the computer tech videos collection up to 10000 |
19:21
|
godane |
uploaded: https://archive.org/details/bits_and_bytes_1-v2 |
19:22
|
asiekierk |
neat |
19:22
|
godane |
the first one is from the official youtube channel of bits and bytes |
19:23
|
godane |
i think |
19:23
|
godane |
this is vhsrip of the broadcast version of the episodes |
22:50
|
DFJustin |
found some more pasokon sunday |