Correlation: Difference between revisions

SieBot (talk | contribs)
m bot Adding: ko:상관분석
Line 30:
The correlation can be computed only when both standard deviations are finite and nonzero. By the [[ketidaksamaan Cauchy-Schwarz|Cauchy-Schwarz inequality]], the correlation coefficient cannot exceed 1 in [[nilai absolut|absolute value]]. The correlation is 1 for a perfect positive (increasing) linear relationship, -1 for a perfect negative (decreasing) linear relationship, and some value between -1 and +1 otherwise, indicating the degree of [[dependensi linier|linear dependence]] between the two variables. The closer the coefficient is to -1 or +1, the stronger the correlation between the two variables.
 
If the variables are [[variabel yang saling bebas|independent]], the correlation is 0. The converse does not hold, because the correlation coefficient detects only ''linear dependence'' between two variables. For example, suppose the random variable ''X'' is uniformly distributed on the interval from -1 to +1, and ''Y'' = ''X''<sup>2</sup>. Then ''Y'' is completely determined by ''X'', so ''X'' and ''Y'' are dependent, yet their correlation is zero: they are uncorrelated. In the special case where ''X'' and ''Y'' follow a bivariate normal distribution, however, independence is equivalent to being uncorrelated.
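
The ''Y'' = ''X''<sup>2</sup> example can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper name <code>pearson</code>, the sample size, and the random seed are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation computed from centered sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed: an arbitrary choice for reproducibility
xs = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]

# Y = X^2 is fully determined by X, yet the linear correlation is near zero.
r_square = pearson(xs, [x * x for x in xs])

# For comparison, an exact increasing linear relationship gives a value near +1.
r_linear = pearson(xs, [2.0 * x + 1.0 for x in xs])

print(r_square, r_linear)
```

With a sample this large, the first value should be close to 0 even though ''Y'' depends entirely on ''X'', while the second should be very close to 1.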
 
<!--
=== Sample correlation ===
 
If we have a series of ''n''&nbsp; measurements of ''X''&nbsp; and ''Y''&nbsp; written as ''x<sub>i</sub>''&nbsp; and ''y<sub>i</sub>''&nbsp; where ''i'' = 1, 2, ..., ''n'', then the [[Pearson product-moment correlation coefficient]] can be used to estimate the correlation of ''X''&nbsp; and ''Y''&nbsp;. The Pearson coefficient is
also known as the "sample correlation coefficient". It is especially important if ''X''&nbsp; and ''Y''&nbsp; are both [[normal distribution|normally distributed]]. The Pearson correlation coefficient is then the best estimate of the correlation of ''X''&nbsp; and ''Y''&nbsp;. The Pearson correlation coefficient is written:
 
 
:<math>
r_{xy}=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{(n-1) s_x s_y}
</math>
 
 
where <math>\bar{x}</math> and <math>\bar{y}</math> are the sample [[arithmetic mean|mean]]s of ''x<sub>i</sub>''&nbsp; and ''y<sub>i</sub>''&nbsp;, ''s''<sub>''x''</sub>&nbsp; and ''s''<sub>''y''</sub>&nbsp; are the sample [[standard deviation]]s of ''x<sub>i</sub>''&nbsp; and ''y<sub>i</sub>''&nbsp; and the sum is from ''i'' = 1 to ''n''. As with the population correlation, we may rewrite this as
 
 
:<math>
r_{xy}=\frac{n\sum x_iy_i-\sum x_i\sum y_i}
{\sqrt{n\sum x_i^2-(\sum x_i)^2}~\sqrt{n\sum y_i^2-(\sum y_i)^2}}.
</math>
 
 
Again, as is true with the population correlation, the absolute value of the sample correlation must be less than or equal to 1. Though the above formula conveniently suggests a single-pass algorithm for calculating sample correlations, it is notorious for its numerical instability (see below for something more accurate).
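
The instability can be illustrated by comparing the single-pass formula with a two-pass computation that centers the data first. This is a sketch under stated assumptions: the function names and the test data (five points shifted by 10<sup>9</sup>) are hypothetical and chosen only to provoke cancellation in double precision.

```python
import math


def pearson_single_pass(xs, ys):
    """Single-pass formula based on raw sums; prone to cancellation."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return math.nan  # rounding error destroyed the variance term
    return (n * sxy - sx * sy) / (math.sqrt(var_x) * math.sqrt(var_y))


def pearson_two_pass(xs, ys):
    """Two-pass formula: subtract the means before summing products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


small_x = [1.0, 2.0, 3.0, 4.0, 5.0]
small_y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Shifting both variables by a constant leaves the true correlation unchanged.
shift = 1e9
big_x = [shift + x for x in small_x]
big_y = [shift + y for y in small_y]

r_small_single = pearson_single_pass(small_x, small_y)
r_small_two = pearson_two_pass(small_x, small_y)
r_big_single = pearson_single_pass(big_x, big_y)  # unreliable, may be NaN
r_big_two = pearson_two_pass(big_x, big_y)

print(r_small_single, r_small_two, r_big_single, r_big_two)
```

On the unshifted data both functions agree. On the shifted data the squared terms are around 10<sup>18</sup>, beyond the range where doubles represent integers exactly, so the single-pass result should not be trusted, whereas the two-pass result is unaffected by the shift.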
 
The square of the sample correlation coefficient is the fraction of the variance in ''y<sub>i</sub>''&nbsp; that is accounted for by a linear fit of ''y<sub>i</sub>''&nbsp; to ''x<sub>i</sub>''&nbsp;. This is written
 
 
:<math>r_{xy}^2=1-\frac{\sigma_{y|x}^2}{\sigma_y^2}</math>
 
 
where ''&sigma;<sub>y|x</sub><sup>2</sup>''&nbsp; is the sum of squared errors of a least-squares linear fit of ''y<sub>i</sub>''&nbsp; to ''x<sub>i</sub>''&nbsp; by the [[equation]] ''y = a + bx'',
 
 
:<math>\sigma_{y|x}^2=\sum_{i=1}^n (y_i-a-bx_i)^2</math>
 
 
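and ''&sigma;<sub>y</sub><sup>2</sup>''&nbsp; is the sum of squared deviations of ''y<sub>i</sub>''&nbsp; from the sample mean, which is proportional to the variance of ''y''&nbsp; (the normalizing factor cancels in the ratio):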
 
 
:<math>\sigma_y^2=\sum_{i=1}^n (y_i-\bar{y})^2</math>
 
 
Note that since the sample correlation coefficient is symmetric in ''x<sub>i</sub>''&nbsp; and ''y<sub>i</sub>''&nbsp;, we will get the same value for a fit of ''x<sub>i</sub>''&nbsp; to ''y<sub>i</sub>''&nbsp;:
 
 
:<math>r_{xy}^2=1-\frac{\sigma_{x|y}^2}{\sigma_x^2}</math>
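
The relation between the squared correlation and the residuals of a least-squares line, including its symmetry in ''x''&nbsp; and ''y'', can be checked on a small data set. This is an illustrative sketch only: the function names and the five data points are assumptions, not part of the original text.

```python
import math


def r_squared_from_fit(xs, ys):
    """Return 1 - SSE/SST for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # least-squares slope
    a = mean_y - b * mean_x  # least-squares intercept
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))  # residual sum
    sst = sum((y - mean_y) ** 2 for y in ys)                 # total sum
    return 1.0 - sse / sst


def pearson(xs, ys):
    """Sample Pearson correlation computed from centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r2_y_on_x = r_squared_from_fit(xs, ys)  # fit y = a + b*x
r2_x_on_y = r_squared_from_fit(ys, xs)  # fit x = a + b*y
r2_direct = pearson(xs, ys) ** 2

print(r2_y_on_x, r2_x_on_y, r2_direct)  # all approximately 0.64
```

For these points the correlation is 0.8, so all three quantities are approximately 0.64, regardless of which variable is treated as the response.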
 
 
This equation also gives an intuitive idea of the correlation coefficient for higher [[dimension]]s. Just as the squared sample correlation coefficient described above is the fraction of variance accounted for by the fit of a 1-dimensional [[Euclidean space|linear submanifold]] to a set of 2-dimensional vectors (''x<sub>i</sub>''&nbsp;, ''y<sub>i</sub>''&nbsp;), so we can define a correlation coefficient for a fit of an ''m''-dimensional linear submanifold to a set of ''n''-dimensional vectors. For example, if we fit a plane ''z = a + bx + cy''&nbsp; to a set of data (''x<sub>i</sub>''&nbsp;, ''y<sub>i</sub>''&nbsp;, ''z<sub>i</sub>''&nbsp;) then the squared correlation coefficient of ''z''&nbsp; with ''x''&nbsp; and ''y''&nbsp; is
 
 
:<math>r^2=1-\frac{\sigma_{z|xy}^2}{\sigma_z^2}.\,</math>
 
-->
 
==Non-parametric correlation coefficients==