International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 108
ISSN 2229-5518
A Combined Approximation to t-distribution
Naveen Kumar Boiroju, R. Ramakrishna
at least three-decimal-point accuracy, which is more than sufficient for comparing the probability value with the level of significance in statistical hypothesis testing.
It is common knowledge that the t-statistic plays a key role in statistics. It is the most widely used statistic for inference about a population mean and for comparing two population means. An accurate approximation to its cumulative distribution function (CDF) is therefore very much needed in statistical hypothesis testing (Jing et al., 2004; Johnson et al., 1995). If X and Y are independent random variables with X ~ N(0, 1) and Y ~ χ²(n), then the statistic

$$
t = \frac{X}{\sqrt{Y/n}}
$$

is said to have a t-distribution with n degrees of freedom. The probability density function of the t-distribution with ν degrees of freedom is given by

$$
f(t) = \frac{1}{\sqrt{\nu}\, B\!\left(\tfrac{1}{2}, \tfrac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu + 1}{2}}, \qquad -\infty < t < \infty. \tag{1}
$$

There is no closed form for the CDF of the t-distribution, which forces users to consult cumbersome and limited statistical tables. An approximation of the CDF can instead supply the probability value of a t-statistic directly, and such approximations often play a key role in statistical inference. The approximations to the t-distribution function recently discussed by Yerukala et al. (2013) motivated us to develop a new approximation to the CDF of the t-distribution. In this paper, an improved function is obtained by correcting the Gleason (2000) function, and a new combined approximation is then proposed for 3 ≤ ν ≤ 30 and all t ≥ 0.

It is well known that the t-distribution is symmetric and tends to the normal distribution for large degrees of freedom (say, ν > 30). The case t < 0 can be handled by the symmetry of the distribution. Gleason (2000) proposed two approximations with two-decimal-point accuracy. The first is

$$
F_1 = F_\nu(t) = \Phi\bigl(Z_\nu(t)\bigr), \tag{2}
$$

where Φ(·) is the CDF of the standard normal distribution,

$$
Z_\nu(t) = \sqrt{\frac{\ln\left(1 + t^{2}/\nu\right)}{g(\nu)}} \qquad \text{and} \qquad g(\nu) = \frac{\nu - 1.5}{(\nu - 1)^{2}}. \tag{3}
$$

The second function defined by Gleason (2000) is obtained by substituting

$$
g^{*}(\nu) = \frac{\nu - 1.5 - 0.1/\nu + 0.5825/\nu^{2}}{(\nu - 1)^{2}}
$$

in place of g(ν) in equation (3):

$$
F_2 = F_\nu(t) = \Phi\bigl(Z_\nu(t)\bigr) \qquad \text{with} \qquad Z_\nu(t) = \sqrt{\frac{\ln\left(1 + t^{2}/\nu\right)}{g^{*}(\nu)}}. \tag{4}
$$
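A minimal Python sketch of equations (1) to (4) follows. It assumes ν ≥ 3 and t ≥ 0, as in the text, and writes Φ through `math.erf`. As our own stand-in for an exact reference (the paper itself uses Excel's TDIST()), it integrates the density (1) numerically with Simpson's rule. The helper names (`t_pdf`, `t_cdf_numeric`, `phi`, `g`, `g_star`, `F1`, `F2`) are hypothetical.

```python
import math


def t_pdf(t: float, nu: float) -> float:
    """Density of the t-distribution with nu degrees of freedom, equation (1)."""
    # log B(1/2, nu/2) = lgamma(1/2) + lgamma(nu/2) - lgamma((nu + 1)/2)
    log_beta = math.lgamma(0.5) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
    return math.exp(-log_beta) / math.sqrt(nu) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf_numeric(t: float, nu: float, steps: int = 2000) -> float:
    """Reference CDF for t >= 0: 0.5 plus the integral of (1) over [0, t] (Simpson's rule)."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def phi(z: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def g(nu: float) -> float:
    """g(nu) from equation (3)."""
    return (nu - 1.5) / (nu - 1) ** 2


def g_star(nu: float) -> float:
    """g*(nu) from equation (4)."""
    return (nu - 1.5 - 0.1 / nu + 0.5825 / nu ** 2) / (nu - 1) ** 2


def F1(t: float, nu: float) -> float:
    """Gleason (2000) approximation, equations (2)-(3), for t >= 0."""
    return phi(math.sqrt(math.log1p(t * t / nu) / g(nu)))


def F2(t: float, nu: float) -> float:
    """Gleason (2000) approximation with g*(nu), equation (4), for t >= 0."""
    return phi(math.sqrt(math.log1p(t * t / nu) / g_star(nu)))


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        ref = t_cdf_numeric(t, 3)
        print(f"t={t}: F1 err={F1(t, 3) - ref:+.6f}, F2 err={F2(t, 3) - ref:+.6f}")
```

At ν = 3 and t = 1, the sketch gives errors of roughly 0.005 for F1 and 0.0025 for F2, close to the ν = 3 entries of Table 1.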
• R. Ramakrishna, Vidya Jyothi Institute of Technology, Post, Aziznagar,
Hyderabad, India. E-mail: ramakrishnaraavi9292@gmail.com
We propose a better approximation by subtracting a nonlinear component from the function F2; the resulting function is given as
IJSER © 2014 http://www.ijser.org
$$
F_3 = F_\nu(t) = F_2 - \frac{7.9 + 7.9\,\tanh\left(3 - 0.63x - 0.52\nu\right)}{10000},
\qquad
x = \begin{cases} 9, & t = 0,\\ t, & \text{otherwise}. \end{cases}
\tag{5}
$$

Li and Moor (1999) suggested a natural modification of the ordinary normal approximation to the t-distribution:

$$
F_4 = F_\nu(t) = \Phi\bigl(Z_\nu(t)\bigr), \qquad \text{where} \qquad Z_\nu(t) = \frac{t\,(4\nu + t^{2} - 1)}{4\nu + 2t^{2}}, \tag{6}
$$

and Φ(·) is the standard normal CDF. A combined function, defined on the basis of the errors of these functions, is

$$
F_5 = \begin{cases}
F_4, & 0 \le t < 1.3 + 0.04\nu,\\
F_3, & 1.3 + 0.04\nu \le t < 5.94 - 0.04\nu,\\
F_1, & t \ge 5.94 - 0.04\nu.
\end{cases}
\tag{7}
$$

The efficiency of these functions is assessed by the minimax criterion: the preferred function is the one with the smallest maximum absolute error, where the error is the difference between the probability given by the function and the probability returned by the TDIST() function in Microsoft Office Excel 2007.

The largest maximum absolute errors of all the functions occur at 3 degrees of freedom (Table 1), and Figure 1 presents the absolute errors of the functions at 3 degrees of freedom. The corrected function and the combined function clearly have the lowest absolute errors of all the approximations. At 3 degrees of freedom, function F4 has a maximum absolute error of 0.0069818 at t = 3.8, function F1 has 0.0049514 at t = 1 and function F2 has 0.0025012 at t = 0.9. The corrected function F3 has a maximum absolute error of 0.0011699 at t = 1, and the combined function F5 has 0.0008117 at t = 1.5. The proposed combined function is thus accurate to three decimal points, like the functions defined in Yerukala et al. (2013). The proposed functions also perform well for tail probabilities.

At least two-decimal-point accuracy is obtained at 3 and 4 degrees of freedom for the functions F1, F2 and F3, whereas the function F4 has the same accuracy for degrees of freedom between 3 and 5. The function F1 provides three-decimal-point accuracy when the degrees of freedom lie between 5 and 12, whereas the functions F2 and F3 provide at least three-decimal-point accuracy for degrees of freedom between 5 and 14. The function F4 provides three-decimal-point accuracy for degrees of freedom from 6 to 11, whereas the function F5 gives the same accuracy for degrees of freedom from 3 to 9. Four-decimal-point accuracy is obtained for F1 for degrees of freedom from 13 to 30, and for F2 and F3 from 15 to 30. The function F4 gives four-decimal-point accuracy for degrees of freedom from 12 to 21, whereas the same is observed for F5 between 10 and 21 degrees of freedom. Only F4 and F5 provide accuracy up to five decimal points, which they do when the degrees of freedom are greater than or equal to 22. Table 1 shows that the proposed combined function F5 guarantees three-decimal-point accuracy, so it may be treated as a competitor to the functions proposed by Yerukala et al. (2013).
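Equations (5) to (7) can be sketched the same way. This is a minimal Python illustration, not the authors' code. It follows our reading of the extracted formula for F3 (numerator 7.9 + 7.9 tanh(3 − 0.63x − 0.52ν)) and repeats F1 and F2 so that the block runs on its own. It also adds a hypothetical helper `cdf` that covers t < 0 through the symmetry F(−t) = 1 − F(t) mentioned in the text.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def F1(t: float, nu: float) -> float:
    """Gleason (2000), equations (2)-(3)."""
    g = (nu - 1.5) / (nu - 1) ** 2
    return phi(math.sqrt(math.log1p(t * t / nu) / g))


def F2(t: float, nu: float) -> float:
    """Gleason (2000) with g*(nu), equation (4)."""
    g_star = (nu - 1.5 - 0.1 / nu + 0.5825 / nu ** 2) / (nu - 1) ** 2
    return phi(math.sqrt(math.log1p(t * t / nu) / g_star))


def F3(t: float, nu: float) -> float:
    """Corrected function, equation (5): F2 minus a tanh-shaped term of at most about 0.0016."""
    x = 9.0 if t == 0 else t  # x = 9 at t = 0 makes the correction negligible there
    return F2(t, nu) - (7.9 + 7.9 * math.tanh(3 - 0.63 * x - 0.52 * nu)) / 10000


def F4(t: float, nu: float) -> float:
    """Li and Moor (1999), equation (6)."""
    return phi(t * (4 * nu + t * t - 1) / (4 * nu + 2 * t * t))


def F5(t: float, nu: float) -> float:
    """Combined function, equation (7); intended for t >= 0 and 3 <= nu <= 30."""
    if t < 1.3 + 0.04 * nu:
        return F4(t, nu)
    if t < 5.94 - 0.04 * nu:
        return F3(t, nu)
    return F1(t, nu)


def cdf(t: float, nu: float) -> float:
    """Hypothetical helper: any real t, using the symmetry F(-t) = 1 - F(t)."""
    return F5(t, nu) if t >= 0 else 1 - F5(-t, nu)
```

For a two-sided test, the p-value of an observed statistic t would then be 2(1 − cdf(|t|, ν)). At ν = 3 and t = 1.5, F5 (here the F3 branch) differs from the exact CDF by about 0.0008, in line with the maximum error reported for the combined function.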
The proposed combined function F5 guarantees accuracy up to three decimal points for the CDF of the t-distribution, whereas the corrected function F3 is more efficient than the Gleason (2000) functions F1 and F2 at lower degrees of freedom (Table 1). The function F5 is at least as accurate as F1, F2, F3 and F4 for all ν ≤ 30, and strictly more accurate for ν ≤ 15. The maximum absolute errors of F4 and F5 agree to six decimal places for all ν ≥ 16. The functions F1 and F2 are better than F4 for all ν < 8; F1 is better than F2 for all ν > 5 and better than F3 for all ν ≥ 8; and the accuracies of F2 and F3 are the same for all ν > 11. The two proposed functions guarantee accuracy up to three decimal points in the tails of the distribution, which is more than sufficient for testing hypotheses with t-statistics.
TABLE 1
Maximum absolute errors of the approximations
df | Gleason (2000)-F1 | Gleason (2000)-F2 | Corrected Model-F3 | Li & Moor (1999)-F4 | Combined Model-F5 |
3 | 0.004951 | 0.002501 | 0.001170 | 0.006982 | 0.000812 |
4 | 0.001901 | 0.001370 | 0.001080 | 0.003216 | 0.000413 |
5 | 0.000984 | 0.000874 | 0.000880 | 0.001659 | 0.000326 |
6 | 0.000595 | 0.000608 | 0.000531 | 0.000931 | 0.000214 |
7 | 0.000396 | 0.000448 | 0.000323 | 0.000557 | 0.000203 |
8 | 0.000283 | 0.000344 | 0.000296 | 0.000351 | 0.000148 |
9 | 0.000212 | 0.000273 | 0.000255 | 0.000230 | 0.000122 |
10 | 0.000165 | 0.000221 | 0.000215 | 0.000156 | 0.000083 |
11 | 0.000132 | 0.000183 | 0.000181 | 0.000109 | 0.000068 |
12 | 0.000108 | 0.000154 | 0.000154 | 0.000077 | 0.000056 |
13 | 0.000089 | 0.000132 | 0.000132 | 0.000056 | 0.000040 |
14 | 0.000076 | 0.000114 | 0.000114 | 0.000041 | 0.000032 |
15 | 0.000065 | 0.000099 | 0.000099 | 0.000030 | 0.000027 |
16 | 0.000056 | 0.000087 | 0.000087 | 0.000022 | 0.000022 |
17 | 0.000049 | 0.000078 | 0.000078 | 0.000018 | 0.000018 |
18 | 0.000043 | 0.000069 | 0.000069 | 0.000015 | 0.000015 |
19 | 0.000038 | 0.000062 | 0.000062 | 0.000013 | 0.000013 |
20 | 0.000034 | 0.000056 | 0.000056 | 0.000011 | 0.000011 |
21 | 0.000031 | 0.000051 | 0.000051 | 0.000010 | 0.000010 |
22 | 0.000028 | 0.000047 | 0.000047 | 0.000008 | 0.000008 |
23 | 0.000025 | 0.000043 | 0.000043 | 0.000008 | 0.000008 |
24 | 0.000023 | 0.000039 | 0.000039 | 0.000007 | 0.000007 |
25 | 0.000021 | 0.000036 | 0.000036 | 0.000007 | 0.000007 |
26 | 0.000019 | 0.000033 | 0.000033 | 0.000006 | 0.000006 |
27 | 0.000018 | 0.000031 | 0.000031 | 0.000006 | 0.000006 |
28 | 0.000017 | 0.000029 | 0.000029 | 0.000005 | 0.000005 |
29 | 0.000015 | 0.000027 | 0.000027 | 0.000005 | 0.000005 |
30 | 0.000014 | 0.000025 | 0.000025 | 0.000004 | 0.000004 |
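The efficiency measure behind Table 1 (the maximum absolute error over t for each ν) can be recomputed along the lines below. This is a sketch under stated assumptions. The authors' t grid is not stated, so we assume t = 0, 0.1, ..., 10. We also replace Excel's TDIST() with Simpson's-rule integration of the density (1), so the figures need not agree digit for digit. The sketch is shown for F4 only, and the hypothetical `max_abs_error` accepts any function of (t, ν).

```python
import math


def t_cdf_numeric(t: float, nu: float, steps: int = 2000) -> float:
    """Reference CDF for t >= 0 via Simpson's rule on density (1); stand-in for TDIST()."""
    if t == 0:
        return 0.5
    # Normalising constant 1 / (sqrt(nu) * B(1/2, nu/2)), computed once per call
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(0.5) - math.lgamma(nu / 2)) / math.sqrt(nu)

    def f(x: float) -> float:
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 0.5 + total * h / 3


def F4(t: float, nu: float) -> float:
    """Li and Moor (1999) approximation, equation (6)."""
    z = t * (4 * nu + t * t - 1) / (4 * nu + 2 * t * t)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def max_abs_error(approx, nu: float, t_max: float = 10.0, step: float = 0.1):
    """Largest |approx(t, nu) - reference| over the grid t = 0, step, ..., t_max.

    Returns (error, t) so that the location of the worst case is reported too.
    """
    n = int(round(t_max / step))
    worst = (0.0, 0.0)
    for i in range(n + 1):
        t = i * step
        err = abs(approx(t, nu) - t_cdf_numeric(t, nu))
        if err > worst[0]:
            worst = (err, t)
    return worst


if __name__ == "__main__":
    for nu in (3, 10, 30):
        err, t = max_abs_error(F4, nu)
        print(f"nu={nu}: max |error| = {err:.6f} at t = {t:.1f}")
```

At ν = 3 the scan reports a maximum absolute error of about 0.007 for F4, consistent with the 0.006982 in Table 1.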
[Figure 1: absolute errors of Gleason (2000)-F1, Gleason (2000)-F2, Corrected Function-F3, Li & Moor (1999)-F4 and Combined Function-F5 at 3 degrees of freedom; vertical scale 0 to 0.008.]
Fig. 1. Maximum absolute error of the approximations for 3 degrees of freedom
[1] B.Y. Jing, Q.M. Shao and W. Zhou, "Saddlepoint approximation for Student's t-statistic with no moment conditions," The Annals of Statistics, vol. 32, no. 6, pp. 2679-2711, 2004.
[2] N.L. Johnson, S. Kotz and N. Balakrishnan, Distributions in Statistics: Continuous Univariate Distributions, vol. 2, 2nd ed. New York: Wiley, 1995.
[3] R. Yerukala, N.K. Boiroju and M.K. Reddy, "Approximations to the t-distribution," International Journal of Statistika and Mathematika, vol. 8, no. 1, pp. 19-21, 2013.
[4] J.R. Gleason, "A note on a proposed Student t approximation," Computational Statistics & Data Analysis, vol. 34, pp. 63-66, 2000.
[5] B. Li and B.D. Moor, "A corrected normal approximation for the Student's t distribution," Computational Statistics & Data Analysis, vol. 29, pp. 213-216, 1999.