【RStudio】 Lecture 7. Probability Distributions

Lecture 7. Probability Distributions

Recommended post: 【RStudio】 RStudio Table of Contents

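1. Overview [Body]

2. Uniform distribution [Body]

3. Binomial distribution [Body]

4. Hypergeometric distribution [Body]

5. Poisson distribution [Body]

6. Normal distribution [Body]

7. Chi-squared distribution [Body]

8. t distribution [Body]

9. F distribution [Body]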

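1. Overview [Contents]

⑴ mean(·) : mean of an object

⑵ sum(·) : sum of an object

⑶ summary(·) : summary of an object's distribution

⑷ d- : probability density function dΦ(x)/dx (probability mass function for discrete distributions)

⑸ p- : cumulative distribution function Φ(q) = Pr(X ≤ q)

⑹ q- : quantile function Φ^-1(p)

⑺ r- : random number generation from the distribution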
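
The four prefixes are tied to one another. A minimal sketch with the normal family (the family, the test point x, and the sample size are arbitrary choices made for this illustration):

```r
x <- 1.5

# p- is the integral of d- from -Inf up to x
area <- integrate(dnorm, lower = -Inf, upper = x)$value
all.equal(area, pnorm(x), tolerance = 1e-4)

# q- is the inverse of p-
all.equal(qnorm(pnorm(x)), x)

# r- draws samples whose empirical CDF approaches p-
set.seed(1)                 # fixed seed only so the sketch is repeatable
s <- rnorm(100000)
mean(s <= x)                # close to pnorm(x)
```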

2. Uniform distribution [Contents]

dunif(x = 5, min = 0, max = 10)
punif(q = 5, min = 0, max = 10)
qunif(p = 0.5, min = 0, max = 10)
runif(n = 10000, min = 0, max = 10)
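
For the uniform distribution every one of these values has a simple closed form, so the calls can be checked by hand. A small sketch (a, b, and u are names introduced here for illustration):

```r
a <- 0
b <- 10

all.equal(dunif(5, a, b), 1 / (b - a))          # constant density on [a, b]
all.equal(punif(5, a, b), (5 - a) / (b - a))    # CDF grows linearly
all.equal(qunif(0.5, a, b), a + 0.5 * (b - a))  # median is the midpoint

set.seed(1)                 # fixed seed only so the sketch is repeatable
u <- runif(10000, a, b)
mean(u)                     # close to (a + b) / 2
var(u)                      # close to (b - a)^2 / 12
```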

3. Binomial distribution [Contents]

dbinom(x = 2, size = 5, prob = 0.2)
pbinom(q = 2, size = 5, prob = 0.2)
qbinom(p = 0.5, size = 5, prob = 0.2)
rbinom(n = 10000, size = 5, prob = 0.2)
BINOM <- dbinom(0:100, 100, prob = 0.2)
sum(BINOM)
plot(BINOM)
binom.test(14, n = 100, p = 0.25, alternative = "two.sided", conf.level = 0.95)
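
The binomial calls can be cross-checked against the textbook formula. The sketch below is only an illustration: manual, d, and d_obs are names made up here, and the two-sided rule (add up every outcome that is no more likely than the observed one, with a small relative tolerance) is my reading of how binom.test computes its p-value, so treat it as an assumption.

```r
size <- 5
prob <- 0.2

# pmf from the formula choose(n, x) * p^x * (1 - p)^(n - x)
manual <- choose(size, 2) * prob^2 * (1 - prob)^(size - 2)
all.equal(manual, dbinom(2, size, prob))

# the CDF is the running sum of the pmf
all.equal(sum(dbinom(0:2, size, prob)), pbinom(2, size, prob))

# two-sided exact p-value for 14 successes out of 100 under p = 0.25
d <- dbinom(0:100, 100, 0.25)
d_obs <- dbinom(14, 100, 0.25)
p_two_sided <- sum(d[d <= d_obs * (1 + 1e-7)])
all.equal(p_two_sided, binom.test(14, n = 100, p = 0.25)$p.value)
```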

4. Hypergeometric distribution [Contents]

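x = 0      # number of white balls drawn (dhyper gives 0 here, since at least k - n = 10 must be white)
m = 50     # white balls in the urn
n = 20     # black balls in the urn
k = 30     # balls drawn without replacement
q = x      # quantile for phyper (example value)
p = 0.5    # probability for qhyper (example value)
nn = 10    # number of random draws for rhyper (example value)

dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)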
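
In R's parameterization m is the number of white balls, n the number of black balls, and k the number drawn. A short sketch that checks dhyper against the counting formula (the value x = 20 and the names manual and support are chosen here purely for illustration):

```r
m <- 50
n <- 20
k <- 30
x <- 20                       # must lie in max(0, k - n) .. min(k, m)

# pmf: choose(m, x) * choose(n, k - x) / choose(m + n, k)
manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
all.equal(manual, dhyper(x, m, n, k))

support <- max(0, k - n):min(k, m)        # 10..30 for these values
sum(dhyper(support, m, n, k))             # the pmf sums to 1 over the support
dhyper(0, m, n, k)                        # 0: fewer than 10 white balls is impossible
sum(support * dhyper(support, m, n, k))   # mean equals k * m / (m + n)
```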

5. Poisson distribution [Contents]

dat = rpois(10000, 24)
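
A Poisson variable has mean and variance both equal to lambda, which gives a quick sanity check on the sample. A minimal sketch (the seed and the evaluation points 20 and 24 are arbitrary choices for this example):

```r
set.seed(1)                 # fixed seed only so the sketch is repeatable
dat <- rpois(10000, 24)

mean(dat)                   # close to lambda = 24
var(dat)                    # also close to 24

# pmf from the formula exp(-lambda) * lambda^x / x!
all.equal(dpois(20, 24), exp(-24) * 24^20 / factorial(20))

# the CDF is the running sum of the pmf
all.equal(ppois(24, 24), sum(dpois(0:24, 24)))
```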

6. Normal distribution [Contents]

dnorm(x = 1, mean = 0, sd = 1)
pnorm(q = 1, mean = 0, sd = 1)
qnorm(p = 0.5, mean = 0, sd = 1)
rnorm(n = 10000, mean = 0, sd = 1)
library(BSDA)   # assumption: z.test() is not in base R; the sigma.x argument matches BSDA::z.test
z.test(c(-1, -2, 0, 3, 2), sigma.x = 1, mu = 0)   # OUTPUT : z-value, p-value, confidence interval
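
The same one-sample z-test can be worked out with base R alone, which makes clear what the test computes. A sketch under the assumption of a known population standard deviation of 1 (sigma, mu0, se, and ci are names introduced here):

```r
x <- c(-1, -2, 0, 3, 2)
sigma <- 1                  # known population standard deviation
mu0 <- 0                    # mean under the null hypothesis

n <- length(x)
se <- sigma / sqrt(n)                       # standard error of the mean
z <- (mean(x) - mu0) / se                   # 0.4 * sqrt(5), about 0.894
p_value <- 2 * pnorm(-abs(z))               # two-sided p-value
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se   # 95% confidence interval

z
p_value
ci
```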

7. Chi-squared distribution [Contents]

### Method 1 ###
qchisq(0.95, 1)
# [1] 3.841459
qchisq(0.99, 1)
# [1] 6.634897
chi_square <- seq(0, 10)
dchisq(chi_square, 1)    # density function
# [1] Inf 0.2419707245 0.1037768744 0.0513934433 0.0269954833
# [6] 0.0146449826 0.0081086956 0.0045533429 0.0025833732 0.0014772828
# [11] 0.0008500367
df <- matrix(c(38, 14, 11, 51), ncol = 2, dimnames = list(hair = c("Fair", "Dark"), eye = c("Blue", "Brown")))
df_chisq <- chisq.test(df)
df_chisq$p.value    # read the element directly instead of attach()
# [1] 8.700134e-09
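
The p-value above can be reproduced by hand. For a 2x2 table chisq.test applies Yates' continuity correction by default, so the sketch below subtracts 0.5 from each absolute deviation (obs, expected, and stat are names introduced here):

```r
obs <- matrix(c(38, 14, 11, 51), ncol = 2)

# expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# chi-squared statistic with Yates' continuity correction
stat <- sum((abs(obs - expected) - 0.5)^2 / expected)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

all.equal(stat, unname(chisq.test(obs)$statistic))
p_value                     # about 8.7e-09
stat > qchisq(0.99, 1)      # TRUE: far beyond the 1% critical value
```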


### Method 2 ###
a <- read.csv("data/Titanic.csv")
library(dplyr)
result_chisq <- chisq.test(a$Gender, a$Survived)
print(round(result_chisq$statistic,3))
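
The file data/Titanic.csv is not shown here, so its columns cannot be checked. As a stand-in, this sketch uses the Titanic table that ships with R's datasets package; that substitution is an assumption, and the counts may not match the CSV.

```r
# datasets::Titanic is a 4-way table: Class x Sex x Age x Survived
tab <- margin.table(Titanic, margin = c(2, 4))   # collapse to Sex x Survived
tab

result <- chisq.test(tab)
round(result$statistic, 3)
result$p.value
```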

8. t distribution [Contents]

qt(0.025, df = 8)    # Pr(t < -2.306004, df = 8) = 0.025
# [1] -2.306004
qt(0.975, df = 8)    # Pr(t < 2.306004, df = 8) = 0.975
# [1] 2.306004
t.test(c(-1, 2, 0, 3, 2), mu = 0)    # the sample standard deviation is used in place of sigma.x
#         One Sample t-test
# data: c(-1, 2, 0, 3, 2)
# t = 1.633, df = 4, p-value = 0.1778
# alternative hypothesis: true mean is not equal to 0
# 95 percent confidence interval:
#  -0.8402621 3.2402621
# sample estimates:
# mean of x
#      1.2
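
The one-sample output above can be rebuilt from qt and pt. A minimal sketch (se, t_stat, and ci are names introduced here):

```r
x <- c(-1, 2, 0, 3, 2)
n <- length(x)

se <- sd(x) / sqrt(n)                        # estimated standard error
t_stat <- (mean(x) - 0) / se                 # test statistic for mu = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(t_stat, 3)    # 1.633
round(p_value, 4)   # 0.1778
ci                  # -0.8402621  3.2402621
```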


t.test(c(13.5, 14.6, 12.7, 15.5), c(13.6, 14.6, 12.6, 15.7), paired = TRUE)
#         Paired t-test
# data: c(13.5, 14.6, 12.7, 15.5) and c(13.6, 14.6, 12.6, 15.7)
# t = -0.7746, df = 3, p-value = 0.495
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.255426 0.155426
# sample estimates:
# mean of the differences
#                   -0.05
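
A paired t-test is a one-sample t-test on the pairwise differences. The sketch below checks this on the same numbers (the names before and after are hypothetical labels for the two vectors):

```r
before <- c(13.5, 14.6, 12.7, 15.5)
after <- c(13.6, 14.6, 12.6, 15.7)

paired <- t.test(before, after, paired = TRUE)
one_sample <- t.test(before - after, mu = 0)

all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```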


? mtcars
# starting httpd help server ... done


t.test(mpg ~ am, data = mtcars, alternative = "less")
#         Welch Two Sample t-test
# data: mpg by am
# t = -3.7671, df = 18.332, p-value = 0.0006868
# alternative hypothesis: true difference in means is less than 0
# 95 percent confidence interval:
#      -Inf -3.913256
# sample estimates:
# mean in group 0 mean in group 1
#       17.14737       24.39231


t.test(mpg ~ am, data = mtcars, alternative = "less", var.equal = TRUE)
#         Two Sample t-test
# data: mpg by am
# t = -4.1061, df = 30, p-value = 0.0001425
# alternative hypothesis: true difference in means is less than 0
# 95 percent confidence interval:
#       -Inf -4.250255
# sample estimates:
# mean in group 0 mean in group 1
#        17.14737        24.39231
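
The two outputs differ in how the standard error and the degrees of freedom are formed. A sketch that recomputes both by hand, using the Welch-Satterthwaite approximation for the unequal-variance case (all variable names here are introduced for illustration):

```r
g0 <- mtcars$mpg[mtcars$am == 0]
g1 <- mtcars$mpg[mtcars$am == 1]
n0 <- length(g0)
n1 <- length(g1)

# Welch: separate variances, approximate degrees of freedom
v0 <- var(g0) / n0
v1 <- var(g1) / n1
t_welch <- (mean(g0) - mean(g1)) / sqrt(v0 + v1)
df_welch <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))
p_welch <- pt(t_welch, df_welch)             # alternative = "less": lower tail

# Pooled: one common variance, n0 + n1 - 2 degrees of freedom
sp2 <- ((n0 - 1) * var(g0) + (n1 - 1) * var(g1)) / (n0 + n1 - 2)
t_pooled <- (mean(g0) - mean(g1)) / sqrt(sp2 * (1 / n0 + 1 / n1))
p_pooled <- pt(t_pooled, n0 + n1 - 2)

c(t_welch, df_welch, p_welch)    # -3.7671, 18.332, 0.0006868
c(t_pooled, p_pooled)            # -4.1061, 0.0001425
```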

9. F distribution [Contents]

n = 100
x = rnorm(n, sd = sqrt(2))
y = rnorm(n, mean = 1, sd = sqrt(2))
var.test(x, y)
#         F test to compare two variances
# data: x and y
# F = 1.2229, num df = 99, denom df = 99, p-value = 0.3184
# alternative hypothesis: true ratio of variances is not equal to 1
# 95 percent confidence interval:
#  0.8228112 1.8175001
# sample estimates:
# ratio of variances
#            1.22289
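
The F statistic is the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail. A sketch of the same test by hand; the seed is an assumption added so the run is repeatable, so the numbers will not match the output printed above:

```r
set.seed(1)                 # assumption: any seed works, this one is arbitrary
n <- 100
x <- rnorm(n, sd = sqrt(2))
y <- rnorm(n, mean = 1, sd = sqrt(2))

f_stat <- var(x) / var(y)                   # ratio of sample variances
p_lower <- pf(f_stat, n - 1, n - 1)         # lower-tail probability
p_value <- 2 * min(p_lower, 1 - p_lower)    # two-sided p-value

res <- var.test(x, y)
all.equal(f_stat, unname(res$statistic))
all.equal(p_value, res$p.value)
```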


1-pf(0.12899, 2, 12)      # 2 = numerator degrees of freedom, 12 = denominator degrees of freedom
# [1] 0.8801851
# subtracting from 1 gives the upper-tail probability, i.e. the p-value
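
The upper tail can also be requested directly with lower.tail = FALSE, which avoids the subtraction. A small sketch (f_value and p_upper are names introduced here; the closed form on the last line holds only when the numerator degrees of freedom equal 2):

```r
f_value <- 0.12899

p_upper <- pf(f_value, df1 = 2, df2 = 12, lower.tail = FALSE)
all.equal(p_upper, 1 - pf(f_value, 2, 12))

# closed form of the upper tail for df1 = 2: (1 + 2 * x / df2)^(-df2 / 2)
(1 + 2 * f_value / 12)^(-12 / 2)
```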

Posted: 2019.10.28 22:46
