R에서 유용한 주요 함수 모음
추천글 : 【RStudio】 R 스튜디오 목차, 【생물정보학】 생물정보학 분석 목차
1. 개요 [본문]
2. 기초 함수 [본문]
3. 파일 입출력 [본문]
4. 벡터 및 행렬 연산 [본문]
5. gsub 이용 [본문]
6. 통계 분석 [본문]
7. 생물정보학 [본문]
8. 이미지 생성 [본문]
a. GitHub (executable .R file)
b. 파이썬 유용 함수 모음
1. 개요 [목차]
⑴ 아래에서 정의한 함수는 다음과 같이 호출할 수 있음
source("https://github.com/JB243/nate9389/blob/main/RStudio/A_Collection_of_Useful_Functions_in_RStudio.R?raw=true")
① 다만, 위 코드로 인해 다음과 같은 문제점이 발생할 수 있음
② 문제 1. Error in sum(dim(scRNAseqData)): argument "b" is missing, with no default
③ 문제 2. "do.call" r argument "b" is missing, with no default
⑵ .R 파일은 다음과 같이 사용할 수도 있음
Rscript /numbat/inst/bin/pileup_and_phase.R \
--label possorted_genome_bam \
--samples possorted_genome_bam \
--bams possorted_genome_bam.bam \
--barcodes possorted_genome_bam_barcodes.tsv \
--outdir possorted_genome_bam \
--gmap /Eagle_v2.4.1/tables/genetic_map_hg38_withX.txt.gz \
--snpvcf /data/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf \
--paneldir /data/1000G_hg38 \
--ncores 1
⑶ 아래에서 정의한 함수는 다음과 같이 사용할 수 있음
SUM <- function(a, b){
return (a + b)
}
A <- 1
B <- 2
print(SUM(A, B))
# 3
2. 기초 함수 [목차]
⑴ 자료형 타입 확인
# x is a variable
typeof(x)
⑵ 자연수 앞에 0을 채워서 총 다섯 자리가 되도록 하는 함수
n = 55
sprintf("%05d", n)
⑶ 엔터를 칠 때까지 대기하는 함수 (레퍼런스)
readline(prompt="Press [enter] to continue")
⑷ 다중 연산 지원
# Reference : https://cran.r-project.org/web/packages/future/vignettes/future-3-topologies.html
plan("multisession", workers = 10)
⑸ %in% 연산자
x <- c('월','화','수','목','금','토','일')
x %in% c('토','일')
# FALSE FALSE FALSE FALSE FALSE TRUE TRUE
3. 파일 입출력 [목차]
⑴ 특정 폴더 'destination.folder'가 존재하는지 확인
## Check whether destination folder exists
if (!file.exists(destination.folder)) {
stop(.wrap("The destination folder could not be found. Please change",
"the path specified in", sQuote(destination.folder)))
}
⑵ 특정 폴더 'destination.folder'에 쓰기 권한이 존재하는지 확인
## Check for write permissions in the destination folder
if (file.access(destination.folder, 2) == -1) {
stop(.wrap("You do not have write permission in the destination",
"folder."))
}
⑶ 폴더 생성 : destination.folder 아래 Filtered_bams 폴더 생성
## Create folder
destination.folder <- file.path(destination.folder, "Filtered_bams")
tryCatch({
if (!file.exists(file.path(destination.folder))) {
dir.create(file.path(destination.folder),
recursive = TRUE)
} else {
stop(.wrap("The folder",
sQuote(file.path(destination.folder, "Filtered_bams")),
"already exists. Please remove it, or (in case you",
"still need it), rename it to prevent files from being",
"overwritten."))
}
}, warning = function(e) {
stop(.wrap("You do not have write permissions in the destination",
"folder. Stopping execution of the remaining part of the",
"script..."))
})
⑷ .gz 파일을 읽어들여서 압축을 푸는 코드
tmp = readMM('matrix.mtx.gz')
writeMM(tmp, 'matrix.mtx')
4. 벡터 및 행렬 연산 [목차]
⑴ 주어진 벡터를 a, b, c 순으로 정렬하기
my_vec <- c('c', 'b', 'a')
print( my_vec[order(my_vec)] )
# "a" "b" "c"
⑵ 주어진 문자열(given_str)에서 특정 문자열(partial_str)을 포함하는지 여부 : grepl (레퍼런스)
grepl(partial_str, given_str, fixed = TRUE)
⑶ 주어진 문자열 벡터에서 특정 문자열 str을 포함하는 index 출력
has_string_arr <- function(str, arr){
flag <- 0
ar <- array()
for(i in 1:length(arr)){
if( grepl(str, arr[i], fixed=TRUE) ){
flag <- flag + 1
ar[flag] <- i
}
}
return(ar)
}
⑷ 세 벡터의 교집합
tri_intersect <- function(A, B, C){
a <- intersect(A, B)
b <- intersect(a, C)
return(b)
}
⑸ 주어진 벡터에서 NA 부분을 0으로 바꾸는 함수
trim_na <- function(vector){
for(i in 1: length(vector)){
if(is.na(vector[i])){
vector[i] = 0
}
}
return (vector)
}
⑹ 주어진 벡터에서 NA 부분을 제외하는 함수
ignore_na <- function(vector){
flag = 0
ar = array()
for(i in 1: length(vector)){
if(! is.na(vector[i])){
flag = flag + 1
ar[flag] = vector[i]
}
}
return (ar)
}
⑺ 주어진 벡터에서 특정 원소를 다른 원소로 치환하는 함수
replace_in_vector <- function(v, from_element, to_element){
ar <- array(dim = length(v))
for(i in 1 : length(v)){
if(v[i] == from_element){
ar[i] = to_element
} else{
ar[i] = v[i]
}
}
return(ar)
}
⑻ 행렬 혹은 데이터 프레임의 왼쪽 모퉁이를 반환하는 함수
corner <- function(x, num = 10){
return(x[1:min( num, dim(x)[1] ),
1:min( num, dim(x)[2] )])
}
⑼ 임의의 두 행렬이 주어져 있을 때 cbind를 출력하는 함수
my.cbind <- function(data1, data2){
a <- intersect(rownames(data1), rownames(data2))
return (cbind(data1[a, ], data2[a, ]))
}
⑽ 임의의 두 행렬이 주어져 있을 때 rbind를 출력하는 함수
my.rbind <- function(data1, data2){
a <- intersect(colnames(data1), colnames(data2))
return (rbind(data1[, a], data2[, a]))
}
⑾ 주어진 행렬 혹은 데이터프레임에서 A행과 B행을 바꾸어 출력해주는 함수
switch_A_B_row <- function(mat0, A, B){
mat <- mat0
mat[A, ] = mat0[B, ]
mat[B, ] = mat0[A, ]
rownames(mat)[A] = rownames(mat0)[B]
rownames(mat)[B] = rownames(mat0)[A]
return(mat)
}
⑿ 주어진 행렬 혹은 데이터프레임에서 A열과 B열을 바꾸어 출력해주는 함수
switch_A_B_col <- function(mat0, A, B){
mat <- mat0
mat[, A] = mat0[, B]
mat[, B] = mat0[, A]
colnames(mat)[A] = colnames(mat0)[B]
colnames(mat)[B] = colnames(mat0)[A]
return(mat)
}
5. gsub 이용 [목차]
⑴ 주어진 문자열에서 하이픈(-)을 제거하는 함수
eliminate_hyphene <- function(str){
return(gsub("-", "", str))
}
⑵ 주어진 문자열에서 "("과 같은 특수기호를 제거하는 함수 (레퍼런스)
eliminate_symbol <- function(str){
return(gsub("(", "", str, fixed = TRUE))
}
⑶ 주어진 문자열에서 x와 x 뒤의 모든 문자들을 제거하는 함수 (레퍼런스)
eliminate_backward <- function(str){
return(gsub('x.*$', '', str))
}
⑷ "ABCD.123"과 같이 주어진 문자열에서 .과 . 뒤에 오는 모든 숫자들을 제거하는 함수
eliminate_backward2 <- function(str){
return(gsub('*.[0-9]*$', '', str))
}
⑸ 주어진 문자열에서 x와 x 앞의 모든 문자들을 제거하는 함수
eliminate_forward <- function(str){
return(gsub('.*x', '', str))
}
⑹ 주어진 string형 변수(’given_str’)에서 start부터 end까지의 문자를 “x”로 채운 뒤 반환하는 함수
str_substitute <- function(given_str, start, end){
library(stringr)
library(stringi)
result = paste0(
str_sub(given_str, 1, start-1),
stri_dup("x", (end-start+1) ),
str_sub(given_str, end+1, str_length(given_str))
)
return(result)
}
⑺ 주어진 string (’given_str’) 안에서 특정 패턴 뒤에 오는 숫자들 수집하는 코드
number_after_pattern_in_str <- function(given_str, pattern){
# reference : https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html
library(stringr)
phone1 <- paste0(pattern, '([1-9]{1})')
phone2 <- paste0(pattern, '([1-9][0-9]{1})')
phone3 <- paste0(pattern, '([1-9][0-9]{2})')
phone4 <- paste0(pattern, '([1-9][0-9]{3})')
arr <- array()
flag = 0
for(i in 1 : dim(str_locate_all(given_str, phone4)[[1]])[1] ){
if (dim(str_locate_all(given_str, phone4)[[1]])[1] > 0){
flag <- flag + 1
arr[flag] = str_sub(
given_str,
str_locate_all(given_str, phone4)[[1]][1, 1], #start
str_locate_all(given_str, phone4)[[1]][1, 2] #end
)
given_str = str_substitute(
given_str,
str_locate_all(given_str, phone4)[[1]][1, 1], #start
str_locate_all(given_str, phone4)[[1]][1, 2] #end
)
}
}
for(i in 1 : dim(str_locate_all(given_str, phone3)[[1]])[1] ){
if (dim(str_locate_all(given_str, phone3)[[1]])[1] > 0){
flag <- flag + 1
arr[flag] = str_sub(
given_str,
str_locate_all(given_str, phone3)[[1]][1, 1], #start
str_locate_all(given_str, phone3)[[1]][1, 2] #end
)
given_str = str_substitute(
given_str,
str_locate_all(given_str, phone3)[[1]][1, 1], #start
str_locate_all(given_str, phone3)[[1]][1, 2] #end
)
}
}
for(i in 1 : dim(str_locate_all(given_str, phone2)[[1]])[1] ){
if (dim(str_locate_all(given_str, phone2)[[1]])[1] > 0){
flag <- flag + 1
arr[flag] = str_sub(
given_str,
str_locate_all(given_str, phone2)[[1]][1, 1], #start
str_locate_all(given_str, phone2)[[1]][1, 2] #end
)
given_str = str_substitute(
given_str,
str_locate_all(given_str, phone2)[[1]][1, 1], #start
str_locate_all(given_str, phone2)[[1]][1, 2] #end
)
}
}
for(i in 1 : dim(str_locate_all(given_str, phone1)[[1]])[1] ){
if (dim(str_locate_all(given_str, phone1)[[1]])[1] > 0){
flag <- flag + 1
arr[flag] = str_sub(
given_str,
str_locate_all(given_str, phone1)[[1]][1, 1], #start
str_locate_all(given_str, phone1)[[1]][1, 2] #end
)
given_str = str_substitute(
given_str,
str_locate_all(given_str, phone1)[[1]][1, 1], #start
str_locate_all(given_str, phone1)[[1]][1, 2] #end
)
}
}
arr <- str_sub(arr, str_length(pattern)+1, str_length(arr))
arr <- as.numeric(arr)
return(arr)
}
6. 통계 분석 [목차]
⑴ 10% ~ 90%까지의 percentile 분위수를 알려주는 코드
# reference: https://www.statology.org/percentiles-in-r/
quantile(data, probs = seq(.1, .9, by = .1))
⑵ n!에 log10을 취한 값
log10_factorial <- function(n){
if(n == 0){
return(0)
}
out <- 0
for(i in 1 : n){
out <- out + log(i) / log(10)
}
return(out)
}
⑶ n개 중 k개를 뽑는 경우의 수인 이항계수 nCk를 구하는 함수 : 단순히 factorial을 이용하면 Inf가 뜰 수 있으므로 코드를 개선함
my.combination <- function(n, k){
# return nCk = n! / ((n-k)! k!)
if (n == k || n == 0 || k == 0){
return(1)
}
A = log10_factorial(n)
B = log10_factorial(n-k)
C = log10_factorial(k)
log10_nCk = A - B - C
return(10^(log10_nCk))
}
⑷ 두 집단의 t test
t.test(v1, v2, paired = FALSE)
# maybe you can activate 'paired' in a special condition
⑸ 두 집단의 ANOVA test
one_way_2_factor_anova <- function(v1, v2){
dat <- matrix(0, nrow = ( length(v1) + length(v2) ), ncol = 2 )
for(i in 1 : length(v1) ){
dat[i, 1] <- v1[i]
dat[i, 2] <- 'v1'
}
for(i in 1 : length(v2) ){
dat[i + length(v1), 1] <- v2[i]
dat[i + length(v1), 2] <- 'v2'
}
dat <- as.data.frame(dat)
colnames(dat) <- c('val', 'factor')
anova_IS <- aov(val ~ factor, data = dat)
print(summary(anova_IS))
anova_residuals <- anova_IS$residuals
print(summary(anova_residuals))
}
⑹ 세 집단의 ANOVA test
one_way_3_factor_anova <- function(v1, v2, v3){
dat <- matrix(0, nrow = ( length(v1) + length(v2) + length(v3) ), ncol = 2 )
for(i in 1 : length(v1) ){
dat[i, 1] <- v1[i]
dat[i, 2] <- 'v1'
}
for(i in 1 : length(v2) ){
dat[i + length(v1), 1] <- v2[i]
dat[i + length(v1), 2] <- 'v2'
}
for(i in 1 : length(v3) ){
dat[i + length(v1) + length(v2), 1] <- v3[i]
dat[i + length(v1) + length(v2), 2] <- 'v3'
}
dat <- as.data.frame(dat)
colnames(dat) <- c('val', 'factor')
anova_IS <- aov(val ~ factor, data = dat)
print(summary(anova_IS))
anova_residuals <- anova_IS$residuals
print(summary(anova_residuals))
}
⑺ 네 집단의 ANOVA test
one_way_4_factor_anova <- function(v1, v2, v3, v4){
dat <- matrix(0, nrow = ( length(v1) + length(v2) + length(v3) + length(v4) ), ncol = 2 )
for(i in 1 : length(v1) ){
dat[i, 1] <- v1[i]
dat[i, 2] <- 'v1'
}
for(i in 1 : length(v2) ){
dat[i + length(v1), 1] <- v2[i]
dat[i + length(v1), 2] <- 'v2'
}
for(i in 1 : length(v3) ){
dat[i + length(v1) + length(v2), 1] <- v3[i]
dat[i + length(v1) + length(v2), 2] <- 'v3'
}
for(i in 1 : length(v4) ){
dat[i + length(v1) + length(v2) + length(v3), 1] <- v4[i]
dat[i + length(v1) + length(v2) + length(v3), 2] <- 'v4'
}
dat <- as.data.frame(dat)
colnames(dat) <- c('val', 'factor')
anova_IS <- aov(val ~ factor, data = dat)
print(summary(anova_IS))
anova_residuals <- anova_IS$residuals
print(summary(anova_residuals))
}
⑻ 다섯 집단의 ANOVA test
one_way_5_factor_anova <- function(v1, v2, v3, v4, v5){
dat <- matrix(0, nrow = ( length(v1) + length(v2) + length(v3) + length(v4) + length(v5) ), ncol = 2 )
for(i in 1 : length(v1) ){
dat[i, 1] <- v1[i]
dat[i, 2] <- 'v1'
}
for(i in 1 : length(v2) ){
dat[i + length(v1), 1] <- v2[i]
dat[i + length(v1), 2] <- 'v2'
}
for(i in 1 : length(v3) ){
dat[i + length(v1) + length(v2), 1] <- v3[i]
dat[i + length(v1) + length(v2), 2] <- 'v3'
}
for(i in 1 : length(v4) ){
dat[i + length(v1) + length(v2) + length(v3), 1] <- v4[i]
dat[i + length(v1) + length(v2) + length(v3), 2] <- 'v4'
}
for(i in 1 : length(v5) ){
dat[i + length(v1) + length(v2) + length(v3) + length(v4), 1] <- v5[i]
dat[i + length(v1) + length(v2) + length(v3) + length(v4), 2] <- 'v5'
}
dat <- as.data.frame(dat)
colnames(dat) <- c('val', 'factor')
anova_IS <- aov(val ~ factor, data = dat)
print(summary(anova_IS))
anova_residuals <- anova_IS$residuals
print(summary(anova_residuals))
}
⑼ 여섯 집단의 ANOVA test
one_way_6_factor_anova <- function(v1, v2, v3, v4, v5, v6){
dat <- matrix(0, nrow = ( length(v1) + length(v2) + length(v3) + length(v4) + length(v5) + length(v6) ), ncol = 2 )
for(i in 1 : length(v1) ){
dat[i, 1] <- v1[i]
dat[i, 2] <- 'v1'
}
for(i in 1 : length(v2) ){
dat[i + length(v1), 1] <- v2[i]
dat[i + length(v1), 2] <- 'v2'
}
for(i in 1 : length(v3) ){
dat[i + length(v1) + length(v2), 1] <- v3[i]
dat[i + length(v1) + length(v2), 2] <- 'v3'
}
for(i in 1 : length(v4) ){
dat[i + length(v1) + length(v2) + length(v3), 1] <- v4[i]
dat[i + length(v1) + length(v2) + length(v3), 2] <- 'v4'
}
for(i in 1 : length(v5) ){
dat[i + length(v1) + length(v2) + length(v3) + length(v4), 1] <- v5[i]
dat[i + length(v1) + length(v2) + length(v3) + length(v4), 2] <- 'v5'
}
for(i in 1 : length(v6) ){
dat[i + length(v1) + length(v2) + length(v3) + length(v4) + length(v5), 1] <- v6[i]
dat[i + length(v1) + length(v2) + length(v3) + length(v4) + length(v5), 2] <- 'v6'
}
dat <- as.data.frame(dat)
colnames(dat) <- c('val', 'factor')
anova_IS <- aov(val ~ factor, data = dat)
print(summary(anova_IS))
anova_residuals <- anova_IS$residuals
print(summary(anova_residuals))
}
⑽ 두 벡터의 FC, p value 조사
comparison_of_two_vectors <- function(v1, v2, paired = FALSE){
p.val = t.test(v1, v2, paired = paired)
print(p.val)
log2FC = log( mean(v1 + 0.000000000001)/mean(v2 + 0.000000000001) ) / log(2)
print(log2FC)
}
⑾ Fisher's exact test를 이용하여 두 집합의 통계적 동일을 검정하는 방법 ver 1
total.gene <- 32285
ST <- 31
scRNAseq <- 14
cross <- 5
a <- cross
b <- scRNAseq - a
c <- ST - a
d <- total.gene - a - b - c
A <- a + b
B <- c + d
C <- a + c
D <- b + d
group<-c("A","A","B","B")
cancer<-c("1.Yes","2.No","1.Yes","2.No")
count<-c(a,b,c,d)
dat<-data.frame(group,cancer,count)
tab<-xtabs(count~group+cancer,data=dat)
tab
chisq.test(tab)$observed
chisq.test(tab)$expected
fisher.test(tab)
-log(fisher.test(tab)$p.value, 10)
if(cross > ST * scRNAseq / total.gene){
print("Enrichment")
} else if(cross < ST * scRNAseq / total.gene){
print("Depletion")
}
⑿ Fisher's exact test를 이용하여 두 집합의 통계적 동일을 검정하는 방법 ver 2
my.Fisher.exact.test <- function(total, A, B, cross){
a1 <- log10_factorial(A)
a2 <- log10_factorial(total - A)
a3 <- log10_factorial(B)
a4 <- log10_factorial(total - B)
b1 <- log10_factorial(cross)
b2 <- log10_factorial(A - cross)
b3 <- log10_factorial(B - cross)
b4 <- log10_factorial(total - cross - (A - cross) - (B - cross))
b5 <- log10_factorial(total)
out = a1 + a2 + a3 + a4 - b1 - b2 - b3 - b4 - b5
return(10^out)
}
⒀ MIA assay (enrichment)
my.MIA.assay.enrichment <- function(total, A, B, cross){
out <- 0
for(i in cross:min(A, B)){
out = out + my.Fisher.exact.test(total, A, B, i)
}
return(out)
}
⒁ MIA assay (depletion)
my.MIA.assay.depletion <- function(total, A, B, cross){
out <- 0
for(i in 0:cross-1){
out = out + my.Fisher.exact.test(total, A, B, i)
}
return(out)
}
7. 생물정보학 [목차]
⑴ 특정 단어(keyword)로 시작하는 유전자들 탐색
gene_starting_with <- function(keyword){
human = read.csv("https://blog.kakaocdn.net/dn/29YTj/btrS5iG9QOH/Di6RQKxHOPDii7EjkdHN30/human_genes_36601.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
mouse = read.csv("https://blog.kakaocdn.net/dn/wkjwJ/btrS1QSgrpD/VS8ELANCQyeZAA3vL8JQP0/mouse_genes_32285.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
ar = array()
flag = 0
if(keyword == toupper(keyword)){ # human genes
for(i in 1:dim(human)[1]){
if(grepl(keyword, human[i, 2], fixed = TRUE)){
flag = flag + 1
ar[flag] = human[i, 2]
}
}
}
else{ # mouse genes
for(i in 1:dim(mouse)[1]){
if(grepl(keyword, mouse[i, 2], fixed = TRUE)){
flag = flag + 1
ar[flag] = mouse[i, 2]
}
}
}
return(ar)
}
⑵ ensembl_gene_id와 gene_symbol 간의 변환 등
① convert from ensembl.gene to gene.symbol
library(EnsDb.Hsapiens.v79)
ensembl.genes <- c("ENSG00000150676", "ENSG00000099308", "ENSG00000142676", "ENSG00000180776", "ENSG00000108848", "ENSG00000277370", "ENSG00000103811", "ENSG00000101473")
geneIDs1 <- ensembldb::select(EnsDb.Hsapiens.v79, keys= ensembl.genes, keytype = "GENEID", columns = c("SYMBOL","GENEID"))
② convert from ensembl.gene to gene.symbol (표를 이용한 방식)
ensembl_to_gene <- function(ensembl_list){
ar = array(dim = length(ensembl_list))
human = read.csv("https://blog.kakaocdn.net/dn/29YTj/btrS5iG9QOH/Di6RQKxHOPDii7EjkdHN30/human_genes_36601.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
mouse = read.csv("https://blog.kakaocdn.net/dn/wkjwJ/btrS1QSgrpD/VS8ELANCQyeZAA3vL8JQP0/mouse_genes_32285.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
for(i in 1:length(ensembl_list)){
if(grepl('ENSG', ensembl_list[i], fixed = TRUE)){ # human gene
index = match(ensembl_list[i], human[, 1])
ar[i] = human[index, 2]
}
else if(grepl('ENSMUSG', ensembl_list[i], fixed = TRUE)){ # mouse gene
index = match(ensembl_list[i], mouse[, 1])
ar[i] = mouse[index, 2]
}
}
return(ar)
}
③ convert from gene.symbol to ensembl.gene
library(EnsDb.Hsapiens.v79)
geneSymbols <- c('DDX26B','CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8')
geneIDs2 <- ensembldb::select(EnsDb.Hsapiens.v79, keys= geneSymbols, keytype = "SYMBOL", columns = c("SYMBOL","GENEID"))
④ convert from gene.symbol to ensembl.gene (표를 이용한 방식)
gene_to_ensembl <- function(gene_list){
ar = array(dim = length(gene_list))
human = read.csv("https://blog.kakaocdn.net/dn/29YTj/btrS5iG9QOH/Di6RQKxHOPDii7EjkdHN30/human_genes_36601.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
mouse = read.csv("https://blog.kakaocdn.net/dn/wkjwJ/btrS1QSgrpD/VS8ELANCQyeZAA3vL8JQP0/mouse_genes_32285.tsv?attach=1&knm=tfile.tsv", sep = '\t', header = F)
for(i in 1:length(gene_list)){
if(gene_list[i] == toupper(gene_list[i])){ # human gene
index = match(gene_list[i], human[, 2])
ar[i] = human[index, 1]
}
else{ # mouse gene
index = match(gene_list[i], mouse[, 2])
ar[i] = mouse[index, 1]
}
}
# return(ignore_na(ar))
## if possible, ignore_na should be used
return(ar)
}
⑤ human ensembl transcript to gene name (ref)
library(biomaRt)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
ensembl_transcript_to_gene <- function(transcript_ids){
# reference : https://support.bioconductor.org/p/106253/#106256
res <- getBM(attributes = c('ensembl_transcript_id_version',
'ensembl_gene_id',
'external_transcript_name',
'external_gene_name'),
filters = 'ensembl_transcript_id_version',
values = transcript_ids,
mart = mart)
if(dim(res)[1] == 0){
return("")
}
return(res[, 'external_gene_name'])
}
⑥ mouse gene to MGI symbol (표를 이용한 방식)
mouse_gene_to_MGI <- function(mouse_gene_list){
ar = array(dim = length(mouse_gene_list))
dat <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
for(i in 1:length(mouse_gene_list)){
index = match(mouse_gene_list[i], dat[,'Symbol'])
ar[i] = dat[index, 'Mouse.MGI.ID']
}
return(ar)
}
⑦ MGI symbol to mouse gene (표를 이용한 방식)
MGI_to_mouse_gene <- function(MGI_list){
ar = array(dim = length(MGI_list))
dat <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
for(i in 1:length(MGI_list)){
index = match(MGI_list[i], dat[,'Mouse.MGI.ID'])
ar[i] = dat[index, 'Symbol']
}
return(ar)
}
⑧ human gene to HGNC symbol (표를 이용한 방식)
human_gene_to_HGNC <- function(human_gene_list){
ar = array(dim = length(human_gene_list))
dat <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
for(i in 1:length(human_gene_list)){
index = match(human_gene_list[i], dat[,'Symbol'])
ar[i] = dat[index, 'HGNC.ID']
}
return(ar)
}
⑨ HGNC symbol to human gene (표를 이용한 방식)
HGNC_to_human_gene <- function(HGNC_list){
ar = array(dim = length(HGNC_list))
dat <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
for(i in 1:length(HGNC_list)){
index = match(HGNC_list[i], dat[,'HGNC.ID'])
ar[i] = dat[index, 'Symbol']
}
return(ar)
}
⑶ Affymetrix probe ID와 ensembl_gene_id, gene_name
① convert Affymetrix probe ID to ensembl_gene_id, gene_name
#Convert Affymetrix probe ID to ensembl_gene_id, gene_name
## https://www.biostars.org/p/328065/#328328
## https://www.biostars.org/p/332461/#332474
BiocManager::install("biomaRt", force=TRUE)
library(biomaRt)
dat<-c('1007_s_at', '1053_at', '117_at', '121_at', '1255_g_at', '1294_at')
require("biomaRt")
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annotLookup <- getBM(
mart = mart,
attributes = c(
"affy_hg_u133_plus_2",
"ensembl_gene_id",
"gene_biotype",
"external_gene_name"),
filter = "affy_hg_u133_plus_2",
values = dat,
uniqueRows=TRUE)
⑷ 사람과 마우스의 상동(gene homology between human and mouse)
① 개요
○ 사람의 유전자는 36601개
○ 마우스의 유전자는 32285개
○ 사람과 마우스의 유전체는 거의 유사하다고는 하지만 다른 점도 많음
○ 따라서 사람 유전자를 마우스 유전자로, 혹은 그 역으로 변환하는 방법을 숙지할 필요가 있음
○ 단순히 마우스 유전자를 대문자로 했을 때 대응 되는 사람 유전자가 아닌 경우도 많음
② 방법 1. biomaRt
install.packages("BiocManager")
BiocManager::install("biomaRt")
library(biomaRt)
genes <- c("Xkr4", "Gm1992", "Gm37381")
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
hh <- getLDS(attributes = c("mgi_symbol", "ensembl_gene_id", "entrezgene_id"), filters = "mgi_symbol", values = genes, mart = mouse, attributesL =
c("hgnc_symbol", "ensembl_gene_id", "entrezgene_id"), martL = human, uniqueRows = T)
○ attribute 1. MGI symbol와 HGNC symbol의 대응표를 통해 대응관계를 결정
○ attribute 2. ENSMUSG code와 ENSG code의 대응표를 통해 대응관계를 결정
○ attribute 3. NCBI gene ID와 NCBI gene ID의 대응표를 통해 대응관계를 결정
○ 각 attribute에 대해 왼쪽에 마우스에 대한 정보이고 오른쪽은 사람에 대한 정보
○ 현재 작동하지 않음 : 현재 다음과 같은 에러메세지가 발견되어 작동되지 않음
#1. Error in getLDS(attributes = c("mgi_symbol", "ensembl_gene_id", "entrezgene_id"), : Query ERROR: caught BioMart::Exception::Database: Could not connect to mysql database ensembl_mart_106: DBI connect ('database =ensembl_mart_106;host=127.0.0.1;port=5316','ensro',...) failed: Can't connect to MySQL server on '127.0.0.1' (111) at /nfs/public/ro/ensweb/live/mart/www_106/biomart-perl/lib/BioMart/Configuration/DBLocation.pm line 98.
#2. Error in textConnection(attrfilt) : invalid 'text' argument
#3. Ensembl site unresponsive, trying useast mirror. Ensembl site unresponsive, trying asia mirror
#4. Error: biomaRt has encountered an unexpected server error. Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)
③ 방법 2. MGI 홈페이지에서 제공하는 사람-마우스 대응표를 이용하는 방법
human_to_mouse <- function(human_gene){
hom <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
mouse_gene = array()
flag = 0
for(i in 1 : length(human_gene)){
index = match(human_gene[i], hom[hom$Common.Organism.Name == 'human', 'Symbol'])
key = hom[hom$Common.Organism.Name == 'human', 'DB.Class.Key'][index]
flag = flag + 1
mouse_gene[flag] = hom[hom$DB.Class.Key == key
& hom$Common.Organism.Name == 'mouse, laboratory'
, 'Symbol'][1] # duplicate mouse genes can be found
}
return(mouse_gene)
}
mouse_to_human <- function(mouse_gene){
hom <- read.csv("https://blog.kakaocdn.net/dn/cVeqsA/btrS1JMnxyX/HtVhPmqtxdgt7LQlGkeql0/HOM_MouseHumanSequence.csv?attach=1&knm=tfile.csv")
human_gene = array()
flag = 0
for(i in 1 : length(mouse_gene)){
index = match(mouse_gene[i], hom[hom$Common.Organism.Name == 'mouse, laboratory', 'Symbol'])
key = hom[hom$Common.Organism.Name == 'mouse, laboratory', 'DB.Class.Key'][index]
flag = flag + 1
human_gene[flag] = hom[hom$DB.Class.Key == key
& hom$Common.Organism.Name == 'human'
, 'Symbol'][1] # duplicate human genes can be found
}
return(human_gene)
}
⑸ chromosome position to hgnc_symbol
ChromosomePosition_to_hgnc_symbol <- function(chromosome, start, end){
# reference : https://support.bioconductor.org/p/127035/
library(biomaRt)
positions <- data.frame(chromosome = chromosome,
start = start,
end = end)
ensembl = useEnsembl(biomart='ensembl',
dataset="hsapiens_gene_ensembl")
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"),
filters = c("chromosome_name", "start", "end"),
values = list(positions[,1], positions[,2], positions[,3]),
mart = ensembl)
print(results)
postions_combined <- apply(as.matrix(positions), 1, paste, collapse = ":")
results2 <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"),
filters = c("chromosomal_region"),
values = postions_combined,
mart = ensembl)
print(results2)
}
⑹ gene name to chromosome position (표를 이용한 방식)
gene_to_chromosome_position <- function(gene_list){
# gene_list : list of genes
human <- read.csv("https://blog.kakaocdn.net/dn/lTbKq/btrWjlmGho8/eWwWHbfLOlVGKAVeuDMKt1/human%20gene%20annotation.csv?attach=1&knm=tfile.csv")
mouse = read.csv("https://blog.kakaocdn.net/dn/clSwT7/btrWcrWmS41/mNLCUuBlQxfJFhG1U2JQNk/mouse%20gene%20annotation.csv?attach=1&knm=tfile.csv")
result = array()
for(i in 1:length(gene_list)){
if(gene_list[i] == toupper(gene_list[i])){ # human gene
idx = match(gene_list[i], human[, 1])
note = paste('Gene.ID: ', gene_list[i],
', chromosome: ', human[idx, 'chromosome'],
', start: ', human[idx, 'start'],
', end: ', human[idx, 'end'])
result[i] = note
}
else{ # mouse gene
idx = match(gene_list[i], mouse[, 1])
note = paste('Gene.ID: ', gene_list[i],
', chromosome: ', mouse[idx, 'chromosome'],
', start: ', mouse[idx, 'start'],
', end: ', mouse[idx, 'end'])
result[i] = note
}
}
return(result)
}
⑺ gene name to description
gene_to_description <- function(gene_list){
# gene_list : list of genes
human <- read.csv("https://blog.kakaocdn.net/dn/lTbKq/btrWjlmGho8/eWwWHbfLOlVGKAVeuDMKt1/human%20gene%20annotation.csv?attach=1&knm=tfile.csv")
mouse = read.csv("https://blog.kakaocdn.net/dn/clSwT7/btrWcrWmS41/mNLCUuBlQxfJFhG1U2JQNk/mouse%20gene%20annotation.csv?attach=1&knm=tfile.csv")
result = array()
for(i in 1:length(gene_list)){
if(gene_list[i] == toupper(gene_list[i])){ #human
idx = match(gene_list[i], human[, 1])
result[i] = human[idx, 'Description']
}
else{ #mouse
idx = match(gene_list[i], mouse[, 1])
result[i] = mouse[idx, 'Description']
}
}
return(result)
}
⑻ gene name to bioType
gene_to_bioType <- function(gene_list){
# gene_list : list of mouse genes
human <- read.csv("https://blog.kakaocdn.net/dn/lTbKq/btrWjlmGho8/eWwWHbfLOlVGKAVeuDMKt1/human%20gene%20annotation.csv?attach=1&knm=tfile.csv")
mouse = read.csv("https://blog.kakaocdn.net/dn/clSwT7/btrWcrWmS41/mNLCUuBlQxfJFhG1U2JQNk/mouse%20gene%20annotation.csv?attach=1&knm=tfile.csv")
result = array()
for(i in 1:length(gene_list)){
if(gene_list[i] == toupper(gene_list[i])){ # human
idx = match(gene_list[i], human[, 1])
result[i] = human[idx, 'bioType']
}
else { # mouse
idx = match(gene_list[i], mouse[, 1])
result[i] = mouse[idx, 'bioType']
}
}
return(result)
}
⑼ Seurat 객체 object 유전자 이름 변경 : 객체를 새로 만들어야 함 (레퍼런스)
# RenameGenesSeurat ------------------------------------------------------------------------------------
RenameGenesSeurat <- function(obj = ls.Seurat[[i]], newnames = HGNC.updated[[i]]$Suggested.Symbol) { # Replace gene names in different slots of a Seurat object. Run this before integration. Run this before integration. It only changes obj@assays$RNA@counts, @data and @scale.data.
print("Run this before integration. It only changes obj@assays$RNA@counts, @data and @scale.data.")
RNA <- obj@assays$RNA
if (nrow(RNA) == length(newnames)) {
if (length(RNA@counts)) RNA@counts@Dimnames[[1]] <- newnames
if (length(RNA@data)) RNA@data@Dimnames[[1]] <- newnames
if (length(RNA@scale.data)) RNA@scale.data@Dimnames[[1]] <- newnames
} else {"Unequal gene sets: nrow(RNA) != nrow(newnames)"}
obj@assays$RNA <- RNA
return(obj)
}
# RenameGenesSeurat(obj = SeuratObj, newnames = HGNC.updated.genes)
⑽ 클러스터 정보 변경
update_cluster_in_seurat_obj <- function(seurat_obj, barcode, cluster){
# dim(seurat_obj)[1] = length(barcode) = length(cluster)
mat <- matrix(0, nrow = length(barcode), ncol = 2)
mat[, 1] = barcode
mat[, 2] = cluster
mat = as.data.frame(mat)
rownames(mat) = barcode
seurat_obj@meta.data$orig.ident = mat[rownames(seurat_obj@meta.data), 2]
# you may need to modify the above code
seurat_obj@active.ident = as.factor(seurat_obj@meta.data$orig.ident)
return (seurat_obj)
}
⑾ Idents를 쓰면 원하는 metadata에 대해 FindAllMarkers를 쓸 수 있음 (레퍼런스 1, 레퍼런스 2)
Idents(pbmc) <- pbmc$celltype
markers = FindAllMarkers(object = pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
⑿ Seurat 객체의 orig.ident의 플롯 내 순서를 바꾸기
# Reference : https://github.com/satijalab/seurat/issues/2471
object$orig.ident <- factor(x = object$orig.ident, levels = c("A", "B", "C"))
⒀ Seurat 객체 pbmc3k를 h5ad로 저장하기 (레퍼런스)
library(Seurat)
library(SeuratData)
library(SeuratDisk)
SaveH5Seurat(pbmc3k, filename = "~/Downloads/pbmc3k.h5Seurat")
Convert("~/Downloads/pbmc3k.h5Seurat", dest = "h5ad")
① 생성된 .h5ad 파일은 파이썬 상에서 다음 명령어로 열 수 있음
import scanpy as sc
pbmc = sc.read_h5ad("pbmc3k.h5ad")
⒁ tissue_dir 디렉토리 내 matrix.mtx, barcodes.tsv, features.tsv 및 spatial 폴더가 있을 때, Visium 데이터를 R에서 읽는 코드
library(Seurat)
b_data <-ReadMtx('./matrix.mtx.gz',
'./barcodes.tsv.gz',
'./features.tsv.gz',
feature.column=1)
b_data = CreateSeuratObject(b_data, assay='Spatial')
b_image = Read10X_Image(paste('./spatial/',sep=''))
b_image_ = b_image[Cells(b_data)]
DefaultAssay(object = b_image_) <- 'Spatial'
b_data[["slice1"]]=b_image_
# you can change "slice1"
b_data <- SCTransform(b_data, assay = "Spatial", verbose = FALSE, variable.features.n = 1000)
# Some errors might occur when running SpatialFeaturePlot without SCTransform
SpatialFeaturePlot(b_data, rownames(b_data)[1])
# check whether it works properly
⒂ .h5ad로 저장된 Visium 데이터를 R에서 읽는 코드
library(Seurat)
library(anndata)
library(png)
library(jsonlite)
adata = read_h5ad("adata.h5ad")
hires = adata$uns$spatial$SAMPLE$images$hires
# you must change "SAMPLE"
png_file_path_hires = "~/Downloads/data/spatial/tissue_hires_image.png"
png(filename = png_file_path_hires, width = dim(hires)[1], height = dim(hires)[2])
par(mar = c(0, 0, 0, 0)) # Remove margins
plot.new()
rasterImage(hires, 0, 0, 1, 1)
dev.off()
lowres = adata$uns$spatial$SAMPLE$images$lowres
# you must change "SAMPLE"
png_file_path_lowres = "~/Downloads/data/spatial/tissue_lowres_image.png"
png(filename = png_file_path_lowres, width = dim(lowres)[1], height = dim(lowres)[2])
par(mar = c(0, 0, 0, 0)) # Remove margins
plot.new()
rasterImage(lowres, 0, 0, 1, 1)
dev.off()
scale_factors <- adata$uns$spatial$SAMPLE$scalefactors
# you must change "SAMPLE"
json_file_path <- "~/Downloads/data/spatial/scalefactors_json.json"
write_json(scale_factors, json_file_path)
df = adata$obs[,1:3]
df[,4] = adata$obsm$spatial[,2]
df[,5] = adata$obsm$spatial[,1]
write.csv(df, "~/Downloads/data/spatial/tissue_positions_list.csv")
b_data = CreateSeuratObject(t(as.matrix(adata$X)), assay='Spatial')
b_image = Read10X_Image(paste('~/Downloads/data/spatial/',sep=''))
b_image_ = b_image[Cells(b_data)]
DefaultAssay(object = b_image_) <- 'Spatial'
b_data[["slice1"]]=b_image_
# you can change "slice1"
b_data <- SCTransform(b_data, assay = "Spatial", verbose = FALSE, variable.features.n = 1000)
# Some errors might occur when running SpatialFeaturePlot without SCTransform
SpatialFeaturePlot(b_data, rownames(b_data)[1])
# An error occurs...
⒃ highly variable gene을 뽑아주는 코드
library(dplyr)
library(Seurat)
pbmc.data <- Read10X(data.dir = "./outs")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(object = pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 15)
pbmc <- NormalizeData(object = pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(object = pbmc, selection.method = "vst", nfeatures = 2000)
gene_list = rownames(pbmc)[pbmc@assays$RNA@meta.features$vst.variable]
# reference : https://kkokkilkon.tistory.com/144
#install_github("devtools")
library(devtools)
#install_github("ggbiplot", "vqv")
library(ggbiplot)
PCA <- function(dt, dt_group, scale=T){
# rownames(dt) : our interest
# colnames(dt) : the dimensional space of each sample
pca_dt <- prcomp(dt,
center = T,
scale. = scale)
ggbiplot(pca_dt,
choices = c(1, 2),
obs.scale = 1,
var.scale = 1,
groups = dt_group,
circle = TRUE,
varname.size=0,
var.axes = F)
}
# Example
dt <- iris[, -5]
dt_group <- iris[, 5]
scale = TRUE
PCA(dt, dt_group, scale)
# 필요한 패키지 설치 및 로드
install.packages("survival")
install.packages("survminer")
library(survival)
library(survminer)
# 수정된 데이터 프레임 생성
# 모든 사망과 censoring 사건을 정확하게 반영
surv_data <- data.frame(
time = c(6, 12, 21, 27, 32, 39, 43, 43, 43, 89, 89, 89, 89, 89, 89, 261, 263, 270, 270, 311),
status = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1) # 사망은 1, censoring은 0
)
# Surv 객체 생성
surv_obj <- Surv(time = surv_data$time, event = surv_data$status)
# Kaplan-Meier 생존 곡선 추정
fit <- survfit(surv_obj ~ 1, data = surv_data)
# 생존 곡선 그리기
ggsurvplot(
fit,
data = surv_data,
xlab = "Time",
ylab = "Survival probability",
title = "Kaplan-Meier Survival Curve",
surv.median.line = "hv", # 중앙값 생존 시간 선 추가
ggtheme = theme_minimal(), # 테마 설정
risk.table = TRUE, # 위험 테이블 추가
palette = "Dark2" # 색상 팔레트 설정
)
8. 이미지 생성 [목차]
⑴ 이미지를 png로 저장하는 예시
png(file = "my plot.png", width = 1500, height = 300)
DimPlot(pbmc, reduction = "umap", label = TRUE, repel = TRUE)
dev.off()
⑵ x와 y가 주어져 있을 때 scatter plot과 기울기의 신뢰구간을 그리는 함수
scatter_plot <- function(x, y, xlab = "x", ylab = "y", point_size = 2, lab_size = 4, png=TRUE){
library(ggplot2)
# the lenth(x) must be same with the length(y)
mat <- matrix(0, nrow = length(x), ncol = 2)
mat[, 1] = x
mat[, 2] = y
colnames(mat) = c(xlab, ylab)
mat <- as.data.frame(mat)
if(png){
png("./scatter_plot.png",width=2000,height=2000,res=500)
ggplot(mat, aes(x=x, y=y)) + geom_point(shape=19, size=point_size, color="blue") + theme(plot.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),panel.background = element_blank(), axis.line = element_line(size =1)) + stat_smooth(method = lm, level=.95, color="grey") + labs(x=xlab, y=ylab, size=lab_size)
dev.off()
} else{
ggplot(mat, aes(x=x, y=y)) + geom_point(shape=19, size=point_size, color="blue") + theme(plot.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),panel.background = element_blank(), axis.line = element_line(size =1)) + stat_smooth(method = lm, level=.95, color="grey") + labs(x=xlab, y=ylab, size=lab_size)
}
}
⑶ x, y, color가 주어져 있을 때 SpatialFeaturePlot을 그려주는 함수
my.plot <- function(x, y, col){
# assume that length(x) = length(y) = length(col)
plot(x, y, t="n")
colfunc <- colorRampPalette(c("#000000", "#EB4600", "#FFF800"))
coll = array(dim = length(col))
for(i in 1 : length(col)){
coll[i] <- colfunc(100) [as.integer( col[i] / max(col) * 99 + 1)]
}
text(x, y, labels = "●", col = coll, cex = 1)
}
⑷ spatial feature plot
# tissue_dir : the directory that contains a filtered_feature_bc_matrix.h5
tissue_dir <- './outs/'
# Tgenes : genes of interest
Tgenes <- c('Slc2a1', 'Slc2a3')
conv_spatial_feature_plot <- function(tissue_dir, Tgenes, quality.control = FALSE){
library(Seurat)
library(SeuratData)
library(ggplot2)
library(cowplot)
library(dplyr)
# reference : https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_07_spatial.html
br.sp = Load10X_Spatial(tissue_dir, slice= 'slice1')
br.sp <- SCTransform(br.sp, assay = "Spatial", verbose = FALSE, variable.features.n = 1000)
if(quality.control){
br.sp <- PercentageFeatureSet(br.sp, "^mt-", col.name = "percent_mito")
br.sp <- PercentageFeatureSet(br.sp, "^Hb.*-", col.name = "percent_hb")
br.sp <- br.sp[, br.sp$nFeature_Spatial > 500 & br.sp$percent_mito < 25 & br.sp$percent_hb < 20]
}
SpatialFeaturePlot(br.sp, features = Tgenes)
}
conv_spatial_feature_plot(tissue_dir, Tgenes)
⑸ gene.list, log FC 값 벡터, adjusted p value 값 벡터가 주어졌을 때 enhanced volcano plot
my.EnhancedVolcano <- function(gene.name, logFC, adj.P.Val,
pCutoff = 0.05, FCcutoff = 0.3,
xlim = c(-0.5, 0.5), ylim = c(-0.5, 0.5)){
# install.packages("BiocManager")
# BiocManager::install("EnhancedVolcano")
library(EnhancedVolcano)
tested <- matrix(0, nrow = length(gene.name), ncol = 2)
tested <- as.data.frame(tested)
for(i in 1:length(gene.name)){
tested[i, 1] <- logFC[i]
tested[i, 2] <- adj.P.Val[i]
}
rownames(tested) <- gene.name
colnames(tested) <- c('logFC', 'adj.P.Val')
EnhancedVolcano(tested, lab = rownames(tested),
x='logFC', y='adj.P.Val', xlim = xlim, ylim = ylim,
pCutoff = pCutoff, FCcutoff = FCcutoff)
}
⑹ gene list로부터 GO (gene ontology)를 구하는 법
GO <- function(gene){
library(EnhancedVolcano)
library(clusterProfiler)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(enrichplot)
# ont = "ALL", "BP", "CC", "MF"
# showCategory is not mandatory
gene <- gsub('.*-', '', gene) # -와 그 앞에 있는 것들을 제거 (예 : "GRCh38-")
gene <- gsub('\\ .*$', '', gene) # 'KLK2 ENSG00000167751' 같은 것을 해결
if (gene[1] == toupper(gene[1])){ ## Human gene
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Hs.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
return(GO)
} else{ ## Mouse gene?
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Mm.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Mm.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
return(GO)
}
}
⑺ gene list로부터 GO (gene ontology) plot을 그리는 법 (▶ GO 플롯을 해석하는 방법)
GO.plot <- function(gene){
library(EnhancedVolcano)
library(clusterProfiler)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(enrichplot)
# ont = "ALL", "BP", "CC", "MF"
# showCategory is not mandatory
gene <- gsub('.*-', '', gene) # -와 그 앞에 있는 것들을 제거 (예 : "GRCh38-")
gene <- gsub('\\ .*$', '', gene) # 'KLK2 ENSG00000167751' 같은 것을 해결
if (gene[1] == toupper(gene[1])){ ## Human gene
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Hs.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
dotplot(GO,split="ONTOLOGY", showCategory = 5)+facet_grid(ONTOLOGY~., scale="free")
} else{ ## Mouse gene?
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Mm.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Mm.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
dotplot(GO,split="ONTOLOGY", showCategory = 5)+facet_grid(ONTOLOGY~., scale="free")
}
}
### Example
GO.plot(c("COL1A1", "COL1A2", "COL3A1", "COL6A3"))
⑻ gene list로부터 cnetplot을 그리는 법
my.cnetplot <- function(gene.list){
GO <- function(gene){
library(EnhancedVolcano)
library(clusterProfiler)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(enrichplot)
# ont = "ALL", "BP", "CC", "MF"
# showCategory is not mandatory
gene <- gsub('.*-', '', gene) # -와 그 앞에 있는 것들을 제거 (예 : "GRCh38-")
gene <- gsub('\\ .*$', '', gene) # 'KLK2 ENSG00000167751' 같은 것을 해결
if (gene[1] == toupper(gene[1])){ ## Human gene
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Hs.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
return(GO)
} else{ ## Mouse gene?
gene.df <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Mm.eg.db)
gene.df <- as.vector(gene.df[[2]])
GO <- enrichGO(gene.df, OrgDb = 'org.Mm.eg.db',keyType = "ENTREZID", ont = "ALL", pvalueCutoff = 0.05, pAdjustMethod = "BH")
return(GO)
}
}
go = GO(gene.list)
ego_ <- 0
if (gene.list[1] == toupper(gene.list[1])){ ## Human gene
ego_ <- setReadable(go, org.Hs.eg.db, keyType='ENTREZID')
} else { ## Mouse gene?
ego_ <- setReadable(go, org.Mm.eg.db, keyType='ENTREZID')
}
cnetplot(ego_, categorySize="pvalue", foldChange= gene.list )
}
my.cnetplot(c("COL1A1", "COL1A2", "COL3A1", "COL6A3"))
⑼ 통계 분석까지 곁들인 바이올린 플롯
VlnPlot(object = br.sp, features = c('Col1a1'),
group.by = 'orig.ident', pt.size = 0.1) +
facet_grid(.~tnbc.merge@active.ident)+
fill_palette(palette='npg')+
stat_compare_means(method = "anova", label='p')+
theme(axis.text.x = element_text(angle = 90, hjust = 1),
strip.text.x = element_text(size = rel(0.7)))
입력: 2022.05.03 00:13
'▶ 자연과학 > ▷ RStudio' 카테고리의 다른 글
【RStudio】 R에서 Python 실행하기 (2) | 2024.05.03 |
---|---|
【RStudio】 R 주요 트러블슈팅 [21-40] (0) | 2023.06.09 |
【RStudio】 R 주요 트러블슈팅 [01-20] (0) | 2021.12.01 |
【RStudio】 10강. 메모리 관리 (0) | 2020.07.20 |
【RStudio】 9강. ANOVA 분석 (0) | 2019.11.17 |
최근댓글