【RStudio】 Major R Troubleshooting [21-40]


21. Error: could not find function "%>%"

⑴ (package) Solution

install.packages("magrittr") 
install.packages("dplyr")
library(magrittr)
library(dplyr)
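The fix above loads `%>%` from magrittr (dplyr re-exports it). A minimal sketch of two alternatives: since R 4.1.0, base R has a native pipe `|>` that needs no package (it is a different operator, e.g. it has no `.` placeholder). Alternatively, you can bind magrittr's pipe without attaching the whole package, assuming magrittr is installed.

```r
# Base R (>= 4.1.0) native pipe: no package needed
x <- c(1, 4, 9)
total <- x |> sqrt() |> sum()
print(total)

# magrittr pipe without library(): bind only the operator, if magrittr is installed
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  total_magrittr <- x %>% sqrt() %>% sum()
  print(total_magrittr)
}
```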

⑵ Reference

22. Error in is.empty(.) : could not find function "is.empty"

⑴ (package) Solution

install.packages('rapportools')
library(rapportools)
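If you only need a simple emptiness check and would rather not add a dependency, the sketch below uses a hypothetical base-R helper `is_empty_simple`. It treats NULL, zero-length vectors, and vectors that are entirely NA or blank strings as empty. This is an assumption about what your code needs: `rapportools::is.empty` has its own rules, and the two functions are not guaranteed to agree.

```r
# Hypothetical base-R helper; NOT a drop-in replacement for rapportools::is.empty
is_empty_simple <- function(x) {
  # NULL or zero-length input counts as empty
  if (is.null(x) || length(x) == 0) return(TRUE)
  # Otherwise empty only if every element is NA or a blank string
  all(is.na(x) | (is.character(x) & trimws(x) == ""))
}

is_empty_simple(c(NA, ""))   # TRUE
is_empty_simple("a")         # FALSE
```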

23. Error in fill_palette(palette = "npg") : could not find function "fill_palette"

⑴ (package) Solution

install.packages('ggpubr')
library(ggpubr)
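Items 21 to 23 all follow the same install-then-load pattern. Below is a sketch that wraps it in a small helper. `ensure_packages` is a hypothetical name and does not come from any package. It installs a package only when it is missing and then attaches it. Once a package is installed, you can also call a single exported function without `library()` by using its namespace, for example `ggpubr::fill_palette("npg")`.

```r
# Hypothetical helper: install missing packages, then attach them
ensure_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE (instead of erroring) if pkg is missing
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage for the errors above (requires network access for installation):
# ensure_packages(c("magrittr", "dplyr", "rapportools", "ggpubr"))
```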

24. Error: package or namespace load failed for ‘Seurat’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/opt/conda/lib/R/library/igraph/libs/igraph.so': libglpk.so.40: cannot open shared object file: No such file or directory

⑴ (package) Cause: this problem occurs when you set up RStudio Server on a Linux server such as Ubuntu and install the Seurat package

⑵ (package) Solution

## Linux Ubuntu
sudo apt update
sudo apt install libglpk40


## RStudio
remove.packages('rlang')
install.packages('rlang')
install.packages('lifecycle')

⑶ Details

① sudo apt update: resolves the "E: Unable to locate package" problem (ref)

② sudo apt install libglpk40: libglpk.so.40 is missing, so this installs it (ref)

③ remove.packages('rlang'), install.packages('rlang'), install.packages('lifecycle'): resolves the error "Error: package or namespace load failed for ‘Seurat’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace ‘lifecycle’ 1.0.2 is already loaded, but >= 1.0.3 is required" (ref)
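The lifecycle error in ③ compares an installed version against a minimum (>= 1.0.3). The sketch below uses a hypothetical helper, `needs_update`, that checks this before you load Seurat. It reads only the installed package's DESCRIPTION through `system.file()` and `packageVersion()`, so it does not load the namespace itself. If `isNamespaceLoaded("lifecycle")` is already TRUE, the old version stays in memory after reinstalling. In that case, restart R (RStudio: Session > Restart R).

```r
# Hypothetical helper: TRUE if pkg is missing or older than min_version.
# Does not load the package namespace.
needs_update <- function(pkg, min_version) {
  installed <- nzchar(system.file(package = pkg))
  !installed || utils::packageVersion(pkg) < min_version
}

# Usage, ideally in a fresh session before library(Seurat):
# if (needs_update("lifecycle", "1.0.3")) install.packages("lifecycle")
# isNamespaceLoaded("lifecycle")  # TRUE means an old copy is loaded: restart R
```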

25. The previous R session was abnormally terminated due to an unexpected crash. You may have lost workspace data as a result of this crash. RStudio may not have restored the previously active project as a precaution. You may switch back to it using the Projects menu.

⑴ (system) Cause: insufficient resources (e.g., the R session running out of memory)
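When a crash is caused by running out of memory, one step you can take before rerunning is to find and remove the largest objects in the session. The sketch below uses a hypothetical helper, `largest_objects`, which ranks objects in an environment by `object.size()`. It assumes that memory, rather than CPU or disk, is the limiting resource.

```r
# Hypothetical helper: the n largest objects (in bytes) in an environment
largest_objects <- function(env = globalenv(), n = 5) {
  nms <- ls(envir = env)
  if (length(nms) == 0) return(numeric(0))
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  head(sort(sizes, decreasing = TRUE), n)
}

# Usage: inspect, drop what you no longer need, then let R release memory
# largest_objects()
# rm(list = c("some_big_object")); gc()
```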

26. Error in if (all(cur_arg < 0)) { : missing value where TRUE/FALSE needed

⑴ (grammar) Problem: after running SaveH5Seurat(object, filename = "myData.h5Seurat"), this error message appears when running Convert("myData.h5Seurat", dest = "h5ad")

⑵ (grammar) Cause: the Seurat object's rownames look like NM-001001130.3. The problem occurs when you access object@assays$RNA@data and object@assays$RNA@scale.data directly and forcibly assign a matrix whose rownames look like NM_001001130.3, so the rownames no longer match.

⑶ (grammar) Solution: use gsub("_", "-", str) to change NM_001001130.3 into NM-001001130.3
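Seurat typically replaces underscores in feature names with dashes when an object is created (it warns that feature names cannot have underscores). A matrix you build yourself therefore needs the same renaming before you assign it into the object. The sketch below uses a toy matrix with made-up values. `fixed = TRUE` treats "_" as a literal string instead of a regex. The commented check against `rownames(object)` assumes `object` is your Seurat object.

```r
# Toy matrix with underscore-style feature names (values are made up)
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("NM_001001130.3", "NM_000001.1"), c("cell1", "cell2")))

# Convert underscores to dashes, matching Seurat's feature naming
rownames(m) <- gsub("_", "-", rownames(m), fixed = TRUE)
print(rownames(m))

# Before assigning into the object, confirm every feature matches:
# stopifnot(all(rownames(m) %in% rownames(object)))
```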

27. Error: package or namespace load failed for 'hdf5r' in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/opt/conda/lib/R/library/hdf5r/libs/hdf5r.so': libhdf5_serial_hl.so.100: cannot open shared object file: No such file or directory

⑴ (package) Solution

sudo apt-get update
sudo apt-get install libhdf5-dev
sudo apt-get install libhdf5-serial-dev

⑵ Reference

h5py import error on libhdf5_serial.so.100 (stackoverflow.com)
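Items 24 and 27 share a pattern. When the error contains "cannot open shared object file", the missing piece is a system library, so it has to be fixed with apt rather than `install.packages()`. The sketch below uses hypothetical helpers, `try_load` and `is_shared_lib_error`. They load a package without stopping the script and classify the resulting message.

```r
# Hypothetical helper: returns "ok" or the error message instead of stopping
try_load <- function(pkg) {
  tryCatch({
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
    "ok"
  }, error = function(e) conditionMessage(e))
}

# Hypothetical helper: does the message point at a missing system (.so) library?
is_shared_lib_error <- function(msg) {
  grepl("cannot open shared object file", msg, fixed = TRUE)
}

# Usage: msg <- try_load("hdf5r"); if (is_shared_lib_error(msg)) message("install the system library with apt")
```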

28. Error in data %>% gather(CellType, Proportion, -SampleID) : could not find function "%>%"

⑴ (package) Solution 1

# If you don't have tidyverse installed, install it first:
# install.packages("tidyverse")
 
# Then, load it
library(tidyverse)

⑵ (package) Solution 2 (ref)

library(tidyr)
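`gather()` still works but is superseded in tidyr by `pivot_longer()`. In current tidyr, the equivalent of the call above should be `pivot_longer(data, -SampleID, names_to = "CellType", values_to = "Proportion")`. The sketch below does the same wide-to-long reshape in base R so it does not depend on any pipe operator. The toy `data` frame and its column names are made up for illustration. Like `gather()`, it stacks one gathered column after another.

```r
# Toy wide table: one row per sample, one column per cell type (made-up values)
data <- data.frame(SampleID = c("S1", "S2"), Tcell = c(0.6, 0.3), Bcell = c(0.4, 0.7),
                   stringsAsFactors = FALSE)

# Base-R equivalent of: data %>% gather(CellType, Proportion, -SampleID)
cell_cols <- setdiff(names(data), "SampleID")
long <- data.frame(
  SampleID   = rep(data$SampleID, times = length(cell_cols)),
  CellType   = rep(cell_cols, each = nrow(data)),
  Proportion = unlist(data[cell_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```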

29. Error in .subscript.2ary(x, i, j, drop = TRUE) : subscript out of bounds

⑴ (package) Problem: br.sp_subset <- subset(br.sp, features = top_genes[valid_genes])

⑵ (package) Solution 1: downgrade Seurat from v5 to v4

devtools::install_github("satijalab/seurat", ref = "v4.3.0")

⑶ (package) Solution 2: Seurat v5 changed the subset syntax (ref)

# Seurat v4
br.sp_subset <- subset(br.sp, features = top_genes)

# Seurat v5
br.sp[["RNAsub"]] <- subset(br.sp[["RNA"]], features = top_genes)
DefaultAssay(br.sp) <- "RNAsub"
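"subscript out of bounds" can also appear when some requested features are not in the object. This is a separate defensive step, not the fix from the reference. Before calling `subset`, keep only the genes that actually exist. The sketch below uses a hypothetical helper, `keep_present`, which warns about and drops missing names. With Seurat you would pass `rownames(br.sp)` (the default assay's features) as `available`.

```r
# Hypothetical helper: keep only requested features that exist, warning about the rest
keep_present <- function(genes, available) {
  missing <- setdiff(genes, available)
  if (length(missing) > 0) {
    warning(length(missing), " requested feature(s) not found, e.g. ", missing[1])
  }
  genes[genes %in% available]
}

# Usage with Seurat (assumption: br.sp and top_genes as above):
# genes_ok <- keep_present(top_genes, rownames(br.sp))
# br.sp[["RNAsub"]] <- subset(br.sp[["RNA"]], features = genes_ok)
```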

30. Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, : You're computing too large a percentage of total singular values, use a standard svd instead. Error in irlba::irlba(Matrix::t(preproc_res), nv = min(num_dim, min(dim(FM)) - : function 'as_cholmod_sparse' not provided by package 'Matrix'

⑴ (package) Problem: cds1 <- preprocess_cds(cds1, method='LSI')

⑵ (package) Solution: redefine preprocess_cds (ref). The other functions it needs are defined in ref1, ref2, ref3, and ref4.

preprocess_cds <- function(cds,
                           method = c('PCA', "LSI"),
                           num_dim = 50,
                           norm_method = c("log", "size_only", "none"),
                           use_genes = NULL,
                           pseudo_count = NULL,
                           scaling = TRUE,
                           verbose = FALSE,
                           build_nn_index = FALSE,
                           nn_control = list()) {

  assertthat::assert_that(
    tryCatch(expr = ifelse(match.arg(method) == "",TRUE, TRUE),
             error = function(e) FALSE),
    msg = "method must be one of 'PCA' or 'LSI'")
  method <- match.arg(method)

  assertthat::assert_that(
    tryCatch(expr = ifelse(match.arg(norm_method) == "",TRUE, TRUE),
             error = function(e) FALSE),
    msg = "norm_method must be one of 'log', 'size_only' or 'none'")
  norm_method <- match.arg(norm_method)

  assertthat::assert_that(assertthat::is.count(num_dim))

  if(!is.null(use_genes)) {
    assertthat::assert_that(is.character(use_genes))
    assertthat::assert_that(all(use_genes %in% row.names(rowData(cds))),
                            msg = paste("use_genes must be NULL, or all must",
                            "be present in the row.names of rowData(cds)"))
  }
  assertthat::assert_that(!is.null(size_factors(cds)),
             msg = paste("You must call estimate_size_factors before calling",
                         "preprocess_cds."))
  assertthat::assert_that(sum(is.na(size_factors(cds))) == 0,
                          msg = paste("One or more cells has a size factor of",
                                      "NA."))

  if(build_nn_index) {
    nn_control <- set_nn_control(mode=1,
                                 nn_control=nn_control,
                                 nn_control_default=get_global_variable('nn_control_annoy_cosine'),
                                 nn_index=NULL,
                                 k=NULL,
                                 verbose=verbose)
  }

  #ensure results from RNG sensitive algorithms are the same on all calls
  set.seed(2016)
  FM <- normalize_expr_data(cds, norm_method, pseudo_count)

  if (nrow(FM) == 0) {
    stop("all rows have standard deviation zero")
  }

  if (!is.null(use_genes)) {
    FM <- FM[use_genes, ]
  }

  fm_rowsums = Matrix::rowSums(FM)
  FM <- FM[is.finite(fm_rowsums) & fm_rowsums != 0, ]

  #
  # Notes:
  #   o  the functions save_transform_models/load_transform_models
  #      expect that the reduce_dim_aux slot consists of a S4Vectors::SimpleList
  #      that stores information about methods with the elements
  #        reduce_dim_aux[[method]][['model']] for the transform elements
  #        reduce_dim_aux[[method]][[nn_method]] for the nn index
  #      and depends on the elements within model and nn_method.
  #
  if(method == 'PCA') {
    cds <- initialize_reduce_dim_metadata(cds, 'PCA')
    cds <- initialize_reduce_dim_model_identity(cds, 'PCA')

    if (verbose) message("Remove noise by PCA ...")

    # Initialize variables
    preproc_res <- NULL
    rotation_matrix <- NULL
    sdev_values <- NULL

    # Determine the dimension of the feature matrix
    dim_FM <- min(dim(FM)) - 1
    # Never ask for more components than the matrix can provide
    k <- min(num_dim, dim_FM)

    # Use irlba (truncated PCA) if the number of dimensions is at most 20% of the
    # matrix dimension; otherwise use a full svd, which avoids irlba's
    # "too large a percentage of total singular values" warning
    if (num_dim <= 0.2 * dim_FM) {
        # prcomp_irlba (not irlba) takes n/center/scale. and returns x/rotation/sdev
        irlba_res <- irlba::prcomp_irlba(Matrix::t(FM), n = k,
                                         center = scaling, scale. = scaling)
        preproc_res <- irlba_res$x
        rotation_matrix <- irlba_res$rotation
        sdev_values <- irlba_res$sdev
        center_values <- irlba_res$center
        scale_values <- irlba_res$scale
    } else {
        # Center/scale explicitly so that this branch also computes a PCA
        X <- scale(as.matrix(Matrix::t(FM)), center = scaling, scale = scaling)
        svd_res <- svd(X, nu = k, nv = k)
        # diag(..., nrow = k) keeps the product correct when k == 1
        preproc_res <- svd_res$u %*% diag(svd_res$d[seq_len(k)], nrow = k)
        rotation_matrix <- svd_res$v
        sdev_values <- svd_res$d[seq_len(k)] / sqrt(max(1, nrow(X) - 1))
        center_values <- attr(X, "scaled:center")
        scale_values <- attr(X, "scaled:scale")
    }

    row.names(preproc_res) <- colnames(cds)
    SingleCellExperiment::reducedDims(cds)[[method]] <- as.matrix(preproc_res)

    row.names(rotation_matrix) <- rownames(FM)
    irlba_rotation <- rotation_matrix

    # we need svd_v downstream so
    # calculate gene_loadings in cluster_cells.R
    cds@reduce_dim_aux[['PCA']][['model']][['num_dim']] <- num_dim
    cds@reduce_dim_aux[['PCA']][['model']][['norm_method']] <- norm_method
    cds@reduce_dim_aux[['PCA']][['model']][['use_genes']] <- use_genes
    cds@reduce_dim_aux[['PCA']][['model']][['pseudo_count']] <- pseudo_count
    cds@reduce_dim_aux[['PCA']][['model']][['svd_v']] <- irlba_rotation
    cds@reduce_dim_aux[['PCA']][['model']][['svd_sdev']] <- sdev_values
    cds@reduce_dim_aux[['PCA']][['model']][['svd_center']] <- center_values
    cds@reduce_dim_aux[['PCA']][['model']][['svd_scale']] <- scale_values
    # Note that prop_var_expl is the fraction of variance explained by the retained
    # PCs, not the fraction of total variance.
    cds@reduce_dim_aux[['PCA']][['model']][['prop_var_expl']] <- sdev_values^2 / sum(sdev_values^2)

    matrix_id <- get_unique_id(SingleCellExperiment::reducedDims(cds)[['PCA']])
    counts_identity <- get_counts_identity(cds)

    cds <- set_reduce_dim_matrix_identity(cds, 'PCA',
                                          'matrix:PCA',
                                          matrix_id,
                                          counts_identity[['matrix_type']],
                                          counts_identity[['matrix_id']],
                                          'matrix:PCA',
                                          matrix_id)
    cds <- set_reduce_dim_model_identity(cds, 'PCA',
                                         'matrix:PCA',
                                         matrix_id,
                                         'none',
                                         'none')

    if( build_nn_index ) {
      nn_index <- make_nn_index(subject_matrix=SingleCellExperiment::reducedDims(cds)[[method]], nn_control=nn_control, verbose=verbose)
      cds <- set_cds_nn_index(cds=cds, reduction_method=method, nn_index=nn_index, verbose=verbose)
    }
    else
      cds <- clear_cds_nn_index(cds=cds, reduction_method=method, nn_method='all')

  } else if(method == "LSI") {
    cds <- initialize_reduce_dim_metadata(cds, 'LSI')
    cds <- initialize_reduce_dim_model_identity(cds, 'LSI')

#    preproc_res <- tfidf(FM)
    tfidf_res <- tfidf(FM)
    preproc_res <- tfidf_res[['tf_idf_counts']]

    num_col <- ncol(preproc_res)
    irlba_res <- irlba::irlba(Matrix::t(preproc_res),
                              nv = min(num_dim,min(dim(FM)) - 1))

    preproc_res <- irlba_res$u %*% diag(irlba_res$d)
    row.names(preproc_res) <- colnames(cds)
    SingleCellExperiment::reducedDims(cds)[[method]] <- as.matrix(preproc_res)

    irlba_rotation = irlba_res$v
    row.names(irlba_rotation) = rownames(FM)
    cds@reduce_dim_aux[['LSI']][['model']][['num_dim']] <- num_dim
    cds@reduce_dim_aux[['LSI']][['model']][['norm_method']] <- norm_method
    cds@reduce_dim_aux[['LSI']][['model']][['use_genes']] <- use_genes
    cds@reduce_dim_aux[['LSI']][['model']][['pseudo_count']] <- pseudo_count
    cds@reduce_dim_aux[['LSI']][['model']][['log_scale_tf']] <- tfidf_res[['log_scale_tf']]
    cds@reduce_dim_aux[['LSI']][['model']][['frequencies']] <- tfidf_res[['frequencies']]
    cds@reduce_dim_aux[['LSI']][['model']][['scale_factor']] <- tfidf_res[['scale_factor']]
    cds@reduce_dim_aux[['LSI']][['model']][['col_sums']] <- tfidf_res[['col_sums']]
    cds@reduce_dim_aux[['LSI']][['model']][['row_sums']] <- tfidf_res[['row_sums']]
    cds@reduce_dim_aux[['LSI']][['model']][['num_cols']] <- tfidf_res[['num_cols']]
    cds@reduce_dim_aux[['LSI']][['model']][['svd_v']] <- irlba_rotation
    cds@reduce_dim_aux[['LSI']][['model']][['svd_sdev']] <- irlba_res$d/sqrt(max(1, num_col - 1))

    # we need svd_v downstream so
    # calculate gene_loadings in cluster_cells.R

    matrix_id <- get_unique_id(SingleCellExperiment::reducedDims(cds)[['LSI']])
    counts_identity <- get_counts_identity(cds)

    cds <- set_reduce_dim_matrix_identity(cds, 'LSI',
                                          'matrix:LSI',
                                          matrix_id,
                                          counts_identity[['matrix_type']],
                                          counts_identity[['matrix_id']],
                                          'matrix:LSI',
                                          matrix_id)
    cds <- set_reduce_dim_model_identity(cds, 'LSI',
                                         'matrix:LSI',
                                         matrix_id,
                                         'none',
                                         'none')

    if( build_nn_index ) {
      nn_index <- make_nn_index(subject_matrix=SingleCellExperiment::reducedDims(cds)[[method]], nn_control=nn_control, verbose=verbose)
      cds <- set_cds_nn_index(cds=cds, reduction_method=method, nn_index=nn_index, verbose=verbose)
    }
    else
      cds <- clear_cds_nn_index(cds=cds, reduction_method=method, nn_method='all')
  }

  if(!is.null(cds@reduce_dim_aux[['Aligned']]) && !is.null(cds@reduce_dim_aux[['Aligned']][['model']][['beta']])) {
    cds@reduce_dim_aux[['Aligned']][['model']][['beta']] <- NULL
  }

  cds
}


# Helper function to normalize the expression data prior to dimensionality
# reduction
normalize_expr_data <- function(cds,
                                norm_method = c("log", "size_only", "none"),
                                pseudo_count = NULL) {
  norm_method <- match.arg(norm_method)

  FM <- SingleCellExperiment::counts(cds)

  # If we're going to be using log, and the user hasn't given us a
  # pseudocount set it to 1 by default.
  if (is.null(pseudo_count)){
    if(norm_method == "log")
      pseudo_count <- 1
    else
      pseudo_count <- 0
  }

  if (norm_method == "log") {
    # If we are using log, normalize by size factor before log-transforming

    FM <- Matrix::t(Matrix::t(FM)/size_factors(cds))

    if (pseudo_count != 1 || is_sparse_matrix(SingleCellExperiment::counts(cds)) == FALSE){
      FM <- FM + pseudo_count
      FM <- log2(FM)
    } else {
      FM@x = log2(FM@x + 1)
    }

  } else if (norm_method == "size_only") {
    FM <- Matrix::t(Matrix::t(FM)/size_factors(cds))
    FM <- FM + pseudo_count
  }
  return (FM)
}

# Andrew's tfidf
tfidf <- function(count_matrix, frequencies=TRUE, log_scale_tf=TRUE,
                  scale_factor=100000, block_size=2000e6) {
  # Use either raw counts or divide by total counts in each cell
  if (frequencies) {
    # "term frequency" method
    col_sums <- Matrix::colSums(count_matrix)
    tf <- Matrix::t(Matrix::t(count_matrix) / col_sums)
  } else {
    # "raw count" method
    col_sums <- NA
    tf <- count_matrix
  }

  # Either TF method can optionally be log scaled
  if (log_scale_tf) {
    if (frequencies) {
      tf@x = log1p(tf@x * scale_factor)
    } else {
      tf@x = log1p(tf@x * 1)
    }
  }

  # IDF w/ "inverse document frequency smooth" method
  num_cols <- ncol(count_matrix)
  row_sums <- Matrix::rowSums(count_matrix > 0)
  idf = log(1 + num_cols / row_sums)

  # Try to just do the multiplication and fall back on DelayedArray
  # TODO hopefully this actually falls back and not get jobs killed in SGE
  tf_idf_counts = tryCatch({
    tf_idf_counts = tf * idf
    tf_idf_counts
  }, error = function(e) {
    print(paste("TF*IDF multiplication too large for in-memory, falling back",
                "on DelayedArray."))
    options(DelayedArray.block.size=block_size)
    DelayedArray:::set_verbose_block_processing(TRUE)

    tf = DelayedArray::DelayedArray(tf)
    idf = as.matrix(idf)

    tf_idf_counts = tf * idf
    tf_idf_counts
  })

  rownames(tf_idf_counts) = rownames(count_matrix)
  colnames(tf_idf_counts) = colnames(count_matrix)
  tf_idf_counts = methods::as(tf_idf_counts, "sparseMatrix")
  return(list(tf_idf_counts=tf_idf_counts, frequencies=frequencies, log_scale_tf=log_scale_tf, scale_factor=scale_factor, col_sums=col_sums, row_sums=row_sums, num_cols=num_cols))
}

① The functions above can be loaded in one step with:

source("https://github.com/JB243/nate9389/blob/main/RStudio/preprocess_cds_and_as_cholmod_sparse.R?raw=true")
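To make the LSI branch easier to follow, the sketch below restates the `tfidf()` helper above for a small dense matrix in base R, assuming `frequencies = TRUE` and `log_scale_tf = TRUE`. It computes tf = log1p(count / column total × scale_factor) and idf = log(1 + number of cells / number of cells expressing the feature), then embeds the cells with a truncated SVD as in `u %*% diag(d)`. `tfidf_dense` and `lsi_embed` are hypothetical names. The sparse original only transforms nonzero entries, which gives the same result because log1p(0) = 0.

```r
# Dense restatement of tfidf() (frequencies = TRUE, log_scale_tf = TRUE)
tfidf_dense <- function(counts, scale_factor = 1e5) {
  # Term frequency: counts divided by each cell's (column's) total, then log1p
  tf  <- log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
  # Smoothed IDF: one weight per feature (row)
  idf <- log(1 + ncol(counts) / rowSums(counts > 0))
  tf * idf   # idf recycles down each column, i.e. row i is scaled by idf[i]
}

# Cells x k embedding, mirroring preproc_res <- u %*% diag(d) in the LSI branch
lsi_embed <- function(tfidf_mat, k) {
  s <- svd(t(tfidf_mat), nu = k, nv = k)
  s$u %*% diag(s$d[seq_len(k)], nrow = k)
}

counts <- matrix(c(1, 0, 2,  0, 3, 1,  4, 0, 0,  1, 1, 1), nrow = 3)  # 3 features x 4 cells
emb <- lsi_embed(tfidf_dense(counts), k = 2)
```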

31. Warning: No layers found matching search pattern provided Error in FetchData.Assay5(object = object[[DefaultAssay(object = object)]], : layer "data" is not found in the object

⑴ (grammar) Cause: in Seurat v5, you must normalize the data separately, which creates the "data" layer. Otherwise SpatialFeaturePlot throws this error.

⑵ (grammar) Solution

# before
library(Seurat)
br.sp = Load10X_Spatial('~/Downloads/sample', slice= 'slice1')
SpatialFeaturePlot(br.sp, "Slc2a1") # error occurs

# after
library(Seurat)
br.sp = Load10X_Spatial('~/Downloads/sample', slice= 'slice1')
br.sp <- SCTransform(br.sp, assay = "Spatial", verbose = FALSE, variable.features.n = 1000)
SpatialFeaturePlot(br.sp, "Slc2a1")
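The missing "data" layer is what a normalization step creates. As a lighter alternative to SCTransform, Seurat's `NormalizeData(br.sp, assay = "Spatial")` (default method: LogNormalize) should also populate it, although this post only tested SCTransform. LogNormalize divides each feature's count by the spot's total count, multiplies by a scale factor (10,000 by default), and applies log1p. The sketch below reproduces that formula in base R on a toy matrix to show what the layer contains. `log_normalize` is a hypothetical name.

```r
# Base-R restatement of the LogNormalize formula (hypothetical helper)
log_normalize <- function(counts, scale_factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Toy counts: 3 features x 2 spots (made-up values)
counts <- matrix(c(0, 2, 8,  5, 5, 0), nrow = 3,
                 dimnames = list(c("Slc2a1", "GeneB", "GeneC"), c("spot1", "spot2")))
data_layer <- log_normalize(counts)

# In Seurat v5 (assumption: br.sp loaded as above):
# br.sp <- NormalizeData(br.sp, assay = "Spatial")
```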

32. Error in fill_alpha(data$fill %||% "black", data$alpha): could not find function "fill_alpha"

⑴ (package) Problem: the error occurs when running SpatialDimPlot(tnbc.merge, label = TRUE, label.size = 3)

⑵ (package) Solution: update ggplot2 from 3.4.4 to 3.5.0

⑶ Reference

Show error message : "fill_alpha" can't find (stackoverflow.com)
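If the diagnosis above holds, an installed ggplot2 that does not define `fill_alpha` is the thing to update. The sketch below uses a hypothetical helper, `has_function`, which checks whether a package's namespace defines a given function (exported or not) without attaching the package. It returns FALSE when the package is not installed.

```r
# Hypothetical helper: does the installed package define this function?
has_function <- function(fun, pkg) {
  nzchar(system.file(package = pkg)) &&
    exists(fun, envir = asNamespace(pkg), inherits = FALSE)
}

# Usage:
# has_function("fill_alpha", "ggplot2")   # FALSE suggests updating ggplot2
# utils::packageVersion("ggplot2")
```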

33. data layers are not joined. Please run JoinLayers

⑴ (grammar) Problem: FindAllMarkers(tnbc.merge, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

⑵ (grammar) Solution: run tnbc.merge = JoinLayers(tnbc.merge) first, then run the code above

⑶ Reference

Integrative analysis in Seurat v5 (Seurat, satijalab.org)
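After a merge, Seurat v5 keeps one counts layer per sample, and JoinLayers combines them into a single layer. The sketch below illustrates only the concept in base R and is not Seurat's implementation. It joins hypothetical per-sample count matrices with different gene sets into one genes x cells matrix and fills absent genes with 0. `join_count_layers` and the toy layers are made up for illustration.

```r
# Conceptual illustration only (not Seurat's implementation)
join_count_layers <- function(layers) {
  genes <- Reduce(union, lapply(layers, rownames))
  padded <- lapply(layers, function(m) {
    # Start from zeros over the union of genes, then fill this layer's values
    out <- matrix(0, nrow = length(genes), ncol = ncol(m),
                  dimnames = list(genes, colnames(m)))
    out[rownames(m), ] <- m
    out
  })
  do.call(cbind, padded)
}

l1 <- matrix(1:4, nrow = 2, dimnames = list(c("A", "B"), c("c1", "c2")))
l2 <- matrix(5:6, nrow = 1, dimnames = list("C", c("c3", "c4")))
joined <- join_count_layers(list(l1, l2))

# In Seurat v5: tnbc.merge <- JoinLayers(tnbc.merge)
```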

Posted: 2023.06.09 15:38

Updated: 2023.12.27 12:29

Attribution

정빈이의 공부방 (Jeongbin's Study Room)
