
【Bioinformatics】 Bioinformatics Notes

 


 

Recommended post: 【Bioinformatics】 Bioinformatics Analysis Table of Contents


 

1. Data growth rate

 

| Data phase | Astronomy | Twitter | YouTube | Genomics |
| --- | --- | --- | --- | --- |
| Acquisition | 25 zetta-bytes/yr | 0.5-15 billion tweets/yr | 500-900 million hrs/yr | 1 zetta-bases/yr |
| Storage | 1 EB/yr | 1-17 PB/yr | 1-2 EB/yr | 2-40 EB/yr |
| Analysis | In situ data reduction; real-time processing; massive volumes | Topic and sentiment mining; metadata analysis | Limited requirements | Heterogeneous data and analysis; variant calling, ~2 trillion central processing unit (CPU) hours |
| Distribution | Dedicated lines from antennae to server (600 TB/s) | Small units of distribution | Major component of modern user's bandwidth (10 MB/s) | Many small (10 MB/s) and fewer massive (10 TB/s) data movement |

Table 1. Data growth rate (ref)
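To put the distribution row of Table 1 in perspective, the following is a minimal arithmetic sketch that compares how long a fixed payload takes to move at the two genomics bandwidths quoted there. The 1 PB payload and the decimal (SI) unit definitions are assumptions chosen only for illustration; they do not come from the table.

```python
# Back-of-the-envelope transfer times using the bandwidths quoted in Table 1.
# Assumption: decimal (SI) units, i.e. 1 MB = 10**6 bytes, 1 TB = 10**12, 1 PB = 10**15.
MB = 10**6
TB = 10**12
PB = 10**15


def transfer_seconds(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Return the time in seconds needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


payload = 1 * PB  # hypothetical payload size, not a figure from the table

slow = transfer_seconds(payload, 10 * MB)  # "many small (10 MB/s)" transfers
fast = transfer_seconds(payload, 10 * TB)  # "fewer massive (10 TB/s)" transfers

print(f"1 PB at 10 MB/s: {slow / 86_400:.0f} days")
print(f"1 PB at 10 TB/s: {fast:.0f} seconds")
```

Under these assumptions the same petabyte takes about 1,157 days at 10 MB/s and 100 seconds at 10 TB/s, a six-order-of-magnitude gap between the two kinds of data movement.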

 

 

2. deep learning methods to prioritize noncoding variants  

⑴ GWAVA (genome-wide annotation of variants)

⑵ DeepSEA : predicting effects of noncoding variants with deep learning-based sequence model

⑶ DanQ : a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

⑷ DeepFun : predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations

⑸ DeepC : predicting 3D genome folding using megabase-scale transfer learning (Nature Methods 17:1118-1124 (2020))

⑹ Akita : predicting 3D genome folding from DNA sequence (Nature Methods 17:1111-1117 (2020))
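Sequence-based models such as DeepSEA and DanQ typically take one-hot encoded DNA as input and score a variant by comparing the predictions for the reference and alternate sequences. The sketch below illustrates only that input preparation step. The function names, the A/C/G/T column order, and the toy sequence are hypothetical, and no trained model is included.

```python
from typing import List

BASES = "ACGT"  # column order is an assumed convention, not taken from any one tool


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the single-nucleotide variant ref>alt applied at 0-based pos."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


ref_seq = "ACGTNACG"                       # toy sequence, not real genomic data
alt_seq = apply_snv(ref_seq, 2, "G", "T")  # hypothetical variant G>T at index 2

ref_x = one_hot(ref_seq)
alt_x = one_hot(alt_seq)
# A trained model f would score the variant as f(alt_x) - f(ref_x); none is provided here.
```

Checking the reference allele before substitution catches coordinate or strand mistakes early, which is a common source of silent errors when preparing variant inputs.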

 

 

3. gnomAD

⑴ Overview

gnomAD data is available for download through Google Cloud Public Datasets, the Registry of Open Data on AWS, and Azure Open Datasets. We recommend using Hail and our Hail utilities for gnomAD to work with the data. In addition to the files listed below, Terra has a demo workspace for working with gnomAD data.

⑵ Google Cloud Public Datasets

Files can be browsed and downloaded using gsutil.

 

$ gsutil ls gs://gcp-public-data--gnomad/release/

 

gnomAD variants are also available as a BigQuery dataset.

⑶ Registry of Open Data on AWS

Files can be browsed and downloaded using the AWS Command Line Interface.

 

$ aws s3 ls s3://gnomad-public-us-east-1/release/

⑷ Azure Open Datasets 

Files can be browsed and downloaded using AzCopy or Azure Storage Explorer.

 

$ azcopy ls https://datasetgnomad.blob.core.windows.net/dataset/  
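Once a release file has been downloaded from any of the three sources above, a common first step is to read allele counts from the VCF INFO column. The following is a minimal sketch, assuming a VCF-formatted file whose INFO column carries the AC (allele count), AN (allele number), and AF (allele frequency) keys. The record shown is made up for illustration and the helper name is hypothetical; for real analyses, Hail is the tool the gnomAD team recommends.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ('K=V;FLAG;...') into a dict; flags map to an empty string."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Made-up record for illustration only; it is not a real gnomAD entry.
line = "chr1\t12345\t.\tA\tG\t.\tPASS\tAC=3;AN=1500;AF=0.002"

# The first eight tab-separated VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
chrom, pos, _id, ref, alt, _qual, flt, info = line.split("\t")[:8]
fields = parse_info(info)

ac, an = int(fields["AC"]), int(fields["AN"])
af = ac / an if an else 0.0  # guard against sites with no called alleles

print(chrom, pos, ref, alt, flt, f"AF={af:.4f}")
```

Recomputing AF from AC and AN, rather than trusting the stored value alone, is a cheap consistency check when subsetting or merging records.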

 

Posted: 2022.02.21 12:51

Updated: 2024.10.24 22:06