Bioinformatics Notes
Recommended post: 【Bioinformatics】 Bioinformatics Analysis Table of Contents
1. Rate of data growth
data phase | astronomy | Twitter | YouTube | genomics |
acquisition | 25 zetta-bytes/yr | 0.5-15 billion tweets/yr | 500-900 million hrs/yr | 1 zetta-bases/yr |
storage | 1 EB/yr | 1-17 PB/yr | 1-2 EB/yr | 2-40 EB/yr |
analysis | in situ data reduction | topic and sentiment mining | limited requirements | heterogeneous data and analysis |
 | real-time processing | metadata analysis | | variant calling, ~2 trillion central processing unit (CPU) hours |
 | massive volumes | | | |
distribution | dedicated lines from antennae to server (600 TB/s) | small units of distribution | major component of modern user's bandwidth (10 MB/s) | many small (10 MB/s) and fewer massive (10 TB/s) data movements |
Table 1. Rate of data growth (ref)
2. Deep learning methods to prioritize noncoding variants
⑴ GWAVA (genome-wide annotation of variants)
⑵ DeepSEA : predicting effects of noncoding variants with a deep learning-based sequence model
⑶ DanQ : a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences
⑷ DeepFun : predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations
⑸ DeepC : predicting 3D genome folding using megabase-scale transfer learning (Nature Methods 17:1118-1124 (2020))
⑹ Akita : predicting 3D genome folding from DNA sequence (Nature Methods 17:1111-1117 (2020))
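Sequence-based models such as DeepSEA and DanQ are generally used to score a variant by comparing the model's prediction for the reference sequence with its prediction for the sequence carrying the alternate allele. The following is a minimal sketch of that idea only. `toy_model` is a hypothetical stand-in (it just returns GC fraction) and is not any of the published networks; the function names and the 0-based position convention are likewise assumptions made for illustration.

```python
from typing import Callable, List

BASES = "ACGT"
OneHot = List[List[int]]


def one_hot(seq: str) -> OneHot:
    """Encode DNA as rows of 4 ints in A, C, G, T order; other symbols (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: found {seq[pos]}, expected {ref}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_model(x: OneHot) -> float:
    """Hypothetical stand-in for a trained network: fraction of C/G positions."""
    gc = sum(row[1] + row[2] for row in x)
    return gc / len(x)


def variant_effect(
    seq: str,
    pos: int,
    ref: str,
    alt: str,
    model: Callable[[OneHot], float] = toy_model,
) -> float:
    """Score a single-nucleotide variant as model(alt sequence) - model(ref sequence)."""
    ref_score = model(one_hot(seq))
    alt_score = model(one_hot(apply_snv(seq, pos, ref, alt)))
    return alt_score - ref_score


if __name__ == "__main__":
    # A>G at position 1 of a 4-base window raises the toy GC score from 0 to 0.25.
    print(variant_effect("AAAA", 1, "A", "G"))
```

In practice the `model` argument would be replaced by a trained network's prediction function, and the difference would be computed per predicted feature rather than as a single number.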
3. gnomAD
⑴ Overview
gnomAD data is available for download through Google Cloud Public Datasets, the Registry of Open Data on AWS, and Azure Open Datasets. The gnomAD team recommends using Hail and their Hail utilities for gnomAD to work with the data. In addition to the downloadable files, Terra has a demo workspace for working with gnomAD data.
⑵ Google Cloud Public Datasets
Files can be browsed and downloaded using gsutil.
$ gsutil ls gs://gcp-public-data--gnomad/release/
gnomAD variants are also available as a BigQuery dataset.
⑶ Registry of Open Data on AWS
Files can be browsed and downloaded using the AWS Command Line Interface.
$ aws s3 ls s3://gnomad-public-us-east-1/release/
⑷ Azure Open Datasets
Files can be browsed and downloaded using AzCopy or Azure Storage Explorer.
$ azcopy ls https://datasetgnomad.blob.core.windows.net/dataset/
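The three listing commands above can be wrapped in one small helper when a script needs to work against whichever cloud is at hand. This is only a sketch: the provider keys (`gcp`, `aws`, `azure`) and the function names are made up for illustration, and the commands and URLs are copied from the sections above without checking that the buckets still use that layout.

```python
import shutil
import subprocess
from typing import Dict, List

# Commands copied from the sections above; the dictionary keys are illustrative names.
LIST_COMMANDS: Dict[str, List[str]] = {
    "gcp": ["gsutil", "ls", "gs://gcp-public-data--gnomad/release/"],
    "aws": ["aws", "s3", "ls", "s3://gnomad-public-us-east-1/release/"],
    "azure": ["azcopy", "ls", "https://datasetgnomad.blob.core.windows.net/dataset/"],
}


def list_command(provider: str) -> List[str]:
    """Return a copy of the argument list that lists the gnomAD release for a provider."""
    if provider not in LIST_COMMANDS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {sorted(LIST_COMMANDS)}")
    return list(LIST_COMMANDS[provider])


def list_release(provider: str) -> str:
    """Run the listing command and return its standard output as text."""
    cmd = list_command(provider)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands without running them, so no cloud CLI is required.
    for name in LIST_COMMANDS:
        print(name, "->", " ".join(list_command(name)))
```

Calling `list_release("gcp")` needs the corresponding CLI to be installed. For the AWS bucket, if no credentials are configured, adding the `--no-sign-request` flag to the `aws` command may be necessary.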
Posted: 2022.02.21 12:51
Modified: 2024.10.24 22:06