본문 바로가기

Contact 日本語 English

【화학정보학】 유기화학 관련 파이썬 함수 모음

 

유기화학 관련 파이썬 함수 모음 (화학정보학, chemoinformatics; structural bioinformatics)

 

추천글 : 【유기화학】 유기화학 목차, 【Python】 파이썬 유용 함수 모음 


1. 명명법 [본문]

2. 드로잉 [본문]

3. 아미노산 [본문]

4. 분광학 [본문]


a. 생물정보학 


 

1. 명명법 [목차]

SMILES(simplified molecular-input line entry system) : 짧은 ASCII 문자열 표현

이중결합은 '='로, 삼중결합은 '#'로 나타냄

고리형 화합물의 경우, 1, 2, ···과 같은 숫자를 부여하여, 선형 분자의 끝과 끝이 연결돼 고리를 이루는 것을 나타냄

 : CN1C=NC2=C1C(=O)N(C(=O)N2C)C

③ C는 일반 탄소를 나타내는 반면, c는 방향족 탄소를 나타냄

○ C1CCCCC1 : cyclohexane

○ c1ccccc1 : benzene

⑵ 유기화합물을 SMILES로 바꾸는 코드 

 

rdkit.Chem.MolToSmiles(Chem.MolFromFASTA(sequence, flavor = 1))

"""
flavor: (optional)
0 Protein, L amino acids (default)
1 Protein, D amino acids
2 RNA, no cap
3 RNA, 5’ cap
4 RNA, 3’ cap
5 RNA, both caps
6 DNA, no cap
7 DNA, 5’ cap
8 DNA, 3’ cap
9 DNA, both caps
"""

 

⑶ SMILES, IUPAC 상호 변환 코드 : transformer 활용

 

# reference: https://github.com/Kohulan/Smiles-TO-iUpac-Translator

! pip install STOUT-pypi
! pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git

from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation

SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of "+SMILES+" is: "+IUPAC_name)

# IUPAC name to SMILES translation

IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of "+IUPAC_name+" is: "+SMILES)

 

⑷ 임의의 화학식이 주어져 있을 때 IUPAC 명명법을 알아내는 방법 

단계 1. SMILES-to-drawing 함수를 여러 번 시도하여 주어진 화합물을 나타내는 SMILES 코드 찾아내기

단계 2. SMILES-to-IUPAC 함수를 실행시켜 최종 명명법 구하기

 

 

⑸ SMILES로부터 분자량을 구하는 코드 

 

from rdkit import Chem
from rdkit.Chem import Descriptors

def calculate_molecular_weight(smiles):
    molecule = Chem.MolFromSmiles(smiles)
    return Descriptors.ExactMolWt(molecule)

# Example usage
smiles_code = "C1=CC=C(C=C1)O"  # SMILES code for phenol
molecular_weight = calculate_molecular_weight(smiles_code)
print(f"Molecular Weight: {molecular_weight}")

 

⑹ SMILES로부터 방향족성을 판단하는 코드 

 

from rdkit import Chem

def is_aromatic(smiles):
    molecule = Chem.MolFromSmiles(smiles)
    if molecule is None:
        return False
    return any(atom.GetIsAromatic() for atom in molecule.GetAtoms())

# Example usage
cyclohexane = "C1CCCCC1"  # cyclohexane
benzene = "c1ccccc1"  # benzene
imidazole = "C1=CN=CN1" # 1H-imidazole

print(f"The {cyclohexane} is {'aromatic' if is_aromatic(cyclohexane) else 'not aromatic'}")
print(f"The {benzene} is {'aromatic' if is_aromatic(benzene) else 'not aromatic'}")
print(f"The {imidazole} is {'aromatic' if is_aromatic(imidazole) else 'not aromatic'}")

 

⑺ SMILES로부터 쌍극자 모멘트를 계산하는 코드 

 

# conda install -c psi4 psi4

import psi4
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def calculate_dipole_moment(smiles):
    # Convert SMILES to molecule
    mol = Chem.MolFromSmiles(smiles)

    # Add Hydrogens
    mol = Chem.AddHs(mol)

    # Generate 3D coordinates
    AllChem.EmbedMolecule(mol, AllChem.ETKDG())

    # Extract coordinates
    conf = mol.GetConformer()
    xyz = ''
    for atom in mol.GetAtoms():
        pos = conf.GetAtomPosition(atom.GetIdx())
        xyz += f"{atom.GetSymbol()} {pos.x} {pos.y} {pos.z}\n"

    # Set up Psi4
    psi4.set_memory('500 MB')
    psi4.set_options({'basis': 'sto-3g'})

    # Calculate dipole moment using Psi4
    psi4_mol = psi4.geometry(xyz)
    psi4.energy('scf')
    dipole_moment = psi4.variable('SCF DIPOLE')

    return dipole_moment

### Example usage
smiles_code = "CCO"  # Example SMILES code for ethanol
dipole_moment = calculate_dipole_moment(smiles_code)
print(f"Dipole Moment: {dipole_moment} Debye")
# Dipole Moment: [ 0.04250251  0.20600936 -0.52850913] Debye

dipole_vector = np.array([0.04250251, 0.20600936, -0.52850913])
magnitude = np.linalg.norm(dipole_vector)
print(f"Magnitude of Dipole Moment: {magnitude} Debye")
# Magnitude of Dipole Moment: 0.5688305725409514 Debye

 

⑻ SMILES로부터 끓는점(bp), 녹는점(mp), 임계온도를 구하는 함수

 

# reference: https://thermo.readthedocs.io/thermo.chemical.html

from thermo.chemical import Chemical

N2 = Chemical('Nitrogen')
print(N2.Tm, N2.Tb, N2.Tc) # melting, boiling, and critical points [K]
## 63.15 77.3549950205 126.192

molecule = Chemical('CC(C)C')
print(molecule.Tm, molecule.Tb, molecule.Tc) # melting, boiling, and critical points [K]
## 124.2 261.401014643 407.81

molecule_ = Chemical('2-methylpropane')
print(molecule_.Tm, molecule_.Tb, molecule_.Tc) # melting, boiling, and critical points [K]
## 124.2 261.401014643 407.81

 

① search-based로 작동하며 모든 화합물이 대상이 되지 않음 

이를 개선하기 위해 다양한 머신러닝 모델이 도입되고 있음 

⑼ SMILES로부터 ADMET(흡수, 분포, 대사, 배출, 독성)를 예측하는 모델 : ADMET-AI 

 

 

2. 드로잉 [목차]

 유기화합물 분자식 그리기 (예 : peractic acid) 

 

from rdkit import Chem
from rdkit.Chem import Draw

# Define the SMILES strings for peractic acid
smiles = 'C=CC(=O)O'

# Convert the SMILES strings to RDKit molecule objects
molecule = Chem.MolFromSmiles(smiles)

# Draw the molecules without saving
Draw.MolToImage(molecule)

# Draw the molecules with saving
Draw.MolToFile(molecule, 'peractic_acid.png')

⑵ 유기화합물 electron density map 그리기 (ver. 1) (예 : peractic acid)

 

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import SimilarityMaps

# Generate a 3D structure
molecule = Chem.MolFromSmiles('CC(=O)OO')
molecule_3d = Chem.AddHs(molecule)
AllChem.EmbedMolecule(molecule_3d, AllChem.ETKDG())
AllChem.MMFFOptimizeMolecule(molecule_3d)

# Calculate Gasteiger charges
AllChem.ComputeGasteigerCharges(molecule_3d)

# Function to get atom charges
def GetAtomCharges(mol):
    charges = [float(mol.GetAtomWithIdx(i).GetProp('_GasteigerCharge')) for i in range(mol.GetNumAtoms())]
    return charges

# Draw electrostatic potential map
fig = SimilarityMaps.GetSimilarityMapFromWeights(molecule_3d, GetAtomCharges(molecule_3d), colorMap='jet', contourLines=10)

 

 

⑶ 유기화합물 electron density map 그리기 (ver. 2) (예 : peractic acid)

 

# Refernece: https://rdkit.readthedocs.io/en/latest/Cookbook.html


## STEP 1. make a random forest model

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
import numpy

# generate four molecules
m1 = Chem.MolFromSmiles('c1ccccc1')
m2 = Chem.MolFromSmiles('c1ccccc1CC')
m3 = Chem.MolFromSmiles('c1ccncc1')
m4 = Chem.MolFromSmiles('c1ccncc1CC')
mols = [m1, m2, m3, m4]

# generate fingeprints: Morgan fingerprint with radius 2
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

# convert the RDKit explicit vectors into numpy arrays
np_fps = []
for fp in fps:
  arr = numpy.zeros((1,))
  DataStructs.ConvertToNumpyArray(fp, arr)
  np_fps.append(arr)

# get a random forest classifiert with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=1123)

# train the random forest
# with the first two molecules being actives (class 1) and
# the last two being inactives (class 0)
ys_fit = [1, 1, 0, 0]
rf.fit(np_fps, ys_fit)

# use the random forest to predict a new molecule
m5 = Chem.MolFromSmiles('c1ccccc1O')
fp = numpy.zeros((1,))
DataStructs.ConvertToNumpyArray(AllChem.GetMorganFingerprintAsBitVect(m5, 2), fp)

print(rf.predict((fp,)))
print(rf.predict_proba((fp,)))



## STEP 2. run the random forest model for the input

from rdkit.Chem.Draw import SimilarityMaps

# helper function
def getProba(fp, predictionFunction):
  return predictionFunction((fp,))[0][1]

m5 = Chem.MolFromSmiles('CC(=O)OO')
fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5, SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))

⑷ 유기화합물 3차원 분자식 그리기

 

from rdkit import Chem
from rdkit.Chem import AllChem
import py3Dmol

def draw_3d_molecule(smiles):
    # Convert SMILES to RDKit molecule
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print("Invalid SMILES code.")
        return
    
    # Generate 3D coordinates for the molecule
    mol = Chem.AddHs(mol)  # Add hydrogens
    AllChem.EmbedMolecule(mol, AllChem.ETKDG())  # Embed molecule in 3D space
    
    # Convert RDKit molecule to 3Dmol.js viewable format
    mb = Chem.MolToMolBlock(mol)
    
    # Visualization with Py3Dmol
    viewer = py3Dmol.view(width=400, height=300)
    viewer.addModel(mb, 'mol')
    viewer.setStyle({'stick': {}})
    viewer.zoomTo()
    
    return viewer.show()

# Example usage
smiles_code = "CCO"  # Ethanol
draw_3d_molecule(smiles_code)

 

 

검토 1. 메탄올은 입체 장애를 최소화하기 위해 두 메틸기가 staggered 배향인 반면, 에탄올은 분자 내 수소결합에 의해 그렇지 않음

 

 

검토 2. 다음과 같은 콘쥬게이트 고리 화합물에서도 구조가 잘 구현됨

 

 

 

Figure. 1. cyclooctatetraene에 적용한 예시

 

③ 이렇게 머신러닝 모델만으로 새로운 지식을 만들 수 있는 게 chemoinformatics의 매력

 

 

3. 아미노산 [목차]

염기서열을 아미노산 서열로 바꾸는 함수

 

from Bio.Seq import Seq

# Assuming 'sequence' is your string of nucleotides:
sequence = "ACTCATTCTCCCCAGACGCCAAGGATGGTGGTCATGGCGCCCCGAACCCTCTTCCTGCTGCTCTCGGGGGCCCTGACCCTGACCGAGACCTGGGCGG"  # truncated for brevity

# Create a sequence object
seq_obj = Seq(sequence)

# Translate the sequence
amino_acid_sequence = seq_obj.translate(to_stop=True)

# Print the amino acid sequence
print(amino_acid_sequence)

 

⑵ AlphaFold2를 이용하여 아미노산 서열로부터 아미노산 구조에 관한 PBD 파일을 생성하는 함수 (추후 업데이트)

생체고분자 라이브러리 

 

 

4. 분광학 [목차]

SMILES-to-IR (추후 업데이트)

⑵ SMILES-to-NMR (추후 업데이트)

⑶ NMR-to-structure (추후 업데이트)

 

입력: 2023.11.30 02:40