유기화학 관련 파이썬 함수 모음 (화학정보학, chemoinformatics; structural bioinformatics)
추천글 : 【유기화학】 유기화학 목차, 【Python】 파이썬 유용 함수 모음
1. 명명법 [본문]
2. 드로잉 [본문]
3. 아미노산 [본문]
4. 분광학 [본문]
a. 생물정보학
⑴ SMILES(simplified molecular-input line entry system) : 짧은 ASCII 문자열 표현
① 이중결합은 '='로, 삼중결합은 '#'로 나타냄
② 고리형 화합물의 경우, 1, 2, ···과 같은 숫자를 부여하여, 선형 분자의 끝과 끝이 연결돼 고리를 이루는 것을 나타냄
○ 예 : CN1C=NC2=C1C(=O)N(C(=O)N2C)C
③ C는 일반 탄소를 나타내는 반면, c는 방향족 탄소를 나타냄
○ C1CCCCC1 : cyclohexane
○ c1ccccc1 : benzene
④ 대괄호를 써서 보다 복잡한 경우를 나타낼 수 있음
○ [N+]와 같이 전하 표시도 할 수 있음
⑤ @을 써서 입체 배열을 나타낼 수 있음
○ 예 : C[C@H](O)[C@H](O)C
⑥ /, \를 써서 방향이 같은지 다른지를 나타내어 EZ 이성질체를 나타낼 수 있음
○ 예 : CCC/C(=C/C(=O)OCC)/C(=O)OCC
⑵ 유기화합물을 SMILES로 바꾸는 코드
rdkit.Chem.MolToSmiles(Chem.MolFromFASTA(sequence, flavor = 1))
"""
flavor: (optional)
0 Protein, L amino acids (default)
1 Protein, D amino acids
2 RNA, no cap
3 RNA, 5’ cap
4 RNA, 3’ cap
5 RNA, both caps
6 DNA, no cap
7 DNA, 5’ cap
8 DNA, 3’ cap
9 DNA, both caps
"""
⑶ SMILES, IUPAC 상호 변환 코드 : transformer 활용
# reference: https://github.com/Kohulan/Smiles-TO-iUpac-Translator
! pip install STOUT-pypi
! pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git
from STOUT import translate_forward, translate_reverse
# SMILES to IUPAC name translation
SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of "+SMILES+" is: "+IUPAC_name)
# IUPAC name to SMILES translation
IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of "+IUPAC_name+" is: "+SMILES)
① IUPAC 입력이 완전히 정확하지 않아도 SMILES 코드 변환이 작동할 수 있음
⑷ 임의의 화학식이 주어져 있을 때 IUPAC 명명법을 알아내는 방법
① 단계 1. SMILES-to-drawing 함수를 여러 번 시도하여 주어진 화합물을 나타내는 SMILES 코드 찾아내기
② 단계 2. SMILES-to-IUPAC 함수를 실행시켜 최종 명명법 구하기
⑸ SMILES로부터 분자량을 구하는 코드
from rdkit import Chem
from rdkit.Chem import Descriptors
def calculate_molecular_weight(smiles):
molecule = Chem.MolFromSmiles(smiles)
return Descriptors.ExactMolWt(molecule)
# Example usage
smiles_code = "C1=CC=C(C=C1)O" # SMILES code for phenol
molecular_weight = calculate_molecular_weight(smiles_code)
print(f"Molecular Weight: {molecular_weight}")
⑹ SMILES로부터 방향족성을 판단하는 코드
from rdkit import Chem
def is_aromatic(smiles):
molecule = Chem.MolFromSmiles(smiles)
if molecule is None:
return False
return any(atom.GetIsAromatic() for atom in molecule.GetAtoms())
# Example usage
cyclohexane = "C1CCCCC1" # cyclohexane
benzene = "c1ccccc1" # benzene
imidazole = "C1=CN=CN1" # 1H-imidazole
print(f"The {cyclohexane} is {'aromatic' if is_aromatic(cyclohexane) else 'not aromatic'}")
print(f"The {benzene} is {'aromatic' if is_aromatic(benzene) else 'not aromatic'}")
print(f"The {imidazole} is {'aromatic' if is_aromatic(imidazole) else 'not aromatic'}")
⑺ SMILES로부터 쌍극자 모멘트를 계산하는 코드
# conda install -c psi4 psi4
import psi4
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
def calculate_dipole_moment(smiles):
# Convert SMILES to molecule
mol = Chem.MolFromSmiles(smiles)
# Add Hydrogens
mol = Chem.AddHs(mol)
# Generate 3D coordinates
AllChem.EmbedMolecule(mol, AllChem.ETKDG())
# Extract coordinates
conf = mol.GetConformer()
xyz = ''
for atom in mol.GetAtoms():
pos = conf.GetAtomPosition(atom.GetIdx())
xyz += f"{atom.GetSymbol()} {pos.x} {pos.y} {pos.z}\n"
# Set up Psi4
psi4.set_memory('500 MB')
psi4.set_options({'basis': 'sto-3g'})
# Calculate dipole moment using Psi4
psi4_mol = psi4.geometry(xyz)
psi4.energy('scf')
dipole_moment = psi4.variable('SCF DIPOLE')
return dipole_moment
### Example usage
smiles_code = "CCO" # Example SMILES code for ethanol
dipole_moment = calculate_dipole_moment(smiles_code)
print(f"Dipole Moment: {dipole_moment} Debye")
# Dipole Moment: [ 0.04250251 0.20600936 -0.52850913] Debye
dipole_vector = np.array([0.04250251, 0.20600936, -0.52850913])
magnitude = np.linalg.norm(dipole_vector)
print(f"Magnitude of Dipole Moment: {magnitude} Debye")
# Magnitude of Dipole Moment: 0.5688305725409514 Debye
⑻ SMILES로부터 끓는점(bp), 녹는점(mp), 임계온도를 구하는 함수
# reference: https://thermo.readthedocs.io/thermo.chemical.html
from thermo.chemical import Chemical
N2 = Chemical('Nitrogen')
print(N2.Tm, N2.Tb, N2.Tc) # melting, boiling, and critical points [K]
## 63.15 77.3549950205 126.192
molecule = Chemical('CC(C)C')
print(molecule.Tm, molecule.Tb, molecule.Tc) # melting, boiling, and critical points [K]
## 124.2 261.401014643 407.81
molecule_ = Chemical('2-methylpropane')
print(molecule_.Tm, molecule_.Tb, molecule_.Tc) # melting, boiling, and critical points [K]
## 124.2 261.401014643 407.81
① search-based로 작동하며 모든 화합물이 대상이 되지 않음 : 이를 개선하기 위해 다양한 머신러닝 모델이 도입되고 있음
② 응용 1. 끓는점, 녹는점 예제 만들기
import random
from rdkit import Chem
from rdkit.Chem import Draw
from thermo.chemical import Chemical
from STOUT import translate_forward
from IPython.display import display
from PIL import Image, ImageDraw, ImageFont
def generate_alkanes(n_carbons):
if n_carbons == 1:
return [Chem.MolFromSmiles('C')]
elif n_carbons == 2:
return [Chem.MolFromSmiles('CC')]
smaller_alkanes = generate_alkanes(n_carbons - 1)
new_alkanes = set()
for mol in smaller_alkanes:
for atom in mol.GetAtoms():
if atom.GetDegree() < 4: # 탄소는 최대 4개의 결합을 가질 수 있음
new_mol = Chem.RWMol(mol)
new_idx = new_mol.AddAtom(Chem.Atom(6))
new_mol.AddBond(atom.GetIdx(), new_idx, Chem.BondType.SINGLE)
Chem.SanitizeMol(new_mol)
smiles = Chem.MolToSmiles(new_mol, canonical=True)
new_alkanes.add(smiles)
return [Chem.MolFromSmiles(smiles) for smiles in new_alkanes]
def draw_and_display_properties(isomer1, isomer2):
# 이성질체의 SMILES로부터 thermo의 Chemical 객체 생성
smiles1 = Chem.MolToSmiles(isomer1)
smiles2 = Chem.MolToSmiles(isomer2)
chemical1 = Chemical(smiles1)
chemical2 = Chemical(smiles2)
# IUPAC 이름 변환
iupac_name1 = translate_forward(smiles1)
iupac_name2 = translate_forward(smiles2)
# 끓는점, 녹는점 계산 (°C로 변환)
melting_point1 = chemical1.Tm - 273.15 # K to °C
boiling_point1 = chemical1.Tb - 273.15 # K to °C
melting_point2 = chemical2.Tm - 273.15 # K to °C
boiling_point2 = chemical2.Tb - 273.15 # K to °C
# 구조 이성질체 그리기
img1 = Draw.MolToImage(isomer1, size=(300, 300))
img2 = Draw.MolToImage(isomer2, size=(300, 300))
# PIL로 이미지 결합 및 텍스트 추가
combined_img = Image.new('RGB', (600, 400), (255, 255, 255))
combined_img.paste(img1, (0, 0))
combined_img.paste(img2, (300, 0))
draw = ImageDraw.Draw(combined_img)
font = ImageFont.load_default(size=16)
# 첫 번째 이성질체 텍스트 추가
draw.text((10, 310), f"IUPAC: {iupac_name1}", fill=(0, 0, 0), font=font)
draw.text((10, 330), f"Melting Point: {melting_point1:.2f} °C", fill=(0, 0, 0), font=font)
draw.text((10, 350), f"Boiling Point: {boiling_point1:.2f} °C", fill=(0, 0, 0), font=font)
# 두 번째 이성질체 텍스트 추가
draw.text((310, 310), f"IUPAC: {iupac_name2}", fill=(0, 0, 0), font=font)
draw.text((310, 330), f"Melting Point: {melting_point2:.2f} °C", fill=(0, 0, 0), font=font)
draw.text((310, 350), f"Boiling Point: {boiling_point2:.2f} °C", fill=(0, 0, 0), font=font)
# 비교 결과 추가
#melting_comparison = f"{iupac_name1} has higher melting point" if melting_point1 > melting_point2 else f"{iupac_name2} has higher melting point"
#boiling_comparison = f"{iupac_name1} has higher boiling point" if boiling_point1 > boiling_point2 else f"{iupac_name2} has higher boiling point"
#draw.text((10, 370), melting_comparison, fill=(0, 0, 255), font=font)
#draw.text((310, 370), boiling_comparison, fill=(0, 0, 255), font=font)
# 이미지 출력
display(combined_img)
# n값 설정 (예: n=7, 탄소 7개)
n = 7
alkanes = generate_alkanes(n)
isomer1, isomer2 = random.sample(alkanes, 2)
draw_and_display_properties(isomer1, isomer2)
⑼ SMILES로부터 산도를 판단하는 코드 (추후 업데이트)
⑽ SMILES로부터 ADMET(흡수, 분포, 대사, 배출, 독성)를 예측하는 모델 : ADMET-AI
⑾ PubChem으로부터 IUPAC 이름들을 생성하는 함수
import requests
import time
def fetch_iupac_names_from_pubchem(start_cid, count):
iupac_names = []
base_url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{}/property/IUPACName/JSON"
current_cid = start_cid
while len(iupac_names) < count:
response = requests.get(base_url.format(current_cid))
if response.status_code == 200:
data = response.json()
if 'PropertyTable' in data and 'Properties' in data['PropertyTable']:
for prop in data['PropertyTable']['Properties']:
if 'IUPACName' in prop:
iupac_names.append(prop['IUPACName'])
if len(iupac_names) >= count:
break
else:
print(f"Failed to fetch data for CID {current_cid}")
current_cid += 1
time.sleep(0.1) # To prevent hitting the API rate limit
return iupac_names
# Fetch 100 IUPAC names starting from CID 1
iupac_names = fetch_iupac_names_from_pubchem(start_cid=1, count=100)
○ PubChem으로부터 IUPAC 명명법 크롤링 → IUPAC-to-SMILES → SMILES-to-image
○ 예제 생성 알고리즘에서 추가로 SMILES-to-IUPAC과 원래 IUPAC이 일치하는지 확인함으로써 다음과 같은 부수적인 효과가 발생
○ 효과 1. 부적절한 IUPAC 명명법을 제외
○ 효과 2. 컴퓨터가 이해하기 난해한 명명법을 제거함으로써 명명법 예제의 난이도를 조절
⑿ PubChem으로부터 입체배열을 갖는 SMILES 이름들을 생성하는 함수
import requests
import time
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers, StereoEnumerationOptions
def fetch_smiles_from_pubchem(start_cid, count):
smiles_list = []
base_url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{}/property/CanonicalSMILES/JSON"
current_cid = start_cid
while len(smiles_list) < count:
response = requests.get(base_url.format(current_cid))
if response.status_code == 200:
data = response.json()
if 'PropertyTable' in data and 'Properties' in data['PropertyTable']:
for prop in data['PropertyTable']['Properties']:
if 'CanonicalSMILES' in prop:
smiles_list.append(prop['CanonicalSMILES'])
if len(smiles_list) >= count:
break
else:
print(f"Failed to fetch data for CID {current_cid}")
current_cid += 1
time.sleep(0.1) # To prevent hitting the API rate limit
return smiles_list
def generate_stereoisomers(smiles, max_isomers=5):
mol = Chem.MolFromSmiles(smiles)
if mol is None:
return []
opts = StereoEnumerationOptions(onlyUnassigned=True, maxIsomers=max_isomers)
isomers = list(EnumerateStereoisomers(mol, options=opts))
smiles_isomers = [Chem.MolToSmiles(isomer, isomericSmiles=True) for isomer in isomers]
return smiles_isomers
# Fetch 100 canonical SMILES codes starting from CID 1
smiles_codes = fetch_smiles_from_pubchem(start_cid=1, count=100)
# Generate stereoisomers for each fetched SMILES code
all_stereoisomers = []
for i, smiles in enumerate(smiles_codes, start=1):
stereoisomers = generate_stereoisomers(smiles)
all_stereoisomers.extend(stereoisomers)
print(f"Canonical SMILES {i}: {smiles}")
for j, isomer in enumerate(stereoisomers, start=1):
print(f" Stereoisomer {j}: {isomer}")
# Optionally, limit the number of generated stereoisomers for display
max_display = 100
print("\nGenerated Stereoisomers (limited to first {}):".format(max_display))
for i, isomer in enumerate(all_stereoisomers[:max_display], start=1):
print(f"{i}: {isomer}")
''' Visualization code example
from rdkit import Chem
from rdkit.Chem import Draw
smiles = 'O=C(O)C[C@@]1(Cl)C=CC(=O)O1'
molecule = Chem.MolFromSmiles(smiles)
Draw.MolToImage(molecule)
'''
○ PubChem으로부터 SMILES 명명법 크롤링 → canonical SMILES to stereochemical SMILES → SMILES-to-image
② 응용 2. 위 코드는 R 배향을 특히 선호하는 구조를 도출함
○ 시험에 출제된 이성질체 62개를 조사한 결과, R형이 답인 경우가 40개, S형이 답인 경우가 22개로 시험에서도 R형이 더 선호됨
○ R형과 S형의 이론적 비율은 동일할 것이므로, 오히려 인지 편향성이 있는 게 아닌가 추정됨
2. 드로잉 [목차]
⑴ 유기화합물 분자식 그리기 (예 : peracetic acid)
from rdkit import Chem
from rdkit.Chem import Draw
# Define the SMILES strings for peracetic acid
smiles = 'C=CC(=O)O'
# Convert the SMILES strings to RDKit molecule objects
molecule = Chem.MolFromSmiles(smiles)
# Draw the molecules without saving
Draw.MolToImage(molecule)
# Draw the molecules with saving
Draw.MolToFile(molecule, 'peracetic_acid.png')
⑵ 유기화합물 electron density map 그리기 (ver. 1) (예 : peracetic acid)
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import SimilarityMaps
# Generate a 3D structure
molecule = Chem.MolFromSmiles('CC(=O)OO')
molecule_3d = Chem.AddHs(molecule)
AllChem.EmbedMolecule(molecule_3d, AllChem.ETKDG())
AllChem.MMFFOptimizeMolecule(molecule_3d)
# Calculate Gasteiger charges
AllChem.ComputeGasteigerCharges(molecule_3d)
# Function to get atom charges
def GetAtomCharges(mol):
charges = [float(mol.GetAtomWithIdx(i).GetProp('_GasteigerCharge')) for i in range(mol.GetNumAtoms())]
return charges
# Draw electrostatic potential map
fig = SimilarityMaps.GetSimilarityMapFromWeights(molecule_3d, GetAtomCharges(molecule_3d), colorMap='jet', contourLines=10)
⑶ 유기화합물 electron density map 그리기 (ver. 2) (예 : peracetic acid)
# Refernece: https://rdkit.readthedocs.io/en/latest/Cookbook.html
## STEP 1. make a random forest model
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
import numpy
# generate four molecules
m1 = Chem.MolFromSmiles('c1ccccc1')
m2 = Chem.MolFromSmiles('c1ccccc1CC')
m3 = Chem.MolFromSmiles('c1ccncc1')
m4 = Chem.MolFromSmiles('c1ccncc1CC')
mols = [m1, m2, m3, m4]
# generate fingeprints: Morgan fingerprint with radius 2
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]
# convert the RDKit explicit vectors into numpy arrays
np_fps = []
for fp in fps:
arr = numpy.zeros((1,))
DataStructs.ConvertToNumpyArray(fp, arr)
np_fps.append(arr)
# get a random forest classifiert with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=1123)
# train the random forest
# with the first two molecules being actives (class 1) and
# the last two being inactives (class 0)
ys_fit = [1, 1, 0, 0]
rf.fit(np_fps, ys_fit)
# use the random forest to predict a new molecule
m5 = Chem.MolFromSmiles('c1ccccc1O')
fp = numpy.zeros((1,))
DataStructs.ConvertToNumpyArray(AllChem.GetMorganFingerprintAsBitVect(m5, 2), fp)
print(rf.predict((fp,)))
print(rf.predict_proba((fp,)))
## STEP 2. run the random forest model for the input
from rdkit.Chem.Draw import SimilarityMaps
# helper function
def getProba(fp, predictionFunction):
return predictionFunction((fp,))[0][1]
m5 = Chem.MolFromSmiles('CC(=O)OO')
fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5, SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))
⑷ 유기화합물 3차원 분자식 그리기
from rdkit import Chem
from rdkit.Chem import AllChem
import py3Dmol
def draw_3d_molecule(smiles):
# Convert SMILES to RDKit molecule
mol = Chem.MolFromSmiles(smiles)
if mol is None:
print("Invalid SMILES code.")
return
# Generate 3D coordinates for the molecule
mol = Chem.AddHs(mol) # Add hydrogens
AllChem.EmbedMolecule(mol, AllChem.ETKDG()) # Embed molecule in 3D space
# Convert RDKit molecule to 3Dmol.js viewable format
mb = Chem.MolToMolBlock(mol)
# Visualization with Py3Dmol
viewer = py3Dmol.view(width=400, height=300)
viewer.addModel(mb, 'mol')
viewer.setStyle({'stick': {}})
viewer.zoomTo()
return viewer.show()
# Example usage
smiles_code = "CCO" # Ethanol
draw_3d_molecule(smiles_code)
① 검토 1. 메탄올은 입체 장애를 최소화하기 위해 두 메틸기가 staggered 배향인 반면, 에탄올은 분자 내 수소결합에 의해 그렇지 않음
② 검토 2. 다음과 같은 콘쥬게이트 고리 화합물에서도 구조가 잘 구현됨
Figure. 1. cyclooctatetraene에 적용한 예시
③ 이렇게 머신러닝 모델만으로 새로운 지식을 만들 수 있는 게 chemoinformatics의 매력
⑸ 유기화합물 2차원 분자식에 RS 명명법 표시하기
from rdkit import Chem
from rdkit.Chem import AllChem, Draw
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
from PIL import Image, ImageDraw, ImageFont
import matplotlib.font_manager as fm
def get_stereochemistry(mol):
Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
stereo_info = {}
for atom in mol.GetAtoms():
if atom.HasProp('_CIPCode'):
stereo_info[atom.GetIdx()] = atom.GetProp('_CIPCode')
return stereo_info
def draw_2d_molecule_with_stereochemistry(smiles, filename="molecule.png"):
# Convert SMILES to RDKit molecule
mol = Chem.MolFromSmiles(smiles)
if mol is None:
print("Invalid SMILES code.")
return
# Generate 2D coordinates for the molecule
AllChem.Compute2DCoords(mol)
# Get stereochemistry information
stereo_info = get_stereochemistry(mol)
print(stereo_info)
# Draw the molecule
img = Draw.MolToImage(mol, size=(300, 300), kekulize=True, wedgeBonds=True)
# Convert the RDKit image to a PIL image
pil_img = img.convert("RGBA")
# Create a drawing context
draw = ImageDraw.Draw(pil_img)
# Load fonts
font_size = 18 # You can change the font size here
try:
# Use DejaVuSans.ttf for regular text
font_path = fm.findfont(fm.FontProperties(family="DejaVu Sans"))
font = ImageFont.truetype(font_path, font_size)
# Use DejaVuSans-Oblique.ttf for italic text
italic_font_path = fm.findfont(fm.FontProperties(family="DejaVu Sans", style="italic"))
italic_font = ImageFont.truetype(italic_font_path, font_size)
except IOError:
font = ImageFont.load_default()
italic_font = font # Fallback if no italic font is found
# Generate 2D coordinates for the drawing
AllChem.Compute2DCoords(mol)
conf = mol.GetConformer()
# Get 2D coordinates for each atom
coords = conf.GetPositions()
# Add stereo annotations
for atom_idx, stereo in stereo_info.items():
atom = mol.GetAtomWithIdx(atom_idx)
pos = coords[atom_idx]
# Calculate pixel position from molecule coordinates
x = pos[0] * 35 + 135
y = -pos[1] * 35 + 150
# Draw the text annotation with non-italicized brackets and italicized R/S
draw.text((x, y), "(", fill=(0, 0, 0), font=font)
draw.text((x + 10, y), stereo, fill=(0, 0, 0), font=italic_font)
draw.text((x + 30, y), ")", fill=(0, 0, 0), font=font)
# Save the image to a file
pil_img.save(filename)
print(f"Image saved as {filename}")
# Example usage
smiles_code = "C[C@H](O)[C@H](O)C" # Example molecule with stereochemistry
draw_2d_molecule_with_stereochemistry(smiles_code, "molecule_with_stereochemistry.png")
⑹ 유기화합물 3차원 분자식에 RS 명명법 표시하기
from rdkit import Chem
from rdkit.Chem import AllChem, Draw
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
import py3Dmol
def get_stereochemistry(mol):
Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
stereo_info = {}
for atom in mol.GetAtoms():
if atom.HasProp('_CIPCode'):
stereo_info[atom.GetIdx()] = atom.GetProp('_CIPCode')
return stereo_info
def draw_3d_molecule_with_stereochemistry(smiles):
# Convert SMILES to RDKit molecule
mol = Chem.MolFromSmiles(smiles)
if mol is None:
print("Invalid SMILES code.")
return
# Generate 3D coordinates for the molecule
mol = Chem.AddHs(mol) # Add hydrogens
AllChem.EmbedMolecule(mol, AllChem.ETKDG()) # Embed molecule in 3D space
# Get stereochemistry information
stereo_info = get_stereochemistry(mol)
print(stereo_info)
# Convert RDKit molecule to 3Dmol.js viewable format
mb = Chem.MolToMolBlock(mol)
# Visualization with Py3Dmol
viewer = py3Dmol.view(width=400, height=300)
viewer.addModel(mb, 'mol')
viewer.setStyle({'stick': {}})
# Annotate the stereochemistry on the molecule
for atom_idx, stereo in stereo_info.items():
pos = mol.GetConformer().GetAtomPosition(atom_idx)
viewer.addLabel(stereo, {
'position': {'x': pos.x, 'y': pos.y, 'z': pos.z},
'backgroundColor': 'black',
'fontColor': 'white',
'fontSize': 14,
'showBackground': True
})
viewer.zoomTo()
return viewer.show()
# Example usage
smiles_code = "C[C@H](O)[C@H](O)C" # Example molecule with stereochemistry
draw_3d_molecule_with_stereochemistry(smiles_code)
⑺ 탄소수에 따른 알케인의 구조이성질체 모두 그리는 코드
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.rdchem import Mol
from IPython.display import display
def generate_alkanes(n_carbons):
if n_carbons == 1:
return [Chem.MolFromSmiles('C')]
elif n_carbons == 2:
return [Chem.MolFromSmiles('CC')]
smaller_alkanes = generate_alkanes(n_carbons - 1)
new_alkanes = set()
for mol in smaller_alkanes:
for atom in mol.GetAtoms():
if atom.GetDegree() < 4: # 탄소는 최대 4개의 결합을 가질 수 있음
new_mol = Chem.RWMol(mol)
new_idx = new_mol.AddAtom(Chem.Atom(6))
new_mol.AddBond(atom.GetIdx(), new_idx, Chem.BondType.SINGLE)
Chem.SanitizeMol(new_mol)
smiles = Chem.MolToSmiles(new_mol, canonical=True)
new_alkanes.add(smiles)
return [Chem.MolFromSmiles(smiles) for smiles in new_alkanes]
# C7H16의 모든 구조 이성질체 생성 및 시각화
n_carbons = 7
alkanes = generate_alkanes(n_carbons)
img = Draw.MolsToGridImage(alkanes, molsPerRow=5, subImgSize=(200, 200))
# 이미지를 직접 디스플레이 (Jupyter 노트북에서만 동작)
display(img)
화학식 | 구조이성질체의 수 |
C3H8 | 1 |
C4H10 | 2 |
C5H12 | 3 |
C6H14 | 5 |
C7H16 | 9 |
C8H18 | 18 |
C9H20 | 35 |
C10H22 | 75 |
C11H24 | 159 |
C12H26 | 355 |
C13H28 | 802 |
C14H30 | 1,858 |
C15H32 | 4,347 |
C16H34 | 10,359 |
C17H36 | 24,894 |
C18H39 | 60,523 |
C19H40 | 148,284 |
C20H42 | 366,319 |
C30H62 | 4,111,846,763 |
C40H82 | 62,481,801,147,341 |
Table. 1. 탄소수에 따른 알케인의 구조이성질체의 수
⑻ 트리 자료구조를 활용하여 모든 알케인 치환기를 그리는 코드 (n_carbons = 6부터 중복이 존재함)
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.rdchem import AtomValenceException
from IPython.display import display
class Node:
def __init__(self, id):
self.id = id
self.children = []
class Tree:
def __init__(self, root):
self.nodes = [root]
def add_node(self, parent_id, node_id, max_nodes):
if len(self.nodes) >= max_nodes or len(self.nodes[parent_id].children) >= 3:
return None
new_node = Node(node_id)
self.nodes.append(new_node)
self.nodes[parent_id].children.append(new_node)
return new_node
def generate_trees(current_tree, current_id, max_nodes):
if len(current_tree.nodes) == max_nodes:
if is_valid_tree(current_tree):
return [current_tree]
else:
return []
trees = []
for node_id in range(len(current_tree.nodes)):
new_tree = Tree(Node(0))
new_tree.nodes = [Node(i) for i in range(len(current_tree.nodes))]
for i in range(len(current_tree.nodes)):
new_tree.nodes[i].children = [new_tree.nodes[n.id] for n in current_tree.nodes[i].children]
new_node = new_tree.add_node(node_id, current_id, max_nodes)
if new_node and is_valid_tree(new_tree):
trees.extend(generate_trees(new_tree, current_id + 1, max_nodes))
return trees
def is_valid_tree(tree):
for node in tree.nodes:
children_counts = [len(child.children) for child in node.children]
if any(children_counts[i] > children_counts[i+1] for i in range(len(children_counts)-1)):
return False
return True
def visualize_organic_structures(trees):
mols = []
for tree in trees:
mol = Chem.RWMol()
atom_index = {}
for node in tree.nodes:
if node.id == 0:
atom = Chem.Atom(6)
atom.SetNumRadicalElectrons(1) # Set radical electron for the root
else:
atom = Chem.Atom(6)
atom_index[node.id] = mol.AddAtom(atom)
for node in tree.nodes:
for child in node.children:
mol.AddBond(atom_index[node.id], atom_index[child.id], Chem.BondType.SINGLE)
mols.append(mol)
return mols
# Input the total number of nodes
n_nodes = 4
# Start with a single root node
initial_tree = Tree(Node(0))
all_trees = generate_trees(initial_tree, 1, n_nodes)
all_mols = visualize_organic_structures(all_trees)
# 시각화
img = Draw.MolsToGridImage(all_mols, molsPerRow=5, subImgSize=(200, 200), useSVG=False)
display(img)
① n_carbons = 3 (총 2개)
② n_carbons = 4 (총 4개)
② n_carbons = 5 (총 8개)
③ n_carbons = 6 (총 17개)
⑴ 염기서열을 아미노산 서열로 바꾸는 함수
from Bio.Seq import Seq
# Assuming 'sequence' is your string of nucleotides:
sequence = "ACTCATTCTCCCCAGACGCCAAGGATGGTGGTCATGGCGCCCCGAACCCTCTTCCTGCTGCTCTCGGGGGCCCTGACCCTGACCGAGACCTGGGCGG" # truncated for brevity
# Create a sequence object
seq_obj = Seq(sequence)
# Translate the sequence
amino_acid_sequence = seq_obj.translate(to_stop=True)
# Print the amino acid sequence
print(amino_acid_sequence)
⑵ AlphaFold2를 이용하여 아미노산 서열로부터 아미노산 구조에 관한 PDB 파일을 생성하는 함수 (추후 업데이트)
⑴ 개요
① 화학식으로부터 MS, IR, NMR 스펙트럼을 예측하거나, MS, IR, NMR 데이터를 통해 화학식을 예측하는 연구가 활발히 진행되고 있음
② 이 연구는 딥러닝 기술의 발전과 함께 현재 크게 진전되고 있음
○ 예 : NMR-TS
입력: 2023.11.30 02:40
수정: 2024.06.11 12:26
'▶ 자연과학 > ▷ 화학정보학' 카테고리의 다른 글
【화학정보학】 프로테오믹스 분석 파이프라인 (1) | 2024.03.31 |
---|---|
【화학정보학】 물질 라이브러리 (0) | 2023.12.11 |
【화학정보학】 생물학 실험에 사용되는 형광물질 라이브러리 (7) | 2022.05.04 |
【화학정보학】 저분자 약물 데이터베이스 종류 (0) | 2022.04.27 |
【화학정보학】 New Cancer Drugs Approved by the FDA in 20 years (0) | 2022.01.08 |
최근댓글