▶ 자연과학/▷ Python
【Python】 자연어 처리 및 LLM 유용 함수 모음
초록E
2024. 2. 10. 13:35
자연어 처리 및 LLM 유용 함수 모음
추천글 : 【Python】 파이썬 유용 함수 모음, 【알고리즘】 21강. NLP와 LLM
1. 자연어 처리 유용 함수 [본문]
2. Llama2 응용 [본문]
a. ollama.ai : Llama3, Phi-3, Mistral, Gemma, etc
b. GroqChat : Mixtral, Llama3, Gemma
1. 자연어 처리 유용 함수 [목차]
⑴ 주어진 문장을 자동으로 영어로 번역하는 함수
! pip install --upgrade googletrans httpx httpcore deep_translator
def to_english (sentence):
from deep_translator import GoogleTranslator
translated = GoogleTranslator(source='auto', target='en').translate(sentence)
return translated
print( to_english("나는 소년입니다.") )
# I am a boy.
print( to_english("단핵구") )
# monocytes
⑵ 주어진 문장을 자동으로 한국어로 번역하는 함수
def to_korean (sentence):
from deep_translator import GoogleTranslator
translated = GoogleTranslator(source='auto', target='ko').translate(sentence)
return translated
print( to_korean("I am a boy.") )
# 저는 남자입니다.
⑶ 주어진 문장을 자동으로 일본어로 번역하는 함수
def to_japanese (sentence):
from deep_translator import GoogleTranslator
translated = GoogleTranslator(source='auto', target='ja').translate(sentence)
return translated
print( to_japanese("I am a boy.") )
# 私は男の子です。
⑷ 임의의 가변 길이 자연어 문장을 그 의미를 고려하여 384차원으로 만드는 함수 (cf. CELLama)
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
import numpy as np
from scipy.sparse import csr_matrix
import pandas as pd
from sklearn.neighbors import NearestNeighbors
import torch
from torch.utils.data import DataLoader, TensorDataset
from xgboost import XGBClassifier
def sentences_to_embedding(sentences):
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = embedding_function.embed_documents(sentences)
emb_res = np.asarray(db)
return emb_res
sentences = []
sentences.append("What is the meaning of: obsolete")
sentences.append("What is the meaning of: old-fashioned")
sentences.append("What is the meaning of: demagogue")
emb_res = sentences_to_embedding(sentences)
2. Llama2 응용 [목차]
⑴ Llama2를 이용한 영한 번역
import ollama
def english_to_korean(sentence):
content = 'Translate "' + sentence + '" to Korean. Output only the translated sentence.'
response = ollama.chat(model='llama2', messages=[
{
'role': 'user',
'content': content,
},
])
return response['message']['content']
sentence = "I am a boy."
english_to_korean(sentence)
⑵ 주어진 문장이 화학식을 포함하는지 판단하기
import ollama
def is_chemical_formula(sentence):
content = 'Please determine if "' + sentence + '" contains a chemical formula or not. If it is correct, answer "sure"; otherwise, "no".'
response = ollama.chat(model='llama2', messages=[
{
'role': 'user',
'content': content,
},
])
return response['message']['content']
sentence = "NH2OH is an amine."
result = is_chemical_formula(sentence)
print(result)
print('sure' in result.lower())
sentence = "I am a boy."
result = is_chemical_formula(sentence)
print(result)
print('sure' in result.lower())
### Output ###
'''
The term "NH2OH" does contain a chemical formula, so the answer is "yes" or "sure".
True
The statement "I am a boy" does not contain any chemical formulas, so the answer is "no".
False
'''
⑶ 주어진 명사가 고유명사(proper noun)인지 보통명사(common noun)인지를 판단하기
import ollama
def is_proper_noun(noun):
content = 'Please determine if "' + noun + '" is a proper noun or common noun. If it is a proper noun, answer "proper"; otherwise, "common".'
response = ollama.chat(model='llama2', messages=[
{
'role': 'user',
'content': content,
},
])
return response['message']['content']
sentence = "Pencil"
result = is_proper_noun(sentence)
print(result)
print('proper' in result.lower())
sentence = "Feynman"
result = is_proper_noun(sentence)
print(result)
print('proper' in result.lower())
### Output ###
'''
"Pencil" is a common noun. Therefore, the answer is "common".
False
"Feynman" is a proper noun. Therefore, the answer is "proper".
True
'''
입력: 2024.02.10 13:34