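Introduction
Hi everyone! ✨ I'm a beginner engineer who wants to write code elegantly, and I just built my first AI-powered system! I love flashy colors, and purple is my absolute favorite 💜 As part of in-house training, I built an AI chatbot that answers customer inquiries, so I'd like to share what I made and how it went!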
For this project I used the following technologies! 🚀
- OpenAI API: GPT models that answer questions using AI.
- PostgreSQL (pgvector extension): a vector-search database that acts as the AI's memory.
- Flask: the brains of the app (the API backend).
- tiktoken: OpenAI's tokenizer library, used to keep text within the model's token limits so conversations with the AI go smoothly.
I hope to show that "even a beginner can have this much fun building things!" 😊
🔍 The AI technologies in detail
1. OpenAI GPT models ✨
🧩 Generating embeddings (what is an embedding?)
To let the AI grasp what a sentence roughly means, I used the text-embedding-ada-002 model to turn text into numbers (vectors)!
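What is an embedding?
It is a technique that converts text into numbers (a vector) so that texts with similar meanings can be found.
For example:
- "Please tell me your return policy."
- "I'd like to know the rules for refunds."
These two questions use different words but have similar meanings. Embeddings turn both into similar vectors, so they can be recognized as having similar content!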
The flow, roughly:
- Input the user's question (or the data to be stored)
- The model converts the text into a vector (an array of numbers)
- The vectors are compared to find the most similar text
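from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Please tell me your return policy",
)
vector = embedding.data[0].embedding  # a list of 1536 floats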
This vector is stored in the database and used for similarity search.
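A minimal sketch of that storage step, assuming a table named `embeddings` with `content` and `embedding` columns (the names used in the search query later in this post) and psycopg2-style `%s` placeholders. The helper names `to_pgvector_literal` and `build_insert` are hypothetical. pgvector accepts a vector written as text in the form `[x1,x2,...]`, so the list of floats is converted to that form and cast with `::vector`.

```python
from typing import Sequence, Tuple


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_insert(content: str, vector: Sequence[float]) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) for a psycopg2-style cursor.execute() call.

    Assumes a (hypothetical) table embeddings(content text, embedding vector(N)).
    """
    sql = "INSERT INTO embeddings (content, embedding) VALUES (%s, %s::vector)"
    return sql, (content, to_pgvector_literal(vector))


sql, params = build_insert("Returns are accepted within 30 days.", [0.5, -1, 2])
print(params[1])  # → [0.5,-1.0,2.0]
```

With psycopg2 this could be run as `cur.execute(*build_insert(text, vector))`. The pgvector-python package also provides adapters for passing vectors directly; the text form simply avoids the extra dependency.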
🚀 How the AI generates answers
When a question comes in, it is sent to GPT-4, which generates a suitable answer!
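from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant!"},
        {"role": "user", "content": "<the user's question>"},
    ],
)
answer = response.choices[0].message.content  # the generated reply text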
GPT is a total genius…! ✨
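The call above sends only the question. In the search flow described in section 2, the most similar FAQ entry is retrieved first, so that text has to reach GPT-4 as well. One common approach is to put it into the system message. The sketch below assumes that approach; the function name `build_messages` and the prompt wording are hypothetical, not the author's code.

```python
from typing import Dict, List


def build_messages(question: str, faq_snippets: List[str]) -> List[Dict[str, str]]:
    """Put retrieved FAQ text into the system message (hypothetical prompt wording)."""
    context = "\n".join(f"- {snippet}" for snippet in faq_snippets)
    system = (
        "You are a helpful assistant. Answer using only the FAQ below. "
        "If the FAQ does not cover the question, say so.\n\nFAQ:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "Please tell me your return policy.",
    ["Returns are accepted within 30 days."],
)
```

The result can be passed as `messages=messages` to `client.chat.completions.create(model="gpt-4", ...)`. Telling the model to answer only from the FAQ is a design choice that makes it less likely to invent policies.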
2. 🔍 Vector search and cosine similarity
Instead of conventional keyword search, I adopted vector search, which searches by the meaning of the text!
The pgvector extension for PostgreSQL 🐘
I used pgvector to store the data as vectors!
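A sketch of a matching schema, assuming the `embeddings` table from the search query later in this post and 1536 dimensions (the output size of text-embedding-ada-002). The statements are kept as Python strings so they can be run with any DB-API cursor. The function name `schema_statements` and the index name are hypothetical. The HNSW index type requires pgvector 0.5.0 or later, and `vector_cosine_ops` builds the index for the cosine-distance operator `<=>`.

```python
from typing import List

EMBEDDING_DIM = 1536  # output size of text-embedding-ada-002


def schema_statements(dim: int = EMBEDDING_DIM) -> List[str]:
    """DDL for a hypothetical embeddings table; run each with cursor.execute()."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        (
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "id bigserial PRIMARY KEY, "
            "content text NOT NULL, "
            f"embedding vector({dim}) NOT NULL)"
        ),
        # HNSW index (pgvector 0.5.0+) built for the cosine-distance operator <=>
        (
            "CREATE INDEX IF NOT EXISTS embeddings_embedding_hnsw "
            "ON embeddings USING hnsw (embedding vector_cosine_ops)"
        ),
    ]


for statement in schema_statements():
    print(statement)
```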
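How search works, and cosine similarity
🔹 User's question: "Please tell me your return policy."
- Convert the user's question into a vector 🧠
  - text-embedding-ada-002 turns the meaning of the question into numbers (a vector).
- Compare it with the vectors in the database to find the most similar information 🔍
  - Similarity is computed with cosine similarity.
  - Cosine similarity measures how alike two vectors are from the angle between them. It ranges from -1 to 1, and the closer it is to 1, the more similar the two texts are.
- Retrieve the FAQ entry with the highest similarity and generate an answer ✨
  - The entry about returns is the most similar, so "Returns are accepted within 30 days." is retrieved.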
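The comparison in the second step can be written out in plain Python. Cosine similarity is cos θ = (a · b) / (|a| |b|). The sketch below uses made-up 3-dimensional vectors purely for illustration (real ada-002 vectors have 1536 dimensions), and the helper names are hypothetical; in this system pgvector does this work inside the database.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos θ = (a · b) / (|a| |b|); 1 means same direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], faqs: Dict[str, Sequence[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Score every FAQ against the query and return the top-k (text, similarity)."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in faqs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


faqs = {
    "Returns are accepted within 30 days.": [0.9, 0.1, 0.0],  # toy vectors
    "Shipping takes 3 to 5 business days.": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "Please tell me your return policy."
best_text, best_score = most_similar(query, faqs)[0]
print(best_text)  # → Returns are accepted within 30 days.
```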
🔹 Search query using cosine similarity
SELECT content, (1 - (embedding <=> %s::vector)) AS similarity
FROM embeddings
ORDER BY similarity DESC
LIMIT 5;
The big advantage of this approach is that it finds answers with a similar meaning even when the words don't match exactly! ✨
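In pgvector, `<=>` returns the cosine distance, which is 1 minus the cosine similarity. That is why the query computes `1 - (embedding <=> %s)`. Sorting by distance ascending gives the same order as sorting by similarity descending, which the small pure-Python check below illustrates with made-up vectors. A hedged design note: the pgvector documentation describes its approximate indexes (HNSW, IVFFlat) being used for queries of the form `ORDER BY embedding <=> ... LIMIT n`. Ordering by the distance expression itself, as in the hypothetical `SEARCH_SQL` variant below (psycopg2-style named parameters), is therefore generally the index-friendly form.

```python
import math
from typing import Sequence

# Hypothetical index-friendly variant of the query above (psycopg2 named params).
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM embeddings
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Same definition as pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


rows = {"returns": [0.9, 0.1, 0.0], "shipping": [0.1, 0.9, 0.2], "payment": [0.5, 0.5, 0.5]}
query = [0.8, 0.2, 0.1]

by_distance = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
by_similarity = sorted(
    rows, key=lambda name: 1 - cosine_distance(query, rows[name]), reverse=True
)
assert by_distance == by_similarity  # same ranking either way
print(by_distance)
```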
3. Token control for the AI (what is tiktoken?)
A model can only take a limited amount of text at once, so I used tiktoken to split text into suitably sized pieces!
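What is a token?
GPT models process text in units called tokens.
For example (exact counts depend on the encoding; GPT-4 uses cl100k_base):
- "こんにちは、今日はいい天気ですね。" ("Hello, it's nice weather today.") → about 10 tokens
- "Hello, today is a nice day." → about 8 tokens
Each model has a token limit, so text that is too long has to be split before processing.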
💡 Tips for using tiktoken
- Split text into short chunks of at most 512 tokens each!
- Remove unnecessary line breaks and whitespace!
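import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = "This is a long passage... (omitted)"
tokens = enc.encode(text)
if len(tokens) > 512:
    tokens = tokens[:512]  # keep at most 512 tokens (the rest is dropped)
text = enc.decode(tokens)  # convert the tokens back into a string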
This keeps every input within the limit, so the AI can process it smoothly!
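The snippet above truncates, so anything past 512 tokens is lost. The two tips (splitting into chunks and removing redundant whitespace) can be sketched as plain functions that keep all of the text. This is a minimal version that splits on token boundaries; the names `normalize_whitespace` and `chunk_tokens` are hypothetical.

```python
import re
from typing import List


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs (including full-width spaces) and drop blank lines."""
    text = re.sub(r"[ \t\u3000]+", " ", text)  # many spaces -> one space
    text = re.sub(r" ?\n ?", "\n", text)        # trim spaces around line breaks
    text = re.sub(r"\n{2,}", "\n", text)        # remove empty lines
    return text.strip()


def chunk_tokens(tokens: List[int], max_tokens: int = 512) -> List[List[int]]:
    """Split a token list into consecutive chunks of at most max_tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i : i + max_tokens] for i in range(0, len(tokens), max_tokens)]


print(normalize_whitespace("Line one.\n\n\n   Line   two."))
print([len(c) for c in chunk_tokens(list(range(1030)))])  # → [512, 512, 6]
```

With tiktoken this would be used roughly as `[enc.decode(c) for c in chunk_tokens(enc.encode(normalize_whitespace(text)))]`. One caveat: a token boundary can fall inside a multi-byte Japanese character, so splitting on sentence boundaries (for example at "。") before counting tokens tends to give cleaner chunks.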
🔮 How the whole system works
- The user sends a question
- The question is vectorized and similar information is searched
- The AI generates an answer and replies!
- The answer is saved so it can help with future inquiries! 📚
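The four steps above can be wired together as follows. This is a sketch under stated assumptions: `FaqBot`, `embed`, and `generate` are hypothetical names, the store is an in-memory list standing in for the pgvector table, and the embedding and chat calls are passed in as functions so the flow can be tried offline. The comments follow the order of the four bullet points.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class FaqBot:
    embed: Callable[[str], Vector]             # e.g. wraps the embeddings API
    generate: Callable[[str, List[str]], str]  # e.g. wraps the chat completion call
    store: List[Tuple[str, Vector]] = field(default_factory=list)  # stands in for pgvector

    def add(self, text: str) -> None:
        self.store.append((text, self.embed(text)))

    def ask(self, question: str, k: int = 1) -> str:
        # 1. the question arrives; 2. vectorize it and rank stored entries
        query = self.embed(question)
        ranked = sorted(self.store, key=lambda item: cosine(query, item[1]), reverse=True)
        context = [text for text, _ in ranked[:k]]
        answer = self.generate(question, context)  # 3. generate the answer
        self.add(f"Q: {question}\nA: {answer}")     # 4. save it for future questions
        return answer


# Toy stand-ins so the flow runs offline (not real embeddings or GPT output).
def fake_embed(text: str) -> Vector:
    t = text.lower()
    return [t.count("return"), t.count("ship"), 1.0]


def fake_generate(question: str, context: List[str]) -> str:
    return context[0] if context else "Sorry, I don't know."


bot = FaqBot(fake_embed, fake_generate)
bot.add("Returns are accepted within 30 days.")
bot.add("Shipping takes 3 to 5 business days.")
print(bot.ask("What is your return policy?"))  # → Returns are accepted within 30 days.
```

In the real system, `embed` could wrap `client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding` and `generate` could wrap the GPT-4 chat completion call.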
During development it felt like the AI kept getting smarter, almost as if I were raising it…! 😊
I'll explain the detailed design and implementation of this system in my next post!
🚀 Future improvements
- Speed things up by adding a cache! ⚡️
- I'd like to try multilingual support!
- Make it look even more stylish in purple 💜!
- A smartphone-friendly UI
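💡 Summary
My first AI project was so much fun! 🎉
・I learned how AI understands text!
・I got a real feel for how vector search works!
・I got hands-on practice with Flask and database operations!
Above all, I rediscovered how fun it is to write code 😊
I'd be happy if this made you think "AI is fun!"
I'll keep taking on new projects! ✨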
🔗 Reference links