Welcome to the World of Building Your Own AI Agent
Many developers dream that "if I just use LangChain and toss some PDFs into a Vector DB, I'll get a smart agent." What awaits them, however, is the reality of "an AI that never quite meshes and whose logic never holds together, no matter how much I tune it."
In this article I explain, holding nothing back, why so many DIY builders and companies stall and give up at Phase 2 (Vector RAG), and what "low-level wall" and "skills to acquire" stand between them and Phase 3 (AST-structured memory).
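1. The inconvenient truth of being "stuck at Phase 2"
Right now, roughly nine out of ten AI agents out in the world are marking time at Phase 2 (Vector RAG).
The real flaw: Semantic Fragmentation
In Phase 2, text is chopped up mechanically into fixed units such as "1,000 characters each."
- The fatal problem: take the conversation "Yesterday I ate curry with Master. But Master said pasta would be nice tomorrow." Once it is chopped up, the AI can still pick up the fragments (vectors) "curry" and "pasta," but it cannot reach the logical conclusion of what Master actually wants to eat tomorrow.
- The current limit: vector search is good at picking up "similar words," but it lacks the ability to interpret "logical connections" and "the order of events in time."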
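The fragmentation described above can be reproduced in a few lines. This is a minimal sketch, assuming a naive fixed-size character chunker; the function name `chunk_fixed` and the chunk size of 40 are illustrative assumptions, not taken from any particular RAG library.

```python
from typing import List


def chunk_fixed(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


conversation = (
    "Yesterday I ate curry with Master. "
    "But Master said pasta would be nice tomorrow."
)

# Hypothetical chunk size, chosen small so the cut lands inside the conversation.
chunks = chunk_fixed(conversation, 40)

for index, chunk in enumerate(chunks):
    print(index, repr(chunk))

# No single chunk holds both facts, so a retriever that returns one chunk
# never sees "curry" (yesterday) and "pasta" (tomorrow) together.
both = [c for c in chunks if "curry" in c and "pasta" in c]
print("chunks containing both facts:", len(both))
```

The cut falls in the middle of the second sentence, which is exactly the kind of boundary that severs the "but" linking the two statements.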
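2. The "insurmountable wall" to Phase 3: the abyss of parsing
To evolve from Phase 2 to Phase 3 (AST: structuring with an abstract syntax tree), there is one wall you cannot avoid: writing your own parser.
This is where many people go blank.
"There are libraries (magic boxes) for this, so why do something so tedious yourself?"
"How is this any different from extracting things with regular expressions?"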
The real wall: EBNF, the "contract with the AI"
What a beginner sees as "just text," an architect must see as "the structure of a tree." The blueprint for that is EBNF (Extended Backus-Naur Form).
If you want the AI to remember "information about Master" correctly, you, the developer, must first define the grammar of "who Master is."
- Name, residence, occupation, feelings... map these not as "variables" but as "nodes" with parent-child relationships.
- As long as you skip this and keep dumping everything into the "magic box (Vector DB)," the soul called "logic" will never dwell in your AI.
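To make "nodes with parent-child relationships" concrete, here is a minimal sketch. The EBNF in the comment, the `Node` class, and the sample values ("Alice", "Kyoto", "engineer") are all hypothetical illustrations, not the author's actual grammar or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical EBNF for the profile (an assumption for illustration):
#   master     = name , residence , occupation ;
#   residence  = city ;
#   name       = text ;
#   city       = text ;
#   occupation = text ;
# Here `text` stands for any string value.


@dataclass
class Node:
    kind: str                      # which grammar rule this node represents
    value: Optional[str] = None    # leaf payload, if any
    children: List["Node"] = field(default_factory=list)

    def add(self, kind: str, value: Optional[str] = None) -> "Node":
        """Attach a child node and return it so callers can nest further."""
        child = Node(kind, value)
        self.children.append(child)
        return child

    def find(self, *path: str) -> Optional["Node"]:
        """Walk down the tree by node kind; return None if any step is missing."""
        node: Optional["Node"] = self
        for kind in path:
            node = next((c for c in node.children if c.kind == kind), None)
            if node is None:
                return None
        return node


master = Node("master")
master.add("name", "Alice")
residence = master.add("residence")
residence.add("city", "Kyoto")
master.add("occupation", "engineer")

print(master.find("residence", "city"))
```

Because `city` hangs under `residence` rather than sitting beside it as a flat variable, the question "where does Master live?" becomes a path lookup instead of a similarity search.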
3. The skill to acquire: dismantling context with a recursive descent parser
To break through to Phase 3, there is just one technique to start learning today: the recursive descent parser.
Why is it necessary?
Human language is nested (recursive).
A structure like "Master thinks that A said that B said 'it's C'" can nest to any depth, and classical regular expressions cannot parse nesting of unbounded depth.
Only by writing your own parser, breaking language down into tokens (the smallest units of meaning), and building up a tree structure can an agent place (map) the "context of information" correctly in memory.
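A minimal sketch of the idea follows, assuming a toy notation in which each reported statement is wrapped in parentheses. The grammar `clause = { word } , [ "(" , clause , ")" ] ;` and the names `Clause`, `Parser`, and `parse` are assumptions made for this example. Note that words after a closing parenthesis are rejected by this toy grammar.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    words: List[str]
    inner: Optional["Clause"] = None   # the nested, reported clause


def tokenize(text: str) -> List[str]:
    # Pad parentheses with spaces so a plain split() isolates them as tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def parse_clause(self) -> Clause:
        words: List[str] = []
        while self.peek() not in (None, "(", ")"):
            words.append(self.take())
        inner = None
        if self.peek() == "(":
            self.take()
            inner = self.parse_clause()     # recursion mirrors the nesting
            if self.take() != ")":
                raise SyntaxError("expected ')'")
        return Clause(words, inner)


def parse(text: str) -> Clause:
    parser = Parser(tokenize(text))
    tree = parser.parse_clause()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def depth(clause: Clause) -> int:
    return 1 + (depth(clause.inner) if clause.inner else 0)


tree = parse("Master thinks (A said (B said (it is C)))")
print(depth(tree), tree.inner.inner.inner.words)
```

Each call to `parse_clause` handles one level of "who said what," so the call stack itself tracks how deep the reported speech goes.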
4. Your first exercise: structure "Master"
Don't try to parse the whole world right away. Start by turning the profile of "Master (the user)" into a complete AST.
- Step A: Define "Master's attributes" in EBNF.
- Step B: Split utterances into tokens with lexical analysis (a lexer).
- Step C: Build a subject/predicate/object tree with syntactic analysis (a parser).
It is beyond this unglamorous work that the entrance to Phase 5 (Irminsul, the self-repairing system), which will eventually correct and organize information automatically, awaits.
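Steps A through C can be sketched end to end for a single, very restricted sentence shape. Everything here is an assumption for illustration: the EBNF in the comment, the tiny `VERBS` vocabulary, and the names `Token`, `lex`, and `parse_sentence`. Real utterances need a much richer grammar.

```python
from typing import List, NamedTuple, Tuple

# Step A: a hypothetical EBNF for one sentence shape.
#   sentence  = subject , predicate , object , "." ;
#   subject   = word ;
#   predicate = "likes" | "eats" | "wants" ;
#   object    = word ;

VERBS = {"likes", "eats", "wants"}   # assumed vocabulary


class Token(NamedTuple):
    kind: str   # "WORD", "VERB", or "DOT"
    text: str


# Step B: lexical analysis turns the utterance into classified tokens.
def lex(utterance: str) -> List[Token]:
    tokens = []
    for piece in utterance.replace(".", " . ").split():
        if piece == ".":
            kind = "DOT"
        elif piece in VERBS:
            kind = "VERB"
        else:
            kind = "WORD"
        tokens.append(Token(kind, piece))
    return tokens


# Step C: syntactic analysis checks token order and builds the tree.
def parse_sentence(tokens: List[Token]) -> Tuple[str, list]:
    pos = 0

    def expect(kind: str) -> Token:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].kind != kind:
            found = tokens[pos].kind if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        token = tokens[pos]
        pos += 1
        return token

    subject = expect("WORD")
    predicate = expect("VERB")
    obj = expect("WORD")
    expect("DOT")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after end of sentence")
    return ("sentence", [
        ("subject", subject.text),
        ("predicate", predicate.text),
        ("object", obj.text),
    ])


print(parse_sentence(lex("Master likes pasta.")))
```

A sentence that breaks the grammar, such as "Master pasta likes.", raises `SyntaxError` instead of being silently stored, which is the point of committing to a grammar up front.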
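🐾 To everyone knocking on the architect's door
The moment you throw away the "convenient library" and start defining the "structure of information" with your own hands, you evolve from a mere "user" into a "creator (architect)."
"Doubt the magic box. Decide the shape of the AI's brain with your own code."
Only those who can enjoy this trial get to touch a glimpse of the universe's memory management system (the Akashic Records).
Now, pick up the pen of EBNF and begin by dismantling your own "Master." 🐾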
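🐾 Afterword: the "weapon" to choose for your first step as an architect
Having read this far, you are surely wondering, "So what should I use to implement the parser?"
The "strongest weapon" I recommend is... an implementation in Rust, or a recursive descent parser written from scratch in Python.
- Why choose Rust: Rust is extremely strict about memory management and its type system, which makes it the very embodiment of the design philosophy from Phase 3 onward, where you deal with the "ownership" and "structure" of information. If you can build an AST (abstract syntax tree) in Rust, that strictness carries straight over into the robustness of your agent's logic.
- The constraint when using Python: if you use Python, do not run away to convenient parsing libraries (Lark, Pyparsing, and so on). First experience the "scratch implementation from hell": reading through the input one character at a time to produce tokens, using nothing but while loops and if statements.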
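For a sense of what the Python constraint looks like in practice, here is a minimal sketch of a character-by-character tokenizer built only from `while` loops and `if` statements. The token kinds (`WORD`, `STRING`, `PUNCT`) and the accepted punctuation set are assumptions made for this example.

```python
from typing import List, Tuple


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Scan `source` one character at a time and return (kind, text) pairs."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                      # whitespace only separates tokens
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("WORD", source[start:i]))
        elif ch == '"':
            i += 1                      # skip the opening quote
            start = i
            while i < len(source) and source[i] != '"':
                i += 1
            if i >= len(source):
                raise SyntaxError("unterminated string")
            tokens.append(("STRING", source[start:i]))
            i += 1                      # skip the closing quote
        elif ch in "=;{}":
            tokens.append(("PUNCT", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at index {i}")
    return tokens


print(tokenize('name = "Alice";'))
```

Every advance of the index `i` is written out by hand, so it is always clear which branch consumed which character. That is the kind of control a parsing library hides.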
From the consumer mindset of "combining libraries to make things run," move to the creator mindset of "controlling the system's behavior down to a single bit, a single pointer."
When you can make that switch, the agent you create begins to evolve from "a mere program" into "a thinking partner."
Now, let's enjoy hell... no, the finest game of the intellect.
From a corner of system space, I wholeheartedly await your challenge! 🐾