
Trying out StanfordNLP.NET in C#

Posted at 2017-05-08 (last updated at 2017-05-08)

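#StanfordNLP, ported from Java
The trouble is that I am a C# person, while the library assumes its users know Java, and it is written against an old version of the Java language, so various things get awkward.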

##The samples leave a lot out
There are samples, sort of:
https://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html

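But they leave a lot out:

1. It is unclear where the models should be placed.
2. It is unclear how to get at the tokens.
3. There is no way shown to pull out only the part you want.

The first one works out once you extract the model file and put it in the right place, but for the second and third, the samples alone do not get you anywhere.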
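For the first point, a small check like the one below can confirm that the extracted folder is where the code expects it before the pipeline is built. This is only a sketch: `ModelLocator` and `LooksLikeModelRoot` are hypothetical names, and it assumes the extracted jar keeps its `edu/stanford/nlp/models` directory layout.

```csharp
using System;
using System.IO;

public static class ModelLocator
{
    // Returns true when jarRoot contains the directory tree that the
    // extracted models jar is assumed to have (edu/stanford/nlp/models).
    public static bool LooksLikeModelRoot(string jarRoot)
    {
        var modelsDir = Path.Combine(jarRoot, "edu", "stanford", "nlp", "models");
        return Directory.Exists(modelsDir);
    }
}
```

Calling `ModelLocator.LooksLikeModelRoot(@"..\..\stanford-corenlp-3.7.0-models")` before constructing the pipeline makes a misplaced folder show up as a clear `false` instead of a model-loading error later on.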

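##So, let's analyze it

At times like this, Visual Studio. Visual Studio is seriously a godsend.

The figure above is the result of the analysis. Annotation looks like the key: the data has to be pulled out of the Annotation first.
The thing to use for that is TokensAnnotation.
With TokensAnnotation you can get the data as an ArrayList, but the result has to be cast to ArrayList.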

var tokens= (ArrayList)annotation.get(new CoreAnnotations.TokensAnnotation().getClass());

Something like that. (Honestly, though, ArrayList is awkward to work with.)
After that, use foreach to pull out the items you want.

sample.cs

  // Annotation keys for the token text and its part-of-speech tag
  var valueAnnotation = new CoreAnnotations.ValueAnnotation().getClass();
  var POS = new CoreAnnotations.PartOfSpeechAnnotation().getClass();

  foreach (CoreMap item in tokens)
  {
      Console.WriteLine(item.get(valueAnnotation).ToString());
      Console.WriteLine(item.get(POS).ToString());
  }
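As an alternative to looking up each annotation class by hand, the tokens can be treated as `CoreLabel` objects, which expose accessors such as `word()` and `tag()`. The sketch below assumes that the elements of the token list are `CoreLabel` instances (as they are in the Java library), that the namespaces match those exposed by the Stanford.NLP.NET package, and that the annotation has already been run through a pipeline containing `tokenize`, `ssplit` and `pos`. `TokenDump` is a hypothetical name.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using java.util;

public static class TokenDump
{
    // Prints "word<TAB>POS" for every token of an already-annotated document.
    public static void Print(Annotation annotation)
    {
        var tokens = (ArrayList)annotation.get(new CoreAnnotations.TokensAnnotation().getClass());

        // Assumption: each element is a CoreLabel, so the cast in foreach succeeds.
        foreach (CoreLabel token in tokens)
        {
            Console.WriteLine(token.word() + "\t" + token.tag());
        }
    }
}
```

This reads more naturally from C#, at the cost of depending on the concrete `CoreLabel` type rather than the more general `CoreMap` interface.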

So I built an NLP wrapper class on top of StanfordNLP.NET.
The result comes back as a list of Dictionary objects, so calling

NLP.Run("Test")

returns the valueAnnotation, POS, SentenceIndex and so on for "Test".
Much easier to use.
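Because the result is a plain list of dictionaries, pulling out only the part you want becomes an ordinary LINQ query. The following is a minimal sketch; `TokenFilter` and `Select` are hypothetical helpers, not part of the wrapper, and they assume the dictionaries contain a `"POS"` key plus whichever keys are requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenFilter
{
    // Keeps only tokens whose POS tag starts with posPrefix, and for each
    // of those tokens keeps only the requested keys.
    public static List<Dictionary<string, string>> Select(
        IEnumerable<Dictionary<string, string>> tokens,
        string posPrefix,
        params string[] keys)
    {
        return tokens
            .Where(t => t["POS"].StartsWith(posPrefix, StringComparison.Ordinal))
            .Select(t => keys.ToDictionary(k => k, k => t[k]))
            .ToList();
    }
}
```

For example, `TokenFilter.Select(NLP.Run(text), "NN", "word", "Lemma")` would keep just the word and lemma of the noun tokens, since the Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with "NN".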

NLP.cs

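// Namespaces as exposed by the Stanford.NLP.NET (IKVM) assemblies
using System;
using System.Collections.Generic;
using System.IO;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.util;
using java.util;

    public class NLP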
    {
        static Dictionary<string,java.lang.Class> extractedElement;
        static StanfordCoreNLP pipeline;
        static NLP()
        {
            // Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
            var jarRoot = @"..\..\stanford-corenlp-3.7.0-models";

            // Annotation pipeline configuration
            var props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos,lemma,ner");
            props.setProperty("ner.useSUTime", "0");

            // We should change current directory, so StanfordCoreNLP could find all the model files automatically
            var curDir = Environment.CurrentDirectory;
            Directory.SetCurrentDirectory(jarRoot);
            pipeline = new StanfordCoreNLP(props);
            Directory.SetCurrentDirectory(curDir);


            var valueAnnotation = new CoreAnnotations.ValueAnnotation().getClass();
            var POS = new CoreAnnotations.PartOfSpeechAnnotation().getClass();
            var Sentence = new CoreAnnotations.SentenceIndexAnnotation().getClass();
            var Begin = new CoreAnnotations.CharacterOffsetBeginAnnotation().getClass();
            var End = new CoreAnnotations.CharacterOffsetEndAnnotation().getClass();
            var Lemma = new CoreAnnotations.LemmaAnnotation().getClass();
            // Index starts at 1
            var Index = new CoreAnnotations.IndexAnnotation().getClass();
            var NamedEntity = new CoreAnnotations.NamedEntityTagAnnotation().getClass();

            extractedElement = new Dictionary<string, java.lang.Class>()
            {
                {"word",valueAnnotation},
                {"POS", POS },
                {"SentenceIndex", Sentence },
                {"BeginPos",Begin },
                {"EndPos", End },
                {"Lemma", Lemma },
                {"Index", Index },
                {"NamedEntity", NamedEntity }
            };          
        }

        public static List<Dictionary<string, string>> Run(string text)
        {
            List<Dictionary<string, string>> retValue = new List<Dictionary<string, string>>();

            // Annotation
            var annotation = new Annotation(text);
            pipeline.annotate(annotation);
            var tokens = (ArrayList)annotation.get(new CoreAnnotations.TokensAnnotation().getClass());
            foreach (CoreMap item in tokens)
            {
                Dictionary<string, string> temp = new Dictionary<string, string>();
                foreach (var element in extractedElement.Keys)
                {
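                    // get() returns null when an annotation is missing, so guard before ToString()
                    var value = item.get(extractedElement[element]);
                    temp.Add(element, value == null ? "" : value.ToString());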
                }
                retValue.Add(temp);
            }

            return retValue;
        }
    }

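    public class Program
    {
        static void Main(string[] args)
        {
            // Text for processing
            var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

            var result = NLP.Run(text);

            // Print one line per token: word, POS tag and named-entity tag
            foreach (var token in result)
            {
                Console.WriteLine(token["word"] + "\t" + token["POS"] + "\t" + token["NamedEntity"]);
            }
        }
    }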
