Google recently released (12 May 2016) its open-source project SyntaxNet, which it describes as "the world's most accurate parser". SyntaxNet is built on Google's TensorFlow framework.
This tutorial shows how to get started with SyntaxNet on OS X and use it to tag parts of speech (POS) in English sentences.
Here are the installation steps:
- Install bazel:
- Install homebrew and then swig:
  $ brew install swig
- Install the protocol buffers version supported by TensorFlow:
  $ pip install -U protobuf==3.0.0b2
- Install asciitree to draw parse trees on the console:
  $ pip install asciitree
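Before moving on, it can help to confirm that the tools from the steps above are actually on your PATH. The snippet below is a minimal sketch, not part of SyntaxNet: the function name missing_tools is made up for this tutorial, and the tool list (bazel, swig, pip, git) is an assumption based on the commands used here.

```shell
#!/bin/sh
# Hypothetical helper (not part of SyntaxNet): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools this tutorial relies on (assumed list; adjust as needed).
missing=$(missing_tools bazel swig pip git)
if [ -n "$missing" ]; then
  echo "Missing from PATH:" $missing
else
  echo "All prerequisites found."
fi
```

This only checks that the commands exist. It does not check versions, so a mismatched bazel or protobuf release can still break the build later.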
Now clone the repository, configure TensorFlow, then build and test SyntaxNet:
$ git clone --recursive https://github.com/tensorflow/models.git
$ cd models/syntaxnet/tensorflow
$ ./configure
$ cd ..
# On a Mac, run the following (the tests take some time to pass):
$ bazel test --linkopt=-headerpad_max_install_names \
    syntaxnet/... util/utf8/...
Google also provides a pre-trained English model called Parsey McParseface, which tags parts of speech and produces a dependency parse. It is located under syntaxnet/models. To try it from the terminal, use the script syntaxnet/demo.sh, which provides a basic interface to Parsey McParseface.
# Try this in the terminal to POS-tag a sentence:
$ echo 'Did you see that man?' | syntaxnet/demo.sh
# The output should look like this:
Input: Did you see that man ?
Parse:
see VB ROOT
 +-- Did VBD aux
 +-- you PRP nsubj
 +-- man NN dobj
 |   +-- that DT det
 +-- ? . punct
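Each line of the tree ends with three fields: the word, its POS tag, and its dependency label. If you only want the word/tag pairs, a small filter can strip the tree drawing. The following is a rough sketch under the assumption that the output has exactly the "Input:" / "Parse:" layout shown above; the function name extract_pos is made up for this tutorial and is not part of SyntaxNet.

```shell
#!/bin/sh
# Hypothetical helper (not part of SyntaxNet): read demo.sh-style output
# on stdin and print "word<TAB>POS" for every node of each parse tree.
# Assumes every tree line ends with: word, POS tag, dependency label.
extract_pos() {
  awk '
    /^Input:/ { in_parse = 0; next }   # echoed sentence, skip it
    /^Parse:/ { in_parse = 1; next }   # tree lines start after this
    in_parse && NF >= 3 { print $(NF-2) "\t" $(NF-1) }
  '
}
```

To use it, paste the function into your shell and pipe the demo output through it:
$ echo 'Did you see that man?' | syntaxnet/demo.sh | extract_pos
The pairs come out in tree order (root first, then its children), not in sentence order. If log messages end up mixed into standard output on your setup, they may need to be filtered out first.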