Commit e8464d3

Ivan Bogatyy authored and calberti committed
Update the DRAGNN (tensorflow#1191)
* Release DRAGNN
* Update CoNLL evaluation table & evaluator.py
* Update documentation & tutorial for DRAGNN
* Update the DRAGNN
* Tutorial link
1 parent 51fcc99 commit e8464d3

71 files changed, +86578 -4360 lines changed

syntaxnet/Dockerfile

+3 -2
@@ -1,5 +1,5 @@
 # Java baseimage, for Bazel.
-FROM java:8
+FROM openjdk:8

 ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin

@@ -50,6 +50,8 @@ RUN python -m pip install \
     && python -m pip install pygraphviz \
         --install-option="--include-path=/usr/include/graphviz" \
         --install-option="--library-path=/usr/lib/graphviz/" \
+    && python -m jupyter_core.command nbextension enable \
+        --py --sys-prefix widgetsnbextension \
     && rm -rf /root/.cache/pip /tmp/pip*

 # Installs the latest version of Bazel.
@@ -86,6 +88,5 @@ EXPOSE 8888
 # This does not need to be compiled, only copied.
 COPY examples $SYNTAXNETDIR/syntaxnet/examples
 # Todo: Move this earlier in the file (don't want to invalidate caches for now).
-RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension

 CMD /bin/bash -c "bazel-bin/dragnn/tools/oss_notebook_launcher notebook --debug --notebook-dir=/opt/tensorflow/syntaxnet/examples"

syntaxnet/README.md

+18 -8
@@ -20,12 +20,16 @@ This repository is largely divided into two sub-packages:

 1. **DRAGNN:
     [code](https://github.com/tensorflow/models/tree/master/syntaxnet/dragnn),
-    [documentation](g3doc/DRAGNN.md)** implements Dynamic Recurrent Acyclic
-    Graphical Neural Networks (DRAGNN), a framework for building multi-task,
-    fully dynamic constructed computation graphs. Practically, we use DRAGNN to
-    extend our prior work from [Andor et al.
+    [documentation](g3doc/DRAGNN.md),
+    [paper](https://arxiv.org/pdf/1703.04474.pdf)** implements Dynamic Recurrent
+    Acyclic Graphical Neural Networks (DRAGNN), a framework for building
+    multi-task, fully dynamically constructed computation graphs. Practically, we
+    use DRAGNN to extend our prior work from [Andor et al.
     (2016)](http://arxiv.org/abs/1603.06042) with end-to-end, deep recurrent
-    models and to provide a much easier to use interface to SyntaxNet.
+    models and to provide a much easier to use interface to SyntaxNet. *DRAGNN
+    is designed first and foremost as a Python library, and therefore much
+    easier to use than the original SyntaxNet implementation.*
+
 1. **SyntaxNet:
     [code](https://github.com/tensorflow/models/tree/master/syntaxnet/syntaxnet),
     [documentation](g3doc/syntaxnet-tutorial.md)** is a transition-based
@@ -42,7 +46,7 @@ There are three ways to use SyntaxNet:
     SyntaxNet/DRAGNN baseline for the CoNLL2017 Shared Task, and running the
     ParseySaurus models.
 * You can use DRAGNN to train your NLP models for other tasks and dataset. See
-    "Getting started with DRAGNN below."
+    "Getting started with DRAGNN" below.
 * You can continue to use the Parsey McParseface family of pre-trained
     SyntaxNet models. See "Pre-trained NLP models" below.

@@ -117,9 +121,13 @@ We have a few guides on this README, as well as more extensive

 ![DRAGNN](g3doc/unrolled-dragnn.png)

-An easy and visual way to get started with DRAGNN is to run [our Jupyter
-Notebook](examples/dragnn/basic_parser_tutorial.ipynb). Our tutorial
+An easy and visual way to get started with DRAGNN is to run our Jupyter
+notebooks for [interactive
+debugging](examples/dragnn/interactive_text_analyzer.ipynb) and [training a new
+model](examples/dragnn/trainer_tutorial.ipynb). Our tutorial
 [here](g3doc/CLOUD.md) explains how to start it up from the Docker container.
+Once you have DRAGNN installed and running, try out the
+[ParseySaurus](g3doc/conll2017) models.

 ### Using the Pre-trained NLP models

@@ -285,6 +293,7 @@ Original authors of the code in this package include (in alphabetical order):
 * Aliaksei Severyn
 * Andy Golding
 * Bernd Bohnet
+* Chayut Thanapirom
 * Chris Alberti
 * Daniel Andor
 * David Weiss
@@ -294,6 +303,7 @@ Original authors of the code in this package include (in alphabetical order):
 * Ji Ma
 * Keith Hall
 * Kuzman Ganchev
+* Lingpeng Kong
 * Livio Baldini Soares
 * Mark Omernick
 * Michael Collins

syntaxnet/beam_search_training.png

-346 KB
Binary file not shown.

syntaxnet/docker-devel/Dockerfile.min

+66
@@ -0,0 +1,66 @@
+# You need to build wheels before building this image. Please consult
+# docker-devel/README.txt.
+
+# This is the base of the openjdk image.
+#
+# It might be more efficient to use a minimal distribution, like Alpine. But
+# the upside of this being popular is that people might already have it.
+FROM buildpack-deps:jessie-curl
+
+ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin
+
+RUN apt-get update \
+    && apt-get install -y \
+        file \
+        git \
+        graphviz \
+        libcurl3 \
+        libfreetype6 \
+        libgraphviz-dev \
+        liblapack3 \
+        libopenblas-base \
+        libpng12-0 \
+        libxft2 \
+        python-dev \
+        python-mock \
+        python-pip \
+        python2.7 \
+        zlib1g-dev \
+    && apt-get clean \
+    && (rm -f /var/cache/apt/archives/*.deb \
+        /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true)
+
+# Install common Python dependencies. Similar to above, remove caches
+# afterwards to help keep Docker images smaller.
+RUN pip install --ignore-installed pip \
+    && python -m pip install numpy \
+    && rm -rf /root/.cache/pip /tmp/pip*
+RUN python -m pip install \
+        asciitree \
+        ipykernel \
+        jupyter \
+        matplotlib \
+        pandas \
+        protobuf \
+        scipy \
+        sklearn \
+    && python -m ipykernel.kernelspec \
+    && python -m pip install pygraphviz \
+        --install-option="--include-path=/usr/include/graphviz" \
+        --install-option="--library-path=/usr/lib/graphviz/" \
+    && rm -rf /root/.cache/pip /tmp/pip*
+
+COPY syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl $SYNTAXNETDIR/
+RUN python -m pip install \
+        $SYNTAXNETDIR/syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl \
+    && rm -rf /root/.cache/pip /tmp/pip*
+
+# This makes the IP exposed actually "*"; we'll do host restrictions by passing
+# a hostname to the `docker run` command.
+COPY tensorflow/tensorflow/tools/docker/jupyter_notebook_config.py /root/.jupyter/
+EXPOSE 8888
+
+# This does not need to be compiled, only copied.
+COPY examples $SYNTAXNETDIR/syntaxnet/examples
+# For some reason, this works if we run it in a bash shell :/ :/ :/
+CMD /bin/bash -c "python -m jupyter_core.command notebook --debug --notebook-dir=/opt/tensorflow/syntaxnet/examples"

syntaxnet/docker-devel/README.txt

+64
@@ -0,0 +1,64 @@
+Docker is used for packaging SyntaxNet. There are three primary things we
+build with Docker,
+
+1. A development image, which contains all source built with Bazel.
+2. Python/pip wheels, built by running a command in the development container.
+3. A minified image, which only has the compiled version of TensorFlow and
+   SyntaxNet, by installing the wheel built by the above step.
+
+
+Important info (please read)
+------------------------------
+
+One thing to be wary of is that YOU CAN LOSE DATA IF YOU DEVELOP IN A DOCKER
+CONTAINER. Please be very careful to mount data you care about to Docker
+volumes, or use a volume mount so that it's mapped to your host filesystem.
+
+Another note, especially relevant to training models, is that Docker sends the
+whole source tree to the Docker daemon every time you try to build an image.
+This can take some time if you have large temporary model files lying around.
+You can exclude your model files by editing .dockerignore, or just don't store
+them in the base directory.
+
+
+Step 1: Building the development image
+------------------------------
+
+Simply run `docker build -t dragnn-oss .` in the base directory. Make sure you
+have all the source checked out correctly, including git submodules.
+
+
+Step 2: Building wheels
+------------------------------
+
+Please run,
+
+    bash ./docker-devel/build_wheels.sh
+
+This actually builds the image from Step 1 as well.
+
+
+Step 3: Building the minified image
+------------------------------
+
+First, ensure you have the file
+
+    syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl
+
+in your working directory, from step 2. Then run,
+
+    docker build -t dragnn-oss:latest-minimal -f docker-devel/Dockerfile.min .
+
+If the filename changes (e.g. you are on a different architecture), just update
+Dockerfile.min.
+
+
+Developing in Docker
+------------------------------
+
+We recommend developing in Docker by using the `./docker-devel/build_devel.sh`
+script; it will set up a few volume mounts and port mappings automatically.
+You may want to add more port mappings on your own. If you want to drop into a
+shell instead of launching the notebook, simply run,
+
+    ./docker-devel/build_devel.sh /bin/bash
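
The three build steps described in README.txt can be sketched as a small Python helper that only assembles the command lines; it does not invoke Docker. The name `build_commands` is hypothetical. The image tags, script path, and Dockerfile path come from the README. The trailing `.` on the last command is an assumption on my part: `docker build` expects a build context argument, and the README's Step 1 passes the base directory the same way.

```python
import shlex


def build_commands():
    """Returns the three build steps as argv lists, in order (hypothetical helper)."""
    return [
        # Step 1: development image, built from the base directory.
        ["docker", "build", "-t", "dragnn-oss", "."],
        # Step 2: wheels (per the README, this also builds the Step 1 image).
        ["bash", "./docker-devel/build_wheels.sh"],
        # Step 3: minified image; the wheel from Step 2 must be in the
        # working directory. The "." build context is an assumption.
        ["docker", "build", "-t", "dragnn-oss:latest-minimal",
         "-f", "docker-devel/Dockerfile.min", "."],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection instead of running them.
    for argv in build_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without Docker.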

syntaxnet/docker-devel/build_devel.sh

+1
@@ -23,5 +23,6 @@ syntaxnet_base="/opt/tensorflow/syntaxnet"
 docker run --rm -ti \
   -v "${root_path}"/syntaxnet:"${syntaxnet_base}"/syntaxnet \
   -v "${root_path}"/dragnn:"${syntaxnet_base}"/dragnn \
+  -v "${root_path}"/examples:"${syntaxnet_base}"/examples \
   -p 127.0.0.1:8888:8888 \
   dragnn-oss "$@"
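
As a rough illustration of what the added `-v` line does, the sketch below assembles the same `docker run` arguments in Python. `devel_run_args` and its parameters are hypothetical names; the mount list, port mapping, and image name mirror the script.

```python
def devel_run_args(root_path, extra_args=(), base="/opt/tensorflow/syntaxnet"):
    """Builds the `docker run` argv used for development (hypothetical helper)."""
    argv = ["docker", "run", "--rm", "-ti"]
    # Each host directory is bind-mounted at the matching path under `base`,
    # so edits on the host are visible in the container without a rebuild.
    for subdir in ("syntaxnet", "dragnn", "examples"):
        argv += ["-v", "{}/{}:{}/{}".format(root_path, subdir, base, subdir)]
    # Publish the notebook port on localhost only.
    argv += ["-p", "127.0.0.1:8888:8888", "dragnn-oss"]
    # Anything extra (e.g. /bin/bash) replaces the image's default command.
    return argv + list(extra_args)
```

Calling `devel_run_args("/src", ["/bin/bash"])` corresponds to running the script with `/bin/bash` as its argument.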

syntaxnet/dragnn/protos/spec.proto

+4 -10
@@ -13,14 +13,10 @@ package syntaxnet.dragnn;
 message MasterSpec {
   repeated ComponentSpec component = 1;

-  // DEPRECATED: Use the "batch_size" param of DragnnTensorFlowTrainer instead.
-  optional int32 deprecated_batch_size = 2 [default = 1, deprecated = true];
-
-  // DEPRECATED: Use ComponentSpec.*_beam_size instead.
-  optional int32 deprecated_beam_size = 3 [default = 1, deprecated = true];
-
   // Whether to extract debug traces.
   optional bool debug_tracing = 4 [default = false];
+
+  reserved 2, 3, 5;
 }

 // Complete specification for a single task.
// Complete specification for a single task.
@@ -221,10 +217,6 @@ message GridPoint {
   // problems for updates at the start of training.
   optional double gradient_clip_norm = 11 [default = 0.0];

-  // DEPRECATED: Use TrainTarget instead.
-  repeated double component_weights = 5;
-  repeated bool unroll_using_oracle = 6;
-
   // A spec for using multiple optimization methods.
   message CompositeOptimizerSpec {
     // First optimizer.
@@ -254,6 +246,8 @@ message GridPoint {
   // should be restricted. If left empty, no filtering will take
   // place. Typically a single component.
   optional string self_norm_components_filter = 21;
+
+  reserved 5, 6;
 }

 // Training target to be built into the graph.
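
Replacing the deleted fields with `reserved` statements keeps their tag numbers from being reused later with a different meaning. The sketch below is a plain-Python illustration of that rule, not the protobuf compiler's implementation; `check_field_numbers` is a hypothetical helper, and the field table is abbreviated from the `MasterSpec` hunk above.

```python
def check_field_numbers(fields, reserved):
    """Returns names of fields whose tag number collides with a reserved one.

    fields: dict mapping field name -> tag number.
    reserved: iterable of retired tag numbers.
    """
    reserved = set(reserved)
    return sorted(name for name, tag in fields.items() if tag in reserved)


# MasterSpec after this commit: tags 2, 3 and 5 are retired.
master_spec_fields = {"component": 1, "debug_tracing": 4}
master_spec_reserved = [2, 3, 5]

# No live field uses a retired tag.
print(check_field_numbers(master_spec_fields, master_spec_reserved))

# A hypothetical later edit that tries to reuse tag 2 would be flagged.
bad_fields = dict(master_spec_fields, new_option=2)
print(check_field_numbers(bad_fields, master_spec_reserved))
```

The point of retiring a number instead of freeing it is that old serialized messages may still carry data under that tag.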

syntaxnet/dragnn/python/BUILD

+1
@@ -154,6 +154,7 @@ py_test(
     srcs = ["visualization_test.py"],
     deps = [
         ":visualization",
+        "//dragnn/protos:spec_py_pb2",
         "//dragnn/protos:trace_py_pb2",
         "@org_tensorflow//tensorflow:tensorflow_py",
     ],

syntaxnet/dragnn/python/visualization.py

+29 -6
@@ -54,6 +54,15 @@ def parse_trace_json(trace):
   return as_json


+def _optional_master_spec_json(master_spec):
+  """Helper function to return 'null' or a master spec JSON string."""
+  if master_spec is None:
+    return 'null'
+  else:
+    return json_format.MessageToJson(
+        master_spec, preserving_proto_field_name=True)
+
+
 def _container_div(height='700px', contents=''):
   elt_id = str(uuid.uuid4())
   html = """
@@ -64,7 +73,11 @@ def _container_div(height='700px', contents=''):
   return elt_id, html


-def trace_html(trace, convert_to_unicode=True, height='700px', script=None):
+def trace_html(trace,
+               convert_to_unicode=True,
+               height='700px',
+               script=None,
+               master_spec=None):
   """Generates HTML that will render a master trace.

   This will result in a self-contained "div" element.
@@ -76,6 +89,8 @@ def trace_html(trace, convert_to_unicode=True, height='700px', script=None):
         often pass the output of this function to IPython.display.HTML.
     height: CSS string representing the height of the element, default '700px'.
     script: Visualization script contents, if the defaults are unacceptable.
+    master_spec: Master spec proto (parsed), which can improve the layout. May
+        be required in future versions.

   Returns:
     unicode or str with HTML contents.
@@ -89,10 +104,14 @@ def trace_html(trace, convert_to_unicode=True, height='700px', script=None):
   {div_html}
   <script type='text/javascript'>
   {script}
-  visualizeToDiv({json}, "{elt_id}");
+  visualizeToDiv({json}, "{elt_id}", {master_spec_json});
   </script>
   """.format(
-      script=script, json=json_trace, elt_id=elt_id, div_html=div_html)
+      script=script,
+      json=json_trace,
+      master_spec_json=_optional_master_spec_json(master_spec),
+      elt_id=elt_id,
+      div_html=div_html)
   return unicode(as_str, 'utf-8') if convert_to_unicode else as_str


@@ -174,11 +193,13 @@ def initial_html(self, height='700px', script=None, init_message=None):
         script=script, div_html=div_html)
     return unicode(html, 'utf-8')  # IPython expects unicode.

-  def show_trace(self, trace):
+  def show_trace(self, trace, master_spec=None):
     """Returns a JS script HTML fragment, which will populate the container.

     Args:
       trace: binary-encoded MasterTrace string.
+      master_spec: Master spec proto (parsed), which can improve the layout. May
+          be required in future versions.

     Returns:
       unicode with HTML contents.
@@ -187,8 +208,10 @@ def show_trace(self, trace):
     <meta charset="utf-8"/>
     <script type='text/javascript'>
     document.getElementById("{elt_id}").innerHTML = "";  // Clear previous.
-    visualizeToDiv({json}, "{elt_id}");
+    visualizeToDiv({json}, "{elt_id}", {master_spec_json});
     </script>
     """.format(
-        json=parse_trace_json(trace), elt_id=self.elt_id)
+        json=parse_trace_json(trace),
+        master_spec_json=_optional_master_spec_json(master_spec),
+        elt_id=self.elt_id)
     return unicode(html, 'utf-8')  # IPython expects unicode.
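
The new helper emits the JavaScript literal `null` when no master spec is supplied, so the `visualizeToDiv(...)` call stays syntactically valid either way. A minimal Python 3 sketch of that pattern follows. It uses a plain dict and the standard `json` module in place of the proto and `json_format.MessageToJson`; `optional_json` and `render_call` are illustrative names, not part of DRAGNN.

```python
import json


def optional_json(value):
    """Returns 'null' for None, otherwise a JSON encoding of `value`."""
    return "null" if value is None else json.dumps(value, sort_keys=True)


def render_call(trace, elt_id, master_spec=None):
    """Formats the JS call so the third argument is always valid JavaScript."""
    return "visualizeToDiv({json}, {elt_id}, {spec});".format(
        json=json.dumps(trace, sort_keys=True),
        elt_id=json.dumps(elt_id),  # Quote the element id as a JS string.
        spec=optional_json(master_spec))


print(render_call({"name": "parser"}, "div-1"))
print(render_call({"name": "parser"}, "div-1", {"component": []}))
```

With plain `json.dumps` the `None` case would already yield `null`; the explicit branch is kept here to mirror the helper in the diff, where the proto-to-JSON conversion is only attempted when a spec is actually given.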

syntaxnet/dragnn/python/visualization_test.py

+16 -1
@@ -1,10 +1,12 @@
+# -*- coding: utf-8 -*-
 """Tests for dragnn.python.visualization."""

 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

 from tensorflow.python.platform import googletest
+from dragnn.protos import spec_pb2
 from dragnn.protos import trace_pb2
 from dragnn.python import visualization

@@ -15,10 +17,16 @@ def _get_trace_proto_string():
       step_trace=[
           trace_pb2.ComponentStepTrace(fixed_feature_trace=[]),
       ],
-      name='test_component',)
+      # Google Translate says this is "component" in Chinese. (To test UTF-8).
+      name='零件',)
   return trace.SerializeToString()


+def _get_master_spec():
+  return spec_pb2.MasterSpec(
+      component=[spec_pb2.ComponentSpec(name='jalapeño')])
+
+
 class VisualizationTest(googletest.TestCase):

   def testCanFindScript(self):
@@ -37,6 +45,13 @@ def testInteractiveVisualization(self):
     widget.initial_html()
     widget.show_trace(_get_trace_proto_string())

+  def testMasterSpecJson(self):
+    visualization.trace_html(
+        _get_trace_proto_string(), master_spec=_get_master_spec())
+    widget = visualization.InteractiveVisualization()
+    widget.initial_html()
+    widget.show_trace(_get_trace_proto_string(), master_spec=_get_master_spec())
+

 if __name__ == '__main__':
   googletest.main()
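
The test switches to non-ASCII component names so that encoding bugs in the HTML generation surface. A small Python 3 sketch of the kind of round trip being exercised is below. It uses only the standard library, and `to_html_bytes` is a hypothetical stand-in for the real rendering code, which targets Python 2 and is not reproduced here.

```python
import json


def to_html_bytes(name):
    """Embeds `name` in a script tag and encodes the page as UTF-8 bytes."""
    # ensure_ascii=False keeps the characters literal, so the encoding matters.
    payload = json.dumps({"name": name}, ensure_ascii=False)
    html = '<meta charset="utf-8"/><script>var t = {payload};</script>'.format(
        payload=payload)
    return html.encode("utf-8")


for name in ("零件", "jalapeño"):
    page = to_html_bytes(name)
    # Decoding with the declared charset must give the original name back.
    assert name in page.decode("utf-8")
    # ASCII cannot represent these names, so a stray ASCII encode would fail.
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        print("non-ASCII name survived the UTF-8 round trip")
```

An ASCII-only name such as `test_component` would pass even if some step silently assumed ASCII, which is presumably why the test data was changed.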
