
Native build of Tensorflow v2.0.0-rc0 for RaspberryPi3 (armv7l)

Last updated at 2019-10-20, posted at 2019-08-31

Tensorflow-bin　

Bazel_bin　

１．Introduction

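This is an unofficial log of my struggle to compile Tensorflow v2.0.0-rc0. This time the build took 76 hours and 15 minutes to complete. For the earlier versions, see the alpha article, "RaspberryPi3用のTensorflow v2.0.0-alpha (Tensorflow Lite v1.0) のインストーラ(Wheel)を速攻でネイティブビルド錬成した" (a quick native build of the Tensorflow v2.0.0-alpha Wheel installer for RaspberryPi3), and the beta article, "Native build of Tensorflow v2.0.0-beta for Raspberry Pi3 (armv7l)". Between the beta and rc0, the combinations of build options that can be specified have changed. Because this article describes the restrictions that apply when using the precompiled binary, I recommend reading the Introduction section below carefully. That said, you can reproduce the build by copying and pasting the commands even without reading the English text. The work is monotonous and anyone can do it, but please attempt it only if you have the unusual fortitude to keep compiling for three full days and you own either a high-endurance microSD card or an external HDD. Note that a considerable number of fixes appear to have been made between beta1 and rc0.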

Tensorflow Release 2.0.0-rc0

https://github.com/tensorflow/tensorflow/releases/tag/v2.0.0-rc0

１−１．Major Features and Improvements

TensorFlow 2.0 focuses on simplicity and ease of use, featuring updates like:

Easy model building with Keras and eager execution.
Robust model deployment in production on any platform.
Powerful experimentation for research.
API simplification by reducing duplication and removing deprecated endpoints.

For details on best practices with 2.0, see the Effective 2.0 guide

For information on upgrading your existing TensorFlow 1.x models, please refer to our Upgrade and Migration guides. We have also released a collection of tutorials and getting started guides.

１−２．Highlights

TF 2.0 delivers Keras as the central high level API used to build and train models. Keras provides several model-building APIs such as Sequential, Functional, and Subclassing along with eager execution, for immediate iteration and intuitive debugging, and tf.data, for building scalable input pipelines. Check out the guide for additional details.
Distribution Strategy: TF 2.0 users will be able to use the tf.distribute.Strategy API to distribute training with minimal code changes, yielding great out-of-the-box performance. It supports distributed training with Keras model.fit, as well as with custom training loops. Multi-GPU support is available, along with experimental support for multi worker and Cloud TPUs. Check out the guide for more details.
Functions, not Sessions. The traditional declarative programming model of building a graph and executing it via a tf.Session is discouraged, and replaced by writing regular Python functions. Using the tf.function decorator, such functions can be turned into graphs which can be executed remotely, serialized, and optimized for performance.
Unification of tf.train.Optimizers and tf.keras.Optimizers. Use tf.keras.Optimizers for TF2.0. compute_gradients is removed as a public API; use GradientTape to compute gradients.
AutoGraph translates Python control flow into TensorFlow expressions, allowing users to write regular Python inside tf.function-decorated functions. AutoGraph is also applied in functions used with tf.data, tf.distribute and tf.keras APIs.
Unification of exchange formats to SavedModel. All TensorFlow ecosystem projects (TensorFlow Lite, TensorFlow JS, TensorFlow Serving, TensorFlow Hub) accept SavedModels. Model state should be saved to and restored from SavedModels.
API Changes: Many API symbols have been renamed or removed, and argument names have changed. Many of these changes are motivated by consistency and clarity. The 1.x API remains available in the compat.v1 module. A list of all symbol changes can be found here.
API clean-up, including removing tf.app, tf.flags, and tf.logging in favor of absl-py.
No more global variables with helper methods like tf.global_variables_initializer and tf.get_global_step.

１−３．Breaking Changes

Many backwards incompatible API changes have been made to clean up the APIs and make them more consistent.
tf.contrib has been deprecated, and functionality has been either migrated to the core TensorFlow API, to an ecosystem project such as tensorflow/addons or tensorflow/io, or removed entirely.
Premade estimators in the tf.estimator.DNN/Linear/DNNLinearCombined family have been updated to use tf.keras.optimizers instead of the tf.compat.v1.train.Optimizers. If you do not pass in an optimizer= arg or if you use a string, the premade estimator will use the Keras optimizer. This is checkpoint breaking, as the optimizers have separate variables. A checkpoint converter tool for converting optimizers is included with the release, but if you want to avoid any change, switch to the v1 version of the estimator: tf.compat.v1.estimator.DNN/Linear/DNNLinearCombined*.
The equality operation on Tensors & Variables now compares on value instead of id(). As a result, both Tensors & Variables are no longer hashable types.
Layers now default to float32, and automatically cast their inputs to the layer's dtype. If you had a model that used float64, it will probably silently use float32 in TensorFlow 2, and a warning will be issued that starts with "Layer is casting an input tensor from dtype float64 to the layer's dtype of float32". To fix, either set the default dtype to float64 with tf.keras.backend.set_floatx('float64'), or pass dtype='float64' to each of the Layer constructors. See tf.keras.layers.Layer for more information.
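
The link between value-based equality and hashability in the breaking changes above follows a general Python rule. The following minimal sketch illustrates that rule with a plain class. `ValueBox` and `is_hashable` are hypothetical names introduced for illustration. They are not TensorFlow types or APIs, and the sketch does not reproduce how TensorFlow implements Tensor equality.

```python
class ValueBox:
    """Hypothetical stand-in for an object that compares by value."""

    def __init__(self, value):
        self.value = value

    # Defining __eq__ without __hash__ makes Python set __hash__ to None,
    # so instances can no longer be used as dict keys or set members.
    def __eq__(self, other):
        return isinstance(other, ValueBox) and self.value == other.value


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


a, b = ValueBox(1), ValueBox(1)
print("equal by value:", a == b)
print("hashable:", is_hashable(a))
```

Code that used Tensors or Variables as dictionary keys or set members in 1.x is the kind of code this change is likely to affect.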

Refer to our public project status tracker and issues tagged with 2.0 on GitHub for insight into recent issues and development progress.

If you experience any snags when using TF 2.0, please let us know at the TF 2.0 Testing User Group. We have a support mailing list as well as weekly testing meetings, and would love to hear your migration feedback and questions.

１−４．Bug Fixes and Other Changes

tf.data:
- Add support for TensorArrays to tf.data Dataset.
- Integrate Ragged Tensors with tf.data.
- All core and experimental tf.data transformations that input user-defined functions can span multiple devices now.
- Extending the TF 2.0 support for shuffle(..., reshuffle_each_iteration=True) and cache() to work across different Python iterators for the same dataset.
- Removing the experimental_numa_aware option from tf.data.Options.
- Add num_parallel_reads and passing in a Dataset containing filenames into TextLineDataset and FixedLengthRecordDataset.
- Add support for defaulting the value of cycle_length argument of tf.data.Dataset.interleave to the number of schedulable CPU cores.
- Promoting tf.data.experimental.enumerate_dataset to core as tf.data.Dataset.enumerate.
- Promoting tf.data.experimental.unbatch to core as tf.data.Dataset.unbatch.
- Adds option for introducing slack in the pipeline to reduce CPU contention, via tf.data.Options().experimental_slack = True
- Added experimental support for parallel batching to batch() and padded_batch(). This functionality can be enabled through tf.data.Options()
- Support cancellation of long-running reduce.
- Now we use dataset node name as prefix instead of the op name, to identify the component correctly in metrics, for pipelines with repeated components.
tf.distribute:
- Enable tf.distribute.experimental.MultiWorkerMirroredStrategy working in eager mode.
- Disable run_eagerly and distribution strategy if there are symbolic tensors added to the model using add_metric or add_loss.
- Bug fix: loss and gradients should now more reliably be correctly scaled w.r.t. the global batch size when using a tf.distribute.Strategy.
- Set default loss reduction as AUTO for improving reliability of loss scaling with distribution strategy and custom training loops. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used in distribution strategy scope, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be 'None' or 'SUM'. Using other values will raise an error.
- Support for multi-host ncclAllReduce in Distribution Strategy.
tf.estimator:
- Replace tf.contrib.estimator.add_metrics with tf.estimator.add_metrics
- Use tf.compat.v1.estimator.inputs instead of tf.estimator.inputs
- Replace contrib references with tf.estimator.experimental.* for apis in early_stopping.py in Estimator
- Canned Estimators will now use keras optimizers by default. An error will be raised if tf.train.Optimizers are used, and you will have to switch to tf.keras.optimizers or tf.compat.v1 canned Estimators.
- A checkpoint converter for canned Estimators has been provided to transition canned Estimators that are warm started from tf.train.Optimizers to tf.keras.optimizers.
- Default aggregation for canned Estimators is now SUM_OVER_BATCH_SIZE. To maintain previous default behavior, please pass SUM as the loss aggregation method.
- Canned Estimators don’t support input_layer_partitioner arg in the API. If you have this arg, you will have to switch to tf.compat.v1 canned Estimators.
- Estimator.export_savedmodel has been renamed export_saved_model
- When saving to SavedModel, Estimators will strip default op attributes. This is almost always the correct behavior, as it is more forwards compatible, but if you require that default attributes are saved with the model, please use tf.compat.v1.Estimator
- Feature Columns have been upgraded to be more Eager-friendly and to work with Keras. As a result, tf.feature_column.input_layer has been deprecated in favor of tf.keras.layers.DenseFeatures. v1 feature columns have direct analogues in v2 except for shared_embedding_columns, which are not cross-compatible with v1 and v2. Use tf.feature_column.shared_embeddings instead.
- Losses are scaled in canned estimator v2 and not in the optimizers anymore. If you are using Estimator + distribution strategy + optimizer v1 then the behavior does not change. This implies that if you are using custom estimator with optimizer v2, you have to scale losses. We have new utilities to help scale losses: tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.
tf.keras:
- Premade models (including Linear and WideDeep) have been introduced for the purpose of replacing Premade estimators.
- Model saving changes
- model.save and tf.saved_model.save may now save to the TensorFlow SavedModel format. The model can be restored using tf.keras.models.load_model. HDF5 files are still supported, and may be used by specifying save_format="h5" when saving.
- tf.keras.model.save_model and model.save now defaults to saving a TensorFlow SavedModel. HDF5 files are still supported.
- Deprecated tf.keras.experimental.export_saved_model and tf.keras.experimental.function. Please use tf.keras.models.save_model(..., save_format='tf') and tf.keras.models.load_model instead.
- Raw TensorFlow functions can now be used in conjunction with the Keras Functional API during model creation. This obviates the need for users to create Lambda layers in most cases when using the Functional API. Like Lambda layers, TensorFlow functions that result in Variable creation or assign ops are not supported.
- Add support for passing list of lists to the metrics argument in Keras compile.
- Add tf.keras.layers.AbstractRNNCell as the preferred implementation for RNN cells in TF v2. User can use it to implement RNN cells with custom behavior.
- Keras training and validation curves are shown on the same plot when using the TensorBoard callback.
- Switched Keras fit/evaluate/predict execution to use only a single unified path by default unless eager execution has been explicitly disabled, regardless of input type. This unified path places an eager-friendly training step inside of a tf.function. With this: 1. All input types are converted to Dataset. 2. The path assumes there is always a distribution strategy. When a distribution strategy is not specified, the path uses a no-op distribution strategy. 3. The training step is wrapped in tf.function unless run_eagerly=True is set in compile. The single path execution code does not yet support all use cases. We fall back to the existing v1 execution paths if your model contains the following: 1. sample_weight_mode in compile 2. weighted_metrics in compile 3. v1 optimizer 4. target tensors in compile. If you are experiencing any issues because of this change, please inform us (file an issue) about your use case; in the meantime you can unblock yourself by setting experimental_run_tf_function=False in compile. We have seen a couple of use cases where the model usage pattern is not as expected and would not work with this change: 1. output tensors of one layer are used in the constructor of another. 2. symbolic tensors outside the scope of the model are used in custom loss functions. The flag can be disabled for these cases, and ideally the usage pattern will need to be fixed.
- OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
- Mark Keras set_session as compat.v1 only.
- tf.keras.estimator.model_to_estimator now supports exporting to tf.train.Checkpoint format, which allows the saved checkpoints to be compatible with model.load_weights.
- keras.backend.resize_images (and consequently, keras.layers.Upsampling2D) behavior has changed, a bug in the resizing implementation was fixed.
- Add an implementation=3 mode for tf.keras.layers.LocallyConnected2D and tf.keras.layers.LocallyConnected1D layers using tf.SparseTensor to store weights, allowing a dramatic speedup for large sparse models.
- Raise error if batch_size argument is used when input is dataset/generator/keras sequence.
- Update TF 2.0 keras.backend.name_scope to use TF 2.0 name_scope.
- Add v2 module aliases for losses, metrics, initializers and optimizers: tf.losses = tf.keras.losses & tf.metrics = tf.keras.metrics & tf.initializers = tf.keras.initializers & tf.optimizers = tf.keras.optimizers.
- Updates binary cross entropy logic in Keras when input is probabilities. Instead of converting probabilities to logits, we are using the cross entropy formula for probabilities.
- Added public APIs for cumsum and cumprod keras backend functions.
- Add support for temporal sample weight mode in subclassed models.
- Raise ValueError if an integer is passed to the training APIs.
- Added fault-tolerance support for training Keras model via model.fit() with MultiWorkerMirroredStrategy, tutorial available.
- Callbacks are supported in MultiWorkerMirroredStrategy.
- Custom Callback tutorial is now available.
- To train with tf.distribute, Keras api is recommended over estimator.
- steps_per_epoch and steps arguments are supported with numpy arrays.
- New error message when unexpected keys are used in sample_weight/class_weight dictionaries
- Losses are scaled in Keras compile/fit and not in the optimizers anymore. If you are using custom training loop, we have new utilities to help scale losses tf.nn.compute_average_loss, tf.nn.scale_regularization_loss.
- Layer apply and add_variable APIs are deprecated.
- Added support for channels first data format in cross entropy losses with logits and support for tensors with unknown ranks.
- Error messages will be raised if add_update, add_metric, add_loss, activity regularizers are used inside of a control flow branch.
- New loss reduction types: 1. AUTO: Indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, we expect reduction value to be SUM or NONE. Using AUTO in that case will raise an error. 2. NONE: Weighted losses with one dimension reduced (axis=-1, or axis specified by loss function). When this reduction type used with built-in Keras training loops like fit/evaluate, the unreduced vector loss is passed to the optimizer but the reported loss will be a scalar value. 3. SUM: Scalar sum of weighted losses. 4. SUM_OVER_BATCH_SIZE: Scalar SUM divided by number of elements in losses. This reduction type is not supported when used with tf.distribute.Strategy outside of built-in training loops like tf.keras compile/fit.
tf.lite:
- Added support for TFLiteConverter Python API in 2.0. Contains functions from_saved_model, from_keras_file, and from_concrete_functions.
- Removed lite.OpHint, lite.experimental, and lite.constant from 2.0 API.
- Added support for tflite_convert command line tool in 2.0.
- Post-training quantization tool supports quantizing weights shared by multiple operations. The models made with versions of this tool will use INT8 types for weights and will only be executable by interpreters from this version onwards.
- Post-training quantization tool supports fp16 weights and GPU delegate acceleration for fp16.
tf.contrib:
- Expose tf.contrib.proto.* ops in tf.io (they will exist in TF2)
- Remove tf.contrib.timeseries dependency on TF distributions.
- Replace contrib references with tf.estimator.experimental.* for apis in early_stopping.py
Other:
- Bug fix for tf.tile gradient.
- TF code now resides in tensorflow_core and tensorflow is just a virtual pip package. No code changes are needed for projects using TensorFlow, the change is transparent
- Added gradient for SparseToDense op.
- Expose a flag that allows the number of threads to vary across Python benchmarks.
- ResourceVariable's gather op supports batch dimensions.
- image.resize in 2.0 now supports gradients for the new resize kernels.
- removed tf.string_split from v2 API
- Variadic reduce is supported on CPU
- Added GPU implementation of tf.linalg.tridiagonal_solve.
- Delete unused lookup table code
- Remove unused StringViewVariantWrapper.
- Delete unused Fingerprint64Map op registration
- Add broadcasting support to tf.matmul.
- Add ellipsis (...) support for tf.einsum().
- ResourceVariable support for gather_nd.
- Add expand_composites argument to all nest.* methods.
- Standardize the LayerNormalization API by replacing the args norm_axis and params_axis with axis.
- Add a new "result_type" parameter to tf.strings.split
- add_update can now be passed a zero-arg callable in order to support turning off the update when setting trainable=False on a Layer of a Model compiled with run_eagerly=True.
- Added tf.random.binomial.
- Extend tf.function with basic support for CompositeTensors arguments (such as SparseTensor and RaggedTensor).
- Add name argument to tf.string_split and tf.strings_split.
- Added strings.byte_split.
- CUDNN_INSTALL_PATH, TENSORRT_INSTALL_PATH, NCCL_INSTALL_PATH, NCCL_HDR_PATH are deprecated. Use TF_CUDA_PATHS instead which supports a comma-separated list of base paths that are searched to find CUDA libraries and headers.
- Add RaggedTensor.placeholder().
- Add pfor converter for Squeeze.
- Renamed tf.image functions to remove duplicate "image" where it is redundant.
- Add C++ Gradient for BatchMatMulV2.
- parallel_for.pfor: add converters for Softmax, LogSoftmax, IsNaN, All, Any, and MatrixSetDiag.
- parallel_for: add converters for LowerTriangularSolve and Cholesky.
- Add ragged tensor support to tf.squeeze.
- Allow LinearOperator.solve to take a LinearOperator.
- Allow all dtypes for LinearOperatorCirculant.
- Introduce MaxParallelism method
- parallel_for: add converter for BroadcastTo.
- Add LinearOperatorHouseholder.
- Added key and skip methods to random.experimental.Generator.
- Adds Philox support to new stateful RNG's XLA path.
- Update RaggedTensors to support int32 row_splits.
- Add TensorSpec support for CompositeTensors.
- Added partial_pivoting input parameter to tf.linalg.tridiagonal_solve.
- Extend tf.strings.split to support inputs with any rank
- Removing the experimental_numa_aware option from tf.data.Options.
- Improve the performance of datasets using from_tensors().
- Add tf.linalg.tridiagonal_mul op.
- Add LinearOperatorToeplitz.
- Added gradient to tf.linalg.tridiagonal_solve.
- Upgraded LIBXSMM to version 1.11.
- parallel_for: add converters for LogMatrixDeterminant and MatrixBandPart.
- Uniform processing of quantized embeddings by Gather and EmbeddingLookup Ops
- Correct a misstatement in the documentation of the sparse softmax cross entropy logit parameter.
- parallel_for: Add converters for OneHot, LowerBound, UpperBound.
- Added GPU implementation of tf.linalg.tridiagonal_matmul.
- Add gradient to tf.linalg.tridiagonal_matmul.
- Add tf.ragged.boolean_mask.
- tf.switch_case added, which selects a branch_fn based on a branch_index.
- The C++ kernel of gather op supports batch dimensions.
- Promoting unbatch from experimental to core API.
- Fixed default value and documentation for trainable arg of tf.Variable.
- Adds tf.enable_control_flow_v2() and tf.disable_control_flow_v2().
- EagerTensor now supports buffer interface for tensors.
- This change bumps the version number of the FullyConnected Op to 5.
- tensorflow : crash when pointer become nullptr.
- parallel_for: Add converter for MatrixDiag.
- Add 'narrow_range' attribute to QuantizeAndDequantizeV2 and V3.
- Added new op: tf.strings.unsorted_segment_join.
- Tensorflow code now produces 2 different pip packages: tensorflow_core containing all the code (in the future it will contain only the private implementation) and tensorflow which is a virtual pip package doing forwarding to tensorflow_core (and in the future will contain only the public API of tensorflow)
- Adding support for datasets as inputs to from_tensors and from_tensor_slices and batching and unbatching of nested datasets.
- Add HW acceleration support for topK_v2
- Add new TypeSpec classes
- CloudBigtable version updated to v0.10.0
- Deprecated the use of constraint= and .constraint with ResourceVariable.
- Expose Head as public API.
- Update docstring for gather to properly describe the non-empty batch_dims case.
- Added tf.sparse.from_dense utility function.
- Add GATHER support to NN API delegate
- Improved ragged tensor support in TensorFlowTestCase.
- Makes the a-normal form transformation in Pyct configurable as to which nodes are converted to variables and which are not.
- ResizeInputTensor now works for all delegates
- Start of open development of TF, TFLite, XLA MLIR dialects.
- Add EXPAND_DIMS support to NN API delegate
- tf.cond emits a StatelessIf op if the branch functions are stateless and do not touch any resources.
- Add support of local soft device placement for eager op.
- Pass partial_pivoting to the _TridiagonalSolveGrad.
- Add HW acceleration support for LogSoftMax
- Added a function nested_value_rowids for ragged tensors.
- fixed a bug in histogram_op.cc.
- Add guard to avoid acceleration of L2 Normalization with input rank != 4
- Added evaluation script for COCO minival
- Add delegate support for QUANTIZE
- add tf.math.cumulative_logsumexp operation.
- Add tf.ragged.stack.
- Add delegate support for QUANTIZED_16BIT_LSTM.
- tf.while_loop emits a StatelessWhile op if the cond and body functions are stateless and do not touch any resources.
- Fix memory allocation problem when calling AddNewInputConstantTensor.
- Delegate application failure leaves interpreter in valid state
- tf.cond, tf.while and if and while in AutoGraph now accept a nonscalar predicate if it has a single element. This does not affect non-V2 control flow.
- Enables v2 control flow as part of tf.enable_v2_behavior() and TF2_BEHAVIOR=1.
- Fix potential security vulnerability where decoding variant tensors from proto could result in heap out of bounds memory access.
- Extracts NNAPIDelegateKernel from nnapi_delegate.cc
- Only create a GCS directory object if the object does not already exist.
- Introduce dynamic constructor argument in Layer and Model, which should be set to True when using imperative control flow in the call method.
- ResourceVariable and Variable no longer accepts constraint in the constructor, nor expose it as a @property.
- Add UnifiedGRU as the new GRU implementation for tf2.0. Change the default recurrent activation function for GRU from 'hard_sigmoid' to 'sigmoid', and 'reset_after' to True in 2.0. Historically the recurrent activation was 'hard_sigmoid' since it is faster than 'sigmoid'. With the new unified backend between CPU and GPU mode, since the CuDNN kernel uses sigmoid, we change the default for CPU mode to sigmoid as well. With that, the default GRU will be compatible with both the CPU and GPU kernels. This will enable users with a GPU to use the CuDNN kernel by default and get a 10x performance boost in training. Note that this is a checkpoint-breaking change. If users want to use their 1.x pre-trained checkpoints, please construct the layer with GRU(recurrent_activation='hard_sigmoid', reset_after=False) to fall back to 1.x behavior.
- Begin adding Go wrapper for C Eager API
- XLA HLO graphs can be inspected with interactive_graphviz tool now.
- Add dataset ops to the graph (or create kernels in Eager execution) during the python Dataset object creation instead doing it during Iterator creation time.
- Add batch_dims argument to tf.gather.
- Removing of dtype in the constructor of initializers and partition_info in call.
- Add tf.math.nextafter op.
- Turn on MKL-DNN contraction kernels by default. MKL-DNN dynamically dispatches the best kernel implementation based on CPU vector architecture. To disable them, build with --define=tensorflow_mkldnn_contraction_kernel=0.
- tf.linspace(start, stop, num) now always uses "stop" as last value (for num > 1)
- Added top-k to precision and recall to keras metrics.
- Add a ragged size op and register it to the op dispatcher
- Transitive dependencies on :pooling_ops were removed. Some users may need to add explicit dependencies on :pooling_ops if they reference the operators from that library.
- Add CompositeTensor base class.
- Malformed gif images could result in an access out of bounds in the color palette of the frame. This has been fixed now
- Add templates and interfaces for creating lookup tables
- Tensor::UnsafeCopyFromInternal deprecated in favor of Tensor::BitcastFrom
- In map_vectorization optimization, reduce the degree of parallelism in the vectorized map node.
- Add variant wrapper for absl::string_view
- Wraps losses passed to the compile API (strings and v1 losses) which are not instances of v2 Loss class in LossWrapper class. => All losses will now use SUM_OVER_BATCH_SIZE reduction as default.
- Add OpKernels for some stateless maps
- Add v2 APIs for AUCCurve and AUCSummationMethod enums.
- Allow non-Tensors through v2 losses.
- Add v2 sparse categorical crossentropy metric.
- DType is no longer convertible to an int. Use dtype.as_datatype_enum instead of int(dtype) to get the same result.
- Support both binary and -1/1 label input in v2 hinge and squared hinge losses.
- Added LinearOperator.adjoint and LinearOperator.H (alias).
- Expose CriticalSection in core as tf.CriticalSection.
- Enhanced graphviz output.
- The behavior of tf.gather is now correct when axis=None and batch_dims<0.
- Add tf.linalg.tridiagonal_solve op.
- Add opkernel templates for common table operations.
- Fix issue: Callbacks do not log values in eager mode when a deferred build model is used.
- SignatureDef util functions have been deprecated.
- Update Fingerprint64Map to use aliases
- Add legacy string flat hash map op kernels
- Fix: model.add_loss(symbolic_tensor) should work in ambient eager.
- Adding clear_losses API to be able to clear losses at the end of forward pass in a custom training loop in eager.
- Add support for add_metric in the graph function mode.
- Updating cosine similarity loss - removed the negate sign from cosine similarity.
- TF 2.0 - Update metric name to always reflect what the user has given in compile. Affects following cases 1. When name is given as 'accuracy'/'crossentropy' 2. When an aliased function name is used eg. 'mse' 3. Removing the weighted prefix from weighted metric names.
- Workaround for compiler bug(?)
- Changed default for gradient accumulation for TPU embeddings to true.
- Adds summary trace API for collecting graph and profile information.
- image.resize now considers proper pixel centers and has new kernels (incl. anti-aliasing).
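
The loss reduction types listed under tf.keras above can be illustrated with plain arithmetic. This is a minimal sketch in pure Python, assuming a hypothetical batch of per-example losses. The function names are illustrative and are not TensorFlow APIs. The last helper only mirrors the idea behind the tf.nn.compute_average_loss utility mentioned in the notes, which is to divide by the global batch size rather than the per-replica batch size.

```python
def reduce_none(losses):
    """NONE: keep the per-example losses unreduced."""
    return list(losses)


def reduce_sum(losses):
    """SUM: scalar sum of the per-example losses."""
    return sum(losses)


def reduce_sum_over_batch_size(losses):
    """SUM_OVER_BATCH_SIZE: scalar sum divided by the number of elements."""
    return sum(losses) / len(losses)


def average_over_global_batch(replica_losses, global_batch_size):
    """Scale one replica's summed loss by the global batch size."""
    return sum(replica_losses) / global_batch_size


# Hypothetical per-example losses for a single batch of three examples.
per_example = [0.5, 1.5, 1.0]
print(reduce_none(per_example))
print(reduce_sum(per_example))
print(reduce_sum_over_batch_size(per_example))

# Two hypothetical replicas, two examples each, global batch size 4.
replica_a = average_over_global_batch([1.0, 3.0], 4)
replica_b = average_over_global_batch([2.0, 2.0], 4)
print(replica_a + replica_b)  # equals the mean over all four examples
```

Summing the per-replica results reproduces the mean over the whole global batch. This is why, in a custom training loop under a distribution strategy, the notes say the reduction value is expected to be SUM or NONE.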

１−５．Supplementary matter

I created a Wheel package for RaspberryPi3 from Tensorflow v2.0.0-rc0, which was published on Aug 23, 2019. I have not confirmed the operation of every OP, but I hope you find it useful. By the way, the fully compiled Wheel file for RaspberryPi3 is available in the GitHub repository linked above (Tensorflow-bin).

Tensorflow Lite built according to my procedure is about 2.5 times faster than the official binary when multithreading is enabled. Note, however, that not every model gets the same 2.5x speedup. Only the convolutional layers are subject to multi-threaded parallel processing, so the other layers are not accelerated. See the following link for a discussion of Tensorflow Lite acceleration between engineers around the world and me: Tensorflow Lite, python API does not work #21574. Also, the binary I generated excludes matrix_square_root_op from the compilation targets to avoid running out of memory on the RaspberryPi3 at compile time. Compiling matrix_square_root_op consumes more than 3GB in total: 2GB of swap area and 1GB of physical memory.
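
Why the speedup differs from model to model can be sketched with an Amdahl-style estimate. The following is a rough illustration only. The fraction of inference time spent in convolutional layers and the speedup of those layers are hypothetical inputs, not measurements from this build, and `overall_speedup` is a name introduced here.

```python
def overall_speedup(accelerated_fraction, accelerated_speedup):
    """Estimate whole-model speedup when only part of the work gets faster.

    accelerated_fraction: share of single-thread time spent in the layers
        that benefit from multithreading (0.0 to 1.0).
    accelerated_speedup: how much faster those layers run with threads.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerated_speedup <= 0:
        raise ValueError("accelerated_speedup must be positive")
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / accelerated_speedup)


# Hypothetical models that spend 50%, 90%, and 100% of their time in
# convolutions, with the convolutions assumed to run 4x faster on 4 threads.
for fraction in (0.5, 0.9, 1.0):
    print(fraction, round(overall_speedup(fraction, 4.0), 2))
```

Under these assumptions, a model dominated by convolutions approaches the full per-layer speedup, while a model with many other layers gains much less.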

The cross compilation steps listed on the official site often do not succeed. Even if you succeed, you may notice that the generated binary has various problems. The reason I stick to insane native builds is that I don't want to generate broken binaries.

２．Environment

RaspberryPi3 model B+ (armhf/armv7l/Raspbian Buster)
Tensorflow v2.0.0-rc0
Bazel 0.26.1

３．Procedure

３−１．Preparation before compilation

Prepare to build Tensorflow v2.0.0-rc0.

First, install openjdk-8-jdk according to the procedure of the following URL.
[Stable] Install openjdk-8-jdk safely in Raspbian Buster (Debian 10) environment

Next, follow the steps below to build Tensorflow on RaspberryPi3.

Preparation_before_compilation

$ sudo nano /etc/dphys-swapfile
CONF_SWAPSIZE=2048
CONF_MAXSWAP=2048

$ sudo systemctl stop dphys-swapfile
$ sudo systemctl start dphys-swapfile

$ wget https://github.com/PINTO0309/Tensorflow-bin/raw/master/zram.sh
$ chmod 755 zram.sh
$ sudo mv zram.sh /etc/init.d/
$ sudo update-rc.d zram.sh defaults
$ sudo reboot

$ sudo apt-get install -y libhdf5-dev libc-ares-dev libeigen3-dev
$ sudo pip3 install keras_applications==1.0.8 --no-deps
$ sudo pip3 install keras_preprocessing==1.1.0 --no-deps
$ sudo pip3 install h5py==2.9.0
$ sudo apt-get install -y openmpi-bin libopenmpi-dev
$ sudo -H pip3 install -U --user six numpy wheel mock
$ sudo apt update;sudo apt upgrade

$ cd ~
$ git clone https://github.com/PINTO0309/Bazel_bin.git
$ cd Bazel_bin
$ ./0.26.1/Raspbian_Debian_Buster_armhf/openjdk-8-jdk/install.sh

$ cd ~
$ git clone -b v2.0.0-rc0 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout -b v2.0.0-rc0
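
Before starting a build that runs for days, it can help to confirm that the swap settings above actually took effect after the reboot. The following is a minimal sketch that parses text in the format of /proc/meminfo. The sample values are hypothetical and the function names are introduced here for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Name:   12345 kB" style entries with a numeric value.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def total_mb(meminfo, key):
    """Return the given entry in whole megabytes (0 if the key is absent)."""
    return meminfo.get(key, 0) // 1024


# Hypothetical sample; on the Pi, read the real file instead:
#   text = open("/proc/meminfo").read()
SAMPLE = """MemTotal:         949448 kB
MemFree:          512000 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""

info = parse_meminfo(SAMPLE)
print("RAM  (MB):", total_mb(info, "MemTotal"))
print("Swap (MB):", total_mb(info, "SwapTotal"))
```

If the reported swap is far below the configured size, the swap file or zram setup probably did not start, and the build is likely to fail when memory runs out.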

３−２．Edit various files

Add a multithreading function to Tensorflow Lite's Python API. This means customizing Tensorflow Lite's implementation yourself. Reference: [tflite] export SetNumThreads to TFLite Python API #25748

tensorflow/lite/python/interpreter.py

# Append the following two lines to the end of the file (they become a method of the Interpreter class)
  def set_num_threads(self, i):
    return self._interpreter.SetNumThreads(i)

tensorflow/lite/python/interpreter_wrapper/interpreter_wrapper.cc

// Near the end of the file, add SetNumThreads after ResetVariableTensors as follows
PyObject* InterpreterWrapper::ResetVariableTensors() {
  TFLITE_PY_ENSURE_VALID_INTERPRETER();
  TFLITE_PY_CHECK(interpreter_->ResetVariableTensors());
  Py_RETURN_NONE;
}

PyObject* InterpreterWrapper::SetNumThreads(int i) {
  interpreter_->SetNumThreads(i);
  Py_RETURN_NONE;
}

}  // namespace interpreter_wrapper
}  // namespace tflite

tensorflow/lite/python/interpreter_wrapper/interpreter_wrapper.h

  // should be the interpreter object providing the memory.
  PyObject* tensor(PyObject* base_object, int i);

  PyObject* SetNumThreads(int i);

 private:
  // Helper function to construct an `InterpreterWrapper` object.
  // It only returns InterpreterWrapper if it can construct an `Interpreter`.
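
Once a wheel built with the three edits above is installed, the new method could be used roughly as follows. This is a sketch under assumptions: `set_num_threads` exists only on builds that carry this patch, so the helper checks for it first and skips the call otherwise. The helper names, the model path argument, and the default of 4 threads are hypothetical.

```python
def apply_num_threads(interpreter, num_threads):
    """Call set_num_threads() if this build provides it; return True if it was called."""
    setter = getattr(interpreter, "set_num_threads", None)
    if not callable(setter):
        return False  # stock build without the patch
    setter(num_threads)
    return True


def make_interpreter(model_path, num_threads=4):
    """Create a TFLite interpreter and request num_threads worker threads."""
    # Imported lazily so that this module loads even where Tensorflow is absent.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    apply_num_threads(interpreter, num_threads)
    interpreter.allocate_tensors()
    return interpreter
```

Guarding the call with `getattr` keeps the same script usable on an unpatched Tensorflow installation, where it simply runs with the default thread setting.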

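Disable compilation of matrix_square_root_op: remove ":matrix_square_root_op" from the deps of the linalg library so that it looks like the listing below, then delete its tf_kernel_library rule.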

tensorflow/tensorflow/core/kernels/BUILD

cc_library(
    name = "linalg",
    deps = [
        ":cholesky_grad",
        ":cholesky_op",
        ":determinant_op",
        ":lu_op",
        ":matrix_exponential_op",
        ":matrix_inverse_op",
        ":matrix_logarithm_op",
        ":matrix_solve_ls_op",
        ":matrix_solve_op",
        ":matrix_triangular_solve_op",
        ":qr_op",
        ":self_adjoint_eig_op",
        ":self_adjoint_eig_v2_op",
        ":svd_op",
        ":tridiagonal_solve_op",
    ],
)

tensorflow/tensorflow/core/kernels/BUILD (delete the following)

tf_kernel_library(
    name = "matrix_square_root_op",
    prefix = "matrix_square_root_op",
    deps = LINALG_DEPS,
)

Disable NNAPI.

tensorflow/lite/tools/make/Makefile

BUILD_WITH_NNAPI=false

３−３．Configure of Tensorflow v2.0.0-rc0

Set the build parameters for Tensorflow v2.0.0-rc0.

Procedure_of_configure

$ cd ~/tensorflow
$ sudo ./configure 
Extracting Bazel installation...
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3


Found possible Python library paths:
  /usr/local/lib
  /usr/lib/python3/dist-packages
  /home/pi/inference_engine_vpu_arm/python/python3.7
  /usr/local/lib/python3.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib]
/usr/local/lib/python3.7/dist-packages
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
	--config=v2             # Build Tensorflow 2.x instead of 1.x
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apache Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished

３−４．Build Tensorflow v2.0.0-rc0

Build_command_by_Bazel_0.26.1

$ sudo bazel --host_jvm_args=-Xmx512m build \
--config=opt \
--config=noaws \
--config=nohdfs \
--config=noignite \
--config=nokafka \
--config=nonccl \
--config=v2 \
--local_resources=1024.0,0.5,0.5 \
--copt=-mfpu=neon-vfpv4 \
--copt=-ftree-vectorize \
--copt=-funsafe-math-optimizations \
--copt=-ftree-loop-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
//tensorflow/tools/pip_package:build_pip_package
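
In Bazel 0.26.1, the three values of `--local_resources` are the available RAM in MB, the number of CPU cores, and the available I/O capacity. The low values above restrain Bazel's parallelism, which appears intended to keep the 1 GB board from running out of memory. The sketch below, which is not part of the original procedure, assembles the same argument list from parameters so that the limits can be adjusted in one place. The function name and its parameters are hypothetical, and `sudo` is left out.

```python
import shlex

TARGET = "//tensorflow/tools/pip_package:build_pip_package"

CONFIGS = ["opt", "noaws", "nohdfs", "noignite", "nokafka", "nonccl", "v2"]
COPTS = [
    "-mfpu=neon-vfpv4",
    "-ftree-vectorize",
    "-funsafe-math-optimizations",
    "-ftree-loop-vectorize",
    "-fomit-frame-pointer",
    "-DRASPBERRY_PI",
]


def bazel_build_args(ram_mb=1024.0, cpu_cores=0.5, io=0.5, jvm_heap="512m"):
    """Return the bazel argument list used in this article as a Python list."""
    args = ["bazel", "--host_jvm_args=-Xmx" + jvm_heap, "build"]
    args += ["--config=" + name for name in CONFIGS]
    args.append("--local_resources={:.1f},{},{}".format(ram_mb, cpu_cores, io))
    args += ["--copt=" + flag for flag in COPTS]
    args.append("--host_copt=-DRASPBERRY_PI")
    args.append(TARGET)
    return args


# Print the command as a single shell-safe line for review before running it.
print(" ".join(shlex.quote(arg) for arg in bazel_build_args()))
```

The returned list can also be passed to `subprocess.run` directly if the build is to be launched from Python.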

３−５．Create Wheel file

Command_to_create_Wheel_file

$ su --preserve-environment
# ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# exit
$ sudo cp /tmp/tensorflow_pkg/tensorflow-2.0.0rc0-cp37-cp37m-linux_armv7l.whl ~

３−６．Installation of Tensorflow v2.0.0-rc0

Installation_command

$ cd ~
$ sudo pip3 uninstall tensorflow
$ sudo -H pip3 install tensorflow-2.0.0rc0-cp37-cp37m-linux_armv7l.whl
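
pip rejects a wheel whose platform tag does not match the machine, so a mistyped file name such as `linux_arm7l` instead of `linux_armv7l` makes the installation fail. As a minimal sketch, the file name can be split into its standard wheel components and compared against `platform.machine()` before installing. The function names are hypothetical, and the check assumes a file name without a build tag.

```python
import platform


def parse_wheel_filename(filename):
    """Split a wheel file name into its five standard components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name: " + filename)
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


def wheel_matches_machine(filename, machine=None):
    """Return True if the platform tag equals 'linux_' + machine."""
    machine = machine or platform.machine()
    return parse_wheel_filename(filename)["platform_tag"] == "linux_" + machine
```

On the Raspberry Pi 3 with a 32-bit OS, `platform.machine()` is expected to return `armv7l`, so the wheel produced in this article should pass the check.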

４．Operation check

Command_to_check_the_installed_Tensorflow_version

$ python3 -c 'import tensorflow as tf; print(tf.__version__)'

５．Reference articles

Compiling Bazel from source - Build Bazel from scratch (bootstrapping) - 2. Bootstrap Bazel on Ubuntu Linux, macOS, and other Unix-like systems

Converter Python API guide

Retrieving Release content using the GitHub API - ジムには乗りたい - su-kun1899

６．Appendix

Sample curl commands for retrieving release note information

Commands to retrieve the v2.0.0-rc0 release notes

--- v2.0.0-rc0 -----------
$ curl -v https://api.github.com/repos/tensorflow/tensorflow/releases/19501173

--- All versions -----------
$ curl -v https://api.github.com/repos/tensorflow/tensorflow/releases

--- Latest version -----------
$ curl -v https://api.github.com/repos/tensorflow/tensorflow/releases/latest
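
The same endpoints can be queried from Python with only the standard library. The sketch below is an illustration rather than the author's method: it builds the endpoint URL and extracts a few fields (`tag_name`, `published_at`, `body`) from the JSON returned for a single release. The function names are hypothetical, and unauthenticated requests are subject to GitHub's rate limits.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com/repos/tensorflow/tensorflow/releases"


def release_url(selector=None):
    """Return the list endpoint for None, else the URL for a release id or 'latest'."""
    if selector is None:
        return API_ROOT
    return "{}/{}".format(API_ROOT, selector)


def summarize_release(payload):
    """Extract the tag, publication date, and release notes from one release's JSON."""
    release = json.loads(payload)
    return {
        "tag": release.get("tag_name"),
        "published": release.get("published_at"),
        "notes": release.get("body") or "",
    }


def fetch_release(selector):
    """Download one release (by numeric id or 'latest') and summarize it."""
    with urllib.request.urlopen(release_url(selector)) as response:
        return summarize_release(response.read().decode("utf-8"))
```

For example, `fetch_release(19501173)` requests the same release as the first curl command above, and `fetch_release("latest")` corresponds to the last one.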

Articles on speeding up the build
Part 1. Ultra-fast build of Tensorflow with Bazel Remote Caching [Google Cloud Storage version] - Qiita - PINTO
Part 2. Ultra-fast build of Tensorflow with Bazel Remote Caching [Docker version] - Qiita - PINTO
