Currently, dynamic shape support for GPU is a preview feature and has the following limitations:

- It mainly supports NLP (Natural Language Processing) models. Due to the dominant runtime overhead on the host device, dynamic shapes may perform worse than static shapes on a discrete GPU.
- Not all operations and optimization passes support dynamic shapes. As a result, a given model may crash or experience significant performance drops.

The general description of what dynamic shapes are and how they are used can be found in the Dynamic Shapes documentation.

To support dynamic shape execution, the following basic infrastructures are implemented:

- Runtime shape inference: infers output shapes of each primitive for a new input shape at runtime.
- Shape-agnostic kernels: new kernels that can run arbitrary shapes. If a shape-agnostic kernel is not available, the required kernel is compiled at runtime for each shape.
- Asynchronous kernel compilation: even when a shape-agnostic kernel is available, the GPU plugin compiles an optimal kernel for the given shape and preserves it in the in-memory cache for future use.
- In-memory cache: preserves kernels compiled at runtime and weights reordered for the specific kernels.

Recommendations for performance improvement

- Use static shapes whenever possible: static models can benefit from more aggressive optimizations, such as constant propagation, fusing, and reorder optimization. If the same shape is used for a dynamic and a static model, performance is worse in the dynamic one. It is, therefore, recommended to reshape dynamic models to static ones, if the scenario allows.
- Use bounded dynamic shapes whenever possible: the GPU plugin needs to reallocate memory if the current shape is larger than the maximum of the previous shapes, which causes additional overhead. Using a bounded dynamic shape helps to reduce such overhead.

Since OpenVINO relies on OpenCL kernels for the GPU implementation, many general OpenCL tips apply:

- Prefer FP16 inference precision over FP32, as the Model Conversion API can generate both variants, and FP32 is the default.
- Try to group individual infer jobs by using automatic batching.
- Consider caching to minimize model load time. To learn about optimization options, see the Optimization Guide.
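The recommendations above can be combined in a short configuration sketch using the OpenVINO Python API. The model path `model.xml` and cache directory `model_cache` are hypothetical, and the bounded dimension values are arbitrary; adapt them to the actual model.

```python
import openvino as ov

core = ov.Core()

# Enable the model cache to reduce load time on subsequent runs
# ("model_cache" is an arbitrary directory chosen for this example).
core.set_property({"CACHE_DIR": "model_cache"})

model = core.read_model("model.xml")  # hypothetical IR file

# Reshape the dynamic model to a static shape if the scenario allows it...
model.reshape([1, 128])

# ...or, if dynamism is required, bound the dynamic dimension
# (here: batch 1, sequence length between 1 and 512).
model.reshape([1, ov.Dimension(1, 512)])

compiled_model = core.compile_model(model, "GPU")
```

Either `reshape` call on its own reflects one of the two recommendations; bounding the dimension lets the GPU plugin allocate for the worst case instead of reallocating as shapes grow.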
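To make the reallocation point concrete, here is an illustrative sketch (plain Python, not OpenVINO internals) of why unbounded dynamic shapes cause repeated reallocation: a buffer must grow every time an input exceeds the largest shape seen so far, whereas a bounded shape allows one worst-case allocation up front. The `Buffer` class and its counters are invented for this example.

```python
# Illustrative model of the memory reallocation behavior described above.
# A bounded shape lets the plugin allocate once for the worst case.

class Buffer:
    def __init__(self, upper_bound=None):
        # With a bounded shape, allocate the worst case immediately.
        self.capacity = upper_bound or 0
        self.realloc_count = 0

    def prepare(self, size):
        # Grow only when the current shape exceeds the maximum seen so far.
        if size > self.capacity:
            self.realloc_count += 1
            self.capacity = size

unbounded = Buffer()
for n in [64, 128, 96, 256]:
    unbounded.prepare(n)

bounded = Buffer(upper_bound=256)  # e.g. sequence length bounded at 256
for n in [64, 128, 96, 256]:
    bounded.prepare(n)

print(unbounded.realloc_count, bounded.realloc_count)  # → 3 0
```

Note that only the growing shapes (64, 128, 256) trigger reallocation in the unbounded case; the shrinking step (96) fits in the existing buffer.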
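The in-memory cache described above can be sketched in the same spirit: kernels are keyed by shape, so compilation happens once per unseen shape and later inferences with the same shape hit the cache. This is a simplified stand-in, not the GPU plugin's actual data structure.

```python
# Illustrative shape-keyed kernel cache: compile on first sight of a shape,
# reuse the compiled kernel for every later request with the same shape.

class KernelCache:
    def __init__(self):
        self._kernels = {}
        self.compile_count = 0

    def get_kernel(self, shape):
        key = tuple(shape)
        if key not in self._kernels:
            self.compile_count += 1  # runtime compilation for an unseen shape
            self._kernels[key] = f"kernel_for_{key}"  # stand-in for a binary
        return self._kernels[key]

cache = KernelCache()
cache.get_kernel([1, 128])
cache.get_kernel([1, 128])  # cache hit, no recompilation
cache.get_kernel([1, 256])  # new shape, compiled at runtime
print(cache.compile_count)  # → 2
```

This also shows why a workload cycling through many distinct shapes pays a compilation cost per shape, while one reusing a few shapes quickly amortizes it.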