models.neural_network.quantization_utils

Utilities to compress Neural Network Models

Module Contents

Classes

TopKMetrics(self,topk)
NoiseMetrics(self)
OutputMetric(self,name,type) Utility class to calculate and hold metrics between two model outputs
ModelMetrics(self,spec) A utility class to hold evaluation metrics

Functions

_convert_1bit_array_to_byte_array(arr)
_convert_array_to_nbit_quantized_bytes(arr,nbits)
_decompose_bytes_to_bit_arr(arr)
_get_linear_lookup_table(nbits,wp)
_get_kmeans_lookup_table(nbits,w,init="k-means++",tol=0.01,n_init=1,rand_seed=0) Generate a K-Means lookup table given a weight parameter field
_quantize_wp(wp,nbits,qm,**kwargs)
_quantize_wp_field(wp,nbits,qm,outChannels=1,**kwargs)
_dequantize_wp(wp,outChannels,**kwargs)
_dequantize_spec(spec)
_quantize_nn_spec(spec,nbits,qm,**kwargs)
quantize_spec_weights(spec,nbits,quantization_mode,**kwargs)
_load_and_resize_image(image_path,size)
_characterize_qmodel_perf_with_data_dir(fpmodel,qspec,data_dir)
_characterize_quantized_model_perf(fpmodel,qspec,sample_data)
compare_models(full_precision_model,quantized_model,sample_data) Utility function to compare the performance of a full precision vs quantized model
quantize_weights(full_precision_model,nbits,quantization_mode="linear",sample_data=None,**kwargs) Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel
_convert_1bit_array_to_byte_array(arr)
_convert_array_to_nbit_quantized_bytes(arr, nbits)
_decompose_bytes_to_bit_arr(arr)
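
These three helpers pack sub-byte quantized weights into bytes and back. Their implementations are not shown here, but for the 1-bit case the round trip can be illustrated with numpy's packbits/unpackbits (pack_1bit and unpack_to_bits are illustrative names, not the module's):

    import numpy as np

    def pack_1bit(arr):
        # Pack a 0/1 weight array into bytes, 8 entries per byte.
        return np.packbits(np.asarray(arr, dtype=np.uint8))

    def unpack_to_bits(byte_arr):
        # Inverse: expand each byte back into its 8 constituent bits.
        return np.unpackbits(np.asarray(byte_arr, dtype=np.uint8))

    bits = [1, 0, 1, 1, 0, 0, 0, 1]
    assert list(unpack_to_bits(pack_1bit(bits))) == bits
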
_get_linear_lookup_table(nbits, wp)
_get_kmeans_lookup_table(nbits, w, init="k-means++", tol=0.01, n_init=1, rand_seed=0)

Generate a K-Means lookup table given a weight parameter field

Parameters:
  • nbits – Number of bits for quantization
  • w – List of weights
Returns:
  • lut – Lookup table, a numpy array of shape (1 << nbits, )
  • wq – Quantized weights, a numpy array of type numpy.uint8
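
The module's implementation is not reproduced here, but a table honoring this contract can be generated with scikit-learn's KMeans. A minimal sketch under that assumption (kmeans_lookup_table is an illustrative name):

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_lookup_table(nbits, w, init="k-means++", tol=1e-2,
                            n_init=1, rand_seed=0):
        # Flatten the weights and cluster them into at most 2**nbits centroids.
        wf = np.asarray(w).reshape(-1, 1)
        lut_len = 1 << nbits
        n_clusters = min(wf.shape[0], lut_len)
        kmeans = KMeans(n_clusters=n_clusters, init=init, tol=tol,
                        n_init=n_init, random_state=rand_seed).fit(wf)
        # lut stores centroid values; wq maps each weight to its centroid index.
        lut = np.zeros(lut_len)
        lut[:n_clusters] = kmeans.cluster_centers_.flatten()
        wq = kmeans.labels_.astype(np.uint8)
        return lut, wq

The quantized weights then round-trip as lut[wq].
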
_quantize_wp(wp, nbits, qm, **kwargs)
_quantize_wp_field(wp, nbits, qm, outChannels=1, **kwargs)
_dequantize_wp(wp, outChannels, **kwargs)
_dequantize_spec(spec)
_quantize_nn_spec(spec, nbits, qm, **kwargs)
quantize_spec_weights(spec, nbits, quantization_mode, **kwargs)
_load_and_resize_image(image_path, size)
class TopKMetrics(topk)
__init__(topk)
add_metric(output1, output2)
display_metrics()
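
Judging by its name and methods, TopKMetrics accumulates top-k agreement between two models' classification outputs. For a single pair of outputs, such a metric could be computed as follows (an illustrative sketch; output1 and output2 are assumed to be dicts mapping class labels to probabilities):

    def topk_agreement(output1, output2, topk=5):
        # Take the k highest-probability classes from each output.
        top1 = sorted(output1, key=output1.get, reverse=True)[:topk]
        top2 = sorted(output2, key=output2.get, reverse=True)[:topk]
        # Fraction of top-k classes on which the two models agree.
        return len(set(top1) & set(top2)) / float(topk)
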
class NoiseMetrics
__init__()
_compute_snr(arr1, arr2)
add_metric(output1, output2)
display_metrics()
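
The signatures above suggest NoiseMetrics tracks a signal-to-noise ratio between full precision and quantized outputs. A conventional SNR computation, treating the element-wise difference between the two outputs as noise, might look like this (a sketch, not the module's exact code):

    import numpy as np

    def compute_snr(reference, quantized):
        # SNR in dB between a reference (full precision) output and a
        # quantized output.
        ref = np.asarray(reference, dtype=np.float64).flatten()
        qt = np.asarray(quantized, dtype=np.float64).flatten()
        signal_power = np.mean(ref ** 2)
        noise_power = np.mean((ref - qt) ** 2)
        eps = 1e-20  # guard against log(0) and division by zero
        return 10.0 * np.log10((signal_power + eps) / (noise_power + eps))
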
class OutputMetric(name, type)

Utility class to calculate and hold metrics between two model outputs

__init__(name, type)
add_metric(output1, output2)
display_metrics()
class ModelMetrics(spec)

A utility class to hold evaluation metrics

__init__(spec)
add_metrics(model1_output, model2_output)
display_metrics()
_characterize_qmodel_perf_with_data_dir(fpmodel, qspec, data_dir)
_characterize_quantized_model_perf(fpmodel, qspec, sample_data)
compare_models(full_precision_model, quantized_model, sample_data)

Utility function to compare the performance of a full precision vs quantized model

Parameters:
  • full_precision_model – MLModel The full precision model with float32 weights
  • quantized_model – MLModel Quantized version of the model with quantized weights
  • sample_data – str | [dict] Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
Returns:

None. Performance metrics are printed out
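For a model whose inputs are multiarrays, a list of sample input dictionaries can be supplied directly (the input name "input_1" and the model variables below are placeholders):

    import numpy as np

    # Ten random samples; dictionary keys must match the model's input names.
    sample_inputs = [{"input_1": np.random.rand(1, 3, 224, 224)}
                     for _ in range(10)]
    compare_models(full_precision_model, quantized_model, sample_inputs)
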

quantize_weights(full_precision_model, nbits, quantization_mode="linear", sample_data=None, **kwargs)

Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel.

Parameters:
  • full_precision_model – MLModel Model which will be converted to half precision. Currently, conversion is supported only for neural network models. If a pipeline model is passed in, all neural network models embedded within it will be converted.
  • nbits – Int Number of bits per quantized weight. Only 8-bit and lower quantization is supported
  • quantization_mode

    str One of:

    "linear":
    Simple linear quantization with scale and bias
    "linear_lut":
    Simple linear quantization represented as a lookup table
    "kmeans_lut":
    LUT based quantization, where the LUT is generated by K-Means clustering
    "custom_lut":
    LUT quantization where the LUT and quantized weight params are calculated using a custom function. If this mode is selected, a custom function must be passed in kwargs with the key lut_function. The function must take input params (nbits, wp), where nbits is the number of quantization bits and wp is the list of weights for a given layer, and return two values (lut, qw), where lut is an array of length (2^nbits) containing LUT values and qw is the list of quantized weight parameters. See _get_linear_lookup_table for a sample implementation, and the sketch after the keyword arguments below.
  • sample_data – str | [dict] Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
  • **kwargs

    See below

Keyword Arguments:
 
  • lut_function (callable function) – A callable function provided when quantization_mode is set to _QUANTIZATION_MODE_CUSTOM_LOOKUP_TABLE ("custom_lut"). See quantization_mode for more details.
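
As an illustration of the lut_function contract, the following sketch builds a uniformly spaced lookup table (uniform_lut is a hypothetical example, not the library's _get_linear_lookup_table):

    import numpy as np

    def uniform_lut(nbits, wp):
        # Spread 2**nbits LUT entries evenly across the weight range, then
        # map each weight to the index of its nearest entry.
        wp = np.asarray(wp, dtype=np.float32)
        lut = np.linspace(wp.min(), wp.max(), 1 << nbits)
        qw = np.argmin(np.abs(wp.reshape(-1, 1) - lut.reshape(1, -1)), axis=1)
        return lut, qw.astype(np.uint8)

It would be passed as quantize_weights(model, nbits, quantization_mode="custom_lut", lut_function=uniform_lut).
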
Returns:

model – MLModel The quantized MLModel instance if running on macOS 10.14 or later; otherwise the quantized model specification is returned.
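
A typical end-to-end use (the model path and image directory below are placeholders):

    import coremltools
    from coremltools.models.neural_network.quantization_utils import (
        quantize_weights, compare_models)

    model = coremltools.models.MLModel("MyModel.mlmodel")

    # 8-bit linear quantization of the network's weights.
    quantized_model = quantize_weights(model, nbits=8,
                                       quantization_mode="linear")

    # Compare against the original; a directory of images is valid here
    # only if the model has a single image input.
    compare_models(model, quantized_model,
                   sample_data="/path/to/sample/images")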