Nano PyTorch API#

bigdl.nano.pytorch.Trainer#

class bigdl.nano.pytorch.Trainer(*args: Any, **kwargs: Any)[source]#

Trainer for BigDL-Nano pytorch.

This Trainer extends PyTorch Lightning Trainer by adding various options to accelerate pytorch training.

A pytorch lightning trainer that uses bigdl-nano optimization.

Parameters
  • num_processes – number of processes in distributed training. default: 1.

  • use_ipex – whether we use ipex as accelerator for trainer. default: False.

  • distributed_backend – which backend to use in distributed mode, defaults to “subprocess”. Available backends are ‘spawn’, ‘subprocess’ and ‘ray’.

  • cpu_for_each_process – A list of length num_processes, each containing a list of indices of cpus each process will be using. default: None, and the cpu will be automatically and evenly distributed among processes.

  • channels_last – whether to convert input to the channels-last memory format, defaults to False.

  • precision – Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16), defaults to 32. IPEX bfloat16 weight prepack is enabled when use_ipex=True and precision=‘bf16’.
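
As a rough illustration of the constructor parameters above, the sketch below builds an explicit cpu_for_each_process list (one list of CPU indices per process) and passes it to Trainer. The helper even_cpu_split and the core counts are hypothetical, and Nano's own automatic distribution may differ. max_epochs is a regular PyTorch Lightning Trainer argument and is assumed here to be forwarded unchanged.

```python
from typing import List


def even_cpu_split(num_cpus: int, num_processes: int) -> List[List[int]]:
    """Hypothetical helper: split CPU indices 0..num_cpus-1 into contiguous,
    near-equal groups, one group per process."""
    if num_processes < 1 or num_cpus < num_processes:
        raise ValueError("need at least one CPU per process")
    base, extra = divmod(num_cpus, num_processes)
    groups, start = [], 0
    for rank in range(num_processes):
        size = base + (1 if rank < extra else 0)  # first `extra` groups get one more core
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def build_trainer(num_cpus: int = 8, num_processes: int = 2):
    # Imported lazily so the helper above can be used without bigdl-nano installed.
    from bigdl.nano.pytorch import Trainer

    return Trainer(
        max_epochs=1,  # assumption: regular Lightning argument, passed through
        num_processes=num_processes,
        distributed_backend="subprocess",
        cpu_for_each_process=even_cpu_split(num_cpus, num_processes),
        use_ipex=False,
        precision=32,
    )
```

The list passed as cpu_for_each_process has exactly num_processes entries, which is the shape the parameter description asks for.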

static compile(model: torch.nn.modules.module.Module, loss: Optional[torch.nn.modules.loss._Loss] = None, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, scheduler: Optional[torch.optim.lr_scheduler._LRScheduler] = None, metrics: Optional[List[torchmetrics.metric.Metric]] = None)[source]#

Construct a pytorch-lightning model.

If model is already a pytorch-lightning model, return it unchanged. If model is a plain PyTorch model, construct a new pytorch-lightning module from model, loss and optimizer.

Parameters
  • model – A model instance.

  • loss – Loss to construct pytorch-lightning model. Should be None if model is instance of pl.LightningModule.

  • optimizer – Optimizer to construct pytorch-lightning model. Should be None if model is instance of pl.LightningModule.

  • metrics – A list of torchmetrics to validate/test performance.

Returns

A LightningModule object.
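
A minimal sketch of how compile might be used with a plain torch.nn.Module, following the signature above. The toy model, the random data and the function name compile_example are illustrative assumptions; fit is the regular PyTorch Lightning training entry point.

```python
def compile_example():
    # Lazy imports: torch, torchmetrics and bigdl-nano are required to run this.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from torchmetrics import MeanSquaredError
    from bigdl.nano.pytorch import Trainer

    model = nn.Linear(4, 1)  # a plain torch.nn.Module, not a LightningModule
    loss = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Wrap the plain module; a LightningModule would be returned as is,
    # in which case loss and optimizer should be left as None.
    pl_model = Trainer.compile(
        model, loss=loss, optimizer=optimizer, metrics=[MeanSquaredError()]
    )

    # Random toy data, only to make the example self-contained.
    data = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
    trainer = Trainer(max_epochs=1)
    trainer.fit(pl_model, DataLoader(data, batch_size=8))
    return pl_model
```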

search(model, resume: bool = False, target_metric=None, n_parallels=1, acceleration=False, input_sample=None, **kwargs)[source]#

Run HPO search. It will be called in Trainer.search().

Parameters
  • model – The model to be searched. It should be an auto model.

  • resume – whether to resume the previous study or start a new one, defaults to False.

  • target_metric – the objective metric to optimize, defaults to None.

  • n_parallels – the number of parallel processes for running trials.

  • acceleration – Whether to automatically consider the model after inference acceleration in the search process. It will only take effect if target_metric contains “latency”. Default value is False.

  • input_sample – A set of inputs for trace. Defaults to None, which is acceptable if you have traced the model before or the model is a LightningModule with any dataloader attached.

Returns

the model with study meta info attached.

search_summary()[source]#

Retrieve a summary of trials.

Returns

A summary of all the trials. Currently the entire study is returned to allow more flexibility for further analysis and visualization.

static trace(model: torch.nn.modules.module.Module, input_sample=None, accelerator: str = None, use_ipex: bool = False, thread_num: int = None, onnxruntime_session_options=None, logging: bool = True, **export_kwargs)[source]#

Trace a pytorch model and convert it into an accelerated module for inference.

For example, this function returns a PytorchOpenVINOModel when accelerator==’openvino’.

Parameters
  • model – A torch.nn.Module model, including pl.LightningModule.

  • input_sample – A set of inputs for trace. Defaults to None, which is acceptable if you have traced the model before or the model is a LightningModule with any dataloader attached.

  • accelerator – The accelerator to use, defaults to None meaning staying in Pytorch backend. ‘openvino’, ‘onnxruntime’ and ‘jit’ are supported for now.

  • use_ipex – whether we use ipex as accelerator for inferencing. default: False.

  • thread_num – (optional) an int specifying how many threads (cores) are used for inference, only valid for accelerator=‘onnxruntime’ or accelerator=‘openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. default: True.

  • **export_kwargs

    Extra advanced settings, including: 1. arguments passed to the torch.onnx.export function, only valid when accelerator=‘onnxruntime’/‘openvino’, otherwise ignored. 2. channels_last: if it is set and use_ipex=True, the data is transformed to channels-last format according to the setting. By default, channels_last is set to True if use_ipex=True.

Returns

Model with different acceleration.

Warning

bigdl.nano.pytorch.Trainer.trace will be deprecated in a future release.

Please use bigdl.nano.pytorch.InferenceOptimizer.trace instead.

static quantize(model: torch.nn.modules.module.Module, precision: str = 'int8', accelerator: str = None, use_ipex: bool = False, calib_dataloader: torch.utils.data.dataloader.DataLoader = None, metric: torchmetrics.metric.Metric = None, accuracy_criterion: dict = None, approach: str = 'static', method: str = None, conf: str = None, tuning_strategy: str = None, timeout: int = None, max_trials: int = None, input_sample=None, thread_num: int = None, onnxruntime_session_options=None, logging: bool = True, **export_kwargs)[source]#

Calibrate a Pytorch-Lightning model for post-training quantization.

Parameters
  • model – A model to be quantized. Model type should be an instance of nn.Module.

  • precision – Global precision of quantized model, supported type: ‘int8’, ‘bf16’, ‘fp16’, defaults to ‘int8’.

  • accelerator – Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, defaults to None. None means staying in pytorch.

  • calib_dataloader – A torch.utils.data.dataloader.DataLoader object for calibration. Required for static quantization. It’s also used as validation dataloader.

  • metric – A torchmetrics.metric.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop, defaults to None meaning no accuracy control. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows relative accuracy loss: 1%. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’:False} means accuracy must be smaller than 0.99.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. OpenVINO supports static mode only.

  • method – Method to do quantization. When accelerator=None, supported methods: ‘fx’, ‘eager’, ‘ipex’, defaults to ‘fx’. If you don’t use ipex, suggest using ‘fx’ which executes automatic optimizations like fusion. For more information, please refer to https://pytorch.org/docs/stable/quantization.html#eager-mode-quantization. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. Suggest ‘qlinear’ for lower accuracy drop if using static quantization. More details in https://onnxruntime.ai/docs/performance/quantization.html. This argument doesn’t take effect for OpenVINO, don’t change it for OpenVINO.

  • conf – A path to conf yaml file for quantization. Default: None, using default config.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return satisfying best model.

  • input_sample – An input example to convert pytorch model into ONNX/OpenVINO.

  • thread_num – (optional) an int specifying how many threads (cores) are used for inference, only valid for accelerator=‘onnxruntime’ or accelerator=‘openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. default: True.

  • **export_kwargs

    will be passed to torch.onnx.export function.

Returns

An accelerated Pytorch-Lightning Model if quantization is successful.

Warning

bigdl.nano.pytorch.Trainer.quantize will be deprecated in a future release.

Please use bigdl.nano.pytorch.InferenceOptimizer.quantize instead.

static save(model: pytorch_lightning.LightningModule, path)[source]#

Save the model to local file.

Parameters
  • model – Any model of torch.nn.Module, including all models accelerated by Trainer.trace/Trainer.quantize.

  • path – Path to saved model. Path should be a directory.

static load(path, model: Optional[pytorch_lightning.LightningModule] = None)[source]#

Load a model from a local directory.

Parameters
  • path – Path to model to be loaded. Path should be a directory.

  • model – The original FP32 model, required to load a PyTorch model. It is needed if you accelerated the model with accelerator=None by Trainer.trace/Trainer.quantize. model should be set to None if you chose accelerator=“onnxruntime”/“openvino”/“jit”.

Returns

Model with different acceleration (None/OpenVINO/ONNX Runtime/JIT) or precision (FP32/FP16/BF16/INT8).

save_checkpoint(filepath, weights_only: bool = False, storage_options: Optional[Any] = None) None[source]#

Save checkpoint after one train epoch.

bigdl.nano.pytorch.InferenceOptimizer#

class bigdl.nano.pytorch.InferenceOptimizer[source]#

InferenceOptimizer for Pytorch/TF Model.

It can be used to accelerate your model’s inference speed with very few code changes.

optimize(model: torch.nn.modules.module.Module, training_data: Union[torch.utils.data.dataloader.DataLoader, torch.Tensor, Tuple[torch.Tensor]], validation_data: Optional[Union[torch.utils.data.dataloader.DataLoader, torch.Tensor, Tuple[torch.Tensor]]] = None, input_sample: Optional[Union[torch.Tensor, Dict, Tuple[torch.Tensor]]] = None, metric: Optional[Callable] = None, direction: str = 'max', thread_num: Optional[int] = None, logging: bool = False, latency_sample_num: int = 100, includes: Optional[List[str]] = None, excludes: Optional[List[str]] = None) None[source]#

This function tries all available inference acceleration methods and records the latency, accuracy and model instance inside the Optimizer for future use. All model instances are set to eval mode.

The available methods are “original”, “fp32_ipex”, “bf16”, “bf16_ipex”, “int8”, “jit_fp32”, “jit_fp32_ipex”, “jit_fp32_ipex_channels_last”, “openvino_fp32”, “openvino_int8”, “onnxruntime_fp32”, “onnxruntime_int8_qlinear” and “onnxruntime_int8_integer”.

Parameters
  • model – A torch.nn.Module to be optimized

  • training_data

    training_data supports the following formats:

    1. a torch.utils.data.dataloader.DataLoader object for the training dataset.
    Use this parameter with care, since the dataloader
    may be exposed to the model, which can cause data leakage. The
    batch_size of this dataloader matters as well: you may
    want to set it to the batch size you plan to use for the model
    in the real deployment environment. E.g. batch size should be set to 1
    if you would like to use the accelerated model in an online service.

    2. a single torch.Tensor used for training; this case
    accepts a single sample input x.

    3. a tuple of torch.Tensor used for training; this case
    accepts a single sample input such as (x, y) or (x1, x2).

  • validation_data

    (optional) validation_data is only needed when users care about the possible accuracy drop. It supports the following formats:

    1. a torch.utils.data.dataloader.DataLoader object for accuracy evaluation.

    2. a single torch.Tensor; this case accepts a single sample input x.

    3. a tuple of torch.Tensor; this case accepts a single sample input such as (x, y) or (x1, x2).

  • input_sample – (optional) A set of inputs for trace, defaults to None. In most cases, you don’t need to specify this parameter; it will be obtained from training_data.

  • metric

    (optional) A callable object which is used for calculating accuracy. It supports two kinds of callable object:

    1. A torchmetrics.Metric object or similar callable object which takes
    prediction and target then returns an accuracy value in this calling
    method metric(pred, target). This requires data in validation_data
    is composed of (input_data, target).
    2. A callable object that takes model and validation_data (if
    validation_data is not None) as input, and returns an accuracy value in
    this calling method metric(model, data_loader) (or metric(model) if
    validation_data is None).

  • direction – (optional) A string that indicates the higher/lower better for the metric, “min” for the lower the better and “max” for the higher the better. Default value is “max”.

  • thread_num – (optional) an int specifying how many threads (cores) are used for inference.

  • logging – whether to log detailed information of model conversion. Default: False.

  • latency_sample_num – (optional) an int specifying the number of repetitions used to calculate the average latency. The default value is 100.

  • includes – (optional) a list of acceleration methods that will be included in the search. Defaults to None, meaning all available methods are included. The “original” method is automatically added to includes.

  • excludes – (optional) a list of acceleration methods that will be excluded from the search. “original” will be ignored in the excludes.
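
The includes and excludes rules above can be sketched as a small filter, followed by a hedged usage example of optimize. The helper resolve_methods is hypothetical and not part of Nano. It assumes that excludes is applied after includes, which the parameter descriptions do not state. The toy model and random data are likewise illustrative.

```python
ALL_METHODS = [
    "original", "fp32_ipex", "bf16", "bf16_ipex", "int8", "jit_fp32",
    "jit_fp32_ipex", "jit_fp32_ipex_channels_last", "openvino_fp32",
    "openvino_int8", "onnxruntime_fp32", "onnxruntime_int8_qlinear",
    "onnxruntime_int8_integer",
]


def resolve_methods(includes=None, excludes=None):
    """Hypothetical helper: list the methods a search would cover."""
    if includes is None:
        selected = list(ALL_METHODS)  # None means every available method
    else:
        selected = [m for m in ALL_METHODS if m in includes]
    if "original" not in selected:
        selected.insert(0, "original")  # "original" is always included
    excluded = set(excludes or ()) - {"original"}  # excluding "original" is ignored
    return [m for m in selected if m not in excluded]


def optimize_example():
    # Lazy imports: torch, torchmetrics and bigdl-nano are required to run this.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from torchmetrics import MeanSquaredError
    from bigdl.nano.pytorch import InferenceOptimizer

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    train = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
    val = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))

    optimizer = InferenceOptimizer()
    optimizer.optimize(
        model,
        training_data=DataLoader(train, batch_size=1),  # batch size 1, as for an online service
        validation_data=DataLoader(val, batch_size=1),  # yields (input_data, target) pairs
        metric=MeanSquaredError(),                      # called as metric(pred, target)
        direction="min",                                # lower error is better
        includes=["openvino_fp32", "bf16"],
        latency_sample_num=20,
    )
    optimizer.summary()
    return optimizer
```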

static quantize(model: torch.nn.modules.module.Module, precision: str = 'int8', accelerator: Optional[str] = None, use_ipex: bool = False, calib_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, metric: Optional[torchmetrics.metric.Metric] = None, accuracy_criterion: Optional[dict] = None, approach: str = 'static', method: Optional[str] = None, conf: Optional[str] = None, tuning_strategy: Optional[str] = None, timeout: Optional[int] = None, max_trials: Optional[int] = None, input_sample=None, thread_num: Optional[int] = None, onnxruntime_session_options=None, openvino_config=None, simplification: bool = True, sample_size: int = 100, logging: bool = True, **export_kwargs)[source]#

Calibrate a torch.nn.Module for post-training quantization.

Parameters
  • model – A model to be quantized. Model type should be an instance of torch.nn.Module.

  • precision – Global precision of quantized model, supported type: ‘int8’, ‘bf16’, ‘fp16’, defaults to ‘int8’.

  • accelerator – Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, defaults to None. None means staying in pytorch.

  • calib_dataloader – A torch.utils.data.dataloader.DataLoader object for calibration. Required for static quantization. It’s also used as validation dataloader.

  • metric – A torchmetrics.metric.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop, defaults to None meaning no accuracy control. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows relative accuracy loss: 1%. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’:False} means accuracy must be smaller than 0.99.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. OpenVINO supports static mode only.

  • method – Method to do quantization. When accelerator=None, supported methods: ‘fx’, ‘eager’, ‘ipex’, defaults to ‘fx’. If you don’t use ipex, suggest using ‘fx’ which executes automatic optimizations like fusion. For more information, please refer to https://pytorch.org/docs/stable/quantization.html#eager-mode-quantization. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. Suggest ‘qlinear’ for lower accuracy drop if using static quantization. More details in https://onnxruntime.ai/docs/performance/quantization.html. This argument doesn’t take effect for OpenVINO, don’t change it for OpenVINO.

  • conf – A path to conf yaml file for quantization. Default: None, using default config.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return satisfying best model.

  • input_sample – An input example to convert pytorch model into ONNX/OpenVINO.

  • thread_num – (optional) an int specifying how many threads (cores) are used for inference, only valid for accelerator=‘onnxruntime’ or accelerator=‘openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config to be inputted in core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • simplification – whether to use onnxsim to simplify the ONNX model, only valid when accelerator=‘onnxruntime’, otherwise will be ignored. If this option is set to True, the additional dependency ‘onnxsim’ needs to be installed.

  • sample_size – (optional) an int specifying how many samples are used by the Post-training Optimization Tool (POT) from the OpenVINO toolkit, only valid for accelerator=‘openvino’. Defaults to 100. The larger the value, the more accurate the conversion and the lower the performance degradation, but the longer it takes.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • **export_kwargs

    will be passed to torch.onnx.export function.

Returns

An accelerated torch.nn.Module if quantization is successful.
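
The constraints in the parameter list (OpenVINO is static-only, static quantization needs a calibration dataloader, each accelerator has its own method choices) can be sketched as a pre-check before calling quantize. The helper resolve_method is hypothetical and only restates the documented rules. Whether Nano itself raises or silently ignores an invalid combination is not specified here. The toy model and random calibration data are assumptions.

```python
# Method choices per accelerator, first entry being the documented default.
METHODS = {
    None: ("fx", "eager", "ipex"),
    "onnxruntime": ("qlinear", "integer"),
}


def resolve_method(accelerator=None, approach="static", method=None,
                   has_calib_data=False):
    """Hypothetical pre-check: return the method to pass to quantize."""
    if approach not in ("static", "dynamic"):
        raise ValueError("approach must be 'static' or 'dynamic'")
    if accelerator == "openvino":
        if approach != "static":
            raise ValueError("OpenVINO supports static mode only")
        if method is not None:
            raise ValueError("method should be left unset for OpenVINO")
    elif accelerator not in METHODS:
        raise ValueError(f"unsupported accelerator: {accelerator!r}")
    if approach == "static" and not has_calib_data:
        raise ValueError("calib_dataloader is required for static quantization")
    if accelerator == "openvino":
        return None
    choices = METHODS[accelerator]
    if method is None:
        return choices[0]
    if method not in choices:
        raise ValueError(f"method must be one of {choices}")
    return method


def quantize_example():
    # Lazy imports: torch and bigdl-nano are required to run this.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from bigdl.nano.pytorch import InferenceOptimizer

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)).eval()
    calib = DataLoader(
        TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8
    )
    method = resolve_method(accelerator=None, approach="static", has_calib_data=True)
    return InferenceOptimizer.quantize(
        model, precision="int8", accelerator=None,
        approach="static", method=method, calib_dataloader=calib,
    )
```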

static trace(model: torch.nn.modules.module.Module, input_sample=None, accelerator: Optional[str] = None, use_ipex: bool = False, thread_num: Optional[int] = None, onnxruntime_session_options=None, openvino_config=None, simplification: bool = True, logging: bool = True, **export_kwargs)[source]#

Trace a torch.nn.Module and convert it into an accelerated module for inference.

For example, this function returns a PytorchOpenVINOModel when accelerator==’openvino’.

Parameters
  • model – A torch.nn.Module model, including pl.LightningModule.

  • input_sample – A set of inputs for trace. Defaults to None, which is acceptable if you have traced the model before or the model is a LightningModule with any dataloader attached.

  • accelerator – The accelerator to use, defaults to None meaning staying in Pytorch backend. ‘openvino’, ‘onnxruntime’ and ‘jit’ are supported for now.

  • use_ipex – whether we use ipex as accelerator for inferencing. default: False.

  • thread_num – (optional) an int specifying how many threads (cores) are used for inference, only valid for accelerator=‘onnxruntime’ or accelerator=‘openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config to be inputted in core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • simplification – whether to use onnxsim to simplify the ONNX model, only valid when accelerator=‘onnxruntime’, otherwise will be ignored. If this option is set to True, the additional dependency ‘onnxsim’ needs to be installed.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • **export_kwargs

    Extra advanced settings, including: 1. arguments passed to the torch.onnx.export function, only valid when accelerator=‘onnxruntime’/‘openvino’, otherwise ignored. 2. channels_last: if it is set and use_ipex=True, the data is transformed to channels-last format according to the setting. By default, channels_last is set to True if use_ipex=True.

Returns

Model with different acceleration.
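
A hedged sketch of calling trace with an ONNX Runtime backend. The helper trace_options is hypothetical: it collects keyword arguments and rejects thread_num for accelerators where the parameter list says it is not valid (Nano itself may simply ignore it). The toy model and the input shape are assumptions.

```python
SUPPORTED_ACCELERATORS = (None, "openvino", "onnxruntime", "jit")


def trace_options(accelerator=None, thread_num=None, use_ipex=False):
    """Hypothetical helper: build keyword arguments for InferenceOptimizer.trace."""
    if accelerator not in SUPPORTED_ACCELERATORS:
        raise ValueError(f"unsupported accelerator: {accelerator!r}")
    options = {"accelerator": accelerator, "use_ipex": use_ipex}
    if thread_num is not None:
        # thread_num is documented as valid only for these two backends.
        if accelerator not in ("onnxruntime", "openvino"):
            raise ValueError("thread_num requires 'onnxruntime' or 'openvino'")
        options["thread_num"] = thread_num
    return options


def trace_example():
    # Lazy imports: torch and bigdl-nano (with onnxruntime) are required to run this.
    import torch
    from torch import nn
    from bigdl.nano.pytorch import InferenceOptimizer

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)).eval()
    sample = torch.randn(1, 4)  # example input used for tracing
    accelerated = InferenceOptimizer.trace(
        model, input_sample=sample, **trace_options("onnxruntime", thread_num=2)
    )
    with torch.no_grad():
        return accelerated(sample)  # the accelerated module is called like the original
```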

static save(model: torch.nn.modules.module.Module, path)[source]#

Save the model to local file.

Parameters
  • model – Any model of torch.nn.Module, including all models accelerated by Trainer.trace/Trainer.quantize.

  • path – Path to saved model. Path should be a directory.

static load(path, model: Optional[torch.nn.modules.module.Module] = None)[source]#

Load a model from a local directory.

Parameters
  • path – Path to model to be loaded. Path should be a directory.

  • model – The original FP32 model, required to load a PyTorch model. It is needed if you accelerated the model with accelerator=None by Trainer.trace/Trainer.quantize. model should be set to None if you chose accelerator=“onnxruntime”/“openvino”/“jit”.

Returns

Model with different acceleration (None/OpenVINO/ONNX Runtime/JIT) or precision (FP32/FP16/BF16/INT8).
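
The rule for the model argument of load can be sketched as follows: the original FP32 model is passed only when the saved model was accelerated with accelerator=None. Both helper names are hypothetical, and the caller is assumed to know which accelerator was used when the model was created.

```python
def needs_original_model(accelerator):
    """Hypothetical helper: True when load needs the original FP32 model."""
    if accelerator not in (None, "onnxruntime", "openvino", "jit"):
        raise ValueError(f"unsupported accelerator: {accelerator!r}")
    return accelerator is None


def save_and_reload(accelerated_model, original_model, accelerator, path):
    # Lazy import: bigdl-nano is required to run this.
    from bigdl.nano.pytorch import InferenceOptimizer

    InferenceOptimizer.save(accelerated_model, path)  # path should be a directory
    model_arg = original_model if needs_original_model(accelerator) else None
    return InferenceOptimizer.load(path, model=model_arg)
```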

get_best_model(accelerator: Optional[str] = None, precision: Optional[str] = None, use_ipex: Optional[bool] = None, accuracy_criterion: Optional[float] = None)#

According to results of optimize, obtain the model with minimum latency under specific restrictions or without restrictions.

Parameters
  • accelerator – (optional) Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, ‘jit’, defaults to None. If not None, then will only find the model with this specific accelerator.

  • precision – (optional) Supported type: ‘int8’, ‘bf16’, and ‘fp32’. Defaults to None which represents no precision limit. If not None, then will only find the model with this specific precision.

  • use_ipex – (optional) if not None, then will only find the model with this specific ipex setting. This is only effective for pytorch model.

  • accuracy_criterion – (optional) a float representing the tolerable accuracy drop percentage, defaults to None meaning no accuracy control.

Returns

best model, corresponding acceleration option
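
The selection described above (minimum latency under optional restrictions) can be illustrated with a self-contained sketch. This is not Nano's implementation: the Result record and pick_best are hypothetical, a higher accuracy is assumed to be better, and accuracy_criterion is assumed to be a fraction of the original model's accuracy, which the parameter description does not pin down.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    """Hypothetical record of one optimize() trial."""
    method: str
    accelerator: Optional[str]
    precision: str
    latency_ms: float
    accuracy: float


def pick_best(results: List[Result], accelerator: Optional[str] = None,
              precision: Optional[str] = None,
              accuracy_criterion: Optional[float] = None) -> Result:
    # Accuracy of the unaccelerated model serves as the baseline.
    baseline = next(r.accuracy for r in results if r.method == "original")
    candidates = []
    for r in results:
        if accelerator is not None and r.accelerator != accelerator:
            continue
        if precision is not None and r.precision != precision:
            continue
        if accuracy_criterion is not None:
            drop = (baseline - r.accuracy) / baseline  # assumed: relative drop
            if drop > accuracy_criterion:
                continue
        candidates.append(r)
    if not candidates:
        raise ValueError("no model satisfies the restrictions")
    return min(candidates, key=lambda r: r.latency_ms)


# With a real optimizer the documented call would be, for example:
#   best_model, option = optimizer.get_best_model(accelerator="openvino",
#                                                 precision="int8")
```

With made-up trial records, tightening accuracy_criterion in this sketch trades latency for accuracy: a fast INT8 candidate is skipped once its relative drop exceeds the limit.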

get_model(method_name: str)#

According to results of optimize, obtain the model with method_name.

The available methods are “original”, “fp32_ipex”, “bf16”, “bf16_ipex”, “int8”, “jit_fp32”, “jit_fp32_ipex”, “jit_fp32_ipex_channels_last”, “openvino_fp32”, “openvino_int8”, “onnxruntime_fp32”, “onnxruntime_int8_qlinear” and “onnxruntime_int8_integer”.

Parameters

method_name – the name of the acceleration method whose model should be returned.

Returns

Model with different acceleration.

summary()#

Print a formatted string representation of the optimization results.

bigdl.nano.pytorch.TorchNano#

class bigdl.nano.pytorch.TorchNano(*args: Any, **kwargs: Any)[source]#

TorchNano for BigDL-Nano pytorch.

It can be used to accelerate custom pytorch training loops with very few code changes.

Create a TorchNano with nano acceleration.

Parameters
  • num_processes – number of processes in distributed training, defaults to 1

  • use_ipex – whether to use ipex acceleration, defaults to False

  • distributed_backend – which backend to use in distributed mode, defaults to “subprocess”. Available backends are ‘spawn’, ‘subprocess’ and ‘ray’.

  • precision – Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16), defaults to 32. IPEX bfloat16 weight prepack is enabled when use_ipex=True and precision=‘bf16’.

  • cpu_for_each_process – specifies the cpu cores used by each process. If None, cpu cores are distributed evenly among all processes. Only takes effect when num_processes > 1.

  • channels_last – whether to convert input to the channels-last memory format, defaults to False.

setup(model: torch.nn.modules.module.Module, optimizer: Union[torch.optim.optimizer.Optimizer, List[torch.optim.optimizer.Optimizer]], *dataloaders: torch.utils.data.dataloader.DataLoader, move_to_device: bool = True)[source]#

Setup model, optimizers and dataloaders for accelerated training.

Parameters
  • model – A model to setup

  • optimizer – The optimizer(s) to setup

  • *dataloaders

    The dataloader(s) to setup

  • move_to_device – If set True (default), moves the model to the correct device. Set this to False and alternatively use to_device() manually.

Returns

The tuple of the wrapped model, optimizer, loss_func and dataloaders, in the same order they were passed in.

abstract train(*args: Any, **kwargs: Any) Any[source]#

All the code inside this train method gets accelerated by TorchNano.

You can pass arbitrary arguments to this function when overriding it.
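
A minimal sketch of a custom training loop, assuming the usual pattern of subclassing TorchNano and overriding train. The class name MyNano, the toy model and the random data are illustrative. self.backward is assumed to be inherited from the underlying Lightning base class, and setup is assumed to return a single dataloader unpacked when only one is passed.

```python
def build_nano_class():
    # Lazy imports: torch and bigdl-nano are required to run this.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from bigdl.nano.pytorch import TorchNano

    class MyNano(TorchNano):
        def train(self, epochs: int = 1):
            model = nn.Linear(4, 1)
            optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
            loader = DataLoader(
                TensorDataset(torch.randn(32, 4), torch.randn(32, 1)), batch_size=8
            )
            loss_fn = nn.MSELoss()

            # Wrap model, optimizer and dataloader for accelerated training.
            model, optimizer, loader = self.setup(model, optimizer, loader)

            model.train()
            for _ in range(epochs):
                for x, y in loader:
                    optimizer.zero_grad()
                    loss = loss_fn(model(x), y)
                    self.backward(loss)  # assumption: replaces loss.backward()
                    optimizer.step()
            return model

    return MyNano
```

A possible call would be build_nano_class()(num_processes=2).train(epochs=1), where the constructor arguments are the ones listed for TorchNano above.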

Patch API#

bigdl.nano.pytorch.dispatcher.patch_torch(cuda_to_cpu: bool = True)[source]#

patch_torch is used to patch optimized torch classes to replace original ones.

Optimized classes include:

1. pytorch_lightning.Trainer -> bigdl.nano.pytorch.Trainer
2. torchvision.transforms -> bigdl.nano.pytorch.vision.transforms
3. torchvision.datasets -> bigdl.nano.pytorch.vision.datasets

Parameters

cuda_to_cpu – bool, makes code written for CUDA available on CPU if set to True. This feature is still experimental and only valid for Python-level code. Defaults to True.

bigdl.nano.pytorch.dispatcher.unpatch_torch()[source]#

unpatch_torch is used to restore the original torch classes replaced by patch_torch.
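
Since patch_torch and unpatch_torch form a pair, one way to keep them balanced is a small context manager. The name nano_patched and its patch/unpatch injection arguments are hypothetical conveniences, added so the wrapper can be exercised without bigdl-nano installed.

```python
from contextlib import contextmanager


@contextmanager
def nano_patched(cuda_to_cpu=True, patch=None, unpatch=None):
    """Hypothetical wrapper: patch on entry, always unpatch on exit."""
    if patch is None or unpatch is None:
        # Lazy import: bigdl-nano is required when no stand-ins are given.
        from bigdl.nano.pytorch.dispatcher import patch_torch, unpatch_torch
        patch, unpatch = patch_torch, unpatch_torch
    patch(cuda_to_cpu=cuda_to_cpu)
    try:
        yield
    finally:
        unpatch()  # restore the original classes even if the body raises
```

Code that imports pytorch_lightning.Trainer or torchvision.transforms inside a `with nano_patched():` block would then pick up the Nano replacements listed above, and the originals are restored when the block exits.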