CRAN Task View: Model Deployment with R
Maintainer: Yuan Tang, James Joseph Balamuta
Contact: terrytangyuan at gmail.com
Contributions: Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation: Yuan Tang, James Joseph Balamuta (2022). CRAN Task View: Model Deployment with R. Version 2022-08-24. URL https://CRAN.R-project.org/view=ModelDeployment.
Installation: The packages from this task view can be installed automatically using the ctv package. For example,
ctv::install.views("ModelDeployment", coreOnly = TRUE) installs all the core packages or
ctv::update.views("ModelDeployment") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details.
This CRAN task view contains a list of packages, grouped by topic, that provide functionality to streamline the process of deploying models to various environments, such as mobile devices, edge devices, the cloud, and GPUs, for scoring or inference on new data. It complements the related task views on HighPerformanceComputing and MachineLearning.
Model deployment is often challenging for several reasons. Some example challenges are:
- It involves deploying models to heterogeneous environments, e.g. edge devices, mobile devices, GPUs, etc.
- It is hard to compress a model to a size small enough to fit on devices with limited storage while keeping the same precision and minimizing the overhead of loading the model for inference.
- Deployed models sometimes need to process new data records within limited memory on small devices.
- Many deployment environments have poor network connectivity, so cloud solutions may not always meet the requirements.
- There’s interest in stronger user data privacy paradigms where user data does not need to leave the mobile device.
- There’s growing demand to perform on-device model-based data filtering before collecting the data.
Many of the areas discussed in this task view are undergoing rapid change in industry and academia. Please send any suggestions to the maintainer via e-mail or submit an issue or pull request in the GitHub repository linked above. All suggestions and corrections by others are gratefully acknowledged.
Deployment through different types of artifacts
This section includes packages that provide functionality to export a trained model to an artifact that can fit on small devices such as mobile devices (e.g. Android, iOS) and edge devices (e.g. Raspberry Pi). These packages are built on different model formats.
- Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. The following packages are based on PMML:
  - The pmml package provides the main interface to PMML.
  - The pmmlTransformations package allows data to be transformed before it is used to construct models. It builds structures that allow functions in the pmml package to output transformation details in addition to the model in the resulting PMML file.
  - The arules package provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). The associations can be written to disk in PMML.
  - The arulesSequences package is an add-on for arules to handle and mine frequent sequences.
  - The arulesCBA package provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
- Plain Old Java Object (POJO) and Model Object, Optimized (MOJO) artifacts are intended to be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is an h2o-genmodel.jar file produced as the build output of these packages. The h2o package provides an easy-to-use interface to build a wide range of machine learning models, such as GLM, DRF, and XGBoost models based on the xgboost package, which can then be exported in MOJO and POJO formats. The MOJO and POJO artifacts can then be loaded through the H2O REST interface as well as different language bindings, e.g. Java, Scala, R, and Python.
- TensorFlow models can be saved in TensorFlow’s SavedModel format or in its optimized version, TensorFlow Lite, which achieves low latency through techniques such as kernels optimized for mobile apps, pre-fused activations, and quantized kernels that allow smaller and faster (fixed-point math) models. TensorFlow Lite enables on-device machine learning inference with low latency and a small binary size. The packages listed below can produce models in these formats. Note that these packages are R wrappers of their corresponding Python APIs, based on the reticulate package. Although a Python binary is required for creating the models, it is not required at inference time for deployment.
  - The tensorflow package provides full access to the TensorFlow API for numerical computation using data flow graphs.
  - The tfestimators package provides a high-level API to machine learning models as well as highly customized neural network architectures.
  - The keras package provides a high-level API to construct different types of neural networks.
- The onnx package provides the interface to Open Neural Network Exchange (ONNX), which is a standard format for models built using different frameworks (e.g. TensorFlow, MXNet, PyTorch, CNTK, etc). It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Models trained in one framework can be easily transferred to another framework for inference. This open-source format enables interoperability between different frameworks, and streamlining the path from research to production helps increase the speed of innovation in the AI community. Note that this package is based on the reticulate package to interface with the original Python API, so a Python binary is required for deployment.
- The xgboost and lightgbm packages can be used to create gradient-boosted decision tree (GBDT) models and serialize them to text and binary formats which can be used to create predictions with other technologies outside of R, including but not limited to Apache Spark, Dask, and treelite.
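As an illustration of the artifact-based approach described in this section, the following is a minimal sketch of serializing a gradient-boosted model to a text format and reloading it for scoring. It assumes that the xgboost package (version 1.0 or later) is installed; the mtcars data, the chosen features, and the file name are illustrative assumptions, not part of this task view.

```r
library(xgboost)

# Train a small regression model on a dataset that ships with R
# (illustrative data and hyperparameters only).
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
dtrain <- xgb.DMatrix(data = x, label = y)

bst <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 2),
  data = dtrain,
  nrounds = 10
)

# Serialize the booster to disk. The ".json" extension selects the JSON
# text format; the file name itself is an arbitrary choice.
model_path <- file.path(tempdir(), "mtcars_xgb.json")
xgb.save(bst, model_path)

# Reload the artifact, as a separate scoring process might, and score
# a few records with both the original and the reloaded model.
bst_reloaded <- xgb.load(model_path)
dnew <- xgb.DMatrix(data = x[1:5, , drop = FALSE])
pred_original <- predict(bst, dnew)
pred_reloaded <- predict(bst_reloaded, dnew)
```

The saved file is independent of the R session that produced it, which is what allows other runtimes that understand the same model format to load it.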
Deployment through cloud/server
Many deployment environments are cloud- or server-based. The following packages provide functionality to deploy models in those types of environments:
- The yhatr package allows users to deploy, maintain, and invoke models via the Yhat REST API.
- The cloudml package provides functionality to easily deploy models to Google Cloud ML Engine.
- The tfdeploy package provides functions to run a local test server that supports the same REST API as CloudML and RStudio Connect.
- The vetiver package provides tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model’s input data prototype, and predicting from a remote API endpoint. This package is extensible, with generics to support many kinds of models.
- The domino package provides an R interface to the Domino CLI, a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows.
- The tidypredict package provides functionality to run predictions inside a database. It is based on dplyr and dbplyr, which can translate data manipulations written in R into database queries that can be used later to execute the data transformations and aggregations inside various types of databases.
- The ibmdbR package allows many basic and complex R operations to be pushed down into the database, which removes the main memory boundary of R and allows full use of parallel processing in the underlying database.
- The sparklyr package provides bindings to Apache Spark’s distributed machine learning library and allows trained models to be deployed to clusters. Additionally, the rsparkling package uses sparklyr for Spark job deployment while using the h2o package for regular model building.
- The non-CRAN mrsdeploy package provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provided.
- The opencpu package provides a server that exposes a simple but powerful HTTP API for RPC and data interchange with R. This provides a reliable and scalable foundation for statistical services or building R web applications.
- Several general-purpose server/client frameworks for R exist that could help deploy models in server-based environments:
  - The Rserve and RSclient packages both provide server and client functionality for TCP/IP or local socket interfaces to enable access to R from many languages and systems.
  - The httpuv package provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R.
- Several packages offer functionality for turning R code into a web API:
  - The FastRWeb package provides some basic infrastructure for this.
  - The plumber package allows you to create a web API by merely decorating your existing R source code with special comments.
  - The RestRserve package is an R web API framework for building high-performance microservices and app backends based on Rserve.
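To make the web API approach listed above more concrete, here is a minimal sketch of exposing a fitted model through plumber's comment decorations. It assumes that the plumber package is installed; the linear model on mtcars, the /predict route, the parameter names, and port 8000 are all illustrative assumptions.

```r
library(plumber)

# Write an annotated R script to a temporary file. The lines starting with
# "#*" are the special comments plumber reads to define the endpoint.
api_file <- tempfile(fileext = ".R")
writeLines(c(
  "# Fit the model once, when the API definition is loaded.",
  "model <- lm(mpg ~ wt + hp, data = mtcars)",
  "",
  "#* Predict miles per gallon from weight and horsepower",
  "#* @param wt Weight in 1000 lbs",
  "#* @param hp Gross horsepower",
  "#* @get /predict",
  "function(wt, hp) {",
  "  # Query parameters arrive as strings, so convert them first.",
  "  newdata <- data.frame(wt = as.numeric(wt), hp = as.numeric(hp))",
  "  list(mpg = unname(predict(model, newdata)))",
  "}"
), api_file)

# Parse the annotated file into a router object.
api <- plumb(api_file)

# Start the server only in an interactive session, because run() blocks.
if (interactive()) {
  api$run(port = 8000)
}
```

With the server running, a request such as GET /predict?wt=2.5&hp=110 on the chosen port should return the prediction serialized as JSON. In practice the model would more likely be loaded from a saved artifact than refitted at start-up.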
CRAN packages: arules, arulesCBA, arulesSequences, cloudml, dbplyr, domino, dplyr, FastRWeb, h2o, httpuv, ibmdbR, keras, lightgbm, onnx, opencpu, plumber, pmml, pmmlTransformations, RestRserve, reticulate, RSclient, Rserve, rsparkling, sparklyr, tensorflow, tfdeploy, tfestimators, tidypredict, vetiver, xgboost, yhatr.