Kaptain for Model Inferencing
Deploy Kaptain for model inferencing
Kubeflow provides various tools and operators that simplify machine learning workflows. All of these components require additional cluster resources to install and operate properly. In some cases, models are trained and tuned on one cluster (a training cluster) and then deployed on other clusters (deployment clusters). For example, you can deploy to a cluster that runs your business-specific applications, or to a cluster that stores data locally.
Alternatively, you can deploy a minimal installation of Kaptain in IoT/Edge environments, where resources are limited.
Thanks to its highly flexible, modular architecture, Kaptain components can be disabled based on the target use case or environment. This allows Kaptain to be deployed in model inferencing mode by disabling the Kubeflow core components. To minimize the number of dependencies, KServe can be configured to run in RawDeployment mode, which enables InferenceService deployments with plain Kubernetes resources instead of using Knative for deploying models.
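If you prefer not to change the cluster-wide default, KServe also supports selecting the deployment mode per InferenceService through the serving.kserve.io/deploymentMode annotation. The snippet below is only a minimal sketch of that pattern; the example-model name is hypothetical, and the rest of this tutorial uses the cluster-wide configuration instead:

```bash
# Sketch only: per-service deployment mode via an annotation.
# "example-model" is a hypothetical name; this assumes the KServe
# InferenceService CRD is already installed and applies to the current namespace.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```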
In this tutorial, you will learn how to install a lightweight version of Kaptain for model inferencing, and how to deploy a model and make a prediction using either the internal or the external ingress service.
Prerequisites
Before installing Kaptain, make sure you have the following applications installed on the target cluster:
Istio
cert-manager
You can choose between two model inferencing methods: model inferencing with a local cluster gateway if you only need your model to be accessible within the cluster, or model inferencing via the external ingress if you also need your model to be accessible from outside the cluster.
Model inferencing with a local cluster gateway
Follow this tutorial if you only need your model to be accessible within the cluster via a local cluster gateway.
Deploy Kaptain with a customized configuration by enabling KServe only:
```yaml
core:
  enabled: false
ingress:
  enabled: false
kserve:
  controller:
    deploymentMode: RawDeployment
```
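After the installation completes, you can sanity-check that only the KServe controller was deployed and that the Kubeflow core components were skipped. This is just a rough check; the exact namespace and pod names depend on your Kaptain installation:

```bash
# List KServe-related pods across all namespaces; with the configuration
# above there should be no Kubeflow core component pods running.
kubectl get pods -A | grep -i kserve

# Optionally confirm that the InferenceService CRD is available.
kubectl get crd inferenceservices.serving.kserve.io
```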
Create a namespace and deploy the example:
```bash
kubectl create ns kserve-test

kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
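Before sending requests, it can help to wait until the InferenceService reports that it is ready. This is an optional check; the timeout value below is an arbitrary choice:

```bash
# Wait for the InferenceService to report the Ready condition (timeout is arbitrary).
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s

# Inspect the resulting status and URL.
kubectl get inferenceservice sklearn-iris -n kserve-test
```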
Run the inference from another pod:
```bash
kubectl run curl -n kserve-test --image=curlimages/curl -i --tty -- sh

# the following commands are run in the "curl" pod
cat <<EOF > "/tmp/iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF

curl -v http://sklearn-iris-predictor-default/v1/models/sklearn-iris:predict -d @/tmp/iris-input.json
```
The output should look similar to this:
```
*   Trying 10.109.166.118:80...
* Connected to sklearn-iris-predictor-default (10.109.166.118) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-predictor-default
> User-Agent: curl/7.85.0-DEV
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: TornadoServer/6.2
< Content-Type: application/json; charset=UTF-8
< Date: Thu, 20 Oct 2022 22:12:06 GMT
< Content-Length: 23
<
* Connection #0 to host sklearn-iris-predictor-default left intact
{"predictions": [1, 1]}
```
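Because KServe runs in RawDeployment mode, the predictor is backed by standard Kubernetes objects rather than Knative services. As an optional check, you can inspect what was created for the InferenceService (resource names may vary slightly between KServe versions):

```bash
# Inspect the plain Kubernetes resources backing the predictor.
kubectl get deployments,services,hpa -n kserve-test
```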
Model inferencing via the external ingress
Follow this tutorial if you need your model to be accessible within the cluster via a local cluster gateway, and from outside the cluster via an external load balancer.
Deploy Kaptain with a customized configuration by enabling KServe only:
```yaml
core:
  enabled: false
ingress:
  enabled: false
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      ingressClassName: istio
```
Create an IngressClass resource. The name should match the ingressClassName set in the previous step:

```bash
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
EOF
```
Deploy the example:
```bash
kubectl create ns kserve-test

kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
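In RawDeployment mode with an ingressClassName configured, KServe is expected to expose the InferenceService through a Kubernetes Ingress object. As an optional check, you can verify that the IngressClass exists and that an Ingress was created (exact object names may differ between KServe versions):

```bash
# Verify the IngressClass created earlier and the Ingress generated by KServe.
kubectl get ingressclass istio
kubectl get ingress -n kserve-test
```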
From your local machine, discover the ingress host and port:
```bash
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath="{.status.loadBalancer.ingress[*]['ip', 'hostname']}")
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
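If your cluster does not provision an external load balancer (for example, a local test cluster), the INGRESS_HOST variable above will be empty. In that case, one common workaround is to port-forward the Istio ingress gateway and point the variables at the forwarded port; this is only a local-testing sketch:

```bash
# Forward the Istio ingress gateway to a local port (run in a separate terminal).
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

# Point the inference request at the forwarded port.
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
```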
Run the inference by setting the Host header in the request:

```bash
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF

SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
```
The output should look similar to this:
```
*   Trying 54.148.92.116:80...
* Connected to af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com (54.148.92.116) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-kserve-test.example.com
> User-Agent: curl/7.79.1
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< server: istio-envoy
< content-type: application/json; charset=UTF-8
< date: Thu, 20 Oct 2022 22:09:04 GMT
< content-length: 23
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com left intact
{"predictions": [1, 1]}
```
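When you are done experimenting, you can remove the example resources. This cleanup is optional and assumes nothing else was deployed into the kserve-test namespace:

```bash
# Remove the example InferenceService and its namespace.
kubectl delete inferenceservice sklearn-iris -n kserve-test
kubectl delete namespace kserve-test
```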