Kaptain for Model Inferencing
Deploy Kaptain for model inferencing
Kubeflow provides various tools and operators that simplify machine learning workflows. All of these components require additional cluster resources to install and operate properly. In some cases, models are trained and tuned on one cluster (a training cluster) and then deployed on other clusters (deployment clusters). For example, you can deploy to a cluster that runs your business-specific applications, or to a cluster that stores data locally.
Alternatively, you can deploy a minimal installation of Kaptain in IoT/Edge environments, where resources are limited.
Thanks to its highly flexible, modular architecture, Kaptain components can be disabled based on the target use case or environment. This allows Kaptain to be deployed in model inferencing mode by disabling the Kubeflow core components. To minimize the number of dependencies, KServe can be configured to run in RawDeployment mode, which enables InferenceService deployments with plain Kubernetes resources instead of using Knative for deploying models.
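If you prefer not to change the cluster-wide default, KServe also supports selecting the deployment mode per InferenceService through the serving.kserve.io/deploymentMode annotation. The snippet below is only a minimal sketch of that pattern; the example-model name is hypothetical, and the rest of this tutorial uses the cluster-wide configuration instead:

```bash
# Sketch only: per-service deployment mode via an annotation.
# "example-model" is a hypothetical name; this assumes the KServe
# InferenceService CRD is already installed and applies to the current namespace.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```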
In this tutorial, you will learn how to install a lightweight version of Kaptain for model inferencing, and how to deploy a model and make a prediction using either the internal or the external ingress service.
Prerequisites
Before installing Kaptain, make sure you have the following applications installed on the target cluster:
Istio
cert-manager
You can choose between two model inferencing methods: model inferencing with a local cluster gateway if you only need your model to be accessible within the cluster, or model inferencing via the external ingress if you also need your model to be accessible from outside the cluster.
Model inferencing with a local cluster gateway
Follow this tutorial if you only need your model to be accessible within the cluster via a local cluster gateway.
Deploy Kaptain with a customized configuration by enabling KServe only:
```yaml
core:
  enabled: false
ingress:
  enabled: false
kserve:
  controller:
    deploymentMode: RawDeployment
```
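After the installation completes, you can sanity-check that only the KServe controller was deployed and that the Kubeflow core components were skipped. This is just a rough check; the exact namespace and pod names depend on your Kaptain installation:

```bash
# List KServe-related pods across all namespaces; with the configuration
# above there should be no Kubeflow core component pods running.
kubectl get pods -A | grep -i kserve

# Optionally confirm that the InferenceService CRD is available.
kubectl get crd inferenceservices.serving.kserve.io
```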
Create a namespace and deploy the example:
```bash
kubectl create ns kserve-test

kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
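Before sending requests, it can help to wait until the InferenceService reports that it is ready. This is an optional check; the timeout value below is an arbitrary choice:

```bash
# Wait for the InferenceService to report the Ready condition (timeout is arbitrary).
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s

# Inspect the resulting status and URL.
kubectl get inferenceservice sklearn-iris -n kserve-test
```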
Run the inference from another pod:
```bash
kubectl run curl -n kserve-test --image=curlimages/curl -i --tty -- sh

# the following commands are run in the "curl" pod
cat <<EOF > "/tmp/iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF

curl -v http://sklearn-iris-predictor-default/v1/models/sklearn-iris:predict -d @/tmp/iris-input.json
```
The output should look similar to this:
```
*   Trying 10.109.166.118:80...
* Connected to sklearn-iris-predictor-default (10.109.166.118) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-predictor-default
> User-Agent: curl/7.85.0-DEV
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: TornadoServer/6.2
< Content-Type: application/json; charset=UTF-8
< Date: Thu, 20 Oct 2022 22:12:06 GMT
< Content-Length: 23
<
* Connection #0 to host sklearn-iris-predictor-default left intact
{"predictions": [1, 1]}
```
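Because KServe runs in RawDeployment mode, the predictor is backed by standard Kubernetes objects rather than Knative services. As an optional check, you can inspect what was created for the InferenceService (resource names may vary slightly between KServe versions):

```bash
# Inspect the plain Kubernetes resources backing the predictor.
kubectl get deployments,services,hpa -n kserve-test
```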
Model inferencing via the external ingress
Follow this tutorial if you need your model to be accessible within the cluster via a local cluster gateway, and from outside the cluster via an external load balancer.
Deploy Kaptain with a customized configuration by enabling KServe only:
```yaml
core:
  enabled: false
ingress:
  enabled: false
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      ingressClassName: istio
```
Create an IngressClass resource. The name should match the ingressClassName set in the previous step:

```bash
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
EOF
```
Deploy the example:
```bash
kubectl create ns kserve-test

kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
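In RawDeployment mode with an ingressClassName configured, KServe is expected to expose the InferenceService through a Kubernetes Ingress object. As an optional check, you can verify that the IngressClass exists and that an Ingress was created (exact object names may differ between KServe versions):

```bash
# Verify the IngressClass created earlier and the Ingress generated by KServe.
kubectl get ingressclass istio
kubectl get ingress -n kserve-test
```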
From your local machine, discover the ingress host and port:
```bash
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath="{.status.loadBalancer.ingress[*]['ip', 'hostname']}")
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
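If your cluster does not provision an external load balancer (for example, a local test cluster), the INGRESS_HOST variable above will be empty. In that case, one common workaround is to port-forward the Istio ingress gateway and point the variables at the forwarded port; this is only a local-testing sketch:

```bash
# Forward the Istio ingress gateway to a local port (run in a separate terminal).
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

# Point the inference request at the forwarded port.
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
```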
Run the inference by setting the Host header in the request:

```bash
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF

SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
```
The output should look similar to this:
```
*   Trying 54.148.92.116:80...
* Connected to af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com (54.148.92.116) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-kserve-test.example.com
> User-Agent: curl/7.79.1
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< server: istio-envoy
< content-type: application/json; charset=UTF-8
< date: Thu, 20 Oct 2022 22:09:04 GMT
< content-length: 23
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com left intact
{"predictions": [1, 1]}
```
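When you are done experimenting, you can remove the example resources. This cleanup is optional and assumes nothing else was deployed into the kserve-test namespace:

```bash
# Remove the example InferenceService and its namespace.
kubectl delete inferenceservice sklearn-iris -n kserve-test
kubectl delete namespace kserve-test
```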