Kaptain for Model Inferencing

Deploy Kaptain for model inferencing

Kubeflow provides various tools and operators that simplify machine learning workflows. All of these components require additional cluster resources to install and operate properly. In some cases, models are trained and tuned on one cluster (the training cluster) and then deployed on other clusters (deployment clusters). For example, you can deploy to a cluster that runs your business-specific applications, or to a cluster that stores data locally.

Alternatively, you can deploy a minimal installation of Kaptain in IoT/Edge environments, where resources are limited.

Thanks to its highly flexible, modular architecture, Kaptain components can be disabled based on the target use case or environment. This allows Kaptain to be deployed in model inferencing mode by disabling the Kubeflow core components. To minimize the number of dependencies, KServe can be configured to run in RawDeployment mode, which deploys InferenceServices with plain Kubernetes resources instead of using Knative to deploy models.
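
In practice, RawDeployment mode means that KServe backs each InferenceService with standard Kubernetes objects (typically a Deployment, a Service, and, depending on the autoscaling configuration, a HorizontalPodAutoscaler) rather than Knative Services, so you can inspect them with plain kubectl. For example, after deploying the sample InferenceService from the tutorials below into the kserve-test namespace:

CODE
# In RawDeployment mode, an InferenceService is backed by standard Kubernetes
# resources instead of Knative Services
kubectl get deployments,services,hpa -n kserve-test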

In this tutorial, you will learn how to install a lightweight version of Kaptain for model inferencing, deploy a model, and make a prediction using either an internal or an external ingress service.

Prerequisites

Before installing Kaptain, make sure you have the following applications installed on the target cluster:

  • Istio

  • cert-manager

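Before proceeding, you can quickly confirm that both prerequisites are running. The check below assumes the default installation namespaces (istio-system and cert-manager); adjust them if your installation uses different namespaces.

CODE
# Verify that the Istio control plane is running (default namespace: istio-system)
kubectl get pods -n istio-system
# Verify that cert-manager is running (default namespace: cert-manager)
kubectl get pods -n cert-manager
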
You can choose between two model inferencing methods: Model inferencing with a local cluster gateway if your model only needs to be accessible within the cluster, or Model inferencing via the external ingress if your model needs to be accessible both within the cluster and from outside it.

Model inferencing with a local cluster gateway

Follow this tutorial if you only need your model to be accessible within the cluster via a local cluster gateway. 

  1. Deploy Kaptain with a customized configuration by enabling KServe only:

    CODE
    core:
      enabled: false
    ingress:
      enabled: false
    kserve:
      controller:
        deploymentMode: RawDeployment
  2. Create a namespace and deploy the example (a readiness check is shown after these steps):

    CODE
    kubectl create ns kserve-test 
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  3. Run the inference from another pod:

    CODE
    kubectl run curl -n kserve-test --image=curlimages/curl -i --tty -- sh
    # the following commands are run in the "curl" pod
    cat <<EOF > "/tmp/iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4], 
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
    curl -v http://sklearn-iris-predictor-default/v1/models/sklearn-iris:predict -d @/tmp/iris-input.json

    The output should look similar to this:

    CODE
    *   Trying 10.109.166.118:80...
    * Connected to sklearn-iris-predictor-default (10.109.166.118) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default
    > User-Agent: curl/7.85.0-DEV
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: TornadoServer/6.2
    < Content-Type: application/json; charset=UTF-8
    < Date: Thu, 20 Oct 2022 22:12:06 GMT
    < Content-Length: 23
    <
    * Connection #0 to host sklearn-iris-predictor-default left intact
    {"predictions": [1, 1]}
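
Optionally, you can confirm that the InferenceService is ready before sending requests and remove the temporary curl pod when you are done. This is a minimal sketch using standard kubectl commands and the resource names created in the steps above; run it from your workstation, not from inside the curl pod:

CODE
# Wait until the InferenceService reports the Ready condition
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s
# Remove the temporary curl pod after testing
kubectl delete pod curl -n kserve-test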

Model inferencing via the external ingress

Follow this tutorial if you need your model to be accessible both within the cluster via a local cluster gateway and from outside the cluster via an external load balancer.

  1. Deploy Kaptain with a customized configuration by enabling KServe only:

    CODE
    core:
      enabled: false
    ingress:
      enabled: false
    kserve:
      controller:
        deploymentMode: RawDeployment
        gateway:
          ingressClassName: istio
  2. Create an IngressClass resource. The name should match the ingressClassName set in the previous step:

    CODE
    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: istio
    spec:
      controller: istio.io/ingress-controller
    EOF
  3. Deploy the example:

    CODE
    kubectl create ns kserve-test 
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  4. From your local machine, discover the ingress host and port (if the istio-ingressgateway service has no external address, see the port-forwarding note after these steps):

    CODE
    export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath="{.status.loadBalancer.ingress[*]['ip', 'hostname']}")
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
  5. Run the inference by setting the Host header in the request:

    CODE
    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4], 
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json

    The output should look similar to this:

    CODE
    *   Trying 54.148.92.116:80...
    * Connected to af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com (54.148.92.116) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-kserve-test.example.com
    > User-Agent: curl/7.79.1
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < server: istio-envoy
    < content-type: application/json; charset=UTF-8
    < date: Thu, 20 Oct 2022 22:09:04 GMT
    < content-length: 23
    < x-envoy-upstream-service-time: 2
    <
    * Connection #0 to host af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com left intact
    {"predictions": [1, 1]}
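
If the istio-ingressgateway service has no external address (for example, on a cluster without a load balancer provider), INGRESS_HOST will be empty. A common workaround is to port-forward the gateway locally and send the request through the forwarded port. The sketch below reuses SERVICE_HOSTNAME and iris-input.json from the previous step; the local port 8080 is an arbitrary choice:

CODE
# Forward local port 8080 to the Istio ingress gateway
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
# In another terminal, send the request through the forwarded port,
# still setting the Host header to the InferenceService hostname
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json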