API documentation (Kaptain SDK 1.3.x)

kaptain.config

Config Objects

CODE
class Config()
__init__
CODE
 | __init__(docker_config_provider: ConfigurationProvider, storage_config_provider: ConfigurationProvider, docker_registry_url: Optional[str] = None, docker_registry_certificate_provider: Optional[ConfigurationProvider] = None, base_dir: str = os.getcwd(), base_model_storage_uri: str = "s3://kaptain/models")

Encapsulates platform-specific configuration such as access credentials or AWS endpoints. Config is provided as an argument to the Model and is used to instantiate concrete implementations of lower-level components based on its properties, so that users can fine-tune their workloads through a configuration-based API.

Arguments:

  • docker_config_provider: the configuration provider for the Docker registry.

  • storage_config_provider: the configuration provider for blob storage access. Currently, only S3 and MinIO are supported.

  • docker_registry_url: private custom Docker registry URL to use with provided TLS certificates.

  • docker_registry_certificate_provider: the configuration provider for Docker registry certificate.

  • base_dir: base directory to use for referencing relative file paths of model files. Defaults to current working directory.

  • base_model_storage_uri: base URI in the remote storage (MinIO or S3) under which models are stored. Defaults to "s3://kaptain/models".
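
For illustration, a Config might be assembled from the configuration providers documented under kaptain.platform.config below; a minimal sketch, assuming a MinIO-backed setup (all credential values and the endpoint are placeholders):

CODE
from kaptain.config import Config
from kaptain.platform.config.docker import DockerConfigurationProvider
from kaptain.platform.config.s3 import S3ConfigurationProvider

# Placeholder credentials; real values come from your registry and storage setup.
config = Config(
    docker_config_provider=DockerConfigurationProvider.from_file(
        path="/home/user/.docker/config.json"
    ),
    storage_config_provider=S3ConfigurationProvider(
        aws_access_key_id="minio",
        aws_secret_access_key="minio123",
        s3_endpoint="http://minio.example.com:9000",
    ),
    base_model_storage_uri="s3://kaptain/models",
)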

kaptain.envs

_M Objects

CODE
class _M(types.ModuleType)

Environment variables that can be changed by the user at any time.

VERBOSE
CODE
 | @property
 | VERBOSE() -> bool

This environment variable (KAPTAIN_SDK_VERBOSE) enables showing pod logs unless explicitly overridden.

VERBOSE
CODE
 | @VERBOSE.setter
 | VERBOSE(value: bool) -> None

This environment variable (KAPTAIN_SDK_VERBOSE) enables showing pod logs unless explicitly overridden.

DEBUG
CODE
 | @property
 | DEBUG() -> bool

This environment variable (KAPTAIN_SDK_DEBUG) enables showing stack traces for uncaught exceptions.

LOG_TIMEFORMAT
CODE
 | @property
 | LOG_TIMEFORMAT() -> str

This environment variable (KAPTAIN_SDK_LOG_TIMEFORMAT) sets the time format used in log messages.

DOCKER_BUILDER_CPU_LIMIT
CODE
 | @DOCKER_BUILDER_CPU_LIMIT.setter
 | DOCKER_BUILDER_CPU_LIMIT(cpu_limit: str) -> None

Sets the CPU limit for the image builder job.

DOCKER_BUILDER_MEM_LIMIT
CODE
 | @DOCKER_BUILDER_MEM_LIMIT.setter
 | DOCKER_BUILDER_MEM_LIMIT(mem_limit: str) -> None

Sets the memory limit for the image builder job.

DOCKER_BUILDER_CPU_REQUEST
CODE
 | @DOCKER_BUILDER_CPU_REQUEST.setter
 | DOCKER_BUILDER_CPU_REQUEST(cpu_request: str) -> None

Sets the CPU request for the image builder job.

DOCKER_BUILDER_MEM_REQUEST
CODE
 | @DOCKER_BUILDER_MEM_REQUEST.setter
 | DOCKER_BUILDER_MEM_REQUEST(mem_request: str) -> None

Sets the memory request for the image builder job.

KAPTAIN_SDK_DELETE_EXPERIMENT
CODE
 | @KAPTAIN_SDK_DELETE_EXPERIMENT.setter
 | KAPTAIN_SDK_DELETE_EXPERIMENT(value: bool) -> None

Delete the experiment resource upon completion of the tuning step. Note: once the experiment is deleted, it is no longer available for viewing in the Katib UI.

KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED
CODE
 | @KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED.setter
 | KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED(ttl_seconds: int) -> None

Number of seconds after which a completed training job gets automatically deleted.

KAPTAIN_SDK_FORCE_CLEANUP
CODE
 | @KAPTAIN_SDK_FORCE_CLEANUP.setter
 | KAPTAIN_SDK_FORCE_CLEANUP(value: bool) -> None

If set to True, delete completed training jobs automatically ignoring the TTL.
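
Because kaptain.envs swaps its module type for the _M subclass above, these settings can plausibly be toggled as plain module attributes from a notebook; a minimal sketch, assuming the setter properties shown above behave this way:

CODE
import kaptain.envs as envs

envs.VERBOSE = True                     # stream pod logs (KAPTAIN_SDK_VERBOSE)
envs.DOCKER_BUILDER_CPU_LIMIT = "2"     # cap CPU for the image builder job
envs.DOCKER_BUILDER_MEM_LIMIT = "4Gi"   # cap memory for the image builder job
envs.KAPTAIN_SDK_FORCE_CLEANUP = True   # delete finished jobs regardless of TTL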

kaptain.exceptions

InvalidModelProperty Objects

CODE
class InvalidModelProperty(Exception)

Raised when a model property is None or blank.

UndefinedModelProperty Objects

CODE
class UndefinedModelProperty(Exception)

Raised when a model property is not defined.

UnsupportedAlgorithmException Objects

CODE
class UnsupportedAlgorithmException(Exception)

Raised when a hyperparameter tuning algorithm is not supported.

UnsupportedModelDeploymentException Objects

CODE
class UnsupportedModelDeploymentException(Exception)

Raised when a model deployment is not supported.

UnsupportedMetricsTypeException Objects

CODE
class UnsupportedMetricsTypeException(Exception)

Raised when a metric type is not supported.

ModelDeploymentException Objects

CODE
class ModelDeploymentException(Exception)

Raised in case of a model deployment failure.

ModelValidationException Objects

CODE
class ModelValidationException(Exception)

Raised when model configuration properties are missing or the model is in a state unsuitable for the operation invoked on it.

ModelAlreadyExistsException Objects

CODE
class ModelAlreadyExistsException(Exception)

Raised when the model configuration file already exists at the target location.

ImageBuildException Objects

CODE
class ImageBuildException(Exception)

Raised in case of an image build failure.

WorkloadDeploymentError Objects

CODE
class WorkloadDeploymentError(Exception)

Raised in case of a workload deployment failure, e.g. failed scheduling.

kaptain.utils

diagnose
CODE
diagnose() -> None

List all managed resources for the current namespace: TfJobs, PyTorchJobs, Experiments, Inference Services, Secrets, Pods, and Service Accounts.

list_jobs
CODE
list_jobs() -> None

List all training jobs in the current namespace.

delete_job
CODE
delete_job(name: str, kind: Optional[str] = None) -> None

Deletes a training job based on provided name and kind.

Arguments:

  • name: job name

  • kind: job kind (optional), e.g. "tfjob", "pytorchjob", or "job".

list_experiments
CODE
list_experiments() -> None

Lists Katib experiments.

delete_experiment
CODE
delete_experiment(name: str) -> None

Deletes a Katib experiment.

Arguments:

  • name: Name of the experiment

list_inference_services
CODE
list_inference_services() -> None

Lists deployed inference services.

delete_inference_service
CODE
delete_inference_service(name: str) -> None

Deletes an inference service.

Arguments:

  • name: Name of the inference service

delete_jobs
CODE
delete_jobs(force: bool = False) -> None

Deletes all training jobs created by the SDK.

Arguments:

  • force: If True, delete all (even running) jobs created by the SDK, otherwise, delete only completed jobs.

delete_experiments
CODE
delete_experiments(force: bool = False) -> None

Deletes all experiments created by the SDK.

Arguments:

  • force: If True, delete all (even running) experiments created by the SDK, otherwise, delete only completed experiments.

delete_inference_services
CODE
delete_inference_services(force: bool = False) -> None

Deletes all inference services created by the SDK.

Arguments:

  • force: If True, delete all (even already deployed) inference services created by the SDK; otherwise, delete only inference services with 'NotReady' status.

clean
CODE
clean(force: bool = False) -> None

Deletes stale resources (such as Secrets and ServiceAccounts which are not used by any workloads).

WARNING: Use with caution! To prevent data loss, please first run this method without any arguments set and check whether the resources proposed by the method can be safely deleted.

Arguments:

  • force: If False, method only prints unused resource names without actually removing them.

clean_all
CODE
clean_all(force: bool = False) -> None

Deletes all completed workloads and stale Kubernetes resources created by the SDK.

Arguments:

  • force: If True, delete all (even running) workloads and resources created by the SDK; otherwise, delete only completed workloads and print stale resources (secrets and service accounts).

list_all_resources
CODE
list_all_resources() -> None

Lists all deployed resources: TfJobs, PyTorchJobs, Experiments and Inference Services.

delete_resource
CODE
delete_resource(kind: str, name: str) -> None

Deletes a resource of the given kind and name.

Arguments:

  • name: Name of the resource

  • kind: Kind of the resource - one of “tfjob”, “pytorchjob”, “experiment” or “inferenceservice”.
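
Rounding out this module, a hedged illustration of a typical housekeeping session (the job name is hypothetical):

CODE
from kaptain import utils

utils.diagnose()                                         # inventory of SDK-managed resources
utils.delete_resource(kind="tfjob", name="mnist-train")  # hypothetical job name
utils.clean(force=False)                                 # dry run: prints stale resources only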

kaptain.model

kaptain.model.models

Model Objects

CODE
class Model()
__init__
CODE
 | __init__(id: str, name: str, description: str, version: str, framework: ModelFramework, framework_version: str, main_file: str, image_name: str, base_image: str, extra_files: Optional[List[str]] = None, requirements: Optional[str] = None, labels: Optional[List[str]] = None, config: Optional[Config] = None, serving_config: Optional[Dict[str, str]] = None)

A representation of a machine learning model.

When the model is created for the first time, its internal revision is set to a random UUID and its internal state is “untrained”. Once the model is trained or tuned, its state will be updated accordingly, hyperparameter values set, its revision refreshed, and it can be saved or deployed. Each action (train, tune, deploy) alters the revision and is stored in the model tracking database.

Arguments:

  • id: Unique identifier of model, e.g. “dev/mnist”. It is recommended to include the stage of the model (e.g. dev/prod) in the name to make it easier to filter models under active development and in production.

  • name: Short name of the model, e.g. “MNIST”. This name is visible in the model tracking database.

  • description: Description of the model, e.g. “Digit recognition for MNIST data set”. This description is visible in the model tracking database.

  • version: Model version, e.g. "4.5".

  • main_file: Main (Python) file that contains the executable model code, e.g. "trainer.py".

  • image_name: Name of the repository to push the resulting image to, e.g. "kaptain/mnist". Can also contain an image tag, e.g. "kaptain/mnist:0.0.1-tensorflow-2.2.0".

  • extra_files: Auxiliary files, e.g. ["utils.py", "data_loader.py"].

  • requirements: Path to the file with additional Python packages to install into the image, in pip-compatible format (e.g. "requirements.txt"). Details on the format can be found here: https://pip.pypa.io/en/stable/cli/pip_install/#requirements-file-format

  • framework: Machine learning library or framework used for the model, e.g. "tensorflow".

  • framework_version: Machine learning library or framework version used by the model, e.g. "2.3.2".

  • base_image: Base container image, e.g. "tensorflow-2.3.2".

  • labels: Custom labels for deployment-related metadata, e.g. "dev/mnist-tensorflow".

  • config: Configuration object used for configuring access to Docker registries and blob storage.

  • serving_config: Configuration specific to model servers.
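
Pulling these arguments together, a hedged construction example (file names and the image name are placeholders, the ModelFramework member name is assumed, and config is a Config instance as described under kaptain.config):

CODE
from kaptain.model.models import Model
from kaptain.model.frameworks import ModelFramework

model = Model(
    id="dev/mnist",
    name="MNIST",
    description="Digit recognition for MNIST data set",
    version="0.1",
    framework=ModelFramework.TENSORFLOW,  # assumed enum member name
    framework_version="2.3.2",
    main_file="trainer.py",
    image_name="kaptain/mnist",
    base_image="tensorflow-2.3.2",
    extra_files=["utils.py", "data_loader.py"],
    requirements="requirements.txt",
    config=config,  # a Config instance (see kaptain.config)
)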

hyperparameters
CODE
 | @property
 | hyperparameters() -> Optional[Dict[str, Any]]

Hyperparameters of the model as defined through an action:

  • Train: uses the static values provided to the training procedure.

  • Tune: extracts the recommended values after running multiple experiments.

metrics
CODE
 | @property
 | metrics() -> Optional[Dict[str, Dict[str, float]]]

Metrics of the model collected when tuning is run.

build
CODE
 | build(verbose: Optional[bool] = None) -> bool

Builds a Docker image with the model training code and dependencies and publishes it to the registry specified in the configuration. A label with the checksum of the model's content is included in the image. Image rebuilding is triggered only if an image with the same name and checksum is not already present in the registry.

Arguments:

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

Returns:

True if successful, otherwise False

train
CODE
 | train(hyperparameters: Dict[str, Any], args: Optional[Dict[str, Any]] = None, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, workers: int = 2, verbose: Optional[bool] = None, ttl_seconds_after_finished: Optional[int] = None, force_cleanup: Optional[bool] = None, timeout: Optional[int] = constants.DEFAULT_TIMEOUT_SECONDS) -> bool

Trains a model in a distributed manner.

Simple / advanced resource API

Resources may be specified via the 'simple' resource parameters:

CODE
model.train(workers=1, cpu=1, memory="2G", gpus=0)

… the model training process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the 'resources' parameter:

CODE
model.train(workers=workers, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the 'resources' parameter and any of the 'simple' resource parameters (gpus, memory, cpu).

Arguments:

  • args: Arguments to be passed to the training function.

  • hyperparameters: Dictionary of hyperparameter values.

  • workers: Number of parallel workers to use (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional).

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

  • ttl_seconds_after_finished: Number of seconds after which a completed training job gets automatically deleted. Can be set via the 'KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED' environment variable.

  • force_cleanup: If set to True, delete completed training jobs automatically ignoring the TTL (can also be set via ‘KAPTAIN_SDK_FORCE_CLEANUP’ environment variable).

  • timeout: Number of seconds to wait for the training job to complete before timing out.

Returns:

True if successful, otherwise False
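
For instance, a distributed training run with static hyperparameter values might look like this (the hyperparameter names are placeholders for whatever the training code expects):

CODE
trained = model.train(
    hyperparameters={"learning_rate": 0.001, "epochs": 5},  # placeholder names
    workers=2,
    cpu="1",
    memory="2G",
    ttl_seconds_after_finished=600,
)
if not trained:
    raise RuntimeError("training job did not complete successfully")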

tune
CODE
 | tune(hyperparameters: Dict[str, Domain], objectives: List[str], objective_goal: Optional[float] = None, objective_type: str = "maximize", workers: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, trials: int = 16, parallel_trials: int = 2, failed_trials: int = 4, algorithm: Optional[str] = Algorithm.RANDOM.value, algorithm_setting: Optional[dict] = None, args: Optional[Dict[str, Any]] = None, verbose: Optional[bool] = None, delete_experiment: Optional[bool] = None, ttl_seconds_after_finished: Optional[int] = None, timeout: Optional[int] = constants.DEFAULT_TIMEOUT_SECONDS) -> bool

Tunes a model using parallel, and possibly distributed, trials.

Simple / advanced resource API

Resources may be specified via the 'simple' resource parameters:

CODE
model.tune(hyperparameters=params, objectives=objectives, cpu=1, memory="2G", gpus=0)

… the deployed tuning process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the 'resources' parameter:

CODE
model.tune(
  hyperparameters=params,
  objectives=objectives,
  resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the 'resources' parameter and any of the 'simple' resource parameters (gpus, memory, cpu).

Arguments:

  • args: Arguments to be passed to the experiment trial specification.

  • hyperparameters: Dictionary of hyperparameters and their specified domains.

  • objectives: List of metrics to track in order of importance. The first one listed is used in conjunction with the objective goal and type.

  • objective_goal: Main objective’s goal, which when reached causes the tuning to stop. The main objective is the first element in objectives. If None, the tuning will continue until the maximum number of trials has been reached.

  • objective_type: Whether to “maximize” or “minimize” the main objective’s value (default: maximize).

  • workers: Number of parallel workers to use for each trial (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional).

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • trials: Maximum number of trials (default: 16).

  • parallel_trials: Maximum number of trials to run in parallel (default: 2).

  • failed_trials: Maximum number of failed trials before hyperparameter tuning stops (default: 4).

  • algorithm: Algorithm to use for hyperparameter search (default: random).

  • algorithm_setting: Algorithm settings. Please see https://www.kubeflow.org/docs/components/katib/experiment/ for details.

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

  • delete_experiment: Delete the experiment resource upon the completion of the tuning step. Can be set via ‘KAPTAIN_SDK_DELETE_EXPERIMENT’ environment variable. Note: once the experiment is deleted, it won’t be available for viewing in the Katib UI.

  • ttl_seconds_after_finished: Number of seconds after which a completed training job gets automatically deleted.

  • timeout: Number of seconds to wait for the experiment to complete before timing-out.

Returns:

True if successful, otherwise False
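
An end-to-end sketch combining the Domain classes from kaptain.hyperparameter.domains with the parameters above (the metric name is a placeholder and must match what the training code reports):

CODE
from kaptain.hyperparameter.domains import Double, Integer

tuned = model.tune(
    hyperparameters={
        "learning_rate": Double(0.0001, 0.01),  # search over [min, max]
        "batch_size": Integer(32, 256),
    },
    objectives=["accuracy"],  # placeholder metric name
    objective_goal=0.95,
    objective_type="maximize",
    trials=16,
    parallel_trials=2,
    workers=2,
)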

deploy
CODE
 | deploy(model_uri: Optional[str] = None, autoscale: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, replace: bool = False, transformer: Optional[Transformer] = None, **kwargs: str) -> bool

Deploys a model.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters:

CODE
model.deploy(model_uri=uri, cpu=1, memory="2G", gpus=0)

… the deployed model process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter:

CODE
model.deploy(model_uri=uri, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the 'resources' parameter and any of the 'simple' resource parameters (gpus, memory, cpu).

Arguments:

  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.

  • autoscale: Target concurrency (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional).

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • replace: Safety flag to avoid accidental redeployment of the model. If True, the previously deployed model will be replaced. If False, an error will be logged in case the model had been previously deployed.

  • transformer: A Transformer is a component that performs pre/post-processing alongside model inference. It usually takes raw input and transforms it into the input tensors the model server expects.

  • kwargs: Keyword arguments for the deployment.

Returns:

True if successful, otherwise False

deploy_canary
CODE
 | deploy_canary(canary_traffic_percentage: int, model_uri: Optional[str] = None, **kwargs: str) -> None

Deploys a model as a canary with a pre-determined percentage of traffic. A canary deployment allows a model to be run in parallel with a baseline or previous model revision. This allows traffic to be split, so the latest revision can be checked for possible issues with model performance (e.g. compared to the baseline) or system performance (e.g. latency). To deploy a model to the canary, a previously deployed model revision must exist.


To deploy a canary with 30 percent traffic:

CODE
model.deploy_canary(canary_traffic_percentage=30)

To change the canary traffic percentage to 50 (half the traffic):

CODE
model.deploy_canary(canary_traffic_percentage=50)

To deploy a canary with 30 percent traffic from a specified saved model location:

CODE
model.deploy_canary(canary_traffic_percentage=30, model_uri=uri)

To change the canary traffic percentage to 50 (half the traffic) for a model deployed from a specified saved location:

CODE
model.deploy_canary(canary_traffic_percentage=50, model_uri=uri)

Arguments:

  • canary_traffic_percentage: the percentage of traffic to route to the canary model.

  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.

rollback_canary
CODE
 | rollback_canary() -> None

Undeploy the model from canary and switch 100% of traffic to the previously deployed baseline model.

:raises: ModelDeploymentException if canary deployment doesn’t exist.

promote_canary
CODE
 | promote_canary() -> None

Promote the model from canary to serve 100% of traffic.

:raises: ModelDeploymentException if canary deployment doesn’t exist.

undeploy
CODE
 | undeploy() -> None

Removes existing deployment and canary deployment of a model.

:raises: ModelDeploymentException in case the model was not previously deployed.

log_data
CODE
 | log_data(name: str, uri: str, description: Optional[str] = None, features: Optional[List[str]] = None, version: Optional[str] = None) -> None

Logs an input data set to a model execution.

Arguments:

  • name: Name of the data set.

  • uri: URI of the data set.

  • description: Optional description.

  • features: List of features used.

  • version: Optional version of the data set.

log_metrics
CODE
 | log_metrics(metrics: dict, metrics_type: str, uri: Optional[str] = None) -> None

Logs model evaluation metrics to a model execution.

Arguments:

  • metrics: A dictionary of metric names and their values, e.g. {"accuracy": 0.95, "auc": 0.975}.

  • metrics_type: Evaluation type of the metric: training, testing, validation, or production (for deployed models).

  • uri: Optional URI to the metrics (e.g. log directory).
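
For example (metric names, values, and the URI are illustrative):

CODE
model.log_metrics(
    metrics={"accuracy": 0.95, "auc": 0.975},
    metrics_type="testing",
    uri="s3://kaptain/logs/mnist",  # hypothetical log location
)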

meta
CODE
 | meta() -> ModelMeta

Creates an immutable snapshot of model properties.

Returns:

ModelMeta data class with a copy of all current model field values

save_as_json
CODE
 | save_as_json(path: str = "", overwrite: bool = False) -> None

Saves the model as JSON. Uses the storage_config_provider ConfigurationProvider from the model's config as the save destination.

Arguments:

  • path: Optional path to save the model to. If not provided, the model will be saved to the default location.

  • overwrite: If True, will overwrite the model JSON file if it already exists. If False, will raise an exception if the model already exists.

Returns:

None

load_from_json
CODE
 | @classmethod
 | load_from_json(cls, model_uri: str, config: Config) -> "Model"

Loads an instance of the Model class from the model_uri location.

Arguments:

  • model_uri: model location in the object storage, e.g. ‘s3://kaptain/models/mnist’.

  • config: model configuration, such as S3 credentials.

Returns:

new instance of the Model class.
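
Together with save_as_json, this allows a model definition to be persisted and restored across sessions; a brief sketch (the URI mirrors the example above):

CODE
model.save_as_json(overwrite=True)  # writes to the default managed location

# Later, possibly from another notebook:
from kaptain.model.models import Model
restored = Model.load_from_json(model_uri="s3://kaptain/models/mnist", config=config)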

kaptain.model.frameworks

kaptain.model.states

kaptain.hyperparameter

kaptain.hyperparameter.algorithms

Algorithm Objects

CODE
class Algorithm(Enum)
of
CODE
 | @staticmethod
 | of(algorithm: Optional[str]) -> Optional["Algorithm"]

Converts a hyperparameter tuning algorithm (string) to an Algorithm enum.

Arguments:

  • algorithm: Name of the hyperparameter tuning algorithm.

Returns:

Algorithm enum if the algorithm is supported.
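
For instance (the "random" value mirrors Algorithm.RANDOM used as the default in tune(); an unsupported name presumably raises UnsupportedAlgorithmException):

CODE
from kaptain.hyperparameter.algorithms import Algorithm

algo = Algorithm.of("random")  # -> Algorithm.RANDOM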

kaptain.hyperparameter.domains

Double Objects

CODE
class Double(Domain)
__init__
CODE
 | __init__(min: float, max: float)

Defines a floating-point (double) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value

  • max: Maximum value

Integer Objects

CODE
class Integer(Domain)
__init__
CODE
 | __init__(min: int, max: int)

Defines an integer (int) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value

  • max: Maximum value

Discrete Objects

CODE
class Discrete(Domain)

Defines a discrete hyperparameter with a list of possible float values.

Arguments:

  • values: List of allowed floating-point values

Categorical Objects

CODE
class Categorical(Domain)

Defines a categorical hyperparameter with a list of possible string values.

Arguments:

  • values: List of allowed string values
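
Putting the four domain types together as a tune() search space (the parameter names are placeholders; Discrete and Categorical are assumed to take their values list as the constructor argument):

CODE
from kaptain.hyperparameter.domains import Categorical, Discrete, Double, Integer

search_space = {
    "learning_rate": Double(0.0001, 0.01),      # floating-point range
    "hidden_units": Integer(64, 512),           # integer range
    "dropout": Discrete([0.1, 0.25, 0.5]),      # fixed float choices
    "optimizer": Categorical(["adam", "sgd"]),  # fixed string choices
}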

kaptain.platform.config

kaptain.platform.config.provider

ConfigurationProvider Objects

CODE
class ConfigurationProvider(ABC)

The ConfigurationProvider interface defines high-level functions for translating user-provided credentials for a Docker registry or cloud buckets into Kubernetes Secrets required for distributed building, training, tuning, and serving components.

FileBasedConfigurationProvider Objects

CODE
class FileBasedConfigurationProvider(ConfigurationProvider)

The FileBasedConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from a provided configuration file specific to the concrete implementation.

EnvironmentVariableConfigurationProvider Objects

CODE
class EnvironmentVariableConfigurationProvider(ConfigurationProvider)

The EnvironmentVariableConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from environment variables specific to the concrete implementation.

kaptain.platform.config.certificates

DockerRegistryCertificateProvider Objects

CODE
class DockerRegistryCertificateProvider(FileBasedConfigurationProvider)
__init__
CODE
 | __init__(certificate_body: str, certificate_path: Optional[str] = None)

The Docker Registry Certificate Provider is a container for the TLS certificates of private Docker registries running with custom/self-signed certificates, which are required for pushing Docker images containing model training code.

Docker Registry Certificate Provider by default loads the configuration from $HOME/.tls/certificate.crt. It is also possible to specify a custom registry certificate.crt location using DockerRegistryCertificateProvider.from_file(path=/path/to/certificate.crt).

Docker Registry certificate.crt file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker certificate.crt as a volume, the system administrator must create the PodDefault resource with a certificate file to make it available for the user.

Arguments:

  • certificate_body: The certificate body as a string.

  • certificate_path: Path to the certificate file (optional)

kaptain.platform.config.docker

DockerConfigurationProvider Objects

CODE
class DockerConfigurationProvider(FileBasedConfigurationProvider)
__init__
CODE
 | __init__(config_json: str)

Docker Configuration Provider is a container for the user's Docker configuration, which is required for pulling and pushing images used in training and tuning jobs.

Docker Configuration Provider supports standard Docker config.json file of the following format:

CODE
    {
        "auths": {
                "https://index.docker.io/v1/": {
                        "auth": "<username and password in base64>"
                }
        }
    }

The auth field is a base64-encoded string of the form "<username>:<password>", where <username> and <password> are the actual username and password used to log in to the Docker registry. To generate the value for the auth field, use the following command: echo -n "<username>:<password>" | base64.

Docker Configuration Provider by default loads the configuration from $HOME/.docker/config.json. It is also possible to specify a custom config.json location using DockerConfigurationProvider.from_file(path=/path/to/config.json).

Docker config.json file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker config.json as a volume, the system administrator must create the PodDefault resource with a pre-populated file to make it available for the user.

Arguments:

  • config_json: The configuration string in json format
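
A brief sketch of the two construction paths described above (the path and the base64 auth value are placeholders):

CODE
from kaptain.platform.config.docker import DockerConfigurationProvider

# From a config.json on disk (by default $HOME/.docker/config.json is used):
docker_config = DockerConfigurationProvider.from_file(path="/home/user/.docker/config.json")

# Or directly from a JSON string; "dXNlcjpwYXNz" is base64 for "user:pass".
docker_config = DockerConfigurationProvider(
    config_json='{"auths": {"https://index.docker.io/v1/": {"auth": "dXNlcjpwYXNz"}}}'
)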

kaptain.platform.config.defaults

kaptain.platform.config.s3

S3ConfigurationProvider Objects

CODE
class S3ConfigurationProvider(FileBasedConfigurationProvider, EnvironmentVariableConfigurationProvider)

__init__
CODE
 | __init__(aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: Optional[str] = None, region_name: str = _DEFAULT_REGION, s3_endpoint: Optional[str] = None, s3_signature_version: Optional[str] = None, s3_force_path_style: bool = False)

S3-specific configuration provider which supports reading configuration from the AWS configuration file and from environment variables. The provider can be used as a configuration object, or for convenient resolution of the configuration both on the development side and in containers, when configuration is passed in the form of environment variables from Kubernetes Secrets.

Constructor arguments represent a subset of boto3 configuration properties (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) sufficient for Kaptain.

Arguments:

  • aws_access_key_id: The access key to authenticate with S3.

  • aws_secret_access_key: The secret key to authenticate with S3.

  • aws_session_token: The session token to authenticate with S3.

  • region_name: The name of the AWS region.

  • s3_endpoint: The complete URL of the S3 endpoint. This parameter is required when working with non-standard, S3-compatible storage solutions such as MinIO. It should be set to a resolvable address of the running server.

  • s3_signature_version: The signature version to use when signing requests.

  • s3_force_path_style: When enabled, clients will use path-style instead of URL-style addressing for buckets.
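
For example, a MinIO-backed configuration might look like this (endpoint and credentials are placeholders):

CODE
from kaptain.platform.config.s3 import S3ConfigurationProvider

s3_config = S3ConfigurationProvider(
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
    s3_endpoint="http://minio.example.com:9000",
    s3_force_path_style=True,  # path-style addressing is typical for MinIO
)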

get_secret_body
CODE
 | get_secret_body() -> Dict[str, str]

Transforms the configuration properties into a dict of environment variables. The resulting dict will be used for creating Kubernetes Secret to securely share access credentials between containers.

Returns:

dict of environment variables with associated values
