
Spark in a Project

Deploying Spark in a project

Getting started

To get started with creating and managing Spark workloads in a project, you first need to deploy the Spark Operator in the workspace where the project exists.

After deploying the Spark Operator, apply the operator's custom resources. The Spark Operator works with the following kinds of custom resources:

  • SparkApplication

  • ScheduledSparkApplication

See the Spark Operator API documentation for more details.
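For recurring jobs, a ScheduledSparkApplication wraps a SparkApplication template in a cron-style schedule. The following is an illustrative sketch only; the name, schedule, and image are placeholders rather than values from this guide, and it reuses the PROJECT_NAMESPACE and SPARK_SERVICE_ACCOUNT environment variables set in the example deployment below:

```yaml
# Illustrative sketch: name, schedule, and image are placeholder values.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: pyspark-pi-nightly
  namespace: ${PROJECT_NAMESPACE}
spec:
  schedule: "@daily"          # standard cron expressions are also accepted
  concurrencyPolicy: Forbid   # do not start a new run while one is in progress
  template:                   # same fields as a SparkApplication spec
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: "gcr.io/spark-operator/spark-py:v3.1.1"
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: ${SPARK_SERVICE_ACCOUNT}
    executor:
      cores: 1
      instances: 1
      memory: "512m"
```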

Example Deployment

To manage these custom resources and the related RBAC resources across all clusters in a project, use Project Deployments, which lets you deploy the resources with GitOps. Otherwise, you must create the resources manually in each cluster.

This example deployment walks you through deploying a Spark application in a project namespace. When you finish this procedure, a running Spark application is ready for use in your project's namespace.

  1. Create your Project if you don’t already have one.

  2. Set the PROJECT_NAMESPACE environment variable to the name of your project’s namespace:

    CODE
    export PROJECT_NAMESPACE=<project namespace>
  3. Ensure that the RBAC resources referenced by your custom resources exist; otherwise, the custom resources may fail to run. See the Spark Operator documentation for details.

    • For example, run the following command to create the RBAC resources needed in your project namespace:

      CODE
      kubectl apply -f - <<EOF
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-service-account
        namespace: ${PROJECT_NAMESPACE}
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        namespace: ${PROJECT_NAMESPACE}
        name: spark-role
      rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["*"]
      - apiGroups: [""]
        resources: ["services"]
        verbs: ["*"]
      - apiGroups: [""]
        resources: ["configmaps"]
        verbs: ["*"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["*"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: spark-role-binding
        namespace: ${PROJECT_NAMESPACE}
      subjects:
      - kind: ServiceAccount
        name: spark-service-account
        namespace: ${PROJECT_NAMESPACE}
      roleRef:
        kind: Role
        name: spark-role
        apiGroup: rbac.authorization.k8s.io
      EOF
  4. Set the SPARK_SERVICE_ACCOUNT environment variable to one of the following:

    1. ${PROJECT_NAMESPACE}, if you skipped the previous step and did not create the RBAC resources:

      CODE
      # This service account is automatically created when you create a project and has access to everything in the project namespace. 
      export SPARK_SERVICE_ACCOUNT=${PROJECT_NAMESPACE}
    2. spark-service-account, if you created the RBAC resources in the previous step:

      CODE
      export SPARK_SERVICE_ACCOUNT=spark-service-account
  5. Apply the SparkApplication custom resource in your project namespace:

    CODE
    kubectl apply -f - <<EOF
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: pyspark-pi
      namespace: ${PROJECT_NAMESPACE}
    spec:
      type: Python
      pythonVersion: "3"
      mode: cluster
      image: "gcr.io/spark-operator/spark-py:v3.1.1"
      imagePullPolicy: Always
      mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
      sparkVersion: "3.1.1"
      restartPolicy:
        type: OnFailure
        onFailureRetries: 3
        onFailureRetryInterval: 10
        onSubmissionFailureRetries: 5
        onSubmissionFailureRetryInterval: 20
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.1.1
        serviceAccount: ${SPARK_SERVICE_ACCOUNT}
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 3.1.1
    EOF
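After applying the resource, you can watch the operator drive the application through its lifecycle. The commands below are a sketch of how to check progress; note that the driver pod name follows the operator's `<application name>-driver` convention, so verify the exact pod name in your cluster:

```shell
# Check the application state (it should reach RUNNING, then COMPLETED)
kubectl -n ${PROJECT_NAMESPACE} get sparkapplication pyspark-pi

# Inspect the lifecycle events recorded by the operator
kubectl -n ${PROJECT_NAMESPACE} describe sparkapplication pyspark-pi

# Once the driver pod is up, read the job output from its logs
kubectl -n ${PROJECT_NAMESPACE} logs pyspark-pi-driver
```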

Delete Spark custom resources

Follow these steps to delete the Spark custom resources:

  1. View SparkApplications in all namespaces:

    CODE
    kubectl get sparkapp -A
  2. Delete a specific SparkApplication:

    CODE
    kubectl -n ${PROJECT_NAMESPACE} delete sparkapp <name of sparkapplication>
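If you also created the optional RBAC resources from the example deployment above, you can remove them once no Spark applications in the namespace depend on them:

```shell
# Delete the RBAC resources created in the example deployment
kubectl -n ${PROJECT_NAMESPACE} delete rolebinding spark-role-binding
kubectl -n ${PROJECT_NAMESPACE} delete role spark-role
kubectl -n ${PROJECT_NAMESPACE} delete serviceaccount spark-service-account
```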
