Spark in a Project
Deploying Spark in a project
Getting started
To get started with creating and managing Spark workloads in a project, you first need to deploy the Spark Operator in the workspace where the project exists.
After deploying the Spark Operator, apply the Spark Operator-specific custom resources. The Spark Operator works with the following kinds of custom resources:
SparkApplication
ScheduledSparkApplication
See Spark Operator API documentation for more details.
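The page only shows a `SparkApplication` below, so here is a minimal sketch of the other kind, `ScheduledSparkApplication`, which runs a `SparkApplication` template on a cron-style schedule. The name, namespace placeholder, and schedule are illustrative assumptions, not values defined by this page:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: pyspark-pi-scheduled   # hypothetical name for illustration
  namespace: <project namespace>
spec:
  schedule: "@every 1h"        # standard cron expressions are also accepted
  concurrencyPolicy: Allow
  template:                    # an embedded SparkApplication spec
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: "gcr.io/spark-operator/spark-py:v3.1.1"
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
```

See the Spark Operator API documentation for the full set of fields, such as `successfulRunHistoryLimit` and `failedRunHistoryLimit`.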
Example Deployment
If you need to manage these custom resources and RBAC resources across all clusters in a project, we recommend using Project Deployments, which lets you deploy the resources with GitOps. Otherwise, you must create the resources manually in each cluster.
This example walks you through deploying a Spark application in a project namespace. At the end of this procedure, you will have a running Spark application in your project's namespace.
Create your Project if you don’t already have one.
Set the `PROJECT_NAMESPACE` environment variable to the name of your project's namespace:

```shell
export PROJECT_NAMESPACE=<project namespace>
```
Ensure the RBAC resources referenced in your custom resources exist; otherwise, the custom resources will fail. See the Spark Operator documentation for details.
The following commands create the RBAC resources needed in your project namespace:

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-service-account
  namespace: ${PROJECT_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ${PROJECT_NAMESPACE}
  name: spark-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: ${PROJECT_NAMESPACE}
subjects:
  - kind: ServiceAccount
    name: spark-service-account
    namespace: ${PROJECT_NAMESPACE}
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
EOF
```
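To confirm the RBAC resources are in place before submitting an application, you can check them with `kubectl`. This is a sketch, not part of the required procedure; the `kubectl auth can-i --as` impersonation check asks the API server whether the service account may create pods (service accounts are impersonated as `system:serviceaccount:<namespace>:<name>`):

```shell
# List the service account, role, and role binding created above.
kubectl -n "${PROJECT_NAMESPACE}" get serviceaccount spark-service-account
kubectl -n "${PROJECT_NAMESPACE}" get role spark-role
kubectl -n "${PROJECT_NAMESPACE}" get rolebinding spark-role-binding

# Ask the API server whether the service account may create
# driver/executor pods in the project namespace. Prints "yes" or "no".
kubectl auth can-i create pods -n "${PROJECT_NAMESPACE}" \
  --as="system:serviceaccount:${PROJECT_NAMESPACE}:spark-service-account"
```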
Set the `SPARK_SERVICE_ACCOUNT` environment variable to one of the following:

`${PROJECT_NAMESPACE}`, if you skipped the previous step to create RBAC resources:

```shell
# This service account is automatically created when you create a project
# and has access to everything in the project namespace.
export SPARK_SERVICE_ACCOUNT=${PROJECT_NAMESPACE}
```

Or set it to `spark-service-account`:

```shell
export SPARK_SERVICE_ACCOUNT=spark-service-account
```
Apply the `SparkApplication` custom resource in your project namespace:

```shell
kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: ${PROJECT_NAMESPACE}
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.1.1"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: ${SPARK_SERVICE_ACCOUNT}
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
EOF
```
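After submitting, you can watch the application's progress. This is an optional sketch using the example name `pyspark-pi` from above; the Spark Operator names the driver pod `<application name>-driver`:

```shell
# Show the application's current state (SUBMITTED, RUNNING, COMPLETED, ...).
kubectl -n "${PROJECT_NAMESPACE}" get sparkapplication pyspark-pi

# Events and state transitions for the application.
kubectl -n "${PROJECT_NAMESPACE}" describe sparkapplication pyspark-pi

# The driver pod is named <application name>-driver; its logs contain
# the job output (the computed value of Pi in this example).
kubectl -n "${PROJECT_NAMESPACE}" logs pyspark-pi-driver
```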
Delete Spark custom resources
Follow these steps to delete the Spark custom resources:
View `SparkApplications` in all namespaces:

```shell
kubectl get sparkapp -A
```
Delete a specific `SparkApplication`:

```shell
kubectl -n ${PROJECT_NAMESPACE} delete sparkapp <name of sparkapplication>
```
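If you created the dedicated RBAC resources earlier, you can also remove them once no Spark applications depend on them. A sketch using the resource names from this page (skip this if you used the project's default service account):

```shell
# Delete the role binding, role, and service account created for Spark.
for resource in rolebinding/spark-role-binding role/spark-role serviceaccount/spark-service-account; do
  kubectl -n "${PROJECT_NAMESPACE}" delete "${resource}"
done
```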