I am working with a Dataproc cluster set up to test all the features. I created a cluster template and have been running the creation command almost every day, but this week it stopped working. The error printed is:
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/cluster_name/regions/us-central1/operations/number] failed: Failed to initialize node cluster_name-m: Component ranger failed to activate post-hdfs See output in: gs://cluster_bucket/google-cloud-dataproc-metainfo/number/cluster_name-m/dataproc-post-hdfs-startup-script_output.
When I check the output file referenced in the message, the error I find is:
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: ERROR: Error CREATEing SolrCore 'ranger_audits': Unable to create core [ranger_audits] Caused by: Java heap space
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]:
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + exit_code=1
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + [[ 1 -ne 0 ]]
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + echo 1
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + log_and_fail ranger 'Component ranger failed to activate post-hdfs'
The creation command that I ran is:
gcloud dataproc clusters create cluster_name \
  --bucket cluster_bucket \
  --region us-central1 \
  --subnet subnet_dataproc \
  --zone us-central1-c \
  --master-machine-type n1-standard-8 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-8 \
  --worker-boot-disk-size 1000 \
  --image-version 2.0.29-debian10 \
  --optional-components ZEPPELIN,RANGER,SOLR \
  --autoscaling-policy=autoscale_rule \
  --properties="dataproc:ranger.kms.key.uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key,dataproc:ranger.admin.password.uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted,hive:hive.metastore.warehouse.dir=gs://cluster_bucket/user/hive/warehouse,dataproc:solr.gcs.path=gs://cluster_bucket/solr2,dataproc:ranger.cloud-sql.instance.connection.name=gcp-project:us-central1:ranger-metadata,dataproc:ranger.cloud-sql.root.password.uri=gs://cluster_bucket/ranger-root-mysql-password.encrypted" \
  --kerberos-root-principal-password-uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted \
  --kerberos-kms-key=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key \
  --project gcp-project \
  --enable-component-gateway \
  --initialization-actions gs://goog-dataproc-initialization-actions-us-central1/cloud-sql-proxy/cloud-sql-proxy.sh,gs://cluster_bucket/hue.sh,gs://goog-dataproc-initialization-actions-us-central1/livy/livy.sh,gs://goog-dataproc-initialization-actions-us-central1/sqoop/sqoop.sh \
  --metadata livy-timeout-session='4h' \
  --metadata "hive-metastore-instance=gcp-project:us-central1:hive-metastore" \
  --metadata "kms-key-uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key" \
  --metadata "db-admin-password-uri=gs://cluster_bucket/hive-root-mysql-password.encrypted" \
  --metadata "db-hive-password-uri=gs://cluster_bucket/hive-mysql-password.encrypted" \
  --scopes=default,sql-admin
I know that it's something related to the Ranger/Solr setup, but I don't know how to increase this heap size without creating an alternative initialization script or building a custom machine image. If you have any idea how to solve this, or need more information about my setup, please let me know.
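For reference, the only workaround I can think of so far is adding an extra initialization action along the lines of the sketch below, which is exactly what I'd like to avoid. The /etc/default/solr path and the SOLR_JAVA_MEM variable are assumptions on my part (taken from a stock Solr install, not verified against the Dataproc image), and I'm not even sure an initialization action would run early enough to affect the Ranger post-hdfs activation step:

#!/bin/bash
# Hypothetical workaround sketch: raise the Solr JVM heap so that creating the
# ranger_audits core no longer fails with "Java heap space".
# The config path and variable name below are assumptions, not verified on the image.
set -euxo pipefail

ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"

if [[ "${ROLE}" == "Master" ]]; then
  # Assumes Solr reads its JVM settings from /etc/default/solr via SOLR_JAVA_MEM.
  # 4g is an example value, not a tuned recommendation.
  sed -i 's/^SOLR_JAVA_MEM=.*/SOLR_JAVA_MEM="-Xms4g -Xmx4g"/' /etc/default/solr
  systemctl restart solr
fi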