Sorry for the horrible question title, but here is my scenario:
- I have a PySpark Databricks notebook in which I load other notebooks.
- One of these notebooks sets some Redshift configuration for reading data from Redshift (including some temp S3 buckets). I cannot change any of this configuration.
- Under this configuration, both of these expressions return True. This matters for the credential-setting step below:

      sc._jsc.hadoopConfiguration().get("fs.s3n.awsAccessKeyId") == None
      sc._jsc.hadoopConfiguration().get("fs.s3n.awsSecretAccessKey") == None
- I have an Apache Spark model that I need to store in my own S3 bucket, which is a different bucket than the one configured for Redshift.
- I am pickling other objects and storing them in AWS using boto3, and that works properly, but I don't think Spark models can be pickled like other objects. So I have to use the model's save method with an S3 URL (sketched right after this list), and for that I set the AWS credentials like this, which works, as long as no one else on the same cluster is messing with the AWS configuration:

      sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID)
      sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", AWS_SECRET_ACCESS_KEY)
- After I save this model, I also need to read other data from Redshift, and that is where it fails with the following error. My guess is that Redshift's S3 configuration gets overwritten by the code above.

      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1844.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1844.0 (TID 63816, 10.0.63.188, executor 3): com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 3219CD268DEE5F53; S3 Extended Request ID: rZ5/zi2B+AsGuKT0iW1ATUyh9xw7YAt9RULoE33WxTaHWUWqHzi1+0sRMumxnnNgTvNED30Nj4o=), S3 Extended Request ID: rZ5/zi2B+AsGuKT0iW1ATUyh9xw7YAt9RULoE33WxTaHWUWqHzi1+0sRMumxnnNgTvNED30Nj4o=
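For reference, the save step from the credentials bullet above looks roughly like this; the bucket and path are placeholders, and model stands for whatever fitted Spark model I am persisting (using the .save(path) method):

    # override the s3n credentials so the save can write to my own bucket
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID)
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", AWS_SECRET_ACCESS_KEY)

    # save the fitted model straight to S3 (bucket/prefix are made up here)
    model.save("s3n://my-own-bucket/models/my_model")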
Now my question is: why am I not able to read the data again? How can I reset Redshift's S3 configuration back to the way it was before I set it explicitly, once the model has been saved to S3?
What I also don't understand is that the AWS values were initially None, yet when I try to reset them to None myself it fails with an error saying:

    The value of property fs.s3n.awsAccessKeyId must not be null
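In other words, this is roughly what I tried, and it is what throws the error above:

    # trying to put the s3n credentials back to their original (unset) state
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", None)      # -> "must not be null"
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", None)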
Right now I am considering a workaround in which I save the model locally on Databricks, zip it, and upload the archive to S3 with boto3 (roughly as sketched below), but that is just a patch. I would like to do it properly.
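The workaround would look roughly like this (paths, bucket and key names are placeholders, and I am assuming the driver can read DBFS through the /dbfs mount):

    import shutil
    import boto3

    # save the model to DBFS instead of S3
    model.save("dbfs:/tmp/my_model")

    # zip the saved model directory (readable on the driver via /dbfs) and upload it with boto3
    shutil.make_archive("/tmp/my_model", "zip", "/dbfs/tmp/my_model")
    s3 = boto3.client(
        "s3",
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    s3.upload_file("/tmp/my_model.zip", "my-own-bucket", "models/my_model.zip")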
Thank you in advance!