Sorry for the horrible question title, but here is my scenario:

  1. I have a PySpark Databricks notebook in which I load other notebooks.
  2. One of these notebooks sets some Redshift configuration for reading data from Redshift (some temp S3 buckets). I cannot change any of this configuration.
  3. Under this configuration, both of the following checks return True. This matters for step 5.

    sc._jsc.hadoopConfiguration().get("fs.s3n.awsAccessKeyId") == None
    sc._jsc.hadoopConfiguration().get("fs.s3n.awsSecretAccessKey") == None

  4. I have an Apache Spark model which I need to store in my own S3 bucket, which is a different bucket than the one configured for Redshift.
  5. I am pickling other objects and storing them in S3 using boto3, and that works fine, but I don't think Spark models can be pickled like other objects. So I have to use the model's save method with an S3 URL, and for that I set the AWS credentials like this, which works (as long as no one else on the same cluster is messing with the AWS configuration).

    sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID)
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", AWS_SECRET_ACCESS_KEY)
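To give a bit more context, the save step looks roughly like this (the bucket path and the model variable are placeholders, not my real names):

    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY_ID)
    hadoop_conf.set("fs.s3n.awsSecretAccessKey", AWS_SECRET_ACCESS_KEY)

    # Save the Spark model directly to my own bucket (placeholder path).
    model.save("s3n://my-model-bucket/path/to/model")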

  6. After I save this model, I also need to read other data from Redshift, and this is where it fails with the following error. My guess is that Redshift's S3 configuration gets changed by the code above.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1844.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1844.0 (TID 63816, 10.0.63.188, executor 3): com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 3219CD268DEE5F53; S3 Extended Request ID: rZ5/zi2B+AsGuKT0iW1ATUyh9xw7YAt9RULoE33WxTaHWUWqHzi1+0sRMumxnnNgTvNED30Nj4o=), S3 Extended Request ID: rZ5/zi2B+AsGuKT0iW1ATUyh9xw7YAt9RULoE33WxTaHWUWqHzi1+0sRMumxnnNgTvNED30Nj4o=

Now my question is: why am I not able to read the data again? How can I reset Redshift's S3 configuration to the way it was before I set it explicitly, after saving the model to S3?

What I also don't understand is that the AWS values were initially None, but when I try to reset them to None myself it fails with:

The value of property fs.s3n.awsAccessKeyId must not be null
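This is roughly what I tried, which raises the error above:

    # Hadoop's Configuration.set() rejects a null value, so this fails.
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", None)
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", None)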

Right now I am thinking of a workaround in which I save the model locally on Databricks, zip it, and upload the zip to S3, but that is just a patch. I would like to do it in a proper manner.
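The workaround would look something like this (a sketch only; the paths and bucket name are placeholders, and it assumes the DBFS FUSE mount at /dbfs):

    import shutil
    import boto3

    # 1. Save the model to DBFS instead of S3 (placeholder path).
    model.save("/tmp/my_spark_model")  # resolves to dbfs:/tmp/my_spark_model

    # 2. Zip the saved directory from the driver, via the /dbfs local mount.
    shutil.make_archive("/tmp/my_spark_model", "zip", "/dbfs/tmp/my_spark_model")

    # 3. Upload the archive with boto3, which never touches the Hadoop S3 config.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    s3.upload_file("/tmp/my_spark_model.zip", "my-model-bucket", "models/my_spark_model.zip")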


Thank you in advance!

Re-import the notebook that sets up the Redshift connectivity, or find where it is set and copy that code.
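In Databricks, re-importing that notebook is typically just a matter of running it again from your own notebook, e.g. (the path is a placeholder):

    %run ./path/to/redshift_setup_notebook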

If you don't have privileges to modify the notebooks you are importing, then I'd guess you also don't have privileges to set roles on the cluster. If you use roles, you don't need AWS keys at all.
