Hi, thank you for your reply.
The S3 loader was actually the first component in the pipeline that I upgraded from version 0.18 to 2.0.0rc2. Without giving much thought to compatibility, I pushed the change to our production environment, without problems. Recently I upgraded staging as well, and although the loader is deployed and running, the downstream RDB loader (which only runs in staging) is failing with this error:
```
INFO Client: Deleted staging directory hdfs://ip-10-5-215-178.ec2.internal:8020/user/hadoop/.sparkStaging/application_1631198412915_0003
Exception in thread "main" org.apache.spark.SparkException: Application application_1631198412915_0003 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/09/09 15:56:57 INFO ShutdownHookManager: Shutdown hook called
```
I’m not too familiar with debugging EMR jobs, so I’m not sure whether you can make out the failure from the log above. I did, however, find this error in the EMR step output:
```
Data loading error [Amazon](500310) Invalid operation: Cannot COPY into nonexistent table nl_basjes_yauaa_context_1;
ERROR: Data loading error [Amazon](500310) Invalid operation: Cannot COPY into nonexistent table nl_basjes_yauaa_context_1;
Following steps completed: [Discover]
```
I guess this error explains what’s missing in my Redshift schema, but I’m not sure how to create the new table (see my sketch below). Also, how can I make sure that all of my other enrichments are supported by my Redshift cluster?
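For what it’s worth, here is what I was planning to try: as far as I understand, the Redshift DDL for each context ships in the iglu-central repo, so I’d apply it with psql. The repo path and filename below are my assumption based on the missing table’s name, so please correct me if this isn’t the intended approach:

```bash
# Sketch of what I had in mind; the DDL path is my guess based on the
# missing table name nl_basjes_yauaa_context_1.
git clone https://github.com/snowplow/iglu-central.git
psql -h <redshift-endpoint> -p 5439 -U <user> -d <database> \
  -f iglu-central/sql/nl.basjes/yauaa_context_1.sql
```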
Regarding the upgrade steps (going to 0.18.2 and then to v1): can’t I simply deploy the newest version alongside the old (current) process, and then just switch the S3 loader’s output bucket to the new RDB loader’s input bucket, along the lines of the sketch below?
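To make the question concrete, this is the kind of config change I mean. The key names follow my reading of the S3 loader 2.x HOCON layout and the bucket names are placeholders, so treat this as a sketch rather than the exact config:

```hocon
# Sketch only: key names assumed from the S3 loader 2.x HOCON layout;
# bucket names are placeholders.
"output": {
  "s3": {
    # was "s3://acme-enriched-archive/" (read by the current process);
    # repointed at the bucket the new RDB loader reads from:
    "path": "s3://acme-rdb-loader-input/"
  }
}
```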