Description
I'm using the Spark Salesforce connector (Scala 2.12 build, v1.1.4) to read data in my AWS Glue job.
To get it working, I followed the official AWS guide, which says to add the following dependencies to the job:
- force-partner-api-40.0.0.jar
- force-wsc-40.0.0.jar
- salesforce-wave-api-1.0.9.jar
- spark-salesforce_2.11-1.1.1.jar
I decided to use the latest version of each dependency, so this is my configuration now:
- force-partner-api-60.0.0.jar
- force-wsc-60.0.0.jar
- salesforce-wave-api-1.0.10.jar
- spark-salesforce_2.12-1.1.4.jar
I'm also using bulk queries, but I'm getting the following error: com.univocity.parsers.common.TextParsingException: Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096).
I went through the documentation and saw that there is a parameter called maxCharsPerColumn which should fix this problem, so I changed my read instruction to do the following:
df = spark.read.format("com.springml.spark.salesforce").option("soql", soql).option("username", "xxxxxx").option("password", "yyyyyy").option("login", "zzzzzz").option("version", "52.0").option("bulk", "true").option("maxCharsPerColumn", "8192").option("sfObject", data_source).load()
The result doesn't change at all.
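For readability, here is a minimal sketch of the same read with the options collected in a dict, so it is easier to see exactly what is being passed. The helper name `build_salesforce_options` and the sample SOQL string are hypothetical and only for illustration; the option keys and values are the ones from my call above. Whether the connector actually honors `maxCharsPerColumn` in bulk mode is the open question here, so this only shows what I send, not what gets applied.

```python
def build_salesforce_options(soql, sf_object, max_chars_per_column=8192):
    """Return the reader options as a plain dict (hypothetical helper).

    All values are strings, matching the .option(key, value) calls above.
    """
    return {
        "soql": soql,
        "username": "xxxxxx",   # placeholder credentials, as in the post
        "password": "yyyyyy",
        "login": "zzzzzz",
        "version": "52.0",
        "bulk": "true",
        "maxCharsPerColumn": str(max_chars_per_column),
        "sfObject": sf_object,
    }


# Placeholder query and object name, not the real ones from my job.
options = build_salesforce_options("SELECT Id FROM Account", "Account")

# In the Glue job (needs an active SparkSession named `spark`):
# df = spark.read.format("com.springml.spark.salesforce").options(**options).load()
```

`DataFrameReader.options(**options)` is equivalent to chaining one `.option(key, value)` call per entry.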
Also from the logs I can see the CsvParserSettings passed from the connector, and I can see that the max chars per column is still 4096:
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=null
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=true
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=true
Ignore trailing whitespaces in quotes=false
Input buffer size=1048576
Input reading on separate thread=true
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=-1
Line separator detection enabled=true
Maximum number of characters per column=4096
Maximum number of columns=512
Normalize escaped line separators=true
Null value=null
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=none
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=null
Format configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=
Line separator sequence=\n
Quote character="
Quote escape character="
Quote escape escape character=null
Is there anything wrong with my code?
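To compare the requested limit with the one the parser actually reports, the settings dump from the logs can be checked with a small script. This is only a diagnostic sketch: the function names are hypothetical, and it assumes the dump is available as plain text with one `key=value` pair per line, as in the excerpt above.

```python
def parse_settings(log_text):
    """Collect 'key=value' lines from a logged settings dump into a dict."""
    settings = {}
    for line in log_text.splitlines():
        # Split on the first '=' only, so values such as ',' or '"' survive.
        key, sep, value = line.strip().partition("=")
        if sep and key:
            settings[key] = value
    return settings


def effective_max_chars(log_text):
    """Return the logged per-column character limit, or None if absent."""
    value = parse_settings(log_text).get("Maximum number of characters per column")
    return int(value) if value is not None else None


# Shortened copy of the dump from my logs.
sample = """Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Maximum number of characters per column=4096
Maximum number of columns=512
"""

requested = 8192
effective = effective_max_chars(sample)
if effective != requested:
    print(f"requested {requested}, but the parser reports {effective}")
```

With the dump above, the requested value (8192) and the reported value (4096) differ, which suggests the option never reaches the CSV parser.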