This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

NPE executing Data Flow batch run with HDFS data set

SA-30438

Summary



A Data Flow batch run fails with a NullPointerException (NPE). The Data Flow uses an HDFS data set as its source.


Error Messages



Error on node [some node id] 
com.pega.dsm.dnode.api.dataflow.StageException: Exception in stage: <some stage>
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageOutputSubscriber.onError(DataFlowStage.java:394) 
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageInputSubscriber.onError(DataFlowStage.java:286) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.onError(DataObservableImpl.java:287) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:325) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.java:52) 
at com.pega.dsm.dnode.api.dataflow.DataFlow$2.run(DataFlow.java:269) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$PrpcThread.run(PrpcThreadFactory.java:81) 
Caused by: java.lang.NullPointerException 
at com.pega.decision.util.csv.CSVTokenizer.nextToken(CSVTokenizer.java:280) 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:70) 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:28) 
at com.pega.bigdata.dataset.hdfs.HDFSFileClient$PartitionRowsIterator.next(HDFSFileClient.java:970) 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseAllOperation.propagateResultsToSubscriber(HDFSBrowseAllOperation.java:58) 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseByPartitionOperation$1.emit(HDFSBrowseByPartitionOperation.java:63) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:320) 
... 5 more 


Steps to Reproduce

  1. Create a Data Flow with an HDFS data set as the source.
  2. Select the CSV option for the HDFS file.
  3. In Designer Studio, click Decisioning > Decisions > Data Flows, then click Actions > Run.


Root Cause



A defect in Pegasystems’ code or rules: multithreading and concurrency issues in the CSVTokenizer API cause the NullPointerException in CSVTokenizer.nextToken when the HDFS data set parses CSV rows.
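
The general class of problem can be illustrated with a minimal sketch. The internals of Pega's CSVTokenizer are not described in this article, so the StatefulTokenizer class below is purely hypothetical and only stands in for any tokenizer that keeps mutable per-row state. If one such instance were shared between worker threads, one thread could reset or clear the state while another is still inside nextToken, which is one plausible way for a NullPointerException to surface. The sketch shows the usual defensive pattern of giving every parse call its own tokenizer instance; it is not the content of HFix-30260.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class TokenizerSketch {

    /** Hypothetical stateful tokenizer. This is NOT Pega's CSVTokenizer. */
    static final class StatefulTokenizer {
        private String line; // mutable state: unsafe if the instance is shared
        private int pos;

        void reset(String newLine) {
            line = newLine;
            pos = 0;
        }

        /** Returns the next comma-separated field, or null when the row is exhausted. */
        String nextToken() {
            // If another thread had set 'line' to null, this would throw an NPE.
            if (pos > line.length()) {
                return null;
            }
            int comma = line.indexOf(',', pos);
            int end = (comma < 0) ? line.length() : comma;
            String token = line.substring(pos, end);
            pos = end + 1;
            return token;
        }
    }

    /** Parses one row with a tokenizer that is local to this call and never shared. */
    static List<String> parseRow(String row) {
        StatefulTokenizer tokenizer = new StatefulTokenizer();
        tokenizer.reset(row);
        List<String> fields = new ArrayList<>();
        for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
            fields.add(t);
        }
        return fields;
    }

    /** Parses rows on a thread pool; results are returned in input order. */
    static List<List<String>> parseAll(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String row : rows) {
                Callable<List<String>> task = () -> parseRow(row);
                futures.add(pool.submit(task));
            }
            List<List<String>> result = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Row parsing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each call to parseRow constructs its own tokenizer, no mutable state crosses thread boundaries, and parseAll can run on any number of threads. The sketch deliberately ignores quoting and escaping, which a real CSV parser must handle.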

Resolution



Apply HFix-30260.

 

Published November 24, 2016 - Updated October 8, 2020
