Support Article

NPE executing Data Flow batch run with HDFS data set

SA-30438

Summary



A Data Flow batch run fails with a NullPointerException when the Data Flow uses an HDFS data set as its source.


Error Messages



Error on node [some node id] 
com.pega.dsm.dnode.api.dataflow.StageException: Exception in stage: <some stage>
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageOutputSubscriber.onError(DataFlowStage.java:394) 
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageInputSubscriber.onError(DataFlowStage.java:286) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.onError(DataObservableImpl.java:287) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:325) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.java:52) 
at com.pega.dsm.dnode.api.dataflow.DataFlow$2.run(DataFlow.java:269) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$PrpcThread.run(PrpcThreadFactory.java:81) 
Caused by: java.lang.NullPointerException 
at com.pega.decision.util.csv.CSVTokenizer.nextToken(CSVTokenizer.java:280) 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:70) 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:28) 
at com.pega.bigdata.dataset.hdfs.HDFSFileClient$PartitionRowsIterator.next(HDFSFileClient.java:970) 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseAllOperation.propagateResultsToSubscriber(HDFSBrowseAllOperation.java:58) 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseByPartitionOperation$1.emit(HDFSBrowseByPartitionOperation.java:63) 
at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:320) 
... 5 more 


Steps to Reproduce

  1. Create a data flow with an HDFS data set as its source.
  2. Configure the HDFS data set to use the CSV file format.
  3. In Designer Studio, go to Decisioning > Decisions > Data Flows and select Actions > Run for the data flow.


Root Cause



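A defect in Pegasystems’ code or rules. Multithreading and concurrency issues in the CSVTokenizer API cause a NullPointerException in CSVTokenizer.nextToken while the HDFS data set parses CSV records during the batch run.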
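The failure is consistent with a stateful tokenizer being shared between threads. The Java sketch below is illustrative only: SimpleCsvTokenizer, parseLine, and parseConcurrently are hypothetical names, not Pega's CSVTokenizer API, and the sketch does not describe what HFix-30260 changes. It assumes a tokenizer that keeps its current line and position as mutable fields, and shows the general pattern of confining one tokenizer instance to each worker thread with ThreadLocal so that concurrent parsers never share that state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stateful tokenizer (NOT Pega's CSVTokenizer); its fields make it unsafe to share across threads. */
final class SimpleCsvTokenizer {
    private String line; // current input line: mutable per-parse state
    private int pos;     // current read position: mutable per-parse state

    void reset(String line) {
        this.line = line;
        this.pos = 0;
    }

    /** Returns the next comma-separated field, or null once the line is exhausted. No quote handling. */
    String nextToken() {
        if (line == null || pos > line.length()) {
            return null;
        }
        int comma = line.indexOf(',', pos);
        int end = (comma < 0) ? line.length() : comma;
        String token = line.substring(pos, end);
        pos = end + 1;
        return token;
    }
}

public class ThreadConfinedCsvParsing {
    // One tokenizer per thread, so no two workers ever mutate the same line/pos fields.
    private static final ThreadLocal<SimpleCsvTokenizer> TOKENIZER =
            ThreadLocal.withInitial(SimpleCsvTokenizer::new);

    /** Splits one CSV line into fields using the calling thread's own tokenizer. */
    static List<String> parseLine(String line) {
        SimpleCsvTokenizer tokenizer = TOKENIZER.get();
        tokenizer.reset(line);
        List<String> fields = new ArrayList<>();
        for (String tok = tokenizer.nextToken(); tok != null; tok = tokenizer.nextToken()) {
            fields.add(tok);
        }
        return fields;
    }

    /** Parses lines on a thread pool and returns the total number of fields parsed. */
    static int parseConcurrently(List<String> lines, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String line : lines) {
                results.add(pool.submit(() -> parseLine(line).size()));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            lines.add("id" + i + ",name,42"); // three fields per line
        }
        System.out.println(parseConcurrently(lines, 8)); // prints 30000
    }
}
```

Creating a new tokenizer for every call would be equally safe; ThreadLocal simply reuses one instance per worker thread instead of allocating one per record.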

Resolution



Apply HFix-30260.

 

Published November 16, 2016 - Updated November 23, 2016
