Support Article

NullPointerException (NPE) when executing a Data Flow batch run with an HDFS data set



Summary

A Data Flow batch run fails with a NullPointerException when the Data Flow uses an HDFS data set as its source.

Error Messages

Error on node [some node id] 
com.pega.dsm.dnode.api.dataflow.StageException: Exception in stage: <some stage>
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageOutputSubscriber.onError( 
at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageInputSubscriber.onError( 
at com.pega.dsm.dnode.api.dataflow.DataFlow$ 
at java.util.concurrent.ThreadPoolExecutor.runWorker( 
at java.util.concurrent.ThreadPoolExecutor$ 
at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$ 
Caused by: java.lang.NullPointerException 
at com.pega.decision.util.csv.CSVTokenizer.nextToken( 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse( 
at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse( 
at com.pega.bigdata.dataset.hdfs.HDFSFileClient$ 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseAllOperation.propagateResultsToSubscriber( 
at com.pega.bigdata.dataset.hdfs.HDFSBrowseByPartitionOperation$1.emit( 
... 5 more 

Steps to Reproduce

  1. Create a data flow with an HDFS data set as the source.
  2. Select the CSV option for the HDFS file.
  3. In Designer Studio, click Decisioning > Decisions > Data Flows, and then click Actions > Run.

Root Cause

A defect in Pegasystems’ code or rules: multithreading and concurrency issues in the CSVTokenizer API cause the NullPointerException when the HDFS data set parses CSV records during the batch run.
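
The CSVTokenizer source is not part of this article, so the following is only a minimal sketch of the general failure mode and one common way to avoid it. It is not the contents of the hotfix. SimpleCsvTokenizer, TokenizerSketch, parse, and parseConcurrently are hypothetical names introduced for illustration. The tokenizer keeps mutable state (the current line and position), which is the kind of object that can throw a NullPointerException or return wrong tokens if several worker threads share one instance. The sketch sidesteps that by creating one tokenizer per parse call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TokenizerSketch {

    // Hypothetical stateful tokenizer; NOT the real CSVTokenizer.
    // The mutable fields make a single instance unsafe to share between threads.
    static final class SimpleCsvTokenizer {
        private String line;
        private int pos;

        void reset(String newLine) {
            line = newLine;
            pos = 0;
        }

        // Returns the next comma-separated field, or null when the line is exhausted.
        String nextToken() {
            if (line == null || pos > line.length()) {
                return null;
            }
            int comma = line.indexOf(',', pos);
            int end = (comma < 0) ? line.length() : comma;
            String token = line.substring(pos, end);
            pos = end + 1;
            return token;
        }
    }

    // Creates a fresh tokenizer per call, so no tokenizer state is shared.
    static List<String> parse(String csvLine) {
        SimpleCsvTokenizer tokenizer = new SimpleCsvTokenizer();
        tokenizer.reset(csvLine);
        List<String> fields = new ArrayList<>();
        for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
            fields.add(t);
        }
        return fields;
    }

    // Parses each line on a thread pool and returns the results in input order.
    static List<List<String>> parseConcurrently(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String l : lines) {
                futures.add(pool.submit(() -> parse(l)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parsing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseConcurrently(Arrays.asList("a,b,c", "1,2,3"), 2));
    }
}
```

Creating a tokenizer per call trades a small allocation cost for isolation. Other options with the same effect would be a per-thread instance or synchronizing access to a shared one.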


Resolution

Apply HFix-30260.


Published November 16, 2016 - Updated November 23, 2016
