Support Article
NPE executing Data Flow batch run with HDFS data set
SA-30438
Summary
A Data Flow batch run fails with a NullPointerException. The Data Flow uses an HDFS data set as its source.
Error Messages
Error on node [some node id]
com.pega.dsm.dnode.api.dataflow.StageException: Exception in stage: <some stage>
    at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageOutputSubscriber.onError(DataFlowStage.java:394)
    at com.pega.dsm.dnode.api.dataflow.DataFlowStage$StageInputSubscriber.onError(DataFlowStage.java:286)
    at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.onError(DataObservableImpl.java:287)
    at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:325)
    at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.java:52)
    at com.pega.dsm.dnode.api.dataflow.DataFlow$2.run(DataFlow.java:269)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$PrpcThread.run(PrpcThreadFactory.java:81)
Caused by: java.lang.NullPointerException
    at com.pega.decision.util.csv.CSVTokenizer.nextToken(CSVTokenizer.java:280)
    at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:70)
    at com.pega.bigdata.dataset.hdfs.parsers.CSVToClipboardPageParser.parse(CSVToClipboardPageParser.java:28)
    at com.pega.bigdata.dataset.hdfs.HDFSFileClient$PartitionRowsIterator.next(HDFSFileClient.java:970)
    at com.pega.bigdata.dataset.hdfs.HDFSBrowseAllOperation.propagateResultsToSubscriber(HDFSBrowseAllOperation.java:58)
    at com.pega.bigdata.dataset.hdfs.HDFSBrowseByPartitionOperation$1.emit(HDFSBrowseByPartitionOperation.java:63)
    at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:320)
    ... 5 more
Steps to Reproduce
- Create a data flow with an HDFS data set as its source.
- Select the CSV option as the file format for the HDFS file.
- Run the data flow: Designer Studio > Decisioning > Decisions > Data Flows > Actions > Run.
Root Cause
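A defect in Pegasystems’ code or rules: multithreading and concurrency issues in the CSVTokenizer API. As the stack trace shows, the NullPointerException is thrown from CSVTokenizer.nextToken while the HDFS data set parses CSV records.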
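
For readers unfamiliar with this class of defect, the following is a minimal sketch of how a tokenizer with mutable state behaves safely when each parse call owns its own instance. It is not the Pega CSVTokenizer and not the content of the hotfix; the class names SketchTokenizer and ThreadConfinedCsvSketch and all their methods are hypothetical, and the sketch ignores quoting and escaping. A tokenizer like this holds the current line and position in fields, so sharing one instance across worker threads could let one thread reset or advance state that another thread is still reading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateful tokenizer, for illustration only (not Pega code).
final class SketchTokenizer {
    private String line; // mutable state: unsafe to share between threads
    private int pos;

    void reset(String newLine) {
        line = newLine;
        pos = 0;
    }

    boolean hasNext() {
        return line != null && pos <= line.length();
    }

    String nextToken() {
        int comma = line.indexOf(',', pos);
        int end = comma < 0 ? line.length() : comma;
        String token = line.substring(pos, end);
        pos = end + 1; // step past the comma (or past the end of the line)
        return token;
    }
}

final class ThreadConfinedCsvSketch {
    // Each call creates its own tokenizer, so no parsing state crosses threads.
    static List<String> parseLine(String line) {
        SketchTokenizer tokenizer = new SketchTokenizer();
        tokenizer.reset(line);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasNext()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Parses lines on a thread pool; results come back in input order.
    static List<List<String>> parseAll(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String l : lines) {
                tasks.add(() -> parseLine(l));
            }
            List<List<String>> result = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                result.add(f.get());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("parallel parse failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Confining the tokenizer to a single call is only one way to avoid shared mutable state; synchronizing access or using a ThreadLocal instance are alternatives. In the affected product the parser is internal, so the supported fix remains the hotfix listed under Resolution.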
Resolution
Apply HFix-30260.
Published November 24, 2016 - Updated October 8, 2020