Support Article
Characters decoded incorrectly when email is sent as HTML
Summary
When an email is sent as HTML, special characters do not appear correctly in the content for a few email IDs. This occurs even though the UTF-8 charset is set for decoding.
Error Messages
Not Applicable
Steps to Reproduce
- Send an email to Gmail from Outlook.
- Wait about a minute for the notification that a case has been created.
- Search for the case created in the Pega instance.
Root Cause
In Outlook for Windows, the default encoding for outgoing messages is 'ISO_8859_1'. As a result, the HTML source declares the 'ISO-8859-1' charset.
Setting the correct content type is important for email accessibility.
UTF-8 is a standard encoding that can represent every Unicode character, including non-Latin characters, so content that is encoded and decoded as UTF-8 is displayed correctly.
ISO-8859-1 (the default in Outlook), by contrast, covers only Latin-based languages.
The Pega application uses UTF-8 for decoding. Therefore, the content is decoded correctly only if it is also encoded with UTF-8.
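The effect of this mismatch can be reproduced outside Pega. The following is a minimal sketch that uses only the JDK; the class name CharsetMismatchDemo and the sample string are illustrative assumptions and are not part of the Pega code.

```java
final class CharsetMismatchDemo {

    // Decodes raw bytes with the given charset.
    static String decode(byte[] bytes, java.nio.charset.Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Café", written with a Unicode escape so the source file encoding does not matter.
        String original = "Caf\u00E9";

        // Encode the way the sender does in this scenario: as ISO-8859-1.
        byte[] latin1Bytes = original.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);

        // Decoding those bytes as UTF-8 replaces the invalid byte with U+FFFD.
        String wrong = decode(latin1Bytes, java.nio.charset.StandardCharsets.UTF_8);

        // Decoding with the charset that was used for encoding restores the text.
        String right = decode(latin1Bytes, java.nio.charset.StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches original: " + wrong.equals(original));
        System.out.println("ISO-8859-1 decode matches original: " + right.equals(original));
    }
}
```

In ISO-8859-1 the accented character is the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder cannot recover it.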
Resolution
Perform the following local change:
Instead of changing Outlook's default encoding setting (from 'ISO_8859_1' to 'UTF-8'), modify the code in Step 3 of the 'pyExtractHtmlFromAttachment' activity as follows:
// Decode the Base64-encoded HTML attachment into raw bytes.
String inputStream = htmlBase64;
byte[] bytes = org.apache.commons.codec.binary.Base64.decodeBase64(inputStream);
java.io.InputStream is = new java.io.ByteArrayInputStream(bytes);
// First pass: read the bytes as UTF-8.
java.nio.charset.Charset charset = java.nio.charset.StandardCharsets.UTF_8;
StringBuilder stringBuilder = new StringBuilder();
String line = null;
java.io.InputStreamReader reader = new java.io.InputStreamReader(is, charset);
try (java.io.BufferedReader bufferedReader = new java.io.BufferedReader(reader)) {
    // readLine() strips line terminators, so the lines are concatenated as-is.
    while ((line = bufferedReader.readLine()) != null) {
        stringBuilder.append(line);
    }
} catch (Exception ex) {
    oLog.error("Error while reading file", ex);
}
FileString = stringBuilder.toString();
oLog.debug("Html string: " + FileString);
//-------------------------------------------
org.jsoup.nodes.Document doc = org.jsoup.Jsoup.parse(FileString);
// Inspect the first meta tag to see which charset the HTML declares.
org.jsoup.select.Elements elements = doc.select("meta");
if (elements != null && !elements.isEmpty()) {
    String charsetAttr = elements.get(0).attr("content");
    // If the declared content type is not UTF-8, read the same bytes again as ISO-8859-1.
    if (!charsetAttr.toUpperCase().contains("UTF-8")) {
        stringBuilder = new StringBuilder();
        charset = java.nio.charset.StandardCharsets.ISO_8859_1;
        is = new java.io.ByteArrayInputStream(bytes);
        reader = new java.io.InputStreamReader(is, charset);
        try (java.io.BufferedReader bufferedReader = new java.io.BufferedReader(reader)) {
            while ((line = bufferedReader.readLine()) != null) {
                stringBuilder.append(line);
            }
        } catch (Exception ex) {
            oLog.error("Error while reading file", ex);
        }
        FileString = stringBuilder.toString();
    }
}
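The code above falls back to ISO-8859-1 whenever the first meta tag does not mention UTF-8. As a possible variation, and not the change described in this article, the declared charset could be read from the markup and used directly. The following is a minimal sketch under that assumption; it uses only the JDK, and the class name HtmlCharsetSniffer, its method names, and the regular expression are hypothetical.

```java
final class HtmlCharsetSniffer {

    // Matches charset=... inside a meta tag, for example
    // <meta charset="utf-8"> or
    // <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">.
    private static final java.util.regex.Pattern META_CHARSET =
        java.util.regex.Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            java.util.regex.Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in the markup, or the fallback if none is
    // declared or the declared name is not supported by this JVM.
    static java.nio.charset.Charset detect(byte[] bytes, java.nio.charset.Charset fallback) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup
        // can be scanned safely before the real charset is known.
        String probe = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);
        java.util.regex.Matcher matcher = META_CHARSET.matcher(probe);
        if (matcher.find()) {
            try {
                return java.nio.charset.Charset.forName(matcher.group(1));
            } catch (IllegalArgumentException ex) {
                // Illegal or unsupported charset name: use the fallback below.
            }
        }
        return fallback;
    }

    // Decodes the HTML with its declared charset, defaulting to UTF-8.
    static String decode(byte[] bytes) {
        return new String(bytes, detect(bytes, java.nio.charset.StandardCharsets.UTF_8));
    }
}
```

With this approach, HtmlCharsetSniffer.decode(bytes) would take the place of the two read passes, and messages that declare a charset other than UTF-8 or ISO-8859-1 would also be decoded with their own charset, provided the JVM supports it.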
Published August 15, 2019 - Updated December 2, 2021