Task #12227

closed

Encoding issue on Dataminer proto 5

Added by Gianpaolo Coro almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee: _InfraScience Systems Engineer
Category: High-Throughput-Computing
Target version:
Start date: Jul 24, 2018
Due date:
% Done: 100%
Estimated time:
Infrastructure: Production

Description

I have a difficult issue on one of the prototype Dataminers for which I need help:

Several text-analysis algorithms read input files in UTF-8 encoding and write JSON files in UTF-8. They run in R.

Only on dataminer5-proto, writing the UTF-8 file fails when the text contains an accented character (e.g. "oggi è una bella giornata"). The reported error is generic:

Warning message:
In writeLines(json, fileConn) : invalid char string in output conversion

writeLines is a native R function and is invoked correctly in the code:

fileConn<-file(outjsonfile,encoding = "UTF-8")
writeLines(json, fileConn)
close(fileConn)
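
A possible workaround, sketched under assumptions: if `json` already holds UTF-8 bytes, writing through a binary connection with `useBytes = TRUE` should skip the locale-dependent conversion that raises the warning above. The sample value of `json` and the temporary path are hypothetical; the ticket does not show how `json` is built, and this has not been tried on dataminer5-proto.

```r
# Hypothetical sample values standing in for the algorithm's real data.
json <- "{\"text\": \"oggi \u00e8 una bella giornata\"}"
outjsonfile <- tempfile(fileext = ".json")

# Open in binary mode with no encoding= argument, so the connection
# does not re-encode the output.
fileConn <- file(outjsonfile, open = "wb")

# useBytes = TRUE writes the string's bytes as they are, without
# translating them to the native encoding of the current locale.
writeLines(json, fileConn, useBytes = TRUE)
close(fileConn)
```

The trade-off is that R no longer checks the encoding on output, so this only helps when the string is known to be UTF-8 already.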

Input files are plain text files read as UTF-8 files using bytes:

inputFile <- file(inputfile, encoding="UTF-8")
filetext<-readChar(inputFile, file.info(inputfile)$size, useBytes = T)
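
One thing worth checking, as a hedged sketch rather than a confirmed diagnosis: `readChar` reads in binary mode, and as far as I understand the R documentation, the `encoding` argument of `file()` only takes effect on text-mode connections. The string may therefore come back without a UTF-8 mark, in which case R would treat it as being in the native encoding of the locale. The sample file below is hypothetical; declaring the encoding explicitly with `Encoding<-` is one way to rule this out.

```r
# Hypothetical sample input: a small file containing UTF-8 bytes.
inputfile <- tempfile(fileext = ".txt")
writeBin(charToRaw("oggi \u00e8 una bella giornata"), inputfile)

# Read the whole file as bytes, as in the snippet above.
nbytes <- file.info(inputfile)$size
filetext <- readChar(inputfile, nbytes, useBytes = TRUE)

# Declare that these bytes are UTF-8; this changes only the mark,
# not the bytes themselves.
Encoding(filetext) <- "UTF-8"

# Sanity checks that can be compared across machines.
print(Encoding(filetext))
print(validUTF8(filetext))
```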

The only package used by the algorithms is "jsonlite".
I have checked the machine and R locales but they seem OK. Perhaps there is some other difference in the locales I cannot see.
From sample tests, this issue occurs only on dataminer5-proto.
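
To compare dataminer5-proto with a working Dataminer, a small report like the following could be run on both machines. It is only a sketch: the name `locale_report` is hypothetical, and it should ideally be run from inside the same process that launches the algorithms, since the locale seen there may differ from the one in an interactive shell.

```r
# Collect the locale settings that affect character conversion in R.
locale_report <- list(
  locale    = Sys.getlocale(),                 # all locale categories
  ctype     = Sys.getlocale("LC_CTYPE"),       # category used for encoding
  utf8      = l10n_info()[["UTF-8"]],          # TRUE if R runs in a UTF-8 locale
  env       = Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"), unset = NA),
  r_version = R.version.string
)
str(locale_report)
```

If `utf8` is `FALSE` on dataminer5-proto only (for example because the service starts R with a "C" or "POSIX" locale), that difference would be consistent with the conversion warning reported above.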

