The odyssey of attempting to serialize json arrays to CSV file format in Java

Increasingly we find ourselves working with json files. Json is a highly versatile format for storing data. It is considered more efficient than using XML and it is highly universal. However, json isn’t an ideal format for when you want to put data into a spreadsheet. Although some spreadsheet programs have *some* support for reading json files, it is not sufficient and it, quite frankly, sucks.

A better format for storing data that is meant to be looked at in a spreadsheet would be .CSV, since it is a very simple format and is extremely ubiquitous for that context. If you find yourself in a position where you want to convert things that are in the Json format to .CSV, consider the following.

I have a JSONArray (as defined by org.json). It is actually a json array of arrays. I want to convert it to .CSV. How do I do that?

[
  [
    "GEO_ID",
    "NAME",
    "CRRALL",
    "DRRALL",
    "CRRINT",
    "DRRINT",
    "RESP_DATE",
    "state",
    "county",
    "county subdivision"
  ],
  [
    "0600000US3611711473",
    "Butler town, Wayne County, New York",
    "51.4",
    "0.1",
    "17.7",
    "0",
    "2020-08-21",
    "36",
    "117",
    "11473"
  ],
  [
    "0600000US3611728013",
    "Galen town, Wayne County, New York",
    "59.9",
    "0.3",
    "31.8",
    "0.1",
    "2020-08-21",
    "36",
    "117",
    "28013"
  ],
  [
    "0600000US3611778102",
    "Walworth town, Wayne County, New York",
    "68.8",
    "0.2",
    "62.4",
    "0.1",
    "2020-08-21",
    "36",
    "117",
    "78102"
  ]
]

At first glance this did not seem too challenging. However, when I set about doing this, I used org.json (JSON-java) instead of using other json libraries for Java like gson or json-simple. My solution also involved the use of the Apache Commons IO library. My solution ended up being somewhat problematic because for whatever reason org.json only saw the outter array and not the inner arrays. Thus, instead of parsing the json as an array of arrays, it saw it instead as a single array with these mysterious elements within.

My workaround was to simply to process each of these unparsed inner arrays as a “row”, remove some square brackets, and then to convert it all to .CSV. This was the function that I ended-up creating:

private static void convertAndSaveAsCSV(JSONArray arrayToBeConverted, String filePath, boolean getRidOfSquareBrackets) throws IOException {
	var file = new File(filePath);
		
	System.out.println();
	ArrayList<String> rows = new ArrayList<String>();
		
	for (int i = 0 ; i < arrayToBeConverted.length(); i++ ) {
			
		String row = arrayToBeConverted.getJSONArray(i).toString(); 
			
		if (getRidOfSquareBrackets) {
			row = row.replace("[", "");
			row = row.replace("]","");
		}
			
		rows.add(row);
	}
		
	System.out.println();

	FileUtils.writeLines(file, rows);	
		
}

Simply converting the JSONArray to a string would have resulted in this output:

["GEO_ID","NAME","CRRALL","DRRALL","CRRINT","DRRINT","RESP_DATE","state","county","county subdivision"]
["0600000US3611711473","Butler town, Wayne County, New York","51.4","0.1","17.7","0","2020-08-21","36","117","11473"]
["0600000US3611728013","Galen town, Wayne County, New York","59.9","0.3","31.8","0.1","2020-08-21","36","117","28013"]
["0600000US3611778102","Walworth town, Wayne County, New York","68.8","0.2","62.4","0.1","2020-08-21","36","117","78102"]

For some reason org.json only reads the outer array and does not see the inner arrays. This is terrible for CSV because all you will get is a single row of just terribleness. This is why I ended up having the conditional for whether we want to get rid of square brackets for each “row.” This ends up solving the problem and produces a correct output.

I speculate that the reason why org.json could not read the inner elements as nested arrays was because there weren’t keys involved. If that is the reason, then the fault is on json. Either that, or org.json isn’t very good. Maybe the format or the library isn’t to blame but the data itself, since the data does not have keys. It is annoying, in either case, but at least I was able to figure out this workaround.

Leave a comment