Scala
google
JSON
Google-Scholar

Parse Recursively and Flatten Json in Scala

These days I have been working on retrieving json format data via REST API, parsing the json data, connecting to SQL and writing the json data into SQL table. A few days ago, one of my team members said he needs all data in the json format REST API ouput to be into SQL table. It means, I need to learn how to recursively parse json data and flatten it.

Let's say I have the following json data.

{
  "metadata": {
    "id": "1234",
    "type": "file",
    "length": 395
  }
}

Then, what I needed to do is change this json into like:

{
  "metadata.id": "1234",
  "metadata.type": "file",
  "metadata.length": 395
}

Why I like google ( and everybody does) is when I search for something at google, there is always someone who had already been in the same or similar situation and solved the issue in someway. This time, it is the same. I found the post https://stackoverflow.com/questions/24273433/play-scala-how-to-flatten-a-json-object.

So now what I need to do is to investigate if his/her situation is really same or similar to mine. If so, I need to find out how to apply the solution to my situation. This time, I guess I was lucky. Reading thourgh the post briefly, and it seemed to work just by copy and paste. So I pasted the following code and watched how it would go.

def flatten(js: JsValue, prefix: String = ""): JsObject = js.as[JsObject].fields.foldLeft(Json.obj()) {
    case (acc, (k, v: JsObject)) => {
        if(prefix.isEmpty) acc.deepMerge(flatten(v, k))
        else acc.deepMerge(flatten(v, s"$prefix.$k"))
    }
    case (acc, (k, v)) => {
        if(prefix.isEmpty) acc + (k -> v)
        else acc + (s"$prefix.$k" -> v)
    }
}

As a result, it didn't work unfortunately. Is it because of some flaws in the code? Do I have to modify some parts of the code? But in fact, the cause was ARRAY [] type in the json input.
The following is the json input I used for this test.

{
        "ip": "192.168.242.128",
        "user": "admin",
        "password": "Xtrem10",
        "clusters": [
        {
            "cluster": {
                "name": "xbrick113",
                "logical-space-in-use": "true"
                }
        },
***** omitted *****        
        ]
}

So now how do I fix the code to apply it to my situation? But luckily, just reading down the post, I found that someone had already solved this issue as well. For the case json input has ARRAY [] type data. You need to implement the following code.

def flatten(js: JsValue, prefix: String = ""): JsObject = js.as[JsObject].fields.foldLeft(Json.obj()) {
  case (acc, (k, v: JsObject)) => {
    val nk = if(prefix.isEmpty) k else s"$prefix.$k"
    acc.deepMerge(flatten(v, nk))
  }
  case (acc, (k, v: JsArray)) => {
    val nk = if(prefix.isEmpty) k else s"$prefix.$k"
    val arr = flattenArray(v, nk).foldLeft(Json.obj())(_++_)
    acc.deepMerge(arr)
  }
  case (acc, (k, v)) => {
    val nk = if(prefix.isEmpty) k else s"$prefix.$k"
    acc + (nk -> v)
  }
}

def flattenArray(a: JsArray, k: String = ""): Seq[JsObject] = {
  flattenSeq(a.value.zipWithIndex.map {
    case (o: JsObject, i: Int) =>
      flatten(o, s"$k[$i]")
    case (o: JsArray, i: Int) =>
      flattenArray(o, s"$k[$i]")
    case a =>
      Json.obj(s"$k[${a._2}]" -> a._1)
  })
}

def flattenSeq(s: Seq[Any], b: Seq[JsObject] = Seq()): Seq[JsObject] = {
  s.foldLeft[Seq[JsObject]](b){
    case (acc, v: JsObject) =>
      acc:+v
    case (acc, v: Seq[Any]) =>
      flattenSeq(v, acc)
  }
}

Thanks to the flatten function, my program returned what I expeced like below.
(Sorry for the format. I am just too lazy to format the result.)

{"ip":"192.168.242.128","user":"admin","password":"Xtrem10","clusters[0].cluster.volumes.iops":"true","clusters[0].cluster.wr-iops":"true","clusters[0].cluster.volumes.avg-latency":"true","clusters[1].cluster.name":"xbrick113","clusters[0].cluster.snapshots.logical-space-in-use":"true","clusters[1].cluster.avg-latency":"true","clusters[1].cluster.volumes.rd-latency":"true","clusters[1].cluster.volumes.wr-latency":"true","clusters[1].cluster.logical-space-in-use":"true","clusters[0].cluster.volumes.wr-latency":"true","clusters[1].cluster.rd-iops":"true","clusters[1].cluster.snapshots.logical-space-in-use":"true","clusters[0].cluster.logical-space-in-use":"true","clusters[0].cluster.ud-ssd-space-in-use":"true","clusters[1].cluster.data-reduction-ratio":"true","clusters[0].cluster.name":"xbrick113","clusters[1].cluster.volumes.avg-latency":"true","clusters[0].cluster.avg-latency":"true","clusters[0].cluster.volumes.rd-latency":"true","clusters[0].cluster.wr-latency":"true","clusters[1].cluster.wr-latency":"true","clusters[1].cluster.wr-iops":"true","clusters[1].cluster.volumes.iops":"true","clusters[1].cluster.ud-ssd-space-in-use":"true","clusters[0].cluster.rd-iops":"true","clusters[0].cluster.xenvs.cpu-usage":"true","clusters[1].cluster.iops":"true","clusters[0].cluster.rd-latency":"true","clusters[1].cluster.xenvs.cpu-usage":"true","clusters[0].cluster.iops":"true","clusters[0].cluster.data-reduction-ratio":"true","clusters[1].cluster.rd-latency":"true"}

This is all about how to parse recursively and flatten json. Most of what I did is just googling, copy&paset and a very small troubleshooting. Even though it has nothing to do with this post, this experience reminded me of Google Scholar. I like the quote on its top page. I feel like I am always helped by giants.

image