Need to make quick JSON fixes – JSONPath to the rescue

From time to time I need to make some fixes in my JSON data. In the world of flat files I do this with the grep/sed/awk toolchain. How do you handle it for JSON? Searching for a solution, I came across JSONPath. It is quite a mature tool (from 2007), but I hadn't heard of it before, so I decided to share my experience with others.

First of all, you can try it painlessly online: http://jsonpath.curiousconcept.com/. The full syntax is described at http://goessner.net/articles/JsonPath/. You can also download a Python binding and run it from the command line:

$ sudo apt-get install python-jsonpath-rw
$ sudo apt-get install python-setuptools
$ sudo easy_install -U jsonpath

After that you can use it from inside Python or through a simple CLI wrapper:

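#!/usr/bin/python
# Minimal CLI wrapper: read JSON from stdin, apply the JSONPath given as the first argument
import sys, json, jsonpath

# JSONPath expression, e.g. '$..book[?(@.price < 10)]'
path = sys.argv[1]

# Evaluate the expression against the document read from stdin
result = jsonpath.jsonpath(json.load(sys.stdin), path)
# print() with a single argument works on both Python 2 and Python 3
print(json.dumps(result, indent=2))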

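Save the wrapper as an executable named jsonpath somewhere on your PATH and you can use it from your shell. For example, take this JSON, saved as books.json: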

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

You can print only the book nodes with a price lower than 10 with:

$ jsonpath '$..book[?(@.price < 10)]' < books.json

Result:

[
  {
    "category": "reference",
    "price": 8.95,
    "title": "Sayings of the Century",
    "author": "Nigel Rees"
  },
  {
    "category": "fiction",
    "price": 8.99,
    "title": "Moby Dick",
    "isbn": "0-553-21311-3",
    "author": "Herman Melville"
  }
]
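
To make it clearer what the filter expression selects, here is a rough pure-Python equivalent of '$..book[?(@.price < 10)]'. It is only a sketch that uses the standard library, not how the jsonpath module works internally. The helper names find_key and books_cheaper_than and the "cheap" flag are my own, made up for illustration, and the sample document is shortened.

```python
import json

# Shortened copy of the sample document from this post.
DOC = json.loads("""
{"store": {
  "book": [
    {"title": "Sayings of the Century", "price": 8.95},
    {"title": "Sword of Honour", "price": 12.99},
    {"title": "Moby Dick", "price": 8.99},
    {"title": "The Lord of the Rings", "price": 22.99}
  ],
  "bicycle": {"color": "red", "price": 19.95}
}}
""")


def find_key(node, key):
    """Yield every value stored under `key` at any depth (roughly `$..key`)."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def books_cheaper_than(doc, limit):
    """Rough equivalent of `$..book[?(@.price < limit)]` for list-valued `book` keys."""
    matches = []
    for books in find_key(doc, "book"):
        if not isinstance(books, list):
            continue
        for book in books:
            # Entries without a price never match, as with the JSONPath filter.
            if isinstance(book, dict) and "price" in book and book["price"] < limit:
                matches.append(book)
    return matches


cheap = books_cheaper_than(DOC, 10)

# The matches are the same dict objects that live inside DOC,
# so a fix applied to them shows up when DOC is dumped again.
for book in cheap:
    book["cheap"] = True  # hypothetical fix, for illustration only

print(json.dumps(DOC, indent=2))
```

Because the selected nodes are references into the loaded document, this pattern (select, modify, dump) covers the "quick fix" case: the whole document goes back out with only the matched nodes changed.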

Happy JSON hacking!
