Tuesday, July 19, 2016

data: JSON vs XML vs CSV for Azure Cloud

JSON is smaller than XML, but it is still not compact for bulk data.
Plain old TSV/CSV is often much more efficient than JSON for tabular data.
There is no need to repeat the field names in every record when the format has a fixed structure.
Binary protocols are generally faster to parse than text-based ones.
Google reportedly uses binary Protocol Buffers internally for nearly all inter-service communication.
So why are we still using less efficient data formats?
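
The size argument above can be checked with a quick sketch. The records below are hypothetical sample data, loosely modeled on the customer entity quoted later in this post, and the helper names `to_json_bytes` and `to_csv_bytes` are made up for illustration. It only compares serialized size, not parsing speed.

```python
import csv
import io
import json

# Hypothetical sample data: 1000 copies of one customer-like record.
records = [
    {"PartitionKey": "Jonathan", "RowKey": "Foster",
     "Address": "1234 SomeStreet St, Bellevue, WA 75001", "Rating": 3}
    for _ in range(1000)
]

def to_json_bytes(rows):
    # JSON repeats every field name inside every record.
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")

def to_csv_bytes(rows):
    # CSV states the field names once, in the header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

json_size = len(to_json_bytes(records))
csv_size = len(to_csv_bytes(records))
print(f"JSON: {json_size} bytes, CSV: {csv_size} bytes")
```

With records like these, the CSV output should come out noticeably smaller, because the field names appear once instead of once per record. Note that CSV drops type information (the `Rating` value reads back as a string), so the reader has to know the schema.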

Windows Azure Tables: Introducing JSON - Microsoft Azure Storage Team Blog (2013)
"...JSON is an alternate OData payload format to (XML based) AtomPub, which significantly reduces the size of the payload and results in lower latency. To reduce the payload size even further we are providing a way to turn off the payload echo during inserts. Both of these new features are now the default behavior in the newly released Windows Azure Storage Client 3.0 Library."

<entry m:etag="W/&quot;datetime'2013-12-03T06%3A37%3A20.9709094Z'&quot;">
    <category term="someaccount.Customers" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <link rel="edit" title="Customers" href="Customers(PartitionKey='Jonathan',RowKey='Foster')" />
    <title />
      <name />
    <content type="application/xml">
        <d:Timestamp m:type="Edm.DateTime">2013-12-03T06:37:20.9709094Z</d:Timestamp>
        <d:Address>1234 SomeStreet St, Bellevue, WA 75001</d:Address>
        <d:CustomerSince m:type="Edm.DateTime">2005-01-05T00:00:00Z</d:CustomerSince>
        <d:Rating m:type="Edm.Int32">3</d:Rating>

      "Address":"1234 SomeStreet St, Bellevue, WA 75001",

"Jonathan","Foster","2013-12-03T06:45:00.7254269Z","1234 SomeStreet St, Bellevue, WA 75001"

Greatly Increase the Performance of Azure Storage CloudBlobClient - Timothy Khouri's Blog