Saturday, May 11, 2019

data: UBER vs JSON => MessagePack

Putting the Squeeze on Trip Data

[Figure: Pareto front]
Key conclusions:
  • Simply compressing JSON with zlib would yield a reasonable tradeoff in size and speed. The result would be just a little bigger, but execution was much faster than using BZ2 on JSON.
  • Going with IDL-based protocols, Thrift and Protocol Buffers compressed with zlib or Snappy would give us the best gain in size and/or speed.
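The first conclusion can be checked roughly with nothing but the Python standard library. The sketch below is not Uber's benchmark: the trip records and their field names are invented for illustration, and the timings depend on the machine and the data, so it only shows how such a size/speed comparison could be set up.

```python
import bz2
import json
import time
import zlib

# Hypothetical trip-like records; the field names are made up for illustration.
trips = [
    {"trip_id": i, "lat": 37.77 + i * 1e-5, "lng": -122.42 - i * 1e-5, "status": "completed"}
    for i in range(5000)
]
raw = json.dumps(trips).encode("utf-8")


def measure(name, compress, decompress):
    """Compress the JSON bytes once, verify the round trip, report size and time."""
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == raw  # the codec must be lossless
    return {"name": name, "bytes": len(packed), "seconds": elapsed}


results = [
    measure("json+zlib", zlib.compress, zlib.decompress),
    measure("json+bz2", bz2.compress, bz2.decompress),
]

print(f"raw json: {len(raw)} bytes")
for r in results:
    print(f"{r['name']}: {r['bytes']} bytes, {r['seconds'] * 1000:.1f} ms")
```

Each (size, time) pair printed here is one point on a plot like the one above; the Pareto front consists of the points that no other option beats on both axes at once.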

...settled on MessagePack with zlib (instead of plain JSON)

A 1 TB disk will now last almost a year (347 days),
compared to a month (30 days) without compression.
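The two figures imply an overall size reduction that is easy to work out. This is a back-of-the-envelope sketch, assuming a constant write rate and 1 TB = 1000 GB (neither is stated in the quote):

```python
DISK_GB = 1000            # assumption: 1 TB taken as 1000 GB
days_uncompressed = 30    # from the quote
days_compressed = 347     # from the quote

ratio = days_compressed / days_uncompressed
saved = 1 - days_uncompressed / days_compressed
gb_per_day_before = DISK_GB / days_uncompressed
gb_per_day_after = DISK_GB / days_compressed

print(f"disk lasts {ratio:.1f}x longer")                     # 11.6x
print(f"about {saved:.0%} less space per day")               # 91%
print(f"{gb_per_day_before:.1f} GB/day -> {gb_per_day_after:.1f} GB/day")  # 33.3 -> 2.9
```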

MessagePack: It's like JSON. but fast and small.

"MessagePack is an efficient binary serialization format.
It lets you exchange data among multiple languages like JSON.
But it's faster and smaller.
Small integers are encoded into a single byte,
and typical short strings require only one extra byte in addition to the strings themselves."
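To make the last two claims concrete, here is a minimal hand-written sketch of just the two compact forms the quote describes, following the MessagePack format's fixint and fixstr rules. The function name pack_small is hypothetical, and this is not the msgpack library: real encoders cover many more types and sizes.

```python
import json


def pack_small(value):
    """Encode a small int or short str using only MessagePack's compact forms."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])           # positive fixint: the byte is the value
        if -32 <= value < 0:
            return bytes([value & 0xFF])    # negative fixint: 0xe0..0xff
        raise ValueError("integer too large for a single-byte fixint")
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            # fixstr: one header byte (0xa0 | length) followed by the UTF-8 bytes
            return bytes([0xA0 | len(data)]) + data
        raise ValueError("string too long for fixstr")
    raise TypeError("only small ints and short strings are handled in this sketch")


print(pack_small(5))        # b'\x05'
print(pack_small(-1))       # b'\xff'
print(pack_small("uber"))   # b'\xa4uber'

# Size comparison with JSON for a short string: 5 bytes vs 6 bytes (two quotes).
print(len(pack_small("uber")), len(json.dumps("uber")))
```

The saving on a single value is small; it adds up across the many short keys and small numbers in a typical record.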

Designing Schemaless, Uber Engineering's Scalable Datastore Using MySQL | Uber Engineering Blog



