DraganSr: Geo JSON compression: 7z vs Zip

Thursday, October 26, 2023

Geo JSON compression: 7z vs Zip

When Geo data is stored in JSON format, there is usually a lot of redundancy.
Many Geo items, e.g. those extracted from photo EXIF, have the same or similar values,
so they "should" be very compressible.
When they are saved as separate, small JSON files, however, standard ZIP is quite ineffective.
Example

2,482 Geo JSON Files
Size: 4.15 MB (4,355,916 bytes)
Size on disk: 9.69 MB (10,166,272 bytes)

(Windows) zip: 2.17 MB file

So while the Zip is about 4.5 times smaller than the storage used on disk, the effective reduction relative to the actual data size is only about 2x.

With the 7-Zip program, the resulting file is AMAZINGLY small: the .7z file is 100 KB.
That is about 20 times smaller than the Zip, about 40 times smaller than the original data, and almost 100 times smaller than the original size on disk!
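
The ratios quoted above can be double-checked with a few lines of arithmetic. This is a minimal sketch in Node.js, using the sizes from the example and assuming the Windows convention of 1 MB = 1,048,576 bytes; the variable names are only for illustration.

```javascript
// Sizes taken from the example above; Windows reports "MB" as 1024 * 1024 bytes.
const KB = 1024;
const MB = 1024 * KB;

const sizes = {
  raw: 4355916,       // total size of the 2,482 JSON files
  onDisk: 10166272,   // space the files occupy on disk
  zip: 2.17 * MB,     // (Windows) zip archive
  sevenZip: 100 * KB  // 7z archive
};

// How many times smaller `after` is compared to `before`.
function ratio(before, after) {
  return before / after;
}

const ratios = {
  diskToZip: ratio(sizes.onDisk, sizes.zip),
  rawToZip: ratio(sizes.raw, sizes.zip),
  zipTo7z: ratio(sizes.zip, sizes.sevenZip),
  rawTo7z: ratio(sizes.raw, sizes.sevenZip),
  diskTo7z: ratio(sizes.onDisk, sizes.sevenZip)
};

for (const [name, value] of Object.entries(ratios)) {
  console.log(name.padEnd(10), value.toFixed(1) + 'x');
}
```

With these inputs the Zip comes out at roughly 4.5x (vs. size on disk) and 1.9x (vs. actual data), while the 7z is roughly 22x smaller than the Zip, 43x smaller than the actual data and 99x smaller than the size on disk.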

Now, having many small files is not optimal to begin with. When they are "merged" into a single
JSON file (an object using the file name as key), the size of that single JSON file is 3.37 MB.
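
The "merge" step can be sketched in Node.js roughly like this. It is only an illustration of the idea (one object, file name as key), not the exact script used for the numbers here; the function names and the example paths are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Merge every *.json file in `dir` into one object keyed by file name.
function mergeJsonFiles(dir) {
  const merged = {};
  const names = fs
    .readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith('.json'))
    .sort();
  for (const name of names) {
    const text = fs.readFileSync(path.join(dir, name), 'utf8');
    merged[name] = JSON.parse(text);
  }
  return merged;
}

// Write the merged object without indentation to keep the file small.
// Returns the number of files that were merged.
function writeMerged(dir, outFile) {
  const merged = mergeJsonFiles(dir);
  fs.writeFileSync(outFile, JSON.stringify(merged));
  return Object.keys(merged).length;
}

// Example (hypothetical paths): writeMerged('geo', 'geo-all.json');
```

Writing the result without pretty-printing is one plausible reason the merged file can be smaller than the sum of the individual files.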

(Windows) Zip of this single JSON file is now very small: 113 KB.

The 7-Zip .7z of the same single JSON file is a phenomenal 80 KB.

7zip wins!

7zip-dev/7zip: https://7zip.dev is the fork of the 7-Zip archiver originally made by Igor Pavlov. All development takes place on GitLab. This repository is the read-only mirror of https://gitlab.com/7zip.dev/7zip @GitHub

Download 7zip

The main difference, besides the higher compression ratio of 7-Zip's LZMA,
is that 7-Zip by default creates "solid" .7z archives, where many files are compressed together as one stream and share the same "dictionary",
while Zip compresses each file separately.
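
The effect of sharing a dictionary across files can be illustrated with Node's built-in zlib. Note the assumptions: this uses Deflate rather than LZMA, and the records are made-up EXIF-like objects with hypothetical field values, so it only demonstrates the principle and does not reproduce the numbers above.

```javascript
const zlib = require('zlib');

// Build hypothetical EXIF-like Geo records that differ only slightly.
function makeRecords(count) {
  const records = [];
  for (let i = 0; i < count; i++) {
    records.push(JSON.stringify({
      file: `20230818_${153000 + i}.jpg`,
      GPSLatitude: 44.8 + i * 0.0001,
      GPSLongitude: 20.4 + i * 0.0001,
      GPSAltitude: 117,
      Make: 'ExampleCam',
      Model: 'Example Model 1'
    }));
  }
  return records;
}

const records = makeRecords(500);

// Zip-like: every record is compressed on its own, nothing is shared.
const separateBytes = records
  .map((record) => zlib.deflateRawSync(Buffer.from(record)).length)
  .reduce((sum, length) => sum + length, 0);

// Solid-like: all records are compressed as one stream, so repeated
// keys and values can be encoded as references to earlier records.
const solidBytes = zlib.deflateRawSync(Buffer.from(records.join('\n'))).length;

const rawBytes = records.reduce((sum, r) => sum + Buffer.byteLength(r), 0);
console.log({ rawBytes, separateBytes, solidBytes });
```

A real Zip archive additionally stores headers for every entry, which adds further overhead when there are thousands of tiny files.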

Ideally, it should be possible to add individual files to the archive and extract them one by one.
That way the archive could effectively be used as a simple "database" of compressed files.

7 zip - 7-Zip command line add only specific file to archive? - Super User

E.g., to add a JSON file to an archive:

7z a -t7z archive.7z newfile.json

windows - Extract a certain file from an archive with 7-Zip from the command line - Super User

7z e archive.7z -ooutputdir file.json
(note: there is no space between the -o switch and the output directory)

7zip - Running 7-Zip from within a Powershell script - Stack Overflow

In PowerShell, put the "&" call operator before the 7z command:

& "C:\Program Files\7-Zip\7z.exe" a -t7z geo.7z geo\20230818_153424.jpg.json

Using 7-Zip from a Node.js program:

7zip-bin - npm

7-Zip precompiled binaries.

7zip - npm
a lite-version of 7zip, ≈2.4MB.

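var spawn = require('child_process').spawn // spawn comes from Node's child_process module
var _7z = require('7zip')['7z'] // path to the 7z executable bundled with the package
var task = spawn(_7z, ['x', 'somefile.7z', '-y']) // extract with full paths, answer "yes" to all prompts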

node-7z - npm
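
The add and extract commands shown earlier can also be wrapped directly with Node's child_process, without any of the npm packages. This is a minimal sketch, assuming a 7z executable is reachable as "7z" (e.g. on PATH); the helper names (addArgs, extractArgs, run7z, putFile, getFile) are made up for this example.

```javascript
const { spawnSync } = require('child_process');

// Assumption: a 7z executable is reachable under this name (e.g. on PATH).
const SEVEN_ZIP = '7z';

// Arguments for: 7z a -t7z <archive> <file>
function addArgs(archive, file) {
  return ['a', '-t7z', archive, file];
}

// Arguments for: 7z e <archive> -o<outDir> <file> -y
// The output directory is attached directly to -o, without a space.
function extractArgs(archive, file, outDir) {
  return ['e', archive, '-o' + outDir, file, '-y'];
}

// Run 7z synchronously and fail loudly on any non-zero exit code.
function run7z(args) {
  const result = spawnSync(SEVEN_ZIP, args, { encoding: 'utf8' });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`7z exited with code ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

// "Database"-style helpers: store one JSON file, read one back.
function putFile(archive, file) {
  return run7z(addArgs(archive, file));
}

function getFile(archive, file, outDir) {
  return run7z(extractArgs(archive, file, outDir));
}

// Example (hypothetical paths):
// putFile('geo.7z', 'geo/20230818_153424.jpg.json');
// getFile('geo.7z', '20230818_153424.jpg.json', 'out');
```

Keep in mind that in a solid archive, extracting one file generally means decompressing the data stored before it in the same solid block, so this really is only a "simple" database.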

Lempel–Ziv–Markov chain algorithm - Wikipedia

The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been under development since either 1996 or 1998 by Igor Pavlov [1] and was first used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2)[2][3] and a variable compression-dictionary size (up to 4 GB),[4] while still maintaining decompression speed similar to other commonly used compression algorithms.
