When storing Geo data in JSON format there is usually a lot of redundancy.
Many Geo items, e.g. those extracted from photo EXIF data, have the same or similar values.
So they "should" be highly compressible.
Yet when they are saved as separate, small JSON files, standard ZIP is very ineffective.
Example:
2,482 Geo JSON Files
Size: 4.15 MB (4,355,916 bytes)
Size on disk: 9.69 MB (10,166,272 bytes)
(Windows) zip: 2.17 MB file
So while the total storage on disk shrinks about 5 times, the effective size reduction is only about 2x.
When using the 7-Zip program, the resulting file is AMAZINGLY small: a 100 KB .7z file.
That is about 20 times better than ZIP, and roughly 100 times smaller than the original size on disk!
Now, having many small files is not optimal to begin with, so when they are "merged" into a single
JSON file (an object using each file name as a key), the size of that single JSON file is 3.37 MB.
(Windows) Zip of this single JSON file is now very small: 113 KB.
The 7-Zip .7z of the single JSON file is phenomenal: 80 KB.
7zip wins!
The main difference, besides 7-Zip's stronger compression, is that 7-Zip packs files into a "solid" archive,
so segments of many files share the same compression dictionary,
while ZIP compresses each file separately, so the redundancy between files is never exploited.
Optimally the individual files should be possible to add to archive and extract one by one.
That way the archive could effectively be used as a simple "database" for compressed files.
i.e. to add a JSON file to an archive:
7z a -t7z archive.7z newfile.json
and to extract a single file:
7z e archive.7z -o[outputdir] file.json
(note: 7z's -o switch takes the output directory with no space after it)
In PowerShell, put the "&" call operator before the quoted 7z command:
& "C:\Program Files\7-Zip\7z.exe" a -t7z geo.7z geo\20230818_153424.jpg.json
Using 7-Zip from a Node.js program:
The 7zip npm package is a lite version of 7-Zip that bundles precompiled binaries, ≈2.4 MB.
var spawn = require('child_process').spawn
var _7z = require('7zip')['7z']
var task = spawn(_7z, ['x', 'somefile.7z', '-y'])
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been under development since either 1996 or 1998 by Igor Pavlov[1] and was first used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2)[2][3] and a variable compression-dictionary size (up to 4 GB),[4] while still maintaining decompression speed similar to other commonly used compression algorithms.