JSON Data Type in Relational Databases

Storing & retrieving JSON data in a RDBMS

Bryan Jones : Build Engineer
Matt Thomas : Principal System Engineer - UI

Relational Database History

Database Royalty for 40 Years

  • Since 1970's

  • Relational Algebra and Set Theory

  • Storage... Memory... Data Dictionary... Query Language

Benefits

Addressed limits of: (1) Flat Files (2) Hierarchical Data and (3) Network Data

  • Duplicate Data

  • Inefficient Searching

  • Difficulty Maintaining Program Code For Data Access

Core Concepts

  • Avoid Data Anomolies

  • Normalization - Eliminate Redundant Data

  • Atomicity ... Consistency ... Isolation ... Durability

  • (E.g. Bank Account Transfer)

But Trouble in the Kingdom....

Early 2000's - Big Data Web Companies

  • Support For 10,000+ Concurrent Users

  • Low Latency Response Times

  • High Availability

  • Horizontal Scalability

NoSQL

A New Breed of Database

Data Category Description
Key Value Simple KV
Columnar Good for Analytical Processing
Graph Relationships b/w Entities
Document Embedding Joins

Documents (aka. JSON) - Our Focus

  • More Complex Data Than Key-Value

  • Sets of Key-Value Pairs

  • Stored in JSON (or XML)

What is JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format.
It is easy for humans to read and write. It is easy for machines to parse and generate.

JSON is built on two structures with string values only:
  • A collection of key/value pairs. In various languages, define by an object, dictionary, hash, etc.
  • An ordered list of values. In most languages, this is realized as an array or list.

Why?!?

Why would you use JSON in a relational DB!?!

There are many circumstances where you wish to store document style data, such as UI preferences, incoming data that is not processed locally on the backend.

Why stand up document store increasing complexity of your existing infrastructure when most modern relational databases have the ability to store JSON data.

Data Modeling - RDBMS vs. JSON

Core Difference in Design Philosophies

RDBMS + NORMALIZATION - Minimize Duplicate Data and Data Anomalies

JSON + DENORMALIZATION - Minimize the Need for Joins
(And Support Flexible Schemas)

RDBMS + JSON USE CASES

Semi-Tangible Examples

USE CASE: Shopping Cart

USE CASE: Storing Front-end Preferences

Single Page Applications often have a data stores that need to be persisted but not necessary understood by the backend API service as they are only used by the front-end.

USE CASE: API Response Consumption

Consuming 3rd party API response may not require consumption of the data immediately.

The response(s) can be stored, later extracting section(s) of the response to be consumed.

USE CASE: Trello Cards

USE CASE: Slack Reactions

USE CASE: Slack Admins .... (bad idea)

JSON Downsides

  • Extra Storage

  • Lack of Statistics

  • Heavy Updates and Inserts

  • Watch out for Lazy DB Design...

RDBMS JSON implementations

"...all men JSON implementations are created equal..."

NOT!

PostgreSQL

Version Feature(s)
9.2 JSON Validation
9.3 Extract Operators
9.4 JSONB
9.5 JSONB Modifiers
9.6 jsonb_insert()
10 Full Text Search for JSON/B

PostgreSQL

JSON JSONB (Binary)
Preserves whitespace Removes whitespace b/w keys
Preserves key order Arbitrary key order
Duplicate keys No duplicates
Heavy writes (No indexes) Heavy reads (Indexes allowed)
E.g. Log files; API responses Recommended for general use

MySQL

JSON (Binary)
Removes whitespace b/w keys
Arbitrary key order
No duplicates
Indexes allowed via virtual columns
1 Gb per field

MariaDB

LONGTEXT aka JSON
Preserves whitespace
Preserves key order
Duplicate keys
Indexes allowed via virtual columns
4 Gb per field
CHECK Constraint required to validate document on column definition

SQLite

TEXT
Preserves whitespace
Preserves key order
Duplicate keys
No enforced field limit
json() function call required to validate document on statement

Adding a JSON compatible column

PostgreSQL

ALTER TABLE users ADD contacts JSONB;

MySQL

ALTER TABLE users ADD contacts JSON;

MariaDB

ALTER TABLE users ADD contacts JSON CHECK(JSON_VALID(user));

SQLite

ALTER TABLE users ADD contacts TEXT;

Insert

PostgreSQL/MySQL/MariaDB

INSERT INTO users (user_name, contacts) VALUES ('kenny', '{ "email": "kenny@aol.com", "twitter": "@drken"}');
INSERT INTO users (user_name, contacts) VALUES ('sam', '{ "email": "sam@yahoo.com", "twitter": "@sam88", "facebook": "sam88"}');

SQLite

INSERT INTO users (user_name, contacts) VALUES ('kenny', json('{ "email": "kenny@aol.com", "twitter": "@drken"}'));

PostgreSQL

UPDATE users SET contacts = JSONB_INSERT(contacts, '{phone}', '"314-222-2222"') WHERE user_name = 'kenny';

Select

(Including returning a subsection of the document)

PostgreSQL

SELECT user_name, contacts -> 'twitter' AS twitter FROM users WHERE user_name = 'kenny';
SELECT user_name, contacts ->> 'twitter' AS twitter FROM users WHERE user_name = 'kenny';
SELECT user_name, contacts ->> 'twitter' AS twitter FROM users WHERE contacts ->> 'twitter' IS NOT NULL;

MySQL

SELECT user_name, contacts->'$.twitter' AS twitter FROM users WHERE user_name = 'kenny';
SELECT user_name, contacts->>'$.twitter' AS twitter FROM users WHERE user_name = 'kenny';
SELECT user_name, contacts->>'$.twitter' AS twitter FROM users WHERE contacts->>'$.twitter' IS NOT NULL;

MariaDB/SQLite

SELECT user_name, JSON_VALUE(contacts, '$.twitter') AS twitter FROM users WHERE user_name = 'kenny';

Update

(Including updating a subsection of the document)

PostgreSQL

UPDATE users SET contacts = JSONB_SET(contacts, '{twitter}', '"@fred"') WHERE user_name = 'kenny';

MySQL/MariaDB/SQLite

UPDATE users SET contacts = JSON_SET(contacts, '$.twitter', '@fred') WHERE user_name = 'kenny';

Delete

(Including deleting a subsection of the document)

PostgreSQL

UPDATE users SET contacts = contacts - 'twitter' WHERE user_name = 'kenny';

MySQL/MariaDB/SQLite

UPDATE users SET contacts = JSON_REMOVE(contacts, '$.twitter') WHERE user_name = 'kenny';

Virtual Index on a JSON attribute

PostgreSQL

CREATE INDEX idx_users_contacts ON users USING GIN (contacts);
CREATE INDEX idx_users_contacts_email ON users USING GIN ((contacts -> 'email'));

MySQL

/* Create the virtual column: */
ALTER TABLE users ADD contacts_email VARCHAR(255) AS (contacts->>'$.email');
ALTER TABLE users ADD contacts_email VARCHAR(255) AS (contacts->>'$.email') STORED;
/* Create your traditional index: */
ALTER TABLE users ADD INDEX (contacts_email);
ALTER TABLE users ADD INDEX (contacts_email_stored);

MariaDB

/* Create the virtual column: */
ALTER TABLE users ADD COLUMN contacts_email VARCHAR(255) AS (JSON_VALUE(contacts, '$.email')) VIRTUAL;
/* Create your traditional index: */
ALTER TABLE users ADD INDEX (contacts_email);
ALTER TABLE users ADD INDEX (contacts_email_stored);
Your Favorite Terminal
$> To The Terminal
$> sql -u user -p  
CAUTION
Live
Demo

Additional Resources

JSON
PostgreSQL
MySQL
MariaDB
SQLite

Next Steps

  • Lots more JSON Functions
  • Indexing
  • Optimizer & Query Planning

Questions

Feedback...

https://where.matsinet.codes