Skip to main content

PostgreSQL JSONB: Flexible Schema, Indexing, and When to Use It

·9 mins
Table of Contents
JSONB is not a replacement for normalized schemas. It is a tool for genuinely semi-structured data where the shape varies per row and you need to query inside it. Use it precisely, and it saves you from premature schema commits. Use it everywhere, and you have reinvented a document store on top of a relational database.

PostgreSQL’s JSONB type is one of the most powerful and most misused features in the database. This post covers the practical details: storage differences, operators, GIN indexes, mutation functions, and the patterns where JSONB earns its place.

json vs jsonb: Always Use jsonb
#

PostgreSQL has two JSON types: json and jsonb. The difference matters in production.

jsonjsonb
StorageStores the raw text, preserves whitespace and key orderStores a decomposed binary representation
Read performanceMust re-parse on every readBinary format, no re-parse needed
Write performanceSlightly faster (no decomposition)Slightly slower (decomposes at write time)
IndexingNo index supportGIN and btree indexes supported
Key deduplicationKeeps duplicatesLast-write wins on duplicate keys
Warning

Never use json in a new schema. The only reason json exists is for cases where you need to preserve exact key order or duplicate keys (rare auditing use cases). For everything else, jsonb is strictly better because it can be indexed and queried efficiently.

Querying: ->, ->>, #>, #>>
#

The four navigation operators:

  • -> returns a JSON sub-object or array element (type: jsonb)
  • ->> returns the value as text (type: text)
  • #> navigates a path array, returns jsonb
  • #>> navigates a path array, returns text
operators.sql
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO events (payload) VALUES
  ('{"type": "order.created", "user": {"id": 42, "email": "alice@example.com"}, "items": [{"sku": "A1", "qty": 2}]}'),
  ('{"type": "order.shipped", "user": {"id": 42, "email": "alice@example.com"}, "tracking": "1Z999"}');

-- Get the event type as jsonb (quoted string in output)
SELECT payload -> 'type' FROM events;

-- Get the event type as plain text
SELECT payload ->> 'type' FROM events;

-- Navigate nested path: user.email
SELECT payload #>> '{user, email}' FROM events;

-- Filter by nested value
SELECT * FROM events
WHERE payload ->> 'type' = 'order.created'
  AND (payload #>> '{user, id}')::int = 42;

Containment Operators: @> and <@
#

The containment operators are the most powerful JSONB querying tools, and they work with GIN indexes.

A @> B means “does A contain B as a subset of keys/values?” A <@ B means “is A contained within B?”

containment.sql
-- Find all events where the user id is 42
SELECT * FROM events
WHERE payload @> '{"user": {"id": 42}}';

-- Find events of specific types
SELECT * FROM events
WHERE payload @> '{"type": "order.created"}';

-- Find events that have a tracking field at all
SELECT * FROM events
WHERE payload ? 'tracking';

-- Find events where type is one of a set
SELECT * FROM events
WHERE payload ->> 'type' = ANY(ARRAY['order.created', 'order.shipped']);

GIN Indexes: What They Accelerate
#

A GIN (Generalized Inverted Index) on a JSONB column indexes all keys and values within the document, enabling fast containment and existence queries.

gin-index.sql
-- Default GIN with jsonb_ops: supports @>, ?, ?|, ?&
CREATE INDEX idx_events_payload_gin ON events USING gin (payload);

-- GIN with jsonb_path_ops: smaller index, only supports @>
-- Better when you only need containment queries
CREATE INDEX idx_events_payload_path_ops ON events USING gin (payload jsonb_path_ops);

-- B-tree index on a specific extracted value (for equality + range on one field)
CREATE INDEX idx_events_type ON events ((payload ->> 'type'));
CREATE INDEX idx_events_user_id ON events (((payload #>> '{user, id}')::int));
Tip

Use jsonb_path_ops when all your queries use @> containment. It produces a smaller, faster index because it only indexes values, not keys. Use the default jsonb_ops when you also need ? (key existence), ?|, or ?& operators.

The B-tree index on an extracted expression is the right choice when you have a specific field that is queried frequently with equality or range conditions. It is significantly faster than GIN for single-field lookups.

Updating JSONB: jsonb_set, ||, and Key Removal
#

Updating a JSONB field requires constructing a new value. PostgreSQL provides several functions for this.

jsonb-mutations.sql
-- jsonb_set(target, path, new_value, create_missing)
-- Update user.email in place
UPDATE events
SET payload = jsonb_set(payload, '{user, email}', '"bob@example.com"', false)
WHERE id = 1;

-- jsonb_insert(target, path, new_value, insert_after)
-- Insert a new element into the items array at position 0
UPDATE events
SET payload = jsonb_insert(payload, '{items, 0}', '{"sku": "B2", "qty": 1}', false)
WHERE id = 1;

-- Merge / shallow merge with the || operator
UPDATE events
SET payload = payload || '{"status": "processed", "processed_at": "2025-01-01"}'
WHERE id = 1;

-- Remove a key with the - operator
UPDATE events
SET payload = payload - 'tracking'
WHERE id = 2;

-- Remove a nested path with the #- operator
UPDATE events
SET payload = payload #- '{user, email}'
WHERE id = 1;
Note

The || operator performs a shallow merge, not a deep merge. If both the left and right side have the same nested object key, the right side replaces the left entirely, rather than merging recursively. Use jsonb_set for targeted nested updates.

Unnesting: jsonb_array_elements and jsonb_each
#

unnesting.sql
-- Expand a JSONB array into rows
SELECT
  e.id,
  item ->> 'sku'      AS sku,
  (item ->> 'qty')::int AS qty
FROM events e,
  jsonb_array_elements(e.payload -> 'items') AS item
WHERE e.payload @> '{"type": "order.created"}';

-- Expand all key-value pairs of a JSONB object into rows
SELECT key, value
FROM events,
  jsonb_each(payload)
WHERE id = 1;

Real Use Case: Event Sourcing with JSONB
#

Event sourcing is one of the best fits for JSONB. The event type determines the payload shape, so a single events table with a JSONB payload column handles heterogeneous event schemas naturally.

event-sourcing.sql
CREATE TABLE domain_events (
    id          bigserial PRIMARY KEY,
    aggregate   text        NOT NULL,
    agg_id      bigint      NOT NULL,
    event_type  text        NOT NULL,          -- extracted for indexing
    version     int         NOT NULL,
    payload     jsonb       NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Partial index: find all shipment events cheaply
CREATE INDEX idx_domain_events_shipment
  ON domain_events (agg_id, version)
  WHERE event_type = 'order.shipped';

-- GIN for attribute queries inside payload
CREATE INDEX idx_domain_events_payload
  ON domain_events USING gin (payload jsonb_path_ops);

-- Replay all events for a specific order
SELECT event_type, version, payload
FROM domain_events
WHERE aggregate = 'order' AND agg_id = 42
ORDER BY version;

-- Find all orders shipped to a specific address
SELECT agg_id, payload
FROM domain_events
WHERE event_type = 'order.shipped'
  AND payload @> '{"address": {"city": "Milan"}}';

ORM Integration
#

models.py
from sqlalchemy import Column, BigInteger, Text, DateTime
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import DeclarativeBase
import datetime

class Base(DeclarativeBase):
    pass

class DomainEvent(Base):
    __tablename__ = "domain_events"

    id         = Column(BigInteger, primary_key=True)
    aggregate  = Column(Text, nullable=False)
    agg_id     = Column(BigInteger, nullable=False)
    event_type = Column(Text, nullable=False)
    version    = Column(BigInteger, nullable=False)
    payload    = Column(JSONB, nullable=False)
    created_at = Column(DateTime(timezone=True), default=datetime.datetime.utcnow)

# Query using containment (uses GIN index)
from sqlalchemy import cast
from sqlalchemy.dialects.postgresql import JSONB as JSONB_type

session.query(DomainEvent).filter(
    DomainEvent.event_type == "order.shipped",
    DomainEvent.payload.contains({"address": {"city": "Milan"}})
).all()
models.py
from django.db import models
from django.contrib.postgres.fields import JSONField  # Django < 3.1
# from django.db.models import JSONField              # Django >= 3.1 (auto-uses jsonb on Postgres)

class DomainEvent(models.Model):
    aggregate  = models.TextField()
    agg_id     = models.BigIntegerField()
    event_type = models.TextField(db_index=True)
    version    = models.IntegerField()
    payload    = models.JSONField()  # maps to jsonb in PostgreSQL
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            models.Index(fields=["aggregate", "agg_id", "version"]),
        ]

# Query using key lookup (translated to ->> operator)
DomainEvent.objects.filter(
    event_type="order.shipped",
    payload__address__city="Milan"
)
models.go
package models

import (
    "database/sql/driver"
    "encoding/json"
    "fmt"
    "time"
)

// JSONB is a custom type wrapping map[string]interface{}
// to satisfy GORM's Scanner/Valuer interface for jsonb columns.
type JSONB map[string]interface{}

func (j JSONB) Value() (driver.Value, error) {
    b, err := json.Marshal(j)
    return string(b), err
}

func (j *JSONB) Scan(value interface{}) error {
    bytes, ok := value.([]byte)
    if !ok {
        return fmt.Errorf("failed to unmarshal JSONB: %v", value)
    }
    return json.Unmarshal(bytes, j)
}

type DomainEvent struct {
    ID        uint      `gorm:"primarykey"`
    Aggregate string    `gorm:"not null;index:idx_agg"`
    AggID     int64     `gorm:"not null;index:idx_agg"`
    EventType string    `gorm:"not null;index"`
    Version   int       `gorm:"not null"`
    Payload   JSONB     `gorm:"type:jsonb;not null"`
    CreatedAt time.Time
}

// Raw query for GIN containment
var results []DomainEvent
db.Raw(
    `SELECT * FROM domain_events WHERE event_type = ? AND payload @> ?::jsonb`,
    "order.shipped", `{"address": {"city": "Milan"}}`,
).Scan(&results)

When NOT to Use JSONB
#

JSONB is the right choice when the structure genuinely varies per row. It is the wrong choice when:

  • The field is relational data (user, product, order relationships belong in normalized tables)
  • The field is queried frequently by a specific attribute (make it a column, add a B-tree index)
  • You need referential integrity (JSONB values are not foreign keys)
  • The nested structure is deep and would be better expressed as a join
Important

If you find yourself consistently extracting the same path (payload ->> 'status') in WHERE clauses and ORDER BY, that field should be a column. JSONB flexibility costs you query planner statistics, referential integrity, and type safety.

Common Mistakes
#

Using json instead of jsonb
The json type stores raw text and cannot be indexed. Every query that filters or extracts from a json column re-parses the entire text. In any realistic workload this is a performance regression compared to jsonb. There is no valid reason to use json for new tables.
No GIN index on a queried JSONB column
Without a GIN index, containment queries (@>) and key existence queries (?) perform a full sequential scan. On tables with millions of rows, this is the difference between milliseconds and minutes. Add a GIN index on any JSONB column you filter on.
Updating JSONB with string concatenation
Constructing JSON in application code and overwriting the entire column loses any concurrent updates and is error-prone. Use jsonb_set for targeted updates and the || merge operator for shallow merges. Never concatenate strings to build JSON for SQL queries.
Storing deeply nested structures that should be columns
A three-level nested JSONB object with five frequently-queried keys is a maintenance problem. The query planner cannot collect statistics on JSONB contents, so estimates are poor and query plans can be suboptimal. Extract frequently-queried fields as columns and use the JSONB column for the genuinely variable remainder.

If you want to go deeper on any of this, I offer 1:1 coaching sessions for engineers working on AI integration, cloud architecture, and platform engineering. Book a session (50 EUR / 60 min) or reach out at manuel.fedele+website@gmail.com.