R&D scenario step-by-step explain about Bloat

Step 1: Setting Up the Environment

Objective: To set up a basic schema for a product review system.

products Table: This table will store different product names. The id is an auto-incrementing primary key, ensuring each product has a unique identifier.
reviews Table: This table stores reviews for the products. Each review is associated with a product through product_id (a foreign key). The created_at field automatically stores the timestamp of when a review was created.

Step 2: Populating Data

Objective: To insert sample data that mimics real-world usage.

Inserting Products: Three products (Laptop, Smartphone, Headphones) are inserted into the products table.
Bulk Insert of Reviews: A loop inserts 100,000 reviews for the first product (Laptop). This bulk insert simulates a large dataset, making it easier to observe bloat.

Step 3: Simulating Bloat

Objective: To generate bloat by performing numerous updates and deletions.

Update Operations:
- Updating the first 50,000 reviews simulates the scenario where data is frequently changed.
- PostgreSQL’s MVCC system creates a new version of each row that's updated, marking the old version as "dead" but not immediately removing it. This results in bloat.
Delete Operations:
- Deleting reviews between IDs 50,001 and 75,000 represents the regular removal of data.
- Deleted rows are marked as dead and aren't immediately purged, contributing further to the bloat.

Step 4: Observing Bloat

Objective: To measure and analyze bloat in the reviews table.

pgstattuple Extension:
- This is a PostgreSQL extension that provides various functions to obtain tuple-level statistics. It's an invaluable tool for observing bloat.
Analyzing the Reviews Table:
- Using the pgstattuple function, you can obtain metrics like the total table size and the total size of dead tuples. The difference between these values gives you an idea of how much space is being wasted due to bloat.

Step 5: Reclaiming Space

Objective: To demonstrate how PostgreSQL handles bloat and reclaims space.

Regular VACUUM:
- This operation cleans up the dead tuples (from updates and deletes) and marks the space as available for future use. However, it doesn't return the space to the operating system, so the physical size of the table on disk might remain the same.
- After running VACUUM, the dead_tuple_len (size of dead tuples) from pgstattuple should decrease.
VACUUM FULL:

This is a more aggressive form of vacuuming. It reclaims space and compacts the table, returning the freed space to the operating system. This operation is more resource-intensive and locks the table during its operation.
After running VACUUM FULL, both dead_tuple_len and table_len (total table size) should decrease significantly.

db2=# DO $$ db2$# BEGIN db2$# FOR i IN 1..100000 LOOP db2$# INSERT INTO reviews (product_id, review) db2$# VALUES (1, 'This is a review for product ' || i); db2$# END LOOP; db2$# END; db2$# $$; DO db2=# \dt List of relations Schema | Name | Type | Owner --------+----------+-------+---------- public | products | table | postgres public | reviews | table | postgres (2 rows) db2=# \dt+ List of relations Schema | Name | Type | Owner | Persistence | Size | Description --------+----------+-------+----------+-------------+---------+------------- public | products | table | postgres | permanent | 16 kB | public | reviews | table | postgres | permanent | 8280 kB | (2 rows) db2=# select count(*) from reviews; count -------- 100000 (1 row) db2=# select * from reviews where id=1; id | product_id | review | created_at ----+------------+--------------------------------+--------------------------------- 1 | 1 | This is a review for product 1 | 2023-08-17 22:02:39.80592+05:30 (1 row) db2=# select * from reviews where id=2; id | product_id | review | created_at ----+------------+--------------------------------+---------------------------------

Search This Blog

postgresql new