R&D scenario step-by-step explain about Bloat
Step 1: Setting Up the Environment
Objective: To set up a basic schema for a product review system.
- products Table: This table will store different product names. The
idis an auto-incrementing primary key, ensuring each product has a unique identifier. - reviews Table: This table stores reviews for the products. Each review is associated with a product through
product_id(a foreign key). Thecreated_atfield automatically stores the timestamp of when a review was created.
Step 2: Populating Data
Objective: To insert sample data that mimics real-world usage.
- Inserting Products: Three products (Laptop, Smartphone, Headphones) are inserted into the
productstable. - Bulk Insert of Reviews: A loop inserts 100,000 reviews for the first product (Laptop). This bulk insert simulates a large dataset, making it easier to observe bloat.
Step 3: Simulating Bloat
Objective: To generate bloat by performing numerous updates and deletions.
Update Operations:
- Updating the first 50,000 reviews simulates the scenario where data is frequently changed.
- PostgreSQL’s MVCC system creates a new version of each row that's updated, marking the old version as "dead" but not immediately removing it. This results in bloat.
Delete Operations:
- Deleting reviews between IDs 50,001 and 75,000 represents the regular removal of data.
- Deleted rows are marked as dead and aren't immediately purged, contributing further to the bloat.
Step 4: Observing Bloat
Objective: To measure and analyze bloat in the reviews table.
- pgstattuple Extension:
- This is a PostgreSQL extension that provides various functions to obtain tuple-level statistics. It's an invaluable tool for observing bloat.
- Analyzing the Reviews Table:
- Using the
pgstattuplefunction, you can obtain metrics like the total table size and the total size of dead tuples. The difference between these values gives you an idea of how much space is being wasted due to bloat.
- Using the
Step 5: Reclaiming Space
Objective: To demonstrate how PostgreSQL handles bloat and reclaims space.
Regular VACUUM:
- This operation cleans up the dead tuples (from updates and deletes) and marks the space as available for future use. However, it doesn't return the space to the operating system, so the physical size of the table on disk might remain the same.
- After running VACUUM, the
dead_tuple_len(size of dead tuples) frompgstattupleshould decrease.
VACUUM FULL:
- This is a more aggressive form of vacuuming. It reclaims space and compacts the table, returning the freed space to the operating system. This operation is more resource-intensive and locks the table during its operation.
- After running VACUUM FULL, both
dead_tuple_lenandtable_len(total table size) should decrease significantly.



Comments
Post a Comment