# 20 SQL Interview Questions – Senior Level Part 1 (Questions 1 through 5)
Table of Contents

1. What are window functions in SQL and how are they used?
2. Explain database sharding
3. How do you design an effective database?
4. Differentiate OLAP and OLTP databases
5. How do you optimize a query execution plan?
## 1. What are window functions in SQL and how are they used?
Window functions operate over a “window” of rows related to the current row, typically defined with an OVER clause. They allow you to perform calculations such as running totals, rankings, and moving averages without collapsing rows the way aggregate queries do.
Basic syntax
```sql
SELECT
    column1,
    column2,
    window_function(column3) OVER (
        PARTITION BY partition_column
        ORDER BY order_column
    )
FROM table_name;
```
- `column1`, `column2`: ordinary columns returned in the result set.
- `window_function(column3)`: the function applied over the window.
- `PARTITION BY`: splits the dataset into independent partitions (optional).
- `ORDER BY`: defines ordering inside each partition (optional).
Example:
```sql
-- Running sum of salary per department
SELECT
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (
        PARTITION BY department_id
        ORDER BY employee_id
    ) AS cumulative_salary
FROM employees;
```

Popular window functions include `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, `SUM()`, `AVG()`, `MIN()`, and `MAX()`. They are invaluable whenever you need to compare each row against its peers without losing detail.
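To round out the picture, here is a sketch combining a ranking and a moving average over the same hypothetical employees table (the frame clause follows the SQL standard; exact support varies by engine):

```sql
-- Rank employees by salary within each department,
-- and compute a three-row moving average of salary
SELECT
    employee_id,
    department_id,
    salary,
    RANK() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS salary_rank,
    AVG(salary) OVER (
        PARTITION BY department_id
        ORDER BY employee_id
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_salary
FROM employees;
```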
## 2. Explain database sharding
Database sharding splits a large dataset into smaller, independent chunks called shards. Each shard holds a subset of the data—often determined by a key range or hashing strategy—and can be hosted separately. Sharding improves scalability and performance by distributing load across multiple nodes.
Key traits
- Data partitioning: tables are divided into shards, each storing a specific slice of the data (for example, by customer region or primary-key range).
- Workload distribution: each shard handles reads and writes for its own data range, and some deployments dedicate shards to read or write workloads.
- Scalability: instead of scaling a single monolithic database, you add more shards to handle growth.
- Performance: queries touch less data per shard, so they complete faster when routed intelligently.
- Distributed management: operating multiple shards requires coordination, metadata tracking, and careful failover planning.
Challenges and considerations
- Data synchronization: ensure consistency across shards when cross-shard operations occur.
- Shard key selection: choose a partitioning strategy that avoids hotspots (one shard receiving disproportionate traffic).
- Metadata management: keep reliable metadata about shard locations, ranges, and status.
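To make the metadata point concrete, here is a minimal sketch of range-based shard routing using a hypothetical `shard_map` table (all names and the layout are illustrative, not a standard mechanism):

```sql
-- Hypothetical metadata table: which shard owns which key range
CREATE TABLE shard_map (
    shard_id    INT PRIMARY KEY,
    range_start BIGINT NOT NULL,
    range_end   BIGINT NOT NULL,
    host        VARCHAR(255) NOT NULL
);

-- Route a lookup for customer 12345 to the shard that owns it
SELECT shard_id, host
FROM shard_map
WHERE 12345 BETWEEN range_start AND range_end;
```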
Sharding is complex and best reserved for systems that truly need horizontal scaling.
## 3. How do you design an effective database?
Designing a solid database schema demands a deep understanding of application requirements and data usage patterns. Core steps include:
1. Gather application requirements
- Understand entities, workflows, and how data is accessed to capture the right relationships.
2. Normalize data
- Apply normalization rules to reduce redundancy and keep data consistent.
3. Define primary and foreign keys
- Assign primary keys for uniqueness and foreign keys to model relationships.
4. Choose appropriate data types
- Pick data types that match the value range and precision you need.
5. Classify data
- Organize data into tables based on behavior and usage to keep schemas coherent.
6. Design for performance
- Anticipate frequent queries and add supporting indexes or structures ahead of time.
7. Consider partitioning and sharding (when needed)
- Use partitioning or sharding to keep very large tables performant and scalable.
8. Plan data synchronization
- Define how data stays consistent across tables, services, or replicas.
9. Add constraints
- Enforce integrity with constraints such as foreign keys, unique indexes, and checks.
10. Test and optimize
- Benchmark and profile the schema under realistic workloads to uncover bottlenecks.
11. Maintain and iterate
- Establish maintenance routines to evolve the schema as requirements change.
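To ground a few of these steps (3, 4, and 9 in particular), here is a minimal schema sketch with keys, data types, constraints, and a supporting index; all names are hypothetical, and the inline `REFERENCES` syntax is PostgreSQL-friendly:

```sql
-- Hypothetical normalized schema: departments and their employees
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,
    department_id INT NOT NULL REFERENCES departments (department_id),
    email         VARCHAR(255) NOT NULL UNIQUE,
    salary        NUMERIC(10, 2) CHECK (salary >= 0)
);

-- Supporting index for a frequent query pattern anticipated in step 6
CREATE INDEX idx_employees_department ON employees (department_id);
```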
Effective design is iterative—keep measuring, refining, and validating against real-world use.
## 4. Differentiate OLAP and OLTP databases
OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) systems target different workloads.
OLAP
- Purpose: supports complex analysis, multidimensional reporting, and aggregated insights.
- Query style: heavy, computation-intensive queries over large datasets.
- Data model: multidimensional structures with historical data and precomputed metrics.
- Users: managers, analysts, and decision-makers who need big-picture visibility.
OLTP
- Purpose: handles day-to-day business transactions (insert, update, delete).
- Query style: short, simple, highly concurrent operations.
- Data model: row-oriented schemas optimized for strict consistency and fast commits; typically stores only current transactional data.
- Users: operational staff and applications running core business processes.
Summary
- OLAP focuses on analytics; OLTP focuses on transactional workloads.
- OLAP queries are complex and scan lots of data; OLTP queries are simple and frequent.
- OLAP uses multidimensional storage; OLTP uses normalized row-oriented tables.
- OLAP serves leadership insights; OLTP powers daily operations.
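The contrast shows clearly in query shape. Below is a hedged sketch with invented table and column names (`DATE_TRUNC` is PostgreSQL syntax):

```sql
-- OLAP-style query: aggregate historical sales across many rows
SELECT
    region,
    DATE_TRUNC('quarter', order_date) AS sales_quarter,
    SUM(amount) AS total_sales
FROM sales
GROUP BY region, DATE_TRUNC('quarter', order_date);

-- OLTP-style statement: a short, single-row transactional write
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 42;
```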
## 5. How do you optimize a query execution plan?
Improving a query plan centers on helping the optimizer make better choices and reducing the data it must process. Techniques include:
- Use indexes wisely: index columns involved in `WHERE`, `JOIN`, and `ORDER BY` clauses to avoid full-table scans.
- Keep statistics current: up-to-date table and index statistics let the optimizer estimate cardinality accurately; regenerate them periodically so plans reflect real data distribution.
- Avoid `SELECT *`: return only the columns you need to cut down on I/O and network transfer.
- Choose joins carefully: pick the right join type (INNER, LEFT, etc.) to limit intermediate row counts.
- Order filters strategically: put the most selective predicates first when possible to prune rows early.
- Limit result size: use `LIMIT` (or database-specific equivalents) when you only need a subset.
- Partition large tables: partitioning reduces the data scanned for queries that target specific ranges.
- Skip functions in `WHERE`: wrapping indexed columns in functions can disable index usage; rewrite predicates when you can (see the sketch after this list).
- Leverage tuning tools: use your database's explain/analyze, profiling, or advisor tools to spot bottlenecks.
- Inspect execution plans: review generated plans to find unnecessary scans, sorts, or joins, and adjust queries or schema accordingly.
- Consider denormalization: for read-heavy workloads, denormalizing select tables can reduce joins and improve speed.
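The function-in-`WHERE` point deserves a concrete example. The sketch below uses invented names, with MySQL-style `YEAR()` standing in for any function applied to an indexed column:

```sql
-- Non-sargable: the function on the indexed column blocks index use
SELECT order_id
FROM orders
WHERE YEAR(order_date) = 2023;

-- Sargable rewrite: compare the raw column against a range instead
SELECT order_id
FROM orders
WHERE order_date >= '2023-01-01'
  AND order_date <  '2024-01-01';
```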
Always test and measure—optimizations depend on schema design, workload patterns, and the database engine in use.
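As a starting point for that measurement, most engines expose a plan-inspection command; the PostgreSQL form is shown below against a hypothetical table:

```sql
-- EXPLAIN shows the estimated plan; adding ANALYZE also executes
-- the query and reports actual row counts and timings (PostgreSQL)
EXPLAIN ANALYZE
SELECT order_id
FROM orders
WHERE customer_id = 42;
```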