Delta Lakehouse: NYC Property Sales
- Designed and implemented a scalable Delta Lakehouse architecture, consolidating fragmented NYC real estate datasets into a curated, analytics-ready single source to support market trend analysis and investment decision-making.
- Developed PySpark pipelines in Databricks for data ingestion, cleansing, and transformation, improving data quality, enforcing schema consistency, and enabling reliable downstream BI consumption.
- Modeled a high-performance Snowflake schema in the Gold layer (AWS S3 + Unity Catalog), optimizing fact/dimension structures to reduce query latency and support self-service analytics.
- Created curated SQL views on top of the Gold layer, providing a stable and analytics-ready layer that can be directly consumed by BI tools (e.g., Power BI), enabling self-service reporting and reducing dependency on engineering teams.