Data Engineer · NYC

Building reliable data systems for high-stakes decision-making.

I'm Ahsan, a Data Engineer at Accenture focused on web-scale scraping, resilient ETL, and production-grade data quality engineering.

Previously: NYC Department of Buildings, Fujitsu, WebMD, and ASML.

Focused on finance-ready data platforms with strong reliability, lineage, and trust controls.

Portrait of Ahsan Fayyaz

Impact

What I deliver

95%

Deduplication accuracy in AI-assisted extraction

80%

Reduction in manual data processing

50%

Faster processing in modernized data quality workflows

40%

Increase in enterprise data quality accuracy

How I support data teams

  • Build reliable alternative data ingestion pipelines with validation and retry controls.
  • Improve trust in critical datasets through quality checks and observability patterns.
  • Deliver cleaner production datasets faster for research and decision workflows.

Experience

Professional experience

Accenture

Jul 2022 - Mar 2026 · New York, NY

Senior Data Engineer

Jun 2025 - Mar 2026

  • Built a scalable web-scraping platform using Python, Playwright, ScraperAPI, and BeautifulSoup to ingest structured and unstructured alt-data from multiple sources.
  • Developed AI-assisted extraction pipelines (Gemini + fuzzy matching), achieving 95% deduplication accuracy and reducing manual processing by 80%.
  • Designed cloud-hosted ETL workflows on GCP Compute Engine with cron scheduling, REST integrations, retries, monitoring, logging, and automated historical storage.
  • Implemented validation and schema enforcement to maintain accuracy, freshness, and consistency across thousands of daily records.
  • Architected centralized NocoDB master and active tables with lifecycle automation, entitlement controls, and real-time synchronization.

Data Engineer

Jul 2022 - Jun 2025

  • Led the design and deployment of a telecom data quality framework using AWS Deequ, Databricks, Spark, Scala, Python, and SQL.
  • Automated 30+ data quality checks, reducing validation time by 40% and improving processing speed by 50%.
  • Developed scalable ETL pipelines on AWS and Databricks for reliable, high-volume production workloads.

NYC Department of Buildings

Feb 2022 - Jun 2022 · New York, NY

Machine Learning Engineer

  • Designed and developed supervised ML models for the Analytics and Data Science Unit to predict high-risk buildings from historical construction injury data.
  • Engineered multi-year public datasets and built robust train/validate/test workflows with upsampling and hyperparameter tuning.
  • Evaluated Gradient Boosting, Logistic Regression, Neural Networks, K-Nearest Neighbors, and SVM to identify the most reliable model for inspection prioritization.
  • Implemented model analysis and visualization pipelines in Jupyter using NumPy, pandas, matplotlib, scikit-learn, and GeoPandas.
  • Improved risk-based inspection planning and contributed to a 35% reduction in construction incidents.

Fujitsu Network Communications

Jun 2021 - Aug 2021 · Richardson, TX

Software Engineering Intern

  • Designed and developed a cloud-based web application for the Software Business Unit (SWBU) to generate XML, JSON, and XLSX outputs for Virtuora Planning and Design workflows.
  • Built a user-friendly interface to create and download multi-format files in a single streamlined workflow.
  • Eliminated recurring format errors and reduced average manual build time from 25 hours to 30 minutes.
  • The project was adopted by a major telecom client, with a revenue impact of approximately $100K.
  • Tech stack: HTML, CSS, JavaScript, Java 8, Oracle DB, jQuery, jQuery UI, AJAX, and Bootstrap.

WebMD

Feb 2020 - May 2021 · New York, NY

Software Engineering Intern

  • Developed and contributed to multiple full-stack applications for WebMD's Consumer Runtime platform.
  • Designed and implemented a Page Performance Testing Dashboard to streamline test creation, tracking, and stakeholder approvals before production deployment.
  • Automated route creation and route validation workflows to remove manual QA steps and improve developer throughput.
  • Implemented regex-based URL validation and PostgreSQL-backed search functionality to improve routing reliability.
  • Designed user interfaces using MVC design principles for maintainable internal tooling.

ASML

Jun 2019 - Aug 2019 · Wilton, CT

Software Engineering Intern

  • Delivered two automation tools for the Reticle Stage SDEV team, including a Linux/Python app for compatibility agreement generation.
  • Reduced compatibility agreement generation from 8 hours to 10 seconds, eliminating manual errors and enabling more frequent execution.
  • Built a Python utility for quickly navigating the software build infrastructure, removing the need to open scope files manually.
  • Collaborated on a joint automation initiative to streamline GTA (Google Test Assistant) input collection workflows.
  • Redesigned the algorithm that extracts testable functions from C/C++ source code, automating a repetitive unit-testing preparation step.

Skills

Technical stack

Tools and platforms I use most in production data engineering work.

Programming and Data Processing

Python, SQL, Scala, Java, Shell, Git

Cloud and Platforms

AWS, S3, Glue, Redshift, Databricks, Snowflake, Delta Lake, GCP Compute Engine

Orchestration and Governance

Airflow, CI/CD, Cron-based Scheduling, Metadata Management, Lineage

Data Modeling and Quality

Star/Snowflake Schemas, Deequ, Validation Rules, Schema Enforcement

Certification

AWS Certified Cloud Practitioner (CLF-C01)

Amazon Web Services

Let's connect

Open to opportunities in data engineering with a focus on web data, robust pipelines, and high-integrity analytics foundations.