Updated: 03/19/2025


100% available
DevOps Engineer | 9+ Yrs | AWS | Azure
Houston, USA
BS in Computer Science, University of Texas at Austin
Senior Data Engineer with over 9 years of hands-on experience building and managing scalable data solutions across AWS, GCP, and Azure. Skilled in developing robust ETL pipelines, optimizing data warehousing, and delivering advanced analytics. My expertise spans fintech, e-commerce, and energy tech, leveraging tools like Snowflake, Spark, and Python to drive insights and improve operations. I thrive in agile, cross-functional teams, aligning data strategies with business goals to support smarter decision-making and deliver measurable impact.
SKILLS
• Programming Languages: Python, SQL, JavaScript, Scala, PL/SQL, T-SQL
• Cloud Platforms: AWS (EC2, S3, EMR, Redshift), GCP (BigQuery, Cloud SQL, Dataflow)
• Big Data Technologies: Spark, Hadoop (HDFS, MapReduce)
• Databases & Data Warehousing: Snowflake, Redshift, PostgreSQL, MSSQL, MongoDB, Cassandra, ClickHouse
• Data Processing & Integration: Apache Airflow, Airbyte, Apache NiFi, Keboola, dbt, Apache Kafka, Pub/Sub
• Data Visualization: Looker, Holistics, Tableau
• Development & DevOps: Docker, Kubernetes, Terraform, Git, GitHub Actions, GitHub, GitLab, Bitbucket
• Security & Compliance: IAM Security, HIPAA Compliance, LDAP
• Software Development Practices: Agile, TDD, BDD, CI/CD
WORK EXPERIENCE
Wizeline, San Francisco, CA
Senior Data Engineer 05/2022 – 09/2024
Industry: Technology Consulting & Software Development
Overview: [https://www.wizeline.com/] Wizeline is a global technology consulting company that helps businesses solve complex technology problems through digital solutions. With a focus on data engineering, AI, and software development, Wizeline delivers innovative solutions for clients worldwide, including the US. Based in San Francisco, the company serves various industries like retail, media, and financial services.
- Data Infrastructure Optimization: Developed and optimized data pipelines and workflows using Apache Spark, Scala, and Databricks, leading to a 50% improvement in processing time for high-volume datasets across various domains.
- Data Warehousing Solutions: Architected and maintained scalable data warehousing solutions on AWS, ensuring efficient data storage and accessibility. Optimized ETL processes with Airflow and Python to improve the speed and accuracy of data retrieval for analytics and reporting.
- Data Integration for Customer Insights: Led the development of real-time data pipelines using Kafka and Spark to enhance customer insights, significantly improving decision-making capabilities for key stakeholders by providing up-to-date analytics.
- Automation and Workflow Management: Automated critical workflows using Apache Airflow, ensuring timely data delivery and validation. Built robust frameworks for monitoring and testing data pipelines, which reduced manual errors and enhanced data reliability.
- Cross-Functional Collaboration: Worked closely with teams across product, data science, and business units to understand data requirements and implement solutions that support business growth. Mentored junior engineers, providing training on Scala, Airflow, and best practices in data engineering.
Technologies Used: Apache Spark, Databricks, Scala, Python (Pandas), AWS (S3, EC2, Redshift), Airflow, Kafka, Git, Kubernetes, Docker.
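The pipeline monitoring and validation work described above can be sketched as a minimal pure-Python batch check. All names and the null-rate threshold here are illustrative assumptions, not Wizeline's actual framework:

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    passed: bool
    failures: list = field(default_factory=list)


def validate_batch(rows, required_fields, max_null_rate=0.05):
    """Reject a batch before loading when required fields are missing or
    null too often. `rows` is a list of dicts, as produced by a typical
    extraction step; `max_null_rate` is a hypothetical quality threshold."""
    failures = []
    if not rows:
        return ValidationResult(False, ["empty batch"])
    for fld in required_fields:
        nulls = sum(1 for r in rows if r.get(fld) in (None, ""))
        rate = nulls / len(rows)
        if rate > max_null_rate:
            failures.append(f"{fld}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return ValidationResult(not failures, failures)
```

In an orchestrated pipeline, a check like this would run as its own task so a failing batch halts downstream loads instead of silently corrupting reports.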
S&P Global, New York, NY
Data Engineering Consultant 07/2019 - 04/2022
Industry: Financial Services & Sustainability Analytics
Overview: [https://www.spglobal.com/] S&P Global Sustainable 1 provides essential intelligence on sustainability for companies, governments, and investors. As a Data Engineering Consultant, I played a key role in optimizing S&P Global’s data extraction and transformation processes, particularly focusing on risk data pipelines. My work contributed to the enhancement of sustainability risk scoring and improved the timeliness of critical data for clients.
- Data Pipeline Automation: Led the redesign and optimization of the data extraction pipeline using Scala and Apache Spark, automating the ingestion of data from a paginated API. This reduced data inconsistencies by 90%, significantly improving data quality and reliability for risk scoring models.
- Risk Data Modeling: Developed and maintained complex data models to support the Physical Risk pipeline, using Databricks and Scala. These models processed millions of records across diverse asset categories such as Mines, Oil & Gas Pipelines, and Real Estate, enhancing the precision of sustainability risk assessments.
- Real-Time Data Ingestion: Transformed annual batch processing into continuous real-time data ingestion, leveraging Apache NiFi and Kafka. This enabled faster updates to risk scores on the XpressFeed platform, improving client access to real-time data.
- Data Warehousing and ETL: Designed and implemented ETL processes using Spark and Databricks, ensuring efficient backfilling of historical data in the MSSQL production data warehouse. These processes improved the accuracy and timeliness of risk updates for hundreds of clients.
- Cross-Functional Collaboration: Worked closely with data scientists and analysts to align data pipelines with business objectives, ensuring data integrity and consistency across multiple data streams.
- Workflow Automation: Implemented automated workflows using Apache Airflow, ensuring reliable, timely data delivery to downstream applications. Enhanced testing frameworks to validate data accuracy and improve data governance practices.
Technologies Used: Databricks, Apache Spark, Scala, Apache NiFi, Kafka, Apache Airflow, XpressFeed, Python, AWS S3, PostgreSQL, GitHub.
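The paginated-API ingestion pattern mentioned above can be sketched in a few lines of Python. The `fetch_page(offset, limit)` callable is a stand-in for the real API client (which is not specified here); the loop simply walks pages until one comes back short or empty:

```python
def ingest_paginated(fetch_page, page_size=100):
    """Yield every record from a paginated source.

    `fetch_page(offset, limit)` must return a list of records for that
    window; an empty or short page signals the end of the data set.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break  # short page: no more data behind it
        offset += page_size
```

Driving ingestion from offsets rather than hand-maintained page lists is one way to remove the manual steps that typically introduce gaps and duplicates in this kind of pipeline.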
SolarCity, San Mateo, CA
Lead Data Engineer 11/2015 – 07/2019
Industry: Renewable Energy
Overview: [https://www.solarcity.com/] SolarCity is a leading provider of renewable energy solutions, specializing in solar energy systems for residential and commercial clients. As a Lead Data Engineer, I played a key role in optimizing data management processes, focusing on renewable energy analytics and customer insights. My work contributed to enhanced project management and data-driven decision-making across the organization.
- Data Warehouse Optimization: Optimized the data warehouse schema using a star schema design, establishing distinct staging, modeling, and presentation layers to support renewable energy analytics and customer experience functions. Utilized Python, SQL, and Airflow to automate data pipeline and transformation processes.
- Reverse ETL Implementation: Led the migration of over 2 TB of data from AWS Redshift into the company's CRM system, enabling a seamless transition to a self-managed platform. This initiative resulted in a 75% reduction in third-party CRM costs.
- Airflow Migration: Successfully migrated Airflow from Digital Ocean to AWS EC2, enhancing its capabilities with robust error handling and actionable logging. This critical step was instrumental in centralizing our infrastructure on AWS for improved management and efficiency.
- Solar Forecast Optimization: Developed and implemented Python-based algorithms for optimizing solar energy production forecasts using historical performance data and weather patterns. This automated process improved forecast accuracy by 20%.
- Project Management Tool Development: Coded a web-based tool to streamline project management for solar installations, incorporating a Jira-based approval workflow for enhanced governance and tracking.
- Performance Metrics Consolidation: Designed a unified data pipeline to consolidate performance metrics across multiple SolarCity markets. This solution enabled data-driven decisions on project selection and optimization by providing insights into efficiency and productivity.
Technologies Used: AWS Redshift, Python, SQL, Airflow, Jira, Data Warehousing, ETL Processes, Renewable Energy Analytics.
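A forecast built on historical production data, as described above, can be illustrated with a deliberately naive trailing-window baseline. This is a generic sketch, not SolarCity's actual model, which also incorporated weather patterns:

```python
def forecast_next(production_history, window=7):
    """Naive next-day production forecast (e.g. kWh): the mean of the
    trailing `window` observations. Shorter histories fall back to
    averaging whatever data is available."""
    if not production_history:
        raise ValueError("need at least one observation")
    window = min(window, len(production_history))
    recent = production_history[-window:]
    return sum(recent) / len(recent)
```

A baseline like this is useful as the yardstick a weather-aware model must beat; the quoted 20% accuracy improvement would be measured against exactly this kind of reference forecast.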
Languages
English: Native speaker