The Data Engineer plays a unique role in building world-class data platforms and deploying scalable business intelligence tools for Populi. The ideal candidate relishes working with large volumes of data, enjoys the challenge of highly complex technical contexts, and, above all else, is passionate about data and analytics. He/she is an expert in data modeling, ETL design, and business intelligence tools, and partners closely with the business to identify strategic opportunities where improvements in data infrastructure create outsized business impact. He/she is a self-starter, comfortable with ambiguity, able to think big (while paying careful attention to detail), and enjoys working in a fast-paced team. The ideal candidate possesses exceptional technical expertise in large-scale data lake and BI systems, with hands-on knowledge of Apache Spark, SQL, NoSQL, distributed/MPP data storage, and AWS services.
- Design, implement, and support a platform providing ad hoc access to large datasets
- Interface with other technology teams to extract, transform, and load data from a wide variety of data sources
- Implement data processing pipelines using best practices in data modeling, ETL/ELT processes, Apache Spark, and OLAP technologies
- Model data and metadata for ad hoc and pre-built reporting
- Interface with business customers, gathering requirements and delivering complete reporting solutions
- Build robust and scalable data integration (ETL) pipelines using SQL, Python, Scala, and Apache Spark
- Build and deliver high-quality datasets to support business analyst and customer reporting needs
- Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
- Participate in strategic and tactical planning discussions, including annual budget processes
- Bachelor’s degree in a quantitative/technical field such as Computer Science, Statistics, or Engineering
- 3+ years of relevant experience in data engineering, business intelligence, and business analytics
- 3+ years of hands-on experience in writing and optimizing complex SQL queries
- 3+ years of experience in programming with Python and/or Scala
- Experience in data modeling, ETL development, and building and working with data lakes and warehouses
- Extensive experience with Apache Spark and Hive
- Experience with AWS services including S3, Redshift, EMR, Step Functions, and Data Pipeline
- Experience with Google Cloud Platform, including GCS, Dataproc, and BigQuery
- Experience in working and delivering end-to-end projects independently
- Experience building/operating highly available, distributed systems for extraction, ingestion, and processing of large data sets
- Experience with Agile software development practices
- Experience with Linux and shell scripting
- Experience with Git, GitHub, JIRA, and Confluence
- Master’s degree or higher in a quantitative/technical field such as Computer Science, Statistics, or Engineering
- 5+ years of experience as a Data Engineer or Software Engineer working with one or more companies with large, complex data sources
- 3+ years of experience with healthcare data and use cases, particularly claims data
- Experience with streaming data and streaming analytics
- Experience with distributed machine learning and related technologies such as MLflow, Spark ML Pipelines, and XGBoost
Populi is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. We also consider qualified applicants regardless of criminal history, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know.