The Role of Data Engineers – AWS



This content originally appeared on DEV Community and was authored by soul-o mutwiri


  1. Build and manage data infrastructure and platforms:
       • Databases
       • Data warehouses and storage in the cloud, using services such as Amazon S3, AWS Glue, and Amazon Redshift

  2. Ingest data from various sources:
       • Use tools such as AWS Glue jobs or AWS Lambda functions to ingest data from databases, applications, files, and streaming devices into a centralized data platform.

  3. Prepare ingested data for analytics:
       • Use AWS Glue, Apache Spark, or Amazon EMR to prepare data by cleaning, transforming, and enriching it.

  4. Catalog and document curated datasets:
       • Use AWS Glue crawlers to determine format and schema, group data into tables, and write metadata to the AWS Glue Data Catalog.
       • Use metadata tagging in the Data Catalog for data governance, compliance, and discoverability.

  5. Automate regular data workflows and pipelines:
       • Simplify and accelerate data processing using services such as AWS Glue workflows, AWS Lambda, or AWS Step Functions.
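
To make the responsibilities above more concrete, here is a minimal local sketch in plain Python. It is not AWS code and does not call any AWS API: the sample data, the function names (`ingest`, `prepare`, `catalog`, `run_pipeline`), and the `orders_curated` table name are all hypothetical stand-ins. The idea is only to show the shape of the flow: ingest from more than one source, prepare the data, record a schema the way a crawler might, and chain the steps so they run as one unit.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two sources (a CSV export and a JSON feed).
RAW_CSV = "order_id,amount,country\n1, 19.99 ,ke\n2,,us\n3,5.00,KE\n"
RAW_JSON = '[{"order_id": "4", "amount": "12.50", "country": "ug"}]'


def ingest():
    """Ingest: pull records from each source into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
    rows.extend(json.loads(RAW_JSON))
    return rows


def prepare(rows):
    """Prepare: clean incomplete rows, transform types, enrich with a derived column."""
    curated = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning: skip records with no amount
        curated.append({
            "order_id": int(row["order_id"]),           # transform: text to int
            "amount": float(amount),                    # transform: text to float
            "country": row["country"].strip().upper(),  # cleaning: normalize case
            "is_large": float(amount) >= 10.0,          # enrichment: derived flag
        })
    return curated


def catalog(table_name, rows):
    """Catalog: record column names and types for a curated table."""
    schema = {col: type(val).__name__ for col, val in rows[0].items()}
    return {"table": table_name, "columns": schema, "row_count": len(rows)}


def run_pipeline():
    """Automate: chain the steps so the whole flow runs as one unit."""
    curated = prepare(ingest())
    return curated, catalog("orders_curated", curated)


if __name__ == "__main__":
    curated, entry = run_pipeline()
    print(entry)
```

In a real AWS setup each function would map to a managed service (for example a Glue job for `prepare` and a Glue crawler for `catalog`), but the division of work stays the same.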

The data engineer builds the system that delivers usable data to the data analyst, who queries and analyzes the data to produce business insights, reports, and visualizations.

Before a data engineer begins, these questions must be answered:

  • Which data should be analyzed? What is its value to the business or organization?
  • Who owns the data? Where is it located?
  • Is the data usable in its current state? What transformations are required?
  • Who needs to see the data?
  • After the data is curated and ready for consumption, how should it be presented?

