Implementing Event Tracking Infra with CDK

Engineers at Wealthfront emphasize iterative development and maintainable code to build robust applications. The underpinning of any application is its infrastructure. To build robust and scalable applications, we need to apply the same engineering principles of iteration and maintainability to the underlying infrastructure. In the case of our event-tracking system, which ingests millions of user events to provide product insights, we accomplished this by using AWS CDK (Cloud Development Kit), an infrastructure-as-code framework, to redesign and re-provision each functional component of the infrastructure.

Previously, our event tracking infrastructure was manually configured within the AWS console. This posed multiple risks to the stability of this infrastructure, mainly:

Changes could be made to the infrastructure manually with minimal oversight and approval.
There was no system to version or roll back changes.
Manual toil was required to provision infrastructure across multiple environments (testing, sandbox, production).
There was no clear grouping of resources involved in the event tracking system.
There was a lack of testing for resource definitions and relationships.

In this blog post, we’ll discuss how we have mitigated these risks using CDK and detail, in depth, how we have restructured the functional components of event tracking infrastructure.

Component Abstraction

Since we want to track, group, and provision all functional components of the event tracking system together, we declare them within a single DataCollectionStack. A stack represents a unit of deployment in AWS CDK. This stack, along with others for separate projects, is instantiated in a top-level App file.

DataCollectionStack dataCollectionStack = new DataCollectionStack(app, "DataCollectionStack", StackProps.builder()
    .env(Environment.builder()
        .account(System.getenv("CDK_ACCOUNT_ID"))
        .region("us-west-2")
        .build())
    .build());
Tags.of(dataCollectionStack).add("wf:owner", "data");

Within the DataCollectionStack class, we use variables, functions, and classes to model different aspects of our system. Event tracking consists of a few high-level components:

An API Gateway that captures events from devices.
A Kinesis Pipeline that receives events from the API Gateway and writes to S3.
A Lambda-based processing service that polls Kinesis and writes the events to a sink.
A CloudWatch telemetry layer that monitors events as they enter and exit the system.

For each component, we create a reusable CDK construct, a high-level abstraction consisting of multiple related AWS resources. These constructs are then referenced in our root stack, with the relationships between constructs expressed through Props objects and the Builder pattern.

High-level overview of the event tracking infrastructure implemented as CDK constructs.
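The Props and Builder relationship can be illustrated without any CDK dependency. What follows is a minimal plain-Java sketch, not our production code: `PipelineProps`, `PipelineSketch`, their fields, and the default retention value are hypothetical names and values chosen for illustration. The real constructs extend CDK's `Construct` and also take a scope.

```java
import java.util.Objects;

/** Hypothetical immutable props object: the inputs a construct needs. */
final class PipelineProps {
  private final String streamName;
  private final int retentionHours;

  private PipelineProps(Builder builder) {
    this.streamName = Objects.requireNonNull(builder.streamName, "streamName is required");
    this.retentionHours = builder.retentionHours;
  }

  String getStreamName() { return streamName; }
  int getRetentionHours() { return retentionHours; }

  static Builder builder() { return new Builder(); }

  /** Fluent builder so callers name each input explicitly. */
  static final class Builder {
    private String streamName;
    private int retentionHours = 24; // assumed default for this sketch only

    Builder streamName(String streamName) { this.streamName = streamName; return this; }
    Builder retentionHours(int retentionHours) { this.retentionHours = retentionHours; return this; }
    PipelineProps build() { return new PipelineProps(this); }
  }
}

/** Hypothetical stand-in for a construct that is configured only through its props. */
final class PipelineSketch {
  private final String id;
  private final PipelineProps props;

  PipelineSketch(String id, PipelineProps props) {
    this.id = id;
    this.props = props;
  }

  String describe() {
    return String.format("%s -> stream=%s, retention=%dh",
        id, props.getStreamName(), props.getRetentionHours());
  }
}

final class PropsBuilderDemo {
  public static void main(String[] args) {
    PipelineProps props = PipelineProps.builder()
        .streamName("analytics-android-kinesis-data-stream")
        .retentionHours(168)
        .build();
    System.out.println(new PipelineSketch("AndroidPipeline", props).describe());
  }
}
```

Keeping every input on the props object means the dependencies between components are visible at the call site, which is the property the root stack relies on when it wires constructs together.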

With this paradigm, for each event channel (Android, iOS, and web), we need only one compact method to create nearly all required resources for end-to-end event tracking:


private void createMobileCollectionResource(RestApi api, String platformResourceName) {
  Resource mobileResource = api.getRoot().addResource(platformResourceName);
  KinesisPipeline pipeline = createKinesisPipeline(platformResourceName);
  kinesisPipelines.add(pipeline);
  AnalyticsEventProcessor processor = AnalyticsEventProcessor.Builder.create(this,
          String.format("Analytics%sEventProcessor", capitalize(platformResourceName)))
      .platformName(platformResourceName)
      .stream(pipeline.getStream())
      .lambdaExecutionRole(awsLambdaAnalyticsExecution)
      .routingLambda(routingLambda)
      .reprocessingLambda(reprocessingLambda)
      .build();
  lambdaDlqTopics.add(processor.getDlqTopic());
  Resource batch = mobileResource.addResource("batch");
  attachStreamResourceMethods(batch, platformResourceName, "PutRecords", false);
  Resource sourceConfig = mobileResource.addResource("sourceConfig");
  sourceConfig.addMethod("GET",
      createSourceConfigMockIntegration(false),
      createMethodOptions(false)
  );
}

Walking through the code snippet above, for each event channel, we create an endpoint on an API Gateway resource. Further endpoints are later defined on the resource to handle different types of requests and forward these requests to Kinesis.

After defining the endpoint, we create a KinesisPipeline, a construct that packages all Kinesis-related resources for the channel. This includes a Kinesis Data Stream and a Kinesis Data Firehose delivery stream, as well as the configuration between the two resources that directs records to S3.

Overview of the AWS resources involved in the KinesisPipeline, namely Kinesis Data Streams, Kinesis Data Firehose, S3, and Athena.

public KinesisPipeline(final Construct scope, final String id, final KinesisPipelineProps props) {
  super(scope, id);

  firehoseDeliveryRole = props.getFirehoseDeliveryRole();
  lambdaExecutionRole = props.getLambdaExecutionRole();
  streamKey = props.getStreamKey();
  catalogId = props.getCatalogId();
  glueDatabaseName = props.getGlueDatabaseName();
  glueTableName = props.getGlueTableName();
  streamId = props.getStreamId();
  streamName = props.getStreamName();
  transformFunction = props.getTransformFunction();
  deliveryStreamPrefix = props.getDeliveryStreamPrefix();
  deliveryStreamErrorPrefix = props.getDeliveryStreamErrorPrefix();
  deliveryStreamParquetPrefix = props.getDeliveryStreamParquetPrefix();
  deliveryStreamParquetErrorPrefix = props.getDeliveryStreamParquetErrorPrefix();
  platformResourceName = props.getPlatformResourceName();
  autoPartitionLambda = props.getAutoPartitionLambda();

  createStream();
  createDeliveryStream();
  createDeliveryStreamParquetGlue();
  createAutoPartitionLambda();
}

After creating a KinesisPipeline, we create an AnalyticsEventProcessor to handle the heavy lifting of parsing, redacting, and routing event data to the target sink. We pass a reference to the Kinesis Data Stream to the AnalyticsEventProcessor so it can poll the stream for new events. Within the processor, we create a primary event-routing function as well as supporting infrastructure to handle and reprocess routing failures. This includes a source queue that holds events that failed initial routing, a reprocessing function that receives failures from the source queue, a dead letter queue (DLQ) for events that could not be reprocessed even after multiple attempts, and a DLQ topic that sends alerts when the DLQ contains events.

Illustrates the AnalyticsEventProcessor CDK construct, which consists of components involved in sending events to Rudderstack, the analytics sink.

public AnalyticsEventProcessor(final Construct scope, final String id, final AnalyticsEventProcessorProps props) {
  super(scope, id);
 
  platformName = props.getPlatformName();
  stream = props.getStream();
  lambdaExecutionRole = props.getLambdaExecutionRole();
  routingLambda = props.getRoutingLambda();
  reprocessingLambda = props.getReprocessingLambda();
 
  if (Objects.equals(System.getenv("CDK_ENVIRONMENT"), "production")) {
    environmentVars.put("TEST_WRITE_KEY_ALIAS", String.format("%s_dev", platformName));
    environmentVars.put("WRITE_KEY_ALIAS", String.format("%s_prod", platformName));
  } else {
    environmentVars.put("TEST_WRITE_KEY_ALIAS", String.format("%s_testing", platformName));
    environmentVars.put("WRITE_KEY_ALIAS", String.format("%s_testing", platformName));
  }
 
  dlq = createDlq();
  sourceQueue = createSourceQueue();
  dlqTopic = createDlqTopic();
  routingFunction = createRudderstackRoutingLambda();
  reprocessingFunction = createRudderstackReprocessingLambda();
}
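The failure path described above (source queue, reprocessing function, DLQ, and DLQ alert) can be modeled in memory to make the flow of a single event easier to follow. This is a hedged plain-Java simulation rather than our Lambda code: the class and field names are hypothetical, the sink is a stand-in predicate, and the limit of three reprocessing attempts is an assumed value, since the actual retry count is not specified here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** In-memory model of the routing failure path; all names are hypothetical. */
final class FailureFlowSketch {
  static final int MAX_REPROCESS_ATTEMPTS = 3; // assumed value for illustration

  final Deque<String> sourceQueue = new ArrayDeque<>();
  final List<String> deadLetterQueue = new ArrayList<>();
  final List<String> delivered = new ArrayList<>();
  final List<String> alerts = new ArrayList<>();

  /** Primary routing: deliver the event, or park it on the source queue on failure. */
  void route(String event, Predicate<String> sink) {
    if (sink.test(event)) {
      delivered.add(event);
    } else {
      sourceQueue.add(event);
    }
  }

  /** Reprocessing: retry each queued event, dead-letter it if every attempt fails. */
  void reprocess(Predicate<String> sink) {
    while (!sourceQueue.isEmpty()) {
      String event = sourceQueue.poll();
      boolean ok = false;
      for (int attempt = 1; attempt <= MAX_REPROCESS_ATTEMPTS && !ok; attempt++) {
        ok = sink.test(event);
      }
      if (ok) {
        delivered.add(event);
      } else {
        deadLetterQueue.add(event);
      }
    }
    // Stand-in for the DLQ topic: raise an alert whenever the DLQ is non-empty.
    if (!deadLetterQueue.isEmpty()) {
      alerts.add("DLQ contains " + deadLetterQueue.size() + " event(s)");
    }
  }
}

final class FailureFlowDemo {
  public static void main(String[] args) {
    FailureFlowSketch flow = new FailureFlowSketch();
    // Stand-in sink: "flaky" succeeds on its second call, "poison" never succeeds.
    int[] flakyCalls = {0};
    Predicate<String> sink = event -> {
      if (event.equals("poison")) return false;
      if (event.equals("flaky")) return ++flakyCalls[0] >= 2;
      return true;
    };
    for (String event : List.of("ok", "flaky", "poison")) {
      flow.route(event, sink);
    }
    flow.reprocess(sink);
    System.out.println("delivered=" + flow.delivered);
    System.out.println("dlq=" + flow.deadLetterQueue);
    System.out.println("alerts=" + flow.alerts);
  }
}
```

In this model the transiently failing event is recovered by reprocessing, while the event that never succeeds ends up in the DLQ and triggers an alert, which mirrors the division of responsibilities between the queues and the topic.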

Once the API Gateway, KinesisPipeline, and AnalyticsEventProcessor have been provisioned, we deploy an AnalyticsEventMonitor that takes these constructs and builds monitoring alarms on each component, ensuring that on-call engineers are paged if anything goes wrong.

Showcases the AnalyticsEventMonitor construct, which consists of CloudWatch alarms connected to a monitoring Lambda responsible for triggering PagerDuty.

public AnalyticsEventMonitor(final Construct scope, final String id, final AnalyticsEventMonitorProps props) {
  super(scope, id);
 
  lambdaExecutionRole = props.getLambdaExecutionRole();
  dlqTopics = props.getDlqTopics();
  infraLambda = props.getInfraLambda();
  getProperties();
 
  monitoringAlarms.addAll(props.getKinesisPipelines().stream()
      .map(pipeline ->
          KinesisAlarmSystem.Builder.create(this, String.format("%sAlarmSystem", pipeline.getNode().getId()))
              .kinesisPipeline(pipeline)
              .build()
              .getMonitoringAlarms())
      .flatMap(Collection::stream)
      .collect(Collectors.toList()));
 
  monitoringAlarms.addAll(
      ApiGatewayAlarmSystem.Builder.create(this, String.format("%sAlarmSystem", props.getApi().getNode().getId()))
          .api(props.getApi())
          .build()
          .getMonitoringAlarms()
  );
 
  monitoringTopic = createCloudwatchAlarmTopic();
  monitoringLambda = createAnalyticsInfraMonitoringLambda();
  createMonitoringSubscriptions();
  createAlarmEventBridge();
}

Using these constructs to model the major components of our application allows us to cleanly define and group relevant infrastructure. Since our code is written in Java, we are able to compose larger and more complex constructs from other constructs, allowing for greater abstraction and readability within the DataCollectionStack. Additionally, separate projects that require similar components (a Kinesis service, a failure-prone processing service, alarm systems, etc.) can reuse the constructs made for event tracking to speed up future development.
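Composition works because every construct is created inside a scope under a local id, so a larger construct can own smaller ones and be instantiated several times without name clashes. The toy model below illustrates that idea in plain Java. It is not CDK's implementation: `NodeSketch` and `PipelineNodeSketch` are hypothetical classes, and the slash-separated path format is a simplification chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a construct tree: each node has a parent scope and a locally unique id. */
class NodeSketch {
  private final NodeSketch scope;
  private final String id;
  private final List<NodeSketch> children = new ArrayList<>();

  NodeSketch(NodeSketch scope, String id) {
    this.scope = scope;
    this.id = id;
    if (scope != null) {
      // Ids only need to be unique among siblings, not across the whole tree.
      for (NodeSketch sibling : scope.children) {
        if (sibling.id.equals(id)) {
          throw new IllegalArgumentException(
              "Duplicate id '" + id + "' in scope '" + scope.path() + "'");
        }
      }
      scope.children.add(this);
    }
  }

  /** Full path from the root, which is unique even when local ids repeat. */
  String path() {
    return scope == null ? id : scope.path() + "/" + id;
  }

  /** Counts this node plus all of its descendants. */
  int size() {
    int total = 1;
    for (NodeSketch child : children) {
      total += child.size();
    }
    return total;
  }
}

/** A larger component composed from smaller ones, similar in spirit to a pipeline construct. */
final class PipelineNodeSketch extends NodeSketch {
  final NodeSketch stream;
  final NodeSketch deliveryStream;

  PipelineNodeSketch(NodeSketch scope, String id) {
    super(scope, id);
    stream = new NodeSketch(this, "Stream");
    deliveryStream = new NodeSketch(this, "DeliveryStream");
  }
}

final class CompositionDemo {
  public static void main(String[] args) {
    NodeSketch stack = new NodeSketch(null, "DataCollectionStack");
    PipelineNodeSketch android = new PipelineNodeSketch(stack, "AndroidPipeline");
    PipelineNodeSketch ios = new PipelineNodeSketch(stack, "IosPipeline");
    System.out.println(android.stream.path());
    System.out.println(ios.deliveryStream.path());
    System.out.println("nodes in stack: " + stack.size());
  }
}
```

Both pipelines reuse the local ids "Stream" and "DeliveryStream", yet their full paths differ, which is why the same construct can be instantiated once per event channel.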

Unit Testing

While we were able to encapsulate components of event tracking within constructs, we still needed to test whether our constructs were provisioning AWS resources correctly. CDK provides the ability to synthesize a CloudFormation template for our DataCollectionStack and run tests on this template to verify counts and configuration of resources. For the higher-level stack and each of the constructs involved in event tracking, we created a test file to make sure that native AWS resources are provisioned in the right number and with the right properties.

@Test
public void testResourceCountCorrect() {
  Template template = getStackTemplate();
  template.resourceCountIs("AWS::Kinesis::Stream", 1);
  template.resourceCountIs("AWS::KinesisFirehose::DeliveryStream", 2);
}
 
@Test
public void testStreamsConfigured() {
  Template template = getStackTemplate();
  template.hasResourceProperties("AWS::Kinesis::Stream", Map.of(
      "Name", "analytics-android-kinesis-data-stream",
      "RetentionPeriodHours", 168,
      "StreamEncryption", Map.of(
          "EncryptionType", "KMS",
          "KeyId", Collections.singletonMap("Fn::GetAtt", new String[]{
              "testkey2D7A4880",
              "Arn"
          })),
      "StreamModeDetails", Collections.singletonMap("StreamMode", "ON_DEMAND")
  ));
}
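Assertions like the ones above operate on the synthesized template, which is essentially a map of logical ids to resources, each with a type and a set of properties. The following toy model shows the idea in plain Java. It is an illustrative sketch and not the CDK assertions library: `TemplateSketch` and its methods are hypothetical, and the subset-style property matching is a simplified approximation of the partial matching that CDK documents as the default for `hasResourceProperties`.

```java
import java.util.Map;

/** Toy model of assertions over a synthesized template; not the CDK implementation. */
final class TemplateSketch {
  // Logical id -> resource, where each resource has a "Type" and a "Properties" entry.
  private final Map<String, Map<String, Object>> resources;

  TemplateSketch(Map<String, Map<String, Object>> resources) {
    this.resources = resources;
  }

  /** Number of resources whose "Type" equals the given type. */
  long countOfType(String type) {
    return resources.values().stream()
        .filter(resource -> type.equals(resource.get("Type")))
        .count();
  }

  /** True if some resource of the given type contains every expected property. */
  boolean hasResourceProperties(String type, Map<String, Object> expected) {
    return resources.values().stream()
        .filter(resource -> type.equals(resource.get("Type")))
        .anyMatch(resource -> isSubset(expected, resource.get("Properties")));
  }

  /** Nested maps are compared as subsets; every other value must be equal. */
  private static boolean isSubset(Object expected, Object actual) {
    if (expected instanceof Map && actual instanceof Map) {
      Map<?, ?> expectedMap = (Map<?, ?>) expected;
      Map<?, ?> actualMap = (Map<?, ?>) actual;
      return expectedMap.entrySet().stream()
          .allMatch(entry -> actualMap.containsKey(entry.getKey())
              && isSubset(entry.getValue(), actualMap.get(entry.getKey())));
    }
    return expected.equals(actual);
  }
}

final class TemplateSketchDemo {
  public static void main(String[] args) {
    Map<String, Object> streamProps = Map.of(
        "Name", "analytics-android-kinesis-data-stream",
        "RetentionPeriodHours", 168,
        "StreamModeDetails", Map.of("StreamMode", "ON_DEMAND"));
    Map<String, Object> stream = Map.of(
        "Type", "AWS::Kinesis::Stream",
        "Properties", streamProps);
    TemplateSketch template = new TemplateSketch(Map.of("AndroidStream", stream));

    System.out.println(template.countOfType("AWS::Kinesis::Stream"));
    // Only a subset of properties is listed, so unrelated properties do not break the check.
    System.out.println(template.hasResourceProperties("AWS::Kinesis::Stream",
        Map.of("StreamModeDetails", Map.of("StreamMode", "ON_DEMAND"))));
  }
}
```

Checking only the properties a test cares about keeps tests stable when unrelated configuration changes, at the cost of not detecting unexpected extra properties.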

Deployment

Since we implemented the event-tracking system in code, we could apply good engineering processes to iteratively develop and ship changes. We created a repository to version all applications provisioned through CDK. To ensure robustness, we used Jenkins to compile and test the code with Maven and then deploy resources using the CDK command line interface. Additionally, since provisioning the event tracking infrastructure in multiple accounts became much easier with CDK, we deployed new changes to a staging environment and ran end-to-end tests before finally deploying to production. To avoid dangerous manual changes in the AWS console and drift between provisioned resources and CDK code, we set up write permissions so that only CloudFormation could provision resources and only Jenkins could deploy to CloudFormation.

By migrating our event-tracking system to CDK, we were able to establish a level of robustness, structure, and transparency in our infrastructure which previously wasn’t achievable through manual configuration. We have many projects ahead that involve a complex mesh of AWS resources and are excited to continue to use this technology to improve delivery and maintainability.

