Braze, a lifecycle engagement platform, collects data from its customers’ applications across web, email, iOS devices, Android devices, and smart TVs. Braze ingests that information and more to build unique user profiles that help brands message their customers in more relevant, engaging, and timely ways across platforms, using historical, in-the-moment, and predictive data.
Salvatore (Sal) Poliandro is the Director of DevOps and Security at Braze, where he has worked for the last three years. Sal leads multiple teams, including Site Reliability Engineering, IT, and IT Operations. His group focuses on the IT infrastructure that processes 5 million jobs per minute, making sure that the infrastructure and applications can scale as required within the current architecture. Poliandro’s team owns monitoring, Continuous Integration and Continuous Delivery (CI/CD), alerting, cost, and everything above the application layer.
Braze's Challenge
Braze started out with a dedicated infrastructure environment: the company had 30 employees and fewer than 12 servers, and it handled fewer than 100 million API calls a day. Poliandro’s first challenge was scaling that infrastructure rapidly while monitoring and governing the environment with complete transparency. The second was breaking the monolith application into microservices, both to use resources more efficiently and to drive more accountability.
Braze decided to move to Amazon Web Services (AWS) to gain agility and scale. The company now has more than 240 employees and processes more than 300 million API calls every hour. Its primary application is a monolith written in Ruby seven years ago. That monolith currently runs on thousands of Amazon EC2 instances across multiple regions, backed by hundreds of Redis nodes and thousands of MongoDB nodes.
Finding a Solution
To ensure complete visibility while transforming and scaling rapidly, Poliandro and his team decided to adopt a new approach. Braze didn’t just want to choose vendors; the company was looking to forge lasting business partnerships. After several rounds of evaluation, Braze selected CloudHealth Technologies and Datadog as key partners in its transformation journey, based on the strength of their technology. Poliandro noted that the team initially weighed the internal “build vs. buy” question while evaluating CloudHealth Technologies and Datadog. However, given the high Total Cost of Ownership (TCO) of building in-house and the additional time required for ongoing maintenance, the team preferred to buy proven solutions.
Braze selected CloudHealth Technologies for the platform’s proven leadership in Reserved Instance (RI) management, resource utilization, and cost optimization. Poliandro was impressed that CloudHealth proactively identified opportunities to save costs in AWS and informed him about them. He also found it advantageous that CloudHealth’s contract is based on Braze’s AWS spend.
“It was an easy decision [to select CloudHealth] when we moved to AWS. We were in a pre-Series B phase of funding and burn rate was very important. I needed to optimize my cloud spend while saving time, even though those two things do not go in the same bucket together. That drove us to a decision to choose an external vendor and we chose CloudHealth over others for their proven solution,” adds Poliandro.
Braze chose Datadog over its incumbent and other vendors for Application Performance Monitoring (APM) support, along with compliance and infrastructure monitoring. As Poliandro recalls, “Integration with infrastructure and other cloud services was important for us. The incumbent could not do that. From the technical and financial perspective, Datadog made more sense to us. We wanted to make it easy to pull our data into a single platform. We tried to build that in-house; it was a huge undertaking and maintenance cost was just too high.”
Metrics such as error rates, the volume of data ingested, and any delays in ingesting or processing customer data are critical for Braze. “Datadog is already doing this really well for Braze. It is our preferred platform for different services, versus just an application service that our incumbent could support,” says Poliandro.
Braze's Results
Braze now has the visibility, agility, and transparency it needs for application development. By using CloudHealth and Datadog together, Braze has gained the complete visibility needed to make well-informed decisions. As Poliandro puts it, “We are a very transparent company and a lot of infrastructure was a black box for us. We could not show the granular spend earlier.”