$ Millions Saved
CloudHealth's RI recommendations have saved Braze millions of dollars
Braze has used cost allocation reports to create a culture of financial accountability
Braze correlates data from CloudHealth and Datadog for granular cloud spend and usage insights
Braze, a lifecycle engagement platform, collects data from customers’ applications from web, email, iOS devices, Android devices, and Smart TVs. Braze ingests that information and more to build unique user profiles that help marketing brands message their customers in more relevant, engaging, and timely ways across different platforms, using historical, in-the-moment, and predictive data.
Salvatore Poliandro is currently the Director of DevOps and Security at Braze and has been at Braze for the last three years. Sal leads multiple teams, including Site Reliability Engineers, IT and IT Operations. His group is focused on IT infrastructure through which they process 5 million jobs per minute. They make sure that infrastructure and applications can scale as required with the current architecture. Poliandro’s team owns monitoring, Continuous Integration and Continuous Delivery (CI/CD), alerting, cost and everything above the application layer.
Braze began with a dedicated infrastructure environment, 30 employees, and fewer than 12 servers, handling fewer than 100 million API calls a day. Poliandro’s first challenge was scaling their infrastructure at a rapid pace while monitoring and governing their environment with complete transparency. The second concern was breaking their monolith application into microservices to use resources more efficiently and drive more accountability.
Braze decided to move to Amazon Web Services (AWS) in order to gain agility and scale. Now, they have more than 240 employees and they process more than 300 million API calls every hour. Braze began with a primary monolith application that was developed 7 years ago in Ruby. Currently, as a monolith app, it runs on thousands of Amazon EC2 instances across multiple regions, backed by hundreds of Redis nodes, and uses thousands of Mongo database nodes.
Finding a Solution
In order to ensure complete visibility while transforming and scaling rapidly, Poliandro and his team decided to adopt a new approach. Braze didn’t just want to choose vendors; the company was looking to forge lasting business partnerships. After going through rounds of evaluations, Braze selected CloudHealth Technologies and Datadog as the key partners in the company’s transformation journey, based on the strength of their technology. Poliandro highlighted that the internal “build vs. buy” debate was initially weighed during the evaluation of CloudHealth Technologies and Datadog. However, considering the high Total Cost of Ownership (TCO) and the additional time for ongoing maintenance, the team preferred to buy proven solutions.
Braze selected CloudHealth Technologies based on the platform’s proven leadership in Reserved Instance (RI) management, along with resource utilization and cost optimization. Poliandro was impressed that CloudHealth was proactively identifying and informing him about opportunities to save costs in AWS. He also found it advantageous that CloudHealth’s contract is based on Braze’s AWS spend.
“It was an easy decision [to select CloudHealth] when we moved to AWS. We were in a pre-series B phase of funding and burn rate was very important. I needed to optimize my cloud spend while saving time, even though those two things do not go in the same bucket together. That drove us to a decision to choose an external vendor and we chose CloudHealth over others for their proven solution,” adds Poliandro.
Braze chose Datadog over their incumbent and other vendors for Application Performance Monitoring (APM) support along with compliance and infrastructure monitoring. Poliandro remembers that “Integration with infrastructure and other cloud services was important for us. The incumbent could not do that. From the technical and financial perspective, Datadog made more sense to us. We wanted to make it easy to pull our data into a single platform. We tried to build that in-house—it was a huge undertaking and maintenance cost was just too high.”
Metrics like error rates, the amount of data ingested, and any delays with ingesting data or processing data for customers are critical for Braze. “Datadog is already doing this really well for Braze. It is our preferred platform for different services, versus just an application service that our incumbent could support,” mentions Poliandro.
Braze now has the visibility, agility, and transparency they need for their application development. Leveraging CloudHealth and Datadog together, Braze has gained the complete visibility needed to make well-informed decisions. Poliandro mentions that “We are a very transparent company and a lot of infrastructure was a black box for us. We could not show the granular spend earlier.”
Being able to provide a self-service mechanism where our engineers can poke around their spend, cost savings, and key metrics (even from a security standpoint) has really been a big benefit across various teams. We get fewer questions from finance on them as a result.
CloudHealth has saved Braze millions of dollars through RI recommendations. Additionally, through cost allocation to different departments based on CloudHealth reports and dashboards, Braze has added more accountability. Poliandro proudly mentions that “CloudHealth reporting allows me to keep the teams in check as we grow. When we are doing POCs or starting a new project I can easily check team spend every day and send an email to the users to make them aware. That is a huge benefit in terms of creating business awareness on what resources cost.”
As they have adopted Datadog, Braze has already realized benefits like a reduction in TCO. Poliandro mentions that he has gotten over a day-and-a-half a week back for one of his most senior employees, which is enormous since he can now focus on more strategic, cost-saving, scalability, and performance initiatives. He adds that “We have gotten the most from APM to the point where our projected APM spend has decreased substantially over time. Datadog is running on 80% of our infrastructure.” With Datadog-led enablement sessions, each team at Braze has a clear understanding of how to leverage the platform for their own use case.
In addition to the value each platform has brought to Braze independently, together CloudHealth and Datadog help answer Braze executives’ chief concern: what is the company’s Total Cost of Goods Sold? Braze correlates data from CloudHealth and Datadog to determine their overall cloud usage and spend, and to break down that usage by individual customers to better understand where resources are allocated. From there, executives can determine the profit margin produced by individual customers and campaigns.
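The per-customer breakdown described above can be illustrated with a small sketch. It is not Braze's actual method, and it does not use any CloudHealth or Datadog API: it simply assumes a total cloud cost for a period (as a cost report might provide) and a per-customer usage measure (as a monitoring platform might provide), splits the cost in proportion to usage, and derives a margin. The names `CustomerUsage`, `usage_units`, `revenue`, and `allocate_cost` are hypothetical, as are all figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CustomerUsage:
    customer: str
    usage_units: float  # hypothetical usage measure taken from monitoring data
    revenue: float      # hypothetical revenue for the same period


def allocate_cost(total_cloud_cost: float,
                  usages: List[CustomerUsage]) -> Dict[str, Dict[str, float]]:
    """Split total cloud cost across customers in proportion to usage.

    Returns, per customer, the allocated cost and the resulting margin
    (revenue minus allocated cost, divided by revenue).
    """
    total_usage = sum(u.usage_units for u in usages)
    if total_usage <= 0:
        raise ValueError("total usage must be positive")

    report: Dict[str, Dict[str, float]] = {}
    for u in usages:
        cost = total_cloud_cost * u.usage_units / total_usage
        # Margin is undefined when there is no revenue to divide by.
        margin = (u.revenue - cost) / u.revenue if u.revenue else float("nan")
        report[u.customer] = {"cost": cost, "margin": margin}
    return report


# Illustrative figures only; none of these numbers come from the case study.
example = allocate_cost(
    400.0,
    [
        CustomerUsage("customer_a", usage_units=300.0, revenue=1000.0),
        CustomerUsage("customer_b", usage_units=100.0, revenue=500.0),
    ],
)
```

Proportional allocation is only one possible choice. A real breakdown would likely weight different resource types separately and account for shared or idle capacity.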
Poliandro’s advice for cloud beginners
Having been through his own hybrid cloud journey, Poliandro begins by saying others should “monitor everything in your infrastructure.” He also suggests displaying your ‘north star’ metrics somewhere visible. Doing so helps instill a culture of seeing what is going on and supports better decisions. Examples of ‘north star’ metrics for Poliandro’s team are average processing time, percentage of uptime, and processing time of data points. Whereas Braze’s product managers want to look at the depth of overall product usage, Braze’s operations team is most concerned with tracking the total number of campaigns and the speed at which they are sent through the platform on behalf of their customers.
New York, New York