Warden: Real Time Anomaly Detection at Pinterest
<p>Detecting anomalous events has become increasingly important at Pinterest in recent years. Anomalous events, broadly defined, are rare occurrences that deviate from normal or expected behavior. Because these events can be found almost anywhere, the opportunities and applications for anomaly detection are vast. At Pinterest, we have explored using anomaly detection, specifically our Warden Anomaly Detection Platform, for several use cases (which we’ll get into in this post). Given the positive results we are seeing, we plan to keep expanding our anomaly detection work and use cases.</p>
<p>In this blog post, we will walk through:</p>
<ol>
<li><strong>The Warden Anomaly Detection Platform.</strong> We’ll detail the general architecture and design philosophy of the platform.</li>
<li><strong>Use Case #1: ML Model Drift.</strong> Recently, we have been adding functionality to our Warden anomaly detection platform to review ML scores. This enables us to analyze any drift in the models.</li>
<li><strong>Use Case #2: Spam Detection.</strong> Detecting and removing spam, and the users who create it, is a priority for keeping our systems safe and providing a great experience for our users.</li>
</ol>
<h1>What is Warden?</h1>
<p>Warden is the anomaly detection platform created at Pinterest. The key design principle for Warden is modularity — building the platform in a modular way so that we can easily make changes.</p>
<p>Why? Early in our research, it quickly became clear that there are many approaches to detecting anomalies, depending on the type of data and on how anomalies are defined for that data. Different approaches and algorithms would be needed to accommodate those differences. With this in mind, we created three modules, all of which we still use today:</p>
<ul>
<li>Query input data: retrieves the data to be analyzed from the data source.</li>
<li>Apply anomaly algorithm: analyzes the data and identifies any outliers.</li>
<li>Notification: returns results or alerts so that consuming systems can trigger next steps.</li>
</ul>
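<p>To make the three-module split concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration under stated assumptions, not Warden's actual code: the names <code>Anomaly</code>, <code>zscore_detector</code>, and <code>run_pipeline</code> are hypothetical, and the z-score rule is only a stand-in for whichever algorithm a given use case plugs in.</p>

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence


@dataclass
class Anomaly:
    """One flagged data point (hypothetical result type)."""
    index: int
    value: float
    score: float


# Each module is just a callable, so any one of them can be swapped
# without touching the other two.
QueryFn = Callable[[], Sequence[float]]
DetectFn = Callable[[Sequence[float]], List[Anomaly]]
NotifyFn = Callable[[List[Anomaly]], None]


def zscore_detector(threshold: float) -> DetectFn:
    """Stand-in algorithm: flag points whose z-score exceeds the threshold."""
    def detect(values: Sequence[float]) -> List[Anomaly]:
        if len(values) < 2:
            return []
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            return []  # all values identical, nothing deviates
        scored = ((i, v, abs(v - mu) / sigma) for i, v in enumerate(values))
        return [Anomaly(i, v, z) for i, v, z in scored if z > threshold]
    return detect


def run_pipeline(query: QueryFn, detect: DetectFn, notify: NotifyFn) -> List[Anomaly]:
    """Query input data, apply the anomaly algorithm, then notify consumers."""
    data = query()
    anomalies = detect(data)
    if anomalies:
        notify(anomalies)
    return anomalies


if __name__ == "__main__":
    # Assumed toy data source: a short series with one obvious outlier.
    run_pipeline(
        query=lambda: [10, 11, 9, 10, 12, 10, 11, 95],
        detect=zscore_detector(threshold=2.0),
        notify=lambda found: print("anomalies:", found),
    )
```

<p>Because each stage is passed in as a plain function, supporting a new data source or a new algorithm in this sketch means writing one new callable rather than changing the pipeline itself.</p>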
<p>This modular approach has enabled us to easily adjust for new data types and plug in new algorithms when needed. In the sections below we will review two of our main use cases: ML Model Drift and Spam Detection.</p>
<p><a href="https://medium.com/pinterest-engineering/warden-real-time-anomaly-detection-at-pinterest-210c122f6afa">Read the full post on the Pinterest Engineering blog</a></p>