Identifying high-risk APIs across thousands of code repositories

In this Help Net Security interview, Joni Klippert, CEO of StackHawk, discusses why API visibility is a major blind spot for security teams, how legacy tools fall short, and how StackHawk identifies risky APIs and sensitive data directly from code before anything is deployed.

API visibility is often cited as a major blind spot for security teams. Why do you think so many organizations still struggle to identify their full API attack surface, and how does this gap arise?

Modern applications — especially those built with or by AI — are increasingly just front ends on top of collections of APIs. These APIs are what actually talk to your databases, handle sensitive data, and expose business logic. So when we talk about the attack surface, the API is the surface.

The challenge is that most security teams are still relying on legacy approaches like API gateway monitoring for visibility. The problem? Gateways can only see APIs that have been deployed, routed through them, and are actively receiving traffic. That means internal APIs, integration endpoints, and anything not properly registered remain invisible, including AI-generated code that often bypasses standard processes. You’re effectively waiting until something gets hit by real-world traffic, or worse, exploited, before you even know it exists.

On top of that, API sprawl is accelerating fast. With APIs now making up more than 80% of all internet traffic, new endpoints are being deployed at a pace that traditional AppSec tools simply can’t track.

StackHawk’s approach is different: we identify APIs directly from the codebase — before they’re deployed. That gives teams a complete view of the API attack surface early in the SDLC, and allows them to prioritize testing based on factors like sensitive data exposure and authentication requirements, instead of just reacting after something hits production.
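The difference between traffic-based and code-based discovery can be illustrated with a small sketch. It assumes a Flask-style codebase where routes are declared with `@app.route(...)` decorators, extracts the declared paths with a regular expression, and compares them with a hypothetical gateway inventory. Anything present in code but absent from the gateway is an endpoint a traffic-only view would miss. The function names and the `GATEWAY_ROUTES` set are illustrative assumptions, not StackHawk’s implementation, and a real discovery engine would parse code properly rather than pattern-match it.

```python
import re

# Matches Flask-style route decorators such as @app.route("/users") or
# @bp.route('/admin/export', methods=["POST"]). Illustrative only: real
# discovery would parse the code (e.g. an AST) and support many frameworks.
ROUTE_PATTERN = re.compile(r"""@\w+\.route\(\s*["']([^"']+)["']""")


def discover_routes(source: str) -> set[str]:
    """Return the set of route paths declared in a source file."""
    return set(ROUTE_PATTERN.findall(source))


def unseen_by_gateway(code_routes: set[str], gateway_routes: set[str]) -> set[str]:
    """Endpoints that exist in code but are not registered at the gateway."""
    return code_routes - gateway_routes


SOURCE = '''
@app.route("/transactions", methods=["POST"])
def create_transaction(): ...

@internal.route("/internal/reconcile")
def reconcile(): ...

@app.route('/health')
def health(): ...
'''

# Hypothetical inventory exported from an API gateway.
GATEWAY_ROUTES = {"/transactions", "/health"}

if __name__ == "__main__":
    routes = discover_routes(SOURCE)
    print(sorted(unseen_by_gateway(routes, GATEWAY_ROUTES)))
```

In this example the internal reconciliation endpoint exists in the repository but never appears in the gateway inventory, which is exactly the kind of gap described above.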

StackHawk says it identifies APIs handling sensitive data directly from source code. What kinds of sensitive data are being flagged, and how does that inform testing priorities?

Sensitive Data Detection helps security and development teams focus on the APIs that matter most by flagging where regulated or high-risk data types are being handled — directly from the source code.

For example, in a fintech or banking app, our system might detect that an API endpoint like /transactions accepts a JSON body with fields such as “card_number”, “cvv”, or “expiry_date”. These key-value pairs indicate PCI-regulated data. StackHawk flags these instances and prioritizes those endpoints for security testing — because if they’re vulnerable, attackers could gain direct access to financial data.
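The /transactions example can be sketched as a simple field-name check. This is a minimal illustration, assuming the request body’s field names are already available (for instance from a request model or schema); the `PCI_FIELDS` set and the `pci_fields_in` helper are hypothetical and much smaller than any production rule set.

```python
import json

# Hypothetical, deliberately small set of field names treated as PCI data.
PCI_FIELDS = {"card_number", "cvv", "expiry_date"}


def normalize(name: str) -> str:
    """Lower-case a field name and turn hyphens into underscores.

    camelCase names such as cardNumber are not handled by this sketch.
    """
    return name.lower().replace("-", "_")


def pci_fields_in(body: dict) -> set[str]:
    """Return the top-level keys of a JSON body that look like PCI data."""
    return {key for key in body if normalize(key) in PCI_FIELDS}


example_body = json.loads(
    '{"amount": 42.5, "card_number": "4111111111111111",'
    ' "CVV": "123", "expiry_date": "12/29"}'
)

if __name__ == "__main__":
    hits = pci_fields_in(example_body)
    print(sorted(hits))
    print("flag /transactions for priority testing:", bool(hits))
```

Only the key names are inspected, never the values, which keeps this kind of check usable on source code and schemas where no real data is present.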

The ability to automatically surface these types of data from code — before anything is even deployed — means security teams aren’t flying blind. It helps them focus efforts where the impact of a vulnerability would be highest, and ensures that the most sensitive APIs get the deepest scrutiny before they ever reach production.

Can you walk us through how the Sensitive Data Identification feature actually works under the hood? What signals or metadata is it analyzing?

Under the hood, StackHawk analyzes source code to detect patterns that indicate sensitive data. We’re looking at key-value pairs and identifiers commonly associated with regulated or high-risk information, such as “credit_card”, “ssn”, “dob”, and more.

We use a weighted scoring model to evaluate both the criticality of the data type and how frequently it appears. Based on that, we determine whether the API should be flagged for handling sensitive data. This helps teams prioritize which APIs need the most urgent security testing.
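A weighted model that combines data-type criticality with frequency could look roughly like the sketch below. The keyword table, the weights, the tokenization, and `FLAG_THRESHOLD` are all assumptions chosen for illustration; the interview does not describe StackHawk’s actual weights or formula.

```python
import re
from collections import Counter

# Hypothetical keyword table: term -> (data category, criticality weight).
KEYWORDS = {
    "credit_card": ("PCI", 5.0),
    "card_number": ("PCI", 5.0),
    "ssn": ("PII", 5.0),
    "dob": ("PII", 2.0),
    "email": ("PII", 1.0),
}

FLAG_THRESHOLD = 5.0  # assumed score at which an API is flagged

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def score_source(source: str) -> tuple[float, Counter]:
    """Weight each keyword hit by its criticality and sum over all hits."""
    matches = Counter(
        token.lower()
        for token in IDENTIFIER.findall(source)
        if token.lower() in KEYWORDS
    )
    score = sum((KEYWORDS[term][1] * count for term, count in matches.items()), 0.0)
    return score, matches


def is_sensitive(source: str) -> bool:
    """Flag the API when its weighted score reaches the threshold."""
    return score_source(source)[0] >= FLAG_THRESHOLD


PROFILE_HANDLER = "def profile(user):\n    return {'ssn': user.ssn}\n"
CONTACT_HANDLER = "def contact(user):\n    return {'email': user.email}\n"

if __name__ == "__main__":
    for snippet in (PROFILE_HANDLER, CONTACT_HANDLER):
        score, matches = score_source(snippet)
        categories = sorted({KEYWORDS[term][0] for term in matches})
        print(score, dict(matches), categories, is_sensitive(snippet))
```

Returning the matched terms alongside the score lets a reviewer see which identifiers drove a decision. With these assumed weights, two references to `ssn` cross the threshold while two references to `email` do not.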

Customers can review the exact matched terms within the platform, giving them transparency into why something was flagged. And soon, they’ll be able to customize the detection by adding their own sensitive data keywords to tailor the system to their domain-specific needs — whether that’s healthcare, financial services, or beyond.
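Because custom keywords are described as an upcoming capability, any code here is speculative. A minimal sketch, assuming customer-defined terms are simply merged over a default dictionary and that matches are reported back as the exact identifiers found: `DEFAULT_KEYWORDS`, `build_keyword_map`, and `matched_terms` are hypothetical names, and `mrn` (medical record number) and `diagnosis_code` are examples of terms a healthcare team might add.

```python
import re

# Hypothetical default keywords, each mapped to a data category.
DEFAULT_KEYWORDS = {"credit_card": "PCI", "ssn": "PII", "dob": "PII"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_keyword_map(custom: dict[str, str]) -> dict[str, str]:
    """Merge customer-defined keywords over the defaults (custom entries win)."""
    merged = dict(DEFAULT_KEYWORDS)  # copy so the defaults stay unchanged
    merged.update({term.lower(): category for term, category in custom.items()})
    return merged


def matched_terms(source: str, keywords: dict[str, str]) -> dict[str, str]:
    """Return each exact identifier found in the source with its category."""
    return {
        token: keywords[token.lower()]
        for token in IDENTIFIER.findall(source)
        if token.lower() in keywords
    }


HEALTHCARE_TERMS = {"mrn": "PHI", "diagnosis_code": "PHI"}

HANDLER = "def get_patient(mrn, dob):\n    return lookup(mrn=mrn, diagnosis_code=True)\n"

if __name__ == "__main__":
    print(matched_terms(HANDLER, DEFAULT_KEYWORDS))
    print(matched_terms(HANDLER, build_keyword_map(HEALTHCARE_TERMS)))
```

With only the defaults, this handler matches `dob`; once the domain-specific terms are merged in, the patient-record identifiers are surfaced as well.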

StackHawk emphasizes starting “where the code lives.” How important is this shift-left approach for maintaining API security in CI/CD environments? How does this capability integrate with developers’ existing workflows? Does it require separate tooling or can it tie into CI pipelines and version control platforms seamlessly?

The shift-left approach is critical for API security today, especially as modern applications — including those built with AI — are shipping APIs faster than security teams can manually track. StackHawk starts “where the code lives” by integrating directly with source code repositories like GitHub, GitLab, Bitbucket, and Azure Repos. This lets us analyze code pre-deploy to automatically detect where APIs exist and whether they handle sensitive data such as PCI, PII, or HIPAA-regulated fields.

Because this analysis happens continuously, teams gain real-time visibility into newly added APIs before they ever reach production — a huge step forward from traditional approaches that rely on runtime traffic or manual inventory. Security teams can stay ahead of changes, and developers don’t have to do anything differently to surface this insight.

When it comes to testing, StackHawk integrates directly into CI/CD pipelines — so once an API is flagged as high-risk due to sensitive data or authentication patterns, it can be tested immediately as part of a pull request or build. That means security checks happen automatically, in the same workflows developers already use, without slowing them down or requiring yet another tool to manage.
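One way to picture flagged endpoints feeding a pull-request test run is as a priority ordering. This sketch assumes each discovered endpoint carries two booleans (sensitive data detected, authentication required), ranks sensitive endpoints first and unauthenticated ones next, and caps the run with a per-PR budget. The `Endpoint` type, the ranking rule, and `max_tests` are illustrative assumptions, not StackHawk configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    path: str
    handles_sensitive_data: bool
    requires_auth: bool


def risk_rank(ep: Endpoint) -> tuple[bool, bool]:
    """Higher tuples sort first: sensitive data, then lack of authentication."""
    return (ep.handles_sensitive_data, not ep.requires_auth)


def select_for_pr_scan(endpoints: list[Endpoint], max_tests: int) -> list[str]:
    """Pick the highest-risk endpoints to test within a per-PR budget."""
    ranked = sorted(endpoints, key=risk_rank, reverse=True)
    return [ep.path for ep in ranked[:max_tests]]


# Hypothetical endpoints touched by a pull request.
CHANGED = [
    Endpoint("/health", handles_sensitive_data=False, requires_auth=False),
    Endpoint("/transactions", handles_sensitive_data=True, requires_auth=True),
    Endpoint("/export/customers", handles_sensitive_data=True, requires_auth=False),
    Endpoint("/profile", handles_sensitive_data=False, requires_auth=True),
]

if __name__ == "__main__":
    print(select_for_pr_scan(CHANGED, max_tests=2))
```

Putting an unauthenticated endpoint that handles sensitive data at the top reflects the assumption that it is the most exposed; a team could weight these factors differently.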

It’s about making security proactive and continuous, instead of reactive and late.

Do you see Sensitive Data Identification evolving into an automated enforcement tool, such as blocking deployment of APIs handling certain data types unless verified as secure?

While the underlying technology makes enforcement possible, our philosophy at StackHawk is to empower collaboration between security and engineering — not create friction. Automatically blocking deploys based on data type alone could unintentionally disrupt critical workflows, especially in fast-moving development environments.

Instead, we see Sensitive Data Detection as a way to give security teams early, actionable visibility so they can engage with engineering before code hits production. For example, if a new API handling PCI or PII is introduced, security can prioritize getting that endpoint tested — ideally as part of the same PR or CI workflow — and work with the team to ensure the right coverage.

Over time, organizations may choose to adopt enforcement based on their own risk appetite and maturity. But we believe the first, most important step is visibility — and using that insight to build shared accountability around securing what matters most.
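The visibility-first stance, with enforcement as an opt-in step, can be sketched as a CI check with two modes. This hypothetical example assumes endpoint inventories for the base branch and the pull request are already available as path-to-category maps. In the default advisory mode it reports newly introduced sensitive endpoints and lets the build pass; an opt-in enforce mode fails the build instead. None of these names correspond to a real StackHawk setting.

```python
import sys

# Hypothetical inventories: endpoint path -> sensitive data categories detected.
BASE = {"/health": set(), "/transactions": {"PCI"}}
HEAD = {"/health": set(), "/transactions": {"PCI"}, "/payouts": {"PCI", "PII"}}


def new_sensitive_endpoints(
    base: dict[str, set[str]], head: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Endpoints added in this change that handle any sensitive data category."""
    return {path: cats for path, cats in head.items() if path not in base and cats}


def review_gate(
    base: dict[str, set[str]], head: dict[str, set[str]], enforce: bool = False
) -> int:
    """Report new sensitive endpoints and return a CI exit code.

    Advisory mode (the default) always returns 0, so the build proceeds while
    security is notified; enforce mode returns 1 when anything new needs review.
    """
    findings = new_sensitive_endpoints(base, head)
    for path, cats in sorted(findings.items()):
        print(f"needs security review: {path} handles {', '.join(sorted(cats))}")
    return 1 if (enforce and findings) else 0


if __name__ == "__main__":
    sys.exit(review_gate(BASE, HEAD, enforce="--enforce" in sys.argv))
```

Keeping advisory mode as the default matches the collaboration-first approach described above, while the flag leaves room for teams that later decide enforcement fits their risk appetite.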
