Terms and Concepts
This page defines the common terminology and concepts used by HoundDog.ai. Familiarizing yourself with these terms will help you better understand the features and capabilities of our code scanner.
Data Elements
Sensitive Data Elements (or simply Data Elements) refer to any information within a codebase that is considered confidential, private, or critical. They often require special handling - such as masking and encryption - to prevent unauthorized exposure.
Categories
Below are some of the most common categories of sensitive data elements:
Category | Description |
---|---|
PII (Personally Identifiable Information) | Data that can be used to identify an individual, such as full names, physical addresses, email addresses, dates of birth, and Social Security Numbers. |
PIFI (Personally Identifiable Financial Information) | A subset of PII focused on financial data, including credit card numbers, bank account details, and payment history. |
PHI (Protected Health Information) | Medical records, insurance account information, or any data related to an individual's health status, as defined by regulations like HIPAA. |
To ensure privacy and security, it’s essential to identify, classify, and safeguard sensitive data elements such as PII within your codebase.
Sensitivity Levels
Sensitive data elements are categorized into three sensitivity levels: Critical, Medium, and Low. Below are examples for each level.
Sensitivity Level | Examples |
---|---|
Critical | |
Medium | |
Low | |
Data Element Definitions
Sensitive Data Element Definitions (or simply Data Element Definitions) are predefined sets of conditions and match patterns used to detect identifiers in code, such as class names, function names, and variable names, that strongly suggest they handle sensitive data (e.g., User.lastName, get_ssn). HoundDog.ai continuously evolves and curates this collection through advanced workflows and real-world testing.
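To make the idea of a match pattern concrete, here is a minimal sketch of identifier matching with regular expressions. The pattern names and expressions below are hypothetical and chosen only for illustration; they are not HoundDog.ai's actual definitions, which may use additional conditions beyond simple patterns.

```python
import re

# Hypothetical definitions for illustration only; NOT HoundDog.ai's actual rules.
# Each definition maps a data element name to a pattern applied to identifiers.
DEFINITIONS = {
    "last_name": re.compile(r"last_?name|surname", re.IGNORECASE),
    # Lookarounds keep "ssn" from matching inside unrelated words like "classname".
    "ssn": re.compile(r"(?<![a-z])ssn(?![a-z])|social_?security", re.IGNORECASE),
}


def match_data_elements(identifier: str) -> list[str]:
    """Return the names of all definitions whose pattern matches the identifier."""
    return [name for name, pattern in DEFINITIONS.items() if pattern.search(identifier)]


if __name__ == "__main__":
    for ident in ("User.lastName", "get_ssn", "classname"):
        print(ident, "->", match_data_elements(ident))
```

The lookarounds in the `ssn` pattern show why definitions tend to need more than a plain substring check: without them, an unrelated identifier such as `classname` would be flagged.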
Vulnerabilities
Vulnerabilities are flaws or weaknesses in software that can be exploited by attackers to gain unauthorized access to sensitive data elements, such as PII. They often stem from design flaws, coding errors, or configuration mistakes.
HoundDog.ai also classifies unauthorized or unintended dataflows as vulnerabilities, as they can pose significant privacy and security risks.
Dataflows
Dataflows refer to the movement of sensitive data elements such as PII, PIFI, or PHI through various parts of a codebase, particularly when passed into data sinks. Understanding and monitoring potentially vulnerable dataflows is crucial for identifying where sensitive information might be exposed, mishandled, or insufficiently protected.
Data Sinks
Data sinks are endpoints or destinations where data leaves its original context - such as logs, cookies, JWTs, external APIs, databases, third-party services, or user interfaces. Common examples include Datadog, AWS S3, PostgreSQL, Sentry, Firebase, and Snowflake. These are often the final stop for data before it's stored, transmitted, or displayed, making them critical points for enforcing security and privacy controls.
While some data sinks offer PII scrubbing capabilities, they typically rely on sampling and broad pattern matching that lack the specificity or flexibility needed for different use cases. Scrubbing at the sink is also reactive and often costly, making it an inefficient last line of defense. HoundDog.ai takes a proactive approach by detecting risky dataflows early in the development cycle, long before any changes reach production.
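As an illustration of a risky dataflow, the sketch below passes a sensitive data element into a log sink, first in cleartext and then masked at the source. The function names, the `user` dictionary shape, and the masking format are assumptions made for this example; only Python's standard `logging` module is used.

```python
import io
import logging


def mask_ssn(ssn: str) -> str:
    """Keep only the last four characters of an SSN-like string."""
    return "***-**-" + ssn[-4:]


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Create a logger that writes to an in-memory stream (stands in for a log sink)."""
    logger = logging.getLogger("signup_example")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_signup_unsafe(logger: logging.Logger, user: dict) -> None:
    # Risky dataflow: the raw SSN reaches the log sink in cleartext.
    logger.info("signup: name=%s ssn=%s", user["name"], user["ssn"])


def log_signup_masked(logger: logging.Logger, user: dict) -> None:
    # Safer dataflow: the value is masked before it reaches the sink.
    logger.info("signup: ssn=%s", mask_ssn(user["ssn"]))


if __name__ == "__main__":
    sink = io.StringIO()
    log = build_logger(sink)
    sample = {"name": "Ada", "ssn": "123-45-6789"}  # made-up sample data
    log_signup_unsafe(log, sample)
    log_signup_masked(log, sample)
    print(sink.getvalue())
```

Masking before the logging call means the protection does not depend on whatever scrubbing the sink may or may not apply later.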
CWEs
HoundDog.ai's mission is to shift left and empower organizations to prevent and eliminate vulnerabilities at the source code level, with a primary focus on PII (Personally Identifiable Information). Here are some of the most common CWEs (Common Weakness Enumeration entries) our scanner covers extensively:
CWE | Description |
---|---|
CWE-201 | Insertion of Sensitive Information Into Sent Data |
CWE-209 | Generation of Error Message Containing Sensitive Information |
CWE-312 | Cleartext Storage of Sensitive Information |
CWE-313 | Cleartext Storage in a File or on Disk |
CWE-315 | Cleartext Storage of Sensitive Information in a Cookie |
CWE-532 | Insertion of Sensitive Information into Log File |
CWE-539 | Use of Persistent Cookies Containing Sensitive Information |
Severity Levels
HoundDog.ai associates each vulnerability (i.e., vulnerable dataflow) with one or more data elements and classifies it into one of three severity levels: Critical, Medium, or Low. The severity level of a vulnerability is the highest sensitivity level among its associated data elements.
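The "highest sensitivity wins" rule can be sketched in a few lines. The data element names and their sensitivity assignments below are hypothetical placeholders for illustration; they are not taken from HoundDog.ai's actual classification.

```python
from enum import IntEnum


class Level(IntEnum):
    """Ordered levels, so that max() picks the most severe one."""
    LOW = 1
    MEDIUM = 2
    CRITICAL = 3


# Hypothetical sensitivity assignments, for illustration only.
SENSITIVITY = {
    "ssn": Level.CRITICAL,
    "email": Level.MEDIUM,
    "zip_code": Level.LOW,
}


def severity(data_elements: list[str]) -> Level:
    """Return the highest sensitivity level among the given data elements."""
    if not data_elements:
        raise ValueError("a vulnerability must have at least one data element")
    return max(SENSITIVITY[element] for element in data_elements)


if __name__ == "__main__":
    print(severity(["zip_code", "email"]).name)
    print(severity(["email", "ssn"]).name)
```

Under these assumed assignments, a dataflow carrying both an email address and an SSN is rated Critical, because a single Critical element is enough to raise the severity of the whole vulnerability.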