Terms and Concepts
This page defines the common terminology and concepts used by HoundDog.ai. Familiarizing yourself with these terms will help you better understand the features and capabilities of our code scanner.
Data Elements
Sensitive Data Elements (or simply Data Elements) refer to any information within a codebase that is considered confidential, private, or critical. They often require special handling - such as masking and encryption - to prevent unauthorized exposure.
Categories
Below are some of the most common categories of sensitive data elements:
Category | Description |
---|---|
PII (Personally Identifiable Information) | Data that can be used to identify an individual, such as full names, physical addresses, email addresses, dates of birth, and Social Security Numbers. |
PIFI (Personally Identifiable Financial Information) | A subset of PII focused on financial data, including credit card numbers, bank account details, and payment history. |
PHI (Protected Health Information) | Medical records, insurance account information, or any data related to an individual's health status, as defined by regulations like HIPAA. |
CHD (Cardholder Data) | Payment card information defined under PCI DSS. It includes the Primary Account Number (PAN) and may also include the cardholder name, expiration date, and service code. |
To ensure privacy and security, it’s essential to identify, classify, and safeguard sensitive data elements such as PII within your codebase.
Sensitivity Levels
Sensitive data elements are categorized into three sensitivity levels: Critical, Medium, and Low. A data element's sensitivity level determines the severity of any vulnerability it is associated with (see Severity Levels below).
Data Element Definitions
Sensitive Data Element Definitions (or simply Data Element Definitions) are predefined sets of conditions and match patterns used to detect identifiers in code, such as class names, function names, and variable names, that strongly suggest they handle sensitive data (e.g., `User.lastName`, `get_ssn`). HoundDog.ai continuously curates and evolves this collection through advanced workflows and real-world testing, and users can also define custom data elements to detect organization-specific sensitive data types.
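To make the idea concrete, here is a rough Python sketch of how match patterns might be evaluated against identifiers found in code. The definition structure, field names, and pattern syntax are illustrative assumptions only, not HoundDog.ai's actual definition format:

```python
import re

# Hypothetical illustration only: a simplified "data element definition"
# pairing a data element with identifier match patterns. The real
# HoundDog.ai definition format and pattern syntax may differ.
SSN_DEFINITION = {
    "name": "Social Security Number",
    "sensitivity": "critical",
    "patterns": [
        re.compile(r"(?i)(?<![a-z0-9])ssn(?![a-z0-9])"),
        re.compile(r"(?i)social[_ ]?security[_ ]?(number|num)?"),
    ],
}

def matches_definition(identifier: str, definition: dict) -> bool:
    """Return True if a class/function/variable name matches any pattern."""
    return any(p.search(identifier) for p in definition["patterns"])

print(matches_definition("get_ssn", SSN_DEFINITION))              # True
print(matches_definition("socialSecurityNumber", SSN_DEFINITION)) # True
print(matches_definition("order_total", SSN_DEFINITION))          # False
```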
Vulnerabilities
Vulnerabilities (with our current focus on sensitive data leaks) are flaws or weaknesses in software that expose sensitive information through risky mediums. These exposures can lead to privacy violations and increase the likelihood of data breaches through lateral movement. Common causes include design flaws, coding errors, and configuration mistakes, such as leaking tainted variables into logs, files, local storage, or third-party and AI SDKs, or unintentionally dumping entire objects containing sensitive data. AI-generated code can also introduce these mistakes.
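As a concrete illustration of one common mistake, the sketch below (the `User` model and handler are hypothetical) shows a log statement that dumps an entire object, leaking every sensitive field it contains into the logs (CWE-532):

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class User:  # hypothetical model for illustration
    user_id: int
    email: str
    ssn: str
    last_name: str

def handle_signup(user: User) -> None:
    # Vulnerable: %s formats the whole dataclass, writing the email, SSN,
    # and last name into the log stream in plaintext (CWE-532).
    logger.info("New signup: %s", user)

    # Safer: log only a non-sensitive identifier.
    logger.info("New signup for user_id=%s", user.user_id)
```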
At HoundDog.ai, unintended dataflows are classified as vulnerabilities because they pose significant privacy and security risks.
Dataflows
Dataflows refer to the movement of sensitive information, such as PII, PHI, PIFI, CHD, or authentication tokens, through different parts of a codebase, particularly when passed into data sinks. Understanding and monitoring potentially vulnerable dataflows is crucial for identifying where sensitive information might be exposed or mishandled.
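Conceptually, a risky dataflow is a path from the point where a sensitive value enters the code to a data sink, even when the value passes through intermediate functions along the way. The simplified Python sketch below (the functions and endpoint are hypothetical) traces such a path:

```python
# Hypothetical sketch of a dataflow: the SSN originates in one function,
# is carried inside a dict and a JSON payload, and finally reaches a data
# sink (an outbound third-party request) several calls later.
import json
import urllib.request

def load_profile(form: dict) -> dict:
    # Source: the sensitive value enters the program.
    return {"name": form["name"], "ssn": form["ssn"]}

def build_payload(profile: dict) -> bytes:
    # Propagation: the tainted value is carried along unchanged.
    return json.dumps(profile).encode("utf-8")

def send_to_analytics(payload: bytes) -> None:
    # Sink: the payload, SSN included, leaves the application's context.
    req = urllib.request.Request(
        "https://analytics.example.com/events", data=payload
    )
    urllib.request.urlopen(req)

def register(form: dict) -> None:
    send_to_analytics(build_payload(load_profile(form)))
```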
Data Sinks
Data sinks are endpoints or destinations where data leaves its original context - such as logs, cookies, JWTs, external APIs, databases, third-party services, or user interfaces. Common examples include OpenAI, Anthropic, LangChain, Datadog, AWS S3, PostgreSQL, and Sentry. These are often the final stop for data before it's stored, transmitted, or displayed, making them critical points for enforcing security and privacy controls.
While some data sinks offer PII scrubbing capabilities, they typically rely on sampling and broad pattern matching that lack the specificity or flexibility needed for different use cases. Scrubbing at the sink is also reactive and often costly, making it an inefficient last line of defense. HoundDog.ai takes a proactive approach by detecting risky dataflows early in the development cycle, long before any changes reach production.
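As a minimal sketch of that proactive approach, sensitive fields can be masked in application code before a record is handed to any sink; the field names and masking rule below are assumptions for illustration:

```python
# Illustrative only: mask sensitive fields before handing a record to any
# sink (logger, third-party SDK, storage), rather than relying on the sink
# to scrub it afterwards. The field names are assumptions.
SENSITIVE_FIELDS = {"ssn", "credit_card_number", "date_of_birth"}

def mask_record(record: dict) -> dict:
    return {
        key: "***REDACTED***" if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

record = {"plan": "pro", "ssn": "123-45-6789"}
safe = mask_record(record)
# safe == {"plan": "pro", "ssn": "***REDACTED***"}
```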
Severity Levels
HoundDog.ai associates each vulnerability (i.e., a vulnerable dataflow) with one or more data elements and classifies it into one of three severity levels: Critical, Medium, or Low. The severity level of a vulnerability is determined by the highest sensitivity level among its associated data elements.
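As a sketch of that rule, the severity of a vulnerability is simply the maximum sensitivity across the data elements it carries:

```python
# Sketch of the severity rule: a vulnerability's severity is the highest
# sensitivity level among its associated data elements.
SENSITIVITY_RANK = {"low": 0, "medium": 1, "critical": 2}

def vulnerability_severity(element_sensitivities: list[str]) -> str:
    return max(element_sensitivities, key=SENSITIVITY_RANK.__getitem__)

print(vulnerability_severity(["low", "medium"]))              # medium
print(vulnerability_severity(["medium", "critical", "low"]))  # critical
```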
Privacy Risks
The following are key examples of privacy violations that HoundDog.ai detects and prevents. Each issue highlights how sensitive data can be mishandled across different mediums and why it results in violations of global privacy and security frameworks.
Issue | Violated Frameworks | Reason for Violation |
---|---|---|
Sensitive data in logs (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | The exposure of sensitive data in logs is especially risky because logs are often ingested by monitoring and SIEM tools (e.g., Datadog, Splunk). When sensitive data is detected, it is considered a disruptive incident, often requiring up to 100 hours of remediation. This typically involves scrubbing logs, stopping ingestion, and working retroactively with engineering teams to fix the code that caused the leak. |
Sensitive data in files (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | While files may have strong access controls limiting who can access them, storing sensitive data in plaintext significantly increases the risk of data exfiltration. This may occur through lateral movement (for example, a hacker already in the network with elevated permissions) or through insider data theft. |
Sensitive data in cookies (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Identifiers exposed without encryption or valid consent are clear privacy violations. Understanding what sensitive data is exposed in cookies is a critical part of privacy compliance. |
Sensitive data in local storage (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sensitive data stored in local storage may be accessible by client-side scripts and is often unencrypted, creating clear privacy violations. |
Sensitive data in JWT tokens (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Tokens grant access rights, and JWT payloads are encoded rather than encrypted, so sensitive data placed in them is effectively stored in plaintext and readable by anyone holding the token (see the sketch after this table). |
Sensitive data in third-party integrations (beyond DPA) | GDPR (Art. 5, 28), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sharing sensitive data beyond the scope of a Data Processing Agreement or Privacy Notice breaches both consent and purpose limitation rules. This is one of the most overlooked types of violations, as applications often include hidden abstractions and SDKs that expose data to third-party and AI integrations. |
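To illustrate the JWT row above: a standard signed JWT is base64url-encoded, not encrypted, so any sensitive claim placed in its payload can be read without a key by anyone who holds the token. The claims below are made up for the sketch:

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Hypothetical claims: putting the SSN here is the mistake being illustrated.
claims = {"sub": "user-123", "ssn": "123-45-6789"}
payload = b64url(json.dumps(claims).encode())  # middle segment of a JWT

# Anyone holding the token can decode the payload without any secret key:
padded = payload + "=" * (-len(payload) % 4)
print(json.loads(base64.urlsafe_b64decode(padded)))
# {'sub': 'user-123', 'ssn': '123-45-6789'}
```

A safer pattern is to keep only an opaque identifier in the token and look up sensitive attributes server-side.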
CWEs
HoundDog.ai's mission is to shift left and empower organizations to prevent and eliminate vulnerabilities at the source code level, with a primary focus on PII (Personally Identifiable Information). Here are some of the most common CWEs (Common Weakness Enumerations) our scanner covers extensively:
CWE | Description |
---|---|
CWE-201 | Insertion of Sensitive Information Into Sent Data |
CWE-209 | Generation of Error Message Containing Sensitive Information |
CWE-312 | Cleartext Storage of Sensitive Information |
CWE-313 | Cleartext Storage in a File or on Disk |
CWE-315 | Cleartext Storage of Sensitive Information in a Cookie |
CWE-532 | Insertion of Sensitive Information into Log File |
CWE-539 | Use of Persistent Cookies Containing Sensitive Information |