Terms and Concepts
This page defines the common terminology and concepts used by HoundDog.ai. Familiarizing yourself with these terms will help you better understand the features and capabilities of our code scanner.
Data Elements
Sensitive Data Elements (or simply Data Elements) refer to any information within a codebase that is considered confidential, private, or critical. They often require special handling, such as masking and encryption, to prevent unauthorized exposure.
Categories
Below are some of the most common categories of sensitive data elements:
| Category | Description |
|---|---|
| PII (Personally Identifiable Information) | Data that can be used to identify an individual, such as full names, physical addresses, email addresses, dates of birth, and Social Security Numbers. |
| PIFI (Personally Identifiable Financial Information) | A subset of PII focused on financial data, including credit card numbers, bank account details, and payment history. |
| PHI (Protected Health Information) | Medical records, insurance account information, or any data related to an individual's health status, as defined by regulations like HIPAA. |
| CHD (Cardholder Data) | Payment card information defined under PCI DSS. It includes the Primary Account Number (PAN) and may also include the cardholder name, expiration date, and service code. |
To ensure privacy and security, it’s essential to identify, classify, and safeguard sensitive data elements such as PII within your codebase.
Sensitivity Levels
Sensitive data elements are categorized into three sensitivity levels: Critical, Medium, and Low. Below are examples for each level.
| Sensitivity Level | Examples |
|---|---|
| Critical | |
| Medium | |
| Low | |
Data Element Definitions
Sensitive Data Element Definitions (or simply Data Elements) are predefined sets of conditions and match patterns used to detect code identifiers (class names, function names, and variable names) whose naming strongly suggests that they handle sensitive data (e.g., User.lastName, get_ssn). HoundDog.ai continuously curates and extends this collection through advanced workflows and real-world testing, and users can also define custom data elements to detect organization-specific sensitive data types.
A complete list of supported data elements is available in the scanner’s GitHub repository: https://github.com/hounddogai/hounddog/blob/main/data-elements.md
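To make the idea of matching on identifier names concrete, here is a minimal Python sketch. It is not HoundDog.ai's actual rule format or engine: the pattern names, the regular expressions, and the helper functions `normalize` and `match_data_elements` are all hypothetical, chosen only to show how names like User.lastName and get_ssn could be recognized while unrelated names are ignored.

```python
import re

# Hypothetical definitions for illustration only; these are NOT the
# scanner's real rules. Each pattern expects a snake_case identifier.
DATA_ELEMENT_PATTERNS = {
    "ssn": re.compile(r"(^|_)(ssn|social_security(_number)?)(_|$)"),
    "last_name": re.compile(r"(^|_)(last_name|surname|family_name)(_|$)"),
    "email": re.compile(r"(^|_)e?mail(_address)?(_|$)"),
}


def normalize(identifier: str) -> str:
    """Convert dotted and camelCase identifiers to lowercase snake_case."""
    name = identifier.replace(".", "_")
    # Insert an underscore at each lowercase/digit -> uppercase boundary.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.lower()


def match_data_elements(identifier: str) -> list[str]:
    """Return the names of all hypothetical data elements the identifier matches."""
    normalized = normalize(identifier)
    return [
        element
        for element, pattern in DATA_ELEMENT_PATTERNS.items()
        if pattern.search(normalized)
    ]
```

Under these assumed patterns, `match_data_elements("User.lastName")` yields `["last_name"]` and `match_data_elements("get_ssn")` yields `["ssn"]`, while `session_id` matches nothing because the patterns require word boundaries around `ssn`.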
Dataflows
Dataflows represent how your application code defines the movement of sensitive information from custom-built applications to downstream data sinks such as logs, files, APIs, third-party services, and AI integrations. This includes data types like PII, PHI, PIFI, CHD, and authentication tokens. By analyzing code paths and data handling logic, HoundDog.ai identifies how data is collected, transformed, and ultimately exposed to these destinations.
The Dataflows page provides a consolidated view of sensitive dataflows detected by the HoundDog.ai code scanner. It shows how data moves from your applications to different data sinks, creating code-level evidence of processing activities. This helps keep privacy reports such as Records of Processing Activities (RoPA) up to date for GDPR, CCPA, and other regulatory frameworks.
Risky data flows are rarely intentional. They tend to emerge as codebases grow. A developer may log a full user object for debugging, or a tainted variable may carry sensitive data across multiple transformations. By the time the issue is identified, the data may already be stored or shared with a third party.
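The two scenarios described above can be sketched in a few lines of Python. The `User` class, the field names, and the `LOGGABLE_FIELDS` allowlist are assumptions made for this example, and the functions return the log line instead of writing it so the difference is easy to inspect.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    id: int
    email: str
    ssn: str


# Hypothetical allowlist of fields considered safe to log.
LOGGABLE_FIELDS = {"id"}


def risky_log_line(user: User) -> str:
    # The default dataclass repr includes every field, so email and SSN leak.
    return f"created user {user}"


def tainted_log_line(user: User) -> str:
    # The SSN is renamed and serialized along the way, but the same value
    # still reaches the log line at the end.
    record = {"contact": user.email.lower(), "tax_ref": user.ssn}
    payload = json.dumps(record)
    return f"sync payload={payload}"


def safe_log_line(user: User) -> str:
    # Log only allowlisted fields instead of the whole object.
    visible = {k: v for k, v in asdict(user).items() if k in LOGGABLE_FIELDS}
    return f"created user {visible}"
```

In the tainted case the variable is no longer called `ssn` by the time it is logged, which is why this kind of leak is easy to miss in review and is better caught by following the value through the code path.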
Data Sinks
Data sinks are endpoints or destinations where data leaves its original context, such as logs, cookies, JWTs, external APIs, databases, third-party services, or user interfaces. Common examples include OpenAI, Anthropic, LangChain, Datadog, AWS S3, PostgreSQL, and Sentry. These are often the final stop for data before it's stored, transmitted, or displayed, making them critical points for enforcing security and privacy controls.
A complete list of supported data sinks, organized by programming language, is available in the scanner’s GitHub repository: https://github.com/hounddogai/hounddog/blob/main/data-sinks.md
While some data sinks offer PII scrubbing capabilities, they typically rely on sampling and broad pattern matching that lack the specificity or flexibility needed for different use cases. Scrubbing at the sink is also reactive and often costly, making it an inefficient last line of defense. HoundDog.ai takes a proactive approach by detecting risky dataflows early in the development cycle, long before any changes reach production.
Severity Levels
Each dataflow is assigned a severity level of Critical, Medium, Low, or Info. Severity is determined by two factors: the sensitivity of the data elements involved, which can be customized on the Data Elements page, and the destination of the data, including any allowlist rules defined on the Data Sinks page.
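The exact scoring rules are not spelled out here, so the following Python sketch shows only one plausible way the two inputs could combine. Everything in it is an assumption: the `Level` enum, the sample sensitivity assignments, the sink names, and the rule that an allowlisted element-and-sink pair is excluded from scoring.

```python
from enum import IntEnum


class Level(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    CRITICAL = 3


# Hypothetical sensitivity assignments (not the product's defaults).
SENSITIVITY = {
    "ssn": Level.CRITICAL,
    "email": Level.MEDIUM,
    "zip_code": Level.LOW,
}

# Hypothetical allowlist: (data element, sink) pairs that are approved.
ALLOWLIST = {("email", "internal_crm")}


def dataflow_severity(elements: list[str], sink: str) -> Level:
    """Assumed rule: take the highest sensitivity among non-allowlisted elements.

    If every element is allowlisted for this sink, fall back to INFO.
    """
    flagged = [
        SENSITIVITY[element]
        for element in elements
        if (element, sink) not in ALLOWLIST
    ]
    return max(flagged, default=Level.INFO)
```

With these assumed values, an SSN and an email flowing to a `logs` sink would rate as Critical, while an email flowing to the allowlisted `internal_crm` sink would drop to Info.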
Privacy Risks
The following are key examples of privacy violations that HoundDog.ai detects and prevents. Each issue highlights how sensitive data can be mishandled across different mediums and why it results in violations of global privacy and security frameworks.
| Issue | Violated Frameworks | Reason for Violation |
|---|---|---|
| Sensitive data in logs (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | The exposure of sensitive data in logs is especially risky because logs are often ingested by monitoring and SIEM tools (e.g., Datadog, Splunk). When sensitive data is found in logs, it is treated as a disruptive incident, often requiring up to 100 hours of remediation. This typically involves scrubbing logs, stopping ingestion, and working retroactively with engineering teams to fix the code that caused the leak. |
| Sensitive data in files (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | While files may have strong access controls limiting who can access them, storing sensitive data in plaintext significantly increases the risk of data exfiltration. This may occur through lateral movement (for example, a hacker already in the network with elevated permissions) or through insider data theft. |
| Sensitive data in cookies (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Identifiers exposed without encryption or valid consent are clear privacy violations. Understanding what sensitive data is exposed in cookies is a critical part of privacy compliance. |
| Sensitive data in local storage (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sensitive data stored in local storage may be accessible by client-side scripts and is often unencrypted, creating clear privacy violations. |
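| Sensitive data in JWTs (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Standard signed JWTs are only Base64URL-encoded, not encrypted, so any sensitive data placed in their claims is readable by anyone who obtains the token. Tokens also grant access rights, so exposing them in plaintext poses a high security and privacy risk. |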
| Sensitive data in third-party integrations (beyond DPA) | GDPR (Art. 5, 28), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sharing sensitive data beyond the scope of a Data Processing Agreement or Privacy Notice breaches both consent and purpose limitation rules. This is one of the most overlooked types of violations, as applications often include hidden abstractions and SDKs that expose data to third-party and AI integrations. |
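The JWT row in the table above is easy to underestimate, because a signed token looks opaque. The sketch below uses only the Python standard library to build an HS256-signed token and then read its claims back without the signing key. The helper names, the claim names, and the secret are hypothetical; production code would normally use a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a minimal HS256-signed JWT (illustration only)."""
    header_segment = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_segment = b64url(json.dumps(claims).encode())
    signing_input = f"{header_segment}.{payload_segment}"
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def read_claims_without_key(token: str) -> dict:
    """Decode the payload segment; no secret is needed to read it."""
    payload_segment = token.split(".")[1]
    # Restore the padding that was stripped during encoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

The signature protects the token against tampering, not against reading: anyone who sees the token in a log, a browser, or a proxy can recover whatever was placed in its claims.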
CWEs
HoundDog.ai's mission is to shift left and empower organizations to prevent and eliminate vulnerabilities at the source code level, with a primary focus on PII (Personally Identifiable Information). Here are some of the most common CWEs (Common Weakness Enumerations) that our scanner covers extensively:
| CWE | Description |
|---|---|
| CWE-201 | Insertion of Sensitive Information Into Sent Data |
| CWE-209 | Generation of Error Message Containing Sensitive Information |
| CWE-312 | Cleartext Storage of Sensitive Information |
| CWE-313 | Cleartext Storage in a File or on Disk |
| CWE-315 | Cleartext Storage of Sensitive Information in a Cookie |
| CWE-532 | Insertion of Sensitive Information into Log File |
| CWE-539 | Use of Persistent Cookies Containing Sensitive Information |
© 2025 HoundDog.ai, Inc. All rights reserved.