Terms and Concepts

This page defines the common terminology and concepts used by HoundDog.ai. Familiarizing yourself with these terms will help you better understand the features and capabilities of our code scanner.

Data Elements

Sensitive Data Elements (or simply Data Elements) refer to any information within a codebase that is considered confidential, private, or critical. They often require special handling, such as masking or encryption, to prevent unauthorized exposure.

Categories

Below are some of the most common categories of sensitive data elements:

Category | Description
PII (Personally Identifiable Information) | Data that can be used to identify an individual, such as full names, physical addresses, email addresses, dates of birth, and Social Security Numbers.
PIFI (Personally Identifiable Financial Information) | A subset of PII focused on financial data, including credit card numbers, bank account details, and payment history.
PHI (Protected Health Information) | Medical records, insurance account information, or any data related to an individual's health status, as defined by regulations like HIPAA.
CHD (Cardholder Data) | Payment card information defined under PCI DSS. It includes the Primary Account Number (PAN) and may also include the cardholder name, expiration date, and service code.

To ensure privacy and security, it’s essential to identify, classify, and safeguard sensitive data elements such as PII within your codebase.

Sensitivity Levels

Sensitive data elements are categorized into three sensitivity levels: Critical, Medium, and Low. Below are examples for each level.

Sensitivity Level | Examples
Critical | Social Security Numbers (SSNs); cardholder data; medical diagnoses, history, and related information
Medium | Physical addresses; IP addresses
Low | Usernames; dates of birth; first and last names

Data Element Definitions

Sensitive Data Element Definitions (or simply Data Elements) are predefined sets of conditions and match patterns. They detect identifiers in code, such as class names, function names, and variable names, whose naming strongly suggests that they handle sensitive data (e.g., User.lastName, get_ssn). HoundDog.ai continuously curates and extends this collection through real-world testing, and users can also define custom data elements to detect organization-specific sensitive data types.

A complete list of supported data elements is available in the scanner’s GitHub repository: https://github.com/hounddogai/hounddog/blob/main/data-elements.md
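
To make the idea of a match pattern concrete, here is a minimal Python sketch of name-based detection. It is an illustration under stated assumptions, not the scanner's implementation: the `DataElementDefinition` class, the two regular expressions, and the `normalize` helper are all hypothetical, and the real definitions are broader and combine patterns with additional conditions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementDefinition:
    """Hypothetical shape of a definition (not HoundDog.ai's actual schema)."""
    name: str
    sensitivity: str
    pattern: re.Pattern


# Illustrative patterns only. Each one matches a whole snake_case token,
# so "ssn" matches in "get_ssn" but not inside "assn_total".
DEFINITIONS = [
    DataElementDefinition(
        "ssn", "critical",
        re.compile(r"(^|_)(ssn|social_security(_number)?)($|_)"),
    ),
    DataElementDefinition(
        "last_name", "low",
        re.compile(r"(^|_)(last_name|surname)($|_)"),
    ),
]


def normalize(identifier: str) -> str:
    """Convert dotted and camelCase identifiers to snake_case."""
    dotted = identifier.replace(".", "_")
    # Insert "_" at every lowercase/digit -> uppercase boundary.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", dotted)
    return snake.lower()


def match_data_elements(identifier: str) -> list[str]:
    """Return the names of all definitions whose pattern matches."""
    normalized = normalize(identifier)
    return [d.name for d in DEFINITIONS if d.pattern.search(normalized)]
```

With these assumed patterns, `match_data_elements("User.lastName")` and `match_data_elements("get_ssn")` each return one match, while an unrelated identifier such as `assn_total` returns an empty list.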

Dataflows

Dataflows represent how your application code defines the movement of sensitive information from custom-built applications to downstream data sinks such as logs, files, APIs, third-party services, and AI integrations. This includes data types like PII, PHI, PIFI, CHD, and authentication tokens. By analyzing code paths and data handling logic, HoundDog.ai identifies how data is collected, transformed, and ultimately exposed to these destinations.

The Dataflows page provides a consolidated view of the sensitive dataflows detected by the HoundDog.ai code scanner. It shows how data moves from your applications to different data sinks, creating code-level evidence of processing activities. This helps keep privacy reports such as Records of Processing Activities up to date for GDPR, CCPA, and other regulatory frameworks.

Risky data flows are rarely intentional. They tend to emerge as codebases grow. A developer may log a full user object for debugging, or a tainted variable may carry sensitive data across multiple transformations. By the time the issue is identified, the data may already be stored or shared with a third party.
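
The debugging scenario above can be sketched in a few lines of Python. This is only an illustration: the `user` record is made up, and the `mask` and `make_logger` helpers are hypothetical names introduced for this example. The first call sends the whole object, including the SSN, to the log sink; the second logs only the fields it needs and masks the sensitive one.

```python
import io
import logging


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def make_logger(stream: io.StringIO) -> logging.Logger:
    """Return a demo logger that writes only to the given in-memory stream."""
    logger = logging.getLogger("dataflow_demo")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


user = {"id": 42, "last_name": "Doe", "ssn": "123-45-6789"}  # made-up record

# Risky: the full user object, SSN included, flows into the log sink.
risky_out = io.StringIO()
make_logger(risky_out).debug("loaded user: %s", user)

# Safer: log only what is needed, with the sensitive field masked.
safe_out = io.StringIO()
make_logger(safe_out).debug(
    "loaded user id=%s ssn=%s", user["id"], mask(user["ssn"])
)
```

In a real service the handler would forward to a log aggregator rather than an in-memory stream, which is why the first pattern is hard to undo once it ships.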

Data Sinks

Data sinks are endpoints or destinations where data leaves its original context, such as logs, cookies, JWTs, external APIs, databases, third-party services, or user interfaces. Common examples include OpenAI, Anthropic, LangChain, Datadog, AWS S3, PostgreSQL, and Sentry. These are often the final stop for data before it's stored, transmitted, or displayed, making them critical points for enforcing security and privacy controls.

A complete list of supported data sinks, organized by programming language, is available in the scanner’s GitHub repository: https://github.com/hounddogai/hounddog/blob/main/data-sinks.md

While some data sinks offer PII scrubbing capabilities, they typically rely on sampling and broad pattern matching that lack the specificity or flexibility needed for different use cases. Scrubbing at the sink is also reactive and often costly, making it an inefficient last line of defense. HoundDog.ai takes a proactive approach by detecting risky dataflows early in the development cycle, long before any changes reach production.

Severity Levels

Each dataflow is assigned a severity level of Critical, Medium, Low, or Info. Severity is determined by two factors: the sensitivity of the data elements involved, which can be customized on the Data Elements page, and the destination of the data, including any allowlist rules defined on the Data Sinks page.
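
The exact scoring rules are not described on this page, so the following Python sketch is only one plausible reading of how the two inputs could combine. Every rule in it is an assumption made for illustration: the `SENSITIVITY` and `ALLOWLIST` tables, the choice to take the highest sensitivity among the elements, the downgrade to Info when every element is allowlisted for the sink, and the Low default for unknown elements.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    CRITICAL = 3


# Hypothetical sensitivity assignments, mirroring the examples on this page.
SENSITIVITY = {
    "ssn": Severity.CRITICAL,
    "physical_address": Severity.MEDIUM,
    "ip_address": Severity.MEDIUM,
    "username": Severity.LOW,
}

# Hypothetical allowlist: sink name -> data elements approved for that sink.
ALLOWLIST = {"postgresql": {"username", "physical_address"}}


def dataflow_severity(elements: set[str], sink: str) -> Severity:
    """Assumed rule: highest sensitivity among non-allowlisted elements."""
    allowed = ALLOWLIST.get(sink, set())
    flagged = [
        SENSITIVITY.get(element, Severity.LOW)  # unknown elements: assume Low
        for element in elements
        if element not in allowed
    ]
    # If every element is allowlisted for this sink, report Info.
    return max(flagged, default=Severity.INFO)
```

Under these assumptions, an SSN flowing to an unlisted sink is Critical, while a username flowing to the allowlisted PostgreSQL sink is reported as Info.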

Privacy Risks

The following are key examples of privacy violations that HoundDog.ai detects and prevents. Each issue highlights how sensitive data can be mishandled across different mediums and why it results in violations of global privacy and security frameworks.

Issue | Violated Frameworks | Reason for Violation
Sensitive data in logs (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | The exposure of sensitive data in logs is especially risky because logs are often ingested by monitoring and SIEM tools (e.g., Datadog, Splunk). When sensitive data is detected, it is considered a disruptive incident, often requiring up to 100 hours of remediation. This typically involves scrubbing logs, stopping ingestion, and working retroactively with engineering teams to fix the code that caused the leak.
Sensitive data in files (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | While files may have strong access controls limiting who can access them, storing sensitive data in plaintext significantly increases the risk of data exfiltration. This may occur through lateral movement (for example, a hacker already in the network with elevated permissions) or through insider data theft.
Sensitive data in cookies (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Identifiers exposed without encryption or valid consent are clear privacy violations. Understanding what sensitive data is exposed in cookies is a critical part of privacy compliance.
Sensitive data in local storage (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sensitive data stored in local storage may be accessible by client-side scripts and is often unencrypted, creating clear privacy violations.
Sensitive data in JWT tokens (plaintext) | GDPR (Art. 5, 32), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Tokens grant access rights, and storing them in plaintext poses a high security and privacy risk.
Sensitive data in third party integrations (beyond DPA) | GDPR (Art. 5, 28), CCPA/CPRA, HIPAA, PCI, GLBA, PIPEDA, APPI, NIST 800-53, ISO/IEC 29100, KSA PDPL, UAE PDPL, Qatar PDPPL | Sharing sensitive data beyond the scope of a Data Processing Agreement or Privacy Notice breaches both consent and purpose limitation rules. This is one of the most overlooked types of violations, as applications often include hidden abstractions and SDKs that expose data to third party and AI integrations.
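
The JWT row deserves a concrete illustration, because a signed JWT is easy to mistake for an encrypted one. In the common signed form, the payload is only base64url-encoded, so anyone who holds the token can read its claims without the signing key. The sketch below uses only the Python standard library; the claim names, the sample SSN, and the secret are made up for the example.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed JWT using only the standard library."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def read_claims_without_key(token: str) -> dict:
    """Decode the payload segment; no secret is needed to read it."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Made-up claims: the signature protects integrity, not confidentiality.
token = sign_jwt({"sub": "42", "ssn": "123-45-6789"}, secret=b"demo-secret")
leaked = read_claims_without_key(token)
```

Here `leaked` contains the SSN even though the reader never saw the secret, which is why sensitive data elements placed in token claims count as plaintext exposure.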

CWEs

HoundDog.ai's mission is to shift left and empower organizations to prevent and eliminate vulnerabilities at the source code level, with a primary focus on PII (Personally Identifiable Information). Here are some of the most common CWEs (Common Weakness Enumeration entries) that our scanner covers extensively:

CWE | Description
CWE-201 | Information Exposure Through Sent Data
CWE-209 | Information Exposure Through an Error Message
CWE-312 | Cleartext Storage of Sensitive Information
CWE-313 | Cleartext Storage in a File or on Disk
CWE-315 | Cleartext Storage of Sensitive Information in a Cookie
CWE-532 | Insertion of Sensitive Information into Log File
CWE-539 | Use of Persistent Cookies Containing Sensitive Information
