
FEATURE: Local Data Redaction for PHI (HIPAA Privacy Compliance) #60

@RishiGoswami-code


Feature and its Use Cases

Feature Request: Local Data Redaction for PHI (HIPAA Privacy Compliance)

Description

Currently, the DocPilot application sends raw, unredacted transcripts containing Protected Health Information (PHI), such as patient names, dates, and medical identifiers, directly to the Google Gemini API for summarization.

Without a Business Associate Agreement (BAA) and an Enterprise API tier, transmitting raw PHI to a public LLM endpoint is a severe privacy risk and likely a HIPAA/GDPR violation. We need a mechanism that sanitizes the transcript locally on the device before it is sent over the network.

Proposed Solution

Implement a Local PII/PHI Redaction layer within the app:

  1. After Deepgram returns the transcript, intercept the String before passing it to the Gemini summarization function.
  2. Use a local regex or lightweight NLP Named Entity Recognition (NER) pipeline (for example, an on-device ML Kit package such as google_mlkit_entity_extraction, or standard Dart RegExp patterns) to detect names, dates, phone numbers, and medical identifiers.
  3. Replace the detected entities with placeholders (e.g., [PATIENT_NAME], [DATE], [ID]).
  4. Send the sanitized, placeholder-injected string to Gemini to generate the summary/prescription.
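The steps above can be sketched roughly as follows. This is a minimal, language-neutral illustration written in Python; the app itself would implement it in Dart with RegExp. The rule list, the MRN format, the title-based name heuristic, and the [PHONE] placeholder are all assumptions made for the example, not part of DocPilot, and regex alone will miss many real-world names.

```python
import re

# Hypothetical, illustrative rules as (placeholder, pattern) pairs.
# Order matters: identifiers and phone numbers are replaced before the
# looser date rule runs.
REDACTION_RULES = [
    # Assumed record-number format: "MRN" followed by 6-10 digits.
    ("[ID]", re.compile(r"\bMRN[:#\s]*\d{6,10}\b")),
    # US-style phone numbers such as 555-123-4567 or (555) 123-4567.
    ("[PHONE]", re.compile(r"(?<!\d)\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}(?!\d)")),
    # Numeric dates such as 03/14/2024 or 3-14-24.
    ("[DATE]", re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")),
    # Names introduced by a title; untitled names need NER, not regex.
    ("[PATIENT_NAME]",
     re.compile(r"\b(?:Mr|Mrs|Ms)\.?\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)?")),
]


def redact(transcript: str) -> str:
    """Replace every match of each rule with its placeholder."""
    for placeholder, pattern in REDACTION_RULES:
        transcript = pattern.sub(placeholder, transcript)
    return transcript


sample = "Mrs. Jane Doe, MRN 12345678, was seen on 03/14/2024. Call 555-123-4567."
print(redact(sample))
# [PATIENT_NAME], [ID], was seen on [DATE]. Call [PHONE].
```

Only the string returned by `redact` would be passed on to the Gemini summarization call; the raw transcript stays on the device.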

Tasks & Architecture

  • Create an AnonymizationService class in the services/ directory.
  • Implement Regex rules or integrate an on-device NLP package to detect and replace sensitive entities.
  • Integrate the AnonymizationService into the ChatbotService / Gemini processing flow.
  • Add a "Privacy Filter" toggle in the App Settings so doctors can turn redaction off if they have an enterprise-compliant API key.
  • Write unit tests to ensure names and dates are correctly scrubbed from dummy medical paragraphs.
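One possible shape for these tasks, again as a Python sketch rather than the Dart implementation: the service wraps a redaction function, honours the settings toggle, and is called before anything reaches the LLM. The names `prepare`, `summarize`, `redact_dates`, and `fake_llm` are hypothetical, and the date-only redactor is a stand-in to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


def redact_dates(text: str) -> str:
    """Stand-in redactor for the demo; a real one would also cover names and IDs."""
    return re.sub(r"\b\d{1,2}/\d{1,2}/\d{4}\b", "[DATE]", text)


@dataclass
class AnonymizationService:
    redact: Callable[[str], str]
    privacy_filter_enabled: bool = True  # assumed default: filter on

    def prepare(self, transcript: str) -> str:
        """Return the text that is allowed to leave the device."""
        if not self.privacy_filter_enabled:
            return transcript
        return self.redact(transcript)


def summarize(transcript: str, anonymizer: AnonymizationService,
              send_to_llm: Callable[[str], str]) -> str:
    """Hypothetical integration point: sanitize first, then call the LLM."""
    return send_to_llm(anonymizer.prepare(transcript))


# Unit-test style check with a fake LLM that records what it was sent.
outbound: List[str] = []


def fake_llm(prompt: str) -> str:
    outbound.append(prompt)
    return "summary"


service = AnonymizationService(redact=redact_dates)
summarize("Follow-up booked for 04/02/2025.", service, fake_llm)

service.privacy_filter_enabled = False  # doctor disables the filter
summarize("Follow-up booked for 04/02/2025.", service, fake_llm)
```

Injecting the LLM call as a parameter lets the unit tests assert on the exact outbound payload without touching the network.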

Alternatives Considered

  • On-Device LLM (e.g., Gemma/Llama via ONNX): This keeps all transcript data on the device, but bundling a local model into the Flutter app is currently too heavy for the app-size and performance budget. Local text redaction is the best immediate stop-gap.

Context

Medical applications demand high privacy standards. By adding a client-side redaction wall, we protect the patient's identity from third-party LLM data logging and make DocPilot significantly safer for real-world clinical usage.


Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates


Labels: enhancement
