Globality | Innovation Blog

Security in the AI Era: Beyond Compliance to True Data Protection

Written by Keith McFarlane | Jan 14, 2025 9:55:31 AM

The statistics tell an alarming story: according to IBM's 2024 Cost of a Data Breach Report [1], the average cost of a data breach has now reached $4.88 million. The report highlights an emerging concern: attackers are increasingly leveraging AI and automation tools, adding complexity to the threat landscape and extending breach lifecycles by weeks or months.

But beyond these numbers lies a more nuanced reality: as AI transforms business processes, it creates novel attack surfaces and security challenges that traditional frameworks struggle to address. The rapid adoption of AI across industries has fundamentally changed the security landscape. Organizations now face the dual challenge of protecting not just their data, but also the AI models that process it, the training data that shapes those models, and the outputs they generate.

At Globality, we've learned that robust security isn't just about checking boxes; it's about building a comprehensive shield around customer data that evolves with emerging threats. While we're proud of our ISO 27001 certification and SOC2 compliance (both recently reaffirmed through rigorous audits), we see these as foundations rather than finish lines. These certifications represent our commitment to maintaining rigorous security standards, but our actual security practices go well beyond what these frameworks require. We've built our security infrastructure with the understanding that in today's landscape, compliance is merely the starting point.

Beyond Standards: Security in Practice

The rise of AI has introduced new complexities in data protection that few organizations are fully prepared to address. Traditional security models often fall short when confronted with the unique challenges of AI systems.

Data residency requirements have become increasingly complex, varying not just by country, but by data type and processing stage. Organizations must now track and control data flows across multiple jurisdictions, ensuring compliance with a patchwork of regulations while maintaining system performance. This becomes particularly challenging when AI models need to process data across borders or when training data comes from multiple jurisdictions.

The protection of both training data and model outputs presents another layer of complexity. At Globality, we've developed sophisticated approaches to this challenge. Our AI models fall into two main categories: classifiers for categorizing projects based on natural language descriptions, and clustering/ranking models for provider matching. Each requires its own careful security consideration. For classifier training, we've implemented a rigorous four-step anonymization and sanitization pipeline:

  • We systematically remove client-specific identifiers and codes
  • We employ Named Entity Recognition (NER) to identify and remove personal and company proprietary information
  • We use sophisticated techniques like Normalized Google Distance (NGD) calculations to identify and redact client-specific entities
  • We implement reproducible name mangling for sensitive terms
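The four steps above can be sketched as a simple text-processing pipeline. This is an illustrative outline only: the patterns, entity list, and salt below are hypothetical stand-ins, and the simple string matching in steps 2–3 substitutes for the real NER model and Normalized Google Distance scoring.

```python
import hashlib
import re

# Hypothetical stand-ins for the pipeline inputs; not Globality's actual
# formats. Steps 2-3 would use NER and NGD scoring in a real pipeline.
CLIENT_ID_PATTERN = re.compile(r"\bACME-\d{4}\b")   # assumed client code format
KNOWN_ENTITIES = {"Acme Corp", "Jane Doe"}          # would come from NER / NGD

def remove_client_codes(text: str) -> str:
    """Step 1: strip client-specific identifiers and codes."""
    return CLIENT_ID_PATTERN.sub("[REDACTED-CODE]", text)

def redact_entities(text: str) -> str:
    """Steps 2-3: redact personal / proprietary entities (NER + NGD stand-in)."""
    for entity in KNOWN_ENTITIES:
        text = text.replace(entity, "[REDACTED-ENTITY]")
    return text

def mangle(term: str, salt: bytes = b"pipeline-salt") -> str:
    """Step 4: reproducible name mangling - the same input always maps to the
    same opaque token, so aggregate patterns survive anonymization."""
    digest = hashlib.sha256(salt + term.encode()).hexdigest()[:8]
    return f"ENT_{digest}"

def sanitize(text: str, sensitive_terms: set[str]) -> str:
    """Run all four steps in order over one training document."""
    text = remove_client_codes(text)
    text = redact_entities(text)
    for term in sensitive_terms:
        text = text.replace(term, mangle(term))
    return text
```

The key property is the last step's reproducibility: because mangling is deterministic, a model can still learn that the same (now opaque) entity recurs across documents without ever seeing its real name.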

This process, performed with heavy involvement by Globality InfoSec personnel, ensures that while our models can learn from aggregate patterns, they never retain or expose sensitive client information. We're particularly stringent about personally identifiable information (PII); it's completely excluded from our training pipeline.

For our clustering and ranking models, which process a wide array of supplier data, we've implemented strict controls on data sourcing and usage. While we collect data from public, private, and proprietary sources, we maintain clear boundaries around what data can be used for model training. Our systems are explicitly designed to exclude sensitive proprietary data from model training unless the customer has been consulted and has approved its use.
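A boundary like this can be enforced with a simple consent gate in the training-data pipeline. The record schema and the approval flag below are assumptions for illustration, not Globality's actual data model.

```python
from dataclasses import dataclass

@dataclass
class SupplierRecord:
    """Hypothetical training-candidate record; field names are illustrative."""
    source: str                       # "public", "private", or "proprietary"
    approved_for_training: bool = False

def eligible_for_training(record: SupplierRecord) -> bool:
    """Proprietary data is excluded unless the customer has approved its use."""
    if record.source == "proprietary":
        return record.approved_for_training
    return True

def build_training_set(records):
    """Filter candidate records down to those the policy permits."""
    return [r for r in records if eligible_for_training(r)]
```

Centralizing the rule in one predicate means every ingestion path applies the same policy, and an audit only needs to review a single function.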

Ensuring AI systems maintain security during both training and inference requires a comprehensive approach that traditional security frameworks don't fully address. During training, we must protect against data poisoning attacks and unauthorized access to training data. During inference, we need to guard against model extraction attacks and ensure that model outputs don't leak sensitive information.
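One common inference-time safeguard against model extraction is throttling high-volume querying, since extraction attacks typically require large numbers of probe requests. The sliding-window budget below is a generic sketch of that idea with illustrative thresholds; it is not a description of Globality's specific defenses.

```python
import time
from collections import defaultdict, deque

class QueryBudget:
    """Per-client sliding-window rate limit on model queries."""

    def __init__(self, max_queries=100, window_seconds=60.0):
        self.max_queries = max_queries        # illustrative threshold
        self.window = window_seconds
        self.history = defaultdict(deque)     # client_id -> request timestamps

    def allow(self, client_id, now=None):
        """Return True if this query fits the client's budget; record it if so."""
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_queries:
            return False                      # over budget: deny (or flag for review)
        q.append(now)
        return True
```

In practice a denial here would feed an alerting pipeline rather than silently dropping traffic, since a sustained burst of near-limit querying is itself a signal worth investigating.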

To address these challenges, we've implemented advanced security measures that go beyond standard practices. For example, our Hold Your Own Key (HYOK) encryption capability gives customers complete control over their data while maintaining functionality, and our AI Governance Committee provides oversight of all AI changes, from model improvements to prompt evaluation.
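The core HYOK idea can be shown in a toy sketch: the platform stores only a data key wrapped under the customer's key, so without that key (or after the customer revokes it) the data key is unrecoverable. This is illustrative only; the XOR-with-HMAC-keystream "wrap" below stands in for a real key-wrap primitive such as AES Key Wrap, and none of this reflects Globality's actual implementation.

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a short keystream from the customer key and a fresh nonce.
    Toy construction standing in for a real key-wrap cipher."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:length]

def wrap_key(customer_key: bytes, data_key: bytes):
    """Wrap a data key (<= 32 bytes here) under the customer-held key."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(customer_key, nonce, len(data_key))
    wrapped = bytes(a ^ b for a, b in zip(data_key, stream))
    return nonce, wrapped    # safe to store server-side; useless without customer_key

def unwrap_key(customer_key: bytes, nonce: bytes, wrapped: bytes) -> bytes:
    """Recover the data key - only possible while the customer supplies the key."""
    stream = _keystream(customer_key, nonce, len(wrapped))
    return bytes(a ^ b for a, b in zip(wrapped, stream))
```

Revocation falls out naturally: if the customer withdraws the key, the stored wrapped blob can no longer be unwrapped, which is what gives HYOK its "complete control" property.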

IP allowlisting is another critical component of our security infrastructure that we've implemented as part of a broader zero-trust architecture to protect every point of ingress (including both web application and REST API requests). This means every access attempt, whether from an allowed IP or not, undergoes rigorous authentication and authorization checks. We've also adopted the ISO 42001 framework for AI systems, ensuring security is built into our AI operations from the ground up, not bolted on as an afterthought.
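The layered check described above (allowlisting as one gate, never the only one) reduces to a small amount of logic. The networks and the token check below are placeholders; a real deployment would validate credentials against an identity provider rather than a static set.

```python
import ipaddress

# Illustrative allowlist; real networks would be deployment-specific.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]

VALID_TOKENS = {"token-abc"}   # stand-in for a real identity provider check

def authorize_request(source_ip: str, bearer_token: str) -> bool:
    """Zero-trust gate: BOTH the network check and the credential check
    must pass - an allowed IP alone is never enough."""
    ip_ok = any(ipaddress.ip_address(source_ip) in net for net in ALLOWED_NETWORKS)
    auth_ok = bearer_token in VALID_TOKENS
    return ip_ok and auth_ok
```

Keeping both checks in one authorization function also makes the policy easy to test: a valid token from a disallowed address and an allowed address with bad credentials must both be rejected.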

Practical Security Advice for the AI Era

Based on our experience protecting sensitive enterprise data, we've developed a comprehensive set of security principles that organizations should consider as they build out their AI infrastructure:

  • Treat AI model inputs and outputs with the same rigorous security controls as your raw data. Many organizations focus on securing their databases but leave model interactions inadequately protected. This creates vulnerabilities that sophisticated attackers can exploit. Recent research has demonstrated how seemingly innocuous model outputs can potentially be used to reconstruct training data [2]. That's why we've implemented end-to-end encryption for all model interactions, ensuring that data remains protected at every stage of processing.
  • Implement granular access controls as part of a comprehensive zero-trust architecture. IP allowlisting is just the beginning; every access request should be validated regardless of its source. This means implementing strong authentication mechanisms, regular access reviews, and detailed audit logging. At Globality, we maintain comprehensive audit trails of all system access, enabling us to detect and respond to potential security incidents quickly.
  • Encryption should be end-to-end, with customer control over keys. Our HYOK implementation demonstrates that robust encryption doesn't have to come at the cost of functionality. We've developed sophisticated key management systems that allow customers to maintain complete control over their data while still leveraging our AI capabilities. This includes automatic key rotation, secure key storage, and immediate revocation capabilities.
  • Implement comprehensive data anonymization pipelines for AI training. This isn't just about removing obvious identifiers; it requires a sophisticated multi-layer approach that considers context and relationships between data points. Our experience shows that effective anonymization must combine rule-based systems, machine learning techniques, and careful human oversight.
  • Embrace data minimization as a fundamental security principle. At Globality, we rigorously define what data we need and use only that data. We also carefully select our models and vendors, and we don't allow third parties to use any data we send through their APIs for AI training. This approach follows a core security principle: the best way to mitigate risk is to minimize the data you hold in the first place. By being intentional about data collection and processing, organizations can significantly reduce their attack surface and potential exposure.
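The audit-trail principle above can be made tamper-evident with a simple hash chain: each entry commits to the one before it, so any later modification breaks verification. This is a minimal sketch of the general technique, not a description of Globality's logging stack.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only audit trail where each entry hashes its predecessor."""

    def __init__(self):
        self.entries = []

    def record(self, actor: str, action: str, timestamp=None) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {
            "actor": actor,
            "action": action,
            "ts": time.time() if timestamp is None else timestamp,
            "prev": prev_hash,
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev:
                return False
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A chained log like this supports the incident-response goal in the bullet above: responders can trust that the access history they are reading has not been quietly rewritten.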

The Path Forward

The reality is that security in the AI era requires a fundamental shift in how we think about data protection. It's not enough to secure static data; we need to protect information as it flows through increasingly complex AI systems. This means implementing security controls that understand and account for the unique characteristics of AI workflows, from data ingestion through model training and inference.

Our commitment to security at Globality goes beyond maintaining certifications and implementing standard security measures. We're continuously evolving our security infrastructure to address emerging threats and protect our customers' data in an increasingly complex technological landscape. By sharing our experiences and insights, we hope to contribute to the development of more robust security practices across the industry.

This approach has not only helped us maintain the trust of our enterprise customers but has also positioned us to adapt to emerging security challenges as AI technology continues to evolve. The future of AI security will require even more sophisticated approaches to data protection, and we're committed to staying at the forefront of these developments.

Click here to book a demo of our award-winning AI-driven sourcing platform.

References

[1] IBM Security. "Cost of a Data Breach Report 2024." IBM Security, 2024. https://www.ibm.com/reports/data-breach

[2] Haim, N., Vardi, G., Yehudai, G., Shamir, O., & Irani, M. "Reconstructing Training Data From Trained Neural Networks." Advances in Neural Information Processing Systems (NeurIPS) 2022. https://openreview.net/forum?id=Sxk8Bse3RKO