Privacy Analysis of Data Anonymization 🛡️

Research Assistant

My Contributions

  • Produced two presentations for partner company
  • Deliver a landscape paper of state-of-the-art techniques on anonymizing spatio-temporal data.
  • Investigation & Scope
    • Numerous mechanisms
    • Contemporary research problems
    • Case studies of successful and failed public distributions of sensitive data
  • Projects: GeoLife & mobility-traj-gan

Publication
Murray Jr, Jeffrey, Afra Mashhadi, Brent Lagesse, and Michael Stiber. 2021. “Privacy Preserving Techniques Applied to CPNI Data: Analysis and Recommendations.” arXiv:2101.09834 [Cs], January. http://arxiv.org/abs/2101.09834.

View Landscape Paper

Introduction

Telecom operators collect enormous amounts of data that is often referred to as Customer Proprietary Network Information (CPNI). One of the main dimensions of such data is Call Data Records which includes fine grain information regarding the location (latitude and longitude) of the caller, and the callee, along with their associated cell towers and the time of the call. This data is not only essential to the billing operations of the telecom industry but is also extremely valuable for third party agencies and researchers. For example, a common need when analyzing real-world data-sets is determining which instances stand out as being dissimilar to all others. Such instances are known as anomalies, and given a high prevalence of normal instances of data, anomalies can be identified using machine learning approaches.

In the context of spatial temporal data such as those of Call Data Records, anomalies could correspond to the high load on a specific cell tower at a given time caused by a natural phenomena such as events, crowd flow/traffic jam (Oliver et al., 2015), or by adversarial attacks. Other use cases of such data include detecting similarities and patterns. For example, CDR data has been shown to improve our understanding of the dynamics of citizens’ mobility and communities (Hong et al., 2018; Oliver et al., 2020).

With Telecommunication Carriers growing interest in contributing Customer Proprietary Network Information (CPNI) for research, they must go through their legal team to verify the request of distributing data with academia and other untrustworthy third parties. For them to achieve this solution, our cohort was tasked with investigating the viability of anonymization techniques and recommending necessary actions to release the information.

References

de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., & Blondel, V. D. (2013). Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3(1), 1376. https://doi.org/10.1038/srep01376

Hong, L., Lee, M., Mashhadi, A., & Frias-Martinez, V. (2018). Towards Understanding Communication Behavior Changes During Floods Using Cell Phone Data. Social Informatics, 97–107. https://doi.org/10.1007/978-3-030-01159-8_9

Oliver, N., Lepri, B., Sterly, H., Lambiotte, R., Deletaille, S., De Nadai, M., Letouzé, E., Salah, A. A., Benjamins, R., Cattuto, C., Colizza, V., de Cordes, N., Fraiberger, S. P., Koebe, T., Lehmann, S., Murillo, J., Pentland, A., Pham, P. N., Pivetta, F., … Vinck, P. (2020). Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science Advances, 6(23), eabc0764. https://doi.org/10.1126/sciadv.abc0764

Oliver, N., Matic, A., & Frias-Martinez, E. (2015). Mobile Network Data for Public Health: Opportunities and Challenges. Frontiers in Public Health, 3. https://doi.org/10.3389/fpubh.2015.00189

Wang, S., Cao, J., & Yu, P. S. (2019). Deep Learning for Spatio-Temporal Data Mining: A Survey. ArXiv:1906.04928 [Cs, Stat]. http://arxiv.org/abs/1906.04928