View simple record

dc.contributor.author: Manzanares-Salor, Benet
dc.contributor.author: Sánchez, David
dc.contributor.author: Lison, Pierre
dc.date.accessioned: 2023-03-03T16:02:30Z
dc.date.available: 2023-03-03T16:02:30Z
dc.date.created: 2022-10-11T13:49:37Z
dc.date.issued: 2022
dc.identifier.isbn: 978-3-031-13944-4
dc.identifier.uri: https://hdl.handle.net/11250/3055870
dc.description.abstract: The standard approach to evaluate text anonymization methods consists of comparing their outcomes with the anonymization performed by human experts. The degree of privacy protection attained is then measured with the IR-based recall metric, which expresses the proportion of re-identifying terms that were correctly detected by the anonymization method. However, the use of recall to estimate the degree of privacy protection suffers from several limitations. The first is that it assigns a uniform weight to each re-identifying term, thereby ignoring the fact that some missed re-identifying terms may have a larger influence on the disclosure risk than others. Furthermore, IR-based metrics assume the existence of a single gold standard annotation. This assumption does not hold for text anonymization, where several maskings (each one encompassing a different combination of terms) could be equally valid to prevent disclosure. Finally, those metrics rely on manually anonymized datasets, which are inherently subjective and may be prone to various errors, omissions and inconsistencies. To tackle these issues, we propose an automatic re-identification attack for (anonymized) texts that provides a realistic assessment of disclosure risks. Our method follows a similar premise as the well-known record linkage methods employed to evaluate anonymized structured data, and leverages state-of-the-art deep learning language models to exploit the background knowledge available to potential attackers. We also report empirical evaluations of several well-known methods and tools for text anonymization. Results show significant re-identification risks for all methods, including manual anonymization efforts. [en_US]
dc.description.abstract: Automatic Evaluation of Disclosure Risks of Text Anonymization Methods [en_US]
dc.language.iso: eng [en_US]
dc.relation.ispartof: Privacy in Statistical Databases
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/deed.no
dc.title: Automatic Evaluation of Disclosure Risks of Text Anonymization Methods [en_US]
dc.title.alternative: Automatic Evaluation of Disclosure Risks of Text Anonymization Methods [en_US]
dc.type: Chapter [en_US]
dc.description.version: acceptedVersion [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
dc.identifier.cristin: 2060503
dc.relation.project: Norges forskningsråd: 308904 [en_US]


Associated file(s)


This record appears in the following collection(s)


Attribution-NonCommercial-ShareAlike 4.0 International
Except where otherwise noted, this record's license is described as Attribution-NonCommercial-ShareAlike 4.0 International