
The Privacy Challenges of AI in Healthcare

Agenda
14:00–14:05  Introduction
14:05–15:00  Keynote #1 – Regulatory Issues in Healthcare AI, by Ho, Chih-Hsing, Associate Research Fellow, Institute of European and American Studies, Academia Sinica
15:00–16:00  Keynote #2 – Balancing Data Security and Privacy in Smart Healthcare through Technological Tools, by Fan, Chun-I, Distinguished Professor, Department of Computer Science and Engineering, National Sun Yat-sen University (online)

Meeting Minutes

On Monday, July 31, 台灣網路講堂 held the seminar “The Privacy Challenges of AI in Healthcare.” Two domestic experts in healthcare and smart technology, Associate Research Fellow Ho, Chih-Hsing of the Institute of European and American Studies, Academia Sinica, and Distinguished Professor Fan, Chun-I of the Department of Computer Science and Engineering, National Sun Yat-sen University, walked the audience through the regulatory coordination issues raised by AI applications in healthcare and the technical means of keeping medical data secure and private.

Ho, Chih-Hsing, Associate Research Fellow, Institute of European and American Studies, Academia Sinica

Dr. Ho’s presentation focused on personal privacy concerns in AI for healthcare and caregiving, and on the regulatory challenges AI poses. AI is already applied extensively to medical imaging and pathology data, and the FDA has established relevant regulations; many applications remain to be developed, with potential links to other emerging technologies such as big data and precision medicine. The aspiration is to ensure that, as AI enters clinical settings, it delivers benefits rather than harm, particularly in terms of data use and accountability. Dr. Ho summarized several key issues: health data integration, secondary use of data, consent models, de-identification, commercial access, bias and explainability, and trustworthy AI.

The legal basis for collecting, processing, and using health data in Taiwan is the Personal Data Protection Act, whose overarching principle is that data collection must serve a specific purpose. Medical, genetic, and health examination data are sensitive personal information and in principle may not be collected, processed, or used. However, Article 6 of the Act contains a proviso that serves as the regulatory basis for exceptions, such as explicit statutory authorization or the written consent of the data subject.

In practice, commercial companies must enter into industry-academia collaborations with universities to gain secondary access to data such as National Health Insurance (NHI) data, because the statute permits only public agencies and academic research institutions to use such data, for medical and similar purposes, where necessary for statistics or academic research, and only when the data can no longer identify specific individuals.

Dr. Ho noted that de-identification is the most difficult aspect. Taking the EU’s treatment of pseudonymization and anonymization as an example: the EU does not regard pseudonymization alone as a sufficient condition for lawful data processing, since pseudonymized data can still be linked back to the original individuals, whereas anonymized data cannot be linked back to the original data at all. A common problem in Taiwan, however, is that pseudonymized data is treated as if it were anonymized.
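To make the distinction concrete, here is a minimal Python sketch, my own illustration rather than anything from the talk, of why pseudonymized records stay re-linkable while simple aggregate anonymization does not; the key, identifiers, and fields are all hypothetical.

```python
# Minimal sketch (illustration only): pseudonymization vs. aggregate anonymization.
import hmac, hashlib
from collections import Counter

SECRET_KEY = b"held-by-the-data-controller"  # hypothetical; whoever holds this can re-link

def pseudonymize(national_id: str) -> str:
    """Replace a direct identifier with a keyed hash (a pseudonym)."""
    return hmac.new(SECRET_KEY, national_id.encode(), hashlib.sha256).hexdigest()[:12]

records = [
    {"id": "A123456789", "diagnosis": "diabetes"},
    {"id": "B987654321", "diagnosis": "hypertension"},
    {"id": "A123456789", "diagnosis": "retinopathy"},
]

# Pseudonymized: identifiers are hidden, but the same person still links across
# rows, and the key holder can recompute the mapping at any time.
pseudo = [{"pid": pseudonymize(r["id"]), "diagnosis": r["diagnosis"]} for r in records]

# Anonymized (one simple form): aggregate counts, with no per-person rows left.
anonymous = Counter(r["diagnosis"] for r in records)

print(pseudo)      # rows for the same patient share a pseudonym -> re-linkable
print(anonymous)   # e.g. Counter({'diabetes': 1, ...}) -- no path back to anyone
```

The point of the sketch: whoever holds `SECRET_KEY` can recompute the pseudonyms, so under the EU view the data remains personal data; the aggregated counts, by contrast, retain no per-person rows to link back to.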

Dr. Ho then analyzed Taiwan Constitutional Court Judgment 111-Hsien-Pan-13 (the NHI database case). The judgment held that Articles 79 and 80 of the National Health Insurance Act risk unconstitutionality because they lack explicit provisions on using the data beyond its original collection purpose. The Court did not immediately prohibit data release; instead, it granted a three-year period for legislation or amendment to spell out the required legal authorization, and if no correction is made within that period, a mechanism must be established for individuals to request withdrawal from data use. The aim is to resolve the ambiguity of the current rules governing use beyond the original purpose.

The trend this judgment reveals is that regulations which disregard the wishes of data subjects, especially in the medical field, face questions of legality. Unfortunately, separate legislation so far covers only the NHI database and does not reach Taiwan’s broader range of healthcare data.

Taiwan has invested considerable effort in the secondary use of medical data, drawing on EU practice: one approach limits such use to public-interest purposes, the other requires secure de-identification measures. What Taiwan still lacks is legal authorization for secondary use; the current reform direction relies mainly on dedicated legislation for the NHI database or on the Human Biobank Management Act.

Dr. Ho then turned to bias in AI algorithmic decision-making. Bias can arise when algorithm development uses one-sided or partial data, or when data collection and classification are influenced by human subjectivity; it can also emerge from a lack of proper oversight in the design process, leading algorithms to reflect and replicate human prejudices and thereby produce unequal health outcomes.

She gave the example of a dermatology AI-assisted system built on convolutional neural networks (CNNs): when over 90% of the skin lesion images used to train the system came from white patients, diagnostic accuracy fell to roughly half. Likewise, when an algorithm uses healthcare spending as a proxy for health need, it wrongly concludes that Black patients, whose recorded expenditures are lower, are healthier than white patients, and recommends allocating them fewer healthcare resources. In reality, spending is lower among minority populations because they often lack the resources to reach medical facilities, which leaves their medical records sparse. The result is what she called “invisible patients” in the databases used to train clinical algorithms, distorting the allocation of healthcare resources and policy and potentially leaving entire patient groups undiagnosed or untreated. Minority populations such as immigrants, children, and the elderly should therefore be included in training data as extensively as possible.
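The proxy-label mechanism she described can be reproduced in a toy simulation, a hedged illustration under invented assumptions rather than data from the talk: both groups have identically distributed health needs, but lower access to care depresses one group’s recorded spending, so ranking by cost under-serves it.

```python
# Toy simulation (illustration only): using healthcare *cost* as a proxy label
# for health *need* penalizes groups with poor access to care.
import random
random.seed(0)

def make_patient(group: str):
    need = random.uniform(0, 1)                   # true need, same in both groups
    access = 1.0 if group == "majority" else 0.4  # minority reaches care less often
    cost = need * access                          # recorded spending understates need
    return {"group": group, "need": need, "cost": cost}

patients = [make_patient("majority") for _ in range(1000)] + \
           [make_patient("minority") for _ in range(1000)]

# An algorithm that ranks patients by cost (the proxy) and offers a care
# program to the top 25% under-serves the minority group, even though true
# need is identically distributed across groups.
top = sorted(patients, key=lambda p: p["cost"], reverse=True)[:500]
share = sum(p["group"] == "minority" for p in top) / len(top)
print(f"minority share of program slots: {share:.1%}")  # far below the 50% it should be
```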

Achieving trustworthy algorithmic decision-making involves several aspects. From a regulatory perspective, it is important to establish sufficient transparency in AI governance to serve as a basis for oversight; in turn, transparency can also foster trust between medical professionals and patients. But how should we address opacity? Opacity can be categorized into three levels (a minimal sketch of the first two levels follows the list):

  1. Disclosure:
    At the first level, it’s about knowing whether the recommendations are generated by AI. Disclosing AI involvement helps individuals understand the role of automation in decision-making.
  2. Explanation:
    The second level involves understanding why the AI system provided a particular recommendation. This includes considering the parameters or datasets that influenced the AI’s decision. Providing explanations can enhance the perceived legitimacy of AI decisions.
  3. Comprehensibility:
    The most challenging level is understanding how the AI arrived at its recommendation. This requires comprehending the intricate workings of complex algorithms, which may involve technical complexities that are difficult for non-experts to grasp.
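As a rough sketch of the first two levels, disclosure and explanation, the following wraps a simple hypothetical risk model so that its output declares its AI origin and names the inputs that drove the score; the features, data, and model choice are all invented for illustration.

```python
# Minimal sketch (illustration only): disclosure + explanation for a simple model.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["age", "bmi", "hba1c", "systolic_bp"]  # hypothetical inputs
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(FEATURES)))
y = (X[:, 2] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def recommend(patient: np.ndarray) -> dict:
    prob = model.predict_proba(patient.reshape(1, -1))[0, 1]
    # Level 1 (disclosure): state that the recommendation is AI-generated.
    # Level 2 (explanation): report which inputs drove this particular score.
    contributions = model.coef_[0] * patient
    drivers = sorted(zip(FEATURES, contributions), key=lambda t: -abs(t[1]))
    return {
        "source": "AI-generated recommendation (logistic regression)",
        "risk_score": round(float(prob), 3),
        "top_drivers": drivers[:2],
    }

print(recommend(X[0]))
# Level 3 (how the model works internally) is trivial for a linear model, but
# becomes the hard open problem once the model is a deep network.
```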

In conclusion, Dr. Ho emphasized that all these issues can be aligned with the 7 key principles of trustworthy AI established by the European Union. These principles include enhancing safety, improving data governance, respecting privacy, ensuring AI applications are more transparent, fostering inclusivity, enhancing accountability, and keeping a human-centric approach throughout the process. It’s important to continually review and assess whether the development and deployment of AI are centered around human well-being and ethical considerations. By adhering to these principles, the path toward building trustworthy and responsible AI systems in the medical field becomes clearer.

 

Fan, Chun-I, Distinguished Professor, Department of Computer Science and Engineering, National Sun Yat-sen University

Professor Fan’s presentation mainly covered the threats and challenges of cybersecurity and privacy in the field of smart healthcare, as well as the technological tools that can be utilized. In the section on cybersecurity threats and challenges, Professor Fan first reviewed major global cyberattacks that occurred during the pandemic. The shift to remote work due to the pandemic led to increased cybersecurity risks, and vulnerabilities in home network environments created entry points for attackers.

In the post-pandemic era, geopolitical factors and conflicts have given rise to different cybersecurity issues. Sectors including finance, government, enterprises, and hospitals have experienced breaches due to political events. Currently, cyberattacks primarily target critical infrastructure, and the healthcare sector has become one of the key targets. On average, healthcare organizations take around 4 days to recover from cyberattacks.

IoT devices used in medical institutions are among the prime targets for hackers. These devices often hold large amounts of sensitive personal information, and weaknesses in their original designs, or a lack of security awareness among users, leave them susceptible to attack. Professor Fan cited Palo Alto Networks’ analysis of more than 200,000 IoT infusion pumps, from 7 medical device manufacturers, used in healthcare facilities: 75% of the devices had security vulnerabilities, meaning hackers could tamper with drug dosages without authentication, posing a serious threat to patient safety.

Recently, Taiwan’s Ministry of Health and Welfare officially allowed electronic medical records to be kept in the cloud and permitted hospitals to go fully paperless. Close attention should therefore be paid to the security threats that cloud environments bring, including vulnerabilities in virtualization platforms, migration and backup management issues, and attacks that cross virtual hosts. As the healthcare industry moves toward digitization and cloud-based services, guarding against these risks becomes paramount.

Professor Fan stressed that cybersecurity is a cross-disciplinary issue with no “magic bullet” solution: each organization should understand its own systems and needs before formulating response measures. To address smart healthcare cybersecurity and privacy together, he advocated ciphertext-driven technological tools, returning to a data-protection point of view.

AI usage is prevalent in healthcare institutions, and Professor Fan used Google’s Federated Learning model as an example. He explained that multiple local models can collaboratively train a centralized model on the cloud without uploading local data. This approach ensures data privacy for each local model. He mentioned the example of Taiwan AI Labs collaborating with the four major medical centers in Kaohsiung to train AI tools, demonstrating practical applications.
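A minimal FedAvg-style sketch of the idea follows; this is my own illustration under simplified assumptions (linear models, synthetic data, plain averaging), not a description of the Taiwan AI Labs deployment.

```python
# Minimal federated-averaging sketch (illustration only): several hospitals
# improve a shared model without uploading their raw data.
import numpy as np

rng = np.random.default_rng(1)
TRUE_W = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth

def local_data(n=100):
    X = rng.normal(size=(n, 3))
    y = X @ TRUE_W + rng.normal(scale=0.1, size=n)
    return X, y

hospitals = [local_data() for _ in range(4)]   # e.g., 4 medical centers
global_w = np.zeros(3)

for round_ in range(20):
    local_ws = []
    for X, y in hospitals:
        w = global_w.copy()
        for _ in range(5):                     # a few local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        local_ws.append(w)                     # only parameters leave the site
    global_w = np.mean(local_ws, axis=0)       # the server averages the updates

print("learned:", np.round(global_w, 3), "target:", TRUE_W)
```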

He further highlighted the risk that hackers reverse-engineer local personal data from the shared model parameters, and proposed strengthening the security of federated learning with privacy-preserving techniques such as homomorphic encryption and differential privacy. In this strengthened form, federated learning effectively proceeds over encrypted or noised updates, enhancing security while maintaining data privacy.
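The sketch below illustrates, under strong simplifications, the two kinds of defense mentioned: clipping and noising each hospital’s update in the spirit of differential privacy, and pairwise masking as a lightweight stand-in for secure aggregation or homomorphic summation, so the server only ever learns the sum of the updates.

```python
# Minimal sketch (illustration only) of two defenses for federated updates.
import numpy as np

rng = np.random.default_rng(2)
updates = [rng.normal(size=3) for _ in range(4)]      # one update per hospital

# (1) Clip each update and add Gaussian noise before it leaves the site.
def privatize(u, clip=1.0, sigma=0.1):
    u = u * min(1.0, clip / np.linalg.norm(u))        # bound each site's influence
    return u + rng.normal(scale=sigma, size=u.shape)  # noise masks individuals

noisy = [privatize(u) for u in updates]

# (2) Pairwise masks: site i adds m_ij and site j subtracts it, so the masks
# cancel in the sum; the server learns the aggregate but no single update.
masks = {(i, j): rng.normal(size=3) for i in range(4) for j in range(i + 1, 4)}
masked = []
for i, u in enumerate(noisy):
    m = sum(masks[(i, j)] for j in range(i + 1, 4)) - \
        sum(masks[(j, i)] for j in range(i))
    masked.append(u + m)

# Each masked update looks random on its own, yet the sum is preserved:
print(np.round(sum(masked) - sum(noisy), 10))          # ~0: masks cancel exactly
```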

Professor Fan believes that data is the core of information security: all security problems ultimately stem from data. Many technologies now allow data to be processed while it remains fully encrypted, such as IBE (Identity-Based Encryption) encrypted email services and searchable ABE (Attribute-Based Encryption) encrypted file-sharing services. He also introduced his recent research on a “Privacy-Preserving Healthcare Data Mining Warehousing System.”
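As a flavor of how searching over protected data can work, here is a deliberately simplified, assumption-laden sketch of a searchable index built from keyword tokens; real IBE/ABE constructions are far more involved, and the key, keywords, and record ids here are all hypothetical.

```python
# Minimal searchable-encryption-flavored sketch (illustration only): the server
# stores only opaque tokens and record ids, yet can answer keyword queries
# without learning the keywords themselves.
import hmac, hashlib, secrets

TOKEN_KEY = secrets.token_bytes(32)   # held by the data owner, never the server

def token(keyword: str) -> str:
    """Deterministic keyword token; the server can match it but not invert it."""
    return hmac.new(TOKEN_KEY, keyword.encode(), hashlib.sha256).hexdigest()

# Index built client-side: token -> record ids. Record bodies would be
# encrypted separately (e.g., AES-GCM); omitted here for brevity.
encrypted_index = {}
def add_record(record_id: str, keywords: list[str]):
    for kw in keywords:
        encrypted_index.setdefault(token(kw), []).append(record_id)

add_record("rec-001", ["diabetes", "2023"])
add_record("rec-002", ["hypertension", "2023"])

def server_search(query_token: str) -> list[str]:
    """The server matches tokens blindly; it never sees 'diabetes'."""
    return encrypted_index.get(query_token, [])

print(server_search(token("diabetes")))  # ['rec-001']
```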

Professor Fan’s closing recommendation is to build on a foundation of data protection, keeping data encrypted for as much of its lifetime as possible. Once the core problem of data risk is addressed, organizations can establish precisely targeted security measures, much as precision medicine targets treatment, and then tailor a security solution to their own needs, achieving low-carbon, sustainable security at minimal cost.

Presentation Download
**The speakers did not agree to share their presentation files**