dc.contributor.advisor: Rong, Chunming
dc.contributor.advisor: Geng, Jiahui
dc.contributor.author: Rehman, Ali Akbar
dc.date.accessioned: 2022-08-31T15:51:22Z
dc.date.available: 2022-08-31T15:51:22Z
dc.date.issued: 2022
dc.identifier: no.uis:inspera:92613534:56603678
dc.identifier.uri: https://hdl.handle.net/11250/3014780
dc.description.abstract: Running complex analytics and AI algorithms typically involves large amounts of data. The data may originate from multiple sources and is typically compiled and moved to a central location before the algorithms can consume it. This makes the approach impractical for untrusting organizations that want to share analytics and results without risking exposure of their entire datasets. Current approaches to supporting such a scenario move the computation closer to the data instead of the other way around. However, this involves writing code for distributed file systems such as the Hadoop Distributed File System (HDFS), which demands professional expertise in writing MapReduce jobs and in parallel code design patterns. In this thesis, we demonstrate a proof of concept that allows organizations to share their datasets for consumption by inter-organizational workflows without exposing the data itself and without requiring distributed programming expertise. We propose an approach using Hyperledger Fabric through which untrusting entities can advertise their datasets for consumption by other organizations, without users needing extensive knowledge of writing distributed code and without the data ever being exposed to them. Analytics can therefore be run on the data while its owners retain ownership. A permissioned blockchain network is established using Hyperledger Fabric, and organizations can join this consortium. A JupyterHub server hosted on a Kubernetes cluster provides users with Jupyter instances in which they can explore the available datasets through our custom extension, write code, and construct workflows that run the algorithms on the datasets. When a workflow runs, the required datasets are provided as persistent volumes, exposing the data only to the job that requires it.
To ensure the privacy of sensitive information committed to the blockchain, organizations encrypt that information with keys that are internal to each organization.
dc.language: eng
dc.publisher: uis
dc.title: System for Workflow Design and Execution on Data Shared Between Untrusting Organizations for Analytics
dc.type: Master thesis