System for Workflow Design and Execution on Data Shared Between Untrusting Organizations for Analytics

Rehman, Ali Akbar

dc.contributor.advisor	Rong, Chunming
dc.contributor.advisor	Geng, Jiahui
dc.contributor.author	Rehman, Ali Akbar
dc.date.accessioned	2022-08-31T15:51:22Z
dc.date.available	2022-08-31T15:51:22Z
dc.date.issued	2022
dc.identifier	no.uis:inspera:92613534:56603678
dc.identifier.uri	https://hdl.handle.net/11250/3014780
dc.description.abstract	Running complex analytics and AI algorithms typically involves large amounts of data. The data may originate from multiple sources and is usually compiled and moved to a central location before the algorithms can consume it. This makes the approach impractical for untrusting organizations that want to share analytics and results without risking exposure of the dataset in its entirety. The current approach to supporting such a scenario is to move the computation closer to the data instead of the other way around. But that involves writing code for distributed file systems like the Hadoop Distributed File System (HDFS), which demands professional expertise in writing MapReduce jobs and in parallel code design patterns. In this thesis, we demonstrate a proof of concept that allows organizations to share their datasets for consumption by inter-organizational workflows without exposing the data itself and without requiring distributed programming expertise. We propose an approach that uses Hyperledger Fabric to let untrusting entities advertise their datasets for consumption by other organizations, without demanding extensive knowledge of writing distributed code and without ever exposing the data itself to the user. Hence the analytics can be run on the data while the owner retains ownership. A permissioned blockchain network is established using Hyperledger Fabric, and organizations can join this consortium. A JupyterHub server hosted on a Kubernetes cluster provides each user with a Jupyter instance in which they can explore the available datasets through our custom extension, write code, and construct workflows that run the algorithms on the datasets. The required datasets are consumed as persistent volumes when the workflow runs, so the data is exposed only to the job that requires it.
To ensure the privacy of sensitive information committed to the blockchain, organizations encrypt the sensitive information with keys that are internal to the organization.
dc.language	eng
dc.publisher	uis
dc.title	System for Workflow Design and Execution on Data Shared Between Untrusting Organizations for Analytics
dc.type	Master thesis

Associated file(s)

Filename: no.uis:inspera:92613534:566036 ...
Size: 2.016 MB
Format: PDF

This item appears in the following collection(s)

Studentoppgaver (TN-IDE) [866]
Student theses in information technology, computer engineering / cybernetics, signal processing
