Text Pattern Discovery and Extraction

Morten, Waersland

dc.contributor.author	Morten, Waersland
dc.date.accessioned	2016-10-10T10:24:04Z
dc.date.available	2016-10-10T10:24:04Z
dc.date.issued	2016-06-15
dc.identifier.uri	http://hdl.handle.net/11250/2413858
dc.description	Master's thesis in Computer science	nb_NO
dc.description.abstract	This thesis presents a technique for discovering and extracting unknown patterns in structured data. No prior knowledge is needed to discover the patterns, but applying prior knowledge allows them to be classified. When merging information from structured data, it is important that the correct information is merged together. To achieve this, multiple techniques are needed to analyse the information, and this thesis provides one that can increase the accuracy. By collecting unique values in a trie structure, unknown patterns are discovered and extracted. These patterns are represented as regular expressions and classified using a decision tree. The technique produces regular expressions that are efficient and accurate, and the decision tree classifies correctly with a score greater than 80%. The technique can be used to improve accuracy when merging structured data, increase knowledge about a file, detect ID values, and calculate other measurements, including the consistency of a file and whether it contains typographical errors.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	University of Stavanger, Norway	nb_NO
dc.relation.ispartofseries	Masteroppgave/UIS-TN-IDE/2016;
dc.subject	informasjonsteknologi	nb_NO
dc.subject	information technology	nb_NO
dc.subject	computer science	nb_NO
dc.subject	machine learning	nb_NO
dc.subject	trie structure	nb_NO
dc.subject	regular expression	nb_NO
dc.title	Text Pattern Discovery and Extraction	nb_NO
dc.type	Master thesis	nb_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551	nb_NO
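
The abstract describes collecting unique values in a trie and representing the discovered patterns as regular expressions. The following is a minimal sketch of that idea only, not the thesis's actual algorithm: the names `Trie`, `char_class`, `to_regex`, and `discover_patterns` are hypothetical, and the generalisation rule (ASCII digits become `\d`, ASCII letters become `[A-Za-z]`, everything else is kept literally) is an assumption made for illustration. The decision-tree classification step is not covered.

```python
import re
import string
from collections import Counter


class Trie:
    """Minimal trie that stores each distinct string once (illustrative only)."""

    def __init__(self):
        self.children = {}
        self.terminal = False

    def insert(self, value):
        node = self
        for ch in value:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True  # inserting a duplicate leaves the trie unchanged

    def values(self, prefix=""):
        """Yield every stored string exactly once."""
        if self.terminal:
            yield prefix
        for ch in sorted(self.children):
            yield from self.children[ch].values(prefix + ch)


def char_class(ch):
    """Map one character to a regex token (assumed generalisation rule)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and other characters stay literal


def to_regex(value):
    """Generalise a value into an anchored regex with run-length counts."""
    runs = []
    for ch in value:
        token = char_class(ch)
        if runs and runs[-1][0] == token:
            runs[-1][1] += 1
        else:
            runs.append([token, 1])
    body = "".join(t if n == 1 else f"{t}{{{n}}}" for t, n in runs)
    return f"^{body}$"


def discover_patterns(values):
    """Count how many unique values share each generalised pattern."""
    trie = Trie()
    for value in values:
        trie.insert(value)
    return Counter(to_regex(v) for v in trie.values())


if __name__ == "__main__":
    column = ["2016-06-15", "2016-10-10", "2016-10-10", "AB12"]
    for pattern, count in discover_patterns(column).most_common():
        print(count, pattern)
```

In this sketch the trie only deduplicates values, so a column dominated by one repeated entry does not skew the pattern counts; a pattern shared by most unique values could then hint at the column's format, while rare patterns could point to inconsistencies or typographical errors.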

Associated file(s)

Name: Waersland_Morten.pdf
Size: 1.110Mb
Format: PDF


This item appears in the following collection(s)

Student theses (TN-IDE) [823]
Student theses in information technology, computer engineering / cybernetics, signal processing
