
dc.contributor.advisor: Kvaløy, Jan Terje
dc.contributor.author: Eilertsen, Dag Recep Eroglu
dc.date.accessioned: 2020-11-18T10:08:05Z
dc.date.available: 2020-11-18T10:08:05Z
dc.date.issued: 2020-06
dc.identifier.uri: https://hdl.handle.net/11250/2688414
dc.description: Master's thesis in Mathematics [en_US]
dc.description.abstract: In this report, survival data from a German breast cancer study have been analysed using the statistical software R. For the 686 female patients participating in the study, the values of eight explanatory variables were recorded at the start of the study. These variables were age, menopausal status, whether or not the patient received tamoxifen, tumor grade, tumor size, number of positive lymph nodes, and the amounts of progesterone and estrogen bound to proteins in the cytosol of the primary tumor. Both time to recurrence of the tumor and time to death were recorded for each patient. The focus has been to determine how important the explanatory variables are for time to recurrence and time to death. After creating Kaplan-Meier curves and performing log-rank tests, the data were analysed using Cox regression. The method of purposeful selection was used to choose which of the explanatory variables should be included in the Cox regression model. Schoenfeld residual plots were used to assess whether the proportional hazards assumption was satisfied. Martingale residual plots were used to identify the functional form that should be used for the explanatory variables in the models. After purposeful selection, the variables that remained for time to death were size, grade, nodes and progesterone. For time to recurrence, the variables that remained were tamoxifen, grade, nodes and progesterone. An attempt was made to model recurrence as a time-dependent variable for time to death, and it was found that patients who experience recurrence have a much higher risk of death than those who do not.

Weibull-distributed survival times were simulated by assuming values for three explanatory variables (normally, exponentially and uniformly distributed) and their associated regression coefficients. A data frame of the simulated survival data was created, and Cox regression was run on this data frame to check whether the assumed regression coefficients were reproduced. The 95% confidence intervals for the regression coefficients produced by the Cox regression were found to include the assumed coefficient values. Increasing the standard deviation of the normally distributed explanatory variable was found to increase the accuracy of the regression coefficient estimates. Increasing the number of simulations was also found to increase the accuracy of the estimates. Survival data with non-proportional hazards were simulated using the R function sim.survdata. These data were used to test whether the Schoenfeld residual plot could detect the assumed functional form of a time-dependent regression coefficient. From the plot it was possible to see that the assumed functional form had the shape of a parabola. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: University of Stavanger, Norway [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.subject: teacher education [en_US]
dc.subject: science and mathematics [en_US]
dc.subject: breast cancer [en_US]
dc.subject: data analysis [en_US]
dc.title: Survival Analysis using Cox Regression on Breast Cancer Data [en_US]
dc.type: Master thesis [en_US]
dc.subject.nsi: VDP::Social sciences: 200::Education: 280::Subject didactics: 283 [en_US]



Except where otherwise noted, this item is licensed under Attribution 4.0 International (CC BY 4.0).