### Librarian View

LEADER 03419nam a22003017i 4500

001
a11849351

003
SIRSI

005
20200821003001.0

006
m d

007
cr un

008
161011s2016 xx sm 000 0 eng d

040

a| CSt
c| CSt
d| UtOrBLW

100

1

a| He, Yu.
=| ^A2762222

245

1

0

a| Efficient permutation P-value estimation for gene set tests
h| [electronic resource] /
c| Yu He.

260

c| 2016.

300

a| 1 online resource.

500

a| Submitted to the Department of Statistics.

502

a| Thesis (Ph.D.)--Stanford University, 2016.

520

3

a| In a genome-wide expression study, gene set testing is often used to find potential gene sets that correlate with a treatment (disease, drug, phenotype, etc.). A gene set may contain tens to thousands of genes, and genes within a gene set are generally correlated. Permutation tests are a standard approach to obtaining p-values for these gene set tests. Plain Monte Carlo methods that generate random permutations can be computationally infeasible for small p-values. Ackermann and Strimmer (2009) find two families of test statistics that achieve the best overall performance: a linear family and a quadratic family. This dissertation first reviews the relevant background of gene set testing and permutation tests, and then provides three alternative approaches to estimate small permutation p-values efficiently. The first approach focuses on the linear statistic. Because the p-value can be written as the proportion of points lying in a spherical cap, it is approximated by the volume of that cap. Error estimates can be derived from a generalized Stolarsky's invariance principle, and alternative probabilistic proofs are provided. The second approach focuses on the quadratic statistic. Importance sampling is used to estimate the area of the (continuous) significant region on the sphere, and the volume of the region is used as an approximation for the (discrete proportion) p-value. Different proposal distributions are studied and compared. The third approach estimates the p-value with nested sampling. It may work for both the linear and the quadratic statistic. Similar ideas can be found in literature spanning combinatorics, sequential Monte Carlo, Bayesian computation, rare event estimation, network reliability, etc., and bear different names, e.g. approximate counting, nested sampling, subset simulation, multilevel splitting, etc. We give a thorough review of the literature in these different areas, and apply the technique to gene set testing with the quadratic test statistic. Finally, we compare the proposed methods with plain Monte Carlo and saddlepoint approximation on three expression studies in Parkinson's Disease patients. This work was supported by the US National Science Foundation under grant DMS-1521145.

700

1

a| Owen, Art B.
e| primary advisor.
4| ths
=| ^A1531908

700

1

a| Hastie, Trevor
e| advisor.
4| ths
=| ^A749051

700

1

a| Wong, Wing Hung
e| advisor.
4| ths
=| ^A1649606

710

2

a| Stanford University.
b| Department of Statistics.
=| ^A432305

596

a| 21 22

856

4

0

u| http://purl.stanford.edu/hg200hk9670
x| SDR-PURL
x| item

916

a| DATE CATALOGED
b| 20161018

999

a| 3781 2016 H
w| ALPHANUM
c| 1
i| 36105223477782
l| UARCH-30
m| SPEC-COLL
r| Y
s| Y
t| NONCIRC
u| 10/12/2016

999

a| INTERNET RESOURCE
w| ASIS
c| 1
i| 11849351-2001
l| INTERNET
m| SUL
r| Y
s| Y
t| SUL
u| 10/12/2016
x| E-THESIS