Show simple item record

dc.contributor.author: Tjøstheim, Dag Bjarne
dc.contributor.author: Jullum, Martin
dc.contributor.author: Løland, Anders
dc.date.accessioned: 2023-12-04T10:26:31Z
dc.date.available: 2023-12-04T10:26:31Z
dc.date.created: 2023-09-27T14:30:33Z
dc.date.issued: 2023
dc.identifier.citation: Statistical Science. 2023, 38 (3), 411-439. [en_US]
dc.identifier.issn: 0883-4237
dc.identifier.uri: https://hdl.handle.net/11250/3105785
dc.description.abstract: There has been an intense recent activity in embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part, we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion mapping, kernel based methods and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of data sets with a tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as cluster and classification techniques. Arguably, this is the part where the contrast between algorithmic machine learning methods and statistical modeling, represented by the so-called stochastic block model, is at its greatest. In the paper, we discuss the pros and cons for the two approaches. The final part of the survey deals with embedding in R^2, that is, visualization. Three methods are presented: t-SNE, UMAP and LargeVis based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triplet of noisy Ranunculoid curves, and one consisting of networks of increasing complexity generated with stochastic block models and with two types of nodes. [en_US]
dc.language.iso: eng [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.subject: Statistical embedding [en_US]
dc.title: Statistical Embedding: Beyond Principal Components [en_US]
dc.title.alternative: Statistical Embedding: Beyond Principal Components [en_US]
dc.type: Journal article [en_US]
dc.type: Peer reviewed [en_US]
dc.description.version: acceptedVersion [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 2
dc.identifier.doi: 10.1214/22-STS881
dc.identifier.cristin: 2179499
dc.source.journal: Statistical Science [en_US]
dc.source.volume: 38 [en_US]
dc.source.issue: 3 [en_US]
dc.source.pagenumber: 411-439 [en_US]
dc.relation.project: Norges forskningsråd: 237718 [en_US]
dc.subject.nsi: VDP::Mathematics and natural science: 400::Mathematics: 410 [en_US]
dc.subject.nsi: VDP::Mathematics and natural science: 400::Mathematics: 410::Statistics: 412 [en_US]


Associated file(s)

This item appears in the following collection(s)

Except where otherwise noted, this item is licensed under Attribution 4.0 International.