HU Berlin Statistic Presentation

Author:
Dennis Köhn
License:
Creative Commons CC BY 4.0 ^(?)
Abstract:
Slide example to hold a presentation at the chair of statistics at HU Berlin.
% Type of the document
\documentclass{beamer}

% elementary packages:
\usepackage{graphicx}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{eso-pic}
\usepackage{mathrsfs}
\usepackage{url}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{tikz}



% additional packages
\usepackage{bbm}

% packages supplied with ise-beamer:
\usepackage{cooltooltips}
\usepackage{colordef}
\usepackage{beamerdefs}
\usepackage{lvblisting}

% Mathematics
\usepackage{amsthm,amsfonts}
\usepackage{mathtools}
\usepackage{algorithmic}
\usepackage[linesnumbered,ruled]{algorithm2e}
\usepackage{float}


% Change the pictures here:
% logobig and logosmall are the internal names for the pictures: do not modify them. 
% Pictures must be supplied as JPEG, PNG or, to be preferred, PDF
\pgfdeclareimage[height=2cm]{logobig}{Figures/hulogo}
% Supply the correct logo for your class and change the file name to "logo". The logo will appear in the lower
% right corner:
\pgfdeclareimage[height=0.7cm]{logosmall}{Figures/hulogo}

% Title page outline:
% use this number to modify the scaling of the headline on title page
\renewcommand{\titlescale}{1.0}
% the title page has two columns; the following value sets the share of the width given to the left one
\renewcommand{\leftcol}{0.6}

% smaller font for selected slides
\newcommand\Fontvi{\fontsize{10}{7.2}\selectfont}
\newcommand\Fontsm{\fontsize{8}{7.2}\selectfont}


% Define the title. Don't forget to replace the optional argument
% ("Title shown at each slide") with an abbreviation. It will appear in the lower left corner:
\title[Title shown at each slide]{Title for title page}
% Define the authors:
\authora{Author 1} % a-c
\authorb{Author 2}
\authorc{Author 3}

% Define any internet addresses, if you want to display them on the title page:
\def\linka{http://lvb.wiwi.hu-berlin.de}
\def\linkb{www.case.hu-berlin.de}
\def\linkc{}
% Define the institute:
\institute{Ladislaus von Bortkiewicz Chair of Statistics \\
C.A.S.E. -- Center for Applied Statistics\\
    and Economics\\
Humboldt--Universit{\"a}t zu Berlin \\}

% Comment out the following command if you don't want the PDF file to open in full-screen mode:
\hypersetup{pdfpagemode=FullScreen}

%%%%
% Main document
%%%%
\begin{document}
% Draw title page
\frame[plain]{%
\titlepage{}
}

% The titles of the different sections of your talk can be set via the \section command. The title will be displayed in the upper left corner. To start a new section, repeat the \section command with another section title.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\frame{
\frametitle{Outline}
\begin{enumerate}
\item Introduction 
\item Pre-processing Steps
\item Model Selection
\item Variable Importance and Dimensionality Reduction 
\item Results and Conclusion
\end{enumerate}
}

\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% (A numbering of the slides can be useful for corrections, especially if you are
% dealing with large tex-files)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\frame{
\frametitle{Formal Problem Setting}
\begin{itemize}
\item \textit{training set}: inputs $X = (x_1,\dots,x_n) \in \mathbb{R}^{n \times d}$ and labels $Y = (y_1,\dots,y_n)  \in  \mathbb{R}^{n}$
\item  \textit{test set}: inputs $X' = (x'_1,\dots,x'_t) \in \mathbb{R}^{t \times d}$ without labels
\end{itemize}
\vspace{0.5cm}
Find a function 
\begin{align}
f: X\rightarrow Y 
\end{align}
s.t. the unobserved \textit{test set} labels $Y'$ are predicted as accurately as possible, i.e.
\begin{align}
f(X') \approx Y'
\end{align} 
}
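
% Added sketch: RMSE is the metric reported later in the talk. Using it as the
% formal objective here is an assumption, not part of the original slide.
\frame{
\frametitle{Formal Problem Setting: Measuring Accuracy}
One way to make ``as accurately as possible'' concrete, assuming the RMSE reported later in the talk is the target metric and writing $y'_i$ for the unobserved test label of $x'_i$:
\begin{align}
\mathrm{RMSE}(f) = \sqrt{\frac{1}{t}\sum_{i=1}^{t}\bigl(f(x'_i) - y'_i\bigr)^2}
\end{align}
A smaller value means that $f(X')$ lies closer to $Y'$.
}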

\section{Pre-Processing}
 
\frame{
\frametitle{Outline}
\begin{enumerate}
\item Introduction \quad \checkmark
\item Pre-processing Steps 
\item Model Selection
\item Variable Importance and Dimensionality Reduction 
\item Results and Conclusion
\end{enumerate}
}

\frame{
\frametitle{Pre-processing}
\vspace{0.1cm}
Several transformations and cleaning steps are needed before the data can be passed to an algorithm, e.g.

\begin{figure}
	\centering
	\includegraphics[scale=0.25]{Figures/DataPipeline-1.jpg}
	\caption{Workflow of Pre-Processing Steps}
	\label{fig:DataPipeline}
\end{figure}
All transformations need to be performed on the test set as well!
}
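
% Added sketch: standardization is used as an illustrative transformation.
% The symbols \hat\mu_j and \hat\sigma_j are assumptions, not taken from the original code.
\frame{
\frametitle{Pre-processing: Re-using a Transformation}
A minimal sketch of what applying a transformation to the test set as well can look like, assuming a standardization step (the estimates $\hat\mu_j$ and $\hat\sigma_j$ are illustrative notation): location and scale of variable $j$ are estimated once and then applied unchanged to training and test inputs,
\begin{align}
z_{ij} = \frac{x_{ij} - \hat\mu_j}{\hat\sigma_j}, \qquad z'_{ij} = \frac{x'_{ij} - \hat\mu_j}{\hat\sigma_j}
\end{align}
so that both sets are measured on the same scale.
}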

\begin{frame}[fragile]
\begin{center}
\begin{lstlisting}[
    basicstyle=\tiny, %or \small or \footnotesize etc.
]
# Run the full pre-processing pipeline on the combined inputs.
# X_com: training rows followed by test rows; y: training labels.
basic_preprocessing = function(X_com, y, scaler = "gaussian")
{
    # load the helper functions used below
    source("replace_ratings.R")
    source("convert_categoricals.R")
    source("impute_data.R")
    source("encode_time_variables.R")
    source("impute_outliers.R")
    source("scale_data.R")
    source("delete_nearzero_variables.R")
    # apply every step to training and test rows alike
    X_ratings = replace_ratings(X_com)
    X_imputed = naive_imputation(X_ratings)
    X_no_outlier = data.frame(lapply(X_imputed, iqr_outlier))
    X_time_encoded = include_quarter_dummies(X_no_outlier)
    X_scaled = scale_data(X_time_encoded, scale_method = scaler)
    X_encoded = data.frame(lapply(X_scaled, cat_to_dummy))
    X_com = delect_nz_variable(X_encoded)
    # the first length(y) rows form the training set; bind the labels to them
    idx_train = seq_along(y)
    train = cbind(X_com[idx_train, ], y)
    test = X_com[-idx_train, ]
    return(list(train = train, X_com = X_com, test = test))
}
\end{lstlisting}
\end{center}
\quantnet \href{https://github.com/koehnden/SPL16/tree/master/Quantnet/dataProcessing/}{dataProcessing}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Model Selection}

\frame{
\frametitle{Outline}
\begin{enumerate}
\item Introduction \quad \checkmark
\item Pre-processing Steps \quad \checkmark
\item Model Selection
\item Variable Importance and Dimensionality Reduction 
\item Results and Conclusion
\end{enumerate}
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[fragile]
\frametitle{Optimizing Hyper-parameters} 

\begin{algorithm}[H]
\algsetup{linenosize=\tiny}
\scriptsize
\BlankLine
\ForEach{i in 1:t}
    {
    Randomly split the data into $k$ folds of the same size \\
    \ForEach{j in 1:k}
        {
        Use the $j$th fold as test set and the union of the remaining folds as training set \\
        \ForEach{p in 1:grid}
            {
            Fit the model on the training set using parameter set $p$ \\
            Predict on the test set and calculate the RMSE
            }
        }% end fold loop
    }% end repetition loop
\ForEach{p in 1:grid}{
    Calculate the average RMSE over the $t \times k$ runs
}
Choose the $p$ with the lowest average RMSE
\caption{$t$-times repeated $k$-fold cross-validation with grid search}
\label{alg:seq}
\end{algorithm}


\quantnet \href{https://github.com/koehnden/SPL16/tree/master/Quantnet/xgbTuning/}{xgbTuning}
\quantnet \href{https://github.com/koehnden/SPL16/tree/master/Quantnet/rfTuning/}{rfTuning}
\quantnet \href{https://github.com/koehnden/SPL16/tree/master/Quantnet/svmTuning}{svmTuning}
\end{frame}
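
% Added sketch: \mathrm{RMSE}_{ij}(p) and \hat{p} are notation introduced here
% for illustration; they do not appear in the original slides.
\frame{
\frametitle{Optimizing Hyper-parameters: Selection Rule}
The selection step of the algorithm can be written compactly. As a sketch, let $\mathrm{RMSE}_{ij}(p)$ (notation introduced here for illustration) denote the RMSE on fold $j$ of repetition $i$ under parameter set $p$, where $t$ counts repetitions as in the algorithm:
\begin{align}
\overline{\mathrm{RMSE}}(p) &= \frac{1}{t\,k}\sum_{i=1}^{t}\sum_{j=1}^{k}\mathrm{RMSE}_{ij}(p) \\
\hat{p} &= \operatorname*{arg\,min}_{p}\; \overline{\mathrm{RMSE}}(p)
\end{align}
Averaging over all $t \times k$ runs makes the choice less sensitive to any single random split.
}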



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\frame{
\frametitle{Taking on the Curse of Dimensionality}
Problem: 
\begin{itemize}
\item many variables (99 after pre-processing)
\item small training set ($n = 1460$) 
\item variables are correlated with each other
\end{itemize}
\vspace{0.1cm}
Our approaches:
\begin{itemize}
\item Variable selection through variable importance ranking
\item Extract a smaller set of variables using PCA
\end{itemize}
}
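
% Added sketch: the symbols S, V, \Lambda, V_q, Z and q are introduced here for
% illustration, and column-centered inputs are an assumption.
\frame{
\frametitle{Dimensionality Reduction: PCA Sketch}
A brief sketch of the PCA step, under the assumption that the inputs $X$ are column-centered (the symbols $S$, $V_q$, $Z$ and $q$ are introduced here for illustration):
\begin{align}
S &= \frac{1}{n-1} X^{\top} X = V \Lambda V^{\top} \\
Z &= X V_q \in \mathbb{R}^{n \times q}, \qquad q < d
\end{align}
where $V_q$ collects the eigenvectors belonging to the $q$ largest eigenvalues in $\Lambda$. The columns of $Z$ are uncorrelated, which addresses the correlation among the original variables.
}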

\section{Results and Conclusion}
\frame{
\frametitle{Outline}
\begin{enumerate}
\item Introduction \quad \checkmark
\item Pre-processing Steps \quad \checkmark
\item Model Selection \quad \checkmark
\item Variable Importance and Dimensionality Reduction \quad \checkmark 
\item Results and Conclusion
\end{enumerate}
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\frame{
\frametitle{Results}
\begin{itemize}
\item Gaussian SVR with all variables is the single best model
\item PCA did not work well 
\item Models perform best with the full set of variables as Figure \ref{fig:RFE} suggested 
\end{itemize}
\vspace{0.25cm}
\begin{table}
\begin{center}
\begin{tabular}{c|ccc} 
\hline\hline
Inputs        & Gaussian SVR    & Random Forest & GBM    \\
\hline
All Variables & \textbf{0.1308} & 0.1484        & 0.1333 \\
Top 30        & 0.1323          & 0.1515        & 0.1436 \\
PCA           & 0.1607          & 0.1657        & 0.1657 \\
\hline\hline
\end{tabular}
\caption{RMSE of submitted predictions}
\end{center}
\end{table}
\hspace{7.2cm} \href{https://github.com/koehnden/SPL16/blob/master/finalModels.R}{GitHub: finalModels}
}

\frame{
\frametitle{Outline}
\begin{enumerate}
\item Introduction \quad \checkmark
\item Pre-processing Steps \quad \checkmark
\item Model Selection \quad \checkmark
\item Variable Importance and Dimensionality Reduction \quad \checkmark 
\item Results and Conclusion \quad \checkmark 
\end{enumerate}
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Dedicated section for references
\section{References}
\frame{
\frametitle{References}
\begin{thebibliography}{aaaaaaaaaaaaaaaaa}
\Fontvi
\beamertemplatearticlebibitems
\bibitem{Breiman:2003}
Breiman, Leo
\newblock{\em "Random Forests." Machine Learning, 45(1), 5-32 (2001)}
\newblock available on \href{http://machinelearning202.pbworks.com/w/file/fetch/60606349/breiman_randomforests.pdf}{http://machinelearning202.pbworks.com}
\bibitem{ChenGuestrin:2015}
Chen, Tianqi, and Carlos Guestrin
\newblock{\em "XGBoost: Reliable Large-scale Tree Boosting System", Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining
Pages 785-794 (2015)}
\newblock available on \href{http://learningsys.org/papers/LearningSys_2015_paper_32.pdf}{http://learningsys.org} 
\beamertemplatearticlebibitems
\bibitem{DeCock:2011}
De Cock, Dean
\newblock{\em "Ames, Iowa: Alternative to the Boston housing data as an end of semester regression project" Journal of Statistics Education 19.3 (2011)}
\newblock available on \href{https://ww2.amstat.org/publications/jse/v19n3/decock.pdf}{https://ww2.amstat.org}
\beamertemplatearticlebibitems
\end{thebibliography}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\frame{
\frametitle{References}
\begin{thebibliography}{aaaaaaaaaaaaaaaaa}
\Fontvi
\beamertemplatearticlebibitems
\bibitem{Friedman:2003}
Friedman, Jerome H.
\newblock{\em "Greedy function approximation: a gradient boosting machine." Annals of Statistics, 29(5), 1189-1232 (2001)}
\newblock available on \href{http://projecteuclid.org/download/pdf_1/euclid.aos/1013203451}{http://projecteuclid.org}
\bibitem{Kuhn:2015}
Kuhn, Max, and Kjell Johnson
\newblock{\em "Applied predictive modeling". New York: Springer (2013)}
\beamertemplatearticlebibitems
\bibitem{Vapnik:1997}
Vapnik, Vladimir, Steven E. Golowich, and Alex Smola
\newblock{\em "Support vector method for function approximation, regression estimation, and signal processing." Advances in neural information processing systems 281-287 (1997)}
\newblock available on \href{https://pdfs.semanticscholar.org/43ff/a2c1a06a76e58a333f2e7d0bd498b24365ca.pdf}{https://semanticscholar.org}
\beamertemplatearticlebibitems
\end{thebibliography}
}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}