Canhong Wen
R and Python.
R or Python using Rmarkdown or Jupyter notebooks.
By the end of this course students will be able to do:
R and Python (vectors, matrices, arrays, lists, dataframes) and their various strengths and weaknesses.
R and Python.
R and Python code
R and Python, including: importing, preprocessing, and plotting data, and performing basic statistical inference such as linear regression and hypothesis testing
R package or Python module.
There is no required textbook for this course. Lectures will be based on material from the following sources.
Other online resources:
There will be an optional biweekly in-class lab, homework nearly every week, a midterm project, and a final exam. Grades will be calculated as follows:
Project: 30%
Survey of Kagglers finds Python, R to be preferred tools
The rankings varied according to the job title of the respondent.
R: Business Analyst, Data Analyst, Data Miner, Operations Researcher, Predictive Modeler, Statistician
Python: Computer Scientist, Data Scientist, Engineer, Machine Learning Engineer, Other, Programmer, Researcher, Scientist, Software Developer
Basic interaction with R is by typing in the console, a.k.a. terminal or command-line
You type in commands, R gives back answers (or errors)
Menus and other graphical interfaces are extras built on top of the console
getOption("defaultPackages")
(.packages())
Load an R package before you use it.
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
require(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
What is the difference between library and require?
{r}
install.packages("BeSS")
Install from other sources using devtools
{r}
library("devtools")
install_github("tidyverse/ggplot2") # install from GitHub; the repository is given as "user/repo"
install_bioc("AnnotationDbi") # install from Bioconductor
Install from a local source file (downloaded from the author's homepage)
{r}
install.packages("l0tf_0.1.0.tar.gz", repos = NULL, type = "source")
{r}
help("t.test")
?t.test
Python is rapidly becoming the preferred language of data scientists in both industry and academia. It’s used by Google, Facebook and other tech giants to perform data analysis and run machine learning algorithms that can handle hundreds of thousands of terabytes of data per day.
Python can be used for:
After installing Anaconda, we have
The Jupyter notebook is an interactive, web-based environment that allows one to combine code, text and graphics into one unified document.
The Jupyter notebook has three types of cells:
For more details, see this online Reference Guide
2+3
5
import time, sys  # Import the time and sys modules
for i in range(8):  # for loop
    print(i)  # print the iteration number
    time.sleep(0.5)  # pause 0.5 seconds before the next iteration
0 1 2 3 4 5 6 7
Lists in Markdown
Style and Emphasis
*Italics*
_Italics_
**Bold**
__Bold__
***Bold and Italics***
___Bold and Italics___
~~strikeout~~
Italics
Italics
Bold
Bold
Bold and Italics
Bold and Italics
strikeout
Inserting Tables in Markdown
Header | Header | Header | Header |
---|---|---|---|
Cell | Cell | Cell | Cell |
Cell | Cell | Cell | Cell |
Cell | Cell | Cell | Cell |
Cell | Cell | Cell | Cell |
Centered, Right-Justified, and Regular Cells and Headers:
centered header | regular header | right-justified header | centered header | regular header |
:---:|---|---:|:---:|---|
centered cell | regular cell | right-justified cell | centered cell | regular cell |
centered cell | regular cell | right-justified cell | centered cell | regular cell |
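Column alignment in a Markdown table is set by colons in the separator row: `:---:` centers a column, `---:` right-justifies it, and plain `---` keeps the default. As a minimal sketch, the Python helper below builds such a table as a string; the function name `markdown_table` and its `align` codes ("l", "c", "r") are hypothetical, chosen only for illustration.

```python
def markdown_table(headers, rows, align=None):
    """Build a Markdown table as a string.

    align holds one code per column: "l" (default), "c" (centered),
    or "r" (right-justified).
    """
    markers = {"l": "---", "c": ":---:", "r": "---:"}
    align = align or ["l"] * len(headers)
    lines = [
        " | ".join(headers),                     # header row
        " | ".join(markers[a] for a in align),   # separator row with alignment
    ]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


table = markdown_table(
    headers=["centered header", "regular header", "right-justified header"],
    rows=[["centered cell", "regular cell", "right-justified cell"]],
    align=["c", "l", "r"],
)
print(table)
```

Pasting the printed text into a Markdown cell should render a table whose first column is centered and whose last column is right-justified.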
Inserting Hyperlinks
www.ustc.edu.cn
[click this link](http://www.ustc.edu.cn)
[click this link](http://www.ustc.edu.cn "USTC")
Inserting Images
Inserting an image is almost identical to inserting a link. You just type a ! before the first set of brackets.
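The link and image syntax differ only by the leading `!`. The Python sketch below builds both forms as strings to make the pattern explicit; the helper names `md_link` and `md_image` and the file name `logo.png` are hypothetical and used only for illustration.

```python
def md_link(text, url, title=None):
    """Return Markdown for a hyperlink, with an optional hover title."""
    if title:
        return f'[{text}]({url} "{title}")'
    return f"[{text}]({url})"


def md_image(alt_text, path, title=None):
    """Return Markdown for an image: the link syntax with a leading '!'."""
    return "!" + md_link(alt_text, path, title)


print(md_link("click this link", "http://www.ustc.edu.cn", "USTC"))
print(md_image("USTC logo", "logo.png"))  # logo.png is a placeholder file name
```

The text in the square brackets of an image serves as its alternative text, shown when the image cannot be displayed.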

Including Code Examples
lm()
import time, sys  # Import the time and sys modules
for i in range(8):  # for loop
    print(i)  # print the iteration number
    time.sleep(0.5)  # pause 0.5 seconds before the next iteration
import time, sys  # Import the time and sys modules
for i in range(8):  # for loop
    print(i)  # print the iteration number
    time.sleep(0.5)  # pause 0.5 seconds before the next iteration
LaTeX Math
Jupyter Notebooks' Markdown cells support LaTeX for formatting mathematical equations. To tell Markdown to interpret your text as LaTeX, surround your input with dollar signs like this:
$z=\dfrac{2x}{3y}$
$z=\dfrac{2x}{3y}$
$$2x+3y=z$$
This is Raw NBConvert output:
centered header | regular header | right-justified header | centered header | regular header |
:---:|---|---:|:---:|---|
centered cell | regular cell | right-justified cell | centered cell | regular cell |
centered cell | regular cell | right-justified cell | centered cell | regular cell |
R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both
Installation
{r}
install.packages("rmarkdown")