Introducing Python
- Python is a widely used, general-purpose language suited to a variety of tasks, including data analysis
- Python uses different data types to handle text, numbers, collections, and other kinds of data
- Assign values to variables using the `=` operator
- Use functions and methods to perform specific actions
- Python’s functionality can be extended using packages developed by the community
- Use the `help()` function and developer documentation to learn more about Python modules and packages
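The sketch below illustrates these points using only the standard library. The variable names and values are illustrative. The `statistics` module stands in for an imported package; community packages such as pandas are imported the same way once they are installed.

```python
import statistics  # standard-library module; installed community packages are imported the same way

# Assign values to variables with the = operator (illustrative values)
species = "Dipodomys merriami"  # text: str
count = 12                      # whole number: int
mean_weight = 43.5              # decimal number: float
captured = True                 # boolean: bool
plot_ids = [2, 7, 11]           # collection: list

# A function is called with arguments and returns a result
n_plots = len(plot_ids)

# A method is a function attached to an object and called with a dot
genus = species.split()[0]

# A function provided by an imported module
median_plot = statistics.median(plot_ids)

# help() prints the documentation for a function, method, or module; uncomment to try it
# help(statistics.median)

print(type(count), type(mean_weight), n_plots, genus, median_plot)
```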
Introducing pandas
- This lesson uses real data from a decades-long survey of rodents in Arizona
- pandas is a data analysis package that allows users to read, manipulate, and view tabular data using Python
- pandas represents data as a dataframe consisting of rows (records) and columns (fields or variables)
- We can read a dataframe from CSV using the `pd.read_csv()` function and write a dataframe to CSV using the `to_csv()` method
- The behavior of a function can be modified by including arguments and keyword arguments when the function is called
- pandas uses its own data types (dtypes) to represent text, numbers, booleans, and datetimes
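The following sketch reads and writes CSV data. Instead of the lesson's survey file, it parses a few made-up rows held in a string through `io.StringIO`. The column names and values are assumptions chosen for illustration. The `parse_dates` and `index=False` keyword arguments show how a keyword argument changes what a call does.

```python
import io

import pandas as pd

# Made-up rows in CSV format; in practice you would pass a file path instead
csv_text = """record_id,species_id,weight,date
1,DM,40,1977-07-16
2,PE,,1977-07-16
3,DM,48,1977-07-17
"""

# parse_dates is a keyword argument asking pandas to convert "date" to datetimes
surveys = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Each column gets a pandas dtype: integers, floats (the blank weight becomes NaN,
# which forces a float column), text, and datetimes
print(surveys.dtypes)

# With no path, to_csv() returns the CSV text; index=False leaves out the row labels
out = surveys.to_csv(index=False)
print(out)
```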
Accessing and Filtering Data
- Use square brackets to access columns and filter rows, and `.loc[]` or `.iloc[]` to access rows and specific cells by label or position
- Sort data and get unique values in a dataframe using methods provided by pandas
- By default, most dataframe operations return a copy of the original data
- Scatter plots can be used to visualize how two parameters in a dataset covary
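The sketch below applies the selection, sorting, and unique-value operations to a small hypothetical dataframe. The column names loosely follow the rodent survey, but the values are invented.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey table
surveys = pd.DataFrame({
    "species_id": ["DM", "PE", "DM", "DO"],
    "weight": [40.0, 20.0, 48.0, 52.0],
    "hindfoot_length": [35.0, 21.0, 36.0, 34.0],
})

weights = surveys["weight"]                  # one column name -> Series
subset = surveys[["species_id", "weight"]]   # list of column names -> DataFrame
heavy = surveys[surveys["weight"] > 45]      # boolean mask keeps matching rows
first_row = surveys.iloc[0]                  # row by integer position
cell = surveys.loc[2, "weight"]              # single cell by row label and column name

# Sort by a column (largest first) and list the distinct species codes
by_weight = surveys.sort_values("weight", ascending=False)
species = surveys["species_id"].unique()

# sort_values() returned a new dataframe, so the original row order is unchanged
print(surveys.index.tolist(), by_weight.index.tolist())
```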
Aggregating Data
- Calculate individual summary statistics using dataframe methods like `mean()`, `max()`, and `min()`
- Calculate multiple summary statistics at once using the dataframe methods `describe()` and `agg()`
- Group data by one or more columns using the `groupby()` method
- Failing to consider how missing data is interpreted in a dataset can introduce significant errors
- Box plots can be used to visualize the distribution of a single parameter
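The following sketch runs the aggregation methods on invented data in which one weight is missing. It also shows how the way `NaN` is handled changes the result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one weight is missing (NaN)
surveys = pd.DataFrame({
    "species_id": ["DM", "DM", "PE", "PE", "DO"],
    "sex": ["F", "M", "F", "M", "F"],
    "weight": [40.0, 48.0, 20.0, np.nan, 52.0],
})

# Individual statistics; NaN values are skipped by default
mean_weight = surveys["weight"].mean()
max_weight = surveys["weight"].max()

# Several statistics at once
summary = surveys["weight"].describe()                 # count, mean, std, min, quartiles, max
stats = surveys["weight"].agg(["mean", "min", "max"])

# Group by one column, or by a list of columns
by_species = surveys.groupby("species_id")["weight"].mean()
by_species_sex = surveys.groupby(["species_id", "sex"])["weight"].count()

# Missing data matters: count() ignores NaN while len() does not,
# and filling NaN with 0 pulls the mean down
n_measured = surveys["weight"].count()
n_rows = len(surveys)
mean_if_zero = surveys["weight"].fillna(0).mean()
print(mean_weight, mean_if_zero)
```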
Combining Dataframes
- Combine two dataframes on one or more common values using `pd.merge()`
- Append rows from one dataframe to another using `pd.concat()`
- Combine multiple text columns into one using the `+` operator
- Convert date information to datetime objects using `pd.to_datetime()`
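The sketch below combines dataframes using two hypothetical tables, a set of survey records and a species lookup table, that share a `species_id` column. All values are made up for illustration.

```python
import pandas as pd

# Hypothetical survey records and species lookup table
surveys = pd.DataFrame({
    "record_id": [1, 2],
    "species_id": ["DM", "PE"],
    "year": [1977, 1977],
    "month": [7, 7],
    "day": [16, 17],
})
more_surveys = pd.DataFrame({
    "record_id": [3], "species_id": ["DM"], "year": [1978], "month": [1], "day": [5],
})
species = pd.DataFrame({
    "species_id": ["DM", "PE"],
    "genus": ["Dipodomys", "Peromyscus"],
    "species": ["merriami", "eremicus"],
})

# Append rows; ignore_index=True renumbers the rows so their labels stay unique
all_surveys = pd.concat([surveys, more_surveys], ignore_index=True)

# Join on the shared species_id column (an inner join by default)
merged = pd.merge(all_surveys, species, on="species_id")

# + joins text columns element by element
merged["name"] = merged["genus"] + " " + merged["species"]

# pd.to_datetime() can build dates from columns named year, month, and day
merged["date"] = pd.to_datetime(merged[["year", "month", "day"]])
print(merged[["record_id", "name", "date"]])
```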
Visualizing Data
- Plotly offers a wide variety of ways to build and style scatter plots
- Use scatter plots to visualize how parameters covary
- Use box and violin plots to visualize the distribution of a parameter