Introducing Python
- Python is a widely used language suited to a variety of tasks, including analyzing data
- Python uses different data types to handle text, numbers, collections, and other kinds of data
- Assign values to variables using the = operator
- Use functions and methods to perform specific actions
- Python’s functionality can be extended using libraries, including libraries written by members of the community that address discipline-specific needs
- Use the help() function and developer documentation to learn more about Python and Python libraries
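
The points above can be sketched in a few lines. This is a minimal illustration, not part of the lesson itself; the variable names and values are made up for the example.

```python
import math

# Assign values to variables with the = operator
species = "kangaroo rat"   # text (str), illustrative value
weight_g = 42.5            # number (float), illustrative value
counts = [3, 5, 8]         # collection (list), illustrative values

# A function is called by name; a method is called on an object
total = sum(counts)        # built-in function
label = species.upper()    # string method

# A library extends Python; math ships with the standard library
root = math.sqrt(16)

print(total, label, root)

# help() prints documentation for an object; uncomment to read it
# help(math.sqrt)
```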
Introducing pandas
- This lesson uses real data from a decades-long survey of rodents in Arizona
- pandas is a data analysis library that allows users to read, manipulate, and view tabular data using Python
- pandas represents data as a dataframe consisting of rows (records) and columns (fields or variables)
- Read a dataframe from CSV using the pd.read_csv() function
- Write a dataframe to CSV using the to_csv() method
- The behavior of a function can be modified by including arguments and keyword arguments when the function is called
- pandas uses its own classes to represent text, numbers, booleans, and datetimes
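
As a rough sketch of reading and writing CSV data, the example below uses an in-memory string in place of a file on disk so that it runs on its own. The column names and rows are hypothetical stand-ins, not the actual survey data.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "record_id,species_id,weight\n1,DM,40\n2,DO,52\n3,DM,\n"

# Positional argument: the file (or file-like object) to read
df = pd.read_csv(io.StringIO(csv_text))

# Keyword argument: nrows changes the behavior by limiting the rows read
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Each column has a pandas data type; the empty weight is read as missing
print(df.dtypes)

# to_csv() is a method of the dataframe; with no path it returns a string,
# and index=False leaves the row labels out of the output
csv_out = df.to_csv(index=False)
print(csv_out)
```

With a real file, the same calls would take a path instead, for example `pd.read_csv("some_file.csv")` and `df.to_csv("some_output.csv", index=False)`, where both file names are placeholders.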
Accessing Data in a Dataframe
- Use square brackets to access rows, columns, and specific cells
- Use operators like +, -, and / to perform arithmetic on rows and columns
- Store the results of calculations in a dataframe by adding a new column or overwriting an existing column
- Sort data, rename columns, and get unique values in a dataframe using methods provided by pandas
- By default, most dataframe operations return a copy of the original data
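
The access patterns listed above might look like the following. The dataframe here is a small made-up example with hypothetical column names, built inline so the sketch is self-contained.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "species_id": ["DM", "DO", "DM", "PP"],
    "weight": [40, 52, 44, 17],
    "hindfoot_length": [35, 36, 37, 22],
})

# Square brackets select columns, rows, and cells
weights = df["weight"]                  # one column
subset = df[["species_id", "weight"]]   # several columns
first_two = df[0:2]                     # rows by position
cell = df.loc[1, "weight"]              # row label 1, column "weight"

# Arithmetic on a column, stored as a new column
df["weight_kg"] = df["weight"] / 1000

# These methods return new objects and leave df unchanged
sorted_df = df.sort_values("weight")
renamed = df.rename(columns={"hindfoot_length": "hindfoot_mm"})
unique_species = df["species_id"].unique()

print(sorted_df)
print(list(unique_species))
```

Because `sort_values()` and `rename()` return copies, `df` keeps its original row order and column names unless the result is assigned back to `df`.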
Aggregating and Grouping Data
- Calculate individual summary statistics using dataframe methods like mean(), max(), and min()
- Calculate multiple summary statistics at once using the dataframe methods describe() and agg()
- Group data by one or more columns using the groupby() method
- pandas uses NaN to represent missing data in a dataframe
- Failing to consider how missing data is interpreted in a dataset can introduce errors into calculations
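
A short sketch of these ideas, using made-up values and hypothetical column names, shows how a missing value can change a result depending on how it is handled.

```python
import numpy as np
import pandas as pd

# Made-up data; one weight is missing (NaN)
df = pd.DataFrame({
    "species_id": ["DM", "DM", "DO", "DO"],
    "weight": [40.0, 44.0, 52.0, np.nan],
})

# Individual statistics skip NaN by default
mean_weight = df["weight"].mean()   # averages the three known values

# Several statistics at once
summary = df["weight"].describe()
stats = df["weight"].agg(["min", "max", "mean"])

# One mean per species
by_species = df.groupby("species_id")["weight"].mean()

# Treating the missing value as 0 gives a different, misleading mean
filled_mean = df["weight"].fillna(0).mean()

print(mean_weight, filled_mean)
print(by_species)
```

Here the mean of the known weights and the mean after filling with 0 differ, which is one way an unexamined choice about missing data can skew a calculation.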
Combining Dataframes
- Combine two dataframes on one or more common values using pd.merge()
- Append rows from one dataframe to another using pd.concat()
- Combine multiple text columns into one using the + operator
- Use the str accessor to use string methods like split() and zfill() on text columns
- Convert date strings to datetime objects using pd.to_datetime()
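
The operations above can be combined as in the sketch below. Both dataframes are made up for illustration, and the column names are assumptions rather than the lesson's actual schema.

```python
import pandas as pd

# Made-up survey records with date parts stored as text
surveys = pd.DataFrame({
    "species_id": ["DM", "DO"],
    "year": ["1977", "1977"],
    "month": ["7", "8"],
    "day": ["16", "2"],
})

# Made-up lookup table sharing the species_id column
species = pd.DataFrame({
    "species_id": ["DM", "DO"],
    "taxa": ["Rodent", "Rodent"],
})

# Merge on the common column
merged = pd.merge(surveys, species, on="species_id")

# Append rows from another dataframe with the same columns
more = pd.DataFrame({
    "species_id": ["PP"], "year": ["1978"], "month": ["1"], "day": ["5"],
})
stacked = pd.concat([surveys, more], ignore_index=True)

# Zero-pad month and day with zfill(), then join text columns with +
date_str = (
    surveys["year"]
    + "-" + surveys["month"].str.zfill(2)
    + "-" + surveys["day"].str.zfill(2)
)

# Convert the strings to datetimes, and split them back into parts
dates = pd.to_datetime(date_str)
parts = date_str.str.split("-")

print(merged)
print(dates)
```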