Introducing Python


  • Python is a widely used language suited to a variety of tasks, including analyzing data
  • Python uses different data types to handle text, numbers, collections, and other kinds of data
  • Assign values to variables using the = operator
  • Use functions and methods to perform specific actions
  • Python’s functionality can be extended using libraries, including libraries written by members of the community that address discipline-specific needs
  • Use the help() function and developer documentation to learn more about Python and Python libraries
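
The points above can be sketched in a few lines of plain Python. The species name and measurements here are made-up illustration values, not data from the lesson, and math stands in for any library you might import.

```python
import math  # a library that ships with Python

# Assign values of different data types to variables using =
species = "Dipodomys merriami"   # str (text)
weight_g = 43.5                  # float (number)
count = 12                       # int (number)
weights = [40.0, 43.5, 47.0]     # list (collection)

# Functions take values as arguments; methods belong to an object
rounded = round(weight_g)        # built-in function
genus = species.split()[0]       # string method, then take the first word
heaviest = max(weights)          # built-in function on a collection
root = math.sqrt(count * 12)     # function provided by the math library

print(type(species), type(weight_g), type(count), type(weights))
print(rounded, genus, heaviest, root)

# In an interactive session, help() prints the documentation for an object:
# help(round)
```

Note that round() rounds halves to the nearest even integer, so the result for 43.5 may not match the rounding rule taught in school.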

Introducing pandas


  • This lesson uses real data from a decades-long survey of rodents in Arizona
  • pandas is a data analysis library that allows users to read, manipulate, and view tabular data using Python
  • pandas represents data as a dataframe consisting of rows (records) and columns (fields or variables)
  • Read a dataframe from CSV using the pd.read_csv() function
  • Write a dataframe to CSV using the to_csv() method
  • The behavior of a function can be modified by including arguments and keyword arguments when the function is called
  • pandas uses its own classes to represent text, numbers, booleans, and datetimes
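
A minimal sketch of reading and writing CSV with pandas follows. To keep it self-contained, it reads from an in-memory string instead of a file on disk; the column names and values are hypothetical stand-ins for the survey data.

```python
import io

import pandas as pd

# Hypothetical stand-in for a survey CSV file (illustrative values only)
csv_text = """record_id,species_id,sex,weight,date
1,DM,M,40.0,1977-07-16
2,DM,F,43.0,1977-07-16
3,PF,F,7.0,1977-07-17
"""

# pd.read_csv() accepts a file path or a file-like object.
# parse_dates is a keyword argument that changes how the date column is read.
surveys = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(surveys.shape)    # (number of rows, number of columns)
print(surveys.dtypes)   # data type of each column

# index=False is a keyword argument that leaves the row index out of the CSV.
# With no path given, to_csv() returns the CSV text as a string.
csv_out = surveys.to_csv(index=False)
print(csv_out)
```

With a real file, the same calls would take a path instead, for example a file name passed as the first argument to pd.read_csv() and to to_csv().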

Accessing Data in a Dataframe


  • Use square brackets to access rows, columns, and specific cells
  • Use operators like +, -, and / to perform arithmetic on rows and columns
  • Store the results of calculations in a dataframe by adding a new column or overwriting an existing column
  • Sort data, rename columns, and get unique values in a dataframe using methods provided by pandas
  • By default, most dataframe operations return a copy of the original data
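
The access patterns above might look like this on a small dataframe. The column names and values are assumptions made for illustration.

```python
import pandas as pd

# Small illustrative dataframe (made-up values)
surveys = pd.DataFrame({
    "species_id": ["DM", "PF", "DM", "DO"],
    "weight": [40.0, 7.0, 44.0, 52.0],            # grams
    "hindfoot_length": [35.0, 15.0, 36.0, 37.0],  # millimeters
})

weights = surveys["weight"]        # one column
first_two = surveys[0:2]           # a slice of rows, by position
cell = surveys.loc[2, "weight"]    # one cell, by row label and column name

# Arithmetic on a column, stored as a new column
surveys["weight_kg"] = surveys["weight"] / 1000

# These methods return new objects rather than changing surveys
by_weight = surveys.sort_values("weight", ascending=False)
renamed = surveys.rename(columns={"hindfoot_length": "hindfoot_mm"})
species = surveys["species_id"].unique()

print(cell)
print(by_weight)
print(list(surveys.columns))   # still contains hindfoot_length
print(species)
```

Because sort_values() and rename() return copies, assign the result back to a variable if you want to keep it.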

Aggregating and Grouping Data


  • Calculate individual summary statistics using dataframe methods like mean(), max(), and min()
  • Calculate multiple summary statistics at once using the dataframe methods describe() and agg()
  • Group data by one or more columns using the groupby() method
  • pandas uses NaN to represent missing data in a dataframe
  • Failing to consider how missing data is interpreted in a dataset can introduce errors into calculations
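
As a rough sketch of these points, the example below summarizes and groups a small made-up dataframe with one missing weight. It also shows how one particular choice, replacing the missing value with 0, shifts the mean.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing weight
surveys = pd.DataFrame({
    "species_id": ["DM", "DM", "PF", "PF", "DO"],
    "sex": ["M", "F", "F", "M", "F"],
    "weight": [40.0, 44.0, 7.0, np.nan, 52.0],
})

# Individual summary statistic; NaN values are skipped by default
mean_weight = surveys["weight"].mean()

# Several summary statistics at once
summary = surveys["weight"].describe()
stats = surveys["weight"].agg(["mean", "min", "max"])

# Group by one column, or by several columns passed as a list
by_species = surveys.groupby("species_id")["weight"].mean()
by_species_sex = surveys.groupby(["species_id", "sex"])["weight"].count()

# Treating the missing weight as 0 gives a different, misleading mean
mean_filled = surveys["weight"].fillna(0).mean()

print(mean_weight, mean_filled)
print(summary)
print(by_species)
print(by_species_sex)
```

Whether a missing value should be skipped, filled, or dropped depends on what it means in the dataset, so it is worth checking before calculating.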

Combining Dataframes


  • Combine two dataframes on the values in one or more shared columns using pd.merge()
  • Append rows from one dataframe to another using pd.concat()
  • Combine multiple text columns into one using the + operator
  • Use the str accessor to use string methods like split() and zfill() on text columns
  • Convert date strings to datetime objects using pd.to_datetime()
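
The operations above can be sketched together as follows. Both dataframes and all of their values are illustrative assumptions, with the date parts stored as text so the string methods apply.

```python
import pandas as pd

# Made-up survey records with date parts stored as text
surveys = pd.DataFrame({
    "species_id": ["DM", "PF"],
    "year": ["1977", "1977"],
    "month": ["7", "8"],
    "day": ["16", "1"],
})

# Made-up lookup table keyed on the same species_id column
species = pd.DataFrame({
    "species_id": ["DM", "PF"],
    "genus": ["Dipodomys", "Perognathus"],
})

# Combine the two dataframes on the shared species_id column
merged = pd.merge(surveys, species, on="species_id", how="left")

# Append rows from another dataframe with the same columns
more = pd.DataFrame({
    "species_id": ["DO"], "year": ["1978"], "month": ["1"], "day": ["9"],
})
combined = pd.concat([surveys, more], ignore_index=True)

# Build one text column with +, padding month and day to two digits
combined["date_str"] = (
    combined["year"] + "-"
    + combined["month"].str.zfill(2) + "-"
    + combined["day"].str.zfill(2)
)

# Convert the date strings to datetime objects
combined["date"] = pd.to_datetime(combined["date_str"])

# split() through the str accessor, then take the first piece of each row
year_part = combined["date_str"].str.split("-").str[0]

print(merged)
print(combined)
```

Setting ignore_index=True gives the appended rows a fresh row index instead of repeating the index values of the original dataframes.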

Data Workflows and Automation