Introducing Python


  • Python is a widely used language suited to a variety of tasks, including data analysis
  • Python uses different data types to handle text, numbers, collections, and other kinds of data
  • Assign values to variables using the = operator
  • Use functions and methods to perform specific actions
  • Python’s functionality can be extended using packages developed by the community
  • Use the help() function and developer documentation to learn more about Python modules and packages
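
As a rough illustration of the points above, the sketch below assigns values of a few built-in data types, calls a function and a method, and uses a module from the standard library. The variable names and values are invented for this example.

```python
import math  # a module that ships with Python

# Assign values of different data types to variables with the = operator
species = "Dipodomys merriami"   # str (text), illustrative value
weight_g = 41.5                  # float (number), illustrative value
counts = [3, 7, 5]               # list (collection), illustrative values

# Functions take values as arguments; methods are called on an object
total = sum(counts)              # built-in function
genus = species.split()[0]       # string method, then take the first word
rounded = math.floor(weight_g)   # function provided by the math module

print(type(species).__name__, type(weight_g).__name__, type(counts).__name__)
print(total, genus, rounded)

# Calling help(math.floor) in an interactive session prints its documentation
```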

Introducing pandas


  • This lesson uses real data from a decades-long survey of rodents in Arizona
  • pandas is a data analysis package that allows users to read, manipulate, and view tabular data using Python
  • pandas represents data as a dataframe consisting of rows (records) and columns (fields or variables)
  • We can read a dataframe from CSV using the pd.read_csv() function and write a dataframe to CSV using the to_csv() method
  • The behavior of a function can be modified by including arguments and keyword arguments when the function is called
  • pandas uses its own data types (dtypes) to represent text, numbers, booleans, and datetimes
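
A minimal sketch of reading and writing CSV data with pandas follows. To keep it self-contained, it reads from an in-memory string instead of a file; the column names and values are invented stand-ins, not the lesson's actual dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; columns and values are invented
csv_text = """record_id,species_id,sex,weight
1,DM,M,40
2,DM,F,
3,PF,F,8
"""

# The data source is a positional argument; index_col is a keyword argument
# that changes how the function behaves
df = pd.read_csv(io.StringIO(csv_text), index_col="record_id")

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # the pandas data type of each column

# to_csv() is a method of the dataframe; called without a path,
# it returns the CSV text as a string instead of writing a file
out = df.to_csv(index=False)
print(out)
```

Passing a file path to pd.read_csv() and to_csv() works the same way for files on disk.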

Accessing and Filtering Data


  • Use square brackets to access rows, columns, and specific cells
  • Sort data and get unique values in a dataframe using methods provided by pandas
  • By default, most dataframe operations return a copy of the original data
  • Scatter plots can be used to visualize how two parameters in a dataset covary
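
The access patterns above might look like the following sketch. The dataframe is invented for illustration, and the column names are assumptions.

```python
import pandas as pd

# Invented example data; column names are assumptions
df = pd.DataFrame({
    "species_id": ["DM", "PF", "DM", "NL"],
    "weight": [40, 8, 44, 180],
    "hindfoot_length": [35, 15, 36, 32],
})

weights = df["weight"]                  # one column (a Series)
subset = df[["species_id", "weight"]]   # several columns (a dataframe)
first_two = df[0:2]                     # rows selected by a position slice
heavy = df[df["weight"] > 30]           # rows that satisfy a condition
cell = df.loc[1, "weight"]              # one cell: row label 1, column "weight"

# Sorting and unique values
sorted_df = df.sort_values("weight", ascending=False)
species = df["species_id"].unique()

# sort_values() returned a new dataframe; df keeps its original order
print(sorted_df["weight"].tolist())
print(df["weight"].tolist())
```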

Aggregating Data


  • Calculate individual summary statistics using dataframe methods like mean(), max(), and min()
  • Calculate multiple summary statistics at once using the dataframe methods describe() and agg()
  • Group data by one or more columns using the groupby() method
  • Failing to consider how missing data is interpreted in a dataset can introduce significant errors
  • Box plots can be used to visualize the distribution of a single parameter
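
The sketch below shows one way these aggregation methods fit together, using a small invented dataframe with one missing weight. It also shows how the treatment of missing data changes a result: pandas skips missing values in mean() by default, whereas replacing them with 0 lowers the mean.

```python
import pandas as pd

# Invented example data with one missing weight
df = pd.DataFrame({
    "species_id": ["DM", "DM", "PF", "PF"],
    "weight": [40.0, 44.0, 8.0, None],
})

mean_weight = df["weight"].mean()          # missing values are skipped
summary = df["weight"].describe()          # count, mean, std, min, quartiles, max
stats = df["weight"].agg(["min", "max"])   # a chosen set of statistics

# Mean weight for each species
by_species = df.groupby("species_id")["weight"].mean()

# Treating the missing value as 0 gives a different, misleading mean
mean_with_zero = df["weight"].fillna(0).mean()

print(mean_weight, mean_with_zero)
print(by_species)
```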

Combining Dataframes


  • Combine two dataframes on one or more shared columns using pd.merge()
  • Append rows from one dataframe to another using pd.concat()
  • Combine multiple text columns into one using the + operator
  • Convert date information to datetime objects using pd.to_datetime()
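
A small sketch of these four operations follows. All three dataframes, their column names, and their values are invented for illustration.

```python
import pandas as pd

# Invented example data; date parts are stored as zero-padded text
surveys = pd.DataFrame({
    "species_id": ["DM", "PF"],
    "year": ["1977", "1978"],
    "month": ["07", "11"],
    "day": ["16", "02"],
})
more_surveys = pd.DataFrame({
    "species_id": ["DM"],
    "year": ["1979"],
    "month": ["01"],
    "day": ["05"],
})
species = pd.DataFrame({
    "species_id": ["DM", "PF"],
    "genus": ["Dipodomys", "Perognathus"],
})

# Append the rows of one dataframe to another
all_surveys = pd.concat([surveys, more_surveys], ignore_index=True)

# Combine on the shared species_id column, keeping every survey row
merged = pd.merge(all_surveys, species, on="species_id", how="left")

# Join text columns with the + operator, then convert to datetimes
merged["date_text"] = merged["year"] + "-" + merged["month"] + "-" + merged["day"]
merged["date"] = pd.to_datetime(merged["date_text"], format="%Y-%m-%d")

print(merged[["species_id", "genus", "date"]])
```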

Visualizing Data


  • Plotly offers a wide variety of ways to build and style scatter plots
  • Use scatter plots to visualize how parameters covary
  • Use box and violin plots to visualize the distribution of a parameter