Welcome to Brannon's Reading Notes!

This serves as my personal page to keep and update my reading notes for Code Fellows Courses 201, 301, and 401.

Reading 12

Pandas 🐼

Pandas is a Python library of data analysis tools. Its core structure is the DataFrame, a labeled two-dimensional table used to organize and manipulate data.

First, import pandas: import pandas as pd

Methods called on a DataFrame (df) can remove, group, and plot its data. Pandas' strength lies in quickly reorganizing a df into an easily readable table that displays only the categories of interest.
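A minimal sketch of the remove, group, and reorganize steps described above. The column names and values are made up for illustration and are not from the reading.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame(
    {
        "city": ["Seattle", "Seattle", "Portland", "Portland"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10, 15, 7, 9],
    }
)

# Remove a column that is not needed (drop returns a new DataFrame).
trimmed = df.drop(columns=["year"])

# Group rows by city and total the sales within each group.
totals = trimmed.groupby("city")["sales"].sum()

# Reorganize the original data into a city-by-year table.
table = df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")

print(totals)
print(table)
```

If Matplotlib is installed, calling totals.plot(kind="bar") should chart the grouped totals.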

Creating a DataFrame

Below, a dictionary of objects is converted to a DataFrame. Each key becomes a column, and scalar values such as 1.0 and "foo" are repeated down every row.

import numpy as np  # needed for np.array below
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)


df2
Out[10]: 
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

source: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

Pandas can write data to Excel or CSV files and convert it to NumPy arrays. Plotting is handled through Matplotlib rather than NumPy.
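A small sketch of the CSV round trip, using made-up data. It keeps everything in memory so that no file is created. The variable names are my own.

```python
import io

import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({"name": ["spam", "eggs"], "count": [3, 12]})

# With no path argument, to_csv() returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# read_csv() accepts a file-like object, so the text can be loaded back in.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Passing a path such as "spam.csv" to to_csv() writes to disk instead. to_excel() works similarly but needs an Excel engine such as openpyxl installed.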

A Growing List of Useful Methods:

DataFrame.to_numpy() gives a NumPy representation of your data. This is fast when all data types are the same, but expensive when they vary.
describe() provides a quick statistical summary of your data.
T transposes the data, swapping rows and columns.
stack() “compresses” a level in the DataFrame’s columns, moving it into the row index.
to_csv("spam.csv") writes the DataFrame to a CSV file.
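To see several of the methods above side by side, here is a short sketch on a tiny DataFrame. The data is hypothetical and all values are floats, so to_numpy() stays on its fast path.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

arr = df.to_numpy()      # 2-D NumPy array with one row per DataFrame row
summary = df.describe()  # count, mean, std, min, quartiles, and max per column
flipped = df.T           # rows become columns and columns become rows
stacked = df.stack()     # column labels move into an inner level of the row index

print(arr.shape)
print(summary.loc["mean"])
print(flipped)
print(stacked)
```

Here stack() returns a Series indexed by (row label, column name) pairs, which is what “compressing” the column level looks like in practice.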