Reading 12
Pandas 🐼
Pandas is a Python library of tools for data analysis. It organizes and manipulates data mainly through DataFrames, which are two-dimensional labeled tables.
First, import pandas:
import pandas as pd
Methods called on a DataFrame (df) can remove, group, and plot its data. Pandas' strength lies in quickly reorganizing a df into an easily readable table that displays only the categories of interest.
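As a minimal sketch of the remove and group operations described above, the example below uses a small made-up table. The column names (animal, weight, note) and their values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "dog"],
        "weight": [4.0, 20.0, 5.0, 30.0],
        "note": ["a", "b", "c", "d"],
    }
)

# Remove a column that is not needed; drop() returns a new DataFrame.
trimmed = df.drop(columns=["note"])

# Group rows by category and average the numeric column in each group.
mean_weight = trimmed.groupby("animal")["weight"].mean()
print(mean_weight)
```

Plotting works in a similar way through df.plot(), but that call relies on matplotlib being installed, so it is left out of this sketch.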
Creating a DataFrame
Below, a dictionary of objects is converted to a DataFrame. Each key becomes a column, and scalar values are repeated for every row.
import numpy as np  # needed for np.array below

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2
Out[10]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
source: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Pandas can write a DataFrame to Excel or CSV files and can convert it to a NumPy array, for example to pass the values to a plotting library.
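A rough sketch of that export path is shown here. The file name example.csv and the columns x and y are assumptions made for illustration, and the file goes into a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [2.0, 4.0, 6.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    # index=False leaves the row labels out of the file.
    df.to_csv(path, index=False)
    # Read the file back to confirm what was written.
    round_trip = pd.read_csv(path)

# Convert to a plain NumPy array; mixed int and float columns
# are upcast to a single float dtype.
arr = df.to_numpy()
print(arr.shape)
```

Writing to Excel uses df.to_excel() in the same way, though it depends on an extra engine package such as openpyxl being installed.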
A Growing List of Useful Methods:
- DataFrame.to_numpy() gives a NumPy representation of your data. This is fast when all data types are the same, but expensive when they vary.
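- describe() provides a quick statistical summary of your data.
- T transposes the data, swapping rows and columns. It is an attribute, not a method, so it takes no parentheses.
- stack() "compresses" a level in the DataFrame's columns by moving the column labels into the innermost level of the row index.
- to_csv("spam.csv") writes the DataFrame to a CSV file.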
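A few of the methods listed above can be tried on a tiny DataFrame. This is only a sketch, and the column names a and b are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

# Rows become columns and columns become rows.
transposed = df.T

# Column labels move into the row index, giving a Series
# indexed by (row label, column name).
stacked = df.stack()

print(summary)
print(transposed)
print(stacked)
```

Here df has 3 rows and 2 columns, so df.T has 2 rows and 3 columns, and df.stack() has 6 entries.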