Group-wise Count and Sorting Example
Suppose we have a dataset of wines with countries and points:
| country | wine | points |
|---|---|---|
| India | WineA | 87 |
| US | WineB | 90 |
| India | WineC | 88 |
| France | WineD | 85 |
| India | WineE | 91 |
Group by Country and Count:
reviews.groupby('country').country.count()
| country | count |
|---|---|
| France | 1 |
| India | 3 |
| US | 1 |
Sorted by Count (Descending):
reviews.groupby('country').country.count().sort_values(ascending=False)
| country | count |
|---|---|
| India | 3 |
| France | 1 |
| US | 1 |
Explanation:
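- groupby('country') → Splits the rows into one group per distinct country value.
- .country.count() → Counts the non-null country values in each group, which here equals the number of rows per group.
- .sort_values(ascending=False) → Sorts the result so the highest count appears first (the relative order of tied counts, such as France and US, is not guaranteed).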
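The steps above can be tried without a CSV file. The following is a minimal sketch that assumes the five sample rows from the table are built by hand into a DataFrame; the variable names counts and sorted_counts are illustrative only.

```python
import pandas as pd

# Sample rows copied from the table above (built by hand, not read from a file).
reviews = pd.DataFrame({
    "country": ["India", "US", "India", "France", "India"],
    "wine": ["WineA", "WineB", "WineC", "WineD", "WineE"],
    "points": [87, 90, 88, 85, 91],
})

# Count the non-null 'country' values in each group.
# The result is a Series indexed by country, with keys in sorted order.
counts = reviews.groupby("country").country.count()

# Reorder so the country with the most reviews comes first.
sorted_counts = counts.sort_values(ascending=False)

print(counts)
print(sorted_counts)
```

Because the result is a Series, a single count can be looked up by label, for example counts["India"].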
Understanding the reviews DataFrame
In the examples above, reviews is a pandas DataFrame that stores
the data read from a CSV file.
1. Reading a CSV into reviews
import pandas as pd
reviews = pd.read_csv("wine-reviews.csv") # 'reviews' now stores all CSV data
2. What reviews contains
- Rows: each row is a record (a single wine review).
- Columns: each column is a feature such as country, points, price, or title.
3. Inspecting the reviews DataFrame
You can check its size, first few rows, or data types:
reviews.shape # Returns a (rows, columns) tuple
reviews.head() # Shows the first 5 rows
reviews.dtypes # Shows data types of each column
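To see what these three inspection calls return, here is a small sketch that assumes a hand-built stand-in for reviews (the five sample rows from the first example) rather than the real CSV, so the numbers will differ from the full dataset.

```python
import pandas as pd

# Stand-in for the CSV data: the five sample rows from the first example.
reviews = pd.DataFrame({
    "country": ["India", "US", "India", "France", "India"],
    "wine": ["WineA", "WineB", "WineC", "WineD", "WineE"],
    "points": [87, 90, 88, 85, 91],
})

shape = reviews.shape      # (number of rows, number of columns)
first_rows = reviews.head()  # first 5 rows by default; head(n) returns the first n
dtypes = reviews.dtypes    # one dtype per column, as a Series indexed by column name

print(shape)  # (5, 3)
print(first_rows)
print(dtypes)
```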
Whenever you see reviews in the tutorials, think of it as the DataFrame object
holding your CSV data. It supports data operations such as filtering,
grouping, and sorting.
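As a rough illustration of those three operations, the sketch below again assumes the five hand-built sample rows instead of the real CSV. The threshold of 88 points and the variable names are hypothetical choices made for this example.

```python
import pandas as pd

# Stand-in for the CSV data: the five sample rows from the first example.
reviews = pd.DataFrame({
    "country": ["India", "US", "India", "France", "India"],
    "wine": ["WineA", "WineB", "WineC", "WineD", "WineE"],
    "points": [87, 90, 88, 85, 91],
})

# Filtering: keep rows whose points meet a (hypothetical) threshold of 88.
high_scoring = reviews[reviews["points"] >= 88]

# Grouping: average points per country.
mean_points = reviews.groupby("country")["points"].mean()

# Sorting: order the whole table by points, best first.
by_points = reviews.sort_values("points", ascending=False)

print(high_scoring)
print(mean_points)
print(by_points)
```

Each operation returns a new object and leaves reviews itself unchanged, so the steps can be chained or run in any order.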