Python: Fix csv file paths for learn site sample. Improve streaming output formatting with code present #10498

Open
moonbox3 (Contributor) wants to merge 3 commits into base: main

Motivation and Context

The learn site code sample could not load its resource CSV files out of the box. Additionally, the streamed output was hard to read when it included code.

The output is now formatted like this:


To provide a column chart of the top 10 countries by population with values at the top of each column, I'll need to examine the contents of the uploaded files. Let's first take a look at the data in these files to understand their structure and content.

import pandas as pd

# Load the uploaded files to inspect their contents
file_1_path = '/mnt/data/assistant-T1tWYT1f5Y8JFRuYtxgjck'
file_2_path = '/mnt/data/assistant-LUgpebvSh4PTrFdEkVEXMi'

# Try reading the files into pandas dataframes
try:
    data_1 = pd.read_csv(file_1_path)
except Exception as e:
    data_1 = str(e)

try:
    data_2 = pd.read_csv(file_2_path)
except Exception as e:
    data_2 = str(e)

data_1, data_2

The first file seems to contain regional population data for various provinces and states within countries, such as Belgium and the US. The important columns for this task are:

  • Country_Region: the name of the country
  • Population: the population of the region

The second file provides population data by country, which seems more appropriate for finding the top 10 countries by population. It contains:

  • Country_Region: the name of the country
  • Population: the population of the country

We'll use the second file to extract the top 10 countries by population and create a column chart with the population values displayed on top. Let's proceed with this analysis.

import matplotlib.pyplot as plt
import seaborn as sns

# Use the second data set which has per country data
country_data = data_2

# Sort by Population and select the top 10 countries
top_10_countries = country_data.sort_values(by='Population', ascending=False).head(10)

# Plotting
plt.figure(figsize=(12, 8))
sns.barplot(data=top_10_countries, x='Country_Region', y='Population', palette='viridis')
plt.xticks(rotation=45)
plt.title('Top 10 Countries by Population')
plt.xlabel('Country')
plt.ylabel('Population')

# Adding population values on top of each bar
for index, row in top_10_countries.iterrows():
    plt.text(index, row['Population'], f"{row['Population']:,}", color='black', ha="center")

plt.tight_layout()
plt.show()
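
For reviewers, the separation of prose and code shown in the output above could be produced by a loop along these lines. This is a minimal sketch, not the code in this PR: it assumes each streamed chunk arrives as a hypothetical `(text, is_code)` pair, and `format_stream` is an illustrative name.

```python
from typing import Iterable, Tuple


def format_stream(chunks: Iterable[Tuple[str, bool]]) -> str:
    """Join streamed (text, is_code) chunks, separating code from prose.

    The (text, is_code) chunk shape is an assumption made for this sketch.
    """
    parts = []
    in_code = False
    for text, is_code in chunks:
        if is_code != in_code:
            # Switching between prose and code: add a blank line so the
            # two kinds of output do not run together.
            parts.append("\n\n")
            in_code = is_code
        parts.append(text)
    return "".join(parts)


# Hypothetical chunks, loosely modeled on the sample output above.
chunks = [
    ("Let's take a look at the data.", False),
    ("import pandas ", True),
    ("as pd", True),
    ("The second file is per country.", False),
]
print(format_stream(chunks))
```

Tracking only the transition keeps consecutive code chunks on one line while still giving each code block its own paragraph.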

Description

Fix the streaming output formatting and the CSV file paths for a learn site sample.
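
One common way to make resource files load regardless of the working directory is to build their paths relative to the script itself. The sketch below illustrates that idea only. The helper name `resource_path`, the `resources` folder, and the file name `population.csv` are hypothetical and may not match the actual change.

```python
from pathlib import Path


def resource_path(script_file: str, *parts: str) -> Path:
    """Build a path relative to the directory containing script_file.

    Anchoring on the script location avoids depending on the current
    working directory. Names here are illustrative assumptions.
    """
    return Path(script_file).resolve().parent.joinpath(*parts)


# In a real script, pass __file__ as the first argument.
# The layout below (a "resources" folder next to the script) is hypothetical.
csv_path = resource_path("/samples/agent/step.py", "resources", "population.csv")
print(csv_path.name)
```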

Contribution Checklist

moonbox3 self-assigned this Feb 12, 2025
moonbox3 requested a review from a team as a code owner February 12, 2025 04:38
markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) label Feb 12, 2025