Skip to content

Comments

Yuming Code Review#1

Open
CCranney wants to merge 3 commits intoreview/yuming-code-reviewfrom
main
Open

Yuming Code Review#1
CCranney wants to merge 3 commits intoreview/yuming-code-reviewfrom
main

Conversation

@CCranney
Copy link
Collaborator

@CCranney CCranney commented Oct 8, 2024

No description provided.

# 按行进行标准归一化
def standardscaler_row(df):
scaler = StandardScaler()
proname = df.T.columns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is not used; delete it

from statsmodels.stats.multitest import multipletests

# file read path
file_read_path = r'C:\Users\jymbc\Desktop\python files for SMAD\SMAD_online\data'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a relative path; ideally it should get pulled from an environment file.

return dfpro_stdbyrow

# set four treatments
treatment=['Con']*6 + ['LPS']*6 + ['IL_4']*6 + ['IRD']*6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line gets executed every time the file is imported; it should be kept under __main__ if still relevant.


def knn_imputer(df, neighbors=6):
'''apply KNN imputation to a dataset'''
from sklearn.impute import KNNImputer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import statements should typically be called at the start of the file

df_lfq = pd.read_csv(f'{file_read_path}/all_proteins_after_SSnormalization.csv',index_col=0)


def get_co_index(list1, list2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function doesn't return an index; it returns the intersection of two lists

Comment on lines +212 to +216
text_auto=True, # Automatically display correlation values on the heatmap
aspect="auto",
color_continuous_scale='RdBu_r', # Red-Blue color scale for better visualization of positive and negative correlations
zmin=-1.5, # Minimum value for the color scale
zmax=1.5, # Maximum value for the color scale
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove chatgpt comments

Comment on lines +291 to +294
dfdf = df_mean_multi_clustered.copy()
dfdf.index = extract_unique_gene_names(df_mean_multi_clustered.index)
gene_value_dict = dfdf.iloc[:,:4].to_dict(orient='index')
meta_gene = dfdf.iloc[-73:,:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to __main__

Comment on lines +297 to +299
import networkx as nx
import plotly.graph_objs as go
import ast
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move import statements to top

Comment on lines +415 to +429
# test files
# select_df = load_cluster_data(file_path = f'{file_read_path}\Protein_KEGG_enriched_results_clustered_Macrophage.xlsx',
# cluster_number = get_Kmeans(df_mean_multi_clustered, '1/sp|O08529|CAN2_MOUSE'))
#
# # print(select_df['Genes'].apply(type))
# df_only = select_df.iloc[:5,:]
# # df
#
# tsttt = df_only.copy()
#
# tsttt.loc[len(tsttt)] = ['Metabolites', list(meta_gene[meta_gene['kmeans']==get_Kmeans(df_mean_multi_clustered, '1/sp|O08529|CAN2_MOUSE')].index)]

# tsttt

# plot_network_figure(tsttt.iloc[[5]],gene_value_dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code

Comment on lines +432 to +445
import dash
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
import threading
from dash import dash_table

# Initialize Dash app
app = Dash(__name__)

# Determine valid molecular names present in both df_lfq and df_mean_multi_clustered
valid_proteins = df_lfq.index.intersection(df_mean_multi_clustered.index)
valid_metabolites = df_meta.index.intersection(df_mean_multi_clustered.index)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move dash section to its own file

return (fl)

# 按行进行标准归一化
def standardscaler_row(df):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this function needed if you already have the standardscaler function above? you can just use the axis argument in StandardScaler to change between columns/rows


import plotly.express as px

def plot_interactive_scatter_box_protein(protein_name='1/sp|B2RXS4|PLXB2_MOUSE', dfdf=df_lfq):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoiding argument names like dfdf would help code readability

meta_gene = dfdf.iloc[-73:,:]
# list(meta_gene[meta_gene['kmeans']==0].index)

import networkx as nx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to import everything at beginning of file to help readability

return imputed_df


def iterative_imputer(df, maxiteration=10, randomstates=0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might consider combining knn_imputer and iterative_imputer into a single function that gives knn and iterative as arguments

Copy link
Collaborator Author

@CCranney CCranney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly added comments regarding function and variable names. There's also some tricks you can make with classes that can make the code more concise and easier to read; Let me know if you would be interested in doing something like that.

return imputed_df


def standardscaler(df):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it's helpful to have functions that have self-explanatory names. For example, 'standardscaler' could technically have many uses. Something longer and more specific, like 'make_standard_scaler_for_transcriptome_data_files', would help readers (both other people and your future self) understand what this function is used for without needing to spend 5 minutes reading the code. I made up that name, I do not actually know what this standard scaler is used for, but you get my point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's not just for this function, this applies to every function name you ever make. I like to make them short sentences, basically, that are verb-based (because functions do things).


def calculate_mean(df):
# Calculate mean values for each group of columns
mean_T1 = df.iloc[:, 0:6].mean(axis=1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you may need to change the size of your dataframe in the future, but you may want to update this function to be more dynamic. For example, making a means list:

means = [df.iloc[:, i*6:(i+1)*6].mean(axis=1) for i in range(4)]

and return dictionary:
return {f'T{i}':means[i] for i in range(4)}

# Return an empty figure if no data is available
return {}

def process_data_for_table(molecular_name):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of processing are you doing? The more specific your function name, the better

meta_gene = dfdf.iloc[-73:,:]
# list(meta_gene[meta_gene['kmeans']==0].index)

import networkx as nx
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Lex is making this comment, but you may want to break these up into separate files (multiple import statements). If you need help learning how to access functions/classes from other files, feel free to reach out to Lex or myself.

dfdf = df_mean_multi_clustered.copy()
dfdf.index = extract_unique_gene_names(df_mean_multi_clustered.index)
gene_value_dict = dfdf.iloc[:,:4].to_dict(orient='index')
meta_gene = dfdf.iloc[-73:,:]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does -73 represent? You may want to make a variable name for this so you know in the future, like the following:

number_of_valid_metabolites = 73
meta_gene = dfdf.iloc[-number_of_valid_metabolites:,:]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also something to bear in mind when you use any number in the file. What does '6' mean? Why are you doing groups of 4? It's always good to specify what a number is supposed to mean so that you don't have to dig deep to discover what it means.

return result_genes

# get gene to pathway reflection
dfdf = df_mean_multi_clustered.copy()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably have a better name than 'dfdf', I don't know what dataframe this is supposed to represent

Output('scatter-box-plot-protein', 'figure'),
[Input('protein-dropdown', 'value')]
)
def update_scatterbox_pro_plot(protein_name):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well just turn 'pro' -> 'protein' in this function name, since 'pro' can have multiple meanings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants