Yuming Code Review by CCranney · Pull Request #1 · jymbcrc/SMAD_data_web

CCranney · 2024-10-08T16:46:57Z

No description provided.

AlexandreHutton · 2024-10-08T16:50:30Z

Macrophages/Macrophages_multiome.py

+# Standardize (z-score normalize) each row
+def standardscaler_row(df):
+    scaler = StandardScaler()
+    proname = df.T.columns


This variable is not used; delete it

AlexandreHutton · 2024-10-08T16:51:56Z

Macrophages/Macrophages_multiome.py

+from statsmodels.stats.multitest import multipletests
+
+# file read path 
+file_read_path = r'C:\Users\jymbc\Desktop\python files for SMAD\SMAD_online\data'


This should be a relative path; ideally it should get pulled from an environment file.
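
A minimal sketch of one way to do this, assuming a hypothetical environment variable named `SMAD_DATA_DIR` (it is not defined anywhere in this PR) and a relative `data` folder as the fallback. The variable could be set in the shell, or loaded from a `.env` file with a tool such as python-dotenv.

```python
import os
from pathlib import Path

# Hypothetical variable name; the repository does not define it.
DATA_DIR_ENV_VAR = "SMAD_DATA_DIR"


def resolve_data_dir(default="data"):
    """Return the data directory from the environment, or a relative default."""
    return Path(os.environ.get(DATA_DIR_ENV_VAR, default))


file_read_path = resolve_data_dir()
# pathlib joins with the correct separator on every operating system.
lfq_csv_path = file_read_path / "all_proteins_after_SSnormalization.csv"
```

Building the path with `pathlib` also avoids mixing `/` and `\` separators in the f-strings further down the file.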

AlexandreHutton · 2024-10-08T16:54:34Z

Macrophages/Macrophages_multiome.py

+    return dfpro_stdbyrow
+
+# set four treatments 
+treatment=['Con']*6 + ['LPS']*6 + ['IL_4']*6 + ['IRD']*6


This line gets executed every time the file is imported; it should be kept under __main__ if still relevant.
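
For illustration, a sketch of how the guard could look. The function names `build_treatment_labels` and `main` are made up for this example; only the treatment names and the group size of 6 come from the line above.

```python
def build_treatment_labels(groups=("Con", "LPS", "IL_4", "IRD"), replicates=6):
    """Repeat each treatment name once per replicate, in group order."""
    return [name for name in groups for _ in range(replicates)]


def main():
    treatment = build_treatment_labels()
    print(len(treatment), "labels, starting with", treatment[:2])


if __name__ == "__main__":
    # Runs when the file is executed as a script, not when it is imported.
    main()
```

Other modules can then import the helper functions without triggering any of the script-level work.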

AlexandreHutton · 2024-10-08T16:56:47Z

Macrophages/Macrophages_multiome.py

+
+def knn_imputer(df, neighbors=6):
+    '''apply KNN imputation to a dataset'''
+    from sklearn.impute import KNNImputer


import statements should typically be called at the start of the file


AlexandreHutton · 2024-10-08T16:57:46Z

Macrophages/Macrophages_multiome.py

+df_lfq = pd.read_csv(f'{file_read_path}/all_proteins_after_SSnormalization.csv',index_col=0)
+
+
+def get_co_index(list1, list2):


this function doesn't return an index; it returns the intersection of two lists
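
A possible rename, as a sketch. The body of the original function is not shown in this diff, so the name `get_shared_items` and the choice to keep the order of the first list are assumptions.

```python
def get_shared_items(first, second):
    """Return the items of `first` that also appear in `second`, keeping order."""
    second_set = set(second)  # set lookup avoids a nested loop over both lists
    return [item for item in first if item in second_set]
```

For example, `get_shared_items(["a", "b", "c"], ["c", "a", "x"])` returns `["a", "c"]`.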

AlexandreHutton · 2024-10-08T17:09:26Z

Macrophages/Macrophages_multiome.py

+                    text_auto=True,                 # Automatically display correlation values on the heatmap
+                    aspect="auto",
+                    color_continuous_scale='RdBu_r',  # Red-Blue color scale for better visualization of positive and negative correlations
+                    zmin=-1.5,                        # Minimum value for the color scale
+                    zmax=1.5,                         # Maximum value for the color scale


Remove ChatGPT comments

AlexandreHutton · 2024-10-08T17:11:43Z

Macrophages/Macrophages_multiome.py

+dfdf = df_mean_multi_clustered.copy()
+dfdf.index = extract_unique_gene_names(df_mean_multi_clustered.index)
+gene_value_dict = dfdf.iloc[:,:4].to_dict(orient='index')
+meta_gene = dfdf.iloc[-73:,:]


move to __main__

AlexandreHutton · 2024-10-08T17:12:29Z

Macrophages/Macrophages_multiome.py

+import networkx as nx
+import plotly.graph_objs as go
+import ast


move import statements to top

AlexandreHutton · 2024-10-08T17:13:30Z

Macrophages/Macrophages_multiome.py

+# test files 
+# select_df = load_cluster_data(file_path = f'{file_read_path}\Protein_KEGG_enriched_results_clustered_Macrophage.xlsx',
+#                       cluster_number = get_Kmeans(df_mean_multi_clustered, '1/sp|O08529|CAN2_MOUSE'))
+#
+# # print(select_df['Genes'].apply(type))
+# df_only = select_df.iloc[:5,:]
+# # df
+#
+# tsttt = df_only.copy()
+#
+# tsttt.loc[len(tsttt)] = ['Metabolites', list(meta_gene[meta_gene['kmeans']==get_Kmeans(df_mean_multi_clustered, '1/sp|O08529|CAN2_MOUSE')].index)]
+
+# tsttt
+
+# plot_network_figure(tsttt.iloc[[5]],gene_value_dict)


remove commented code

AlexandreHutton · 2024-10-08T17:15:08Z

Macrophages/Macrophages_multiome.py

+import dash
+from dash import Dash, dcc, html
+from dash.dependencies import Input, Output
+import plotly.express as px
+import threading
+from dash import dash_table
+
+# Initialize Dash app
+app = Dash(__name__)
+
+# Determine valid molecular names present in both df_lfq and df_mean_multi_clustered
+valid_proteins = df_lfq.index.intersection(df_mean_multi_clustered.index)
+valid_metabolites = df_meta.index.intersection(df_mean_multi_clustered.index)
+


move dash section to its own file

csmova · 2024-10-08T17:05:22Z

Macrophages/Macrophages_multiome.py

+    return (fl)
+
+# Standardize (z-score normalize) each row
+def standardscaler_row(df):


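Is this function needed if you already have the standardscaler function above? StandardScaler itself always scales column-wise and has no axis argument, but you can transpose the dataframe before and after scaling, or use sklearn.preprocessing.scale, which does take an axis argument, to switch between columns and rows.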
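
As far as I know, StandardScaler standardizes column-wise and does not take an axis argument, so here is a sketch of a single function that covers both directions using pandas only. The name `standardize` is hypothetical, and this is a stand-in for the two scaler functions in the PR, not a copy of them.

```python
import pandas as pd


def standardize(df, axis=0):
    """Z-score a DataFrame per column (axis=0) or per row (axis=1)."""
    other_axis = 1 - axis
    mean = df.mean(axis=axis)
    # ddof=0 is the population standard deviation, which StandardScaler also uses.
    std = df.std(axis=axis, ddof=0)
    # Note: a constant row or column has std 0 and produces NaN here.
    return df.sub(mean, axis=other_axis).div(std, axis=other_axis)
```

Calling `standardize(df, axis=1)` would then replace a separate row-wise function, and the index and column labels are preserved without rebuilding the dataframe.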

csmova · 2024-10-08T17:16:55Z

Macrophages/Macrophages_multiome.py

+
+import plotly.express as px
+
+def plot_interactive_scatter_box_protein(protein_name='1/sp|B2RXS4|PLXB2_MOUSE', dfdf=df_lfq):


avoiding argument names like dfdf would help code readability

csmova · 2024-10-08T17:18:27Z

Macrophages/Macrophages_multiome.py

+meta_gene = dfdf.iloc[-73:,:]
+# list(meta_gene[meta_gene['kmeans']==0].index)
+
+import networkx as nx


try to import everything at beginning of file to help readability

csmova · 2024-10-08T17:20:27Z

Macrophages/Macrophages_multiome.py

+    return imputed_df
+
+
+def iterative_imputer(df, maxiteration=10, randomstates=0):


You might consider combining knn_imputer and iterative_imputer into a single function that takes the imputation method ('knn' or 'iterative') as an argument.
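
A rough sketch of what the combined function could look like, assuming both originals wrap scikit-learn's `KNNImputer` and `IterativeImputer` (only the KNN import is visible in this diff). The name `impute_missing_values` and the `method` parameter are made up for illustration.

```python
import pandas as pd
# IterativeImputer is still experimental and needs this explicit opt-in import.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer


def impute_missing_values(df, method="knn", **imputer_kwargs):
    """Fill NaNs with the chosen scikit-learn imputer, keeping row/column labels."""
    if method == "knn":
        imputer = KNNImputer(**imputer_kwargs)
    elif method == "iterative":
        imputer = IterativeImputer(**imputer_kwargs)
    else:
        raise ValueError(f"Unknown imputation method: {method!r}")
    # Assumes no column is entirely NaN; such columns are dropped by the imputers.
    imputed = imputer.fit_transform(df)
    return pd.DataFrame(imputed, index=df.index, columns=df.columns)
```

Usage would be something like `impute_missing_values(df, method="knn", n_neighbors=6)` or `impute_missing_values(df, method="iterative", max_iter=10, random_state=0)`.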

CCranney

I mostly added comments regarding function and variable names. There are also some tricks you can do with classes that can make the code more concise and easier to read; let me know if you would be interested in doing something like that.

CCranney · 2024-10-08T16:58:28Z

Macrophages/Macrophages_multiome.py

+    return imputed_df
+
+
+def standardscaler(df):


In general, it's helpful to have functions that have self-explanatory names. For example, 'standardscaler' could technically have many uses. Something longer and more specific, like 'make_standard_scaler_for_transcriptome_data_files', would help readers (both other people and your future self) understand what this function is used for without needing to spend 5 minutes reading the code. I made up that name, I do not actually know what this standard scaler is used for, but you get my point.

And it's not just this function; this applies to every function name you ever write. I like to make them short, verb-based sentences, basically (because functions do things).

CCranney · 2024-10-08T17:02:54Z

Macrophages/Macrophages_multiome.py

+
+def calculate_mean(df):
+    # Calculate mean values for each group of columns
+    mean_T1 = df.iloc[:, 0:6].mean(axis=1)


I don't know whether you will need to change the size of your dataframe in the future, but you may want to make this function more dynamic. For example, build a list of means:

means = [df.iloc[:, i*6:(i+1)*6].mean(axis=1) for i in range(4)]

and return a dictionary (using i+1 so the keys start at T1, matching the existing mean_T1 naming):
return {f'T{i+1}': means[i] for i in range(4)}
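
Taking that one step further, here is a sketch with the numbers pulled out into named constants. It assumes the columns are ordered as four consecutive blocks of six replicates, in the same order as the `treatment` list earlier in the file; the function name `calculate_group_means` and the constant names are hypothetical.

```python
import pandas as pd

# Assumed meaning of the numbers: 4 treatments with 6 replicate columns each.
TREATMENTS = ("Con", "LPS", "IL_4", "IRD")
REPLICATES_PER_TREATMENT = 6


def calculate_group_means(df, groups=TREATMENTS, group_size=REPLICATES_PER_TREATMENT):
    """Average each consecutive block of `group_size` columns, row by row."""
    expected = len(groups) * group_size
    if df.shape[1] < expected:
        raise ValueError(f"Expected at least {expected} columns, got {df.shape[1]}")
    return pd.DataFrame(
        {
            name: df.iloc[:, i * group_size:(i + 1) * group_size].mean(axis=1)
            for i, name in enumerate(groups)
        }
    )
```

If the number of treatments or replicates changes later, only the two constants need updating.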

CCranney · 2024-10-08T17:09:30Z

Macrophages/Macrophages_multiome.py

+    # Return an empty figure if no data is available
+    return {}
+
+def process_data_for_table(molecular_name):


What kind of processing are you doing? The more specific your function name, the better

CCranney · 2024-10-08T17:11:34Z

Macrophages/Macrophages_multiome.py

+meta_gene = dfdf.iloc[-73:,:]
+# list(meta_gene[meta_gene['kmeans']==0].index)
+
+import networkx as nx


I think Lex is making this comment, but you may want to break these up into separate files (multiple import statements). If you need help learning how to access functions/classes from other files, feel free to reach out to Lex or myself.

CCranney · 2024-10-08T17:13:07Z

Macrophages/Macrophages_multiome.py

+dfdf = df_mean_multi_clustered.copy()
+dfdf.index = extract_unique_gene_names(df_mean_multi_clustered.index)
+gene_value_dict = dfdf.iloc[:,:4].to_dict(orient='index')
+meta_gene = dfdf.iloc[-73:,:]


What does -73 represent? You may want to make a variable name for this so you know in the future, like the following:

number_of_valid_metabolites = 73
meta_gene = dfdf.iloc[-number_of_valid_metabolites:,:]

This is also something to bear in mind when you use any number in the file. What does '6' mean? Why are you doing groups of 4? It's always good to specify what a number is supposed to mean so that you don't have to dig deep to discover what it means.

CCranney · 2024-10-08T17:13:40Z

Macrophages/Macrophages_multiome.py

+    return result_genes
+
+# get gene to pathway reflection
+dfdf = df_mean_multi_clustered.copy()


I'd probably choose a more descriptive name than 'dfdf'; I can't tell what dataframe this is supposed to represent.

CCranney · 2024-10-08T17:15:54Z

Macrophages/Macrophages_multiome.py

+    Output('scatter-box-plot-protein', 'figure'),
+    [Input('protein-dropdown', 'value')]
+)
+def update_scatterbox_pro_plot(protein_name):


Might as well just turn 'pro' -> 'protein' in this function name, since 'pro' can have multiple meanings.

jymbcrc added 3 commits October 8, 2024 07:18

Create read.txt

803cb31

Add files via upload

ceadff2

Update Macrophages_multiome.py

2cd56ba

AlexandreHutton requested changes Oct 8, 2024

csmova reviewed Oct 8, 2024

CCranney commented Oct 8, 2024

4 participants