Added part of data from Alfalfa catalog to the test server #5
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
import pandas as pd | ||
import numpy as np | ||
|
||
|
||
df = pd.read_csv("./alfalfa/tables/raw_info.csv", sep=" ", engine="python") | ||
df.rename(columns={ | ||
    "Units": "unit", | ||
    "Label": "name", | ||
    "Explanations": "description"} | ||
    , inplace=True) | ||
df = df[["name", "unit", "description"]] | ||
azzzile marked this conversation as resolved.
|
||
|
||
def check_nans(row): | ||
    if row["unit"] == "---":  # "---" marks a missing unit in the raw table | ||
        return np.nan | ||
    return row["unit"] | ||
|
||
df['unit'] = df.apply(check_nans, axis=1) | ||
|
||
def escape_percent(value): | ||
    if isinstance(value, str): | ||
        return value.replace('%', '%%') | ||
    return value | ||
|
||
df['description'] = df['description'].apply(escape_percent) | ||
|
||
|
||
# also replace the dotted placeholder with NaN in the catalog data | ||
def check_nans_in_df(row): | ||
    if row["Name"] == "........": | ||
        return np.nan | ||
    return row["Name"] | ||
|
||
data = pd.read_csv("./alfalfa/tables/main_data.csv") | ||
Review comment: Let's make this a command-line argument as well. |
||
data['Name'] = data.apply(check_nans_in_df, axis=1) | ||
data.to_csv("./alfalfa/tables/main_data.csv", index=False) | ||
|
||
|
||
table_columns = data.dtypes | ||
table_columns = pd.DataFrame({'name': table_columns.index, 'data_type': table_columns.values}) | ||
table_columns = table_columns.replace({ | ||
    "int64": "int", | ||
    "float64": "float", | ||
    "object": "str", | ||
}) | ||
|
||
table_columns = pd.merge(table_columns, df, on="name", how="left") | ||
table_columns.to_csv("./alfalfa/tables/main_info.csv", index=False) |
azzzile marked this conversation as resolved.
|
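The row-wise `check_nans` helper in this file could also be written as a single column-level replacement. A minimal sketch, assuming the placeholder is exactly `---` as in the raw table; the sample frame below is made up for illustration and is not read from `raw_info.csv`:

```python
import numpy as np
import pandas as pd

# Illustrative sample only; the real script reads raw_info.csv.
info = pd.DataFrame(
    {
        "name": ["AGCNr", "Vhelio", "W50"],
        "unit": ["---", "km/s", "km/s"],
    }
)

# Replace the "---" placeholder with NaN for the whole column at once,
# instead of calling a Python function per row via DataFrame.apply.
info["unit"] = info["unit"].replace("---", np.nan)

# Names of the columns that ended up without a unit.
missing_units = info.loc[info["unit"].isna(), "name"].tolist()
print(missing_units)
```

The same idea would apply to the `........` placeholder in the `Name` column of the catalog data.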
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
import pandas as pd | ||
import hyperleda | ||
import os | ||
import psycopg2 | ||
|
||
|
||
|
||
ALFALFA_BIBCODE = "2018ApJ...861...49H" # bibcode of the ALFALFA 2018 article, taken from ADS and the official website | ||
|
||
conn = psycopg2.connect( | ||
azzzile marked this conversation as resolved.
|
||
    host=os.getenv("HYPERLEDA_DB_HOST"), | ||
    database=os.getenv("HYPERLEDA_DB_DATABASE"), | ||
    user=os.getenv("HYPERLEDA_DB_USER"), | ||
    password=os.getenv("HYPERLEDA_DB_PASSWORD"), | ||
    port=os.getenv("HYPERLEDA_DB_PORT") | ||
) | ||
|
||
client = hyperleda.HyperLedaClient(endpoint=hyperleda.TEST_ENDPOINT) | ||
|
||
|
||
def del_nans(row): | ||
    return {k: v for k, v in row.items() if v == v}  # NaN != NaN, so NaN values are dropped | ||
|
||
def leda_dtyper(row) -> hyperleda.DataType: | ||
    return hyperleda.DataType(row["data_type"]) | ||
|
||
|
||
# getting columns info | ||
table_columns = pd.read_csv("./alfalfa/tables/main_info.csv") | ||
table_columns["data_type"] = table_columns.apply(leda_dtyper, axis=1) | ||
table_dict = table_columns.to_dict("records") | ||
|
||
# table creation | ||
table_name = "alfalfa_hi_source_catalog" | ||
azzzile marked this conversation as resolved.
|
||
|
||
table_id = client.create_table( | ||
    hyperleda.CreateTableRequestSchema( | ||
        table_name=table_name, | ||
        columns=[ | ||
            hyperleda.ColumnDescription(**del_nans(column)) for column in table_dict | ||
        ], | ||
        bibcode=ALFALFA_BIBCODE, | ||
    ) | ||
) | ||
|
||
print(f"Created table '{table_name}' with ID: {table_id}") | ||
|
||
# reading all data from alfalfa catalog | ||
df = pd.read_csv("./alfalfa/tables/main_data.csv") | ||
|
||
offset = 0 | ||
batch = 500 | ||
test_limit = 1000 | ||
Review comment: This doesn't seem necessary here; there aren't that many rows, are there?
Reply: If I upload the whole table at once, I still get rejected with "request entity too large".
Reply: I can load at most ~3k objects at a time.
Reply: I mean test_limit here, not the batch size :) |
||
|
||
while offset < test_limit: | ||
    data = df.iloc[offset:offset+batch] | ||
|
||
    if data.empty: | ||
        break | ||
|
||
    print(data) | ||
    client.add_data(table_id, data) | ||
|
||
    print(f"Added {data.shape[0]} rows to the table {table_name}. In total {offset + data.shape[0]} rows") | ||
|
||
    offset += batch | ||
|
||
print(f"Added all data to the table '{table_name}'") | ||
|
||
conn.close() | ||
|
azzzile marked this conversation as resolved.
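The `v == v` filter in `del_nans` relies on NaN being unequal to itself. A more explicit variant is sketched below, assuming the records only contain plain Python or NumPy floats as NaN markers; `drop_nans` is a hypothetical name, not part of the script:

```python
import math


def drop_nans(record: dict) -> dict:
    """Return a copy of `record` without float NaN values."""
    return {
        key: value
        for key, value in record.items()
        # np.nan is a Python float, so this check also covers pandas output.
        if not (isinstance(value, float) and math.isnan(value))
    }


# Illustrative record shaped like one row of main_info.csv.
column = {
    "name": "SNR",
    "data_type": "float",
    "unit": float("nan"),
    "description": "Ratio of peak flux to rms noise",
}
cleaned = drop_nans(column)
print(sorted(cleaned))
```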
|
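The review thread in this file separates two concerns: the batch size, which keeps each request under the server's size limit, and `test_limit`, which caps the total number of rows uploaded while testing. One way to keep the two independent is to compute the slice boundaries up front. This is only a sketch; `batch_bounds` is a hypothetical helper and is not part of the script or of the `hyperleda` client:

```python
from typing import Iterator, Optional, Tuple


def batch_bounds(
    total_rows: int, batch_size: int, limit: Optional[int] = None
) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges covering at most `limit` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Without a limit, cover every row; with one, never go past it.
    end = total_rows if limit is None else min(total_rows, limit)
    for start in range(0, end, batch_size):
        yield start, min(start + batch_size, end)


# Hypothetical usage with the names from the script above:
# for start, stop in batch_bounds(len(df), batch, test_limit):
#     client.add_data(table_id, df.iloc[start:stop])

bounds = list(batch_bounds(total_rows=1200, batch_size=500, limit=1000))
print(bounds)  # [(0, 500), (500, 1000)]
```

Passing `limit=None` uploads the whole table in batches, so the test cap can be dropped without touching the loop.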
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
name,data_type,unit,description | ||
AGCNr,int,,Entry number in catalog | ||
Name,str,,Common name | ||
RAdeg_HI,float,, | ||
DECdeg_HI,float,, | ||
RAdeg_OC,float,, | ||
DECdeg_OC,float,, | ||
Vhelio,int,km/s,Heliocentric velocity of the HI profile midpoint | ||
W50,int,km/s,Observed velocity width at 50%% of peak on either side | ||
sigW,int,, | ||
W20,int,km/s,Observed velocity width at 20%% of peak on either side | ||
HIflux,float,Jy.km/s,HI line flux density | ||
sigflux,float,Jy.km/s,Uncertainty in HIflux | ||
SNR,float,,Ratio of peak flux to rms noise | ||
RMS,float,mJy,The RMS noise in the extracted spectrum at 10 km/s resolution | ||
Dist,float,Mpc,"Adopted distance, where applicable" | ||
sigDist,float,, | ||
logMH,float,, | ||
siglogMH,float,, | ||
HIcode,int,,HI source code (2) |
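Several rows in this file (for example `sigW`, `sigDist`, `logMH`, `siglogMH`) have an empty unit and description. Judging from the label table in this PR, a plausible cause is that the labels there are spelled differently (`sigW50`, `sigD`, `logMHI`, `sigMHI`), so the left merge on `name` finds no match. A small sketch of how such mismatches could be listed, using made-up subsets of the two tables rather than the real files:

```python
import pandas as pd

# Illustrative subsets; the names are taken from the files in this PR.
columns = pd.DataFrame({"name": ["Vhelio", "sigW", "logMH"]})
labels = pd.DataFrame(
    {
        "name": ["Vhelio", "sigW50", "logMHI"],
        "unit": ["km/s", "km/s", "[solMass]"],
    }
)

# indicator=True adds a "_merge" column: "both" for matched names,
# "left_only" for data columns that have no entry in the label table.
merged = pd.merge(columns, labels, on="name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```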
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Bytes Format Units Label Explanations | ||
1-6 I6 --- AGCNr Entry number in catalog | ||
8-15 A8 --- Name Common name | ||
17-31 A15 --- PosHI Position (J2000) of HI centroid (1) | ||
33-47 A15 --- PosOC Position (J2000) of optical counterpart, where applicable (1) | ||
49-53 I5 km/s Vhelio Heliocentric velocity of the HI profile midpoint | ||
55-57 I3 km/s W50 Observed velocity width at 50% of peak on either side | ||
59-61 I3 km/s sigW50 Uncertainty in W50 | ||
63-65 I3 km/s W20 Observed velocity width at 20% of peak on either side | ||
67-72 F7.2 Jy.km/s HIflux HI line flux density | ||
74-77 F4.2 Jy.km/s sigflux Uncertainty in HIflux | ||
79-83 F5.1 --- SNR Ratio of peak flux to rms noise | ||
85-89 F5.2 mJy RMS The RMS noise in the extracted spectrum at 10 km/s resolution | ||
91-95 F5.1 Mpc Dist Adopted distance, where applicable | ||
97-100 F4.1 Mpc sigD Uncertainty in distance, where applicable | ||
102-106 F5.2 [solMass] logMHI HI mass in logarithmic solar units, where distance has been adopted | ||
108-111 F4.1 [solMass] sigMHI Uncertainty in logMHI | ||
113 I1 --- HIcode HI source code (2) |
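Because the explanation text contains spaces, splitting these rows on every space would break the last field. A sketch of a parser that keeps the explanation intact, assuming the columns are separated by whitespace as rendered in this diff (the actual delimiter in the file may differ); `parse_raw_info` is a hypothetical helper:

```python
from typing import Dict, List

# A few rows copied from the table above.
RAW_INFO = """\
Bytes Format Units Label Explanations
1-6 I6 --- AGCNr Entry number in catalog
49-53 I5 km/s Vhelio Heliocentric velocity of the HI profile midpoint
91-95 F5.1 Mpc Dist Adopted distance, where applicable
"""


def parse_raw_info(text: str) -> List[Dict[str, str]]:
    """Split each row into four leading fields plus the free-text explanation."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # maxsplit=4 keeps the spaces inside the explanation text intact.
        fields = line.split(maxsplit=4)
        rows.append(dict(zip(header, fields)))
    return rows


rows = parse_raw_info(RAW_INFO)
print(rows[1]["Label"], "|", rows[1]["Explanations"])
```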
Review comment: Let's turn the paths here into click options right away as well.
Reply: By the way, the database parameters can also be moved there via environment variables: https://click.palletsprojects.com/en/8.1.x/arguments/#environment-variables
Reply: Should I fetch them with click, or keep it as it was: host=os.getenv("HYPERLEDA_DB_HOST") and so on?
Reply: click supports environment variables, so it looks like you can get them through click directly and just store them in a variable. |
Review comment: Let's make this a command-line parameter, so that if the path changes it is easy to update instead of hunting through the code.
A good, simple library for this kind of thing is click:
https://click.palletsprojects.com/en/8.1.x/quickstart/#basic-concepts-creating-a-command
https://click.palletsprojects.com/en/8.1.x/quickstart/#adding-parameters
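The suggestions in the comments above could be combined roughly as follows. This is a sketch, not the change that was merged: the option names `--data-path` and `--db-host` and the host value are assumptions made for illustration, while `HYPERLEDA_DB_HOST` and the default path come from the scripts in this PR. The command only echoes its settings, so it touches no files or databases:

```python
import click
from click.testing import CliRunner


@click.command()
@click.option(
    "--data-path",  # hypothetical option name
    default="./alfalfa/tables/main_data.csv",
    show_default=True,
    help="Path to the catalog CSV file.",
)
@click.option(
    "--db-host",  # hypothetical option name
    envvar="HYPERLEDA_DB_HOST",
    help="Database host; falls back to HYPERLEDA_DB_HOST if the flag is omitted.",
)
def main(data_path, db_host):
    """Echo the resolved settings (no file or database access in this sketch)."""
    click.echo(f"data_path={data_path}")
    click.echo(f"db_host={db_host}")


# Exercise the command in-process; the host value is a placeholder.
runner = CliRunner()
result = runner.invoke(
    main,
    ["--data-path", "other.csv"],
    env={"HYPERLEDA_DB_HOST": "db.example.org"},
)
print(result.output)
```

With `envvar` set on the option, the explicit `os.getenv` calls become unnecessary: click reads the variable when the flag is not given on the command line.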