Data lost when converting SQL data to CSV #1

Open
Guo-Zhang opened this issue Feb 19, 2024 · 1 comment
Comments

@Guo-Zhang
Member

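I first downloaded the data from MySQL and saved it to CSV, then used the CSV data for matching in Pandas. The result differed from the original data, and I could not tell why. After removing the save-to-CSV step, the data matched.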

Suggestion 1:

Converting a table to csv or xlsx loses some information (column types, for example). Serializing with pickle rarely has this problem.
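A minimal sketch of the difference, assuming a small hypothetical table (the column names and values below are invented for illustration and are not from the original data). It round-trips the same DataFrame through CSV text and through pickle, then compares what comes back. Here the standard-library `pickle` module is used in memory; `DataFrame.to_pickle` and `pd.read_pickle` do the same for files.

```python
import io
import pickle

import pandas as pd

# Hypothetical table: text ids with leading zeros, plus a datetime column.
original = pd.DataFrame({
    "id": ["001", "002", "010"],
    "created": pd.to_datetime(["2024-02-01", "2024-02-02", "2024-02-03"]),
})

# CSV round trip: everything becomes text, and types are re-inferred on read.
csv_buffer = io.StringIO()
original.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer)

# Pickle round trip: the DataFrame is serialized together with its dtypes.
from_pickle = pickle.loads(pickle.dumps(original))

print(from_csv.dtypes)               # "id" is now integer, "created" is no longer datetime
print(from_csv["id"].tolist())       # leading zeros are gone
print(from_pickle.equals(original))  # pickle preserved values and dtypes
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.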

Q: What information exactly was lost in this case?

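Suggestion 2:

  1. Pandas can connect to SQL directly.
  2. Best practice is to do the preliminary work in SQL first (especially joins that produce new tables) and then import the result. Databases optimize joins, and in theory Pandas can hardly reach database-level performance, since it essentially reads all the data into memory.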
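The two points above can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for the MySQL server, and the `users` and `orders` tables are hypothetical. For MySQL you would typically pass a SQLAlchemy engine to `pd.read_sql_query` instead of the `sqlite3` connection used here.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'a'), (2, 'b');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# Let the database do the join, then import only the joined result.
query = """
    SELECT u.id, u.name, o.order_id, o.amount
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    ORDER BY o.order_id
"""
joined = pd.read_sql_query(query, conn)
conn.close()

print(joined)
```

Reading straight from the database also skips the intermediate CSV file, so column types come from the database rather than being re-inferred from text.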
@Guo-Zhang
Member Author

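Q: What information exactly was lost in this case?
Opening the CSV shows DtypeWarning: Columns (0,3,4,8,10,11,19) have mixed types.
There was some garbled text and some duplicate values. After removing those, the remaining data should be the same as in SQL. However, the id column contained both int and str values. I did not notice this at the time, which is why many records failed to match.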
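A small sketch of why mixed int and str ids fail to match, and two possible ways around it. The frames and values are hypothetical and built by hand so the behavior is deterministic. With real files, the warning typically appears because `read_csv` infers types chunk by chunk on large inputs; declaring the type with `dtype=` (or passing `low_memory=False`) avoids that.

```python
import io

import pandas as pd

# What a mixed-type column looks like after loading: ints and strs side by side.
loaded = pd.DataFrame({
    "id": pd.Series([1, "2", "x3"], dtype=object),
    "value": ["a", "b", "c"],
})
print(loaded["id"].map(type).value_counts())

# The other side of the match, with every id stored as text.
reference = pd.DataFrame({
    "id": pd.Series(["1", "2", "x3"], dtype=object),
    "label": ["one", "two", "three"],
})

# The integer 1 never equals the string "1", so that row silently drops out.
naive = loaded.merge(reference, on="id", how="inner")

# Option 1: normalize both keys to one type before matching.
fixed = loaded.assign(id=loaded["id"].astype(str)).merge(
    reference.assign(id=reference["id"].astype(str)), on="id", how="inner"
)

# Option 2: declare the key's type when reading, so it is never inferred.
csv_text = "id,value\n1,a\n2,b\nx3,c\n"
from_csv = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(len(naive), len(fixed))
```

Checking `df["id"].map(type).value_counts()` right after loading is a quick way to spot a mixed column before any matching is attempted.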
