Unable to crawl Weibo data after configuring the JSON file #642

Closed
BGMY-CUSC opened this issue Mar 10, 2025 · 2 comments
Labels
failed 程序运行出错 (program runtime error)

Comments

BGMY-CUSC commented Mar 10, 2025

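To help resolve the problem, please answer the questions below carefully. Once the problem is resolved, please close this issue promptly.

  • Q: Which version fails (GitHub version / PyPI version / both)?

A: GitHub

  • Q: Are you using the latest version of the program (yes/no)?

A: Yes

  • Q: Does the error occur when crawling any user (yes/no)?

A: Yes

  • Q: If the error occurs only for specific posts, can you provide the weibo_id or URL of the failing post (optional)?

A:

  • Q: If you have already provided the weibo_id or URL of the failing post, skip this question; otherwise, can you provide the user_id of the failing account and the since_date you configured, so that we can locate the failing post (optional)?

A:

  • Q: If convenient, please describe the error in detail, ideally with the error message.

A: The crawl fails with: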

list index out of range
Traceback (most recent call last):
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\parser\index_parser.py", line 39, in get_user
    self.user.weibo_num = string_to_int(user_info[0][3:-1])
IndexError: list index out of range
None
****************************************************************************************************
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\spider.py", line 226, in _get_filepath
    dir_name = self.user.nickname
AttributeError: 'NoneType' object has no attribute 'nickname'
expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\writer\csv_writer.py", line 25, in __init__
    with open(self.file_path, 'a', encoding='utf-8-sig',
TypeError: expected str, bytes or os.PathLike object, not NoneType
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\spider.py", line 226, in _get_filepath
    dir_name = self.user.nickname
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute '__dict__'
Traceback (most recent call last):
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\spider.py", line 313, in get_one_user
    self.write_user(self.user)
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\spider.py", line 137, in write_user
    writer.write_user(user)
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\writer\txt_writer.py", line 29, in write_user
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
  File "D:\conda\envs\tmap\lib\site-packages\weibo_spider\writer\txt_writer.py", line 29, in <listcomp>
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
AttributeError: 'NoneType' object has no attribute '__dict__'

My JSON config:

{
    "user_id_list": ["6864639772", "3279151672"],
    "filter": 1,
    "since_date": "2020-01-01",
    "end_date": "now",
    "random_wait_pages": [1, 5],
    "random_wait_seconds": [6, 10],
    "global_wait": [[1000, 3600], [500, 2000]],
    "write_mode": ["csv", "txt"],
    "pic_download": 0,
    "video_download": 0,
    "file_download_timeout": [5, 5, 10],
    "result_dir_name": 0,
    "cookie": "",
    "mysql_config": {
        "host": "localhost",
        "port": 3306,
        "user": "root",
        "password": "123456",
        "charset": "utf8mb4"
    },
    "kafka_config": {
        "bootstrap-server": "127.0.0.1:9092",
        "weibo_topics": ["spider_weibo"],
        "user_topics": ["spider_weibo"]
    },
    "sqlite_config": "weibo.db"
}

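This has been going on for several days, and the program has never run successfully. The cookie in the JSON is up to date. Changing the target users, the date range, or other settings always produces the same errors.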
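A quick sanity check of the config before starting a crawl can rule out the simplest causes. The following is only a minimal sketch: `find_config_problems` is a hypothetical helper, not part of weibo_spider, and it checks just the two keys from the config above that a crawl cannot do without (`user_id_list` and `cookie`). It cannot tell whether a non-empty cookie is still accepted by Weibo.

```python
import json
from typing import List


def find_config_problems(config: dict) -> List[str]:
    """Return human-readable problems found in a config dict.

    Hypothetical helper for illustration; weibo_spider does not ship it.
    """
    problems = []
    # An empty list or empty string means there is nothing to crawl.
    if not config.get("user_id_list"):
        problems.append("user_id_list is empty")
    # An empty or whitespace-only cookie cannot authenticate any request.
    if not str(config.get("cookie", "")).strip():
        problems.append("cookie is empty")
    return problems


if __name__ == "__main__":
    # Same shape as the config in this issue, trimmed to the checked keys.
    raw = '{"user_id_list": ["6864639772", "3279151672"], "cookie": ""}'
    for problem in find_config_problems(json.loads(raw)):
        print("config problem:", problem)
```

Run against the config as it appears in this thread, the sketch reports an empty cookie, because the cookie value shown here is blank.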

BGMY-CUSC added the failed 程序运行出错 label Mar 10, 2025

dataabc (Owner) commented Mar 10, 2025

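Thanks for the feedback. The account has most likely been temporarily restricted; such restrictions are usually lifted automatically after a few days. Also, I advise against posting sensitive information such as cookies online. For safety, I have removed the cookie from the post above.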
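The traceback above is consistent with this explanation: when the profile page does not contain the expected statistics, `user_info` is empty, `user_info[0]` raises `IndexError`, the user object ends up as `None`, and every later step that touches `self.user` fails in turn. The sketch below shows one way such a case could be reported more clearly. It is not the weibo_spider implementation: `EmptyProfileError`, `parse_weibo_num`, and `get_user_or_none` are hypothetical names, the input format `'微博[123]'` is an assumption inferred from the `[3:-1]` slice in the traceback, and plain `int()` stands in for the library's `string_to_int`.

```python
from typing import List, Optional


class EmptyProfileError(RuntimeError):
    """Raised when the profile page yields no user statistics (hypothetical)."""


def parse_weibo_num(user_info: List[str]) -> int:
    """Extract the post count from a string assumed to look like '微博[123]'.

    Uses the same [3:-1] slice seen in the traceback, but checks for an
    empty list first instead of letting IndexError escape.
    """
    if not user_info:
        raise EmptyProfileError(
            "profile page returned no user info; the cookie may be invalid "
            "or the account may be temporarily restricted"
        )
    # Simplification: int() stands in for weibo_spider's string_to_int.
    return int(user_info[0][3:-1])


def get_user_or_none(user_info: List[str]) -> Optional[dict]:
    """Return a minimal user record, or None with a clear message on failure."""
    try:
        return {"weibo_num": parse_weibo_num(user_info)}
    except EmptyProfileError as exc:
        print(f"skipping user: {exc}")
        return None
```

A caller that checks for `None` before building file paths or writing output would stop after the first message rather than producing the chain of `NoneType` errors shown in the report.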

BGMY-CUSC (Author)

OK, thanks!
