
Conversation

@yindz
Contributor

@yindz yindz commented Nov 8, 2025

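Strings from external data sources (such as Word documents) occasionally contain special characters such as \u200B, which should be treated as blank characters. This PR therefore improves the check logic of the isBlankChar method in Utils. See the source code and the unit test cases for details.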

@abel533 abel533 requested a review from Copilot November 10, 2025 00:58
Copilot AI left a comment

Pull Request Overview

This PR enhances the isBlankChar method in the Utils class to detect a broader range of format control characters and adds comprehensive test coverage for the method.

Key Changes:

  • Generalizes the blank character detection from checking a single specific character (U+202A) to checking all FORMAT category characters using Character.getType(c) == Character.FORMAT
  • Adds new test file UtilsTest.java with comprehensive test cases covering various whitespace and format control characters, including Unicode characters commonly found in Word documents
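As a rough sketch of the first change above, the generalized check might look like the following. The class name `BlankCharSketch` and the sample code points are assumptions for illustration; the method body is pieced together from the diff fragments quoted in this review, not copied from the merged source.

```java
// Sketch only: BlankCharSketch is a hypothetical stand-in for
// io.mybatis.common.util.Utils, reconstructed from the quoted diff.
class BlankCharSketch {

  /** Returns true for whitespace, space separators, and FORMAT (Cf) characters. */
  static boolean isBlankChar(int c) {
    return Character.isWhitespace(c)
        || Character.isSpaceChar(c)
        || c == '\ufeff'
        || Character.getType(c) == Character.FORMAT;
  }

  public static void main(String[] args) {
    // Sample code points (chosen for illustration): space, tab, full-width
    // space, non-breaking space, zero width space, LEFT-TO-RIGHT EMBEDDING, 'a'.
    int[] samples = {' ', '\t', 0x3000, 0x00A0, 0x200B, 0x202A, 'a'};
    for (int c : samples) {
      System.out.printf("U+%04X -> %b%n", c, isBlankChar(c));
    }
  }
}
```

On a current JDK, every sample except the letter 'a' should be reported as blank, including U+200B, which neither Character.isWhitespace nor Character.isSpaceChar matches on its own.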

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| common/src/main/java/io/mybatis/common/util/Utils.java | Replaces hardcoded check for U+202A with a generalized check for all FORMAT characters to improve detection of format control characters |
| common/src/test/java/io/mybatis/common/util/UtilsTest.java | Adds new test class with comprehensive test coverage for isBlankChar method, including common whitespace characters and special Unicode format characters |

@@ -0,0 +1,36 @@
package io.mybatis.common.util;
Copilot AI Nov 10, 2025

This file is missing the Apache License header that is present in other test files in this project (e.g., I18nTest.java). For consistency, please add the standard license header at the beginning of this file.

  || Character.isSpaceChar(c)
  || c == '\ufeff'
- || c == '\u202a';
+ || Character.getType(c) == Character.FORMAT;
Copilot AI Nov 10, 2025

The change from checking a specific character (c == '\u202a') to checking all FORMAT characters (Character.getType(c) == Character.FORMAT) significantly broadens the scope. The Character.FORMAT category includes directional formatting marks (U+202A-U+202E, U+061C) and other format control characters that control text rendering but don't represent whitespace or blank space.

According to the Javadoc comment above this method (lines 83-84), blank characters should include "空格、制表符、全角空格和不间断空格" (space, tab, full-width space, and non-breaking space). Directional formatting marks like U+202A (LEFT-TO-RIGHT EMBEDDING) don't semantically represent blank space - they control text direction.

Consider whether all FORMAT characters should truly be treated as "blank," or if only specific zero-width space characters (like U+200B) should be added to the existing check. If the intent is to detect all invisible characters, the documentation should be updated to reflect this broader definition.
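To make the trade-off concrete, here is a minimal sketch comparing the broad FORMAT-based check with a narrower allow-list. The class name `NarrowBlankCharSketch`, both method names, and the choice of U+200B and U+FEFF as the allow-list are assumptions for illustration; the PR does not propose this exact list.

```java
// Sketch only: all names here are hypothetical and not part of the PR.
class NarrowBlankCharSketch {

  /** Narrow variant: whitespace and space separators plus an explicit allow-list. */
  static boolean isBlankCharNarrow(int c) {
    return Character.isWhitespace(c)
        || Character.isSpaceChar(c)
        || c == 0x200B   // ZERO WIDTH SPACE
        || c == 0xFEFF;  // ZERO WIDTH NO-BREAK SPACE / BOM
  }

  /** Broad variant: additionally accepts every FORMAT (Cf) character. */
  static boolean isBlankCharBroad(int c) {
    return Character.isWhitespace(c)
        || Character.isSpaceChar(c)
        || Character.getType(c) == Character.FORMAT;
  }

  public static void main(String[] args) {
    // U+200B and U+FEFF are on the allow-list; U+202A (LEFT-TO-RIGHT EMBEDDING)
    // and U+200E (LEFT-TO-RIGHT MARK) are directional FORMAT characters.
    int[] samples = {0x200B, 0xFEFF, 0x202A, 0x200E};
    for (int c : samples) {
      System.out.printf("U+%04X narrow=%b broad=%b%n",
          c, isBlankCharNarrow(c), isBlankCharBroad(c));
    }
  }
}
```

The two variants agree on U+200B and U+FEFF but differ on the directional marks, which is the behavioral difference this comment asks the author to decide on and document.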

public static boolean isBlankChar(int c) {
  return Character.isWhitespace(c)
      || Character.isSpaceChar(c)
      || c == '\ufeff'
Copilot AI Nov 10, 2025

The explicit check for '\ufeff' on line 95 is now redundant since U+FEFF (Zero Width No-Break Space / BOM) is a FORMAT character and will be matched by Character.getType(c) == Character.FORMAT on line 96. Consider removing the explicit check on line 95 to avoid duplication.

Suggested change
- || c == '\ufeff'
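
The redundancy claim can be checked with a short standalone snippet. The class name `BomFormatCheck` and its helper `isFormat` are hypothetical; only standard java.lang.Character calls are used.

```java
// Sketch only: BomFormatCheck and isFormat are hypothetical names.
class BomFormatCheck {

  /** True when the code point's general category is FORMAT (Cf). */
  static boolean isFormat(int c) {
    return Character.getType(c) == Character.FORMAT;
  }

  public static void main(String[] args) {
    int bom = 0xFEFF; // ZERO WIDTH NO-BREAK SPACE / BOM
    System.out.println("isWhitespace: " + Character.isWhitespace(bom));
    System.out.println("isSpaceChar:  " + Character.isSpaceChar(bom));
    System.out.println("isFormat:     " + isFormat(bom));
  }
}
```

If the FORMAT line reports true while the other two report false, the FORMAT check alone already covers U+FEFF, so dropping the explicit comparison would not change behavior.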
