Building a UTF-8 class #81
Replies: 7 comments 31 replies
-
These are the UTF-8 well-formed byte sequences. You can check the valid range of byte values for each code point range! Table
Reference: https://stackoverflow.com/questions/1301402/example-invalid-utf8-string
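To make the byte ranges concrete, here is a minimal sketch of a well-formedness check. The function name `isWellFormedUtf8` is hypothetical and not part of this project; the ranges are the ones I understand Table 3-7 of the Unicode Standard to list, so please check them against the table before relying on this.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `bytes` contains only well-formed
// UTF-8 sequences. Only the second byte has lead-dependent limits; every
// later continuation byte must be in 0x80..0xBF.
bool isWellFormedUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(bytes[i]);
        std::size_t length = 0;   // total bytes in this sequence
        std::uint8_t lo = 0x80;   // allowed range of the second byte
        std::uint8_t hi = 0xBF;
        if (b0 <= 0x7F) { length = 1; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { length = 2; }
        else if (b0 == 0xE0) { length = 3; lo = 0xA0; }   // rejects overlong forms
        else if (b0 >= 0xE1 && b0 <= 0xEC) { length = 3; }
        else if (b0 == 0xED) { length = 3; hi = 0x9F; }   // rejects surrogates
        else if (b0 == 0xEE || b0 == 0xEF) { length = 3; }
        else if (b0 == 0xF0) { length = 4; lo = 0x90; }   // rejects overlong forms
        else if (b0 >= 0xF1 && b0 <= 0xF3) { length = 4; }
        else if (b0 == 0xF4) { length = 4; hi = 0x8F; }   // stays at or below U+10FFFF
        else { return false; }  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (i + length > n) return false;  // sequence is cut off
        for (std::size_t k = 1; k < length; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(bytes[i + k]);
            const std::uint8_t lowest = (k == 1) ? lo : 0x80;
            const std::uint8_t highest = (k == 1) ? hi : 0xBF;
            if (b < lowest || b > highest) return false;
        }
        i += length;
    }
    return true;
}
```

With this sketch, `isWellFormedUtf8("\xEA\xB0\x80")` (the bytes of "가") is accepted, while the overlong form `"\xC0\x80"` and the surrogate encoding `"\xED\xA0\x80"` are rejected.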
-
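@kuro11pow2 @judemin
Hangul characters take 3 bytes and English letters take 1 byte, so the size of hexSource is 12. When decoding it, how can we tell where each character ends(?) The above is how I understood it, so please give me feedback if I have misunderstood anything!!!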
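One possible answer, as a sketch: the high bits of the first byte of each character say how many bytes the character uses (0xxxxxxx is 1, 110xxxxx is 2, 1110xxxx is 3, 11110xxx is 4), so a decoder can find the boundaries by reading lead bytes only. The names `sequenceLength` and `splitCharacters` below are hypothetical, and the code works on whole bytes rather than on hexSource.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: byte count of the sequence that starts with `lead`,
// judged from the prefix bits only. Returns 0 for a continuation byte
// (10xxxxxx) or any other byte that cannot start a sequence.
std::size_t sequenceLength(std::uint8_t lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: splits UTF-8 bytes into one string per character.
// Stops at the first invalid lead byte or truncated sequence.
std::vector<std::string> splitCharacters(const std::string& bytes) {
    std::vector<std::string> characters;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t length =
            sequenceLength(static_cast<std::uint8_t>(bytes[i]));
        if (length == 0 || i + length > bytes.size()) break;
        characters.push_back(bytes.substr(i, length));
        i += length;
    }
    return characters;
}
```

For example, `splitCharacters("\xEA\xB0\x80" "a")` yields two entries: the 3 bytes of "가" and then "a". Note that this only looks at prefix bits, so it does not reject every ill-formed sequence.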
-
Let's think together about how to implement the logic!
Encode
HexVector encode(std::string src);
Input
"가" = eab080 = 111010101011000010000000
Output
1110 1010 1011 0000 1000 0000
-
Decode
std::string decode(const HexVector &src);
Input
1110 1010 1011 0000 1000 0000
Output
"가" = eab080 = 111010101011000010000000
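The Encode and Decode specs can be read as splitting each byte into two 4-bit values and joining them back. Here is a minimal sketch of that round trip, using a plain std::vector instead of HexVector; `toNibbles` and `fromNibbles` are hypothetical names, not part of the agreed interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: splits every byte into its high and low 4-bit halves,
// high half first, so "가" (ea b0 80) becomes e a b 0 8 0.
std::vector<std::uint8_t> toNibbles(const std::string& bytes) {
    std::vector<std::uint8_t> nibbles;
    nibbles.reserve(bytes.size() * 2);
    for (unsigned char byte : bytes) {
        nibbles.push_back(static_cast<std::uint8_t>(byte >> 4));
        nibbles.push_back(static_cast<std::uint8_t>(byte & 0x0F));
    }
    return nibbles;
}

// Hypothetical helper: joins each pair of 4-bit values back into one byte.
// A trailing unpaired value, if any, is ignored.
std::string fromNibbles(const std::vector<std::uint8_t>& nibbles) {
    std::string bytes;
    bytes.reserve(nibbles.size() / 2);
    for (std::size_t i = 0; i + 1 < nibbles.size(); i += 2) {
        bytes += static_cast<char>((nibbles[i] << 4) | nibbles[i + 1]);
    }
    return bytes;
}
```

Every byte produces exactly two values here, including bytes below 0x10 such as '\n', which is what keeps the pairing in `fromNibbles` aligned.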
-
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Stores a byte sequence as 4-bit values, high half of each byte first.
class HexVector {
public:
    explicit HexVector(std::size_t byteCount) {
        hexSource.reserve(byteCount * 2);  // two 4-bit values per byte
    }
    // Always push both halves. Pushing a single value for bytes below 0x10
    // would break the two-values-per-byte layout that decode() relies on.
    void pushBack(std::uint8_t hexByte) {
        hexSource.push_back(hexByte >> 4);
        hexSource.push_back(hexByte & 0x0f);
    }
    const std::vector<std::uint8_t>& getHexSource() const {
        return hexSource;
    }
private:
    std::vector<std::uint8_t> hexSource;
};

class Charset {
public:
    virtual ~Charset() = default;
    virtual HexVector encode(std::string src) = 0;
    virtual std::string decode(const HexVector& src) = 0;
};

class UTF8Charset : public Charset {
public:
    HexVector encode(std::string src) override {
        HexVector result(src.size());
        for (unsigned char byte : src)
            result.pushBack(byte);
        return result;
    }
    std::string decode(const HexVector& src) override {
        const std::vector<std::uint8_t>& a = src.getHexSource();
        std::string result;
        std::size_t start_index = 0;
        // start_index + 1 < size() avoids the unsigned underflow of size() - 1.
        while (start_index + 1 < a.size()) {
            // Rebuild the first byte of the character from its two halves.
            const int start_byte = (a[start_index] << 4) | a[start_index + 1];
            std::size_t char_byte = 0;
            if ((start_byte & 0b11111000) == 0b11110000) {
                char_byte = 4;
            } else if ((start_byte & 0b11110000) == 0b11100000) {
                char_byte = 3;
            } else if ((start_byte & 0b11100000) == 0b11000000) {
                char_byte = 2;
            } else if ((start_byte & 0b10000000) == 0b00000000) {
                char_byte = 1;
            } else {
                // Without this branch the loop would never advance.
                throw std::invalid_argument("invalid UTF-8 start byte");
            }
            if (start_index + char_byte * 2 > a.size())
                throw std::invalid_argument("truncated UTF-8 sequence");
            // Copy every byte of this character into the result.
            for (std::size_t k = 0; k < char_byte; ++k, start_index += 2)
                result += static_cast<char>((a[start_index] << 4) | a[start_index + 1]);
        }
        return result;
    }
};

int main() {
    UTF8Charset a;
    UTF8Charset b;
    HexVector c = a.encode("!@#가나다라12345abdcd^&^$");
    std::cout << "Expected result : !@#가나다라12345abdcd^&^$" << std::endl
              << b.decode(c) << std::endl;
}
-
Huh? More people than I expected don't have their environment set up yet!?
-
Results from 21-10-02 12:00 ~ 14:00
kuro11pow2@c5b10f7
UTF8Charset.h