有问题的建表语句

CREATE TABLE test_table (
    content varchar(20) DEFAULT '' NOT NULL COMMENT '内容',
    KEY idx_content(content)
)ENGINE =InnoDB DEFAULT CHARSET =utf8mb4;

插入 HELLO 和 hello：

INSERT INTO `test_table`(content) values('HELLO');
INSERT INTO `test_table`(content) values ('hello');

查询 content 为 hello 的数据：

mysql> select * from test_table where content='hello';
+---------+
| content |
+---------+
| HELLO   |
| hello   |
+---------+

结果，查到了两条。查询时隐式的忽略了大小写，这是不符合预期的。

对我而言，默认查询是不忽略大小写的，要忽略大小写必须显式说明。所以这里MySQL的表现是不符合预期的，或者说我没有用对。

忽略大小写的坏影响不止体现在简单查询，还会在 distinct、group by 等地方导致不符合预期的结果。因为 HELLO 和 hello 被认为是相同的字符串。

解决方案1：使用 binary

在查询条件前加上

mysql> select * from test_table where binary content='hello';
+---------+
| content |
+---------+
| hello   |
+---------+

解决方案2：指定字符集排序方式

我们在创建表时为表指定字符集排序方式（也可以单独为列指定指定排序方式，但这样会增加复杂度，慎用）：

CREATE TABLE test_table (
    content varchar(20) DEFAULT '' NOT NULL COMMENT '内容',
    KEY idx_content(content)
)ENGINE =InnoDB DEFAULT CHARSET =utf8mb4  COLLATE=utf8mb4_bin;

测试：

mysql> INSERT INTO `test_table`(content) values('HELLO');
mysql> INSERT INTO `test_table`(content) values ('hello');
mysql> select * from test_table where content='hello';
+---------+
| content |
+---------+
| hello   |
+---------+

符合预期。

背后的原因

在不给列单独指定字符集的情况下，每张表都会使用一个字符集，也会用一个排序方式。如果未指定，那就用默认的字符集和排序方式。查看支持的所有字符集和排序方式：

show collation;

每个字符集有多种排序方式，以utf8mb4为例：

show collation where Charset='utf8mb4';

结果是：

+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 | 45  | Yes     | Yes      | 1       |
| utf8mb4_bin            | utf8mb4 | 46  |         | Yes      | 1       |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      | 8       |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      | 8       |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      | 8       |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      | 8       |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      | 8       |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      | 8       |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      | 8       |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      | 8       |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      | 8       |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      | 8       |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      | 8       |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      | 8       |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      | 8       |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      | 8       |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      | 8       |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      | 8       |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      | 8       |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      | 8       |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      | 8       |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      | 8       |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      | 8       |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      | 8       |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      | 8       |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      | 8       |
+------------------------+---------+-----+---------+----------+---------+

utf8mb4_general_ci是默认的排序方式。在它的规则下会忽略大小写，甚至不同语言的文字之间也会判等，具体可参考：MYSQL中的COLLATE是什么？。

utf8mb4_bin是将字符串看做二进制串进行比较，所以不会忽略大小写。

在一个不忽略大小写的表中如何忽略大小写

我们可以使用 MySQL 的lower 函数：

CREATE TABLE test_table (
    content varchar(20) DEFAULT '' NOT NULL COMMENT '内容',
    KEY idx_content(content)
)ENGINE =InnoDB DEFAULT CHARSET =utf8mb4  COLLATE=utf8mb4_bin;

插入数据：

INSERT INTO `test_table`(content) values('HELLO');
INSERT INTO `test_table`(content) values ('hello');

查询：

mysql> select * from test_table where lower(content)=lower('hello');
+---------+
| content |
+---------+
| HELLO   |
| hello   |
+---------+

MySQL：字符串类型值的大小写

有问题的建表语句

解决方案1：使用 binary

解决方案2：指定字符集排序方式

背后的原因

在一个不忽略大小写的表中如何忽略大小写