cnt.rulebase.const package¶
Submodules¶
cnt.rulebase.const.chinese_chars module¶
Consts for detecting Chinese chars.
-
cnt.rulebase.const.chinese_chars.
ITV_CHINESE_CHARS
= [(11904, 12019), (12032, 12245), (12272, 12283), (12549, 12591), (12704, 12730), (12736, 12771), (13312, 19893), (19968, 40869), (40870, 40943), (58368, 58856), (58880, 59087), (59413, 59503), (63744, 64217), (131072, 173782), (173824, 177972), (177984, 178205), (178208, 183969), (183984, 191456), (194560, 195101)]¶
Chinese chars. Pulled from https://www.qqxiuzi.cn/zh/hanzi-unicode-bianma.php
Note: 3007 (U+3007) is a delimiter, hence it should not be included.
Range generation:
lines = '''copy paste the table here'''
[l.split('\t') for l in lines.strip().split('\n')]
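As an illustration of how a list of closed intervals such as ITV_CHINESE_CHARS might be queried, the sketch below tests whether a character's code point falls inside any interval. The helper name `in_intervals` and the shortened `ITV_SAMPLE` list are hypothetical and are not part of the package; the binary search assumes the intervals are sorted and non-overlapping, as they are in the list above.

```python
from bisect import bisect_right

# A few closed intervals copied from ITV_CHINESE_CHARS (not the full list).
ITV_SAMPLE = [(11904, 12019), (13312, 19893), (19968, 40869), (131072, 173782)]

# Interval start points, precomputed once for binary search.
_STARTS = [start for start, _ in ITV_SAMPLE]


def in_intervals(char, intervals=ITV_SAMPLE, starts=_STARTS):
    """Return True if ord(char) lies in one of the sorted closed intervals.

    Hypothetical helper, not the package's own implementation.
    """
    code = ord(char)
    # Index of the last interval whose start is <= code (or -1 if none).
    idx = bisect_right(starts, code) - 1
    return idx >= 0 and intervals[idx][0] <= code <= intervals[idx][1]


if __name__ == "__main__":
    print(in_intervals("\u4e2d"))  # a CJK unified ideograph
    print(in_intervals("a"))       # a basic Latin letter
    print(in_intervals("\u3007"))  # U+3007, which the note above excludes
```

Both bounds are inclusive, matching the "closed intervals" convention used for the ITV_* constants.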
cnt.rulebase.const.delimiters module¶
Consts for detecting delimiter chars.
-
cnt.rulebase.const.delimiters.
ITV_DELIMITERS
= [(33, 47), (58, 64), (91, 96), (123, 126), (183, 183), (8208, 8231), (8237, 8238), (8240, 8286), (12289, 12351), (65072, 65103), (65281, 65295), (65306, 65312), (65339, 65344), (65371, 65380), (65504, 65518)]¶ Delimiters.
cnt.rulebase.const.digits module¶
Consts for detecting digit chars.
-
cnt.rulebase.const.digits.
ITV_DIGITS
= [(48, 57), (65296, 65305)]¶ Digits.
cnt.rulebase.const.english_chars module¶
Consts for detecting English chars.
-
cnt.rulebase.const.english_chars.
ITV_ENGLISH_CHARS
= [(65, 90), (97, 122), (65313, 65338), (65345, 65370)]¶ English Chars.
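The delimiter, digit, and English-char interval lists can be combined into a simple per-character classifier. The following is only a sketch: the function `char_kind` and its labels are hypothetical, and the constants are copied here from the values documented above rather than imported, so the block runs on its own.

```python
# Values copied from the documentation above so the example is self-contained.
ITV_DELIMITERS = [
    (33, 47), (58, 64), (91, 96), (123, 126), (183, 183), (8208, 8231),
    (8237, 8238), (8240, 8286), (12289, 12351), (65072, 65103),
    (65281, 65295), (65306, 65312), (65339, 65344), (65371, 65380),
    (65504, 65518),
]
ITV_DIGITS = [(48, 57), (65296, 65305)]
ITV_ENGLISH_CHARS = [(65, 90), (97, 122), (65313, 65338), (65345, 65370)]

# Hypothetical label-to-intervals table, checked in order.
_KINDS = (
    ("delimiter", ITV_DELIMITERS),
    ("digit", ITV_DIGITS),
    ("english", ITV_ENGLISH_CHARS),
)


def char_kind(char):
    """Return a label for the first interval list containing ord(char)."""
    code = ord(char)
    for name, intervals in _KINDS:
        # Each interval is closed: both endpoints are included.
        if any(lo <= code <= hi for lo, hi in intervals):
            return name
    return "other"


if __name__ == "__main__":
    for ch in [",", "7", "\uff17", "A", "\uff21", "\u3002", " "]:
        print(repr(ch), char_kind(ch))
```

Note that both the halfwidth and fullwidth forms of digits and Latin letters are covered by the documented intervals, so `"7"` and `"\uff17"` receive the same label.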
cnt.rulebase.const.utils module¶
Utility functions.
-
cnt.rulebase.const.utils.
normalize_cjk_fullwidth_ascii
(seq)¶ Convert fullwidth ASCII to halfwidth ASCII. See https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms
- Return type
str
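The package's own implementation is not shown on this page. As a rough sketch of the general technique, fullwidth ASCII variants occupy U+FF01 to U+FF5E and sit at a fixed offset of 0xFEE0 from their halfwidth counterparts U+0021 to U+007E. The function name `to_halfwidth` is hypothetical, and mapping the ideographic space U+3000 to an ordinary space is an assumption that the real function may not share.

```python
FULLWIDTH_START = 0xFF01  # FULLWIDTH EXCLAMATION MARK
FULLWIDTH_END = 0xFF5E    # FULLWIDTH TILDE
OFFSET = 0xFEE0           # distance between fullwidth and halfwidth ASCII
IDEOGRAPHIC_SPACE = 0x3000


def to_halfwidth(seq):
    """Sketch: map fullwidth ASCII characters in seq to halfwidth ASCII.

    Hypothetical stand-in, not the library's normalize_cjk_fullwidth_ascii.
    """
    out = []
    for ch in seq:
        code = ord(ch)
        if FULLWIDTH_START <= code <= FULLWIDTH_END:
            out.append(chr(code - OFFSET))
        elif code == IDEOGRAPHIC_SPACE:
            # Assumption: treat the ideographic space as a plain space.
            out.append(" ")
        else:
            # Everything else, including Chinese chars, passes through.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_halfwidth("\uff21\uff22\uff23\uff11\uff12\uff13\uff01"))
```

Characters outside the fullwidth range are left unchanged, so mixed Chinese and fullwidth-ASCII text keeps its Chinese characters intact.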
Module contents¶
All consts for rule-based tasks.
Naming patterns:
EM_*: List of exact match strings.
ITV_*: List of closed intervals.
RE_*: List of regular expressions.
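To make the three naming patterns concrete, the sketch below shows one plausible way each kind of constant could be consumed. All constants and helper functions here (`EM_EXAMPLE`, `ITV_EXAMPLE`, `RE_EXAMPLE`, `matches_em`, `matches_itv`, `matches_re`) are hypothetical illustrations and do not come from the package; in particular, whether the package matches regular expressions against the full string or a substring is not stated on this page.

```python
import re

# Hypothetical constants that follow the naming patterns; not from the package.
EM_EXAMPLE = ["etc.", "e.g."]   # EM_*: exact match strings
ITV_EXAMPLE = [(48, 57)]        # ITV_*: closed intervals of code points
RE_EXAMPLE = [r"\d+\.\d+"]      # RE_*: regular expressions


def matches_em(text, em_list):
    """True if text equals one of the exact-match strings."""
    return text in em_list


def matches_itv(char, itv_list):
    """True if ord(char) falls inside any closed interval."""
    code = ord(char)
    return any(lo <= code <= hi for lo, hi in itv_list)


def matches_re(text, re_list):
    """True if any pattern matches the whole text (an assumption)."""
    return any(re.fullmatch(pattern, text) is not None for pattern in re_list)


if __name__ == "__main__":
    print(matches_em("etc.", EM_EXAMPLE))
    print(matches_itv("5", ITV_EXAMPLE))
    print(matches_re("3.14", RE_EXAMPLE))
```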