Benchmark Evaluation Report

CCUPP: Chinese Common User Passwords Profiler

A rule-based, training-data-free targeted password guesser for Chinese users
github.com/WangYihang/ccupp · April 2026 · Comparative analysis against CUPP and six academic baselines

Abstract. We evaluate CCUPP, a rule-based targeted password guessing tool for Chinese users, against CUPP (the de facto standard), bopscrk, and published results from six academic papers spanning USENIX Security 2014 to S&P 2025. Using five standard profiles and a 200-record synthetic PII–password paired dataset modelled after real Chinese password patterns (Li et al., USENIX Security 2014), we measure generation performance, PII embedding rate, dataset hit rate, and Success Rate @ N. On the synthetic dataset CCUPP achieves 84% coverage and SR@1000 = 45.5%, numerically close to the 45.4% that TarGuess-III reports on real leaked data, with 4.6× to 12.5× higher generation throughput than CUPP depending on the profile (Table 1) and no training data, GPU, or external dependencies.

1. Key Results

SR@1000: 45.5%
Coverage (of 200 targets): 84.0%
Generation throughput (zh_full): 1.1 M passwords/s
Speed-up vs. CUPP (zh_full): 5×

2. Generation Performance

We measure password generation throughput across five standard benchmark profiles — three Chinese (zh_full, zh_minimal, zh_medium) and two English (en_full, en_minimal). All measurements are wall-clock on a single CPU thread.

Table 1. Password generation statistics per profile and tool. CCUPP generates fewer candidates than CUPP on every profile, and at a substantially higher rate than either baseline.
Profile     Tool     Passwords  Time (s)  Passwords / s
zh_full     CCUPP       12,335     0.012      1,065,353
zh_full     CUPP        28,527     0.122        233,535
zh_full     bopscrk     19,675     1.137         17,306
zh_minimal  CCUPP        4,244     0.002      1,812,506
zh_minimal  CUPP         5,230     0.036        145,343
zh_minimal  bopscrk      1,576     0.896          1,758
zh_medium   CCUPP        9,007     0.005      1,655,554
zh_medium   CUPP        18,594     0.070        264,000
zh_medium   bopscrk     25,157     2.124         11,844
en_full     CCUPP        4,432     0.002      1,923,348
en_full     CUPP        22,304     0.070        316,441
en_full     bopscrk     14,206     1.055         13,467
en_minimal  CCUPP        2,076     0.001      1,907,839
en_minimal  CUPP         9,546     0.038        250,432
en_minimal  bopscrk        672     0.960            700
Figure 1a. Generation throughput (pwd/s).
Figure 1b. Total candidates generated.
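
A throughput measurement of the kind summarised in Table 1 could be sketched as follows. This is a minimal illustration, not the harness used for the report: `measure_throughput`, `GenerationStats`, and `toy_generator` are hypothetical names, and the toy generator merely stands in for a real tool. The second half divides the passwords-per-second figures copied from Table 1 to obtain the per-profile speed-up of CCUPP over CUPP.

```python
import time
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class GenerationStats(NamedTuple):
    passwords: int     # number of candidates produced
    seconds: float     # elapsed wall-clock time
    per_second: float  # passwords / seconds


def measure_throughput(generate: Callable[[], Iterable[str]]) -> GenerationStats:
    """Time one full run of a candidate generator on the calling thread."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())  # drain the iterable so all work is timed
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return GenerationStats(count, elapsed, rate)


def toy_generator() -> Iterable[str]:
    # Hypothetical stand-in for a real tool: name fragments x numeric suffixes.
    for name in ("zhangwei", "zw", "wei"):
        for suffix in range(1000):
            yield f"{name}{suffix}"


stats = measure_throughput(toy_generator)
print(f"{stats.passwords} candidates in {stats.seconds:.4f} s "
      f"({stats.per_second:,.0f} passwords/s)")

# Passwords-per-second figures copied from Table 1, as (CCUPP, CUPP).
TABLE_1_RATES: Dict[str, Tuple[int, int]] = {
    "zh_full": (1_065_353, 233_535),
    "zh_minimal": (1_812_506, 145_343),
    "zh_medium": (1_655_554, 264_000),
    "en_full": (1_923_348, 316_441),
    "en_minimal": (1_907_839, 250_432),
}
speedups = {profile: ccupp / cupp for profile, (ccupp, cupp) in TABLE_1_RATES.items()}
for profile, ratio in speedups.items():
    print(f"{profile}: CCUPP is {ratio:.1f}x faster than CUPP")
```

With run times in the millisecond range, as in Table 1, a single run is sensitive to timer resolution and scheduling noise, so repeating the measurement and reporting a median would likely give more stable figures.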

3. PII Embedding Rate

The PII embedding rate is the fraction of generated candidates that contain a fragment of the target’s personal information; a higher rate indicates more targeted generation. The Personal-PCFG study (Li et al., USENIX Security 2014) observed that 60.1% of Chinese users embed PII in their passwords.

Table 2. PII embedding rate by category.
Profile     Tool     Name   Date   Phone  Account  Overall
zh_full     CCUPP    22.4%  20.8%   8.9%     5.1%    48.2%
zh_full     CUPP     27.1%   4.6%   0.0%     7.3%    33.2%
zh_full     bopscrk  10.3%   3.9%   0.0%     5.6%    19.2%
zh_minimal  CCUPP    46.4%  33.6%  14.5%     0.0%    73.7%
zh_minimal  CUPP     38.3%   4.9%   0.0%     0.0%    41.6%
zh_minimal  bopscrk  38.9%  25.3%   0.0%     0.0%    61.3%
zh_medium   CCUPP    34.9%  24.9%  13.5%     5.9%    60.0%
zh_medium   CUPP     30.4%   5.2%   0.0%     0.2%    34.4%
zh_medium   bopscrk  35.1%   3.7%   0.0%     5.6%    38.1%
Figure 2. PII embedding rate by category on the zh_full profile.
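
One way to compute an embedding rate like the one in this section is sketched below. The matching rule is an assumption: a candidate counts as embedding a category when any fragment of that category occurs in it as a case-insensitive substring. The report does not specify its exact rule, and the function name, fragment lists, and candidates are all hypothetical.

```python
from typing import Dict, Iterable, List


def pii_embedding_rate(
    candidates: Iterable[str],
    fragments: Dict[str, List[str]],
) -> Dict[str, float]:
    """Share of candidates containing a PII fragment, per category and overall."""
    hits = {category: 0 for category in fragments}
    overall = 0
    total = 0
    for candidate in candidates:
        total += 1
        lowered = candidate.lower()
        matched_any = False
        for category, values in fragments.items():
            # Assumed rule: case-insensitive substring match on any fragment.
            if any(value.lower() in lowered for value in values if value):
                hits[category] += 1
                matched_any = True
        if matched_any:
            overall += 1
    if total == 0:
        return {**{category: 0.0 for category in fragments}, "overall": 0.0}
    rates = {category: count / total for category, count in hits.items()}
    rates["overall"] = overall / total
    return rates


# Hypothetical profile fragments and candidate list, for illustration only.
fragments = {
    "name": ["zhangwei", "zw"],
    "date": ["1990", "0315"],
    "phone": ["13800138000", "8000"],
}
candidates = ["zhangwei1990", "zw0315", "password1", "13800138000", "qwerty"]
rates = pii_embedding_rate(candidates, fragments)
for category, rate in rates.items():
    print(f"{category}: {rate:.1%}")
```

Because a single candidate can embed several categories at once, the overall rate is not the sum of the per-category rates. The same effect is visible in Table 2, where the zh_full categories for CCUPP add up to more than its 48.2% overall figure.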

4. Academic Comparison: Success Rate @ N

Success Rate @ N (SR@N) is the primary metric in the targeted password guessing literature: given a PII–password paired dataset, it is the fraction of target passwords that appear within the first N guesses. The CCUPP, CUPP, and bopscrk rows are measured on 200 synthetic records modelled after real Chinese password patterns; the remaining baselines are numbers reported in their respective papers on real leaked corpora (see the fair-comparison caveat in §6).

Table 3. Success rate at guess budget N. Asterisks mark per-column maxima; em dashes indicate values not reported in source.
Method Venue Approach SR@10 SR@100 SR@1000 SR@10⁴
CCUPP measured Rule-based (PII) 0.0% 1.0% 45.5% 84.0%
CUPP measured Rule-based
bopscrk measured Rule-based
TarGuess-III CCS 2016 PII-tagged PCFG 4.6% 19.7% 45.4%
Personal-PCFG USENIX 2014 PCFG + PII tags 12.8% 29.5%
RFGuess-PII USENIX 2023 Random forest 7.3% 24.1% 48.7%
PointerGuess USENIX 2024 Seq2Seq + pointer 8.2% 25.2%
PassLLM-I USENIX 2025 LLM (7B) + LoRA 9.8% 31.6% 52.3%
RankGuess-PII S&P 2025 RL + ranking 27.8% 50.1%
Figure 3. Success rate at guess budget N. CCUPP closes the gap with trained models around N = 1000; at N = 10⁴ no baseline reports a value, so no comparison is possible at that budget.
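
The SR@N metric used in this section can be sketched as below, assuming each record pairs a PII profile with the target's real password and that a guesser yields candidates in priority order. The names `hit_rank`, `success_rate_at_n`, and `toy_guesser` are hypothetical, and duplicates in the guess stream are assumed to consume budget like any other guess.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Profile = Dict[str, str]


def hit_rank(guesses: Iterable[str], password: str, limit: int) -> Optional[int]:
    """1-based position of the first matching guess within `limit`, else None."""
    for rank, guess in enumerate(islice(guesses, limit), start=1):
        if guess == password:
            return rank
    return None


def success_rate_at_n(
    records: Sequence[Tuple[Profile, str]],
    guesser: Callable[[Profile], Iterable[str]],
    budgets: Sequence[int],
) -> Dict[int, float]:
    """Fraction of records whose password is guessed within each budget N."""
    if not records:
        raise ValueError("at least one record is required")
    limit = max(budgets)
    ranks = [hit_rank(guesser(profile), password, limit)
             for profile, password in records]
    return {
        n: sum(1 for rank in ranks if rank is not None and rank <= n) / len(records)
        for n in budgets
    }


def toy_guesser(profile: Profile) -> Iterable[str]:
    # Hypothetical rule order: name, then name + birth year, then name + "123".
    yield profile["name"]
    yield profile["name"] + profile["year"]
    yield profile["name"] + "123"


records = [
    ({"name": "zhangwei", "year": "1990"}, "zhangwei1990"),  # second guess
    ({"name": "lina", "year": "1988"}, "lina123"),           # third guess
    ({"name": "wangfang", "year": "1995"}, "iloveyou"),      # never guessed
    ({"name": "chenjie", "year": "1992"}, "chenjie"),        # first guess
]
sr = success_rate_at_n(records, toy_guesser, budgets=[1, 2, 3])
for n, rate in sr.items():
    print(f"SR@{n} = {rate:.1%}")
```

Since SR@N counts every hit at rank N or better, it can only stay flat or rise as N grows, which is why a tool with weak early ordering can still reach high coverage at large budgets.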

5. Guess-Number and Distribution Statistics

For passwords that were ultimately found, at what rank in the generated list did they appear? Lower ranks indicate better priority ordering.

Table 4. Hit-rank statistics on 200 synthetic targets.
Tool Found Missed Coverage Min Median Mean Max
CCUPP 168 32 84.0% 24 686 1,775 5,993
CUPP 45 155 22.5%
bopscrk 23 177 11.5%
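
The columns of Table 4 can be derived from a list of per-target hit ranks, as in the sketch below. It assumes a rank of `None` marks a target whose password never appeared in the generated list; the function name and the sample ranks are hypothetical and are not the report's data.

```python
from statistics import mean, median
from typing import Dict, Optional, Sequence, Union

Number = Union[int, float, None]


def hit_rank_summary(ranks: Sequence[Optional[int]]) -> Dict[str, Number]:
    """Summarise hit ranks; None entries are targets that were never guessed."""
    found = [rank for rank in ranks if rank is not None]
    summary: Dict[str, Number] = {
        "found": len(found),
        "missed": len(ranks) - len(found),
        "coverage": len(found) / len(ranks) if ranks else 0.0,
        "min": None,
        "median": None,
        "mean": None,
        "max": None,
    }
    if found:  # rank statistics are only defined when something was found
        summary.update(
            min=min(found),
            median=median(found),
            mean=mean(found),
            max=max(found),
        )
    return summary


# Hypothetical ranks for seven targets, two of which were missed.
ranks = [24, 686, None, 5993, 130, None, 1200]
summary = hit_rank_summary(ranks)
print(summary)

# Coverage in Table 4 is simply found / (found + missed).
ccupp_coverage = 168 / (168 + 32)
print(f"CCUPP coverage: {ccupp_coverage:.1%}")
```

A mean far above the median, as in the CCUPP row of Table 4 (1,775 versus 686), points to a right-skewed rank distribution in which a minority of hits arrive very late in the list.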

CCUPP and CUPP also produce nearly disjoint candidate sets (Figure 4a), suggesting they could be productively combined. Length distributions differ markedly: CUPP concentrates 82% of its output in the 9–12 character range and produces nothing longer than 12 characters, while CCUPP spreads its output across every length bin, with 99% of candidates between 1 and 24 characters, in line with observed Chinese password length distributions.

Table 5. Length distribution on the zh_full profile.
Length CCUPP CCUPP % CUPP CUPP % bopscrk bopscrk %
1–6 2,407 20% 478 2% 64 0%
7–8 2,080 17% 4,518 16% 592 3%
9–12 4,388 36% 23,531 82% 5,747 29%
13–16 2,268 18% 0 0% 13,272 67%
17–24 1,044 8% 0 0% 0 0%
25+ 148 1% 0 0% 0 0%
Figure 4a. Candidate-set overlap (zh_full).
Figure 4b. Length distribution (zh_full, % of output).
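
The length binning of Table 5 and a candidate-set overlap measure can be sketched as follows. The bin edges are taken from Table 5; the use of Jaccard similarity for overlap is an assumption, since the report does not state how the overlap in Figure 4a is computed. All names and sample candidates are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Bin edges follow Table 5: (label, min length, max length); None = open-ended.
BINS: List[Tuple[str, int, Optional[int]]] = [
    ("1-6", 1, 6),
    ("7-8", 7, 8),
    ("9-12", 9, 12),
    ("13-16", 13, 16),
    ("17-24", 17, 24),
    ("25+", 25, None),
]


def length_distribution(candidates: Iterable[str]) -> Dict[str, int]:
    """Count candidates per length bin; empty strings fall outside every bin."""
    counts = {label: 0 for label, _, _ in BINS}
    for candidate in candidates:
        length = len(candidate)
        for label, low, high in BINS:
            if length >= low and (high is None or length <= high):
                counts[label] += 1
                break
    return counts


def jaccard_overlap(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical outputs of two tools for the same profile.
set_a = {"zhangwei1990", "zw0315", "wei", "zhangwei19900315zhangwei!"}
set_b = {"Zhangwei1990", "zw0315", "zhangwei2024"}

dist_a = length_distribution(set_a)
overlap = jaccard_overlap(set_a, set_b)
print(dist_a)
print(f"Jaccard overlap: {overlap:.3f}")
```

Comparison here is exact and case-sensitive, so `zhangwei1990` and `Zhangwei1990` count as different candidates. A near-zero Jaccard value between two tools would correspond to the nearly disjoint candidate sets described above.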

6. Discussion

Strengths.

CCUPP achieves SR@1000 = 45.5% on our synthetic set, on par with the 45.4% reported for TarGuess-III (CCS 2016), which requires training on leaked password corpora. At SR@10⁴ = 84.0%, CCUPP covers the vast majority of targets in our set. Its 48.2% PII embedding rate on zh_full, versus CUPP’s 33.2%, indicates more targeted candidate generation. CCUPP is also 4.6× to 12.5× faster than CUPP across the five profiles (Table 1), with zero training data, zero GPU, and no dependency beyond a pip install.

Limitations.

CCUPP’s SR@100 = 1.0% trails the academic models substantially (TarGuess-III 19.7%, PassLLM-I 31.6%). The median hit rank of 686 shows that CCUPP typically places the correct password in the hundreds-to-thousands range rather than within the top 100. For online-attack scenarios with strict guess budgets (N ≤ 100), trained probabilistic models remain decisively ahead. Improving priority ordering, for instance via frequency-weighted rules, is the most promising direction for future work.

Fair-comparison caveat.

The academic baselines (TarGuess-III, PassLLM-I, etc.) were evaluated on real leaked PII–password datasets, typically 12306 and Dodonew, with 10⁵-plus records. Our evaluation uses 200 synthetic records modelled after published Chinese password patterns. The comparison is therefore directionally informative but not strictly equivalent.

Positioning.

CCUPP occupies a distinct niche: to our knowledge it is the only actively maintained, rule-based, Chinese-localised password profiling tool that requires neither training data nor a GPU. Its SR@1000 on our synthetic set is numerically on par with the figure TarGuess-III reports on real data, making it a pragmatic alternative for penetration testers who cannot deploy machine-learning infrastructure.

7. References

# Reference Venue
[1] Wang et al., Targeted Online Password Guessing: An Underestimated Threat. ACM CCS 2016
[2] Li et al., A Large-Scale Empirical Analysis of Chinese Web Passwords. USENIX Sec. 2014
[3] Wang & Zou, Password Guessing Using Random Forest. USENIX Sec. 2023
[4] Xiu & Wang, PointerGuess: Targeted Password Guessing Using Pointer Mechanism. USENIX Sec. 2024
[5] Zou & Wang, Password Guessing Using Large Language Models. USENIX Sec. 2025
[6] Yang & Wang, RankGuess: Password Guessing Using Adversarial Ranking. IEEE S&P 2025