Benchmark Evaluation Report

CCUPP: Chinese Common User Passwords Profiler

A rule-based, training-data-free targeted password guesser for Chinese users

github.com/WangYihang/ccupp · April 2026 · Comparative analysis against CUPP and six academic baselines

Abstract. We evaluate CCUPP, a rule-based targeted password guessing tool for Chinese users, against CUPP (the de facto standard) and published results from six academic papers spanning CCS 2016 to S&P 2025. Using five standard profiles and a 200-record synthetic PII–password paired dataset modelled after real Chinese password patterns (Li et al., USENIX Security 2014), we measure generation performance, PII embedding rate, dataset hit rate, and Success Rate @ N. On the synthetic set, CCUPP achieves 84% coverage and SR@1000 = 45.5%, close to the 45.4% that TarGuess-III reports on real leaked data. Its generation throughput is roughly 5× to 64× that of CUPP depending on the profile (Table 1), and it requires no training data, GPU, or external dependencies.

1. Key Results

Metric	Value
SR@1000	45.5%
Coverage (of 200 targets)	84.0%
Generation throughput (zh_full)	4.1 M/s
Throughput vs. CUPP (zh_full)	64×

2. Generation Performance

We measure password generation throughput across five standard benchmark profiles — three Chinese (zh_full, zh_minimal, zh_medium) and two English (en_full, en_minimal). All measurements are wall-clock on a single CPU thread.
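A wall-clock throughput measurement of this kind can be sketched as follows. This is a minimal illustration, not the harness used for the report: `measure_throughput` and `toy_generator` are hypothetical names, and the toy generator merely stands in for a real tool's candidate generator.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[dict], Iterable[str]], profile: dict):
    """Time one generation pass; return (count, seconds, passwords_per_second)."""
    start = time.perf_counter()
    # Count candidates while consuming the iterable, so that lazy
    # generators are fully evaluated inside the timed region.
    count = sum(1 for _ in generate(profile))
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


def toy_generator(profile: dict):
    """Hypothetical stand-in generator: name followed by a birth year."""
    for year in range(1980, 2000):
        yield f"{profile['name']}{year}"


if __name__ == "__main__":
    count, elapsed, rate = measure_throughput(toy_generator, {"name": "zhangsan"})
    print(f"{count} passwords in {elapsed:.6f} s ({rate:,.0f} pwd/s)")
```

For sub-millisecond runs such as those in Table 1, repeating the pass several times and keeping the best or median timing would likely give steadier figures.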

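Table 1. Password generation statistics. Bold values mark per-row maxima. CCUPP generates fewer total candidates than CUPP on every profile, but at a substantially higher rate. Times are shown rounded to the millisecond; the throughput column reflects the unrounded timings, which is why en_minimal shows 0.000 s.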

Profile	Tool	Passwords	Time (s)	Passwords / s
zh_full	CCUPP	12,335	0.003	4,104,787
	CUPP	28,527	0.443	64,451
	PassLLM	2,133	44.451	48
	bopscrk	19,675	0.845	23,295
zh_minimal	CCUPP	4,244	0.001	4,251,403
	CUPP	5,230	0.010	514,826
	PassLLM	2,177	35.419	61
	bopscrk	1,576	0.823	1,915
zh_medium	CCUPP	9,007	0.002	4,178,531
	CUPP	18,594	0.023	798,903
	PassLLM	1,973	34.056	58
	bopscrk	25,157	1.083	23,228
en_full	CCUPP	4,432	0.001	4,394,599
	CUPP	22,304	0.025	894,426
	PassLLM	2,103	34.244	61
	bopscrk	14,206	0.858	16,552
en_minimal	CCUPP	2,076	0.000	4,397,664
	CUPP	9,546	0.013	738,233
	PassLLM	2,189	32.282	68
	bopscrk	672	0.813	827
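The CCUPP-to-CUPP throughput ratio differs considerably between profiles. The short sketch below recomputes it from the Passwords / s column of Table 1; the dictionary and function names are illustrative only.

```python
# Passwords per second taken from Table 1: profile -> (CCUPP, CUPP).
THROUGHPUT = {
    "zh_full": (4_104_787, 64_451),
    "zh_minimal": (4_251_403, 514_826),
    "zh_medium": (4_178_531, 798_903),
    "en_full": (4_394_599, 894_426),
    "en_minimal": (4_397_664, 738_233),
}


def speedups(table):
    """Return the CCUPP / CUPP throughput ratio for each profile."""
    return {profile: ccupp / cupp for profile, (ccupp, cupp) in table.items()}


ratios = speedups(THROUGHPUT)
for profile, ratio in ratios.items():
    print(f"{profile:<11} {ratio:5.1f}x")
```

The zh_full profile is the outlier at roughly 64×; the other four profiles fall between about 5× and 8×.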

Figure 1a. Generation throughput (pwd/s).

Figure 1b. Total candidates generated.

3. PII Embedding Rate

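The PII embedding rate is the fraction of generated candidates that contain at least one fragment of the target’s personal information. Higher rates indicate more targeted generation. The Overall column is lower than the sum of the category columns, presumably because a candidate that embeds several PII types is counted once. For reference, the Personal-PCFG study (Li et al., USENIX Security 2014) observed that 60.1% of Chinese users embed PII in their passwords.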

Table 2. PII embedding rate by category. Bold values mark per-profile maxima.

Profile	Tool	Name	Date	Phone	Account	Overall
zh_full	CCUPP	22.4%	20.8%	8.9%	5.1%	48.2%
	CUPP	27.1%	4.6%	0.0%	7.3%	33.2%
	PassLLM	13.5%	28.9%	1.2%	0.0%	39.1%
	bopscrk	10.3%	3.9%	0.0%	5.6%	19.2%
zh_minimal	CCUPP	46.4%	33.6%	14.5%	0.0%	73.7%
	CUPP	38.3%	4.9%	0.0%	0.0%	41.6%
	PassLLM	23.8%	36.4%	1.1%	0.0%	50.1%
	bopscrk	38.9%	25.3%	0.0%	0.0%	61.3%
zh_medium	CCUPP	34.9%	24.9%	13.5%	5.9%	60.0%
	CUPP	30.4%	5.2%	0.0%	0.2%	34.4%
	PassLLM	12.9%	28.5%	0.8%	0.0%	33.9%
	bopscrk	35.1%	3.7%	0.0%	5.6%	38.1%
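The embedding rate described in this section can be sketched as a substring check per category. This is an assumed formulation, not the report's actual scoring code: the function name, the case-insensitive matching, and the minimum fragment length of 3 are all illustrative choices.

```python
def embedding_rates(candidates, pii, min_len=3):
    """Return per-category and overall PII embedding rates.

    pii maps a category name to a list of fragments. Fragments shorter
    than min_len are ignored to limit accidental matches (an assumption).
    """
    total = len(candidates)
    if total == 0:
        return {}
    lowered = [c.lower() for c in candidates]
    any_hit = [False] * total
    rates = {}
    for category, fragments in pii.items():
        usable = [f.lower() for f in fragments if len(f) >= min_len]
        hits = 0
        for i, cand in enumerate(lowered):
            if any(f in cand for f in usable):
                hits += 1
                any_hit[i] = True
        rates[category] = hits / total
    # A candidate that matches several categories counts once overall.
    rates["overall"] = sum(any_hit) / total
    return rates


candidates = ["zhangsan1990", "zs19900101", "iloveyou", "13800138000", "qwerty123"]
pii = {
    "name": ["zhangsan", "zs"],
    "date": ["1990", "0101"],
    "phone": ["13800138000"],
}
rates = embedding_rates(candidates, pii)
print(rates)
```

In this toy example the category rates sum to 0.8 while the overall rate is 0.6, because "zhangsan1990" embeds both a name and a date. The same effect would explain why the Overall column in Table 2 is below the row sums.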

Figure 2. PII embedding rate by category on the zh_full profile.

4. Academic Comparison: Success Rate @ N

Success Rate @ N (SR@N) is the primary metric in the targeted password guessing literature: given a PII–password paired dataset, it is the fraction of target passwords that appear within the first N guesses. CCUPP, CUPP and bopscrk are evaluated on 200 synthetic records modelled after real Chinese password patterns (SR@N is shown where a candidate ordering was captured). The remaining baselines are numbers reported in their respective papers on real leaked corpora; see the fair-comparison caveat in §6.

Table 3. Success rate at guess budget N. Asterisks mark per-column maxima. In measured rows an em dash means no SR@N was captured: CUPP's original build is not reproducible in this environment and PassLLM needs a GPU pass not run here; bopscrk's SR@N reflects its raw output order, not a likelihood ranking. In baseline rows an em dash means the value was not reported in the source.

Method	Venue	Approach	SR@10	SR@100	SR@1000	SR@10⁴
CCUPP	measured	Rule-based (PII)	0.0%	1.0%	45.5%	84.0%
CUPP	measured	Rule-based	—	—	—	—
PassLLM	measured	LLM (7B) + LoRA	—	—	—	—
bopscrk	measured	Rule-based	3.5%	6.5%	11.5%	11.5%
TarGuess-III	CCS 2016	PII-tagged PCFG	4.6%	19.7%	45.4%	—
Personal-PCFG	USENIX 2014	PCFG + PII tags	—	12.8%	29.5%	—
RFGuess-PII	USENIX 2023	Random forest	7.3%	24.1%	48.7%	—
PointerGuess	USENIX 2024	Seq2Seq + pointer	8.2%	25.2%	—	—
PassLLM-I	USENIX 2025	LLM (7B) + LoRA	9.8%	31.6%	52.3%	—
RankGuess-PII	S&P 2025	RL + ranking	—	27.8%	50.1%	—
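As a concrete reading of the metric, SR@N can be computed from per-target ranked candidate lists as sketched below. The record layout (a ranked list paired with the true password) and the toy data are assumptions made for illustration.

```python
def success_rate_at_n(records, n):
    """Fraction of targets whose password is among the first n guesses.

    records is a list of (ranked_candidates, target_password) pairs,
    where ranked_candidates is ordered from first guess to last.
    """
    if not records:
        return 0.0
    hits = sum(1 for ranked, target in records if target in ranked[:n])
    return hits / len(records)


# Toy records; real evaluations use one ranked list per PII profile.
records = [
    (["123456", "zhangsan1990", "zs1990"], "zhangsan1990"),  # found at rank 2
    (["lisi0101", "lisi123"], "lisi123"),                    # found at rank 2
    (["wangwu", "wangwu88"], "wangwu1988"),                  # never generated
    (["zhao520", "zhao1314", "zhaoliu", "zl1992"], "zl1992"),  # found at rank 4
]

for n in (1, 2, 4):
    print(f"SR@{n} = {success_rate_at_n(records, n):.2f}")
```

SR@N is non-decreasing in N and is capped by coverage, which is why a tool whose last hit falls below a given budget shows the same value at every larger budget.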

Figure 3. Success rate at guess budget N. CCUPP closes the gap with trained models around N = 1000 and reaches 84.0% at N = 10⁴, where the only other measured value is bopscrk's 11.5% and none of the academic baselines report a figure.

5. Guess-Number and Distribution Statistics

For passwords that were ultimately found, at what rank in the generated list did they appear? Lower ranks indicate better priority ordering.

Table 4. Hit-rank statistics on 200 synthetic targets.

Tool	Found	Missed	Coverage	Min	Median	Mean	Max
CCUPP	168	32	84.0%	24	686	1,775	5,993
CUPP	45	155	22.5%	—	—	—	—
bopscrk	23	177	11.5%	1	68	145	737
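The columns of Table 4 can be derived from the same kind of ranked lists used for SR@N. The sketch below assumes 1-based ranks and the hypothetical record layout of (ranked candidates, target password).

```python
import statistics


def hit_rank_stats(records):
    """Summarise where each target password appears in its ranked list."""
    ranks = []
    for ranked, target in records:
        try:
            ranks.append(ranked.index(target) + 1)  # 1-based rank
        except ValueError:
            pass  # target never generated: counts as a miss
    found = len(ranks)
    stats = {
        "found": found,
        "missed": len(records) - found,
        "coverage": found / len(records) if records else 0.0,
    }
    if ranks:
        stats.update(
            min=min(ranks),
            median=statistics.median(ranks),
            mean=statistics.mean(ranks),
            max=max(ranks),
        )
    return stats


toy_records = [
    (["123456", "zhangsan1990", "zs1990"], "zhangsan1990"),
    (["lisi0101", "lisi123"], "lisi123"),
    (["wangwu", "wangwu88"], "wangwu1988"),
    (["zhao520", "zhao1314", "zhaoliu", "zl1992"], "zl1992"),
]
summary = hit_rank_stats(toy_records)
print(summary)
```

These statistics tie back to Table 3: CCUPP's minimum rank of 24 is consistent with its SR@10 of 0.0%, and its maximum rank of 5,993 means its coverage equals its SR@10⁴.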

CCUPP and CUPP also produce nearly disjoint candidate sets (Figure 4a), suggesting they could be productively combined. Length distributions differ markedly: CUPP concentrates 82% of its output in the 9–12 character range and produces nothing longer than 12 characters, while CCUPP spreads 99% of its output across 1–24 characters, in line with observed Chinese password length distributions.
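Candidate-set overlap of the kind referred to above can be quantified with simple set operations. The report does not state which overlap measure it uses, so the Jaccard index below is an assumption, as are the function name and toy inputs.

```python
def overlap_stats(a, b):
    """Compare two candidate lists as sets (exact, case-sensitive match)."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "shared": len(shared),
        # Jaccard index: 0.0 for disjoint sets, 1.0 for identical sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


tool_a = ["zhangsan1990", "zs1990", "zhangsan123"]
tool_b = ["Zhangsan1990", "zhangsan123", "zhangsan_90", "nasgnahz"]
overlap = overlap_stats(tool_a, tool_b)
print(overlap)
```

A Jaccard index close to zero means that merging the two outputs adds many new candidates, at the cost of a larger guess list.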

Table 5. Length distribution on the zh_full profile.

Length	CCUPP	CCUPP %	CUPP	CUPP %	PassLLM	PassLLM %	bopscrk	bopscrk %
1–6	2,407	20%	478	2%	0	0%	64	0%
7–8	2,080	17%	4,518	16%	90	4%	592	3%
9–12	4,388	36%	23,531	82%	531	25%	5,747	29%
13–16	2,268	18%	0	0%	1,512	71%	13,272	67%
17–24	1,044	8%	0	0%	0	0%	0	0%
25+	148	1%	0	0%	0	0%	0	0%
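A length histogram using the same buckets as Table 5 can be sketched as follows. The bucket boundaries come from the table; the function names and sample candidates are illustrative.

```python
from collections import Counter

# Inclusive length buckets from Table 5; anything longer falls into "25+".
BUCKETS = [(1, 6), (7, 8), (9, 12), (13, 16), (17, 24)]
LABELS = [f"{lo}-{hi}" for lo, hi in BUCKETS] + ["25+"]


def bucket_label(length):
    """Map a positive password length to its Table 5 bucket label."""
    for lo, hi in BUCKETS:
        if lo <= length <= hi:
            return f"{lo}-{hi}"
    return "25+"


def length_distribution(candidates):
    """Return {label: (count, share)} over non-empty candidates."""
    counts = Counter(bucket_label(len(c)) for c in candidates if c)
    total = sum(counts.values())
    return {
        label: (counts[label], counts[label] / total if total else 0.0)
        for label in LABELS
    }


sample = ["zs90", "zhang123", "zhangsan1990", "zhangsan19900101", "a" * 30]
dist = length_distribution(sample)
for label, (count, share) in dist.items():
    print(f"{label:<6} {count:>3} {share:6.1%}")
```

Applied to the Table 5 counts, the same arithmetic gives 23,531 / 28,527 ≈ 82% for CUPP's 9–12 bucket.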

Figure 4a. Candidate-set overlap (zh_full).

Figure 4b. Length distribution (zh_full, % of output).

6. Discussion

Strengths.

CCUPP achieves SR@1000 = 45.5%, on par with TarGuess-III (45.4%, CCS 2016), which requires training on leaked password corpora. At SR@10⁴ = 84.0%, CCUPP covers the vast majority of targets in our set. Its 48.2% PII embedding rate on zh_full, versus CUPP’s 33.2%, indicates more targeted candidate generation. CCUPP’s generation throughput is roughly 5× to 64× that of CUPP depending on the profile (Table 1), with zero training data, zero GPU, and no dependency beyond a pip install.

Limitations.

CCUPP’s SR@100 = 1.0% trails academic models substantially (TarGuess-III 19.7%, PassLLM-I 31.6%) and is also below bopscrk’s 6.5% on the same synthetic set. The median hit rank of 686 indicates that CCUPP places correct passwords in the hundreds-to-thousands range rather than within the top 100. For online-attack scenarios with strict guess budgets (N ≤ 100), trained probabilistic models remain decisively ahead. Improving priority ordering, for instance via frequency-weighted rules, is the most promising direction for future work.
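One possible shape for frequency-weighted ordering is sketched below. It is not CCUPP's current behaviour: the rule names and weights are entirely hypothetical, and in practice the weights would have to be estimated from published pattern frequencies.

```python
def rank_candidates(candidates, rule_weights, default_weight=0.0):
    """Order candidates by the weight of the rule that produced them.

    candidates is an iterable of (password, rule_name) pairs. Duplicates
    are collapsed, keeping the highest weight seen for each password.
    Ties keep generation order.
    """
    best = {}
    for order, (password, rule) in enumerate(candidates):
        weight = rule_weights.get(rule, default_weight)
        if password not in best or weight > best[password][0]:
            best[password] = (weight, order)
    ordered = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    return [password for password, _ in ordered]


# Hypothetical rule weights, for illustration only.
weights = {"name+year": 0.30, "birthdate": 0.25, "name": 0.10, "initials": 0.02}
generated = [
    ("zs", "initials"),
    ("zhangsan1990", "name+year"),
    ("zhangsan", "name"),
    ("zhangsan1990", "unweighted_rule"),
    ("19900101", "birthdate"),
]
ranked = rank_candidates(generated, weights)
print(ranked)
```

Under these assumed weights the name-plus-year candidate moves to the front, which is the kind of reordering that would be needed to raise SR@10 and SR@100.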

Fair-comparison caveat.

The academic baselines (TarGuess-III, PassLLM-I, etc.) were evaluated on real leaked PII–password datasets, typically 12306 and Dodonew, with 10⁵-plus records. Our evaluation uses 200 synthetic records modelled after published Chinese password patterns. The comparison is therefore directionally informative but not strictly equivalent.
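Sample size adds a second source of uncertainty on top of the synthetic-versus-real gap. As a rough illustration, the sketch below computes a 95% Wilson score interval for SR@1000 = 45.5% on 200 targets (91 hits), under the simplifying assumption that targets are independent draws.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


low, high = wilson_interval(91, 200)  # 91 / 200 = 45.5%
print(f"SR@1000 95% interval: {low:.1%} to {high:.1%}")
```

With only 200 targets the interval spans roughly 39% to 52%, so a difference of 0.1 percentage points against a published baseline is well within sampling noise.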

Positioning.

CCUPP occupies a distinct niche: to our knowledge it is the only actively maintained, rule-based, Chinese-localised password profiling tool that requires neither training data nor GPU. Its SR@1000 on our synthetic set is comparable to the figure TarGuess-III reports on real data, making it a pragmatic alternative for penetration testers who cannot deploy machine-learning infrastructure.

7. References

#	Reference	Venue
[1]	Wang et al., Targeted Online Password Guessing: An Underestimated Threat.	ACM CCS 2016
[2]	Li et al., A Large-Scale Empirical Analysis of Chinese Web Passwords.	USENIX Sec. 2014
[3]	Wang & Zou, Password Guessing Using Random Forest.	USENIX Sec. 2023
[4]	Xiu & Wang, PointerGuess: Targeted Password Guessing Using Pointer Mechanism.	USENIX Sec. 2024
[5]	Zou & Wang, Password Guessing Using Large Language Models.	USENIX Sec. 2025
[6]	Yang & Wang, RankGuess: Password Guessing Using Adversarial Ranking.	IEEE S&P 2025