Benchmark Evaluation Report
Password generation statistics across 5 standard benchmark profiles (3 Chinese, 2 English).
| Profile | Tool | Passwords | Time (s) | Passwords/s |
|---|---|---|---|---|
| zh_full | CCUPP | 12,335 | 0.006 | 2,216,275 |
| zh_full | CUPP | 28,527 | 0.103 | 277,964 |
| zh_minimal | CCUPP | 4,244 | 0.002 | 2,582,046 |
| zh_minimal | CUPP | 5,230 | 0.018 | 292,689 |
| zh_medium | CCUPP | 9,007 | 0.004 | 2,349,384 |
| zh_medium | CUPP | 18,594 | 0.043 | 430,639 |
| en_full | CCUPP | 4,432 | 0.002 | 2,915,031 |
| en_full | CUPP | 22,304 | 0.045 | 496,836 |
| en_minimal | CCUPP | 2,076 | 0.001 | 2,913,166 |
| en_minimal | CUPP | 9,546 | 0.022 | 431,200 |
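
The report does not say how Time (s) was measured. A minimal sketch, assuming one wall-clock run timed with `time.perf_counter`; `measure_throughput` and `toy_generator` are hypothetical names, not CCUPP's or CUPP's API. Because these runs last only a few milliseconds, the Passwords/s column appears to be computed from unrounded times (2,076 / 0.001 would give 2,076,000, not the reported 2,913,166), and single-run timings at this scale are noisy.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[], Iterable[str]]) -> tuple[int, float, float]:
    """Time one full run of `generate` and return (count, seconds, passwords/s)."""
    start = time.perf_counter()
    # Materialize the output so lazy generators are fully consumed inside the timer.
    passwords = list(generate())
    elapsed = time.perf_counter() - start
    rate = len(passwords) / elapsed if elapsed > 0 else float("inf")
    return len(passwords), elapsed, rate


def toy_generator() -> list[str]:
    # Hypothetical stand-in for a profile-driven generator (not CCUPP's or CUPP's API):
    # every name variant concatenated with every date variant.
    names = ["zhangwei", "zw", "ZhangWei"]
    dates = ["1990", "19900512", "0512"]
    return [n + d for n in names for d in dates]


count, seconds, rate = measure_throughput(toy_generator)
print(f"{count} passwords in {seconds:.6f} s ({rate:,.0f} passwords/s)")
```

For millisecond-scale runs, repeating the measurement (for example with `timeit.repeat`) and reporting the best or median time would give steadier numbers.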
Fraction of generated passwords that contain personal information (PII) fragments, broken down by fragment type; only the three Chinese profiles are shown. Higher rates suggest more targeted generation. Personal-PCFG (USENIX Security 2014) found that 60.1% of Chinese users embed PII in their passwords.
| Profile | Tool | Name | Date | Phone | Account | Overall |
|---|---|---|---|---|---|---|
| zh_full | CCUPP | 22.4% | 20.8% | 8.9% | 5.1% | 48.2% |
| zh_full | CUPP | 27.1% | 4.6% | 0.0% | 7.3% | 33.2% |
| zh_minimal | CCUPP | 46.4% | 33.6% | 14.5% | 0.0% | 73.7% |
| zh_minimal | CUPP | 38.3% | 4.9% | 0.0% | 0.0% | 41.6% |
| zh_medium | CCUPP | 28.3% | 22.2% | 11.1% | 4.3% | 54.3% |
| zh_medium | CUPP | 30.4% | 4.8% | 0.0% | 3.6% | 35.2% |
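
The Overall column is smaller than the sum of the category columns (zh_full CCUPP: 22.4 + 20.8 + 8.9 + 5.1 = 57.2% vs. 48.2%), which is consistent with each password being counted once even if it matches several categories. A minimal sketch of such a computation, assuming case-insensitive substring matching against per-category fragment lists; the function name, fragments, and passwords are hypothetical.

```python
def pii_rates(passwords: list[str], fragments: dict[str, list[str]]) -> dict[str, float]:
    """Per-category PII embedding rates plus an 'Overall' any-category rate."""
    # Case-insensitive substring matching is an assumption, not the report's stated method.
    lowered = [p.lower() for p in passwords]
    hits: dict[str, set[int]] = {cat: set() for cat in fragments}
    for i, pw in enumerate(lowered):
        for cat, frags in fragments.items():
            if any(f and f.lower() in pw for f in frags):
                hits[cat].add(i)
    total = max(len(passwords), 1)  # avoid dividing by zero on empty input
    rates = {cat: len(idx) / total for cat, idx in hits.items()}
    # A password matching several categories is counted only once toward Overall.
    rates["Overall"] = len(set().union(*hits.values())) / total
    return rates


fragments = {
    "Name": ["zhangwei", "zw"],
    "Date": ["1990", "0512"],
    "Phone": ["13800138000"],
    "Account": ["zwhacker"],
}
passwords = ["zhangwei1990", "zw@123", "qwerty123", "13800138000", "1990love"]
print(pii_rates(passwords, fragments))
# → {'Name': 0.4, 'Date': 0.4, 'Phone': 0.2, 'Account': 0.0, 'Overall': 0.8}
```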
Success rate at N guesses (SR@N) is the primary metric in the targeted password guessing literature: given a PII-password paired dataset (here, 200 synthetic records modeled after real Chinese password patterns), what fraction of target passwords are found within the first N guesses? CCUPP and CUPP values were measured in this evaluation; the academic baselines are quoted from the published papers, which used real leaked datasets.
| Method | Venue | Approach | SR@10 | SR@100 | SR@1000 | SR@10000 |
|---|---|---|---|---|---|---|
| CCUPP | measured | Rule-based (PII) | 0.0% | 1.0% | 45.5% | 84.0% |
| CUPP | measured | Rule-based | 0.0% | 0.0% | 0.0% | 0.0% |
| TarGuess-III | CCS 2016 | PII-tagged PCFG | 4.6% | 19.7% | 45.4% | - |
| Personal-PCFG | USENIX Sec 2014 | PCFG + PII tags | - | 12.8% | 29.5% | - |
| RFGuess-PII | USENIX Sec 2023 | Random Forest | 7.3% | 24.1% | 48.7% | - |
| PointerGuess | USENIX Sec 2024 | Seq2Seq + Pointer | 8.2% | 25.2% | - | - |
| PassLLM-I | USENIX Sec 2025 | LLM (7B) + LoRA | 9.8% | 31.6% | 52.3% | - |
| RankGuess-PII | S&P 2025 | RL + Ranking | - | 27.8% | 50.1% | - |
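
SR@N keeps every target in the denominator, including those never cracked: CCUPP's 168 hits out of 200 give SR@10000 = 84.0%, and SR@1000 = 45.5% corresponds to 91 targets. A minimal sketch of the computation, assuming 1-based ranks with `None` marking uncracked targets; `success_rate_at` and the example ranks are hypothetical.

```python
from typing import Optional


def success_rate_at(ranks: list[Optional[int]], budgets: list[int]) -> dict[int, float]:
    """SR@N for each N in budgets.

    ranks[i] is the 1-based position of target i's password in that target's
    guess list, or None if it was never generated. Uncracked targets stay in
    the denominator.
    """
    total = max(len(ranks), 1)
    return {
        n: sum(1 for r in ranks if r is not None and r <= n) / total
        for n in budgets
    }


example_ranks = [24, 686, 5993, None, 150]  # hypothetical ranks for five targets
print(success_rate_at(example_ranks, [10, 100, 1000, 10000]))
# → {10: 0.0, 100: 0.2, 1000: 0.6, 10000: 0.8}
```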
For the passwords that were found, at what rank (position in the generated list) did they appear? A lower rank means the correct password was guessed earlier.
| Tool | Found | Not Found | Coverage | Min Rank | Median | Mean | Max Rank |
|---|---|---|---|---|---|---|---|
| CCUPP | 168 | 32 | 84.0% | 24 | 686 | 1,775 | 5,993 |
| CUPP | 0 | 200 | 22.5% | - | - | - | - |
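
The summary columns can be derived from a list of per-target ranks. A minimal sketch, assuming 1-based ranks, `None` for uncracked targets, and Coverage = Found / total targets (which matches the CCUPP row, 168 / 200 = 84.0%); `rank_summary` and the example ranks are hypothetical.

```python
import statistics
from typing import Optional


def rank_summary(ranks: list[Optional[int]]) -> dict[str, float]:
    """Found/not-found counts, coverage, and rank statistics over cracked targets only."""
    found = [r for r in ranks if r is not None]
    summary: dict[str, float] = {
        "found": len(found),
        "not_found": len(ranks) - len(found),
        "coverage": len(found) / max(len(ranks), 1),
    }
    if found:  # rank statistics are undefined when nothing was cracked ("-" in the table)
        summary.update(
            min=min(found),
            median=statistics.median(found),
            mean=statistics.mean(found),
            max=max(found),
        )
    return summary


example_ranks = [24, 686, 5993, None, 150]  # hypothetical; None = not cracked
print(rank_summary(example_ranks))
```

A mean well above the median, as with CCUPP's 1,775 vs. 686, indicates a right-skewed rank distribution: most hits land in the hundreds, with a tail of late hits up to rank 5,993.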
How much do CCUPP and CUPP outputs overlap? Low overlap indicates complementary generation strategies.
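
The report does not name its overlap metric. A minimal sketch, assuming Jaccard similarity (shared / union) plus the share of each tool's output that the other tool also produces; comparison is exact and case-sensitive, and the function name and candidate sets are hypothetical.

```python
def overlap_stats(a: set[str], b: set[str]) -> dict[str, float]:
    """Jaccard similarity of two candidate sets plus each side's shared fraction."""
    shared = a & b  # exact, case-sensitive string matches
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "share_of_a": len(shared) / len(a) if a else 0.0,
        "share_of_b": len(shared) / len(b) if b else 0.0,
    }


ccupp_out = {"zhangwei1990", "zw0512", "zhangwei520", "zw@1990"}
cupp_out = {"zhangwei1990", "Zhangwei1990", "zhangwei_1990", "zw0512"}
print(overlap_stats(ccupp_out, cupp_out))
```

The one-sided shares matter when the two outputs differ greatly in size, since a small list can be almost entirely contained in a large one while the Jaccard value stays low.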
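Password length distribution for the zh_full profile; the bucket counts sum to the zh_full totals in the throughput table (12,335 for CCUPP, 28,527 for CUPP).

| Length (characters) | CCUPP | CCUPP % | CUPP | CUPP % |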
|---|---|---|---|---|
| 1-6 | 2,407 | 20% | 478 | 2% |
| 7-8 | 2,080 | 17% | 4,518 | 16% |
| 9-12 | 4,388 | 36% | 23,531 | 82% |
| 13-16 | 2,268 | 18% | 0 | 0% |
| 17-24 | 1,044 | 8% | 0 | 0% |
| 25+ | 148 | 1% | 0 | 0% |
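
The bucketing can be sketched as follows, assuming length is counted in Python characters (code points) and percentages are rounded to whole numbers, as the table's percentage columns appear to be. The function names and sample passwords are hypothetical; note that Python's `round` rounds exact halves to even.

```python
from collections import Counter

# Bucket edges as used in the length table; anything longer than 24 falls into "25+".
BUCKETS = [(1, 6, "1-6"), (7, 8, "7-8"), (9, 12, "9-12"), (13, 16, "13-16"), (17, 24, "17-24")]
LABELS = [label for _, _, label in BUCKETS] + ["25+"]


def length_bucket(pw: str) -> str:
    n = len(pw)
    if n == 0:
        raise ValueError("empty password has no length bucket")
    for lo, hi, label in BUCKETS:
        if lo <= n <= hi:
            return label
    return "25+"


def length_distribution(passwords: list[str]) -> dict[str, tuple[int, int]]:
    """Map each bucket to (count, whole-number percentage of all passwords)."""
    counts = Counter(length_bucket(p) for p in passwords)
    total = max(len(passwords), 1)
    return {label: (counts[label], round(100 * counts[label] / total)) for label in LABELS}


sample = ["zw1990", "zhangwei", "zhangwei1990", "zhangwei19900512", "ZhangWei@19900512!"]
print(length_distribution(sample))
```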
CCUPP achieves SR@1000 = 45.5%, on par with TarGuess-III (45.4%, CCS 2016), which requires training on leaked password corpora. With 10,000 guesses, CCUPP cracks 84.0% of the targets (168 of 200). Its 48.2% PII embedding rate on zh_full (vs. CUPP's 33.2%) suggests more targeted generation. CCUPP generates passwords 5.5x to 8.8x faster than CUPP (passwords/s) and finishes each profile roughly 9x to 22x sooner, with no training data, no GPU, and no dependencies beyond a pip install.
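
The speedup depends on whether per-password throughput or end-to-end run time is compared, since CCUPP also emits fewer candidates. A short script that recomputes both ratios from the throughput table; `RESULTS` and `speedups` are illustrative names, and the wall-clock ratios inherit the millisecond rounding of the Time column, so they are approximate.

```python
# (Time in s, Passwords/s) for each profile and tool, copied from the throughput table.
RESULTS = {
    "zh_full": {"CCUPP": (0.006, 2_216_275), "CUPP": (0.103, 277_964)},
    "zh_minimal": {"CCUPP": (0.002, 2_582_046), "CUPP": (0.018, 292_689)},
    "zh_medium": {"CCUPP": (0.004, 2_349_384), "CUPP": (0.043, 430_639)},
    "en_full": {"CCUPP": (0.002, 2_915_031), "CUPP": (0.045, 496_836)},
    "en_minimal": {"CCUPP": (0.001, 2_913_166), "CUPP": (0.022, 431_200)},
}


def speedups(results: dict[str, dict[str, tuple[float, int]]]) -> dict[str, tuple[float, float]]:
    """Per profile: (throughput ratio CCUPP/CUPP, wall-clock ratio CUPP/CCUPP)."""
    out = {}
    for profile, tools in results.items():
        ccupp_time, ccupp_rate = tools["CCUPP"]
        cupp_time, cupp_rate = tools["CUPP"]
        out[profile] = (ccupp_rate / cupp_rate, cupp_time / ccupp_time)
    return out


for profile, (tput, wall) in speedups(RESULTS).items():
    # Wall-clock ratios use the rounded Time column, so treat them as approximate.
    print(f"{profile:<11} throughput {tput:4.1f}x   wall-clock {wall:4.1f}x")
```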
CCUPP's SR@100 of 1.0% trails the academic models by a wide margin (TarGuess-III 19.7%, PassLLM-I 31.6%). The median rank of 686 shows that CCUPP's priority ordering places correct passwords in the hundreds-to-thousands range rather than the top 100. For online attacks with strict guess budgets (N ≤ 100), the trained probabilistic models clearly outperform this rule-based approach. Improving priority ordering (e.g., frequency-weighted rules) is the key area for future work.
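
One way to implement frequency-weighted priority ordering is to attach a weight to each generation pattern and emit candidates in descending weight order instead of enumeration order. The sketch below assumes simple two-field concatenation patterns; the pattern names, weights, profile fields, and `ordered_guesses` are hypothetical illustrations, not CCUPP's rules.

```python
from itertools import product

# Hypothetical patterns: name -> ((first field, second field), weight).
# Weights might be estimated from pattern frequencies in a reference corpus;
# the values here are illustrative only.
PATTERNS = {
    "name+birthyear": (("name", "birthyear"), 0.30),
    "name+mmdd": (("name", "mmdd"), 0.20),
    "name+phone_tail": (("name", "phone_tail"), 0.10),
    "birthyear+name": (("birthyear", "name"), 0.05),
}


def ordered_guesses(profile: dict[str, list[str]]) -> list[str]:
    """Return candidates sorted by the weight of the best pattern that produced them."""
    scores: dict[str, float] = {}
    for (left, right), weight in PATTERNS.values():
        for a, b in product(profile.get(left, []), profile.get(right, [])):
            guess = a + b
            # Keep the highest weight if two patterns yield the same string.
            scores[guess] = max(scores.get(guess, 0.0), weight)
    # sorted() is stable, so equally weighted guesses keep their generation order.
    return sorted(scores, key=lambda g: -scores[g])


profile = {
    "name": ["zhangwei", "zw"],
    "birthyear": ["1990"],
    "mmdd": ["0512"],
    "phone_tail": ["8000"],
}
print(ordered_guesses(profile)[:4])
# → ['zhangwei1990', 'zw1990', 'zhangwei0512', 'zw0512']
```

Estimating the weights from a corpus would add a lightweight data dependency, but no model training or GPU.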
The academic baselines (TarGuess, PassLLM, etc.) were evaluated on real leaked PII-password datasets (12306, Dodonew) with 100K+ records. Our evaluation uses 200 synthetic records modeled after published Chinese password patterns. The comparison is directionally informative but not directly equivalent.
To our knowledge, CCUPP is the only actively maintained, rule-based, Chinese-localized password profiling tool that requires no training data and no GPU resources. On this synthetic benchmark its SR@1000 matches TarGuess-III, making it a practical alternative for penetration testers who cannot deploy ML infrastructure.
| # | Paper | Venue |
|---|---|---|
| 1 | Wang et al., "Targeted Online Password Guessing: An Underestimated Threat" | ACM CCS 2016 |
| 2 | Li et al., "A Large-Scale Empirical Analysis of Chinese Web Passwords" | USENIX Security 2014 |
| 3 | Wang & Zou, "Password Guessing Using Random Forest" | USENIX Security 2023 |
| 4 | Xiu & Wang, "PointerGuess: Targeted Password Guessing Using Pointer Mechanism" | USENIX Security 2024 |
| 5 | Zou & Wang, "Password Guessing Using Large Language Models" | USENIX Security 2025 |
| 6 | Yang & Wang, "RankGuess: Password Guessing Using Adversarial Ranking" | IEEE S&P 2025 |