15 Data Analysis Methods: Formulas, Benchmarks, and Real-World Scenarios
What are the most common data analysis methods? 15 methods explained with full formulas, judgment benchmarks, and real store scenarios. Covers descriptive stats, trends, YoY/MoM, Pareto, RFM, ABC, funnel, correlation, clustering, anomaly detection, and more.
Data Analysis Methods: Organized by Purpose
Data analysis methods fall into 4 categories. Descriptive (what happened): "What was last month's revenue? What's the 3-month trend?" Diagnostic (why it happened): "Why did revenue drop? Which category dragged performance?" Predictive & Segmentation (what will happen): "Which customers might churn? Which SKUs are critical?" Prescriptive (what to do): "Which store should I optimize next? How far from the target?" The 15 methods below are explained by category.
Method 1: Descriptive Statistics — Your Data's ID Card
The starting point for all analysis is five numbers that summarize a dataset: mean (average), median (middle value, robust to outliers), standard deviation (variability), max, and min. Formulas: Mean = Σx ÷ n; SD = √(Σ(x - mean)² ÷ n). This is the population form; divide by n - 1 instead when the data is a sample of a larger population. Benchmark: SD ÷ Mean (the coefficient of variation) > 30% indicates high volatility, so investigate outliers.
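As a minimal sketch of the five summary numbers and the SD ÷ Mean check, the Python below uses only the standard library. The `describe` helper and the revenue figures are illustrative assumptions, not data from this article.

```python
import statistics

def describe(values):
    """Return the five summary numbers plus the SD/mean ratio."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "max": max(values),
        "min": min(values),
        "cv": sd / mean,  # SD / Mean; above 0.30 signals high volatility
    }

# Hypothetical monthly revenue in ¥K (illustrative only)
revenue = [250, 260, 270, 275, 280, 285, 290, 295, 300, 310, 190, 420]
summary = describe(revenue)
print(summary)
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample (n - 1) version.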
Scenario: Monthly Revenue Summary
12 months of store revenue: mean ¥285K, median ¥278K, SD ¥62K, max ¥420K (Chinese New Year), min ¥190K (lockdown month). SD/Mean = 21.8%, moderate volatility. Removing the 2 extreme months gives a mean of ¥281K and an SD of about ¥43K: this is the 'normal month' baseline.
Methods 2 & 3: YoY and MoM — See Real Growth by Excluding Seasonality
Year-over-Year (YoY) compares with the same period last year, which removes seasonality and shows the long-term trend. Month-over-Month (MoM) compares with the previous month and tracks short-term changes. YoY formula: (Current - Same Period Last Year) ÷ Same Period Last Year × 100%. MoM formula: (Current - Previous Month) ÷ Previous Month × 100%. Benchmarks: YoY > 10% = healthy growth, 0-10% = moderate, < 0% = warning. Three or more consecutive negative MoM readings suggest structural decline, provided the drop is not simply seasonal.
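The two formulas share one calculation and differ only in the base period. The sketch below assumes hypothetical revenue figures and helper names (`pct_change`, `yoy_label`); the bands in `yoy_label` are the YoY benchmarks stated above.

```python
def pct_change(current, base):
    """Percentage change of current versus base, e.g. 25.0 for +25%."""
    if base == 0:
        raise ValueError("base period is zero; growth rate is undefined")
    return (current - base) / base * 100

def yoy_label(yoy):
    """Apply the YoY benchmark bands: >10 healthy, 0-10 moderate, <0 warning."""
    if yoy > 10:
        return "healthy growth"
    if yoy >= 0:
        return "moderate"
    return "warning"

# Hypothetical revenue in ¥K: June this year, May this year, June last year
june, may, june_last_year = 230, 184, 250
mom = pct_change(june, may)             # base = previous month
yoy = pct_change(june, june_last_year)  # base = same month last year
print(f"MoM {mom:+.1f}%, YoY {yoy:+.1f}% ({yoy_label(yoy)})")
```

With these assumed figures, MoM is positive while YoY is negative, which is why the two are best read together.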
Scenario: Combined YoY and MoM Judgment
A beverage store in June: MoM +25% (summer peak), but YoY -8% (worse than last year overall). Conclusion: seasonal recovery, but long-term competitiveness declining. Looking only at MoM would miss the YoY problem.
Common Pitfalls with YoY/MoM
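Pitfall 1: Only checking MoM, which leaves you misled by seasonality. Pitfall 2: Small base. A store measured against its partial opening month (MoM in its second month, or YoY one year after launch) can show 200% growth, which is meaningless. Pitfall 3: Ignoring day count. The CNY month has 15-20 operating days vs 30 in a normal month, so compare average daily revenue rather than monthly totals.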
Method 4: Trend Analysis — Spot Direction and Inflection Points
Plot the time series as a line chart and look for 3 signals: direction (up/down/flat), inflection points (where the trend reverses), and cycles (regular patterns). Key metrics: the 3-month moving average (smooths short-term noise) and the number of consecutive up or down weeks (trend strength).
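Both key metrics are simple to compute by hand. The following is a rough sketch with hypothetical monthly figures; `moving_average` uses a trailing window, and `current_streak` counts the most recent run of rises or falls. Both function names are illustrative.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points have no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def current_streak(values):
    """Length of the latest run of consecutive rises (+n) or falls (-n)."""
    streak = 0
    for i in range(len(values) - 1, 0, -1):
        diff = values[i] - values[i - 1]
        step = (diff > 0) - (diff < 0)  # +1 rise, -1 fall, 0 flat
        # Stop at a flat period or when the direction flips
        if step == 0 or (streak != 0 and (streak > 0) != (step > 0)):
            break
        streak += step
    return streak

# Hypothetical revenue index: steady growth, then a break and decline
series = [100, 103, 106, 109, 112, 99, 97, 95]
smoothed = moving_average(series)
streak = current_streak(series)
print(smoothed, streak)
```

A negative streak of several periods, together with a falling moving average, is the kind of pattern that marks an inflection point.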
Scenario: Identifying an Inflection Point
12-month revenue line chart: steady growth from January to May, a sudden -12% in June, then low levels from July to December. The inflection point is June, when a competitor opened across the street. The 3-month moving average makes it clearer: average growth of +3%/month turns into a decline of -2%/month.
Method 5: Pareto Analysis (80/20 Rule) — Find the Vital Few
Core principle: roughly 80% of results come from 20% of causes. Method: sort items by contribution (descending), calculate the cumulative share, and treat the top-ranked items that bring the cumulative share up to 80% as the 'vital few.' Formula: Cumulative Share = Σ(Top N contributions) ÷ Total × 100%. Use cases: find the 20% of categories driving 80% of revenue, or the 20% of customers generating 80% of profit.
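The sort-and-accumulate method can be sketched in a few lines. The dish names and revenue figures are hypothetical, and `vital_few` is an illustrative helper rather than a standard function.

```python
def vital_few(contributions, threshold=0.80):
    """Smallest top-ranked set whose cumulative share reaches the threshold."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= threshold:
            break
    return selected, running / total

# Hypothetical monthly revenue per dish, in ¥K
dishes = {
    "noodles": 400, "rice bowl": 250, "dumplings": 150,
    "soup": 80, "salad": 50, "tea": 40, "dessert": 30,
}
top, share = vital_few(dishes)
print(top, f"{share:.0%}")
```

Here 3 of 7 dishes reach the 80% line; real data rarely lands exactly on 80/20, so the threshold is a guide rather than a rule.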
Scenario: Category Pareto Analysis
30 dishes ranked by revenue: top 6 (20%) contribute 78% of revenue — the vital few. Bottom 15 contribute only 5% but consume 40% of kitchen prep capacity. Recommendation: cut the bottom 10 dishes, free up kitchen capacity for core items.
Method 6: Dimensional Breakdown — Find the Root Cause
When a metric changes, dimensional breakdown traces where the change came from. Formula: Total Change = Σ(change in each segment of the dimension), for example the sum of every store's change. Method: start with the broadest dimension (e.g., store) and find the problem store; then break down within that store (category, time period), narrowing the scope step by step.
Scenario: Revenue Decline Breakdown
Total revenue MoM -10%. Step 1, by store: Store A -25%, Store B +5%, rest flat → problem is Store A. Step 2, by category in A: Mains -30%, Beverages -15%, Snacks +10% → mains are the driver. Step 3, by time period for mains: Lunch -35%, Dinner -20% → lunch is critical. Conclusion: Store A's lunch mains revenue dropped sharply, so investigate lunchtime competitors and promotions.
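One level of this drill-down can be sketched as follows. The absolute revenue figures are hypothetical, chosen only so that the percentages match Step 1 of the scenario (total -10%, Store A -25%, Store B +5%); "Store C" stands in for the flat remainder, and `breakdown` is an illustrative helper.

```python
def breakdown(previous, current):
    """Change per segment and its share of the total change,
    sorted by largest absolute change first."""
    changes = {seg: current[seg] - previous[seg] for seg in previous}
    total = sum(changes.values())
    rows = [
        (seg, delta, delta / total if total else 0.0)
        for seg, delta in changes.items()
    ]
    return total, sorted(rows, key=lambda row: abs(row[1]), reverse=True)

# Hypothetical monthly revenue by store, in ¥K
last_month = {"Store A": 480, "Store B": 200, "Store C": 420}
this_month = {"Store A": 360, "Store B": 210, "Store C": 420}

total_change, rows = breakdown(last_month, this_month)
for segment, delta, share in rows:
    print(f"{segment}: {delta:+d} ({share:.0%} of total change)")
```

Running the same function on Store A's categories, and then on time periods within mains, repeats the narrowing described in Steps 2 and 3.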
Method 7: Correlation Analysis — Are Two Variables Related?
Measures the linear relationship between two variables. Formula: Pearson correlation coefficient r = Σ((x - x̄)(y - ȳ)) ÷ √(Σ(x - x̄)² × Σ(y - ȳ)²), range -1 to 1. |r| > 0.7 strong, 0.4-0.7 moderate, < 0.4 weak. A positive r means the variables move together; a negative r means they move in opposite directions. Warning: correlation ≠ causation, since both variables might move due to a third factor.
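For readers who want to compute r directly, here is a minimal standard-library sketch. The promo and sales figures are hypothetical, and `strength` simply applies the bands stated above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Bands from the text: >0.7 strong, 0.4-0.7 moderate, <0.4 weak."""
    size = abs(r)
    if size > 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    return "weak"

# Hypothetical monthly promo spend and sales, in ¥K
promo = [10, 12, 15, 18, 20, 25]
sales = [120, 130, 150, 160, 175, 200]
r = pearson_r(promo, sales)
print(round(r, 3), strength(r))
```

Recomputing r after dropping a suspect month is a quick way to see how much a single point drives the result.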
Scenario: Promo Spend vs Sales
12 months of promo spend and sales: r = 0.82, strong positive correlation. On average, each ¥10K promo increase goes with a ~¥35K sales increase, a 3.5x return in sales (not profit) per yuan of promo spend. But after removing the CNY month (naturally high sales), r drops to 0.65: the promo effect isn't as strong as it appears.
Method 8: Anomaly Detection — Automatically Find Abnormal Data Points
Find data points outside the normal range. Core formula: Z-score = (Current Value - Historical Mean) ÷ Standard Deviation. Benchmarks, assuming roughly normally distributed data: |Z| > 2 (~5% probability) = possible anomaly; |Z| > 3 (~0.3% probability) = strong anomaly. Simplified: a current value more than 2 standard deviations from the historical mean warrants investigation.
Scenario: Store Return Rate Anomaly
12-month return rate: mean 2.3%, SD 0.8%. This month: 5.1%. Z-score = (5.1 - 2.3) ÷ 0.8 = 3.5, so |Z| > 3: a strong anomaly. Investigation: a batch of low-quality ingredients caused multiple order cancellations. Without anomaly detection, this might not have surfaced until the monthly inventory review.
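The scenario's calculation can be reproduced with a short sketch. The function names and the three level labels ("normal", "possible", "strong") are illustrative choices; the thresholds of 2 and 3 are the benchmarks from Method 8.

```python
def z_score(value, mean, sd):
    """How many standard deviations a value sits from the historical mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

def anomaly_level(z):
    """Thresholds from the text: |Z| > 3, |Z| > 2, otherwise normal."""
    size = abs(z)
    if size > 3:
        return "strong"
    if size > 2:
        return "possible"
    return "normal"

# Scenario figures: return rate mean 2.3%, SD 0.8%, this month 5.1%
z = z_score(5.1, mean=2.3, sd=0.8)
level = anomaly_level(z)
print(round(z, 2), level)
```

In practice the mean and SD would come from a rolling history that excludes the month being tested, so that an extreme value does not mask itself.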
Method 9: RFM Model — Customer Value Segmentation
RFM scores customers on Recency (time since last purchase), Frequency (purchase count), and Monetary (total spend). Splitting each dimension into high and low gives 2 × 2 × 2 = 8 value tiers. Core formula: each dimension is scored 1-5 (quintile method, with 5 = most recent, most frequent, or highest spend), and total = R × w_R + F × w_F + M × w_M. Default weights R:F:M = 3:2:2. Key tiers: Champions (R high, F high, M high) need retention; At-Risk (R low, F high, M high) need win-back. See the dedicated RFM Model article for full details.
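A rough sketch of quintile scoring and the weighted total follows. The customer data is hypothetical, the rank-based quintile rule (ties broken by input order) is one simple assumption among several possible conventions, and the weights are the 3:2:2 default mentioned above.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by rank quintile (5 = best).
    Ties are broken by input order, a simplification."""
    n = len(values)
    # Rank from worst to best so that rank 0 maps to score 1
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1
    return scores

def rfm_total(r, f, m, weights=(3, 2, 2)):
    """Weighted total: R x w_R + F x w_F + M x w_M."""
    return r * weights[0] + f * weights[1] + m * weights[2]

# Hypothetical customers: days since last purchase, order count, total spend (¥)
recency_days = [3, 40, 10, 90, 25]
frequency = [12, 2, 8, 1, 5]
monetary = [900, 120, 600, 60, 300]

r_scores = quintile_scores(recency_days, higher_is_better=False)  # fewer days is better
f_scores = quintile_scores(frequency)
m_scores = quintile_scores(monetary)
totals = [rfm_total(r, f, m) for r, f, m in zip(r_scores, f_scores, m_scores)]
print(totals)
```

Note that recency is scored in reverse: the customer who bought 3 days ago gets R = 5, and the one who last bought 90 days ago gets R = 1.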
Method 10: ABC Analysis — Classify Items by Importance
Classifies items into A (critical), B (moderate), C (minor). Criteria: A = 70-80% of total value from top 10-20% items; B = 15-25% from next 20-30%; C = 5-10% from bottom 50-70%. Method: sort by value descending, calculate cumulative share, draw lines at 80% and 95%.
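The sort, accumulate, and cut procedure might look like the sketch below. The SKU values are hypothetical, and one convention is assumed: an item is classed by the cumulative share reached before it is added, so the item that crosses a line stays in the higher class.

```python
def abc_classify(values, a_cut=0.80, b_cut=0.95):
    """Assign A/B/C by cumulative share of value, with lines at 80% and 95%."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    classes, running = {}, 0.0
    for name, value in ranked:
        share_before = running / total  # cumulative share before this item
        if share_before < a_cut:
            classes[name] = "A"
        elif share_before < b_cut:
            classes[name] = "B"
        else:
            classes[name] = "C"
        running += value
    return classes

# Hypothetical monthly sales value per SKU, in ¥K
skus = {"sku1": 500, "sku2": 300, "sku3": 100, "sku4": 50, "sku5": 30, "sku6": 20}
classes = abc_classify(skus)
print(classes)
```

Moving `a_cut` and `b_cut` adjusts how strict the A and B classes are, which is useful when the value distribution is flatter than the typical ranges given above.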
Scenario: SKU ABC Classification
200 SKUs ranked by monthly sales: A-class 25 SKUs (12.5%) = 72% of sales → priority stocking, zero stockout tolerance. B-class 55 SKUs (27.5%) = 20% → normal stocking, weekly check. C-class 120 SKUs (60%) = 8% → reduce stock or eliminate. Result: inventory turnover days fell from 45 to 32 (stock turns over faster), and the stockout rate went down 60%.
Method 11: Funnel Analysis — Find Conversion Drop-off Points
Track user drop-off at each conversion step. Formula: Step Conversion Rate = Current Step ÷ Previous Step × 100%. Overall Conversion Rate = Final Step ÷ First Step × 100%. Key insight: find the step with the lowest conversion rate — that's the biggest optimization opportunity.
Scenario: Store Visit-to-Purchase Funnel
Walk-by 1000 → Enter 300 (30%) → Browse 240 (80%) → Purchase 180 (75%) → Return 54 (30%). The biggest loss of people is at walk-by → enter (30%, 700 lost). Optimization: improve storefront display, add traffic-driving activities. Assuming the entry rate increases to 40% (+100 people), with subsequent rates and average ticket unchanged, purchases rise from 180 to 240 and monthly revenue increases ~33%.
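The what-if on entry rate can be checked with a small sketch using the scenario's own counts for the first four steps (the repeat-visit step is left out for brevity). The helper names are illustrative, and an unchanged average ticket is assumed so that revenue scales with purchases.

```python
def step_rates(counts):
    """Conversion rate of each step relative to the previous step."""
    return [counts[i] / counts[i - 1] for i in range(1, len(counts))]

def project(first_step, rates):
    """Rebuild funnel counts from the first step and a list of step rates."""
    counts = [first_step]
    for rate in rates:
        counts.append(counts[-1] * rate)
    return counts

steps = ["walk-by", "enter", "browse", "purchase"]
counts = [1000, 300, 240, 180]

rates = step_rates(counts)                # enter, browse, purchase rates
overall = counts[-1] / counts[0]          # overall conversion rate
# What if the entry rate rises to 40% and later rates stay the same?
improved = project(counts[0], [0.40] + rates[1:])
uplift = improved[-1] / counts[-1] - 1    # relative change in purchases
print([round(c) for c in improved], f"{uplift:+.0%}")
```

Under these assumptions purchases move from 180 to 240, so a gain at the top of the funnel carries through every later step in proportion.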
Method 12: Cluster Analysis — Auto-Group Stores by Similarity
Automatically groups similar data points without predefined categories. The K-Means algorithm clusters stores by multiple metrics (revenue, revenue/sqm, average ticket, foot traffic) into 2-5 types; standardize the metrics first so that large-valued ones such as revenue do not dominate the distance calculation. Typical results: High revenue + high efficiency (mature benchmarks), High revenue + low efficiency (large but underperforming), Low revenue + high growth (new stores needing support), Low revenue + low growth (problem stores needing diagnosis).
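To make the idea concrete, here is a bare-bones K-Means sketch in plain Python. It is a teaching sketch under stated assumptions, not a production implementation: the store metrics are hypothetical, the metrics are z-score standardized first, and the initial centroids are simply the first k points (real tools typically use random or k-means++ starts).

```python
import math

def standardize(rows):
    """Z-score each column so no single metric dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iterations=20):
    """Plain K-Means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    centroids = [list(p) for p in points[:k]]  # deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical stores: [monthly revenue in ¥K, revenue per sqm in ¥K]
stores = [[500, 4.0], [120, 1.0], [480, 3.8], [110, 1.2], [520, 4.2], [130, 0.9]]
labels, centroids = kmeans(standardize(stores), k=2)
print(labels)
```

The cluster numbers themselves carry no meaning; the business labels come from reading each cluster's average metrics afterwards.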
Method 13: Comparative Analysis — A/B Tests and Before/After
Compare metrics between groups. Common uses: before/after promotions (same store), A/B tests (different strategies across stores), industry benchmarks (your store vs industry average). Key principle: ensure comparable conditions (same time period, same store type) — otherwise conclusions are unreliable.
Scenario: Promotion Effect Evaluation
Store A runs a promo (¥20 off ¥100), Store B doesn't. After 1 week: Store A revenue +35%, traffic +50%, average ticket -10% (the promo lowered the ticket). Store B revenue +5% (natural growth). Net promo effect = A growth - B natural growth = +30 percentage points. But the lower ticket means the promo attracted low-spending customers, so check their repeat-visit rate.
Method 14: Gap Analysis — How Far from the Target?
Compare actual vs target. Formula: Completion Rate = Actual ÷ Target × 100%; Gap = Target - Actual. Monthly tracking: if days 1-10 completion < 25%, the month will likely miss target — adjust strategy immediately. If days 1-20 completion > 70%, the month is on track to exceed.
Scenario: Monthly Revenue Target Tracking
Monthly target ¥1M. Day 10: ¥220K (22%), below 25% pace. Breakdown: weekday 90% on target, weekend only 55%. Bottleneck: weekend revenue. Strategy: increase weekend staffing, launch weekend-only specials. Day 20: ¥680K (68%), accelerated after adjustments. Final: ¥1.03M.
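The tracking logic can be sketched with the scenario's figures. A 30-day month is assumed, the helper names are illustrative, and `checkpoint_status` encodes only the two checkpoints given in Method 14 (below 25% at day 10, above 70% at day 20).

```python
def gap_report(actual, target, day, days_in_month=30):
    """Completion rate, remaining gap, and the daily run rate needed to close it."""
    days_left = days_in_month - day
    gap = target - actual
    return {
        "completion": actual / target,
        "gap": gap,
        "avg_daily_so_far": actual / day,
        "required_daily": gap / days_left if days_left > 0 else None,
    }

def checkpoint_status(day, completion):
    """Apply the two checkpoints from the text; other days give no signal."""
    if day == 10 and completion < 0.25:
        return "likely to miss"
    if day == 20 and completion > 0.70:
        return "on track to exceed"
    return "no signal"

# Scenario figures in ¥K: target 1,000; 220 by day 10; 680 by day 20
day10 = gap_report(actual=220, target=1000, day=10)
day20 = gap_report(actual=680, target=1000, day=20)
print(day10, checkpoint_status(10, day10["completion"]))
print(day20, checkpoint_status(20, day20["completion"]))
```

The required daily figure is often the more actionable number: at day 10 the store needs ¥39K per day for the rest of the month, against ¥22K per day so far.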
Method 15: Sensitivity Analysis — Simulate 'What If?'
Simulate impact of variable changes. Formula: Revenue = Traffic × Average Ticket; Profit = Revenue × Margin - Fixed Costs. Adjust one variable, see how results change. Method: pick 1-2 key variables (ticket, traffic), set 3 scenarios (optimistic +10%, baseline 0%, pessimistic -10%), calculate each outcome.
Scenario: Ticket and Traffic Sensitivity
Store monthly revenue ¥300K = 6,000 visitors × ¥50 ticket. Scenario 1: ticket +10% (¥55), traffic unchanged → ¥330K (+10%). Scenario 2: ticket unchanged, traffic +15% (6,900) → ¥345K (+15%). Scenario 3: ticket +10%, traffic -5% (5,700) → ¥313.5K (+4.5%). Conclusion: revenue responds equally to a 1% change in either variable, so Scenario 2 comes out ahead only because it assumes a larger move (+15% vs +10%). Raising ticket is the more controllable lever, and the net effect of combining both depends on whether you can raise prices without losing traffic.
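The three scenarios can be reproduced with a short sketch of the Revenue = Traffic × Average Ticket model. The `scenario` helper is illustrative; the inputs are the figures from the scenario above.

```python
def revenue(traffic, ticket):
    """Revenue = Traffic x Average Ticket."""
    return traffic * ticket

def scenario(base_traffic, base_ticket, traffic_change=0.0, ticket_change=0.0):
    """Return (new revenue, relative change vs baseline) for given changes."""
    base = revenue(base_traffic, base_ticket)
    new = revenue(
        base_traffic * (1 + traffic_change),
        base_ticket * (1 + ticket_change),
    )
    return new, new / base - 1

BASE_TRAFFIC, BASE_TICKET = 6000, 50  # baseline: ¥300K per month

cases = {
    "ticket +10%": scenario(BASE_TRAFFIC, BASE_TICKET, ticket_change=0.10),
    "traffic +15%": scenario(BASE_TRAFFIC, BASE_TICKET, traffic_change=0.15),
    "ticket +10%, traffic -5%": scenario(
        BASE_TRAFFIC, BASE_TICKET, traffic_change=-0.05, ticket_change=0.10
    ),
}
for name, (new_revenue, change) in cases.items():
    print(f"{name}: ¥{new_revenue:,.0f} ({change:+.1%})")
```

Because the model is a simple product, combined changes multiply rather than add: +10% ticket with -5% traffic gives 1.10 × 0.95 = 1.045, or +4.5%.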
How to Choose: 4 Question Types Map to 4 Method Categories
Ask 'What's the overall situation?' → descriptive stats + trend analysis + YoY/MoM. Ask 'Why is there a problem?' → dimensional breakdown + Pareto + correlation + funnel + anomaly detection. Ask 'How to segment customers/products?' → RFM + ABC + cluster analysis. Ask 'What to do next?' → comparative analysis + gap analysis + sensitivity analysis. With DataFish, AI automatically selects and runs appropriate methods based on your data, so there is no need to choose yourself. Complete diagnosis in 5 minutes.