Analysis Methods · 12 min read

15 Data Analysis Methods: Formulas, Benchmarks, and Real-World Scenarios

What are the most common data analysis methods? 15 methods explained with full formulas, judgment benchmarks, and real store scenarios. Covers descriptive stats, trends, YoY/MoM, Pareto, RFM, ABC, funnel, correlation, clustering, anomaly detection, and more.

Data Analysis Methods: Organized by Purpose

Data analysis methods fall into 4 categories:

Descriptive (what happened): 'What was last month's revenue? What's the 3-month trend?'

Diagnostic (why it happened): 'Why did revenue drop? Which category dragged performance?'

Predictive & Segmentation (what will happen): 'Which customers might churn? Which SKUs are critical?'

Prescriptive (what to do): 'Which store should I optimize next? How far from the target?'

The 15 methods below are explained category by category.

Method 1: Descriptive Statistics — Your Data's ID Card

The starting point for all analysis: 5 numbers that summarize a dataset — mean (average), median (middle value, robust to outliers), standard deviation (variability), max, min. Formulas: Mean = Σx ÷ n; SD = √(Σ(x - mean)² ÷ n). Benchmark: SD ÷ Mean > 30% indicates high volatility — investigate outliers.
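As a rough illustration of the formulas above, the sketch below computes the five summary numbers and the SD ÷ Mean ratio with Python's standard `statistics` module. The function name `describe` and the sample figures are assumptions made for this example, not data from the article.

```python
import statistics

def describe(values):
    """Return the five summary numbers plus the volatility ratio SD / mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by n, as in the formula
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "max": max(values),
        "min": min(values),
        "volatility": sd / mean,  # above 0.30 suggests high volatility
    }

# Hypothetical monthly revenue in thousands of ¥
monthly_revenue = [250, 260, 270, 280, 290, 300]
summary = describe(monthly_revenue)
print(summary)
```

Here the ratio comes out well under 30%, so this hypothetical series would count as stable under the benchmark.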

Scenario: Monthly Revenue Summary

12 months of store revenue: mean ¥285K, median ¥278K, SD ¥62K, max ¥420K (Chinese New Year), min ¥190K (lockdown month). SD/Mean = 21.8%, moderate volatility. Removing the 2 extreme months gives mean ¥281K and SD ¥23K: this is the 'normal month' baseline.

Methods 2 & 3: YoY and MoM — See Real Growth by Excluding Seasonality

Year-over-Year (vs the same period last year) removes seasonality to show long-term trends. Month-over-Month (vs last month) tracks short-term changes. YoY formula: (Current - Prior Year) ÷ Prior Year × 100%. MoM formula: (Current - Previous) ÷ Previous × 100%. Benchmarks: YoY above 10% = healthy growth, 0-10% = moderate, below 0% = warning. 3 or more consecutive negative MoM readings suggest structural decline rather than noise.
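Both formulas are the same percentage-change calculation against a different base, which a short sketch can make concrete. The helper names `pct_change` and `yoy_label` and the revenue figures are assumptions for illustration only.

```python
def pct_change(current, base):
    """Percentage change of current versus base; None when the base is zero."""
    if base == 0:
        return None
    return (current - base) * 100 / base

def yoy_label(yoy):
    """Map a YoY percentage onto the benchmark bands quoted in the text."""
    if yoy > 10:
        return "healthy growth"
    if yoy >= 0:
        return "moderate"
    return "warning"

# Hypothetical revenue figures in thousands of ¥
june_this_year, may_this_year, june_last_year = 230, 184, 250

mom = pct_change(june_this_year, may_this_year)   # versus last month
yoy = pct_change(june_this_year, june_last_year)  # versus same month last year
print(f"MoM {mom:+.1f}%, YoY {yoy:+.1f}% ({yoy_label(yoy)})")
```

With these numbers MoM is positive while YoY is negative, which is the situation where reading only one of the two metrics would mislead.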

Scenario: Combined YoY and MoM Judgment

A beverage store in June: MoM +25% (summer peak), but YoY -8% (worse than last year overall). Conclusion: seasonal recovery, but long-term competitiveness declining. Looking only at MoM would miss the YoY problem.

Common Pitfalls with YoY/MoM

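Pitfall 1: Only checking MoM, which leaves you misled by seasonality. Pitfall 2: Small base. A new store has no prior-year figure, so YoY is undefined, and growth measured against a tiny opening-month base might be 200%, which is meaningless. Pitfall 3: Ignoring day count. A CNY month has 15-20 operating days vs 30 in a normal month, so compare per-operating-day averages rather than monthly totals.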

Method 4: Trend Analysis — Spot Direction and Inflection Points

Visualize time series data as a line chart to identify 3 signals: direction (up/down/flat), inflection points (when trends reverse), and cycles (regular patterns). Key metrics: the 3-month moving average (smooths short-term noise) and the number of consecutive up or down weeks (trend strength).
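The two trend metrics can be sketched in a few lines. This is a minimal version assuming a trailing (not centered) moving average; the function names and the revenue series are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points produce no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def streak(values):
    """Length of the current run of consecutive rises (+) or falls (-)."""
    run = 0
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            run = run + 1 if run > 0 else 1
        elif cur < prev:
            run = run - 1 if run < 0 else -1
        else:
            run = 0
    return run

# Hypothetical monthly revenue: steady growth, then a reversal
revenue = [100, 103, 106, 109, 112, 99, 97, 95]
smoothed = moving_average(revenue)
print(smoothed)
print(streak(revenue))  # negative means consecutive declines
```

A smoothed series that stops rising while the streak turns negative is one simple way to flag a candidate inflection point for closer inspection.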

Scenario: Identifying an Inflection Point

12-month revenue line chart: steady growth Jan-May, sudden -12% in June, low July-December. Inflection point in June — a competitor opened across the street. 3-month moving average makes it clearer: from +3%/month average growth to -2%/month decline.

Method 5: Pareto Analysis (80/20 Rule) — Find the Vital Few

Core principle: roughly 80% of results come from 20% of causes. Method: sort items by contribution (descending) and calculate the cumulative share; the items that together reach 80% are the 'vital few.' Formula: Cumulative Share = Σ(Top N contributions) ÷ Total × 100%. Use cases: find the 20% of categories driving 80% of revenue, or the 20% of customers generating 80% of profit.
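The sort-and-accumulate procedure can be sketched as follows. The function name `pareto`, the category labels, and the choice to include the item that crosses the 80% line are assumptions of this example.

```python
def pareto(contributions, threshold=0.80):
    """Return (name, cumulative_share) pairs up to and including the item
    whose cumulative share first reaches the threshold."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    vital, running = [], 0.0
    for name, value in ranked:
        running += value
        vital.append((name, running / total))
        if running / total >= threshold:
            break
    return vital

# Hypothetical revenue by category
revenue_by_category = {"A": 50, "B": 30, "C": 10, "D": 6, "E": 4}
vital = pareto(revenue_by_category)
print(vital)
```

In this made-up data two of five categories already reach 80% of revenue, so they would be the 'vital few'.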

Scenario: Category Pareto Analysis

30 dishes ranked by revenue: top 6 (20%) contribute 78% of revenue — the vital few. Bottom 15 contribute only 5% but consume 40% of kitchen prep capacity. Recommendation: cut the bottom 10 dishes, free up kitchen capacity for core items.

Method 6: Dimensional Breakdown — Find the Root Cause

When a metric changes, dimensional breakdown traces where the change came from. Formula: Total Change = Σ(change in each segment of a dimension). Note that absolute changes add up to the total; percentage changes per segment do not. Method: start with the broadest dimension (e.g., store) and find the problem store; then break down within that store (category, time period), narrowing the scope step by step.
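One step of a breakdown might look like the sketch below, which works on absolute changes because those sum to the total change. The function name `breakdown` and the store figures are hypothetical.

```python
def breakdown(previous, current):
    """Absolute change per segment, sorted with the largest decline first."""
    segments = set(previous) | set(current)
    changes = {s: current.get(s, 0) - previous.get(s, 0) for s in segments}
    return sorted(changes.items(), key=lambda kv: kv[1])

# Hypothetical revenue by store for two consecutive months (thousands of ¥)
prev = {"Store A": 400, "Store B": 300, "Store C": 300}
curr = {"Store A": 300, "Store B": 315, "Store C": 285}

by_store = breakdown(prev, curr)
total_change = sum(curr.values()) - sum(prev.values())
print(by_store)
print(total_change)
```

The same function can then be applied again to the worst segment using the next dimension (category, then time period) to keep narrowing the scope.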

Scenario: Revenue Decline Breakdown

Total revenue MoM -10%. Step 1, by store: Store A -25%, Store B +5%, rest flat → problem is Store A. Step 2, by category in A: Mains -30%, Beverages -15%, Snacks +10% → mains are the driver. Step 3, by time period for mains: Lunch -35%, Dinner -20% → lunch is critical. Conclusion: Store A's lunch mains revenue dropped sharply, so investigate lunchtime competitors and promotions.

Method 7: Correlation Analysis — Are Two Variables Related?

Measures the strength of the linear relationship between two variables. Formula: Pearson correlation coefficient r = Σ((x - mean_x)(y - mean_y)) ÷ √(Σ(x - mean_x)² × Σ(y - mean_y)²), range -1 to 1. |r| > 0.7 strong, 0.4-0.7 moderate, < 0.4 weak. Positive r = the variables move together, negative r = they move in opposite directions. Warning: correlation ≠ causation, because both variables might move due to a third factor.
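For readers who want to see the calculation behind r, here is a minimal sketch of the standard Pearson formula written out by hand. The function names and the promo and sales series are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def strength(r):
    """Apply the |r| bands quoted in the text."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    return "weak"

# Hypothetical monthly promo spend and sales (thousands of ¥)
promo = [10, 12, 15, 11, 18, 20]
sales = [110, 118, 130, 112, 140, 151]
r = pearson_r(promo, sales)
print(round(r, 3), strength(r))
```

Rerunning the calculation with a suspect month removed is a cheap way to check how much a single point drives the coefficient.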

Scenario: Promo Spend vs Sales

12 months of promo spend and sales: r = 0.82, strong positive correlation. Each ¥10K promo increase → ~¥35K sales increase, ROI = 3.5x. But removing CNY month (naturally high sales), r drops to 0.65 — promo effect isn't as strong as it appears.

Method 8: Anomaly Detection — Automatically Find Abnormal Data Points

Find data points outside the normal range. Core formula: Z-score = (Current Value - Historical Mean) ÷ Standard Deviation. Benchmarks, assuming roughly normally distributed data: |Z| > 2 (~5% probability) = possible anomaly; |Z| > 3 (~0.3% probability) = definite anomaly. Simplified: a current value more than 2 standard deviations from the historical mean warrants investigation.
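The Z-score check is simple enough to sketch directly. The function names, the flag labels, and the history values below are hypothetical; the thresholds are the ones quoted in the text.

```python
import statistics

def z_score(current, history):
    """Z-score of the current value against a historical series."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)  # population SD of the history
    if sd == 0:
        raise ValueError("history has no variation, Z-score is undefined")
    return (current - mean) / sd

def flag(z):
    """Apply the |Z| > 2 and |Z| > 3 thresholds."""
    if abs(z) > 3:
        return "anomaly"
    if abs(z) > 2:
        return "possible anomaly"
    return "normal"

# Hypothetical return-rate history in percent (mean 2.5, SD 0.5)
history = [2, 3, 2, 3, 2, 3]
for value in (3.0, 3.6, 4.2):
    z = z_score(value, history)
    print(value, round(z, 2), flag(z))
```

Because the mean and SD are themselves pulled by extreme values, it is usually safer to compute them from history that excludes the point being tested, as this sketch does.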

Scenario: Store Return Rate Anomaly

12-month return rate: mean 2.3%, SD 0.8%. This month: 5.1%. Z-score = (5.1 - 2.3) ÷ 0.8 = 3.5, |Z| > 3, definite anomaly. Investigation: a batch of low-quality ingredients caused multiple order cancellations. Without anomaly detection, this might not surface until monthly inventory review.

Method 9: RFM Model — Customer Value Segmentation

RFM scores customers on Recency (time since last purchase), Frequency (purchase count), and Monetary (total spend). Rating each dimension high or low yields 2 × 2 × 2 = 8 value tiers. Core formula: each dimension is scored 1-5 (quintile method), and total = R × w_R + F × w_F + M × w_M, where w_R, w_F, w_M are the weights. Default weights R:F:M = 3:2:2. Key tiers: Champions (R high, F high, M high) need retention; At-Risk (R low, F high, M high) need win-back. See the dedicated RFM Model article for full details.
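A weighted RFM total can be sketched as below. This is an assumption-laden illustration: the rank-based quintile scoring, the function names, and the five customers are invented for the example, and ties are broken by input order rather than shared.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by rank quintile (5 = best). Ties are split by order."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,  # worst value first, so it gets score 1
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // len(values) + 1
    return scores

def rfm_total(r, f, m, weights=(3, 2, 2)):
    """Weighted total using the R:F:M = 3:2:2 default from the text."""
    w_r, w_f, w_m = weights
    return r * w_r + f * w_f + m * w_m

# Hypothetical customers: days since last purchase, order count, total spend
recency_days = [3, 40, 10, 90, 20]
frequency = [12, 2, 8, 1, 5]
monetary = [900, 100, 600, 50, 300]

r_scores = quintile_scores(recency_days, higher_is_better=False)  # recent = better
f_scores = quintile_scores(frequency)
m_scores = quintile_scores(monetary)
totals = [rfm_total(r, f, m) for r, f, m in zip(r_scores, f_scores, m_scores)]
print(totals)
```

Recency is scored in reverse because fewer days since the last purchase is the better outcome.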

Method 10: ABC Analysis — Classify Items by Importance

Classifies items into A (critical), B (moderate), C (minor). Criteria: A = 70-80% of total value from top 10-20% items; B = 15-25% from next 20-30%; C = 5-10% from bottom 50-70%. Method: sort by value descending, calculate cumulative share, draw lines at 80% and 95%.
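The cumulative-share method can be sketched as follows, using the 80% and 95% lines from the text. The function name, the SKU labels, and the convention that an item crossing a line falls into the next class are assumptions of this example.

```python
def abc_classify(values, a_cut=0.80, b_cut=0.95):
    """Label each item A, B, or C by its cumulative share of total value."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    classes, running = {}, 0.0
    for name, value in ranked:
        running += value
        share = running / total
        if share <= a_cut:
            classes[name] = "A"
        elif share <= b_cut:
            classes[name] = "B"
        else:
            classes[name] = "C"
    return classes

# Hypothetical monthly sales by SKU
sales_by_sku = {"s1": 50, "s2": 25, "s3": 12, "s4": 6, "s5": 4, "s6": 3}
classes = abc_classify(sales_by_sku)
print(classes)
```

The cut-off values are parameters so they can be tuned to the ranges the text gives for each class.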

Scenario: SKU ABC Classification

200 SKUs ranked by monthly sales: A-class 25 SKUs (12.5%) = 72% of sales → priority stocking, zero stockout tolerance. B-class 55 SKUs (27.5%) = 20% → normal stocking, weekly check. C-class 120 SKUs (60%) = 8% → reduce stock or eliminate. Result: inventory turnover days fell from 45 to 32, and the stockout rate went down 60%.

Method 11: Funnel Analysis — Find Conversion Drop-off Points

Track user drop-off at each conversion step. Formula: Step Conversion Rate = Current Step ÷ Previous Step × 100%. Overall Conversion Rate = Final Step ÷ First Step × 100%. Key insight: find the step with the lowest conversion rate — that's the biggest optimization opportunity.
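Both conversion formulas can be computed in one pass over the ordered steps. The function name `funnel_rates` is an assumption; the counts are the first four steps of the store scenario in this section.

```python
def funnel_rates(steps):
    """steps: ordered list of (name, count).
    Returns ([(step_name, step_rate_pct), ...], overall_rate_pct)."""
    rates = [
        (name, count * 100 / prev_count)
        for (_, prev_count), (name, count) in zip(steps, steps[1:])
    ]
    overall = steps[-1][1] * 100 / steps[0][1]
    return rates, overall

steps = [("Walk-by", 1000), ("Enter", 300), ("Browse", 240), ("Purchase", 180)]
rates, overall = funnel_rates(steps)
weakest = min(rates, key=lambda r: r[1])  # step with the lowest conversion

print(rates)
print(overall)
print(weakest)
```

The weakest step is only a starting point: a low rate at the top of a funnel can be normal for the business, so it is worth comparing against a benchmark before acting.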

Scenario: Store Visit-to-Purchase Funnel

Walk-by 1000 → Enter 300 (30%) → Browse 240 (80%) → Purchase 180 (75%) → Return 54 (30%). Biggest drop at walk-by → enter (30%). Optimization: improve storefront display, add traffic-driving activities. Assuming the entry rate increases to 40% (+100 people) with subsequent rates unchanged, purchases rise from 180 to 240, so monthly revenue increases ~33% at an unchanged average ticket.

Method 12: Cluster Analysis — Auto-Group Stores by Similarity

Automatically groups similar data points without predefined categories. K-Means algorithm clusters stores by multiple metrics (revenue, revenue/sqm, average ticket, foot traffic) into 2-5 types. Typical results: High revenue + high efficiency (mature benchmarks), High revenue + low efficiency (large but underperforming), Low revenue + high growth (new stores needing support), Low revenue + low growth (problem stores needing diagnosis).
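To show the mechanics, here is a small hand-rolled K-Means sketch in plain Python. It is not the implementation any particular tool uses: the farthest-point initialisation, the function names, and the four stores are assumptions chosen to keep the example short and deterministic. In practice a library implementation would normally be preferred.

```python
import math

def standardize(rows):
    """Z-score each column so metrics on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid divide by 0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iterations=50):
    """Return one cluster label per point."""
    # Farthest-point initialisation: deterministic, unlike random seeding
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical stores: (monthly revenue in thousands of ¥, revenue per sqm)
stores = {"S1": (420, 9.0), "S2": (400, 8.5), "S3": (150, 3.0), "S4": (160, 3.2)}
labels = kmeans(standardize(list(stores.values())), k=2)
print(dict(zip(stores, labels)))
```

Standardizing first matters because otherwise the metric with the largest raw numbers (here revenue) would dominate the distance calculation.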

Method 13: Comparative Analysis — A/B Tests and Before/After

Compare metrics between groups. Common uses: before/after promotions (same store), A/B tests (different strategies across stores), industry benchmarks (your store vs industry average). Key principle: ensure comparable conditions (same time period, same store type) — otherwise conclusions are unreliable.

Scenario: Promotion Effect Evaluation

Store A runs a promo (¥20 off ¥100), Store B doesn't. After 1 week: Store A revenue +35%, traffic +50%, average ticket -10% (the promo lowered the ticket). Store B revenue +5% (natural growth). Net promo effect = A growth - B natural growth = +30 percentage points. But the lower ticket means the promo attracted low-spending customers, so their repeat-visit rate needs checking.

Method 14: Gap Analysis — How Far from the Target?

Compare actual vs target. Formula: Completion Rate = Actual ÷ Target × 100%; Gap = Target - Actual. Monthly tracking: if days 1-10 completion < 25%, the month will likely miss target — adjust strategy immediately. If days 1-20 completion > 70%, the month is on track to exceed.
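The completion-rate formula and the two checkpoint rules can be sketched as below. The function names and the signal labels are assumptions; the thresholds are the day-10 and day-20 benchmarks from the text, and the figures are illustrative.

```python
def gap_report(actual, target):
    """Completion rate in percent and the remaining gap."""
    return {"completion_pct": actual * 100 / target, "gap": target - actual}

def pace_signal(day, completion_pct):
    """Apply the day-10 and day-20 checkpoints; None when no rule applies."""
    if day == 10 and completion_pct < 25:
        return "likely to miss"
    if day == 20 and completion_pct > 70:
        return "on track to exceed"
    return None

# Illustrative figures in thousands of ¥: 220 achieved by day 10 of a 1,000 target
report = gap_report(actual=220, target=1000)
print(report)
print(pace_signal(10, report["completion_pct"]))
```

A checkpoint that returns no signal is not a pass: it only means neither of the two rules of thumb fired on that day.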

Scenario: Monthly Revenue Target Tracking

Monthly target ¥1M. Day 10: ¥220K (22%), below 25% pace. Breakdown: weekday 90% on target, weekend only 55%. Bottleneck: weekend revenue. Strategy: increase weekend staffing, launch weekend-only specials. Day 20: ¥680K (68%), accelerated after adjustments. Final: ¥1.03M.

Method 15: Sensitivity Analysis — Simulate 'What If?'

Simulate impact of variable changes. Formula: Revenue = Traffic × Average Ticket; Profit = Revenue × Margin - Fixed Costs. Adjust one variable, see how results change. Method: pick 1-2 key variables (ticket, traffic), set 3 scenarios (optimistic +10%, baseline 0%, pessimistic -10%), calculate each outcome.
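A three-scenario run over one variable might look like this sketch. The function names, the 60% margin, and the fixed-cost figure are assumptions invented for the example; only the two formulas come from the text.

```python
def outcome(traffic, ticket, margin, fixed_costs):
    """Revenue = Traffic x Average Ticket; Profit = Revenue x Margin - Fixed Costs."""
    revenue = traffic * ticket
    return revenue, revenue * margin - fixed_costs

def scenarios(traffic, ticket, margin, fixed_costs, variable="ticket", swing=0.10):
    """Vary one variable by -swing, 0, +swing and return (revenue, profit) for each."""
    results = {}
    for label, delta in (("pessimistic", -swing), ("baseline", 0.0), ("optimistic", swing)):
        t = traffic * (1 + delta) if variable == "traffic" else traffic
        p = ticket * (1 + delta) if variable == "ticket" else ticket
        results[label] = outcome(t, p, margin, fixed_costs)
    return results

# Hypothetical store: 6,000 visitors, ¥50 ticket, 60% margin, ¥120,000 fixed costs
results = scenarios(6000, 50, margin=0.6, fixed_costs=120_000, variable="ticket")
for label, (revenue, profit) in results.items():
    print(f"{label}: revenue {revenue:,.0f}, profit {profit:,.0f}")
```

Under these assumed costs a 10% ticket change moves profit by about 30%, which illustrates why it can be worth tracking profit as well as revenue in each scenario.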

Scenario: Ticket and Traffic Sensitivity

Store monthly revenue ¥300K = 6,000 visitors × ¥50 ticket. Scenario 1: ticket +10% (¥55), traffic unchanged → ¥330K (+10%). Scenario 2: ticket unchanged, traffic +15% (6,900) → ¥345K (+15%). Scenario 3: ticket +10%, traffic -5% (5,700) → ¥313.5K (+4.5%). Conclusion: raising the ticket is more controllable, but the +15% traffic scenario has the bigger impact. Because Revenue = Traffic × Average Ticket, an equal percentage change in either variable moves revenue equally, so the net effect of combining both depends on whether you can raise prices without losing traffic.

How to Choose: 4 Question Types Map to 4 Method Categories

Ask 'What's the overall situation?' → descriptive stats + trend analysis + YoY/MoM. Ask 'Why is there a problem?' → dimensional breakdown + Pareto + anomaly detection. Ask 'How to segment customers/products?' → RFM + ABC + cluster analysis. Ask 'What to do next?' → comparative analysis + gap analysis + sensitivity analysis. With DataFish, AI automatically selects and runs appropriate methods based on your data — no need to choose yourself. Complete diagnosis in 5 minutes.

