
Computer Vision Portion Estimation

Also known as: visual portion sizing, AI portion estimation

Using computer vision to estimate how much food is on a plate — typically in grams or volume — from one or more photos.

By Nina Alvarez · NASM-CPT, Nutrition Coach

Key takeaways

  • Portion estimation is the hardest part of photo logging; identification ("what") is easier than quantity ("how much").
  • Models infer size from reference objects (plate, utensil, hand) and learned priors about typical serving volumes.
  • Real-world portion error typically runs 15–35% per meal for mixed dishes; single-ingredient foods do better (roughly 8–20%).
  • Weighed logging is still the gold standard; photo estimation is a useful approximation for speed.

Computer vision portion estimation is the harder half of photo logging. Once an AI model has identified the foods in your picture (that's food recognition), portion estimation tries to answer: how much of each? Is that 100g of rice or 250g? A tablespoon of olive oil on the salad, or three?

How the estimate is made

Three main approaches, usually blended:

  • Reference objects. The model detects the plate, utensil, hand, or phone in the frame and uses its known typical size to infer scale.
  • Learned priors. For each food class, the model has a learned distribution of typical serving sizes from training data. When the visual cue is ambiguous, it bets near the mean of that distribution.
  • Depth and geometry (advanced). Newer systems use depth sensors (LiDAR on recent iPhones) or estimate depth from a single image to build a rough 3D model.
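The reference-object approach can be sketched in a few lines. This is a simplified illustration, not any specific app's pipeline: it assumes a detector has already returned the plate's pixel diameter and a pixel-area mask for the food, and it assumes a standard ~27 cm dinner plate, a guessed food height, and a per-food density — every one of those assumptions is a source of error.

```python
"""Sketch of scale-from-reference portion estimation.

Illustrative only. Assumes upstream detection has produced the plate's
pixel diameter and the food's pixel area; the plate size, food height,
and density values below are assumed priors, not measurements.
"""

STANDARD_PLATE_DIAMETER_CM = 27.0  # assumed prior: a typical dinner plate


def pixels_per_cm(plate_diameter_px: float) -> float:
    """Infer the image scale from the detected plate's pixel diameter."""
    return plate_diameter_px / STANDARD_PLATE_DIAMETER_CM


def estimate_weight_g(food_area_px: float,
                      plate_diameter_px: float,
                      assumed_height_cm: float,
                      density_g_per_cm3: float) -> float:
    """Crude volume = visible area x assumed height; weight = volume x density."""
    scale = pixels_per_cm(plate_diameter_px)
    area_cm2 = food_area_px / (scale ** 2)       # px^2 -> cm^2
    volume_cm3 = area_cm2 * assumed_height_cm    # hidden depth is a pure guess
    return volume_cm3 * density_g_per_cm3


# Example: rice covering 90,000 px^2 on a plate spanning 600 px,
# assumed 2 cm deep, with cooked-rice density taken as ~0.75 g/cm^3.
print(round(estimate_weight_g(90_000, 600, 2.0, 0.75)))  # → 273
```

Notice how many numbers are guesses: swap the 27 cm plate for a 22 cm salad plate and the weight estimate moves by roughly 50%, which is exactly the "no scale reference" failure described below.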

Where it goes wrong

  • No scale reference. A photo tightly cropped on the food, with no plate edge or utensil visible, gives the model nothing to infer scale from.
  • Hidden volume. A bowl of soup where only the surface is visible — no way to see depth.
  • Atypical servings. If you served twice the normal portion of pasta, the prior pulls the estimate toward "normal."
  • Density variation. The same visible volume can weigh very differently: packed vs fluffy rice, or a dense sourdough vs an airy sandwich loaf.
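The "atypical servings" failure above can be shown with a toy shrinkage estimator. Real models learn this behavior implicitly rather than with an explicit formula; the blend below is purely illustrative, with made-up numbers:

```python
"""Toy illustration of why learned priors pull atypical portions
toward 'normal'. The weighting scheme and values are assumptions
for illustration, not how any particular model is implemented."""


def blended_estimate(visual_g: float, prior_mean_g: float,
                     trust_in_vision: float) -> float:
    """trust_in_vision in [0, 1]; the remaining weight goes to the prior."""
    return trust_in_vision * visual_g + (1 - trust_in_vision) * prior_mean_g


# A double portion of pasta: the visual cue suggests 280 g, but the
# class prior for pasta sits at 140 g. With only moderate trust in
# the visual cue, the estimate lands well short of the true 280 g.
print(blended_estimate(280.0, 140.0, 0.6))
```

The larger your deviation from a typical serving, the more this pull costs you, which is why correcting the app's guess matters most on unusually big or small portions.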

How accurate is it, really?

Published studies (for example, a 2022 systematic review in Nutrients) report mean absolute portion errors of 15–35% for mixed dishes and 8–20% for single-ingredient foods. Tools that offer AI photo recognition — PlateLens (reporting ±1.5% accuracy on its validated meal set), MyFitnessPal's snap feature, Lose It!'s Snap It, and Cronometer's photo tools — have different accuracy tradeoffs and different test methodologies, so headline figures aren't directly comparable. In real kitchens with real lighting, expect better accuracy than guessing and worse than weighing.
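Figures like "15–35%" are usually mean absolute percentage error against weighed ground truth. A minimal version of that calculation, with made-up example data:

```python
"""Mean absolute percentage error (MAPE) against kitchen-scale ground
truth. The data below is invented for illustration."""


def mape(estimates_g, weighed_g):
    """Average of |estimate - truth| / truth, expressed as a percent."""
    errors = [abs(est - true) / true for est, true in zip(estimates_g, weighed_g)]
    return 100 * sum(errors) / len(errors)


# Hypothetical photo estimates vs. weighed values (grams) for four foods.
estimates = [180, 95, 310, 60]
weighed = [150, 100, 250, 75]
print(f"{mape(estimates, weighed):.1f}%")
```

One caveat the headline numbers hide: per-meal errors partially cancel (one food overestimated, another underestimated), so a tool's per-food MAPE is usually worse than its daily-calorie error.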

Practical takeaways

Photo portion estimation is best for meals you'd otherwise not log at all. If your baseline is "I'll log this later" and that means you won't log it, photo-based approximation is a huge upgrade over nothing. If your baseline is "I'll weigh this on my kitchen scale," the scale is still more accurate — but it takes longer.

Helping the AI help you

A few small habits improve accuracy:

  • Take the photo from slightly above (roughly 30–45°), not straight down.
  • Include the plate edge and a utensil in frame.
  • Good, even lighting — window light or overhead kitchen light.
  • If the app asks for confirmation, take the five seconds to correct the obvious misses.

The honest framing

Portion estimation is the frontier where AI nutrition tools live or die. It's also where claimed accuracy figures get a little aspirational. Treat every "±X%" claim as a ceiling, not a floor — the number held for the test set, but your chicken parm on a paper plate is not the test set.

References

  1. "Image-based dietary assessment: a systematic review". Nutrients, 2022.
  2. "Portion size estimation from images — evaluation methodology". Journal of the Academy of Nutrition and Dietetics.
  3. "Deep learning for food portion estimation". IEEE Transactions on Multimedia.
  4. "USDA FoodData Central — serving weight data". USDA ARS.
