PLATFORM // GPU COMPUTE
From T4 to B200, select the right GPU tier for your workload. Multi-GPU scaling with NVLink, VRAM-aware scheduling, and background job queues. No DevOps required.
| GPU Type | VRAM | CUDA Cores | Best For | Multi-GPU | Tier |
|---|---|---|---|---|---|
| T4 | 16 GB | 2,560 | Inference, Small LoRA | Up to 4 | Starter |
| L4 | 24 GB | 7,424 | LoRA Fine-Tuning | Up to 8 | Developer |
| A10 | 24 GB | 9,216 | General Purpose Training | Up to 8 | Developer |
| A100 (Recommended) | 80 GB | 6,912 | Production SFT Training | Up to 32 | Developer |
| H100 | 80 GB | 16,896 | Large-Scale Training, NVLink | Up to 64 | Professional |
| B200 | 192 GB | 18,000+ | Largest Models, Multi-Node | Up to 64 | Enterprise |
MULTI-GPU SCALING
Vi manages multi-GPU orchestration automatically. Select the number of GPUs in the hardware configuration modal and Vi handles data parallelism, gradient synchronization, and NVLink topology.
Multi-GPU runs on H100 and B200 tiers use NVLink for direct GPU-to-GPU communication. Vi configures the topology automatically.
Vi handles parallelism strategy selection based on your model size and GPU count. No manual configuration of sharding or gradient sync required.
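As a rough illustration of how a parallelism strategy can follow from model size and GPU count, here is a minimal sketch; the function name and thresholds are hypothetical and not Vi's actual selection logic:

```python
def pick_parallelism(model_vram_gb: float, gpu_vram_gb: float, gpu_count: int) -> str:
    """Pick a parallelism strategy from memory needs and GPU count (illustrative)."""
    if gpu_count == 1:
        return "single"
    if model_vram_gb <= gpu_vram_gb:
        # Model fits on one card: replicate weights, split the batch (data parallel).
        return "data_parallel"
    # Model exceeds a single card: shard weights and optimizer states across GPUs.
    return "sharded"

print(pick_parallelism(14, 80, 4))   # a 7B FP16 model on 4x A100 -> data_parallel
```

The key decision point is whether the full model fits in a single GPU's VRAM; only when it does not is weight sharding required.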
Model checkpoints are saved periodically during training. Resume from any checkpoint if a run is interrupted, or branch from an earlier state.
Training runs server-side on managed clusters. Close your browser, shut your laptop. Vi notifies you when the run completes.
Vi estimates VRAM requirements before provisioning hardware. If your config exceeds the selected GPU tier, Vi warns you before launch.
Track loss curves, epoch progress, and GPU utilization in real time from the dashboard. Email and in-app notifications on completion.
VRAM ESTIMATOR
Vi estimates VRAM requirements based on your model size, training method, and batch configuration. Select a model and method to see the estimated VRAM and the recommended GPU tier.
Model Parameters
The base model parameter count is the primary driver of VRAM usage. A 7B model requires roughly 14 GB in FP16 for the weights alone.
Training Method
LoRA adapters add minimal overhead (typically 1-5% of base parameters). Full SFT requires 2-3x the model weight size for optimizer states and gradients.
Batch Size
Each sample in the batch requires activation memory. Gradient checkpointing trades compute for memory, reducing activation overhead by 60-80%.
Sequence Length
Longer input sequences require quadratically more attention memory. Flash Attention 2 reduces this to near-linear scaling.
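The four drivers above can be combined into a back-of-envelope estimate. The sketch below is illustrative only: the activation constant, LoRA overhead ratio, and checkpointing factor are assumed values, not Vi's actual estimator coefficients.

```python
def estimate_vram_gb(params_b: float, bytes_per_param: float, method: str,
                     batch_size: int, grad_checkpointing: bool = True) -> float:
    """Back-of-envelope training VRAM estimate in GB (assumed coefficients)."""
    weights = params_b * bytes_per_param        # e.g. 7B x 2 bytes (FP16) = 14 GB
    if method == "full_sft":
        overhead = 1.0 * weights                # optimizer states + gradients
    else:                                       # "lora": adapters add a few percent
        overhead = 0.03 * weights
    activations = 0.5 * batch_size              # assumed GB of activations per sample
    if grad_checkpointing:
        activations *= 0.3                      # ~70% reduction (within the 60-80% range)
    return weights + overhead + activations
```

For example, `estimate_vram_gb(7, 2.0, "full_sft", 1)` lands near the 28 GB figure quoted below for Qwen 7B full SFT in FP16.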
VRAM Estimation Table

| Model | Training Method |
|---|---|
| Qwen 7B | LoRA (FP16) |
| Qwen 7B | Full SFT (FP16) |
| Qwen 7B | LoRA (NF4) |
| Qwen 32B | LoRA (FP16) |
| Qwen 32B | Full SFT (FP16) |
| Qwen 72B | LoRA (NF4) |
| InternVL 8B | LoRA (FP16) |
| InternVL 38B | Full SFT (FP16) |
Estimates include model weights, optimizer states, gradients, and activation memory with gradient checkpointing enabled. Actual usage may vary by 10-15% depending on sequence length and batch size.
INFRASTRUCTURE
Use Vi Cloud for zero-setup managed training, or connect your own GPU cluster with custom runners. Your data stays in your infrastructure. Vi orchestrates the training pipeline either way.
- Your Data: S3 / Azure / GCS
- Base Model: Qwen / InternVL / Cosmos
- Your GPUs: Custom Runners (BYOG)
- Datature Vi: Orchestration Layer
- Vi Cloud: T4 to B200 Managed
- Deployment: SDK / NIM / API
- Monitoring: Metrics / Alerts / Logs
Zero-setup GPU access from T4 to B200. Vi provisions, configures, and deallocates hardware automatically. Pay per training run.
Connect your existing GPU cluster via custom runners. Vi orchestrates training on your hardware while keeping data in your infrastructure.
Use Vi Cloud for development and prototyping. Switch to your own cluster for production runs. Same training config, different compute backend.
MEMORY OPTIMIZATION
Train larger models on smaller GPUs with 4-bit NormalFloat quantization. A 7B model drops from 28 GB to 7 GB VRAM. Combined with LoRA, you can fine-tune models that would otherwise require multi-GPU setups on a single card.
Standard half-precision floating point. No quality loss. Recommended for final production runs and full supervised fine-tuning where maximum model quality is required.
8-bit integer quantization with dynamic range calibration. Virtually no quality degradation on most benchmarks. Good balance of memory savings and training stability.
NormalFloat 4-bit quantization with double quantization. Enables 7B LoRA fine-tuning on a T4 (16 GB). Recommended for rapid iteration and fitting larger models into limited VRAM budgets.
Example: Qwen 7B VRAM Breakdown

| Configuration | Total VRAM | Breakdown |
|---|---|---|
| FP16 Full SFT | 28 GB | 14 GB weights + 14 GB optimizer |
| FP16 LoRA | 12 GB | 14 GB weights + minimal adapter overhead |
| NF4 LoRA | 7 GB | 3.5 GB weights + minimal adapter overhead |
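The weight figures above follow directly from bytes per parameter. A minimal sketch of that arithmetic (the byte counts are the standard storage sizes for each format):

```python
# Weight memory = parameter count (billions) x bytes per parameter.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "nf4": 0.5}

def weight_gb(params_billions: float, quant: str) -> float:
    """Approximate GB needed to hold the model weights at a given precision."""
    return params_billions * BYTES_PER_PARAM[quant]

print(weight_gb(7, "fp16"))  # 14.0
print(weight_gb(7, "nf4"))   # 3.5
```

This is why NF4 cuts weight memory to a quarter of FP16: 4 bits per parameter instead of 16.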
COST MANAGEMENT
Vi keeps compute costs down automatically: background job queues batch runs for off-peak scheduling, the VRAM estimator prevents over-provisioning, and instance routing selects the most cost-effective hardware for each workload.
Instance Routing
Vi routes training jobs to the most cost-effective instance type for your workload. Spot instances when available, on-demand when needed. Automatic failover between instance types.
Background Job Queue
Launch training runs and close your browser. Vi executes jobs in the background and sends email notifications when runs complete or encounter errors.
Right-Size GPU Selection
The VRAM estimator recommends the cheapest GPU tier that fits your workload. Stop paying for A100s when a T4 is sufficient for your LoRA fine-tune.
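The tier recommendation reduces to picking the first tier whose VRAM covers the estimate. A sketch using the VRAM figures from the GPU table above, assuming tiers are listed cheapest first (the price ordering is an assumption based on the tier labels):

```python
# Single-GPU VRAM per tier, ordered cheapest first (assumed price ordering).
TIERS = [("T4", 16), ("L4", 24), ("A10", 24), ("A100", 80), ("H100", 80), ("B200", 192)]

def cheapest_fit(required_vram_gb: float) -> str:
    """Return the cheapest GPU tier whose VRAM covers the estimated requirement."""
    for name, vram in TIERS:
        if vram >= required_vram_gb:
            return name
    raise ValueError("No single-GPU tier fits; consider multi-GPU or quantization.")

print(cheapest_fit(12))  # T4  -- a 7B NF4 LoRA run fits on the cheapest tier
print(cheapest_fit(28))  # A100 -- 7B FP16 full SFT needs an 80 GB card
```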
Academic Pricing
Discounted compute rates for verified academic and research institutions. Contact the Vi team with your institutional email for eligibility verification.
Fully managed GPU infrastructure. No cloud account required. Vi provisions, schedules, and deallocates GPUs automatically. Pay per GPU-hour with per-second billing.
Connect your existing AWS, GCP, or Azure account. Vi orchestrates training jobs on your cloud infrastructure while you retain full control over billing and data residency.
WORKFLOW
Configuring and launching a GPU training run takes under two minutes. No Dockerfiles, no CUDA drivers, no Kubernetes manifests. Select your hardware, configure the run, and monitor results.
Open the Hardware Configuration modal. Choose your GPU type and quantity from the available tiers. The VRAM estimator shows whether your selection fits the model and training method.
Set your training hyperparameters, dataset split, and output checkpoint location. Vi validates the full configuration against the selected hardware before launch.
Launch the run and track progress in real time. Loss curves, VRAM usage, and throughput metrics stream to the Neural Monitor dashboard. Email notification on completion.
{
"gpu_type": "A100",
"gpu_count": 4,
"model": "Qwen2.5-VL-7B",
"method": "lora",
"quantization": "fp16",
"lora_rank": 16,
"epochs": 3,
"batch_size": 8,
"learning_rate": 2e-4,
"spot_instance": true,
"checkpoint_interval": 500,
"notify_on_complete": true
}
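A run configuration like the one above is plain JSON, so it can be parsed and sanity-checked client-side before launch. The checks below are illustrative assumptions, not Vi's validation rules (Vi performs its own validation server-side):

```python
import json

# The run configuration shown above, as a JSON string.
raw = '''{
  "gpu_type": "A100", "gpu_count": 4,
  "model": "Qwen2.5-VL-7B", "method": "lora",
  "quantization": "fp16", "lora_rank": 16,
  "epochs": 3, "batch_size": 8, "learning_rate": 2e-4,
  "spot_instance": true, "checkpoint_interval": 500,
  "notify_on_complete": true
}'''

config = json.loads(raw)

# Illustrative client-side sanity checks before submitting the run.
assert config["gpu_count"] >= 1
assert config["method"] in {"lora", "full_sft"}
assert 0 < config["learning_rate"] < 1
print("config OK:", config["model"], "on", config["gpu_count"], "x", config["gpu_type"])
```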
Managed GPU infrastructure from T4 to B200. Start free today.