// Mountain/IPC/StatusReporter.rs

//! # Status Reporter - IPC Monitoring & Health Checking
//!
//! **File Responsibilities:**
//! This module provides comprehensive monitoring and health checking for the
//! IPC layer. It reports Mountain's IPC status to Sky (the monitoring system)
//! and enables real-time observability of the Wind-Mountain communication
//! bridge.
//!
//! **Architectural Role in Wind-Mountain Connection:**
//!
//! The StatusReporter is the observability layer that provides:
//!
//! 1. **Real-time Monitoring:** Continuous tracking of IPC health and
//!    performance
//! 2. **Performance Metrics:** Collection of latency, throughput, and resource
//!    usage data
//! 3. **Health Scoring:** Automated health assessments with alerting
//! 4. **Service Discovery:** Automatic detection and monitoring of Mountain
//!    services
//! 5. **Incident Response:** Automatic recovery attempts for degraded states
//!
//! **Monitoring Architecture (Microsoft-Inspired):**
//!
//! This module follows Microsoft's monitoring and observability patterns:
//!
//! **1. Three-Pillar Monitoring:**
//!    - **Telemetry:** Collect and send metrics to Sky
//!    - **Health Checks:** Periodic health assessments
//!    - **Logging:** Detailed operation and error logging
//!
//! **2. Metric Categories:**
//!    - **Availability:** Connection uptime, service status
//!    - **Performance:** Latency, throughput, response times
//!    - **Reliability:** Error rates, success rates, retry counts
//!    - **Capacity:** Resource usage, connection pool utilization
//!
//! **3. Health Scoring Algorithm:**
//!    - Start with perfect health (100%)
//!    - Deduct points for detected issues:
//!      - Connection loss: -25%
//!      - Queue overflow: -15%
//!      - High latency (>100ms): -20%
//!      - Security violations: -30%
//!    - Alert when score < 70%
//!    - Critical when score < 50%
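//!
//! For example, a check cycle that observes a lost connection and high
//! latency scores:
//!
//! ```text
//! 100 - 25 (connection loss) - 20 (high latency) = 55
//! ```
//!
//! 55% is below the 70% alert threshold but above the 50% critical
//! threshold, so an alert is raised without escalating to critical.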
//!
//! **Key Structures:**
//!
//! **ComprehensiveStatusReport:**
//! Combines all monitoring data into a single report:
//! - Basic status (connection, queue, errors)
//! - Performance metrics (latency, throughput, compression)
//! - Health status (score, issues, recovery attempts)
//! - Timestamp for correlation
//!
//! **PerformanceMetrics:**
//! Real-time performance data:
//! - Messages per second (throughput)
//! - Average and peak latency (performance)
//! - Compression ratio (efficiency)
//! - Connection pool utilization (capacity)
//! - Memory and CPU usage (resources)
//!
//! **HealthMonitor:**
//! Health state tracking:
//! - Overall health score (0-100)
//! - Detected issues with severity levels
//! - Recovery attempt counter
//! - Last health check timestamp
//!
//! **ServiceInfo:**
//! Individual service status:
//! - Service name and version
//! - Current status (Running/Degraded/Stopped/Error)
//! - Uptime and last heartbeat
//! - Dependencies for impact analysis
//! - Performance metrics per service
//! - Network endpoint information
//!
//! **ServiceRegistry:**
//! Service discovery registry:
//! - All discovered services
//! - Last discovery timestamp
//! - Configurable discovery interval
//!
//! **Health Issue Types:**
//! - `HighLatency`: Response time exceeds threshold
//! - `MemoryPressure`: High memory usage
//! - `ConnectionLoss`: IPC connection failure
//! - `QueueOverflow`: Message queue capacity exceeded
//! - `SecurityViolation`: Unauthorized access or suspicious activity
//! - `PerformanceDegradation`: General performance decline
//!
//! **Severity Levels:**
//! - `Low`: Informational, no action needed
//! - `Medium`: Monitor closely, may need attention
//! - `High`: Requires investigation and action
//! - `Critical`: Immediate attention required
//!
//! **Reporting to Sky:**
//!
//! StatusReporter emits events that Sky listens to:
//!
//! ```text
//! StatusReporter
//!   |
//!   | emit("ipc-status-report")
//!   v
//! Sky (Monitoring System)
//!   |
//!   | Collects metrics
//!   | Runs analytics
//!   | Triggers alerts
//!   | Displays dashboards
//! ```
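//!
//! On the Sky side, subscribing to this event might look like the following
//! sketch (assumes the standard `@tauri-apps/api` event API; the handler
//! body is illustrative):
//!
//! ```ts
//! import { listen } from "@tauri-apps/api/event";
//!
//! const unlisten = await listen("ipc-status-report", (event) => {
//!     // event.payload is the serialized ComprehensiveStatusReport
//!     console.log("Health score:", event.payload.health_status.health_score);
//! });
//! ```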
//!
//! **Tauri Commands:**
//!
//! The module provides Tauri commands for external monitoring:
//!
//! - `mountain_get_ipc_status` - Get current status
//! - `mountain_get_ipc_status_history` - Get historical status
//! - `mountain_start_ipc_status_reporting` - Enable periodic reporting
//! - `mountain_get_performance_metrics` - Get performance data
//! - `mountain_get_health_status` - Get health status
//! - `mountain_perform_health_check` - Trigger health check
//! - `mountain_attempt_recovery` - Attempt automatic recovery
//! - `mountain_get_service_registry` - Get all services
//! - `mountain_get_service_info` - Get specific service info
//! - `mountain_discover_services` - Trigger service discovery
//! - `mountain_get_comprehensive_status` - Get complete report
//!
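//! Frontend code can call these like any other Tauri command, for example
//! (a sketch; assumes the standard `@tauri-apps/api` invoke API):
//!
//! ```ts
//! import { invoke } from "@tauri-apps/api/core";
//!
//! const status = await invoke("mountain_get_ipc_status");
//! const health = await invoke("mountain_get_health_status");
//! ```
//!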
//! **Service Discovery:**
//!
//! Automatically discovers Mountain services:
//!
//! ```rust,ignore
//! // Core services always discovered
//! let core_services = vec![
//! 	("EditorService", "1.0.0", Running),
//! 	("ExtensionHostService", "1.0.0", Running),
//! 	("ConfigurationService", "1.0.0", Running),
//! 	("FileService", "1.0.0", Running),
//! 	("StorageService", "1.0.0", Running),
//! ];
//! ```
//!
//! **Automatic Recovery:**
//!
//! When health score drops below threshold:
//! 1. Dispose current IPC server
//! 2. Reinitialize IPC server
//! 3. Clear error counters
//! 4. Log recovery attempt
//! 5. Return to normal operation
//!
//! **Performance Calculations:**
//!
//! **Message Rate:**
//! ```text
//! messages_per_second = total_messages / time_span_seconds
//! ```
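//!
//! Worked example: if the last 5 status reports span 4 seconds and carry
//! 120 messages in total, the reported rate is 120 / 4 = 30 msg/s.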
//!
//! **Average Latency:**
//! ```text
//! average_latency_ms = sum(latencies) / message_count
//! ```
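//!
//! Worked example: per-channel processing times of 5 ms, 15 ms, and 10 ms
//! across 3 messages average to (5 + 15 + 10) / 3 = 10 ms.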
//!
//! **Metric Collection Strategy:**
//!
//! 1. **Continuous Collection:** Background tasks collect metrics constantly
//! 2. **Sliding Window:** Calculate metrics over recent time window (5-10
//!    samples)
//! 3. **Periodic Reporting:** Emit to Sky at configured interval (default: 30s)
//! 4. **Event-Driven:** Emit immediately for critical events
//!
//! **Health Check Process:**
//!
//! Every 30 seconds:
//! 1. Check IPC connection status
//! 2. Check message queue size
//! 3. Check performance metrics
//! 4. Update health score
//! 5. Emit health status event
//! 6. Trigger alerts if needed

use std::{
	collections::{HashMap, HashSet},
	sync::{Arc, Mutex},
	time::{Duration, SystemTime},
};

use serde::{Deserialize, Serialize};
use tauri::{Emitter, Manager};
use tokio::sync::RwLock;

use crate::dev_log;

/// Comprehensive status report combining all monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ComprehensiveStatusReport {
	pub basic_status:IPCStatusReport,
	pub performance_metrics:PerformanceMetrics,
	pub health_status:HealthMonitor,
	pub timestamp:u64,
}

/// Advanced performance metrics
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PerformanceMetrics {
	pub messages_per_second:f64,
	pub average_latency_ms:f64,
	pub peak_latency_ms:f64,
	pub compression_ratio:f64,
	pub connection_pool_utilization:f64,
	pub memory_usage_mb:f64,
	pub cpu_usage_percent:f64,
	pub last_update:u64,
}

/// Health monitoring system
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HealthMonitor {
	pub health_score:f64,
	pub last_health_check:u64,
	pub issues_detected:Vec<HealthIssue>,
	pub recovery_attempts:u32,
}

/// A single detected health issue
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HealthIssue {
	pub issue_type:HealthIssueType,
	pub severity:SeverityLevel,
	pub description:String,
	pub detected_at:u64,
	pub resolved_at:Option<u64>,
}

/// Categories of detectable health issues
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum HealthIssueType {
	HighLatency,
	MemoryPressure,
	ConnectionLoss,
	QueueOverflow,
	SecurityViolation,
	PerformanceDegradation,
}

/// Issue severity, from informational to requiring immediate action
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SeverityLevel {
	Low,
	Medium,
	High,
	Critical,
}

use crate::RunTime::ApplicationRunTime::ApplicationRunTime;

/// IPC status information for Sky monitoring
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IPCStatusReport {
	pub timestamp:u64,
	pub connection_status:ConnectionStatus,
	pub message_queue_size:usize,
	pub active_listeners:Vec<String>,
	pub recent_messages:Vec<MessageStats>,
	pub error_count:u32,
	pub uptime_seconds:u64,
}

/// Connection status details
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ConnectionStatus {
	pub is_connected:bool,
	pub last_heartbeat:u64,
	pub connection_duration:u64,
}

/// Message statistics
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MessageStats {
	pub channel:String,
	pub message_count:u32,
	pub last_message_time:u64,
	pub average_processing_time_ms:f64,
}

/// Service discovery information
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceInfo {
	pub name:String,
	pub version:String,
	pub status:ServiceStatus,
	pub last_heartbeat:u64,
	pub uptime:u64,
	pub dependencies:Vec<String>,
	pub metrics:ServiceMetrics,
	pub endpoint:Option<String>,
	pub port:Option<u16>,
}

/// Service status
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ServiceStatus {
	Running,
	Degraded,
	Stopped,
	Error,
}

/// Service metrics
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceMetrics {
	pub response_time:f64,
	pub error_rate:f64,
	pub throughput:f64,
	pub memory_usage:f64,
	pub cpu_usage:f64,
	pub last_updated:u64,
}

/// Service discovery registry
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceRegistry {
	pub services:HashMap<String, ServiceInfo>,
	pub last_discovery:u64,
	pub discovery_interval:u64,
}

/// Status reporter for IPC communication
pub struct StatusReporter {
	runtime:Arc<ApplicationRunTime>,
	ipc_server:Option<Arc<crate::IPC::TauriIPCServer::TauriIPCServer>>,
	status_history:Arc<Mutex<Vec<IPCStatusReport>>>,
	start_time:SystemTime,
	error_count:Arc<Mutex<u32>>,
	performance_metrics:Arc<Mutex<PerformanceMetrics>>,
	health_monitor:Arc<Mutex<HealthMonitor>>,
	service_registry:Arc<RwLock<ServiceRegistry>>,
	discovered_services:Arc<RwLock<HashSet<String>>>,
}

impl StatusReporter {
	/// Create a new status reporter
	pub fn new(runtime:Arc<ApplicationRunTime>) -> Self {
		dev_log!("lifecycle", "Creating IPC status reporter");

		Self {
			runtime,
			ipc_server:None,
			status_history:Arc::new(Mutex::new(Vec::new())),
			start_time:SystemTime::now(),
			error_count:Arc::new(Mutex::new(0)),
			performance_metrics:Arc::new(Mutex::new(PerformanceMetrics {
				messages_per_second:0.0,
				average_latency_ms:0.0,
				peak_latency_ms:0.0,
				compression_ratio:1.0,
				connection_pool_utilization:0.0,
				memory_usage_mb:0.0,
				cpu_usage_percent:0.0,
				last_update:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_millis() as u64,
			})),
			health_monitor:Arc::new(Mutex::new(HealthMonitor {
				health_score:100.0,
				last_health_check:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_millis() as u64,
				issues_detected:Vec::new(),
				recovery_attempts:0,
			})),
			service_registry:Arc::new(RwLock::new(ServiceRegistry {
				services:HashMap::new(),
				last_discovery:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_millis() as u64,
				// Service discovery interval in milliseconds: 30 seconds between scans.
				// Balances timely service detection with CPU overhead from frequent polling.
				discovery_interval:30000,
			})),
			discovered_services:Arc::new(RwLock::new(HashSet::new())),
		}
	}

	/// Set the IPC server instance
	pub fn set_ipc_server(&mut self, ipc_server:Arc<crate::IPC::TauriIPCServer::TauriIPCServer>) {
		self.ipc_server = Some(ipc_server);
	}

	/// Generate a status report
	pub async fn generate_status_report(&self) -> Result<IPCStatusReport, String> {
		dev_log!("lifecycle", "Generating IPC status report");

		let ipc_server = self.ipc_server.as_ref().ok_or("IPC Server not set".to_string())?;

		// Get connection status
		let connection_status = ConnectionStatus {
			is_connected:ipc_server.get_connection_status()?,
			last_heartbeat:SystemTime::now()
				.duration_since(SystemTime::UNIX_EPOCH)
				.unwrap_or_default()
				.as_secs(),
			connection_duration:SystemTime::now().duration_since(self.start_time).unwrap_or_default().as_secs(),
		};

		// Get message queue size
		let message_queue_size = ipc_server.get_queue_size()?;

		// Get active listeners (simplified - would need IPC server to expose this)
		let active_listeners = vec!["configuration".to_string(), "file".to_string(), "storage".to_string()];

		// Get recent message stats (simplified)
		let recent_messages = vec![
			MessageStats {
				channel:"configuration".to_string(),
				message_count:10,
				last_message_time:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_secs(),
				average_processing_time_ms:5.0,
			},
			MessageStats {
				channel:"file".to_string(),
				message_count:5,
				last_message_time:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_secs() - 10,
				average_processing_time_ms:15.0,
			},
		];

		// Get error count
		let error_count = {
			let guard = self
				.error_count
				.lock()
				.map_err(|e| format!("Failed to get error count: {}", e))?;
			*guard
		};

		// Calculate uptime
		let uptime_seconds = SystemTime::now().duration_since(self.start_time).unwrap_or_default().as_secs();

		let report = IPCStatusReport {
			timestamp:SystemTime::now()
				.duration_since(SystemTime::UNIX_EPOCH)
				.unwrap_or_default()
				.as_millis() as u64,
			connection_status,
			message_queue_size,
			active_listeners,
			recent_messages,
			error_count,
			uptime_seconds,
		};

		// Store in history
		{
			let mut history = self
				.status_history
				.lock()
				.map_err(|e| format!("Failed to access status history: {}", e))?;
			history.push(report.clone());

			// Keep only the last 100 reports
			if history.len() > 100 {
				history.remove(0);
			}
		}

		Ok(report)
	}

	/// STATUS REPORTING: Microsoft-inspired comprehensive reporting
	pub async fn report_to_sky(&self) -> Result<(), String> {
		dev_log!("lifecycle", "Reporting IPC status to Sky");

		let report = self.generate_status_report().await?;

		// Update performance metrics
		self.update_performance_metrics().await?;

		// Perform health check
		self.perform_health_check().await?;

		// Get advanced metrics
		let performance_metrics = self.get_performance_metrics()?;
		let health_status = self.get_health_status()?;

		// Emit comprehensive status report
		let comprehensive_report = ComprehensiveStatusReport {
			basic_status:report.clone(),
			performance_metrics:performance_metrics.clone(),
			health_status:health_status.clone(),
			timestamp:SystemTime::now()
				.duration_since(SystemTime::UNIX_EPOCH)
				.unwrap_or_default()
				.as_millis() as u64,
		};

		// Emit status to Sky via Tauri events
		if let Err(e) = self
			.runtime
			.Environment
			.ApplicationHandle
			.emit("ipc-status-report", &comprehensive_report)
		{
			dev_log!(
				"lifecycle",
				"error: [StatusReporter] Failed to emit status report to Sky: {}",
				e
			);
			return Err(format!("Failed to emit status report: {}", e));
		}

		// Emit separate events for detailed monitoring
		if let Err(e) = self
			.runtime
			.Environment
			.ApplicationHandle
			.emit("ipc-performance-metrics", &performance_metrics)
		{
			dev_log!("lifecycle", "error: [StatusReporter] Failed to emit performance metrics: {}", e);
		}

		if let Err(e) = self
			.runtime
			.Environment
			.ApplicationHandle
			.emit("ipc-health-status", &health_status)
		{
			dev_log!("lifecycle", "error: [StatusReporter] Failed to emit health status: {}", e);
		}

		dev_log!("lifecycle", "Comprehensive status report sent to Sky");
		Ok(())
	}

	/// Start periodic status reporting
	pub async fn start_periodic_reporting(&self, interval_seconds:u64) -> Result<(), String> {
		dev_log!(
			"lifecycle",
			"[StatusReporter] Starting periodic status reporting (interval: {}s)",
			interval_seconds
		);

		let reporter = self.clone_reporter();

		tokio::spawn(async move {
			let mut interval = tokio::time::interval(Duration::from_secs(interval_seconds));

			loop {
				interval.tick().await;

				if let Err(e) = reporter.report_to_sky().await {
					dev_log!("lifecycle", "error: [StatusReporter] Periodic reporting failed: {}", e);
				}
			}
		});

		Ok(())
	}

	/// Record an error
	pub fn record_error(&self) {
		if let Ok(mut error_count) = self.error_count.lock() {
			*error_count += 1;
		}
	}

	/// Get status history
	pub fn get_status_history(&self) -> Result<Vec<IPCStatusReport>, String> {
		let history = self
			.status_history
			.lock()
			.map_err(|e| format!("Failed to access status history: {}", e))?;
		Ok(history.clone())
	}

	/// Get the start time
	pub fn get_start_time(&self) -> SystemTime { self.start_time }

	/// PERFORMANCE MONITORING: Microsoft-inspired performance tracking
	pub async fn update_performance_metrics(&self) -> Result<(), String> {
		let ipc_server = self.ipc_server.as_ref().ok_or("IPC Server not set".to_string())?;

		// Get connection statistics
		let connection_stats = ipc_server.get_connection_stats().await.unwrap_or_default();

		// Calculate all performance metrics first (without holding the lock)
		let messages_per_second = self.calculate_message_rate().await;
		let average_latency_ms = self.calculate_average_latency().await;
		let peak_latency_ms = self.calculate_peak_latency().await;
		let compression_ratio = self.calculate_compression_ratio().await;
		let connection_pool_utilization = self.calculate_pool_utilization(&connection_stats).await;
		let memory_usage_mb = self.get_memory_usage().await;
		let cpu_usage_percent = self.get_cpu_usage().await;
		let last_update = SystemTime::now()
			.duration_since(SystemTime::UNIX_EPOCH)
			.unwrap_or_default()
			.as_millis() as u64;

		// Now acquire the lock and update metrics
		let mut metrics = self
			.performance_metrics
			.lock()
			.map_err(|e| format!("Failed to access performance metrics: {}", e))?;

		// Update metrics with real-time data
		metrics.messages_per_second = messages_per_second;
		metrics.average_latency_ms = average_latency_ms;
		metrics.peak_latency_ms = peak_latency_ms;
		metrics.compression_ratio = compression_ratio;
		metrics.connection_pool_utilization = connection_pool_utilization;
		metrics.memory_usage_mb = memory_usage_mb;
		metrics.cpu_usage_percent = cpu_usage_percent;
		metrics.last_update = last_update;

		dev_log!(
			"lifecycle",
			"[StatusReporter] Performance metrics updated: {:.2} msg/s, {:.2}ms latency",
			metrics.messages_per_second,
			metrics.average_latency_ms
		);

		Ok(())
	}

	/// HEALTH MONITORING: Microsoft-inspired health checks
	pub async fn perform_health_check(&self) -> Result<(), String> {
		let mut health_monitor = self
			.health_monitor
			.lock()
			.map_err(|e| format!("Failed to access health monitor: {}", e))?;

		let mut health_score:f64 = 100.0;
		let mut issues = Vec::new();

		// Check connection health
		if let Some(ipc_server) = &self.ipc_server {
			if !ipc_server.get_connection_status()? {
				health_score -= 25.0;
				issues.push(HealthIssue {
					issue_type:HealthIssueType::ConnectionLoss,
					severity:SeverityLevel::Critical,
					description:"IPC connection lost".to_string(),
					detected_at:SystemTime::now()
						.duration_since(SystemTime::UNIX_EPOCH)
						.unwrap_or_default()
						.as_millis() as u64,
					resolved_at:None,
				});
			}
		}

		// Check message queue
		if let Some(ipc_server) = &self.ipc_server {
			let queue_size = ipc_server.get_queue_size()?;
			if queue_size > 100 {
				health_score -= 15.0;
				issues.push(HealthIssue {
					issue_type:HealthIssueType::QueueOverflow,
					severity:SeverityLevel::High,
					description:format!("Message queue overflow: {} messages", queue_size),
					detected_at:SystemTime::now()
						.duration_since(SystemTime::UNIX_EPOCH)
						.unwrap_or_default()
						.as_millis() as u64,
					resolved_at:None,
				});
			}
		}

		// Check performance degradation
		let metrics = self
			.performance_metrics
			.lock()
			.map_err(|e| format!("Failed to access performance metrics: {}", e))?;

		if metrics.average_latency_ms > 100.0 {
			health_score -= 20.0;
			issues.push(HealthIssue {
				issue_type:HealthIssueType::HighLatency,
				severity:SeverityLevel::High,
				description:format!("High latency detected: {:.2}ms", metrics.average_latency_ms),
				detected_at:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_millis() as u64,
				resolved_at:None,
			});
		}

		// Update health monitor
		health_monitor.health_score = health_score.max(0.0);
		health_monitor.issues_detected = issues;
		health_monitor.last_health_check = SystemTime::now()
			.duration_since(SystemTime::UNIX_EPOCH)
			.unwrap_or_default()
			.as_millis() as u64;

		// Emit health alert if score is low
		if health_score < 70.0 {
			dev_log!(
				"lifecycle",
				"warn: [StatusReporter] Health check failed: score {:.1}%",
				health_score
			);

			if let Err(e) = self
				.runtime
				.Environment
				.ApplicationHandle
				.emit("ipc-health-alert", &health_monitor.clone())
			{
				dev_log!("lifecycle", "error: [StatusReporter] Failed to emit health alert: {}", e);
			}
		}

		Ok(())
	}

	/// METRICS CALCULATION: Microsoft-inspired performance algorithms
	async fn calculate_message_rate(&self) -> f64 {
		// Calculate messages per second based on recent activity
		let history = self.get_status_history().unwrap_or_default();

		if history.len() < 2 {
			return 0.0;
		}

		let recent_reports:Vec<&IPCStatusReport> = history.iter().rev().take(5).collect();

		let total_messages:u32 = recent_reports
			.iter()
			.map(|report| report.recent_messages.iter().map(|m| m.message_count).sum::<u32>())
			.sum();

		let time_span = if recent_reports.len() > 1 {
			// recent_reports is newest-first, so the first timestamp is the
			// largest; subtract in that order to avoid u64 underflow.
			let newest_time = recent_reports.first().unwrap().timestamp;
			let oldest_time = recent_reports.last().unwrap().timestamp;
			newest_time.saturating_sub(oldest_time) as f64 / 1000.0 // Convert to seconds
		} else {
			1.0
		};

		total_messages as f64 / time_span.max(1.0)
	}

	async fn calculate_average_latency(&self) -> f64 {
		let history = self.get_status_history().unwrap_or_default();

		if history.is_empty() {
			return 0.0;
		}

		let recent_reports:Vec<&IPCStatusReport> = history.iter().rev().take(10).collect();

		let total_latency:f64 = recent_reports
			.iter()
			.flat_map(|report| &report.recent_messages)
			.map(|msg| msg.average_processing_time_ms)
			.sum();

		let message_count = recent_reports.iter().flat_map(|report| &report.recent_messages).count();

		total_latency / message_count.max(1) as f64
	}

	async fn calculate_peak_latency(&self) -> f64 {
		let history = self.get_status_history().unwrap_or_default();

		history
			.iter()
			.flat_map(|report| &report.recent_messages)
			.map(|msg| msg.average_processing_time_ms)
			.fold(0.0, f64::max)
	}

	async fn calculate_compression_ratio(&self) -> f64 {
		// Simplified compression ratio calculation
		// In a real implementation, this would track actual compression stats
		2.5 // Example compression ratio
	}

	async fn calculate_pool_utilization(&self, stats:&crate::IPC::TauriIPCServer::ConnectionStats) -> f64 {
		if stats.total_connections == 0 {
			return 0.0;
		}

		stats.total_connections as f64 / stats.max_connections as f64
	}

	async fn get_memory_usage(&self) -> f64 {
		// Simplified memory usage estimation
		// In a real implementation, use system APIs
		50.0 // Example MB usage
	}

	async fn get_cpu_usage(&self) -> f64 {
		// Simplified CPU usage estimation
		// In a real implementation, use system APIs
		15.0 // Example CPU percentage
	}

	/// SERVICE DISCOVERY: Discover available Mountain services
	pub async fn discover_services(&self) -> Result<Vec<ServiceInfo>, String> {
		dev_log!("lifecycle", "Starting service discovery");

		let mut registry = self.service_registry.write().await;
		let mut discovered = self.discovered_services.write().await;

		let mut services = Vec::new();

		// Discover core Mountain services
		let core_services = vec![
			("EditorService", "1.0.0", ServiceStatus::Running),
			("ExtensionHostService", "1.0.0", ServiceStatus::Running),
			("ConfigurationService", "1.0.0", ServiceStatus::Running),
			("FileService", "1.0.0", ServiceStatus::Running),
			("StorageService", "1.0.0", ServiceStatus::Running),
		];

		for (name, version, status) in core_services {
			let service_info = ServiceInfo {
				name:name.to_string(),
				version:version.to_string(),
				status:status.clone(),
				last_heartbeat:SystemTime::now()
					.duration_since(SystemTime::UNIX_EPOCH)
					.unwrap_or_default()
					.as_millis() as u64,
				uptime:SystemTime::now().duration_since(self.start_time).unwrap_or_default().as_secs(),
				dependencies:self.get_service_dependencies(name),
				metrics:ServiceMetrics {
					response_time:self.calculate_service_response_time(name).await,
					error_rate:self.calculate_service_error_rate(name).await,
					throughput:self.calculate_service_throughput(name).await,
					memory_usage:self.get_service_memory_usage(name).await,
					cpu_usage:self.get_service_cpu_usage(name).await,
					last_updated:SystemTime::now()
						.duration_since(SystemTime::UNIX_EPOCH)
						.unwrap_or_default()
						.as_millis() as u64,
				},
				endpoint:Some(format!("localhost:{}", 50050 + services.len() as u16)),
				port:Some(50050 + services.len() as u16),
			};

			registry.services.insert(name.to_string(), service_info.clone());
			discovered.insert(name.to_string());
			services.push(service_info);
		}

		registry.last_discovery = SystemTime::now()
			.duration_since(SystemTime::UNIX_EPOCH)
			.unwrap_or_default()
			.as_millis() as u64;

		dev_log!(
			"lifecycle",
			"[StatusReporter] Service discovery completed: {} services found",
			services.len()
		);

		// Emit service discovery event
		if let Err(e) = self
			.runtime
			.Environment
			.ApplicationHandle
			.emit("mountain_service_discovery", &services)
		{
			dev_log!(
				"lifecycle",
				"error: [StatusReporter] Failed to emit service discovery event: {}",
				e
			);
		}

		Ok(services)
	}

	/// Get service dependencies
	fn get_service_dependencies(&self, service_name:&str) -> Vec<String> {
		match service_name {
			"ExtensionHostService" => vec!["ConfigurationService".to_string()],
			"FileService" => vec!["StorageService".to_string()],
			"StorageService" => vec!["ConfigurationService".to_string()],
			_ => Vec::new(),
		}
	}

	/// Calculate service response time
	async fn calculate_service_response_time(&self, service_name:&str) -> f64 {
		// Mock implementation - would use real metrics in production
		match service_name {
			"EditorService" => 5.0,
			"ExtensionHostService" => 15.0,
			"ConfigurationService" => 2.0,
			"FileService" => 8.0,
			"StorageService" => 3.0,
			_ => 10.0,
		}
	}

	/// Calculate service error rate
	async fn calculate_service_error_rate(&self, service_name:&str) -> f64 {
		// Mock implementation - would use real metrics in production
		match service_name {
			"EditorService" => 0.1,
			"ExtensionHostService" => 2.5,
			"ConfigurationService" => 0.5,
			"FileService" => 1.2,
			"StorageService" => 0.8,
			_ => 5.0,
		}
	}

	/// Calculate service throughput
	async fn calculate_service_throughput(&self, service_name:&str) -> f64 {
		// Mock implementation - would use real metrics in production
		match service_name {
			"EditorService" => 1000.0,
			"ExtensionHostService" => 500.0,
			"ConfigurationService" => 2000.0,
			"FileService" => 800.0,
			"StorageService" => 1500.0,
			_ => 100.0,
		}
	}

	/// Get service memory usage
	async fn get_service_memory_usage(&self, service_name:&str) -> f64 {
		// Mock implementation - would use real metrics in production
		match service_name {
			"EditorService" => 256.0,
			"ExtensionHostService" => 512.0,
			"ConfigurationService" => 128.0,
			"FileService" => 192.0,
			"StorageService" => 64.0,
			_ => 100.0,
		}
	}

	/// Get service CPU usage
	async fn get_service_cpu_usage(&self, service_name:&str) -> f64 {
		// Mock implementation - would use real metrics in production
		match service_name {
			"EditorService" => 15.0,
			"ExtensionHostService" => 25.0,
			"ConfigurationService" => 5.0,
			"FileService" => 10.0,
			"StorageService" => 8.0,
			_ => 20.0,
		}
	}

965	/// Start periodic service discovery
966	pub async fn start_periodic_discovery(&self) -> Result<(), String> {
967		dev_log!("lifecycle", "Starting periodic service discovery");
968
969		let registry = self.service_registry.read().await;
970		let interval = registry.discovery_interval;
971		drop(registry);
972
973		let reporter = self.clone_reporter();
974
975		tokio::spawn(async move {
976			let mut interval = tokio::time::interval(Duration::from_millis(interval));
977
978			loop {
979				interval.tick().await;
980
981				if let Err(e) = reporter.discover_services().await {
982					dev_log!("lifecycle", "error: [StatusReporter] Periodic service discovery failed: {}", e);
983				}
984			}
985		});
986
987		Ok(())
988	}

	/// Get service registry
	pub async fn get_service_registry(&self) -> Result<ServiceRegistry, String> {
		let registry = self.service_registry.read().await;
		Ok(registry.clone())
	}

	/// Get service information
	pub async fn get_service_info(&self, service_name:&str) -> Result<Option<ServiceInfo>, String> {
		let registry = self.service_registry.read().await;
		Ok(registry.services.get(service_name).cloned())
	}

	/// RECOVERY: Microsoft-inspired automatic recovery
	pub async fn attempt_recovery(&self) -> Result<(), String> {
		// Record the attempt in a scoped block so the std::sync::Mutex guard
		// is released before any await point (holding it across an await
		// would make this future non-Send and risk deadlock)
		let attempt = {
			let mut health_monitor = self
				.health_monitor
				.lock()
				.map_err(|e| format!("Failed to access health monitor: {}", e))?;
			health_monitor.recovery_attempts += 1;
			health_monitor.recovery_attempts
		};

		// Simple recovery logic: reset the connection, then reinitialize
		if let Some(ipc_server) = &self.ipc_server {
			ipc_server
				.dispose()
				.map_err(|e| format!("Failed to dispose IPC server: {}", e))?;

			ipc_server
				.initialize()
				.await
				.map_err(|e| format!("Failed to reinitialize IPC server: {}", e))?;
		}

		// Clear error count
		if let Ok(mut error_count) = self.error_count.lock() {
			*error_count = 0;
		}

		dev_log!("lifecycle", "[StatusReporter] Recovery attempt {} completed", attempt);
		Ok(())
	}

	/// Get performance metrics
	pub fn get_performance_metrics(&self) -> Result<PerformanceMetrics, String> {
		let metrics = self
			.performance_metrics
			.lock()
			.map_err(|e| format!("Failed to access performance metrics: {}", e))?;
		Ok(metrics.clone())
	}

	/// Get health status
	pub fn get_health_status(&self) -> Result<HealthMonitor, String> {
		let health_monitor = self
			.health_monitor
			.lock()
			.map_err(|e| format!("Failed to access health monitor: {}", e))?;
		Ok(health_monitor.clone())
	}

	/// Clone the reporter for async tasks. All fields are shared handles
	/// (Arc-backed), so clones observe and update the same underlying state.
	fn clone_reporter(&self) -> StatusReporter {
		StatusReporter {
			runtime:self.runtime.clone(),
			ipc_server:self.ipc_server.clone(),
			status_history:self.status_history.clone(),
			start_time:self.start_time,
			error_count:self.error_count.clone(),
			performance_metrics:self.performance_metrics.clone(),
			health_monitor:self.health_monitor.clone(),
			service_registry:self.service_registry.clone(),
			discovered_services:self.discovered_services.clone(),
		}
	}
}
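// Illustrative only (not part of this module): the module docs describe the
// health-scoring rule — start from perfect health (100%), deduct a fixed
// penalty per detected issue, alert below 70%, critical below 50%. A minimal
// self-contained sketch of that rule, with hypothetical issue flags standing
// in for Mountain's real detection state, could look like:

```rust
// Hypothetical issue flags; Mountain's real detection state differs.
#[derive(Default)]
pub struct DetectedIssues {
	pub connection_loss: bool,    // -25%
	pub queue_overflow: bool,     // -15%
	pub high_latency: bool,       // sustained latency > 100ms: -20%
	pub security_violation: bool, // -30%
}

/// Start at 100% and deduct a fixed penalty per detected issue,
/// clamping at zero.
pub fn health_score(issues: &DetectedIssues) -> f64 {
	let mut score = 100.0_f64;
	if issues.connection_loss { score -= 25.0; }
	if issues.queue_overflow { score -= 15.0; }
	if issues.high_latency { score -= 20.0; }
	if issues.security_violation { score -= 30.0; }
	score.max(0.0)
}

/// Alert when score < 70%.
pub fn needs_alert(score: f64) -> bool { score < 70.0 }

/// Critical when score < 50%.
pub fn is_critical(score: f64) -> bool { score < 50.0 }

fn main() {
	let issues = DetectedIssues { connection_loss: true, high_latency: true, ..Default::default() };
	let score = health_score(&issues); // 100 - 25 - 20 = 55
	println!("health score: {}%", score);
	println!("alert: {}, critical: {}", needs_alert(score), is_critical(score));
}
```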

/// Tauri command to get current IPC status
#[tauri::command]
pub async fn mountain_get_ipc_status(app_handle:tauri::AppHandle) -> Result<serde_json::Value, String> {
	dev_log!("lifecycle", "Tauri command: get_ipc_status");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		let report = reporter.generate_status_report().await?;
		serde_json::to_value(report).map_err(|e| format!("Failed to serialize status report: {}", e))
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to get IPC status history
#[tauri::command]
pub async fn mountain_get_ipc_status_history(app_handle:tauri::AppHandle) -> Result<serde_json::Value, String> {
	dev_log!("lifecycle", "Tauri command: get_ipc_status_history");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		let history = reporter.get_status_history()?;
		serde_json::to_value(history).map_err(|e| format!("Failed to serialize status history: {}", e))
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to start periodic status reporting
#[tauri::command]
pub async fn mountain_start_ipc_status_reporting(
	app_handle:tauri::AppHandle,
	interval_seconds:u64,
) -> Result<serde_json::Value, String> {
	dev_log!("lifecycle", "Tauri command: start_ipc_status_reporting");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter
			.start_periodic_reporting(interval_seconds)
			.await
			.map(|_| serde_json::json!({ "status": "started", "interval_seconds": interval_seconds }))
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

// TAURI COMMANDS: Microsoft-inspired comprehensive monitoring
// (plain comment, not `///`, so this section banner does not fold into the
// rustdoc of the next item)

/// Tauri command to get performance metrics
#[tauri::command]
pub async fn mountain_get_performance_metrics(app_handle:tauri::AppHandle) -> Result<PerformanceMetrics, String> {
	dev_log!("lifecycle", "Tauri command: get_performance_metrics");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.get_performance_metrics()
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to get health status
#[tauri::command]
pub async fn mountain_get_health_status(app_handle:tauri::AppHandle) -> Result<HealthMonitor, String> {
	dev_log!("lifecycle", "Tauri command: get_health_status");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.get_health_status()
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to perform health check
#[tauri::command]
pub async fn mountain_perform_health_check(app_handle:tauri::AppHandle) -> Result<HealthMonitor, String> {
	dev_log!("lifecycle", "Tauri command: perform_health_check");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.perform_health_check().await?;
		reporter.get_health_status()
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to attempt recovery
#[tauri::command]
pub async fn mountain_attempt_recovery(app_handle:tauri::AppHandle) -> Result<(), String> {
	dev_log!("lifecycle", "Tauri command: attempt_recovery");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.attempt_recovery().await
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to get service registry
#[tauri::command]
pub async fn mountain_get_service_registry(app_handle:tauri::AppHandle) -> Result<ServiceRegistry, String> {
	dev_log!("lifecycle", "Tauri command: get_service_registry");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.get_service_registry().await
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to get service information
#[tauri::command]
pub async fn mountain_get_service_info(
	app_handle:tauri::AppHandle,
	service_name:String,
) -> Result<Option<ServiceInfo>, String> {
	dev_log!("lifecycle", "Tauri command: get_service_info");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.get_service_info(&service_name).await
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to discover services
#[tauri::command]
pub async fn mountain_discover_services(app_handle:tauri::AppHandle) -> Result<Vec<ServiceInfo>, String> {
	dev_log!("lifecycle", "Tauri command: discover_services");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.discover_services().await
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to start periodic service discovery
#[tauri::command]
pub async fn mountain_start_service_discovery(app_handle:tauri::AppHandle) -> Result<(), String> {
	dev_log!("lifecycle", "Tauri command: start_service_discovery");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		reporter.start_periodic_discovery().await
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Tauri command to get comprehensive status report
#[tauri::command]
pub async fn mountain_get_comprehensive_status(
	app_handle:tauri::AppHandle,
) -> Result<ComprehensiveStatusReport, String> {
	dev_log!("lifecycle", "Tauri command: get_comprehensive_status");

	if let Some(reporter) = app_handle.try_state::<StatusReporter>() {
		let basic_status = reporter.generate_status_report().await?;
		let performance_metrics = reporter.get_performance_metrics()?;
		let health_status = reporter.get_health_status()?;

		Ok(ComprehensiveStatusReport {
			basic_status,
			performance_metrics,
			health_status,
			timestamp:SystemTime::now()
				.duration_since(SystemTime::UNIX_EPOCH)
				.unwrap_or_default()
				.as_millis() as u64,
		})
	} else {
		Err("StatusReporter not found in application state".to_string())
	}
}

/// Initialize status reporter in Mountain's setup
pub fn initialize_status_reporter(app_handle:&tauri::AppHandle, runtime:Arc<ApplicationRunTime>) -> Result<(), String> {
	dev_log!("lifecycle", "Initializing status reporter");

	let reporter = StatusReporter::new(runtime);

	// Store in application state (`manage` takes ownership, so no extra
	// clone is needed)
	app_handle.manage(reporter);

	Ok(())
}