The Medical Research Council grading system has served for decades in the evaluation of muscle strength and is recognized as a cardinal feature of the daily neurological, rehabilitation and general medical examination of patients, despite criticism of the unequal width of its response options. No study has systematically examined, using a modern psychometric approach, whether physicians are able to use the Medical Research Council grades properly. The objectives of this study were: (i) to investigate physicians' ability to discriminate among the Medical Research Council categories in patients with different neuromuscular disorders and various degrees of weakness, by examining thresholds with Rasch analysis, a modern psychometric method; (ii) to examine possible factors influencing physicians' ability to apply the Medical Research Council categories through differential item functioning analyses; and (iii) to examine whether the widely used Medical Research Council 12-muscle sum score in patients with Guillain-Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy meets Rasch model expectations. A total of 1065 patients were included from nine cohorts with the following diseases: Guillain-Barré syndrome (n = 480); myotonic dystrophy type-1 (n = 169); chronic inflammatory demyelinating polyradiculoneuropathy (n = 139); limb-girdle muscular dystrophy (n = 105); multifocal motor neuropathy (n = 102); Pompe's disease (n = 62) and monoclonal gammopathy of undetermined significance-related polyneuropathy (n = 8). Medical Research Council data of 72 muscles were collected. Rasch analyses were performed on the Medical Research Council data for each cohort separately and again after pooling data at the muscle level to increase category frequencies, and on the Medical Research Council sum score in patients with Guillain-Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy.
Disordered thresholds were demonstrated in 74-79% of the muscles examined, indicating that physicians were unable to discriminate between most Medical Research Council categories. Factors such as physicians' experience or illness type did not influence these findings. Thresholds became ordered after rescoring the Medical Research Council grades from six to four response options (0, paralysis; 1, severe weakness; 2, slight weakness; 3, normal strength). The Medical Research Council sum score acceptably fulfilled Rasch model expectations after rescoring the response options and creating subsets to resolve local dependency and item bias related to diagnosis. In conclusion, a modified, Rasch-built Medical Research Council grading system with four response categories is proposed, resolving clinicians' inability to differentiate among the original response categories and improving clinical applicability. A modified Medical Research Council sum score at the interval level is presented and is recommended for future studies in Guillain-Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy.
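The notions of disordered thresholds and rescoring can be illustrated with a minimal Python sketch. It is not the analysis performed in the study: the threshold values are invented for illustration, the function names are hypothetical, and the grade mapping (original grades 1 to 3 collapsed into "severe weakness") is an assumption, since the abstract does not state which original grades were merged.

```python
def thresholds_are_ordered(thresholds):
    """Return True when each threshold is strictly greater than the previous one.

    In a Rasch analysis, a polytomous item with k categories has k - 1
    thresholds; they are 'disordered' when they do not increase monotonically.
    """
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def collapse_grades(grades, mapping):
    """Rescore a list of grades using a dict from original to new category."""
    return [mapping[g] for g in grades]


# Hypothetical threshold estimates (in logits) for a single muscle.
# Six categories (0-5) give five thresholds; four categories give three.
six_category_thresholds = [-2.1, -0.4, -1.3, 0.2, 1.9]   # illustrative only
four_category_thresholds = [-1.8, 0.1, 2.0]              # illustrative only

# ASSUMPTION: this mapping is not given in the abstract.
ASSUMED_MAPPING = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 3}

if __name__ == "__main__":
    print(thresholds_are_ordered(six_category_thresholds))
    print(thresholds_are_ordered(four_category_thresholds))
    print(collapse_grades([0, 3, 4, 5], ASSUMED_MAPPING))
```

In this sketch the six-category thresholds fail the ordering check because the third value falls below the second, whereas the four-category thresholds pass; actual threshold estimates would come from fitting a Rasch model to the graded data.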