Computer scientists evaluated how several large language models answer Java coding questions from StackOverflow and found that the quality of the generated code still leaves much to be desired. The researchers collected 1,208 Java coding questions from StackOverflow covering 24 common Java APIs, had four code-generating large language models answer them, and checked the answers with RobustAPI, an API checker they developed themselves. The API misuse rates for GPT-3.5 and GPT-4 were 49.83% and 62.09%, respectively. The study concludes that the reliability and robustness of generated code lag well behind the gains in code generation capability, leaving room for further improvement.
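To make "API misuse" concrete, here is a minimal Java sketch of one commonly cited kind of error: code that compiles and usually runs, but does not release a resource on every path. It is an illustration only. The class and method names are hypothetical, and the example is not taken from the study's dataset or from the rules RobustAPI actually checks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not drawn from the study or from RobustAPI.
public class ApiMisuseSketch {

    // Misuse pattern: if readLine() throws, close() is never reached,
    // so the underlying file handle can leak.
    static String firstLineLeaky(Path file) throws IOException {
        BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        String line = reader.readLine();
        reader.close(); // skipped when readLine() throws
        return line;
    }

    // Safer pattern: try-with-resources closes the reader on every path,
    // including when an exception is thrown.
    static String firstLineSafe(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("api-misuse-sketch", ".txt");
        try {
            Files.write(tmp, "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(firstLineSafe(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both methods return the same result on the normal path, which is why this kind of problem can pass a quick functional test and still count as unreliable code.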