I didn't get that from the article, but OK. I'm sure voice becomes VoIP after it hits the cell tower, but unless phones have been redesigned, it's not that kind of data coming from the phone. Then again, it's been decades since I did communications engineering, so it might be different now.
No, you are correct. VoLTE (Voice over LTE) on 4G networks is a pure VoIP-type setup, but UMTS and GSM systems work a bit differently. Inside the network the voice data will typically still be carried in IP packets or ATM cells (or in TDM timeslots in older GSM networks), but the call setup is different.
For UMTS systems (which I am most familiar with), all calls are handled in either the Packet Switched (PS) domain or the Circuit Switched (CS) domain.
The PS domain is normally "best effort". It is used for all data calls (including any VoIP client running on the UE, i.e. the User Equipment, your phone) and works pretty much as you would expect an IP network to. It is possible to set up connections with bit-rate requirements (for example a guaranteed minimum bit rate), but in general you get whatever capacity is available, and your allocation can shift over time.
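To make the "you get what is available" part concrete, here is a toy Python model of one cell sharing its capacity. It is a sketch under made-up assumptions: the capacity and rates are in kbit/s, users with a guaranteed bit rate are served first, and the leftover is split max-min fairly among the best-effort users. The function name `allocate` and all numbers are hypothetical; real UMTS radio resource management is far more involved.

```python
# Toy model of "best effort" sharing of one cell's capacity in the PS domain.
# Hypothetical assumptions: capacity in kbit/s, guaranteed bit rates are served
# first, and the leftover is split max-min fairly among best-effort users.


def allocate(capacity_kbps, guaranteed, best_effort_demand):
    """Return a dict mapping user -> allocated kbit/s."""
    alloc = dict(guaranteed)  # guaranteed users get exactly their rate
    remaining = capacity_kbps - sum(guaranteed.values())
    if remaining < 0:
        raise ValueError("guaranteed rates exceed the cell capacity")

    # Water-filling: give every unsatisfied user an equal share of what is
    # left; users who need less than their share leave the surplus for others.
    pending = dict(best_effort_demand)
    for user in pending:
        alloc[user] = 0.0
    while pending and remaining > 1e-9:
        share = remaining / len(pending)
        for user, demand in list(pending.items()):
            grant = min(share, demand)
            alloc[user] += grant
            remaining -= grant
            if grant >= demand:
                del pending[user]  # fully served
            else:
                pending[user] = demand - grant
    return alloc


before = allocate(1000, {"gbr_user": 200}, {"a": 500, "b": 500})
after = allocate(1000, {"gbr_user": 200}, {"a": 500, "b": 500, "c": 100})
print(before)  # a and b get 400 each
print(after)   # a and b drop to about 350 each once c joins
```

The point is only the qualitative behaviour: the guaranteed user keeps its 200 kbit/s, while the best-effort users' allocation shrinks as soon as someone else shows up in the cell.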
The CS domain, on the other hand, behaves like the old telephone systems and is primarily used for voice calls. When you set up a call, the resources needed to provide the requested sustained bit rate are allocated all the way from your UE to the receiver's UE. If those resources cannot be allocated, the call setup is refused. If they are available, you should, in theory, be able to keep your connection indefinitely without quality degradation. In reality, of course, there are cases where your call will be dropped or degraded in quality (e.g. if the cell you are in is full and someone makes an emergency call).
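The all-or-nothing reservation, plus the emergency-call exception, can be sketched as a toy admission-control model in Python. Everything here is a hypothetical simplification for illustration: the `Network` class, the link names, the 12 kbit/s call rate, and the rule that an emergency call drops the oldest normal calls sharing its links until it fits. It is not how a real UMTS network implements admission control or preemption.

```python
# Toy model of CS-domain call admission: a call reserves a fixed rate on every
# link of its path, all-or-nothing. Hypothetical rule: an emergency call may
# drop the oldest normal calls that share its links until it fits.
from dataclasses import dataclass, field


@dataclass
class Network:
    capacity: dict                               # link name -> kbit/s
    used: dict = field(default_factory=dict)     # link name -> reserved kbit/s
    calls: dict = field(default_factory=dict)    # call id -> (path, rate, emergency)

    def free(self, link):
        return self.capacity[link] - self.used.get(link, 0)

    def _fits(self, free_by_link, path, rate):
        return all(free_by_link[link] >= rate for link in path)

    def _reserve(self, call_id, path, rate, emergency):
        for link in path:
            self.used[link] = self.used.get(link, 0) + rate
        self.calls[call_id] = (path, rate, emergency)

    def release(self, call_id):
        path, rate, _ = self.calls.pop(call_id)
        for link in path:
            self.used[link] -= rate

    def setup(self, call_id, path, rate, emergency=False):
        """Reserve `rate` on every link of `path`, or refuse the call."""
        free_now = {link: self.free(link) for link in path}
        if self._fits(free_now, path, rate):
            self._reserve(call_id, path, rate, emergency)
            return "accepted"
        if emergency:
            # Plan the preemption first, so nobody is dropped if it cannot work.
            victims = []
            for victim, (vpath, vrate, vemerg) in self.calls.items():
                if self._fits(free_now, path, rate):
                    break
                if not vemerg and set(vpath) & set(path):
                    victims.append(victim)
                    for link in vpath:
                        if link in free_now:
                            free_now[link] += vrate
            if self._fits(free_now, path, rate):
                for victim in victims:
                    self.release(victim)
                self._reserve(call_id, path, rate, emergency)
                return "accepted after dropping " + ", ".join(victims)
        return "refused"


net = Network(capacity={"cell_A": 24, "core": 1000, "cell_B": 1000})
path = ["cell_A", "core", "cell_B"]
print(net.setup("call1", path, 12))  # accepted
print(net.setup("call2", path, 12))  # accepted
print(net.setup("call3", path, 12))  # refused (cell_A is full)
print(net.setup("sos", ["cell_A", "core"], 12, emergency=True))
# -> accepted after dropping call1
```

Once a call is accepted, its rate stays reserved until it is released, which is the "keep your connection indefinitely" behaviour; the only way it loses its resources in this model is being preempted.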