Well, without this speaker thing, the shopkeeper has to either look at their mobile device and verify every transaction, or take the customer's word for it.
I've seen street vendors in NYC who will accept my phone showing a Zelle confirmation as proof of payment, and another poster says that's pretty common in India as well. This is, of course, dead easy to spoof. It's probably harder to spoof exactly the right tone of synthesized voice, coming from exactly the right point in space, well enough to fool the merchant reliably.
Couple this with the ability to side-verify larger transactions or spot-check any transaction, the possible penalties or shame for being caught cheating, and the relatively small percentage of the population willing and able to cheat, and it's probably a win for the merchants. And there's some value, as another poster above said, in the whole thing being controlled by the payment network, so both parties to the transaction can more likely than not trust it.
I'm not so sure the payment network should be renting them out. Perhaps they should be free to anyone with a merchant account, or sold as a one-time purchase, assuming the network makes its money on transaction fees. I don't know the Indian economy, so I can't estimate the value proposition for the merchant and network.