Large sitemap.xml files should always be generated automatically. The backend is typically a CMS or ecommerce system, and there are usually plugins or modules available that will do it, or you can write your own. We would highly recommend using the <priority> element in the sitemap.xml to rank your pages so the engines know which pages are most important to index, which are second most important, which are third, and so on.
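If you do end up writing your own generator, a minimal sketch in Python might look like the following. It assumes you can already produce a list of (URL, priority) pairs from your backend; the function name build_sitemap and the example.com URLs are made up for illustration and do not come from any particular CMS.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # The protocol allows priority values from 0.0 to 1.0.
        ET.SubElement(entry, "priority").text = f"{priority:.2f}"
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages: a category page ranked above a product page.
example_xml = build_sitemap([
    ("https://example.com/category/shoes", 1.0),
    ("https://example.com/product/123", 0.8),
])
```

In a real setup you would write the returned bytes to sitemap.xml on a schedule rather than build it by hand.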
While we agree that most sites do not need a sitemap.xml file, they are indeed very helpful for most large sites. Ecommerce sites, for instance, may have hundreds of thousands of products. Those products might be linked to primarily from category pages deep in the site (5, 6, 7 or more clicks from the home page).
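To get a feel for how deep your pages really are, you could measure click depth with a breadth-first search over your internal links. This is only a sketch: the links dictionary below is a hypothetical, hand-written link graph, and in practice you would build it from a crawl of your own site.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/category"],
    "/category": ["/category/sub"],
    "/category/sub": ["/product-1"],
}
depths = click_depths(links, "/")
```

Pages that come back with a high depth are the ones most likely to benefit from being listed in the sitemap.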
Each site has a crawl budget (the amount of time and resources that the engine is going to dedicate to finding pages on your site). The engines are typically also going to index only a certain number of pages on your site. Both of these are based primarily on the number and quality of backlinks the site has (i.e. PageRank).
So anything you can do to help engines like Google crawl your site more efficiently (spending your crawl budget on the pages you think are most important, such as product pages on ecommerce sites) and to influence which pages get indexed before others (using <priority> in a sitemap.xml) is advantageous for large sites, even if the site has great navigation.
Google places several limits on sitemaps. A single sitemap file can list at most 50,000 URLs, and a sitemap index file can reference at most 50,000 sitemap.xml files. A sitemap.xml file also cannot exceed 50MB uncompressed; we believe the 10MB figure that is sometimes quoted was an earlier limit.
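To stay under the 50,000-URL limit, a large site can split its URLs across several sitemap files and list those files in a sitemap index. The sketch below shows one way to do that in Python; the helper names, the sitemap-N.xml naming scheme, and the example.com URLs are assumptions for illustration. It does not check file size, which you would still need to verify separately.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Return a sitemap index (as bytes) listing each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical catalogue of 120,000 product URLs, split into three files.
product_urls = [f"https://example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(product_urls)
index_xml = build_index(
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
```

Each chunk would then be written out as its own sitemap file, and only the index needs to be submitted to the engines.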
How can we create or generate an XML sitemap file for such a site?
Within Categories Options, set Frequency to Daily and Priority to 1.
Within Products Options, set Frequency to Daily and Priority to 0.8 (or any value less than 1 and greater than the value we are about to set for CMS pages).
Within CMS Pages Options, set Frequency to Weekly and Priority to 0.25.
Within Generation Settings, set Enabled to Yes, Start Time to 01 00 00 (1:00 a.m., or another time when your traffic is at its lowest), Frequency to Daily, and enter your e-mail address in the Error Email Recipient field.
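If your platform has no such settings screen, the same frequency and priority choices can be expressed as a small lookup table in a custom generator. This is a rough sketch only: the page-type keys and the sitemap_entry helper are hypothetical names, while the values mirror the settings listed above.

```python
# Hypothetical page-type names; the values mirror the settings listed above.
SITEMAP_SETTINGS = {
    "category": {"changefreq": "daily", "priority": 1.0},
    "product": {"changefreq": "daily", "priority": 0.8},
    "cms_page": {"changefreq": "weekly", "priority": 0.25},
}


def sitemap_entry(url, page_type):
    """Combine a URL with the changefreq and priority for its page type."""
    settings = SITEMAP_SETTINGS[page_type]
    return {"loc": url, **settings}


# Hypothetical URL for a CMS page such as an "about us" page.
about_entry = sitemap_entry("https://example.com/about", "cms_page")
```

Keeping the values in one table makes it easy to preserve the ordering the steps rely on: categories first, then products, then CMS pages.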