AI-generated stock photo library and its costs
At an AI showcase fair, marketing and IT professionals from a large industrial company presented their use case for generative AI. They decided to attack the problem of the "tremendous costs" of stock photographs with automated content generation via genAI. In this note we analyze the argument and give guidance on what to pay attention to in such a case.
Enterprise stock photographs serve essentially two purposes. One is to make the story you tell easier to grasp through visuals; the other is to promote your brand, which is embodied in the style of the photographs. Large enterprises usually maintain their own library of photographs, created for the company by professional photographers and styled to comply with the company's brand strategy. The photographs are owned by the company, so the intellectual property situation is clear. Moreover, the photographs are selected so that they provide visuals for a broad spectrum of situations. It is also worth noting that a visual usually has a supporting role in the storytelling and does not need to be very specific, in contrast to, say, a chart of the firm's current revenue. The lifetime of a photograph can also be long: a single photograph can be used for decades. Each photograph, however, costs on the order of 100 - 500 EUR, and the company may require thousands of them. At first glance this may look like a lot and may prompt an attempt to cut those costs.
That total cost was exactly what the presenters targeted for cutting, and their solution was, of course, generative AI. The argument as presented was simple: generating one photograph with genAI costs around 0.04 EUR, compared to, say, 400 EUR for a stock photograph. So the cost falls by an enormous factor of 10,000, or 99.99% in savings! Clearly one has to proceed with it. This is an example of a naive decision of the kind that, unfortunately, occurs very often in business. In the following we attempt to convince you to always look more closely at both the problem and the solution.
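The presenters' per-image arithmetic can be reproduced in a few lines. This is only a sketch using the two prices quoted above; the variable names are illustrative, and prices are held in euro cents (an assumption made here purely to keep the arithmetic exact).

```python
# Naive per-image comparison, using the figures quoted in the presentation.
# Prices are kept in euro cents so the integer arithmetic stays exact.
STOCK_PHOTO_CENTS = 400 * 100   # 400 EUR per commissioned stock photograph
GENAI_IMAGE_CENTS = 4           # 0.04 EUR per generated image

# How many generated images cost as much as one stock photograph.
cost_factor = STOCK_PHOTO_CENTS // GENAI_IMAGE_CENTS

# Relative saving per image, in percent.
savings_percent = 100 * (STOCK_PHOTO_CENTS - GENAI_IMAGE_CENTS) / STOCK_PHOTO_CENTS

print(f"cost factor: {cost_factor}")
print(f"savings per image: {savings_percent:.2f}%")
```

Note that this comparison is per image only: it says nothing about how many images end up being generated, which is where the argument becomes fragile.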
Let us look very briefly at how genAI works. First, let us make clear that the term genAI in the context of this note refers to a set of large-scale generative models, usually based on the so-called diffusion procedure. Such models require a tremendous amount of training data; usually this amount is as large as the amount of publicly accessible (regardless of IP and access protection) image and video data. At the same time, the number of model parameters must be large enough for the model to actually encode the information contained in that amount of data. The model size determines the computational requirements at generation time, which are very high. This is the reason why your company will probably not own such a model and will have to purchase the service from one of the vendors. The model effectively learns the approximate distribution of the data it was trained on and is able to generate samples from this approximate distribution. This means that the images you get are potentially the images your neighbor gets, and getting the images to comply with your brand strategy might be difficult. It may be of interest that the models tend to compress the training data in a particular lossy way, which allows them to generate samples not seen in the training data; yet they are also capable of reconstructing training samples with high probability, which indicates a tendency to memorize. IP issues may therefore be a problem; in fact, as of the time of this writing, several lawsuits are under way against major genAI providers. There is also growing evidence that the cost of genAI-based content generation will actually rise! Having said that, we move on to the usage patterns.
In a traditional stock photograph library, the number of photographs is fixed, so the cost can be controlled centrally by the stock manager. It does not matter how many users are trying to use the photographs, or for how many presentations: the cost of creating the photographs remains fixed. Compared to this operating model, the proposed genAI solution hands the decision about the selection of the material, and the required amount of it, to the edge, i.e. to the actual users. This makes decisions about content selection and volume local, whereas in the stock photograph model they are global. We have here an issue of silos versus global transparency. A user may generate essentially the same content as their peers, and this redundancy introduces inefficiency. This is reinforced by the fact that the essence of a visual is not apparent, and people will tend to add subjective details that are irrelevant but appear important to the author. Since authors have the possibility to include such a detail, they will do so, producing essentially redundant material even though it looks different (the details are irrelevant). The number of generation attempts in such a situation will tend to be large, again due to the locally perceived necessity, the possibility, and the locally low cost. As discussed initially, the cost saving factor for the genAI solution was around 10,000. Given the arguments above, 1,000 users who each attempt to generate essentially the same motif, using ten image generation calls on average, will reach the same spend of 400 EUR (1,000 × 10 × 0.04 EUR) as one motif in the stock photograph model. Another example would be 100 users per year making ten attempts per motif over ten years. These facts, together with the quality and legal issues, should be considered when deciding in favor of a quick win and an apparent 99.99% cost reduction.
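The two break-even scenarios above can be checked with a small calculation. This is a minimal sketch, not a cost model: the function names and parameters are hypothetical, the prices are the illustrative ones from this note (held in euro cents to keep the arithmetic exact), and it assumes every user generates their own variant of the same motif with no sharing between users.

```python
# Total genAI spend on ONE motif when generation is decentralized.
# Assumption: each user generates their own variant; nothing is shared or reused.
STOCK_PHOTO_CENTS = 400 * 100   # 400 EUR for one stock photograph
GENAI_IMAGE_CENTS = 4           # 0.04 EUR per image generation call


def genai_spend_cents(users_per_year: int, calls_per_user: int, years: int = 1) -> int:
    """Spend (in cents) on one motif across all users and years."""
    return users_per_year * calls_per_user * years * GENAI_IMAGE_CENTS


def break_even_calls() -> int:
    """Number of generation calls that cost as much as one stock photograph."""
    return STOCK_PHOTO_CENTS // GENAI_IMAGE_CENTS


# Scenario A: 1,000 users, ten calls each, within one year.
scenario_a = genai_spend_cents(users_per_year=1000, calls_per_user=10)

# Scenario B: 100 users per year, ten calls each, over ten years.
scenario_b = genai_spend_cents(users_per_year=100, calls_per_user=10, years=10)

print(f"break-even calls per motif: {break_even_calls()}")
print(f"scenario A spend: {scenario_a / 100:.2f} EUR")
print(f"scenario B spend: {scenario_b / 100:.2f} EUR")
```

Both scenarios amount to 10,000 calls, which is exactly the break-even point for a 400 EUR photograph. Plugging in your own user counts and attempt rates shows how quickly the per-image saving is consumed by volume.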