
...

The answer to this question depends heavily on the complexity of the attribute the model should learn, as well as on the diversity of your products and the informativeness of the selected training attributes. It is best to iteratively annotate more products and train new models until the desired performance is reached. There are, however, a couple of things you should keep in mind:

Even if you have annotated thousands of products, if a certain attribute value is assigned to just a few of them, the model would not be able to learn any useful patterns for it.
- In this case, you might consider grouping certain values together or even removing them.
A bare minimum of 10 products per attribute value is absolutely needed. Any attribute value with fewer annotations would simply be ignored by the model.
- If fewer than 2 values have enough annotations, there would be no use for a model at all, and creating one would always fail.
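
The two thresholds above can be checked before training is started. The following is only a minimal sketch, assuming the annotations are available as a mapping from product ID to attribute value; the function names and the example data are hypothetical and not part of the product.

```python
from collections import Counter

MIN_PRODUCTS_PER_VALUE = 10  # bare minimum of products per attribute value
MIN_USABLE_VALUES = 2        # with fewer usable values, no model can be created


def usable_values(annotations):
    """Return the attribute values that have enough annotated products.

    `annotations` maps a product ID to its annotated attribute value
    (assumed data layout, for illustration only).
    """
    counts = Counter(annotations.values())
    return {value for value, n in counts.items() if n >= MIN_PRODUCTS_PER_VALUE}


def can_train(annotations):
    """A model is only useful if at least two values remain usable."""
    return len(usable_values(annotations)) >= MIN_USABLE_VALUES


# Hypothetical annotations: 12 "cotton", 10 "wool" and 3 "silk" products.
example = {f"p{i}": "cotton" for i in range(12)}
example.update({f"q{i}": "wool" for i in range(10)})
example.update({f"r{i}": "silk" for i in range(3)})

print(usable_values(example))  # "silk" is missing: it has fewer than 10 products
print(can_train(example))
```

In this example, "silk" would be ignored, so it is a candidate for grouping with another value or for further annotation.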

...

Which attributes should I select?

You should select only the most informative attributes. These could be:

The ones you used during the annotation process to decide which attribute value should be assigned to each product.
The ones you as a domain expert believe contain useful information regarding the new attribute.

Remember, it is always best to start simple. Adding a training attribute which does not contain any useful information regarding the new attribute would only introduce unnecessary noise in the model input. Hence, the performance would most likely drop instead of improving with the addition of unneeded attributes. It is recommended to start with just a single attribute and iteratively add further ones until the model performance stops increasing.
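
The iterative approach can be pictured as a simple loop. This is a sketch under stated assumptions: `evaluate` is a hypothetical stand-in for training a model with the given training attributes and reading off its accuracy, which in practice you do manually in the user interface.

```python
def select_attributes(candidates, evaluate):
    """Add training attributes one by one while the score keeps improving.

    `candidates` is a list of attribute names, ordered by how informative
    you expect them to be. `evaluate` is a hypothetical callable that takes
    a list of attribute names and returns the resulting model accuracy.
    """
    selected, best_score = [], float("-inf")
    for attribute in candidates:
        score = evaluate(selected + [attribute])
        if score <= best_score:
            break  # performance stopped increasing, keep the previous selection
        selected.append(attribute)
        best_score = score
    return selected, best_score
```

The loop stops at the first attribute that does not improve the score, which mirrors the advice to keep the selection as small as possible.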

To find the attributes, you can use the various filter settings. You can search for

...

In the model performance section, you see three metrics - the model accuracy, the number of matching predictions (same prediction as annotation) and the number of annotated products the model has been trained on. Note that the percentage of the last is determined based on the number of products currently in your product feed and can hence be over 100% if some products were removed.

Despite the potential scarcity of annotated products, the best possible model is trained. To this end, the annotated products are first split in 5 different ways, each consisting of a training data set with 80% of the products and a test data set with the remaining 20%. The model accuracy is measured on each of these 5 splits, on both the training and the test data sets, and the averaged results are presented. Then, the model is trained on the entire data set containing all product annotations, and the number of matches and mismatches finally achieved is presented.
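
The splitting procedure can be sketched as follows. The text only states that there are 5 splits of 80% and 20%; this sketch additionally assumes that the 5 test sets do not overlap (a 5-fold scheme), and `train_and_score` is a hypothetical callable standing in for the actual training step.

```python
import random


def five_way_splits(product_ids, n_splits=5, seed=0):
    """Yield (train, test) pairs; each test part holds about 20% of the products."""
    ids = list(product_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled products into n_splits folds of (almost) equal size.
    folds = [ids[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


def averaged_split_accuracy(product_ids, train_and_score):
    """Average the training and test accuracy over all splits.

    `train_and_score` is a hypothetical callable that trains a model on
    `train` and returns a (train_accuracy, test_accuracy) tuple.
    """
    scores = [train_and_score(train, test) for train, test in five_way_splits(product_ids)]
    train_avg = sum(s[0] for s in scores) / len(scores)
    test_avg = sum(s[1] for s in scores) / len(scores)
    return train_avg, test_avg
```

Averaging over several splits makes the reported accuracy less dependent on which products happen to land in the test set, which matters most when only few products are annotated.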

...

Instead, for each attribute value, the corresponding annotated products are considered and the percentage of these on which the model prediction was correct is measured. The final accuracy score is then the average over all attribute values. In the scenario above, considering there are 2 possible attribute values, the model would achieve 100% and 0% accuracy on these. This would result in a final accuracy of 50%, which better expresses that the above-mentioned model is no better suited for determining the correct attribute value for each product than tossing a coin.
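
The calculation can be illustrated with a short sketch. The attribute values and product counts below are hypothetical and chosen only to reproduce the 100% and 0% case; the function names are likewise illustrative.

```python
from collections import defaultdict


def per_value_accuracy(annotations, predictions):
    """Return the share of correct predictions for each annotated value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for annotated, predicted in zip(annotations, predictions):
        total[annotated] += 1
        correct[annotated] += int(annotated == predicted)
    return {value: correct[value] / total[value] for value in total}


def balanced_accuracy(annotations, predictions):
    """Average the per-value accuracies, weighting every value equally."""
    scores = per_value_accuracy(annotations, predictions)
    return sum(scores.values()) / len(scores)


# Hypothetical data: 95 products annotated "yes", 5 annotated "no",
# and a model that always predicts "yes".
annotations = ["yes"] * 95 + ["no"] * 5
predictions = ["yes"] * 100

print(per_value_accuracy(annotations, predictions))  # {'yes': 1.0, 'no': 0.0}
print(balanced_accuracy(annotations, predictions))   # 0.5
```

A plain share of correct predictions would report 95% for the same data, which hides the fact that the model never recognizes the second value.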
