PRE2019 4 Group2: Difference between revisions
TUe\20167014 (talk | contribs) (→Models) |
(167 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
= DeepWeed, a weed/crop classification neural network = | |||
Leighton van Gellecom, Hilde van Esch, Timon Heuwekemeijer, Karla Gloudemans, Tom van Leeuwen | |||
Line 28: | Line 30: | ||
Spraying pesticides preventively reduces food quality and poses the problem of environmental pollution (Tang, J., Chen, X., Miao, R., & Wang, D.,2016). The users of the software for weed detection would not only be the sustainable farmers, but also indirectly the consumers of farming products, as it poses an influence on their food and environment. | Spraying pesticides preventively reduces food quality and poses the problem of environmental pollution (Tang, J., Chen, X., Miao, R., & Wang, D.,2016). The users of the software for weed detection would not only be the sustainable farmers, but also indirectly the consumers of farming products, as it poses an influence on their food and environment. | ||
This research is in cooperation with CSSF. In line with their advice, we will focus on the type of agroforestry farming where both crops and trees grow, in strips on the land. To test the functionality of our design, we will be working in cooperation with farmer | This research is in cooperation with CSSF. In line with their advice, we will focus on the type of agroforestry farming where both crops and trees grow, in strips on the land. To test the functionality of our design, we will be working in cooperation with farmer John Heesakkers, who has shifted from livestock farming towards this form of agroforestry recently. Therefore, his case will be our role model to design the system. | ||
'''System requirements''' | '''System requirements''' | ||
Since the approach and views of sustainable farmers may differ, one of the requirements of the system is that it is flexible in its views what may be considered weeds and what useful plants (Perrins, Williamson, Fitter, 1992). It should thus be able to distinguish multiple plants instead of merely classifying weeds/non-weeds. Based on user feedback, the following list of plant types should be recognised as weeds: [https://en.wikipedia.org/wiki/Atriplex Atriplex], [https://en.wikipedia.org/wiki/Capsella_bursa-pastoris Shepherd's purse ], [https://en.wikipedia.org/wiki/Persicaria_maculosa Redshank], [https://en.wikipedia.org/wiki/Stellaria_media Chickweed], [https://en.wikipedia.org/wiki/Lamium_purpureum Red Dead-Nettle], [https://en.wikipedia.org/wiki/Chenopodium_album Goosefoot], [https://en.wikipedia.org/wiki/Cirsium_arvense Creeping Thistle] and [https://en.wikipedia.org/wiki/Rumex_obtusifolius Bitter Dock]. Furthermore, regarding the set-up of agroforestry, the system should be able to deal with different kinds of plants in a small region, thus it should be able to recognise multiple plants in one image. It also means that the plant types include trees, which set the maximum height and breadth of the plants. The non-weeds are expected to be recognised when (nearly) fully grown, as young plants are very hard to distinguish. However, weeds should be removed as soon as possible and in every growth stage. Next, the accuracy of the system should be as close as possible to 100%; realistically, however, an accuracy of at least 95% should be achieved. The system should | Since the approach and views of sustainable farmers may differ, one of the requirements of the system is that it is flexible in its views on what may be considered weeds and what useful plants (Perrins, Williamson, Fitter, 1992). It should thus be able to distinguish multiple plants instead of merely classifying weeds/non-weeds. 
Based on user feedback, the following list of plant types should be recognised as weeds: [https://en.wikipedia.org/wiki/Atriplex Atriplex], [https://en.wikipedia.org/wiki/Capsella_bursa-pastoris Shepherd's purse ], [https://en.wikipedia.org/wiki/Persicaria_maculosa Redshank], [https://en.wikipedia.org/wiki/Stellaria_media Chickweed], [https://en.wikipedia.org/wiki/Lamium_purpureum Red Dead-Nettle], [https://en.wikipedia.org/wiki/Chenopodium_album Goosefoot], [https://en.wikipedia.org/wiki/Cirsium_arvense Creeping Thistle] and [https://en.wikipedia.org/wiki/Rumex_obtusifolius Bitter Dock]. Furthermore, regarding the set-up of agroforestry, the system should be able to deal with different kinds of plants in a small region, thus it should be able to recognise multiple plants in one image. It also means that the plant types include trees, which set the maximum height and breadth of the plants. The non-weeds are expected to be recognised when (nearly) fully grown, as young plants are very hard to distinguish. However, weeds should be removed as soon as possible and in every growth stage. Next, the accuracy of the system should be as close as possible to 100%; realistically, however, an accuracy of at least 95% should be achieved. The system should not classify a non-weed as a weed, because this would damage or destroy the value of that plant. Lastly, given the constraints on both training/testing and a possible implementation, the neural network should be as efficient and compact as possible, so that it can classify plant images in real time. A rough upper bound for the processing time can be estimated as follows. Given a speed of 3.6 km/h (1 m/s), one image per meter per camera and at most two cameras, two images have to be processed per second, so the upper bound on the processing time is 500 milliseconds per image. 
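The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name and its parameters are ours and purely illustrative, and it assumes that every camera takes one image per travelled meter and that all images are processed one after another on a single device.

```python
def max_processing_time_ms(speed_kmh: float, metres_per_image: float, cameras: int) -> float:
    """Upper bound on the processing time per image, in milliseconds.

    Assumption (ours): each camera takes one image every `metres_per_image`
    metres and the images are processed sequentially on one device.
    """
    speed_m_per_s = speed_kmh * 1000 / 3600          # 3.6 km/h -> 1 m/s
    images_per_second = cameras * speed_m_per_s / metres_per_image
    return 1000 / images_per_second


# Values taken from the requirement text: 3.6 km/h, one image per metre, two cameras.
budget_ms = max_processing_time_ms(speed_kmh=3.6, metres_per_image=1.0, cameras=2)
print(f"processing budget per image: {budget_ms:.0f} ms")
```

Changing the speed or the distance between images in this sketch shows directly how much faster the classification would have to become.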
If the system performs the classification more quickly, then the frequency of taking pictures could be increased, the movement speed could be increased, or both. Moreover, farming equipment is getting increasingly expensive and is therefore prey to theft. The design should minimize the attractiveness of stealing the system. This yields the following concrete list of system requirements: | ||
<ol> | <ol> | ||
<li> The system should be flexible in its views what may be considered weeds. </li> | <li> The system should be flexible in its views on what may be considered weeds. </li> | ||
<li> The system should be able to distinguish the following types of weeds: [https://en.wikipedia.org/wiki/Atriplex Atriplex], [https://en.wikipedia.org/wiki/Capsella_bursa-pastoris Shepherd's purse ], [https://en.wikipedia.org/wiki/Persicaria_maculosa Redshank], [https://en.wikipedia.org/wiki/Stellaria_media Chickweed], [https://en.wikipedia.org/wiki/Lamium_purpureum Red Dead-Nettle], [https://en.wikipedia.org/wiki/Chenopodium_album Goosefoot], [https://en.wikipedia.org/wiki/Cirsium_arvense Creeping Thistle] and [https://en.wikipedia.org/wiki/Rumex_obtusifolius Bitter Dock] </li> | <li> The system should be able to distinguish the following types of weeds: [https://en.wikipedia.org/wiki/Atriplex Atriplex], [https://en.wikipedia.org/wiki/Capsella_bursa-pastoris Shepherd's purse ], [https://en.wikipedia.org/wiki/Persicaria_maculosa Redshank], [https://en.wikipedia.org/wiki/Stellaria_media Chickweed], [https://en.wikipedia.org/wiki/Lamium_purpureum Red Dead-Nettle], [https://en.wikipedia.org/wiki/Chenopodium_album Goosefoot], [https://en.wikipedia.org/wiki/Cirsium_arvense Creeping Thistle] and [https://en.wikipedia.org/wiki/Rumex_obtusifolius Bitter Dock] </li> | ||
<li> The system should be able to recognize multiple plants in one image.</li> | <li> The system should be able to recognize multiple plants in one image.</li> | ||
Line 41: | Line 43: | ||
<li> The classification accuracy of weeds versus non-weeds is preferably above 95%.</li> | <li> The classification accuracy of weeds versus non-weeds is preferably above 95%.</li> | ||
<li> The system should ideally be able to have no false positive classifications. </li> | <li> The system should ideally be able to have no false positive classifications. </li> | ||
<li> The system should be able to work under varying | <li> The system should be able to work under varying lighting conditions, but under the restriction of daytime.</li> | ||
<li> Preferably the system should work well under varying weather conditions, such as heat and rain.</li> | <li> Preferably the system should work well under varying weather conditions, such as heat and rain.</li> | ||
<li> The processing time of a single image should be real-time, that is in under 500 milliseconds.</li> | <li> The processing time of a single image should be real-time, that is in under 500 milliseconds.</li> | ||
Line 85: | Line 87: | ||
- Additional tools required for weeding | - Additional tools required for weeding | ||
From the comparison between the costs of the weeding robot and the costs of traditional weeding it turned out that the weeding robot will not be profitable to the farmer. This does not necessarily mean that the weeding robot will not be beneficial to the farmer though. | - Costs of mechanical damage to wanted plants | ||
From the comparison between the costs of the weeding robot and the costs of traditional weeding it turned out that the weeding robot will not be profitable to the farmer. This does not necessarily mean that the weeding robot will not be beneficial to the farmer. The costs of the weeding robot were calculated to be around €250,000 per year, as opposed to €50,000 per year for traditional weeding. This calculation was based on the farm of John Heesakkers, assuming that 5 robots are required for a farm of that size. A more detailed calculation can be found in the [[#Appendix|appendix]]. This would mean there is a 1:5 ratio in costs for traditional weeding versus weeding with the robot. There are still opportunities for the future of the robot, however. | |||
As CSSF also mentioned, the costs of the robot will probably decline after the first production phase. Production can be made more efficient, and as happens with all new technologies, costs will decline after a while. Of course, this still means that the robot is very expensive in the beginning phase. This could be solved by the possibility of subsidizing the robot. This would be a plausible possibility, since the robot would support the cause of agroforestry, which is better for agriculture, as explained before. It is known to many people that change needs to happen in the field of agriculture with the high need for food with little space, especially in the Netherlands. Agroforestry seems promising for these issues, and thus it would be likely to be able to receive subsidies for a robot which would enable farmers to convert to agroforestry. | As CSSF also mentioned, the costs of the robot will probably decline after the first production phase. Production can be made more efficient, and as happens with all new technologies, costs will decline after a while. Of course, this still means that the robot is very expensive in the beginning phase. This could be solved by the possibility of subsidizing the robot. This would be a plausible possibility, since the robot would support the cause of agroforestry, which is better for agriculture, as explained before. It is known to many people that change needs to happen in the field of agriculture with the high need for food with little space, especially in the Netherlands. Agroforestry seems promising for these issues, and thus it would be likely to be able to receive subsidies for a robot which would enable farmers to convert to agroforestry. | ||
In addition, a large part of the difference between the costs of the robot and the costs of traditional weeding is caused by the current error rate of the weeding robot. Using the performance of the current system, the percentage of wanted plants that will be damaged by the robot is estimated to be at least 2%. Since John Heesakkers indicated that the mechanical damage of traditional weeding is very limited, the damage percentage of traditional weeding is estimated at 0.5%. This provides opportunities for cost reduction of the robot. If the performance of the weed recognition system were improved further, such that the costs of damage to plants became equal to or even lower than those in traditional weeding, the weeding robot could become profitable. Ways to improve the system are further elaborated below in the chapter on “Further research and developments”. | |||
From this, it can be concluded that while the robot may not necessarily be profitable for farmers at first compared to traditional weeding, there are still opportunities. In the first years, the difference in the costs and profits can be overcome using subsidies. In that period, attention should be given to a business plan which enables reduction of costs of the robots, due to new developments in technology, reduction in material costs, and good marketing such that R&D costs can be divided over a larger number of products. | From this, it can be concluded that while the robot may not necessarily be profitable for farmers at first compared to traditional weeding, there are still opportunities. In the first years, the difference in the costs and profits can be overcome using subsidies. In that period, attention should be given to a business plan which enables reduction of costs of the robots, due to new developments in technology, reduction in material costs, and good marketing such that R&D costs can be divided over a larger number of products. | ||
There are also other motivations that could play a role for farmers in the purchase of a weeding robot. While profit seems most obvious, it is not necessarily the main motive. As CSSF explained, for many farmers it is acceptable if the robot | There are also other motivations that could play a role for farmers in the purchase of a weeding robot. While profit seems most obvious, it is not necessarily the main motive. As CSSF explained, for many farmers it is acceptable if the robot costs more than traditional weeding, if that means they can contribute to its development in this way. There are further motives for purchasing the weeding robot. Farmer John Heesakkers explained his situation: “Weeding by hand is not pleasurable work to do. Also, I have to search for workers in the months that the weeds grow really fast, because I cannot keep up with the work on my own anymore.” | ||
Agroforestry is a relatively new form of farming. As explained earlier, it might prove very effective in different ways: it is better for the ground since the issue of soil depletion is reduced, the harvest is less vulnerable to plagues and diseases, and it improves biodiversity. However, to make this way of agriculture possible, new developments are required. Weeding becomes more difficult, since multiple kinds of crops are growing close to each other. Where it used to be enough to distinguish the only crop on the soil from weeds, it now takes more knowledge to be able to do weeding since multiple kinds of plants need to be distinguished. In addition, it is more difficult to weed between the rows of planted crops, since the rows are not necessarily straight, and sometimes even overlapping. Farmer John explained: "In the months of spring, when the weeds grow really fast, it would take two people weeding full-time to keep up with the weeds". Weeding is quite heavy work, and even requires quite some knowledge of plants in agroforestry. It is difficult to find people capable and willing to do this work. | Agroforestry is a relatively new form of farming. As explained earlier, it might prove very effective in different ways: it is better for the ground since the issue of soil depletion is reduced, the harvest is less vulnerable to plagues and diseases, and it improves biodiversity. However, to make this way of agriculture possible, new developments are required. Weeding becomes more difficult, since multiple kinds of crops are growing close to each other. Where it used to be enough to distinguish the only crop on the soil from weeds, it now takes more knowledge to be able to do weeding since multiple kinds of plants need to be distinguished. In addition, it is more difficult to weed between the rows of planted crops, since the rows are not necessarily straight, and sometimes even overlapping. 
Farmer John explained: "In the months of spring, when the weeds grow really fast, it would take two people weeding full-time to keep up with the weeds". Weeding is quite heavy work, and even requires quite some knowledge of plants in agroforestry. It is difficult to find people capable and willing to do this work. | ||
Line 99: | Line 105: | ||
'''Relating results to user needs''' | '''Relating results to user needs''' | ||
The end result of the project is a | The end result of the project is a computer vision tool with a certain accuracy and confusion matrix. From the confusion matrix follow the number of true positives, false positives, true negatives and false negatives. To determine whether these values are acceptable, the user’s needs are of importance. The project is focused on two users, CSSF and John Heesakkers. The needs of the two users overlap but are not the same. Therefore both needs are discussed separately. | ||
For John Heesakkers, running his farm efficiently is important. Ideally, the weeding robot removes all weeds and does not damage the crops. This means a network with an accuracy of 100% and zero false positives or negatives. Since the | For John Heesakkers, running his farm efficiently is important. Ideally, the weeding robot removes all weeds and does not damage the crops. This means a network with an accuracy of 100% and zero false positives or negatives. Since the model will have false positives and negatives, it is important to define the boundaries of these numbers. These boundaries were indicated by Mr. Heesakkers. The robot has to remove 80% of all the weeds in one try. He puts the maximum share of damaged crops at 2-3% of all crops. Both percentages were determined intuitively. They indicate that not damaging crops has a higher priority than removing weeds: it is more important to reduce the number of false positives than to increase the accuracy. This follows from the preferences of the user. | ||
Training the | Training the model with two categories means creating a network that distinguishes between weeds and non-weeds. All the crop data is merged into one category, and the same goes for the weed data. The model has to give a positive result when it detects a weed and a negative result when it detects a non-weed. The number of false positives gives an indication of the percentage of crops that will be damaged by the robot. The percentage of damaged crops has to be lower than 3% in total, so the percentage of false positives per pass has to be well below 3%: the robot will pass the crops multiple times while weeding, and if it damaged 3% of the crops each time, the total percentage would be much higher. For example, damaging 0.5% of the crops each time the robot passes all plants means that about 3% will be damaged after six passes. Weeds can grow from February through October. Weeding every other week means the robot will pass the crops 18 times. To damage only 3% in total, at most 3% / 18 ≈ 0.17% can be damaged each time the robot passes all plants. This percentage will be even lower if errors in other parts of the robot are taken into account. The accuracy gives an indication of the weeds that are not removed. The robot has to remove 80% of all weeds in one try. The accuracy has to be significantly higher than 80%, because of errors in other parts of the robot. Besides detecting the weed, its position has to be determined and the weed actually has to be removed. Each step has a certain error, which can cause the weed not to be removed or a crop to be damaged accidentally. Since these steps have not been worked out yet, an assumption has to be made. The weeding robot of Raja et al. (2020) had a gap of 15% between software accuracy and actual accuracy. Using this as a guideline, the accuracy has to be at least 95%. | ||
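The arithmetic in this paragraph can be checked with a short Python sketch. The function names are ours and only illustrative. The first function uses the additive approximation from the text (damage simply adds up over the passes). The second is an alternative assumption that is not in the text: each pass damages a fixed fraction of the crops that are still undamaged. The 15% gap of Raja et al. (2020) is treated here as percentage points on top of the 80% field target.

```python
def per_pass_budget_additive(total_budget: float, passes: int) -> float:
    """Additive approximation used in the text: per-pass damage simply adds up."""
    return total_budget / passes


def per_pass_budget_compound(total_budget: float, passes: int) -> float:
    """Alternative assumption (not in the text): each pass damages a fixed
    fraction p of the still-undamaged crops, so 1 - (1 - p) ** passes
    equals the total budget."""
    return 1 - (1 - total_budget) ** (1 / passes)


def required_software_accuracy(field_target: float, pipeline_gap: float) -> float:
    """Software accuracy needed when later steps (positioning, removal) lose
    `pipeline_gap` between software accuracy and accuracy in the field."""
    return field_target + pipeline_gap


PASSES = 18                  # every other week, February through October
TOTAL_DAMAGE_BUDGET = 0.03   # at most 3% of the crops damaged per season

additive = per_pass_budget_additive(TOTAL_DAMAGE_BUDGET, PASSES)
compound = per_pass_budget_compound(TOTAL_DAMAGE_BUDGET, PASSES)
accuracy = required_software_accuracy(field_target=0.80, pipeline_gap=0.15)

print(f"per-pass damage budget (additive): {additive:.2%}")
print(f"per-pass damage budget (compound): {compound:.2%}")
print(f"required software accuracy:        {accuracy:.0%}")
```

Both variants come out at roughly 0.17% per pass, so the additive approximation is accurate enough for percentages this small.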
Training the | Training the model with eleven categories means creating a network that recognizes types of plants. It can recognize the different types of crops and weeds. Even though the farmer is only interested in the distinction between weeds and non-weeds, working with multiple categories is useful. It shows the capabilities of the network. Also, the software is useful for a different farmer who wants to keep certain weeds. From the network, the ratio of false positives for non-weeds has to be extracted. This ratio is used to give an indication of the percentage of damaged crops. The accuracy of this network can be determined in two different ways. The overall accuracy of all the categories says something about the ability to recognize a plant species. The accuracy of only the weed plant species indicates the ability to recognize weeds. It is difficult to determine the exact accuracy of detecting weeds with only the overall accuracy, since the degree of influence of the non-weeds is unknown. However, the overall accuracy is related to the accuracy of weed detection and therefore has to be above 95%. | ||
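To illustrate how the weed/non-weed figures can be extracted from a multi-category confusion matrix, the following Python sketch collapses such a matrix into true positives, false positives, true negatives and false negatives. The three-category matrix and its numbers are made up for illustration (they are not results of this project), and the function name is ours.

```python
def collapse_to_binary(cm, weed_classes):
    """Collapse a multi-class confusion matrix into weed / non-weed counts.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j. `weed_classes` holds the indices of the weed
    categories. "Positive" means "classified as weed".
    """
    tp = fp = tn = fn = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            true_weed = i in weed_classes
            pred_weed = j in weed_classes
            if true_weed and pred_weed:
                tp += count          # weed recognised as (some) weed
            elif pred_weed:
                fp += count          # crop mistaken for a weed: crop gets damaged
            elif true_weed:
                fn += count          # weed mistaken for a crop: weed stays
            else:
                tn += count          # crop recognised as (some) crop
    return tp, fp, tn, fn


# Hypothetical example: classes 0 and 1 are crops, class 2 is a weed.
cm = [
    [48, 1, 1],
    [2, 47, 1],
    [3, 2, 45],
]
tp, fp, tn, fn = collapse_to_binary(cm, weed_classes={2})
total = tp + fp + tn + fn

overall_accuracy = sum(cm[i][i] for i in range(len(cm))) / total
binary_accuracy = (tp + tn) / total
false_positive_rate = fp / (fp + tn)   # share of crops classified as weed
weed_recall = tp / (tp + fn)           # share of weeds that are detected

print(f"overall accuracy:    {overall_accuracy:.1%}")
print(f"binary accuracy:     {binary_accuracy:.1%}")
print(f"false positive rate: {false_positive_rate:.1%}")
print(f"weed recall:         {weed_recall:.1%}")
```

In this made-up example the overall accuracy and the weed/non-weed accuracy differ, because confusing one crop with another crop lowers the former but not the latter.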
CSSF does not indicate desired percentages like Mr. Heesakkers. They are interested in proof of concept. Instead of reaching a specific percentage of false | CSSF does not indicate desired percentages like Mr. Heesakkers. They are interested in proof of concept. Instead of reaching a specific accuracy or percentage of false positives, gaining technical insights about the network is important to them. Their goal is to create an autonomous weeding and harvesting robot with the help of multiple projects of different scales. This robot should work with a network which has a high accuracy, close to no false positives and gives negative as result in case of doubt. This project is a step towards the goal of CSSF. If the results do not reach the desired goal of Mr. Heesakkers, the project is still useful to CSSF. They have gained insights on what to do and not to do during the following projects. The final report including proof of concept and steps that have to be taken to create the final network is the result that meets the needs of CSSF. | ||
== Approach and Milestones == | == Approach and Milestones == | ||
Line 113: | Line 119: | ||
---- | ---- | ||
The main challenge is the ability to distinguish undesired (weeds) and desired (crops) plants. Previous attempts (Su et al., 2020; Raja et al., 2020) have utilised chemicals to mark plants as a measurable classification method, and other attempts only try to distinguish a single type of crop. In sustainable farming based on biodiversity, a large variety of crops are grown at the same time, meaning that it is extremely important for automatic weed detection software to be able to recognise many different crops as well. To achieve this, the first main objective is collecting data, which determines which plants can be recognised. The data should be colour images of many species of plants, of the highest possible quality: high resolution, in focus and with good lighting. Species that do not have enough images will be removed. Using the gathered data, the next main objective will be training and testing | The main challenge is the ability to distinguish undesired (weeds) and desired (crops) plants. Previous attempts (Su et al., 2020; Raja et al., 2020) have utilised chemicals to mark plants as a measurable classification method, and other attempts only try to distinguish a single type of crop. In sustainable farming based on biodiversity, a large variety of crops are grown at the same time, meaning that it is extremely important for automatic weed detection software to be able to recognise many different crops as well. To achieve this, the first main objective is collecting data, which determines which plants can be recognised. The data should be colour images of many species of plants, of the highest possible quality: high resolution, in focus and with good lighting. Species that do not have enough images will be removed. Using the gathered data, the next main objective will be training and testing neural networks with varying architectures. 
The architectures can range from very simple networks with one hidden layer to using pre-existing networks, such as ResNet (He et al., 2015) trained on datasets such as ImageNet (Russakovsky et al., 2015). Then, weeds will be defined as a species of plant that is not desired, or not recognised. Based on this, the final objective will be testing the best neural network(s) using new images from a farm, to see its accuracy in a real environment. | ||
To summarize: <br /> | To summarize: <br /> | ||
<ol> | <ol> | ||
<li>Images of plants will be collected for training.</li> | <li>Images of plants will be collected for training.</li> | ||
<li> | <li>Neural networks will be trained to recognise plants and weeds.</li> | ||
<li>The best | <li>The best neural networks will be tested in real situations.</li> | ||
</ol> | </ol> | ||
Line 126: | Line 132: | ||
---- | ---- | ||
The main deliverable will be a | The main deliverable will be a neural network that is trained to distinguish desired plants and undesired plants on a diverse farm, that is as accurate as possible, and can recognise as many different species as possible. The performance of this neural network, as well as the explored architectures and encountered problems will be described in this wiki, which is the second part of the deliverables. | ||
== Planning == | == Planning == | ||
Line 173: | Line 177: | ||
<td>Specify requirements</td> | <td>Specify requirements</td> | ||
<td>Tom</td> | <td>Tom</td> | ||
<td> | <td>Yes</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Make an informed choice for | <td>Make an informed choice for model</td> | ||
<td>Leighton</td> | <td>Leighton</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Read up on (Python) neural networks</td> | <td>Read up on (Python) neural networks</td> | ||
<td>Everyone</td> | <td>Everyone</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 200: | Line 204: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Have a training | <td>Have a training dataset</td> | ||
<td>Karla</td> | <td>Karla</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 215: | Line 219: | ||
<tr> | <tr> | ||
<td>Implement basic neural network structure</td> | <td>Implement basic neural network structure</td> | ||
<td></td> | <td>Timon, Leighton</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Justify all design choices on the wiki</td> | <td>Justify all design choices on the wiki</td> | ||
<td></td> | <td>Tom</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 234: | Line 238: | ||
<tr> | <tr> | ||
<td>Implement a working neural network</td> | <td>Implement a working neural network</td> | ||
<td></td> | <td>Timon, Leighton</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 248: | Line 252: | ||
<tr> | <tr> | ||
<td>Explain our process of tweaking the hyperparameters</td> | <td>Explain our process of tweaking the hyperparameters</td> | ||
<td></td> | <td>Timon</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 262: | Line 266: | ||
<tr> | <tr> | ||
<td>Finish tweaking the hyperparameters and collect results</td> | <td>Finish tweaking the hyperparameters and collect results</td> | ||
<td></td> | <td>Timon, Leighton, Tom</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | |||
<tr> | |||
<td>Finding the costs and benefits of the weeding robot</td> | |||
<td>Hilde</td> | |||
<td>Yes</td> | |||
</tr> | </tr> | ||
</table> | </table> | ||
Line 277: | Line 286: | ||
<td>Create the final presentation</td> | <td>Create the final presentation</td> | ||
<td>Everyone</td> | <td>Everyone</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Hand in peer review</td> | <td>Hand in peer review</td> | ||
<td>Everyone</td> | <td>Everyone</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | |||
<tr> | |||
<td>Translating the results and costs to the user needs</td> | |||
<td>Hilde and Karla</td> | |||
<td>Yes</td> | |||
</tr> | |||
<tr> | |||
<td>Determining future research</td> | |||
<td>Hilde and Karla</td> | |||
<td>Yes</td> | |||
</tr> | |||
<tr> | |||
<td>Writing prototype section </td> | |||
<td>Leighton, Timon, Tom</td> | |||
<td>Yes</td> | |||
</tr> | </tr> | ||
</table> | </table> | ||
Line 296: | Line 320: | ||
<td>Do the final presentation</td> | <td>Do the final presentation</td> | ||
<td>Everyone</td> | <td>Everyone</td> | ||
<td></td> | <td>Yes</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 314: | Line 338: | ||
Most weed recognition and detection systems designed up to now are specifically designed for a sole purpose or context. Plants are generally considered weeds when they either compete with the crops or are harmful to livestock. Weeds are traditionally mostly battled using pesticides, but this diminishes the quality of the crops. The broad-leaved dock is one of the most common grassland weeds, and Kounalakis et al. (2018) aim to create a general weed recognition system for this weed. The system designed relied on images and feature extraction, instead of the classical choice for neural networks. It had an accuracy of 89%. | Most weed recognition and detection systems designed up to now are specifically designed for a sole purpose or context. Plants are generally considered weeds when they either compete with the crops or are harmful to livestock. Weeds are traditionally mostly battled using pesticides, but this diminishes the quality of the crops. The broad-leaved dock is one of the most common grassland weeds, and Kounalakis et al. (2018) aim to create a general weed recognition system for this weed. The system designed relied on images and feature extraction, instead of the classical choice for neural networks. It had an accuracy of 89%. | ||
Salman et al. (2017) researched a method to classify plants based on 15 features of their leaves. This yielded an 85% accuracy for classification of 22 species with a training | Salman et al. (2017) researched a method to classify plants based on 15 features of their leaves. This yielded an 85% accuracy for classification of 22 species with a training dataset of 660 images. The algorithm was based on feature extraction, with the help of the Canny Edge Detector and SVM Classifier. | ||
Li et al. (2020) have compared multiple convolutional neural networks for recognizing crop pests. The used dataset consisted of 5629 images and was manually collected. They found that GoogLeNet outperformed VGG-16, VGG-19, ResNet50 and ResNet152 in terms of accuracy, robustness and model complexity. As input RGB images were used and in the future infrared images are also an option. | Li et al. (2020) have compared multiple convolutional neural networks for recognizing crop pests. The used dataset consisted of 5629 images and was manually collected. They found that GoogLeNet outperformed VGG-16, VGG-19, ResNet50 and ResNet152 in terms of accuracy, robustness and model complexity. As input RGB images were used and in the future infrared images are also an option. | ||
Line 320: | Line 344: | ||
Riehle et al. (2020) give a novel algorithm that can be used for plant/background segmentation in RGB images, which is a key component in digital image analysis dealing with plants. The algorithm has shown to work in spite of over- or underexposure of the camera, as well as with varying colours of the crops and background. The algorithm is index-based, and has shown to be more accurate and robust than other index-based approaches. The algorithm has an accuracy of 97.4% and was tested with 200 images. | Riehle et al. (2020) give a novel algorithm that can be used for plant/background segmentation in RGB images, which is a key component in digital image analysis dealing with plants. The algorithm has shown to work in spite of over- or underexposure of the camera, as well as with varying colours of the crops and background. The algorithm is index-based, and has shown to be more accurate and robust than other index-based approaches. The algorithm has an accuracy of 97.4% and was tested with 200 images. | ||
Dos Santos Ferreira et al. (2017) created their data by taking pictures with a drone at a height of 4 meters above ground level. Their approach used convolutional neural networks and achieved high accuracy in discriminating different types of weeds. Compared to traditional neural networks and support vector machines, deep learning has the key advantage that feature extraction is learned automatically from raw data, so it requires little manual effort. Convolutional neural networks have proven successful in image recognition. For image segmentation the simple linear iterative clustering algorithm (SLIC) was used, which is based on the k-means centroid-based clustering algorithm. The goal was to separate the image into segments that contain multiple leaves of soy or weeds. It is important that the pictures have a high resolution, in this case 4000 by 3000 pixels. Segmentation was significantly influenced by lighting conditions. The convolutional neural network consists of 8 layers: 5 convolutional layers and 3 fully connected layers. The last layer uses SoftMax to produce the probability distribution. ReLU was used for the output of the convolutional and fully connected layers. The classification of the segments was robust and gave better results than other approaches such as random forests and support vector machines. With a threshold of 0.98 on the classification output, 96.3% of the images were classified correctly and none were identified incorrectly.
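The thresholding idea reported by Dos Santos Ferreira et al. can be illustrated with a minimal sketch. It is not their implementation: the function names, the example labels and the choice to return `None` for rejected segments are assumptions made here for illustration. A SoftMax output is only accepted as a classification when the largest probability reaches the threshold of 0.98; otherwise the segment is left undecided.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(probabilities, labels, threshold=0.98):
    """Return the most likely label, or None if the network is not sure enough.

    The threshold of 0.98 is the value mentioned in the text; the labels
    passed in are hypothetical.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return labels[best]
    return None


labels = ["soy", "grass weed", "broadleaf weed"]  # hypothetical class names
print(classify_with_threshold(softmax([10.0, 0.0, 0.0]), labels))
print(classify_with_threshold(softmax([1.0, 1.0, 0.0]), labels))
```

Rejecting uncertain segments trades coverage for precision, which fits a setting where a wrongly removed crop is more costly than a weed that is missed once.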
Yu et al. (2019) noted that deep convolutional neural networks (DCNNs) take a long time to train (hours) but little time to classify (under a second). The researchers compared several existing DCNNs for weed detection in perennial ryegrass and for distinguishing between different weeds. Given the recency of the paper and its comparison across approaches, it is a good indication of the current state of the art. The best results seem to be > 0.98. The study also concerns weed detection in perennial ryegrass, so the crops are not perfectly aligned. However, only the distinction between ryegrass and weeds is made. For robotics applications in agroforestry, different plants should be discriminated from different weeds.
'''Robot'''
The software will depend on the robot design, so this section briefly elaborates on such a design. To keep costs as low as possible it is wise to create robots that can perform multiple tasks, not only weeding. It is assumed that such a general-purpose robot has an RGB camera to take pictures and/or videos. It is also assumed that the robot will take the form of a land vehicle instead of an unmanned aerial vehicle (UAV), because a UAV would not be appropriate in this specific context. In a later stage of the farm, the trees would pose serious restrictions on the flying path: they form obstacles which the drone has to avoid. Moreover, weeds and bushes grow on the ground, so a tree could block the line of sight to such weeds. The UAV might therefore have to adapt its flying height constantly, which would yield inconsistency in the gathered data and presumably affect the classification accuracy negatively. For these reasons it has been decided to focus on a land vehicle. This distinction is important as it influences the type of data that is gathered, and thus the type of data the software should be designed for. The gathered data therefore consists of pictures taken from the side, slightly above the ground.
Moreover, such a robot needs a certain speed to cover the farm by itself. Weeds grow, and within 2 or 3 days they are clearly visible and easily removable. Because weeding will be only one of the tasks of such a robot, it is important that classification can be done quickly; the robot should be able to cover the farm in under 2 days. Another important factor that influences the available time is lighting. For now it is assumed that the robot can only work in natural light, so during daylight, the duration of which differs across the world and with the time of year. All these factors combined argue for the need for quick identification.
To adhere to the requirement that the robot should be as unattractive a target for theft as possible, it should be possible to store it away when it is not working. In addition, the value of the robot should be minimized wherever possible. The hardware necessary for image processing is rather expensive, so it is more convenient to process the images off the vehicle, for example by cloud computation. This would also minimize the maintenance and power needs of the robot. On the other hand, it does require a stable and fast Internet connection; with the arrival of 5G this should be possible.
Worth noting is that not all of these classifiers were trained on the actual input image. Some researchers chose to first segment the image into different regions and feed those segments to the classifier. Dos Santos Ferreira et al. (2017) used SLIC for the segmentation of images, which is based on the k-means algorithm. More recently, Riehle et al. (2020) were able to distinguish plants from the background with 97.4% accuracy using segmentation. Segmentation is important because it allows the position of the weed to be derived, which is of course crucial if the weed has to be removed.
Kounalakis et al. (2018) achieved 89% classification accuracy with the SVM approach to recognize weeds, and Salman et al. (2017) achieved an 85% accuracy using the same approach for leaf classification and identification. Gašparović et al. (2020) achieved an 89% accuracy recognizing weeds using the random forests approach. Notably, the researchers implemented 4 different algorithms for the random forests approach, and the reported accuracy is that of the best implementation. Tang et al. (2016) found an accuracy of 89% for an ordinary neural network with the backpropagation algorithm. Li et al. (2020) achieved an accuracy of 98% recognizing crop pests using a CNN. Yu et al. (2019) found an accuracy larger than 98% recognizing weeds in perennial ryegrass using a CNN. Espejo-Garcia et al. (2020) used a CNN with transfer learning and evaluated different models. Taking the best model (with an SVM for transfer learning) they achieved a 99% accuracy. Comparing these numbers, the CNN generally achieves the best result. However, these classifiers have all been trained on different datasets, so comparing these numbers cannot fully settle which approach is actually the best.
Dos Santos Ferreira et al. (2017) compared their CNN to an SVM, AdaBoost and a random forest. The CNN outperformed the other approaches in terms of classification accuracy. Since all approaches were tested on the same dataset, we can argue that CNNs seem most appropriate to achieve a high classification accuracy. In this particular context, false positives weigh more heavily than false negatives in weed identification, because a false negative can be corrected when the robot passes the same plant again. Removing a crop because it was falsely identified as a weed, however, could have larger negative effects if the robot passes the crops relatively frequently. Dos Santos Ferreira et al. (2017) also found an important property of their CNN: when setting a threshold for classification they were able to achieve a 96.3% accuracy with no false positives. The researchers also noted that using deep neural networks removes the tedious task of feature extraction, because the features are learned automatically from the raw data. This might improve the CNN's generalizability.
To further argue for the use of a convolutional neural network two other factors should be evaluated, namely the time taken for classification and the feasibility of using this approach on land vehicles. Yu et al. (2019) state that deep convolutional neural networks (DCNNs) take a long time to train (hours), whereas classification takes little time (under a second). Booij et al. (2020) built a driving robot that identified plants with 96% accuracy and could drive up to 4 km/h. Notably, the researchers were able to use 5G and cloud computing, which might be crucial for real-time identification. Moreover, Raja et al. (2020) built a weeding robot with a crop detection accuracy of 97.8%. The land vehicle was able to move at a speed of up to 3.2 km/h. There is still quite a gap between the detection accuracy and the 83% of weeds removed in the controlled setting where it was tested, but these studies do confirm the feasibility of a land vehicle. Lastly, the suitability of a CNN for agriculture is supported implicitly by the fact that all the studies named above focus on agriculture. It is also argued explicitly that CNNs have proven to deliver good results in precision agriculture for identifying plants (Espejo-Garcia et al., 2020).
== Creating the dataset ==
----
One of the main obstacles to creating a functioning network is the dataset that is used for training. The dataset has to consist of many pictures in order to reach a high accuracy, and it has to represent the environment in which the robot will be operating. This means the plants in the pictures have to look the same as the plants on the farm. Obtaining such pictures of the specific plants on a farm online is not easy, because most datasets are owned by companies and not shared. The solution to this problem is to obtain the pictures ourselves, in cooperation with the farmer.
The dataset used in this project has three downsides compared to a dataset created by taking the pictures ourselves. The first downside is the number of pictures, which is not as large as desired. Furthermore, the pictures in the current dataset were found online and do not represent the farm as well as pictures taken at the farm would. It is possible to create a working network without photos from the operating environment, but pictures from the operating environment are preferred. The final downside is the distribution of the number of pictures per category: the current dataset contains significantly more pictures of weeds than of non-weeds. When creating the dataset, it is important to take this distribution into account.
Involving the farmer in the process of creating the dataset has multiple benefits, provided that the farmer is given clear instructions and explanations. The farmer knows which plants grow on his farm and can tell which of them are unwanted weeds, so no external person is needed to identify the plants. There are other aspects in which the farmer can tailor the network to his wishes. The acceptable damage to the crops can be determined: a tradeoff has to be made between not damaging the crops and weeding all the undesired plants, and each farmer can decide to what extent one is more important than the other. Another benefit is that the farmer is introduced to the workings of the robot without the robot weeding plants. The robot will probably be new to the farmer, who might worry whether it will indeed only weed the unwanted plants. By starting with data collection, trust in the robot's performance can be built.
To create the dataset, the robot has to be able to drive around the crops and take pictures. In the beginning the robot will not be able to recognize weeds or crops; it will simply take pictures of the plants. A benefit of using pictures taken by the robot is that the angle at which the photos are taken will be the same during training and operation. If possible, the photos can be uploaded to cloud storage immediately, which would make it possible to process the photos while the robot is still taking them. The robot will probably have a wireless connection, but it is not certain whether this connection is strong enough to upload many pictures. If uploading to the cloud is not possible, the robot would have to store the pictures, which can then be processed after the robot has finished taking them. Uploading the pictures to cloud storage has the benefit that the pictures can be processed anywhere, so employees whose task it is to process the pictures can work from home. A downside is that working with cloud storage calls for extra data security. Uploading the pictures after the robot has stored them internally is also possible. This would have to be done by the farmer or automatically; the latter is preferred.
Which sizes of weeds are important to record in the dataset depends on how often the robot will be used by the farmer. If the robot were used every day, larger weeds would not be important since the weeds would not have the time to grow that large. The accuracy of the network also plays a role: with a lower accuracy, the chance of not removing all the weeds in one go is larger, and weeds that are not removed have more time to grow. An assumption therefore has to be made about the maximum time a weed can have to grow. It is assumed that the robot will weed the plants every week and that a weed will be removed in a maximum of three attempts. This means the sizes the weeds can reach during three weeks are important to include in the dataset, so creating the dataset will take roughly three weeks. These weeks should fall in a period in which the weeds grow well; the weather will play a part in this. Some more days should be added to include the larger sizes reached in weeks in which the weeds grow faster than normal. This means the farmer will have to let the weeds grow for somewhat more than three weeks in order to create the dataset. It is not necessary to take pictures each day during this period.
The number of pictures that will be taken in total depends on the network, the number of plant species and the time needed to process a picture. If only the network were of influence, as many pictures as possible would be taken, because for deep learning more data is generally better. However, the pictures will be processed manually, so their number has to be limited. The minimum number of pictures the network needs depends on the desired accuracy of the network: a high accuracy calls for a lot of data. The number of plant species is also of influence, as more species means more pictures. The literature study showed that thousands of pictures are necessary; a precise number is difficult to give for now. If a thousand pictures were taken each day for five days, this would give five thousand pictures. With a helpful interface to work with, the average time spent on processing one picture is assumed to be 15 seconds. Processing five thousand pictures then takes about 21 hours. Assuming an hourly wage of 12 euros, processing the data will cost 252 euros. This amount will increase when the number of pictures increases and decrease when the processing interface is made easier and quicker to use. A tool to enlarge the dataset after processing is image augmentation, which is relatively fast and cheap.
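The cost estimate above can be reproduced with a small calculation. This is only a sketch of the arithmetic in the text; the function name and the decision to round the working time up to whole hours are assumptions made here.

```python
import math


def labelling_cost(n_pictures, seconds_per_picture=15, hourly_wage_eur=12):
    """Estimate the hours and cost of manually processing the pictures.

    The defaults are the assumptions stated in the text; hours are rounded
    up to whole hours before the wage is applied.
    """
    hours = math.ceil(n_pictures * seconds_per_picture / 3600)
    return hours, hours * hourly_wage_eur


# A thousand pictures per day for five days.
hours, cost = labelling_cost(5 * 1000)
print(hours, cost)
```

With the defaults this gives the 21 hours and 252 euros mentioned in the text, and changing `seconds_per_picture` shows how strongly a quicker processing interface reduces the cost.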
In order to process the pictures efficiently, a clear plan of approach has to be made. This plan will make it possible for multiple employees of different levels to process data without having to know the specifics of the farm. It will also ensure that the data is processed correctly and can be used to train the network without further adaptations. Since the robot will be used by multiple farms with different plant species, the process of creating a dataset will have to take place multiple times, and a clear plan of approach makes it easier to have new employees work on the data processing for a farm. The plan has to contain descriptions of the wanted and unwanted plants. These will be specific to each farm, and the farmer can help with the descriptions. The plan also has to indicate how to make the data ready for training. This means deleting pictures with no plants at all and sorting pictures that show a single plant species. Pictures showing multiple plant species have to be divided in some way; whether this will be done by splitting them into multiple pictures or by marking divisions within one picture is yet to be decided. The divided parts then have to be sorted. As mentioned before, a helpful interface for processing the pictures will speed up the work, so the plan of approach should go hand in hand with the processing software.
After the data has been processed on the cloud storage, the dataset is finished and accessible to the employees who will train the network. After the first execution of this process of creating a dataset, the results might be insufficient. In that case the process has to be altered, which has to be taken into account when starting. After the start-up phase, the process will be fine-tuned and can be performed as desired at the following farms. The datasets and trained networks that are made can be reused if the plant species are the same. If another farm has more or different plant species, more data has to be collected. Creating datasets at multiple farms will result in a large dataset of many plant species. This could lead to a phase in which creating a new dataset is no longer necessary.
Deep learning requires a lot of data to train a model correctly (the ImageNet database consists of over one million pictures). The amount of data sufficient for training depends on the type of data and model, but generally at least a thousand images are required for computer vision tasks. Quality images are often hard to find or owned by private companies, which limits the available data significantly. Because images of weeds and other plants were limited in this manner, data augmentation was applied to prevent overfitting of the neural network by increasing the number of different images.
==== Data ====
As mentioned previously, the dataset is not balanced, as can be seen in Figure 2. In total there are 298 images, and the two bar plots show a huge gap between the amount of data for weeds and non-weeds: approximately 92% of the images are of weeds. Moreover, the bar plot of the distribution of images over classes shows a disproportionate number of Canada thistle images (approximately 29%), whereas there is a clear lack of images of trees and shrubbery. The amounts for all other classes are relatively equal. The variability in size is also extensive: the maximal height is 4128 pixels, the maximal width is 4272 pixels, the minimal height is 84 pixels and the minimal width is 29 pixels. Figure 3 depicts the average height and width per image class. Because of this variability, all data needs to be resized to one size, regardless of the implementation. Note that some image classes generally have a greater width than height, so when this data is resized to the state-of-the-art standard (a square), these images will be squeezed horizontally.
[[File:DataClassesDistribution2.jpg|1000px|thumb|center|'''Figure 2''': image count per class before data augmentation. ]]
[[File:ImageSizesPerClass.jpg|600px|thumb|center|'''Figure 3''']]
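To make the imbalance concrete, the following sketch computes the share of each group and one common way of compensating for it, inverse-frequency class weights. The counts of 274 and 24 are an assumption derived from "approximately 92% of 298 images", not exact figures from the dataset, and the weighting scheme is shown only as an illustration of how such an imbalance is often handled; it is not necessarily what was used in this project.

```python
def class_shares(counts):
    """Fraction of the dataset that each class takes up."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (number of classes * class count).

    A perfectly balanced dataset would give every class a weight of 1.
    """
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


# Assumed counts: roughly 92% of the 298 images are weeds.
counts = {"weeds": 274, "non-weeds": 24}

for name, share in class_shares(counts).items():
    print(f"{name}: {share:.1%}")
for name, weight in inverse_frequency_weights(counts).items():
    print(f"{name}: weight {weight:.2f}")
```

Under these assumed counts an error on a non-weed image would count roughly eleven times as heavily as an error on a weed image during training.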
===== Data pre-processing =====
First of all, augmentation techniques were applied because of the lack of data. The data augmentations were selected in such a way that the end result is realistic: for example, left-right flipping was applied because plants are roughly symmetric about the vertical axis, but up-down flipping was not applied because the vertical orientation is fixed by gravity. The augmentations are: left-right flipping, increased saturation, increased brightness, decreased brightness, blurring (to simulate out-of-focus plants) and center-cropping. This yields 6 augmentations in total, so the dataset size is increased by a factor of 7. Apart from data augmentation another pre-processing technique was applied, namely resizing. The images were resized to squares, which are easier for the models to handle. The transfer learning networks actually require this type of input, with an upper bound on the size of 224 by 224 pixels. Since most of the data was larger than this size, the images were effectively downsized. Generally, larger images take longer to process and need better hardware, but can provide better results.
[[File:7014 allTransformsInOneWithNumbers.jpg|1200px|thumb|center|'''Figure 1''': All transformations, from left to right: (1) original, (2) increased brightness, (3) center-cropping, (4) decreased brightness, (5) left-right flipping, (6) blurring, (7) increased saturation (numbers are only added for reference, in the dataset the pictures do '''not''' contain numbers).]]
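The six augmentations can be sketched in plain Python on a tiny image stored as nested lists of RGB tuples. This is a library-free illustration, not the code used for the project: the factors (1.3, 0.7, a crop to 80%) and the 3x3 box blur are assumptions chosen here, and in practice an image library would do the same operations on real images.

```python
def clamp(value):
    """Keep a channel value inside the valid 0..255 range."""
    return max(0, min(255, int(round(value))))


def flip_left_right(img):
    return [list(reversed(row)) for row in img]


def scale_brightness(img, factor):
    return [[tuple(clamp(c * factor) for c in px) for px in row] for row in img]


def scale_saturation(img, factor):
    out = []
    for row in img:
        new_row = []
        for px in row:
            gray = sum(px) / 3  # simple mean as the grey reference
            new_row.append(tuple(clamp(gray + (c - gray) * factor) for c in px))
        out.append(new_row)
    return out


def box_blur(img):
    """3x3 mean filter; the window is clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(tuple(clamp(sum(p[c] for p in neigh) / len(neigh))
                             for c in range(3)))
        out.append(row)
    return out


def center_crop(img, fraction):
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def augment(img):
    """Return the original plus six augmented copies (factor of 7)."""
    return [
        img,
        flip_left_right(img),
        scale_saturation(img, 1.3),   # increased saturation (assumed factor)
        scale_brightness(img, 1.3),   # increased brightness (assumed factor)
        scale_brightness(img, 0.7),   # decreased brightness (assumed factor)
        box_blur(img),                # simulates an out-of-focus plant
        center_crop(img, 0.8),        # assumed crop fraction
    ]


example = [[(40 * x, 60 * y, 90) for x in range(5)] for y in range(5)]
print(len(augment(example)))
```

Every input image yields seven images, which is where the factor of 7 in the text comes from; the cropped copy still has to be resized to the common input size afterwards.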
==== Transfer Learning Models ====
Python with TensorFlow (including Keras) 2.0 or higher is used to create the models. The defaults for these models were chosen according to current state-of-the-art standards. From there, different models were created, including one network built from scratch (ScratchNet) and three pre-trained models: MobileNetV2, DenseNet201 and InceptionResNetV2. These pre-trained models have been trained on the ImageNet database. Their characteristics are shown in Table 1. As the table shows, networks of varying complexity are used. For the ImageNet data, it seems that the more complex the network is, the better it performs.
'''Table 1:''' Characteristics of pre-trained models on the ImageNet validation dataset.
<table border="1px solid black">
<tr>
</tr>
<tr>
<td> MobileNetV2 </td>
<td> 0.901 </td>
<td> 3,538,984 </td>
<td> 88 </td>
</tr>
<tr>
<td> DenseNet201 </td>
<td> 0.936 </td>
<td> 20,242,984 </td>
<td> 201 </td>
</tr>
<tr>
<td> InceptionResNetV2 </td>
<td> 0.953 </td>
<td> 55,873,736 </td>
<td> 572 </td>
</tr>
</table>
The defaults for the transfer learning networks are: optimizer: Adam; number of hidden layers: 1; pooling method: global average 2D pooling; loss function: categorical cross-entropy; data: augmented data; image size: 224 by 224 pixels; initialization weights: ImageNet; class weights: 1; batch size: 8 images; maximum number of epochs: 10; hidden layer’s activation function: rectified linear unit (ReLU); output layer’s activation function: softmax (which turns the outputs into a probability distribution over the classes). Moreover, some hyperparameters were optimized with the Keras Hyperband tuner. This tuner tries out combinations of hyperparameters, discards the weak ones early and eventually keeps the most promising combination. For the transfer learning networks the following hyperparameters were tuned: the number of neurons in the hidden layer (between 32 and 512 in steps of 32) and the learning rate (0.001, 0.0001, 0.00001 or 0.000001). For each transfer learning network the pre-trained base (the convolutional layers) was kept with its training disabled, and the top (the classifier) was replaced. The output layer has one node per class (either 11 or 2).
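To give a feeling for the size of the search space described above, the following sketch enumerates every combination of the two tuned hyperparameters. It only lists the candidates; it does not reproduce the tuner itself, which samples from this space and stops unpromising candidates early instead of training all of them. The dictionary keys are names chosen for this example.

```python
from itertools import product

# Search space as described in the text: hidden layer size and learning rate.
hidden_units = list(range(32, 512 + 1, 32))      # 32, 64, ..., 512
learning_rates = [1e-3, 1e-4, 1e-5, 1e-6]

# Every possible combination of the two hyperparameters (names are illustrative).
search_space = [
    {"units": units, "learning_rate": lr}
    for units, lr in product(hidden_units, learning_rates)
]

print(len(hidden_units), "layer sizes x", len(learning_rates), "learning rates =",
      len(search_space), "combinations")
```

Training each combination for the full 10 epochs would mean 64 complete training runs per network, which is the reason for using a tuner that allocates most of the budget to the promising candidates.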
==== ScratchNet ==== | |||
ScratchNet is a relatively simple convolutional neural network. The optimizer and pooling were handled the same way as for the transfer learning models: Adam and global average 2D pooling respectively. This model used a smaller input resolution of 165x165 pixels and a batch size of 32. The resolution was lowered to prevent the training machine from running out of memory while tuning the hyperparameters; in hindsight, a higher resolution combined with a smaller batch size would probably have given better results and would also have resolved the memory issue.
The network is structured as follows: an input layer of 165x165x3 feeds into a convolutional layer with ReLU activation, for which the number of filters was tuned between 8 and 32 in steps of 4 (in this case 28) and the kernel size between 2 and 6 in steps of 1 (in this case 5). Next comes a pooling layer with a pooling window of X*X, where X was tuned between 1 and 8 in steps of 1 (in this case 7). Then follow a flattening layer, a dense layer with between 32 and 512 nodes in steps of 32 (in this case 224) and finally the output layer, which contains either 11 nodes (one for each class in our dataset) or two nodes (weeds and non-weeds).
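The layer sizes above can be checked with some simple arithmetic. The sketch below computes the feature map shapes and the number of trainable parameters for the tuned values (28 filters, kernel size 5, pooling window 7, 224 dense nodes). It rests on assumptions that the text does not state: the convolution uses no padding and a stride of 1, and the pooling layer uses a stride equal to its window size. With other settings the numbers would differ.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a convolution without padding (assumed here)."""
    return (size - kernel) // stride + 1


def scratchnet_summary(input_size=165, channels=3, filters=28, kernel=5,
                       pool=7, dense_units=224, n_classes=11):
    """Shapes and parameter counts for the described layer sequence."""
    conv_size = conv_output_size(input_size, kernel)
    pooled_size = conv_size // pool          # assumes pooling stride == window size
    flattened = pooled_size * pooled_size * filters
    params = {
        "conv": kernel * kernel * channels * filters + filters,   # weights + biases
        "dense": flattened * dense_units + dense_units,
        "output": dense_units * n_classes + n_classes,
    }
    return {
        "conv_shape": (conv_size, conv_size, filters),
        "pooled_shape": (pooled_size, pooled_size, filters),
        "flattened": flattened,
        "params": params,
        "total_params": sum(params.values()),
    }


summary = scratchnet_summary()
print(summary["conv_shape"], summary["pooled_shape"], summary["total_params"])
```

Under these assumptions the feature map shrinks from 161x161x28 to 23x23x28, so 14,812 values feed the dense layer, which then holds almost all of the roughly 3.3 million weights. This is far fewer parameters than DenseNet201 or InceptionResNetV2 in Table 1.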
==== Models With 11 Prediction Classes ==== | |||
In the 11-class case the classes are: lady's thumb, shrubbery, conference trees, burlat trees, purple deadnettle, shepherd's purse, saltbush, broad leaf dock, Canada thistle, lambsquarters and chickweed. The different architectures have been applied to this problem with varying characteristics to investigate the following questions:
<ol> | |||
<li> Which network with 11 prediction classes performs the best? </li> | |||
<li> Does a model with an extra hidden layer perform better? </li> | |||
<li> Is knowledge “transferred” with transfer learning? </li> | |||
<li> How does assigning class weights impact performance? </li> | |||
<li> What is the effect of augmenting data? </li> | |||
</ol> | |||
The results of training the different networks for the 11-class case can be found in Table 2. To answer the questions, some terms need to be defined. The false positive ratio is the fraction of non-weeds that are classified as weeds (thus between 0 and 1; lower is better). The best performing network is the one with the lowest false positive ratio, provided that its accuracy is 80% or higher and its classification time is within 500 milliseconds. Class weights are used to weight the loss function; loosely speaking, they define the importance of each class during training. Proportional class weights compensate for how much data is available for each class, so that rare classes weigh more heavily: the weight of a class is computed by dividing the total number of images by the product of the number of classes and the number of images in that class.
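The class weight formula described above can be written out in a few lines. This is a minimal sketch; the image counts in the example are made up for illustration and are not the counts of the project's dataset.

```python
def proportional_class_weights(image_counts):
    """weight(c) = total images / (number of classes * images in class c)."""
    total = sum(image_counts.values())
    n_classes = len(image_counts)
    return {name: total / (n_classes * count) for name, count in image_counts.items()}


# Hypothetical, deliberately imbalanced counts (not the real dataset).
counts = {"canada thistle": 60, "chickweed": 30, "saltbush": 10}
weights = proportional_class_weights(counts)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

A class that holds an average share of the data gets a weight of 1, over-represented classes get a weight below 1 and rare classes a weight above 1. Weighted this way, every class contributes equally to the loss in total.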
'''Table 2:''' accuracy, false positive ratio and classification time for different models predicting to 11 classes | |||
<table border="1px solid black"> | |||
<tr> | |||
<th> Row/Model number </th> | |||
<th> Model </th> | |||
<th> Accuracy </th> | |||
<th> False Positive Ratio</th> | |||
<th> Classification time (ms) </th> | |||
<th> Comment </th> | |||
</tr> | |||
<tr> | |||
<th> 1 </th> | |||
<td> MobileNetV2, defaults </td> | |||
<td> 0.5208 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 2 </th> | |||
<td> MobileNetV2, with 2 hidden layers </td> | |||
<td> 0.4499 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 3 </th> | |||
<td> MobileNetV2, random weights initialization </td> | |||
<td> 0.2917 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> All predictions are canada thistle, ~29% of total data </td> | |||
</tr> | |||
<tr> | |||
<th> 4 </th> | |||
<td> MobileNetV2, with proportional class weights </td> | |||
<td> 0.5333 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 5 </th> | |||
<td> MobileNetV2, on raw data </td> | |||
<td> 0.3392 </td> | |||
<td> 1.0 </td> | |||
<td> 11 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 6 </th> | |||
<td> MobileNetV2, with GlobalMax2DPooling </td> | |||
<td> 0.4368 </td> | |||
<td> 0.9188 </td> | |||
<td> 12 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 7 </th> | |||
<td> MobileNetV2, on raw data with proportional class weights and GlobalMax2DPooling </td> | |||
<td> 0.2857 </td> | |||
<td> 1.0 </td> | |||
<td> 11 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 8 </th> | |||
<td> MobileNetV2, with proportional class weights and globalMax2DPooling </td> | |||
<td> 0.2745 </td> | |||
<td> 1.0 </td> | |||
<td> 11 </td> | |||
<td> </td> | |||
</tr>
<tr>
<th> 9 </th>
<td> DenseNet201, defaults </td>
<td> 0.5613 </td>
<td> 1.0 </td>
<td> 30 </td>
<td> </td>
</tr>
<tr>
<th> 10 </th>
<td> DenseNet201, with AdaDelta </td>
<td> 0.5348 </td>
<td> 1.0 </td> | |||
<td> 29 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 11 </th> | |||
<td> InceptionResNetV2, defaults </td> | |||
<td> 0.3559 </td> | |||
<td> 1.0 </td> | |||
<td> 35 </td> | |||
<td> predictions mostly on 2 classes </td> | |||
</tr> | |||
<tr> | |||
<th> 12 </th> | |||
<td> InceptionResNetV2, with AdaDelta </td> | |||
<td> 0.2966 </td> | |||
<td> 1.0 </td> | |||
<td> 35 </td> | |||
<td> predictions mostly on canada thistle, ~29% of total data </td> | |||
</tr> | |||
<tr> | |||
<th> 13 </th> | |||
<td> ScratchNet, default </td> | |||
<td> 0.9651 </td> | |||
<td> 0.0631 </td> | |||
<td> 7 </td> | |||
<td> visible bias towards canada thistle, ~29% of total data </td> | |||
</tr> | |||
<tr> | |||
<th> 14 </th> | |||
<td> ScratchNet, with proportional class weights </td> | |||
<td> 0.9676 </td> | |||
<td> 0.0325 </td> | |||
<td> 7 </td> | |||
<td> </td> | |||
</tr> | |||
<tr> | |||
<th> 15 </th> | |||
<td> ScratchNet, on raw data</td> | |||
<td> 0.6962</td> | |||
<td> 0.3836 </td> | |||
<td> 7 </td> | |||
<td> big bias towards canada thistle, ~29% of total data </td> | |||
</tr> | |||
<tr> | |||
<th> 16 </th> | |||
<td> ScratchNet, on raw data with proportional class weights</td> | |||
<td> 0.5139 </td> | |||
<td> 0.7153 </td> | |||
<td> 7 </td> | |||
<td> </td>
</tr>
</table>
Question 1: ScratchNet with proportional class weights (row 14) has the lowest false positive ratio, the highest accuracy and the lowest classification time. It is therefore chosen as the best model with 11 prediction classes. The visible bias towards one class in the default ScratchNet (row 13) signals that performance might increase further with a more balanced dataset. Note that all transfer learning networks have a false positive ratio so high that they would harm crops more than they would remove weeds, which makes these networks useless given this data and training method.
Question 2: compare the first and second rows of Table 2. The network with an additional hidden layer (2 in total) does not perform better; its accuracy is even lower (0.4499 versus 0.5208). So no, more hidden layers do not necessarily imply better performance.
Question 3: compare the first and third rows of Table 2. The model with random weights, thus not pre-trained, has the same false positive ratio but a far lower accuracy (0.2917 versus 0.5208), since it always predicts the same class. So yes, some knowledge appears to be transferred: the pre-trained weights provide a better starting point than random weights.
Question 4: compare the following pairs of rows: (1) row 1 and row 4; (2) row 6 and row 8; (3) row 13 and row 14; (4) row 15 and row 16. At first sight, the models with class weights outperform the models without them in only 2 of the 4 pairs. However, the models that differ from the defaults only in their class weights (pairs 1 and 3) consistently show better results. In pair 3 (rows 13 and 14), adding the class weights almost halves the false positive ratio (from 0.0631 to 0.0325) while maintaining the accuracy. To conclude, class weights can improve performance considerably.
Question 5: compare the following pairs of rows: (1) row 1 and row 5; (2) row 7 and row 8; (3) row 13 and row 15; (4) row 14 and row 16. Apart from pair 2, performance is far better with the augmented data than with the raw data. In pair 2 there is no clear difference. To conclude, augmenting the data seems helpful for improving performance, at least for this dataset.
Apart from these questions, some general remarks can be made about these results. Essentially 4 different architectures have been compared: MobileNetV2, DenseNet201, InceptionResNetV2 and ScratchNet. All models perform their classification task relatively quickly. Comparing Table 1 and Table 2, there seems to be a positive correlation between network complexity (number of parameters and topological depth) and classification time: more complex networks take longer. This holds not only for classification times, but also for training times. MobileNetV2 is almost three times as fast as the other transfer learning networks, while those larger networks offer at most a small improvement in performance over it. Because of these differences in computation time, most of the questions were tested with the MobileNetV2 model. Moreover, the computation times depend mainly on whether the GPU or the CPU is used for the computations (in this case the GPU) and on the specific hardware (Intel i7-7700HQ, NVIDIA Quadro M1200). Lastly, most comments relate to the imbalanced dataset, suggesting that a more balanced dataset could yield better results.
==== Models With 2 Prediction Classes ==== | |||
Here all data is divided into only two classes: weeds and non-weeds. We wanted to see if we could get lower false positive ratios when only classifying into two classes while keeping accuracy as high as possible. A similar approach is used as above. | |||
'''Table 3:''' accuracy, false positive ratio and classification time for different models predicting to 2 classes | |||
<table border="1px solid black"> | |||
<tr> | |||
<th> Row/Model number </th> | |||
<th> Model </th> | |||
<th> Accuracy </th> | |||
<th> False Positive Ratio</th> | |||
<th> Classification time (ms) </th> | |||
<th> Comment </th> | |||
</tr> | |||
<tr> | |||
<th> 1 </th> | |||
<td> MobileNetV2, defaults </td> | |||
<td> 0.9240 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> ~0.92 accuracy baseline</td> | |||
</tr> | |||
<tr> | |||
<th> 2 </th> | |||
<td> MobileNetV2, equal number of images in each class </td> | |||
<td> 0.5313 </td> | |||
<td> 1.0 </td> | |||
<td> 11 </td> | |||
<td> ~0.5 accuracy baseline</td> | |||
</tr> | |||
<tr> | |||
<th> 3 </th> | |||
<td> MobileNetV2, with proportional class weights </td> | |||
<td> 0.9250 </td> | |||
<td> 1.0 </td> | |||
<td> 12 </td> | |||
<td> Results stayed the same for class weights up to non-weeds having 50x more weight</td> | |||
</tr> | |||
<tr> | |||
<th> 4 </th> | |||
<td> ScratchNet, defaults </td> | |||
<td> 0.9955 </td> | |||
<td> 0.0653 </td> | |||
<td> 6 </td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<th> 5 </th> | |||
<td> ScratchNet, with proportional class weights </td> | |||
<td> 0.9960 </td> | |||
<td> 0.0473 </td> | |||
<td> 7 </td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<th> 6 </th> | |||
<td> ScratchNet, on raw data </td> | |||
<td> 0.9263 </td> | |||
<td> 0.7439</td> | |||
<td> 6 </td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<th> 7 </th> | |||
<td> ScratchNet, on raw data with proportional class weights</td> | |||
<td> 0.9358 </td> | |||
<td> 0.6170 </td> | |||
<td> 6 </td> | |||
<td></td> | |||
</tr> | |||
</table> | |||
In row 2 of the table, a training dataset was used with an equal number of images in each class. Its accuracy drops to the ~0.5 baseline, which suggests that the high accuracy in row 1 is only achieved by classifying every plant as a weed. This is supported by the fact that the false positive ratio is 1.0 in both rows.
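The effect described above is easy to reproduce with a classifier that labels everything as a weed. In the sketch below, the split of 92 weed images and 8 non-weed images is an assumption chosen to mirror the ~0.92 baseline from Table 3; the function name and the labels are likewise only illustrative.

```python
def accuracy_and_false_positive_ratio(true_labels, predicted_labels, positive="weed"):
    """Accuracy over all images, and the share of non-weeds predicted as weeds."""
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    predictions_for_non_weeds = [p for t, p in pairs if t != positive]
    false_positives = sum(p == positive for p in predictions_for_non_weeds)
    return accuracy, false_positives / len(predictions_for_non_weeds)


# Hypothetical imbalanced validation set: 92 weeds and 8 non-weeds.
true_labels = ["weed"] * 92 + ["non-weed"] * 8
always_weed = ["weed"] * len(true_labels)     # degenerate "classifier"

accuracy, fp_ratio = accuracy_and_false_positive_ratio(true_labels, always_weed)
print(f"accuracy = {accuracy:.2f}, false positive ratio = {fp_ratio:.2f}")
```

The accuracy looks good while every single crop is misclassified, which is why the false positive ratio, and not the accuracy alone, is used to rank the models.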
The results above confirm some findings we had for 11 classes: | |||
* ScratchNet achieves the lowest false positive ratios | |||
* Using proportional class weights decreases false positive ratios | |||
* Using the augmented dataset improves results | |||
The best results for 11 and 2 classes (in both cases ScratchNet with proportional class weights) are very similar, with 2 output classes resulting in slightly higher accuracy and a slightly higher false positive ratio, although the latter may be within the margin of error.
==== Thresholds ==== | |||
False positives, where the software/weeding robot recognises a desired crop plant as an undesired weed, are the most important errors to eliminate in the prototype. For a user on a farm, false positives mean less income due to lower crop yield. To reduce the false positive rate further after training, a thresholding layer is introduced that prevents classification when the neural network is “unsure” about its prediction. In theory, this reduces the number of classifications for both weeds and crops. For weeds, this reduction can be offset by having the weeding robot make multiple passes, or by using more images during a single pass.
The thresholding is implemented as follows: a trained model (the main model used is the from-scratch model with the best false positive rate) is loaded and a [https://www.tensorflow.org/api_docs/python/tf/keras/layers/ThresholdedReLU threshold layer] is added after the classification layer. Any output value below the given threshold is set to 0. The model is then used to make predictions, and any prediction that consists only of zeros, meaning the model was unsure, is removed. The false positives among the remaining predictions are then visualised.
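The thresholding step can be sketched in plain Python on the softmax output of a single image. This is not the Keras layer itself, only the same rule written out: values that do not exceed the threshold become 0, and a prediction with only zeros is treated as "unsure". The function names, the three example classes and the probability values are assumptions made for the example.

```python
def apply_threshold(probabilities, threshold):
    """Keep values above the threshold, set all other values to 0."""
    return [p if p > threshold else 0.0 for p in probabilities]


def classify_with_threshold(probabilities, class_names, threshold):
    """Return the predicted class, or None when the model is unsure."""
    kept = apply_threshold(probabilities, threshold)
    if not any(kept):
        return None                      # all zeros: skip this plant
    return class_names[kept.index(max(kept))]


class_names = ["chickweed", "saltbush", "conference trees"]   # illustrative subset

confident = [0.90, 0.06, 0.04]
unsure = [0.55, 0.30, 0.15]

print(classify_with_threshold(confident, class_names, 0.6))   # a class name
print(classify_with_threshold(unsure, class_names, 0.6))      # None
```

Because softmax outputs sum to 1, at most one class can exceed a threshold of 0.5 or higher, so in that range the rule either keeps the top class or rejects the image.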
The following graphs show the results of this thresholding. They were made using the network from scratch with 11 classes and class weights. Due to the way the data is loaded, small fluctuations in the data occur, and these cause the fluctuations in the graphs. What can be seen is that after a threshold of 0.6 no false positives occur. The accuracy of the model with thresholding does not decrease drastically, even with a threshold of 0.9.
[[File:Threshold_1.png|1000px|thumb|center|'''Figure 4''': False positive rate after thresholding with steps of 0.1 ]] [[File:Threshold_abs.png|1000px|thumb|center|'''Figure 5''': Absolute false positives after thresholding with steps of 0.1 ]] | |||
[[File:Threshold_acc.png|1000px|thumb|center|'''Figure 6''': Accuracy after thresholding with steps of 0.1 ]] | |||
To test the effectiveness of thresholding on non-plant objects, a few images were fed to the model to observe the output. These images were: a pair of shoes, two chickens, a hose found on the farm of the user and a section of dirt found on the farm of the user.
The highest prediction for these images was for the chickens, with 90% certainty, followed by the dirt with 80% certainty. All images were classified as weeds, so this cannot be ignored. As seen from the earlier results, a threshold of 0.9 does not affect the model immensely, and could thus be used to prevent false positives from foreign objects.
==== Preliminary Conclusions ==== | |||
Based on the results, ScratchNet has the best performance in terms of accuracy and false positives. However, the number of classes used (2 or 11) affects the accuracy and the false positive rate in a manner that does not give a clear best option. The user has specified that optimizing the false positive rate is more important than the accuracy, and thus the network with 11 classes comes out on top. This is further supported by the network's response to thresholding: a threshold of 0.6 can fully remove the false positives with a small loss in accuracy. Based on this, ScratchNet with 11 classes and a 0.6 threshold is the best model for the users.
== Discussion == | |||
----
'''Results and the users''' | |||
The final results have to be translated back to the users’ needs, starting with the farmer’s needs. The end result is a model which is able to distinguish weeds from non-weeds with an accuracy higher than 95%. Furthermore, the model has a false positive ratio equal to zero. A remark to add to these results is that the pictures used for validation were not from the farm. The actual number of false positives and the accuracy can be different with pictures taken at the farm. The overall percentage of removed weeds and damaged crops also relies on other parts of the robot, which are likely to have error percentages as well. It is therefore not possible to say what the actual overall accuracy of the weeding robot and the percentage of damaged crops will be. However, the results meet the farmer’s needs with the knowledge available at this moment. The model is able to detect enough weeds and will not see too many crops as weeds.
It can also be said that the needs of CSSF have been met. It has been shown that weed detection with a neural network is possible. The final accuracy and false positive ratio indicate that this method is a success. Furthermore, multiple technical insights have been documented during this project, including conclusions about how useful certain choices were. Image augmentation, class weights and thresholding are examples of useful tools that have been applied. CSSF can use this knowledge in the following projects.
'''Overall conclusions''' | |||
The goal of this project was to identify weeds by means of computer vision. During the project, not only technical aspects were taken into account; the perspective of the user played an important role as well. Several conclusions can be drawn from the contact with the user. Profit is important to the user, but it is not the main goal. The weeding robot will not create much profit in the first years. The user is willing to accept that because he gains sustainability and stability. During the set-up process of the robot on the farm, it is beneficial to include the farmer.
Furthermore, there are several technical conclusions to draw. Creating a sufficient database to train the network is a large obstacle. Finding pictures online is difficult since they are owned by companies. Making the pictures ourselves is too much work. It is important to create a database with a balance in the amount of data per category. To create a sufficient database, time and cooperation with the farmer are necessary. | |||
The literature study shows that convolutional neural networks with transfer learning are the best option for the project goal. It also shows that setting a threshold on the classification helps to achieve a higher accuracy and fewer false positives. In the project, different models were created, including one network from scratch (ScratchNet) and three pre-trained models: MobileNetV2, DenseNet201 and InceptionResNetV2. The networks were trained, tested and compared on accuracy, false positive ratio and classification time. Multiple conclusions can be drawn from this comparison. More hidden layers do not necessarily imply better performance. Using one of the pre-trained models together with the dataset of this project leads to a model that would damage an excessive number of crops; a pre-trained model is in this case not useful. Pre-trained weights do provide a better starting point than random weights. Proportional class weights helped to decrease the false positive ratio for ScratchNet. Data augmentation seems to have improved the performance in this project. More complex networks seem to have longer classification and training times. The best model in this project is the one made from scratch working with 11 categories.
Applying a threshold proves to be very successful. Using the best model, ScratchNet, a threshold has been implemented. This led to the result of no false positives and an accuracy above 95%. A remark to add to this result is that the pictures used for validation were not from the farm. The actual number of false positives and accuracy can be different with pictures from the farm. | |||
'''Conclusions on system requirements''' | |||
Earlier, [[#Users|system requirements]] were established. Now, these will be revisited to discuss whether our system meets each of the requirements. | |||
<ol> | |||
<li> The system is flexible in what may be regarded as weeds, as it classifies many different plant species. The output of the system specifies which plant species it is. From there, it can be decided whether this plant species qualifies as a weed, and whether it should be removed.
In addition, using the Neural Network as we created it, different classes can be added as well. It is thus possible to add more plant species to be recognized. | |||
</li> | |||
<li> The high accuracy rates show that the system is able to distinguish the different types of weeds. </li> | |||
<li> The system is not yet able to recognize multiple plants in one image. There was not enough time to look at this aspect, and it was also not a very high priority for this project. However, it should not prove difficult to achieve this goal as well with the use of the system. | |||
If an image were to be divided into several fragments, these fragments could be fed to the system, leading to an output of a plant species within that fragment. Using the outputs of the different fragments, different plants within one image could be recognized. | |||
</li> | |||
<li> This requirement was not tested explicitly, but is likely to be fulfilled. The dataset used to train and test the system contained images of both the weeds and the non-weeds in different stages of their growth. The accuracy of the network proved to be high, and therefore it can be assumed that the system is able to deal with plant species in different growing phases.</li>
<li> As shown in the results section, the accuracy can indeed be above 95%, even with a threshold that prevents false positives for crops. However, this is not the case for models with a high threshold to prevent false positives for other objects.</li> | |||
<li> The limit of false positives has been met when using thresholds of above 0.6 without drastically reducing the accuracy of the model. This threshold can be increased to 0.9 to prevent other objects from being classified as weeds. </li> | |||
<li> It can be assumed that this requirement is met. This is based on two factors in which the amount of light was accounted for: the creation of the dataset and the manipulation of the images of the dataset. To create the dataset, images with varying lighting conditions were used, which means the dataset contains a range of lighting conditions. It was taken into account that the robot is supposed to work outside. Therefore, only images in natural light were used. In addition, data augmentations were used in which the brightness was varied, which supplied an even broader range of lighting conditions.</li>
<li> Considering the recognition system, this should not lead to any problems. However, since the physical design of the robot is not finished yet, it is impossible to make any firm conclusions about this requirement. For the recognition system, the only thing that would matter is whether the hardware is protected, and whether the camera is kept free from water. </li> | |||
<li> Regarding the system as far as it is finished at the moment, this requirement seems to be very feasible. The classification processing time is far beneath this required time. However, additional processing time will add to the classification, such as taking the picture, feeding the picture to the classification system, and processing the output of the classification. We anticipate that the processing time will not exceed the requirement, as the classification time is quite short, but it cannot be confirmed before the complete system is finished.</li> | |||
<li> The system is not able to locate a plant within an image. Similar to the ability to locate multiple plants within one image, this requirement was not focused on yet, but could be reached by building onto the classification system. Thus, while the system does not meet the requirement yet, it provides some useful building blocks.</li> | |||
<li> As discussed in our costs and benefits, the robot will be constructed using some valuable parts. This could make it an attractive piece of technology to steal. This was somewhat taken into account, by accounting for multiple smaller weeding robots, instead of one bigger robot for the whole farm. The robots would then be not as visible, since they are smaller. Also, CSSF proposed to install the software in such a way that whenever it would lose contact or be at the wrong location, the software would become disabled. In other words: the robot would become useless if it were stolen from the farm. However, this does not obstruct the possibility to steal the robot and sell it in separate parts. This was also not accounted for in other ways, so this will still prove a challenge for further development.</li> | |||
<li> The current system has given a proof of concept of accurate classification of multiple plant species, even while the given dataset was quite small. Therefore, it is anticipated the classification system will not take much of the farmer’s time. However, in the beginning the farmer must probably learn about the workings of the robot. To do so, a training provided by CSSF is accounted for. This means that the robot will temporarily need some time of the farmer, but not much once it is completely adapted to the farm. Therefore, this requirement is considered to be met. </li> | |||
</ol> | |||
'''Further research and developments''' | |||
As mentioned before, this project was executed under the direction of CSSF. The weed recognition system forms one of many parts required to build a robot for autonomous weeding and harvesting.
CSSF is still in the early phases of the development of this robot. Therefore, the main deliverable that was required from this project for their research was a proof of concept. While the recognition system was not required yet to have an ultimate accuracy, it had to show that it is possible to classify different kinds of weeds and plants, distinguishing between multiple classes that look quite alike.
However, there are still improvements possible for the weed recognition. First of all, the main limitation of the neural network as it is, is that the dataset of images used for training of the network is very small. The accuracy could be further improved by increasing the size and quality of the dataset. Currently, the dataset consists of a limited number of images, containing a limited number of classes (plant species). These images were not shot at the location of actual agroforestry farms. In addition, the balance between images of wanted plants and unwanted plants is very unequal: in the current dataset there are far more pictures of the weeds than of the crops, leading to a bias in the algorithm. The sizes of the individual pictures are also something to take into account. The dataset should have pictures of the same size to improve performance. The dataset could thus be improved in several ways:
- More pictures of the current species at relevant locations, especially the crops
- Including pictures of other plant species that can be found within agroforestry
- A new category for unrelated images to prevent misclassification of non-plant objects
- Balance between the amount of data per species or category
- Making sure all pictures are the same size.
As already indirectly indicated, the algorithm as it is, is based on the farm of John Heesakkers. Therefore, the species included in the dataset are species that occur on his farm, both for the weeds and the crops. The system should still be generalized by extending the dataset with species occurring at other farms. Then, the user of the system could indicate which species should be considered by the algorithm for the specific farm, or in other words: what species occur on that farm.
Apart from the dataset, the implementation of the algorithm in the robot should be considered. To deal with the stream of pictures that is generated by the camera on the robot, recurrent neural networks could be useful by using multiple frames from a video to classify based on multiple angles. These networks work especially well for sequence data such as time series. It is recommended to also look into recurrent neural networks for further research.
Another possible improvement is changing the last layer of the model to a sigmoid layer, so that the model can predict multiple plant species in a single image. Training this model could be done by attaching multiple single-plant images to a single multi-plant image. | |||
Furthermore, the output generated by the recognition algorithm is still limited: it only indicates the species that is recognized from the image fed to the algorithm, but does not indicate its position yet. Building from the current network, the system can be further extended such that it divides the image into different image parts, and indicates the position and species of all recognized plants within each part of the image. A remark to add is that the training data used for positioning needs more information than only pictures and categories. The position has to be indicated. With the software being able to position the weed, it is still necessary to activate the hardware to remove the weeds. In conclusion, multiple steps are still necessary to get from the detection software to an actual weeding robot. | |||
== Appendix == | |||
---- | |||
[[File:Costs robot.png|1000px|thumb|center|'''Figure 7''': Costs of the weeding robot per year ]] | |||
Figure 7 shows the calculated costs of the weeding robot per year. There are some things that should be explained about this calculation: | |||
- As CSSF indicated, the R&D costs will be divided over the first 100 robots. | |||
- This calculation was focused on John's farm. It was estimated that John's farm would need five weeding robots. Therefore, all costs that concern the individual robot are multiplied by five. | |||
- The fault costs are based on the expectation of a 2% loss due to mechanical damage (false positives), and 3000 plants per hectare (total 504000 plants). The price per plant is based on the average costs of the plants that John currently has on his farm. | |||
[[File:Costs traditional weeding.png|1000px|thumb|center|'''Figure 8''': Costs of the traditional weeding per year ]] | |||
Figure 8 shows the calculated costs of traditional weeding per year. Again, some things should be elaborated on to understand this calculation: | |||
- The calculated hours for the farmer are based on the expectation of 4 hours a week, except when there are extra workers (14 weeks) or in the winter months (12 weeks). This is as indicated by John Heesakkers. | |||
- The calculated hours for extra workers are based on the expectation of 2 full-time working employees in the months May, June, July and half of August. According to John, these are the months in which the weeds are most problematic. The estimate of the expected number of workers was also provided by him. | |||
- The costs due to mechanical damage are based on the expectation of a 0.5% loss due to mechanical damage, and 3000 plants per hectare (total 504000 plants). | |||
== References == | == References == | ||
Line 601: | Line 987: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td>4,5</td> | <td>4,5</td> | ||
<td> Meetings (2,5 hours), create a planning (2 hours)</td> | <td> Meetings (2,5 hours), create a planning (2 hours)</td> | ||
Line 637: | Line 1,023: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td>9,5</td> | <td>9,5</td> | ||
<td>Meetings (1,5 hours), Creating and troubleshooting a collaborative development environment(5 hrs), collect database (3 hours)</td> | <td>Meetings (1,5 hours), Creating and troubleshooting a collaborative development environment(5 hrs), collect database (3 hours)</td> | ||
Line 674: | Line 1,060: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td>4,5</td> | <td>4,5</td> | ||
<td>Meetings (3 hours), Data sorting and naming (1,5 hours)</td> | <td>Meetings (3 hours), Data sorting and naming (1,5 hours)</td> | ||
Line 709: | Line 1,095: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td>8.25</td> | <td>8.25</td> | ||
<td>Meetings (2.25 hours), implementing an automatically tuning neural network(6 hours)</td> | <td>Meetings (2.25 hours), implementing an automatically tuning neural network(6 hours)</td> | ||
Line 725: | Line 1,111: | ||
<tr> | <tr> | ||
<td>Hilde van Esch (1306219)</td> | <td>Hilde van Esch (1306219)</td> | ||
<td></td> | <td>7</td> | ||
<td></td> | <td>Meetings (2 hours) + Working on costs and benefits (5 hours) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Leighton van Gellecom (1223623)</td> | <td>Leighton van Gellecom (1223623)</td> | ||
<td> 11 | <td> 11 </td> | ||
<td> Meetings (2 hours) + Generating and testing hypotheses for transfer learning (4 hours) + training models and evaluation (5 hours) </td> | <td> Meetings (2 hours) + Generating and testing hypotheses for transfer learning (4 hours) + training models and evaluation (5 hours) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Tom van Leeuwen (1222283)</td> | <td>Tom van Leeuwen (1222283)</td> | ||
<td></td> | <td>6.5 </td> | ||
<td></td> | <td>Meetings (2 hours) + Code for confusion matrix (3 hours) + Writing Design process (1.5 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Karla Gloudemans (0988750)</td> | <td>Karla Gloudemans (0988750)</td> | ||
<td>8</td> | <td>8</td> | ||
<td>Meetings (2 hours) + Writing 'Creating the | <td>Meetings (2 hours) + Writing 'Creating the dataset' (6 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td></td> | <td>9</td> | ||
<td></td> | <td>Meetings (2 hours) + Creating and finetuning a convolutional neural network, 2 output classes and class weights (7 hours)</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 760: | Line 1,146: | ||
<tr> | <tr> | ||
<td>Hilde van Esch (1306219)</td> | <td>Hilde van Esch (1306219)</td> | ||
<td></td> | <td>7</td> | ||
<td></td> | <td>Meetings (2.5 hours) + Working on costs and benefits (5 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Leighton van Gellecom (1223623)</td> | <td>Leighton van Gellecom (1223623)</td> | ||
<td></td> | <td> 10.25 </td> | ||
<td></td> | <td> Meetings (2.5 hours) + TF training, getting results and extra meeting (7.45 hours) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Tom van Leeuwen (1222283)</td> | <td>Tom van Leeuwen (1222283)</td> | ||
<td></td> | <td>5.5 </td> | ||
<td></td> | <td>Meetings (2.5 hours) + Design Choices (3 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Karla Gloudemans (0988750)</td> | <td>Karla Gloudemans (0988750)</td> | ||
<td>8 | <td>8.5</td> | ||
<td>Meetings (2,5 hours) + writing 'Relating results to user needs' and final changes 'Creating the | <td>Meetings (2,5 hours) + writing 'Relating results to user needs' and final changes 'Creating the dataset'(6 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td></td> | <td>9.5</td> | ||
<td></td> | <td>Meetings (2,5 hours) + training different networks, collecting and interpreting results (7 hours) </td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 795: | Line 1,181: | ||
<tr> | <tr> | ||
<td>Hilde van Esch (1306219)</td> | <td>Hilde van Esch (1306219)</td> | ||
<td></td> | <td> 9 </td> | ||
<td></td> | <td> Meetings (2.5 hours) + writing on wiki (Discussion + User needs) (5.5 hours) + working on presentation (1 hour) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Leighton van Gellecom (1223623)</td> | <td>Leighton van Gellecom (1223623)</td> | ||
<td></td> | <td> 13.75 </td> | ||
<td></td> | <td> Meetings (2.5 hours) + write prototype sections/ adding code to do so (visualizations) (8 hours) + reviewing wiki (45 min) + create data demo visualizations (2.5 hours) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Tom van Leeuwen (1222283)</td> | <td>Tom van Leeuwen (1222283)</td> | ||
<td></td> | <td>9 </td> | ||
<td></td> | <td>Meeting (2 hours) + Thresholding (6 hours) + Conclusion (1 hour)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Karla Gloudemans (0988750)</td> | <td>Karla Gloudemans (0988750)</td> | ||
<td></td> | <td>8.5</td> | ||
<td></td> | <td>Meetings (2.5 hours) + Conclusion and relating results to the user (6 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td></td> | <td>5.5</td> | ||
<td></td> | <td>Meetings (2.5 hours) + documenting results on the wiki (3 hours)</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Line 830: | Line 1,216: | ||
<tr> | <tr> | ||
<td>Hilde van Esch (1306219)</td> | <td>Hilde van Esch (1306219)</td> | ||
<td></td> | <td>5.5</td> | ||
<td></td> | <td>Meetings (2.5 hours), Presentation (1 hour), Proofreading (2 hours)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Leighton van Gellecom (1223623)</td> | <td>Leighton van Gellecom (1223623)</td> | ||
<td></td> | <td> 5.5 </td> | ||
<td></td> | <td> Meetings (2.5 hours) + Presentation (2 hours) + Proofreading (1 hour) </td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Tom van Leeuwen (1222283)</td> | <td>Tom van Leeuwen (1222283)</td> | ||
<td></td> | <td>4.5</td> | ||
<td></td> | <td>Meetings (2.5 hours) + Presentation (1 hour) + Proofreading (1 hour)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Karla Gloudemans (0988750)</td> | <td>Karla Gloudemans (0988750)</td> | ||
<td></td> | <td>15.5</td> | ||
<td></td> | <td>Meetings (2.5 hours) + Presentation (12 hour) + Proofreading (1 hour)</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td>Timon Heuwekemeijer</td> | <td>Timon Heuwekemeijer (1003212)</td> | ||
<td></td> | <td>5.5</td> | ||
<td></td> | <td>Meetings (2.5 hours) + Presentation (2 hours) + Proofreading (1 hour)</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
<!-- Comment --> | <!-- Comment --> |
Latest revision as of 14:48, 25 June 2020
DeepWeed, a weed/crop classification neural network
Leighton van Gellecom, Hilde van Esch, Timon Heuwekemeijer, Karla Gloudemans, Tom van Leeuwen
Problem statement
Current farming methods such as monocropping are outdated and have negative effects on soil quality, greenhouse gas emissions, the presence of invasive species and the increase in crop diseases and pests (Plourde et al., 2013). Herbicides are often used to control pests or weeds, because their use can improve crop yield. Moreover, using herbicides can be up to 50% cheaper than hiring manual weeding labor (Haggblade et al., 2017). This is problematic because the increasing use of agricultural chemicals poses environmental and human health risks (Pingali, 2001). Some seek the answer to such problems in the rise of precision farming, which promises to reduce waste and thereby cut private and environmental costs (Finger et al., 2019). Others look further into the future and consider agroforestry. In the book Agroforestry Implications (Koh, 2010) the following definition is used: “agroforestry is loosely defined as production systems or practices that integrate trees with agricultural crops or livestock”. The author argues that agroforestry is a compromise between expanding production and maintaining the potential for forest protection, biodiversity and poverty alleviation.
Agroforestry is labor intensive, so there is a need for automation to take over some tasks. One such task is weed identification and removal. The definition of weeds differs between people. Ferreira et al. (2017) define weeds as “undesirable plants that grow in agricultural crops, competing for elements such as sunlight and water, causing losses to crop yields”, while Tang et al. (2016) define them by their features: fast growth rate, greater growth increment and competition for resources such as water, fertilizer and space. The main conclusion that can be drawn from these definitions is that weeds harm agricultural crops and thus need to be removed.
Such a weeding robot, or even a general purpose machine, would need many different modules. Each module should operate independently to complete its task, but it should also communicate with other modules. This research restricts itself to weed detection in an agroforestry setting where plants grow between different (fruit) trees. The aim is to identify weeds by means of computer vision.
Users
User profile
Farmers who adopt a sustainable farming method differ significantly from conventional farmers in personal characteristics. Sustainable farmers tend to have a higher level of education, be younger, have more off-farm income, and adopt more new farming practices (Ekanem et al., 1999). Sustainable farming has other goals than conventional farming, as it focuses on aspects like biodiversity and soil quality in addition to the usual high productivity and high profit. The individual differences suggest that sustainable farmers are more likely not to have started out as farmers. Also, having more off-farm income indicates that limited time is devoted to the farm. The willingness to adopt new farming practices could benefit our new software, as it might be more likely to be accepted and tried out.
There is a growing trend of sustainable farming, with the support of the EU, which has set goals for sustainable farming and promotes these guidelines (Ministerie van Landbouw, Natuur en Voedselkwaliteit, 2019). This trend expresses itself in the transition from conventional to sustainable methods within farms, and new initiatives, such as Herenboeren.
Agroforestry makes the removal of weeds more difficult, due to the mixed crops. Weeding is a physically heavy and unpleasant job. For these reasons, there is a growing need for weeding systems among farmers who have made the transition to agroforestry. This is confirmed by Marius Monen, co-founder of CSSF and initiator in the field of agroforestry.
Spraying pesticides preventively reduces food quality and causes environmental pollution (Tang et al., 2016). The users of the weed detection software would not only be sustainable farmers but also, indirectly, the consumers of farming products, as it affects their food and environment.
This research is in cooperation with CSSF. In line with their advice, we will focus on the type of agroforestry farming where both crops and trees grow in strips on the land. To test the functionality of our design, we will be working in cooperation with farmer John Heesakkers, who has recently shifted from livestock farming towards this form of agroforestry. His farm will therefore serve as our reference case for designing the system.
System requirements
Since the approach and views of sustainable farmers may differ, one of the requirements of the system is that it is flexible in what is considered a weed and what a useful plant (Perrins, Williamson, Fitter, 1992). It should thus be able to distinguish multiple plants instead of merely classifying weeds/non-weeds. Based on user feedback, the following plant types should be recognised as weeds: Atriplex, Shepherd's purse, Redshank, Chickweed, Red Dead-Nettle, Goosefoot, Creeping Thistle and Bitter Dock. Furthermore, given the set-up of agroforestry, the system should be able to deal with different kinds of plants in a small region, so it should be able to recognise multiple plants in one image. It also means that the plant types include trees, which set the maximum height and breadth of the plants. The non-weeds are expected to be recognised when (nearly) fully grown, as young plants are very hard to distinguish. However, weeds should be removed as soon as possible and in every growth stage. Next, the accuracy of the system should be as close as possible to 100%; realistically, an accuracy of at least 95% should be achieved. The system should not recognize a non-weed as a weed, because this will lead to harm to that plant or destruction of its value. Lastly, based on constraints on both training/testing and possible implementation, the neural network should be as efficient and compact as possible, so that it can classify plant images in real time. The following gives a rough estimate of the upper bound for the processing time. At a speed of 3.6 km/h (1 m/s), with an image processed every meter and at most two cameras used for detection, up to two images arrive per second, so the upper bound of the processing time is 500 milliseconds per image. If the system performs the classification more quickly, then the frequency of taking pictures could be increased, the movement speed could be increased, or both.
Moreover, farming equipment is getting increasingly expensive and is therefore prey to theft. The design should minimize the attractiveness of stealing the system. This yields the following concrete list of system requirements:
- The system should be flexible in what is considered a weed.
- The system should be able to distinguish the following types of weeds: Atriplex, Shepherd's purse, Redshank, Chickweed, Red Dead-Nettle, Goosefoot, Creeping Thistle and Bitter Dock.
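The processing-time bound above can be reproduced with a few lines of Python. This is only a sketch: the function name `max_processing_time_ms` is hypothetical, and it assumes that every camera takes one image per stretch of `metres_per_image` and that all images are classified one after another on a single processor.

```python
def max_processing_time_ms(speed_kmh: float, metres_per_image: float, cameras: int) -> float:
    """Upper bound on the classification time per image, in milliseconds.

    Assumption (not stated explicitly in the text): all cameras share one
    processor, so the time between two shots is divided over the cameras.
    """
    speed_m_per_s = speed_kmh / 3.6                      # 3.6 km/h equals 1 m/s
    seconds_between_shots = metres_per_image / speed_m_per_s
    return seconds_between_shots / cameras * 1000.0


# Values from the requirements: 3.6 km/h, one image per meter, two cameras.
print(max_processing_time_ms(3.6, 1.0, 2))  # approximately 500 ms per image
```

Doubling the driving speed or the number of cameras halves the available time per image, which is why a compact network is listed as a requirement.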
- The system should be able to recognize multiple plants in one image.
- Non-weeds should be recognized in a mature growing stage, whereas weeds should be recognized in all different growing stages.
- The classification accuracy of weeds versus non-weeds is preferably above 95%.
- The system should ideally produce no false positive classifications.
- The system should be able to work under varying lighting conditions, but under the restriction of daytime.
- Preferably the system should work well under varying weather conditions, such as heat and rain.
- The processing time of a single image should be real-time, that is in under 500 milliseconds.
- The position of the weed should be known in the image.
- The robot should be as unattractive a target for theft as possible.
- The farmer should not have to worry about the system; data acquisition and validation should take only a little of the farmer’s time.
Costs and benefits
An important aspect regarding the user is what advantages the weeding robot will bring. The most obvious possible advantage is profit. Therefore, an estimation of the costs of the weeding robot per year was drawn up. In this estimation, the following aspects were considered:
- Production costs
- Research costs
- Costs of the production materials
- Packaging costs
- Maintenance costs
- The employee costs of the training required to work with the robot, where both the costs for the training givers (CSSF) and the receivers (the farmers) are considered
- The costs of the training itself
- Error costs of the machine: the costs made by the robot by accidentally destroying crops
- Internet costs required for the robot to function
- Energy costs of the robot
The actual costs may differ considerably from this estimation, since the production of the robot has not started yet, the design is not finished, and the research is still in an early phase. However, the estimation is still useful for gaining a general idea of the costs. Furthermore, several kinds of costs were not taken into account, such as employee cars, office rent, remaining employee costs, production space rent, and possibly other indirect costs. This is for a reason: the costs of the robot will probably decrease after the first production phase, roughly in proportion to these additional costs. This consideration was discussed with CSSF, who have already given these costs more attention. In addition, including these costs would probably make the estimation a lot less accurate, since they are very hard to estimate at such an early stage.
Of course, it is impossible to know what the costs of the robot would mean for the farmer if it cannot be compared. Therefore, the costs that traditional weeding imposes per year were calculated as well. For this calculation, the following aspects were considered:
- Employee costs of the farmer to weed
- Employee costs of additional workers to weed
- Machines required for weeding
- Additional tools required for weeding
- Costs of mechanical damage to wanted plants
From the comparison between the costs of the weeding robot and the costs of traditional weeding it turned out that the weeding robot will not be profitable to the farmer. This does not necessarily mean that the weeding robot will not be beneficial to the farmer, though. The costs of the weeding robot were calculated to be around €250,000 per year, as opposed to €50,000 per year for traditional weeding. This calculation was based on the farm of John Heesakkers, with an estimated 5 robots required for a farm of that size. A more detailed calculation can be found in the appendix. This means a cost ratio of 1:5 for traditional weeding versus weeding with the robot. There are still opportunities for the future of the robot, though.
As CSSF also mentioned, the costs of the robot will probably decline after the first production phase. Production can be made more efficient, and as happens with all new technologies, costs will decline after a while. Of course, this still means that the robot is very expensive in the beginning phase. This could be solved by subsidizing the robot. This is plausible, since the robot would support the cause of agroforestry, which is better for agriculture, as explained before. It is widely recognized that agriculture needs to change, given the high demand for food and the limited space, especially in the Netherlands. Agroforestry seems promising for these issues, and thus a robot that enables farmers to convert to agroforestry would be likely to receive subsidies.
In addition, a large part of the difference between the costs of the robot and the costs of traditional weeding is caused by the current error rate of the weeding robot. Using the performance of the current system, the percentage of wanted plants that will be damaged by the robot is estimated to be at least 2%. Since John Heesakkers indicated that the mechanical damage of traditional weeding is very limited, the damage percentage of traditional weeding is estimated at 0.5%. This provides opportunities for cost reduction of the robot. If the performance of the weed recognition system were further improved such that the costs of damage to plants were equal to or even lower than those in traditional weeding, the weeding robot could become profitable. Ways to improve the system are further elaborated below in the chapter on “Further research and developments”.
From this, it can be concluded that while the robot may not necessarily be profitable for farmers at first compared to traditional weeding, there are still opportunities. In the first years, the difference in the costs and profits can be overcome using subsidies. In that period, attention should be given to a business plan which enables reduction of costs of the robots, due to new developments in technology, reduction in material costs, and good marketing such that R&D costs can be divided over a larger number of products.
There are also other motivations that could play a role for farmers in the purchase of a weeding robot. While profit seems most obvious, it is not necessarily the main motive. As CSSF explained, for many farmers it is acceptable if the robot costs more than traditional weeding, if that means they can contribute to the development in this way. Farmer John Heesakkers explained his scenario: “Weeding by hand is not pleasurable work to do. Also, I have to search for workers in the months that the weeds grow really fast, because I cannot keep up with the work on my own anymore.”
Agroforestry is a relatively new form of farming. As explained earlier, it might prove very effective in different ways: it is better for the soil since the issue of soil depletion is reduced, the harvest is less vulnerable to pests and diseases, and it improves biodiversity. However, to make this way of agriculture possible, new developments are required. Weeding becomes more difficult, since multiple kinds of crops are growing close to each other. Where it used to be enough to distinguish the only crop on the soil from weeds, weeding now takes more knowledge, since multiple kinds of plants need to be distinguished. In addition, it is more difficult to weed between the rows of planted crops, since the rows are not necessarily straight, and sometimes even overlap. Farmer John explained: "In the months of spring, when the weeds grow really fast, it would take two people weeding full-time to keep up with the weeds". Weeding is quite heavy work, and in agroforestry it even requires quite some knowledge of plants. It is difficult to find people capable and willing to do this work.
The weeding robot can take tiresome work off the hands of the farmer, and give the farmer time to spend on other things. In addition, the robot provides stability: it will no longer be a problem if weeds start growing faster. The farmer does not need to concern himself with hiring extra employees, or with instructing and supervising them.
Relating results to user needs
The end result of the project is a computer vision tool with a certain accuracy and confusion matrix. From the confusion matrix follow the number of true positives, false positives, true negatives and false negatives. To determine whether these values are acceptable, the user’s needs are of importance. The project is focused on two users, CSSF and John Heesakkers. The needs of the two users overlap but are not the same. Therefore both needs are discussed separately.
For John Heesakkers, running his farm efficiently is important. Ideally, the weeding robot removes all weeds and does not damage the crops. This means a network with an accuracy of 100% and zero false positives or negatives. Since the model will have false positives and negatives, it is important to define the boundaries of these numbers. These boundaries are indicated by Mr. Heesakkers. The robot has to remove 80% of all the weeds in one try. He sets the maximum share of damaged crops at 2-3% of all crops. Both percentages are determined intuitively. The percentages indicate that not damaging crops has a higher priority than removing weeds. It is therefore more important to reduce the number of false positives than to increase the accuracy.
Training the model with two categories means creating a network that distinguishes between weeds and non-weeds. All crop data is merged into one category and the same goes for the weed data. The model has to give a positive result when it detects a weed and a negative result when it detects a non-weed. The number of false positives gives an indication of the percentage of crops that will be damaged by the robot. The percentage of damaged crops has to be lower than 3% in total, so the percentage of false positives per pass has to be well below 3%: the robot will pass the crops multiple times while weeding, and if it damages 3% of the crops each time, the total percentage will be much higher. For example, damaging 0.5% of the crops each time the robot passes all plants means roughly 3% will be damaged after six passes. Weeds can grow from February through October. Weeding every other week means the robot will pass the crops 18 times. In order to damage only 3% in total, at most 0.17% can be damaged each time the robot passes all plants. The percentage will be even lower if errors in other parts of the robot are taken into account. The accuracy gives an indication of the weeds that are not removed. The robot has to remove 80% of all weeds in one try. The accuracy has to be significantly higher than 80% because of errors in other parts of the robot. Besides detecting the weed, its position has to be determined and the weed actually has to be removed. Each step has a certain error, which can cause a weed not to be removed or a crop to be accidentally damaged. Since these steps are not worked out yet, an assumption has to be made. The weeding robot of Raja et al. (2020) had a gap of 15% between software accuracy and actual accuracy. Using this as a guideline, the accuracy has to be at least 95%.
Training the model with eleven categories means creating a network that recognizes types of plants. It can recognize the different types of crops and weeds. Even though the farmer is only interested in the distinction between weeds and non-weeds, working with multiple categories is useful. It shows the capabilities of the network. Also, the software is useful for a different farmer who wants to keep certain weeds. The ratio of false positives for non-weeds has to be extracted from the network. This ratio is used to give an indication of the percentage of damaged crops. The accuracy of this network can be determined in two different ways. The overall accuracy over all categories says something about the ability to recognize a plant species. The accuracy over only the weed species indicates the ability to recognize weeds. It is difficult to determine the exact accuracy of detecting weeds from the overall accuracy alone, because the degree of influence of the non-weeds is unknown. However, the overall accuracy is related to the accuracy of weed detection and therefore has to be above 95%.
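The per-pass budget and the accuracy target derived above can be checked with a short calculation. The sketch below uses hypothetical function names. The linear version is the approximation used in the text; the compound version is an added assumption that each pass damages the same fraction of the crops that are still undamaged.

```python
def per_pass_budget_linear(total_damage: float, passes: int) -> float:
    """Per-pass damage fraction when the total is simply divided over the passes."""
    return total_damage / passes


def per_pass_budget_compound(total_damage: float, passes: int) -> float:
    """Per-pass damage fraction if each pass damages a share of the remaining crops."""
    return 1.0 - (1.0 - total_damage) ** (1.0 / passes)


def required_software_accuracy(target_removal: float, pipeline_gap: float) -> float:
    """Software accuracy needed when later steps lose `pipeline_gap` of it."""
    return target_removal + pipeline_gap


# Figures from the text: 3% total damage, 18 passes, 80% removal, 15% gap.
print(round(per_pass_budget_linear(0.03, 18) * 100, 2))    # 0.17 (percent per pass)
print(round(per_pass_budget_compound(0.03, 18) * 100, 2))  # 0.17 (percent per pass)
print(round(required_software_accuracy(0.80, 0.15), 2))    # 0.95
```

For budgets this small the two approaches agree to two decimals, so the simpler division used in the text is adequate.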
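To illustrate how the false positive ratio for non-weeds could be extracted from a multi-category network, the sketch below collapses a multi-class confusion matrix into weed versus non-weed counts. The function name, the three example classes and all counts are invented for illustration only and are not results of this project.

```python
def collapse_to_binary(confusion, labels, weed_labels):
    """Collapse a multi-class confusion matrix into (tp, fp, tn, fn).

    confusion[i][j] is the number of images of true class labels[i]
    that were predicted as class labels[j]. "Positive" means weed.
    """
    tp = fp = tn = fn = 0
    for i, true_label in enumerate(labels):
        for j, predicted_label in enumerate(labels):
            count = confusion[i][j]
            true_is_weed = true_label in weed_labels
            predicted_is_weed = predicted_label in weed_labels
            if true_is_weed and predicted_is_weed:
                tp += count   # weed recognised as (some) weed
            elif predicted_is_weed:
                fp += count   # crop mistaken for a weed: crop gets damaged
            elif true_is_weed:
                fn += count   # weed mistaken for a crop: weed stays
            else:
                tn += count   # crop recognised as (some) crop
    return tp, fp, tn, fn


# Illustrative numbers only (100 test images per class).
labels = ["apple_tree", "chickweed", "redshank"]
weeds = {"chickweed", "redshank"}
confusion = [
    [95, 3, 2],   # true apple_tree
    [4, 90, 6],   # true chickweed
    [1, 9, 90],   # true redshank
]

tp, fp, tn, fn = collapse_to_binary(confusion, labels, weeds)
false_positive_rate = fp / (fp + tn)   # share of crops flagged as weed
weed_recall = tp / (tp + fn)           # share of weeds flagged as weed
print(tp, fp, tn, fn)                  # 195 5 95 5
print(false_positive_rate, weed_recall)
```

In this made-up example, confusing chickweed with redshank lowers the overall multi-class accuracy but does not count against the weed versus non-weed decision, which is why the two accuracies discussed above can differ.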
CSSF does not indicate desired percentages like Mr. Heesakkers. They are interested in a proof of concept. Instead of reaching a specific accuracy or percentage of false positives, gaining technical insights about the network is important to them. Their goal is to create an autonomous weeding and harvesting robot with the help of multiple projects of different scales. This robot should work with a network which has a high accuracy, has close to no false positives and gives a negative result in case of doubt. This project is a step towards the goal of CSSF. If the results do not reach the desired goal of Mr. Heesakkers, the project is still useful to CSSF, because it provides insights on what to do and what not to do in following projects. The final report, including the proof of concept and the steps that have to be taken to create the final network, is the result that meets the needs of CSSF.
Approach and Milestones
The main challenge is the ability to distinguish undesired (weeds) and desired (crops) plants. Previous attempts (Su et al., 2020; Raja et al., 2020) have utilised chemicals to mark plants as a measurable classification method, and other attempts only try to distinguish a single type of crop. In sustainable farming based on biodiversity, a large variety of crops are grown at the same time, meaning that it is extremely important for automatic weed detection software to be able to recognise many different crops as well. To achieve this, the first main objective is collecting data, which determines which plants can be recognised. The data should be colour images of many species of plants, of the highest possible quality, meaning that they should be of high resolution, in focus and with good lighting. Species that do not have enough images will be removed. Using the gathered data, the next main objective will be training and testing neural networks with varying architectures. The architectures can range from very simple networks with one hidden layer to pre-existing networks, such as ResNet (He et al., 2015), trained on datasets such as ImageNet (Russakovsky et al., 2015). Then, weeds will be defined as a species of plant that is not desired, or not recognised. Based on this, the final objective will be testing the best neural network(s) using new images from a farm, to see its accuracy in a real environment.
To summarize:
- Images of plants will be collected for training.
- Neural networks will be trained to recognise plants and weeds.
- The best neural networks will be tested in real situations.
Deliverables
The main deliverable will be a neural network that is trained to distinguish desired plants and undesired plants on a diverse farm, that is as accurate as possible, and can recognise as many different species as possible. The performance of this neural network, as well as the explored architectures and encountered problems will be described in this wiki, which is the second part of the deliverables.
Planning
End of week 1:
Milestone | Responsible | Done |
---|---|---|
Form a group | Everyone | Yes |
Choose a subject | Everyone | Yes |
Make a plan | Everyone | Yes |
End of week 2:
Milestone | Responsible | Done |
---|---|---|
Improve user section | Hilde | Yes |
Specify requirements | Tom | Yes |
Make an informed choice for model | Leighton | Yes |
Read up on (Python) neural networks | Everyone | Yes |
End of week 3:
Milestone | Responsible | Done |
---|---|---|
Set up a collaborative development environment | Timon | Yes |
Have a training dataset | Karla | Yes |
End of week 4:
Milestone | Responsible | Done |
---|---|---|
Implement basic neural network structure | Timon, Leighton | Yes |
Justify all design choices on the wiki | Tom | Yes |
End of week 5:
Milestone | Responsible | Done |
---|---|---|
Implement a working neural network | Timon, Leighton | Yes |
End of week 6:
Milestone | Responsible | Done |
---|---|---|
Explain our process of tweaking the hyperparameters | Timon | Yes |
End of week 7:
Milestone | Responsible | Done |
---|---|---|
Finish tweaking the hyperparameters and collect results | Timon, Leighton, Tom | Yes |
Finding the costs and benefits of the weeding robot | Hilde | Yes |
End of week 8:
Milestone | Responsible | Done |
---|---|---|
Create the final presentation | Everyone | Yes |
Hand in peer review | Everyone | Yes |
Translating the results and costs to the user needs | Hilde and Karla | Yes |
Determining future research | Hilde and Karla | Yes |
Writing prototype section | Leighton, Timon, Tom | Yes |
Week 9:
Milestone | Responsible | Done |
---|---|---|
Do the final presentation | Everyone | Yes |
State of the art
This section summarises the literature research done on the subject of the project. The main conclusions are as follows. In most existing cases, the camera observes the plants from above. This will be difficult when there are also trees; three-dimensional images could be a solution. Secondly, lighting has a big influence on the functioning of weed recognition software, which has to be taken into account when working on the project. A solution could be turning the images into binary black-and-white pictures. Also, there are already many neural networks that can make the distinction between weeds and crops, and they are used in practice. However, all of these applications are in monoculture agriculture, whereas the challenge of agroforestry is the combination of multiple crops. Another conclusion is that the resolution of the camera has to be high enough, since it has a large impact on the accuracy of the system. In most cases an RGB camera is used, since a hyperspectral camera is very expensive and RGB images are sufficient to work with. Regarding datasets, most studies mention the problem of obtaining a sufficient dataset for training the neural network, which slows down the process of improving weed recognition software. Lastly, recognition can be based on color, shape, texture, feature extraction or 3D images, so there are many options to choose from for this project.
A weed is a plant that is unwanted at the place where it grows. This is a rather broad definition, though, and therefore Perrins et al. (1992) looked into what plants are regarded as weeds among 56 scientists. Again, it was discovered that views greatly differed among the scientists. Therefore it is not possible to clearly classify plants into weeds or non-weeds, since it depends on the views of a person, and the context of the plant.
Hemming et al. (2013) and Hemming et al. (2018) have written research reports about a working system using weed detection. The first study works with three crops: onions, carrots and spinach. It shows that recognition based on color requires less computational power than recognition based on shape, although in certain cases shape recognition is necessary. It is important that the signal of the crop predominates compared to the weeds. For proper detection of an object, a minimum image resolution of 3 times the size of the object is required (based on Shannon's sampling theorem). The second study works with color recognition. The HSI color space is used to convert the color observed by the camera into usable input for the software. The robot has a user interface so the user can help the neural network to learn the color of the plant: the user can determine the range of colors within which the plant’s colors fall. This way the software becomes broadly applicable. Two interactive color segmentation algorithms are evaluated: the GrabCut algorithm and the FloodFill algorithm. Both algorithms fail due to the effect of shadows and multiple colors on a plant. The study shows that the settings of saturation and intensity are very important. To realize a working system, hardware is introduced and software is added to determine the relative positioning of the plant. In both studies, the camera observed the plant from above.
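The user-defined color range described above can be illustrated with a minimal sketch. It is not the implementation of Hemming et al.: Python's standard `colorsys` module offers HSV rather than HSI, so HSV is swapped in here, and the function name `in_plant_range` and the bounds in `GREEN` are hypothetical values chosen only for illustration.

```python
import colorsys

def in_plant_range(rgb, hue_range, min_saturation, min_value):
    """Return True if an RGB pixel (0-255 per channel) lies in the chosen color range.

    Assumption: HSV is used in place of HSI, because colorsys only provides HSV.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_degrees = h * 360.0
    low, high = hue_range
    return low <= hue_degrees <= high and s >= min_saturation and v >= min_value

# Hypothetical user-selected bounds for green vegetation.
GREEN = dict(hue_range=(70.0, 170.0), min_saturation=0.25, min_value=0.15)

leaf_pixel = (60, 160, 50)    # a leaf-like green
soil_pixel = (120, 90, 60)    # a soil-like brown
print(in_plant_range(leaf_pixel, **GREEN))
print(in_plant_range(soil_pixel, **GREEN))
```

The saturation and value bounds matter as much as the hue bounds: shadows lower the value and washed-out highlights lower the saturation, which is in line with the observation that the saturation and intensity settings are very important.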
Potato blackleg is a bacterial disease that can occur in potato plants that causes decay of the plant, and may spread to neighbouring plants if the diseased plant is not taken away. So far, only systems have been devised that were able to detect the disease after harvesting the plants. Afonso et al. (2019) created a system that had a 95% precision rate in detection of healthy and diseased potato plants. This system consisted of a deep learning algorithm, which used a neural network trained by a dataset of 532 labelled images. There is a downside to the system, however, since it was devised, and trained, to detect plants that were separate and do not overlap. In most scenarios, this is not the case. Further developments need to be made to be able to use the system in all scenarios. In addition, it proved to be difficult to gain enough labelled images of the plants.
Most weed recognition and detection systems designed up to now are designed for a single purpose or context. Plants are generally considered weeds when they either compete with the crops or are harmful to livestock. Weeds are traditionally mostly battled using pesticides, but this diminishes the quality of the crops. The broad-leaved dock is one of the most common grassland weeds, and Kounalakis et al. (2018) aim to create a general weed recognition system for this weed. The system relied on images and feature extraction, instead of the classical choice for neural networks. It had an 89% accuracy.
Salman et al. (2017) researched a method to classify plants based on 15 features of their leaves. This yielded an 85% accuracy for classification of 22 species with a training dataset of 660 images. The algorithm was based on feature extraction, with the help of the Canny edge detector and an SVM classifier.
Li et al. (2020) have compared multiple convolutional neural networks for recognizing crop pests. The used dataset consisted of 5629 images and was manually collected. They found that GoogLeNet outperformed VGG-16, VGG-19, ResNet50 and ResNet152 in terms of accuracy, robustness and model complexity. As input RGB images were used and in the future infrared images are also an option.
Riehle et al. (2020) give a novel algorithm that can be used for plant/background segmentation in RGB images, which is a key component in digital image analysis dealing with plants. The algorithm has shown to work in spite of over- or underexposure of the camera, as well as with varying colours of the crops and background. The algorithm is index-based, and has shown to be more accurate and robust than other index-based approaches. The algorithm has an accuracy of 97.4% and was tested with 200 images.
Dos Santos Ferreira et al. (2017) created data by taking pictures with a drone at a height of 4 meters above ground level. The approach used convolutional neural networks and achieved high accuracy in discriminating different types of weeds. In comparison to traditional neural networks and support vector machines, deep learning has the key advantage that feature extraction is learned automatically from raw data, so it requires little manual effort. Convolutional neural networks have been proven to be successful in image recognition. For image segmentation the simple linear iterative clustering algorithm (SLIC) is used, which is based upon the k-means centroid-based clustering algorithm. The goal was to separate the image into segments that contain multiple leaves of soy or weeds. It is important that the pictures have a high resolution of 4000 by 3000 pixels. Segmentation was significantly influenced by lighting conditions. The convolutional neural network consists of 8 layers: 5 convolutional layers and 3 fully connected layers. The last layer uses SoftMax to produce the probability distribution. ReLU was used for the output of the fully connected layers and the convolutional layers. The classification of the segments was done with high robustness and had superior results to other approaches such as random forests and support vector machines. If a threshold of 0.98 is set, then 96.3% of the images are classified correctly and none receive an incorrect identification.
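The effect of such a threshold can be sketched as follows. This is a minimal illustration of softmax followed by a confidence threshold, assuming the network outputs one raw score per class; it is not the code of Dos Santos Ferreira et al. (2017), and the label names and the function `classify_with_threshold` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    highest = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_threshold(scores, labels, threshold=0.98):
    """Return (label, probability), or (None, probability) when the network is in doubt."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], probs[best]
    return None, probs[best]                   # abstain instead of risking a wrong label

# Hypothetical class names and raw network outputs.
LABELS = ["soil", "soybean", "grass weed", "broadleaf weed"]
print(classify_with_threshold([10.0, 0.0, 0.0, 0.0], LABELS))  # confident prediction
print(classify_with_threshold([1.0, 0.8, 0.1, 0.0], LABELS))   # doubtful, so no label
```

Images for which no class reaches the threshold stay unlabelled, which trades a lower share of classified images for fewer wrong identifications.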
Yu et al. (2019) argued that deep convolutional neural networks (DCNNs) take much time in training (hours), and little time in classification (under a second). The researchers compared different existing DCNNs for weed detection in perennial ryegrass, and for discrimination between different weeds as well. Due to the recency of the paper and the comparison across different approaches, it is a good estimate of the current state of the art. The best results seem to be > 0.98. The study also shows weed detection in perennial ryegrass, so not in perfectly aligned crops. However, only the distinction between ryegrass and weeds is made. For robotics applications in agroforestry, different plants should be discriminated from different weeds.
Wu et al. (2020) try to improve the functioning of vision-based weed control, and do this by taking a slower approach to visual processing and decision-making. Multiple overhead cameras are used, which are not suited for all types of crops. However, 3D vision is used, so the camera position might be modifiable. It should be noted that the tests were done using sugar beets, which are easy to recognize on camera.
Piron et al. (2011) suggest that there are two different types of problems. The first is characterized by detection of weeds between rows or, more generally, between structurally placed crops. The second is characterized by random positions. Computer vision has led to successful discrimination between weeds and rows of crops. Knowing where, and in which patterns, crops are expected to grow and assuming everything outside that region is a weed has proven to be successful. This study has shown that plant height is a discriminating factor between crop and weed at early growth stages, since the growth speeds of these plants differ. An approach with three-dimensional images is used to facilitate this. The classification is by far not robust enough, but the study shows that plant height is a key feature. The researchers also suggest that camera position and ground irregularities influence classification accuracy negatively.
Weeds have particular features, among which are a fast growth rate, a greater growth increment and competition for resources such as water, fertilizer and space. These features are harmful to crop growth. Many line detection algorithms use Hough transformations and the perspective method. The robustness of Hough transformations is high. The problem with the perspective method is that it cannot accurately calculate the position of the lines for the crops on the sides of an image. Tang et al. (2016) propose to combine the vertical projection method and the linear scanning method to reduce the shortcomings of other approaches. It is roughly based upon transforming the pictures into binary black-and-white pictures to control for different illumination conditions, and then drawing a line between the bottom and top of the image such that the number of white pixels on the line is maximized. In contrast to other methods, this method is real-time and its accuracy is relatively high.
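The linear scanning idea can be sketched with a brute-force search. This is only an illustration of the principle described above, assuming the image has already been converted to a binary grid (1 for white plant pixels, 0 for background); it is not the algorithm of Tang et al. (2016), and the function names are hypothetical.

```python
def count_white_on_line(image, x_top, x_bottom):
    """Count white pixels on the straight line from (x_top, first row) to (x_bottom, last row)."""
    height = len(image)
    count = 0
    for y in range(height):
        t = y / (height - 1) if height > 1 else 0.0
        x = round(x_top + t * (x_bottom - x_top))   # column of the line in row y
        count += image[y][x]
    return count

def best_crop_line(image):
    """Try every top/bottom column pair and return (score, x_top, x_bottom) of the best line."""
    width = len(image[0])
    best = (-1, 0, 0)
    for x_top in range(width):
        for x_bottom in range(width):
            score = count_white_on_line(image, x_top, x_bottom)
            if score > best[0]:
                best = (score, x_top, x_bottom)
    return best

# Toy binary image: a slanted crop row plus one stray white pixel (for example a weed).
image = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
score, x_top, x_bottom = best_crop_line(image)
print(score, x_top, x_bottom)
```

The search is quadratic in the image width, which is acceptable for a sketch; a real-time system would restrict the candidate columns, for instance to the peaks of a vertical projection of the white pixels.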
Gašparović et al. (2020) discuss the use of unmanned aerial vehicles (UAV) to acquire spatial data which can be used to locate weeds. In this paper four classification algorithms are tested, based on the random forest machine learning algorithm. The automatic object-based classification method achieved the highest classification accuracy. Belgiu et al. (2016) have shown that the random forest machine learning algorithm is the best algorithm for the automation of classification, as it requires very few parameters. Random forest algorithms were proposed by Breiman (2001).
Espejo-Garcia et al. (2020) deal with weed classification through transfer learning, where pre-trained convolutional neural networks (Xception, Inception-Resnet, VGGNets, Mobilenet and Densenet) are combined with more "traditional" machine learning methods for classification (Support Vector Machines, XGBoost and Logistic Regression), in order to avoid overfitting and provide consistent and robust performance. This yields some impressively accurate classification algorithms, the most accurate being a combination of a fine-tuned Densenet and a Support Vector Machine.
Different approaches exist: machine vision methods and spectroscopic methods (utilizing spectral reflectance or absorbance patterns). With spectroscopic methods, features such as water content, moisture or humidity can be measured. Field studies have shown that weeds and agricultural crops can be distinguished based on their relative spectral reflectance characteristics. Alchanatis et al. (2005) propose an image processing algorithm based on image texture to discriminate weeds from cotton. They used hyperspectral images to perform basic segmentation between crop and soil. The authors used a robust statistics algorithm yielding an average false alarm rate of 15%. This is worse than newer existing options.
Booij et al. (2020) researched autonomous robots that can combat unwanted potato plants. Previous robots could not distinguish between potato and beetroot plants. Using deep learning, this has now succeeded with a 96% success rate. A robot was developed which drives over the land and takes pictures, which are sent to a KPN cloud through 5G. The pictures are then analyzed by the deep learning algorithm, and the result is sent back to the robot. This deep learning algorithm was trained with a dataset of about 5500 labelled pictures of potato and sugar beet plants. Next, the robot combats the plants that have been detected as unwanted potato plants using a spraying unit, which is instructed by the system. This development is already a big step forward, but the fault rate is still too large for the system to be put into practice.
Raja et al. (2020) created a vision and control system that was able to remove most weeds from an area, without explained visual features of crops and weeds. It achieved a crop detection accuracy of 97.8%, and was able to remove 83% of weeds around plants. This seemed to be in a very controlled setting, however, and still works with mainly simple farms.
Su et al. (2020) and Raja et al. (2020) investigated the use of specific signaling compounds to mark desired plants, so that weeds can be removed. This created a way of marking the plants with a "machine-readable signal". This could thus be used for automatic classification of plants. According to one of the studies, an accuracy of at least 98% was achieved for detecting weeds and crops. Further work still needs to be done to get this method practically functioning.
Herck et al. (2020) describe how they modified farm environment and design to best suit a robot harvester. This took into account what kind of harvesting is possible for a robot, and what is possible for different crops, and then tried to determine how the robot could best do its job.
Design
This section elaborates on the design, which consists of multiple parts. The goal is to create detection software for a robot that is capable of weeding, thus some more information on the robot is needed to determine the design. Moreover, this section will elaborate on the type of solution that will be implemented in the software.
Robot
The software will depend on the robot design, therefore this section briefly elaborates on such a design. To keep costs as low as possible it is wise to create robots that can do multiple tasks, thus not only weeding. It will be assumed that such a general-purpose robot possesses an RGB camera to take pictures and/or videos. It is also assumed that the robot will take the form of a land vehicle instead of an unmanned aerial vehicle (UAV), because a UAV would not be appropriate for this specific context. In a later stage of the farm, the trees would pose serious restrictions on the flying path: they form obstacles which the drone would have to avoid. Moreover, weeds and bushes grow on the ground, so a tree could block the line of sight to such weeds. The UAV might therefore have to adapt its flying height constantly, which would yield inconsistency in the gathered data and presumably affect the classification accuracy negatively. For these reasons it has been decided to focus on a land vehicle. This distinction is important as it influences the type of data that is gathered, and thus the type of data the software should be designed for: the data consists of pictures taken from the side, slightly above the ground. Moreover, such a robot would need a particular speed to cover the farm by itself. Weeds grow, and within 2 or 3 days they are clearly visible and easily removable, so the robot should be able to cover the farm in under 2 days. Because weeding will be only one of the tasks of such a robot, it is important that the classification can be done quickly. Another important factor which influences the available time is lighting. For now it is assumed that the robot can only work in natural light, so in daylight, the duration of which differs according to the part of the world and the time of year.
All these factors combined argue for the need for quick identification.
To adhere to the requirement that the robot should be a target of theft as little as possible, it should be possible to store it away when it is not working. In addition, the value of the robot should be minimized whenever possible. Hardware necessary to do image processing will be rather expensive, therefore it is more convenient to process the images off the vehicle, for example by cloud computation. This would also minimize the maintenance and power needs of the robot. On the other hand, it does require a stable and fast Internet connection; with the arrival of 5G this should be possible.
Software
For image classification multiple approaches exist. This section elaborates on the choice for the specific approach to be implemented. From the section above it is clear that the implementation should have relatively quick identification times, and for which type of data it should work. Moreover, as stated in the requirements section, the goal is to achieve the highest possible classification accuracy. There are different approaches to such a computer vision task. Approaches aimed particularly at plant or weed recognition include: support vector machines (SVM), random forests, AdaBoost, neural networks (NN), convolutional neural networks (CNN) and convolutional neural networks using transfer learning.
It is worth noting that not all of these classifiers were trained using the actual input image. Some researchers choose to first segment the image into different regions and feed those segments to the classifier. Dos Santos Ferreira et al. (2017) used SLIC for segmentation of images, which is based upon the k-means algorithm. More recently, Riehle et al. (2020) were able to distinguish plants from the background with 98% accuracy using segmentation. The importance of segmentation is that it allows the position of the weed to be derived, which is of course crucial if the weed has to be removed.
Kounalakis et al. (2018) achieved 89% classification accuracy with the SVM approach to recognize weeds and Salman et al. (2017) achieved an 85% accuracy using the same approach for leaf classification and identification. Gašparović et al. (2020) have achieved an 89% accuracy recognizing weeds using the random forests approach. Notable is that the researchers have implemented 4 different algorithms for the random forests approach and that the accuracy result is from the best implementation. Tang et al. (2016) found an accuracy of 89% for an ordinary neural network with the backpropagation algorithm. Li et al. (2020) achieved an accuracy of 98% recognizing crop pests using a CNN. Yu et al. (2019) found an accuracy larger than 98% recognizing weeds in perennial ryegrass using a CNN. Espejo-Garcia et al. (2020) used a CNN with transfer learning and evaluated different models. Taking the best model (with a SVM for transfer learning) they achieved a 99% accuracy. Comparing these numbers it is clear that the CNN generally achieves the best result. However, it must be taken into account that these classifiers have all been trained on different datasets and therefore comparing these numbers cannot fully argue for which approach is actually the best.
Dos Santos Ferreira et al. (2017) compared their CNN to an SVM, AdaBoost and a random forest. The CNN outperformed the other approaches in terms of classification accuracy. Since all approaches were tested on the same dataset, we can argue that CNNs seem most appropriate to achieve a high classification accuracy. In this particular context, false positives weigh more heavily than false negatives in weed identification, because a false negative can be corrected when the robot passes the same plant again. However, removing a crop because it was falsely identified as a weed could have larger negative effects, especially if the robot passes the crops relatively frequently. Dos Santos Ferreira et al. (2017) also found an important property of their CNN: when setting a threshold for classification, they were able to achieve a 96.3% accuracy with no false positives. The researchers also noted that using deep neural networks removes the tedious task of feature extraction, because the features are learned automatically from the raw data. This might enlarge the CNN’s generalizability. To further argue for the use of a convolutional neural network, two other factors should be evaluated, namely the time taken for classification and the ability to use this approach on land vehicles. Yu et al. (2019) state that deep convolutional neural networks (DCNN) take much time in training (hours), whereas classification takes little time (under a second). Booij et al. (2020) made a driving robot that identified plants with 96% accuracy and could drive up to 4 km/h. Notably, the researchers were able to use 5G and cloud computing, which might be crucial for real-time identification. Moreover, Raja et al. (2020) have made a weeding robot with a crop detection accuracy of 97.8%. The land vehicle was able to move at a speed of up to 3.2 km/h.
There is still quite a gap between this detection accuracy and the 83% of weeds removed in the controlled setting where it was tested. Nevertheless, these studies confirm the feasibility of a land vehicle. Lastly, there is implicit evidence that a CNN is suitable for agriculture, since the studies named above all focus on agriculture. It is also argued explicitly that CNNs have proven to deliver good results in precision agriculture for identifying plants (Espejo-Garcia et al., 2020).
Creating the dataset
One of the main obstacles of creating a functioning network is the dataset which is used for training. The dataset has to consist of many pictures in order to reach a high accuracy. The dataset has to represent the system in which the robot will be operating. This means the plants on the pictures have to look the same as the plants on the farm. Obtaining these pictures of the specific plants on a farm online is not easy. Most datasets are owned by companies and not shared. The solution to this problem would be obtaining the pictures ourselves. This will be done in cooperation with the farmer.
The dataset which is used in this project has three downsides compared to the dataset which can be created by taking the pictures ourselves. The first downside is the number of pictures. This number is not as large as desired. Furthermore, the pictures in the current dataset do not represent the farm as well as pictures taken at the farm. The current dataset is formed with pictures found online. It is possible to create a working network without using photos from the operating environment. However, using pictures from the operating environment is preferred. The final downside is the distribution in numbers of pictures per category. The current dataset contains significantly more pictures of weeds than non-weeds. When creating the dataset, it is important to take the mentioned distribution into account.
Involving the farmer in the process of creating the dataset has multiple benefits, on the condition that clear instructions and explanations are given to the farmer. The farmer knows which plants grow on his farm and can tell which plants are unwanted weeds, so no external person is needed to identify the plants. There are other aspects in which the farmer can tailor the network to his wishes. The acceptable damage to the crops can be determined: a tradeoff has to be made between not damaging the crops and weeding all the undesired plants, and each farmer can decide to what extent one is more important than the other. Another benefit is that the farmer is introduced to the workings of the robot without the robot weeding plants. The robot will probably be new to the farmer, who might worry whether the robot will indeed only weed the unwanted plants. By starting with data collection, trust in the robot’s performance can be built.
To create the dataset, the robot has to be able to ride around the crops and take pictures. In the beginning the robot will not be able to recognize weeds or crops. It will simply take pictures of the plants. A benefit of using pictures taken by the robot is that the angle in which the photos are taken will be the same during training and operating. The photos can be uploaded to a cloud storage immediately if this is possible. This would make processing the photos while the robot is taking them possible. The robot will probably have a wireless connection, but it is not certain whether this connection is strong enough to upload many pictures. If uploading to a cloud is not possible, the robot would have to store the pictures. The pictures can then be processed after the robot has taken pictures of the plants. Uploading the pictures to a cloud storage would have as benefit that the pictures can be processed anywhere. Employees whose task it is to process the pictures can work from home. A downside is that working with cloud storage does call for extra data security. Uploading the pictures after the robot has stored them internally is also possible. This would have to be done by the farmer or automatically. The latter is preferred.
Which weed sizes are important to record in the dataset depends on how often the robot will be used by the farmer. If the robot were used every day, larger weeds would not be important, since the weeds would not have the time to grow that large. The accuracy of the network is also of influence. With a lower accuracy, the chance of not removing all the weeds in one go is larger, and weeds that are not removed have more time to grow. An assumption therefore has to be made about the maximum time a weed has to grow. It is assumed that the robot will weed the plants every week and that a weed will be removed within a maximum of three attempts. This means the sizes the weeds can reach during three weeks are important to include in the dataset, so creating the dataset will take roughly three weeks. These weeks should fall in a period in which the weeds grow well; the weather will play a part in this. Some more days should be added to include the larger sizes reached in weeks in which the weeds grow faster than normal. This means the farmer will have to let the weeds grow for about three weeks plus these extra days in order to create the dataset. It is not necessary to take pictures each day during this period.
The number of pictures that will be taken in total depends on the network, the number of plant species and the time needed to process a picture. If only the network were of influence, as many pictures as possible would be taken, since for deep learning more data is generally better. However, processing the pictures will be done manually, so the number of pictures has to be limited. The minimum number of pictures the network needs depends on the desired accuracy of the network: a high accuracy calls for a lot of data. The number of plant species is also of influence, since more species means more pictures. The literature study showed that thousands of pictures are necessary. A precise number is difficult to indicate for now. If a thousand pictures were taken each day for five days, this would give five thousand pictures. With a helpful interface to work with, the average time spent on processing one picture is assumed to be 15 seconds. This means processing five thousand pictures takes about 21 hours. Assuming an hourly wage of 12 euros, processing the data will cost 252 euros. This amount will increase when the number of pictures increases, and decrease when the processing interface is made easier and quicker. A tool to enlarge the dataset after processing is image augmentation, which is relatively fast and cheap.
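The cost estimate above can be reproduced with a short calculation. The sketch assumes that the processing time is rounded up to whole hours before the wage is applied, which matches the 21 hours and 252 euros mentioned in the text; the function name `labelling_cost` is hypothetical.

```python
import math

def labelling_cost(n_pictures, seconds_per_picture, hourly_wage_eur):
    """Return (exact hours, billed hours, cost in euros) for manual processing."""
    hours = n_pictures * seconds_per_picture / 3600
    billed_hours = math.ceil(hours)          # assumption: round up to whole hours
    return hours, billed_hours, billed_hours * hourly_wage_eur

# A thousand pictures per day for five days, 15 seconds each, 12 euros per hour.
hours, billed_hours, cost = labelling_cost(5 * 1000, 15, 12)
print(f"{hours:.2f} h, billed as {billed_hours} h, {cost} euros")
```

Changing the arguments shows how sensitive the cost is to the interface: halving the time per picture halves the number of hours, while doubling the number of species to photograph roughly doubles it.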
In order to process the pictures efficiently, a clear plan of approach has to be made. This plan will make it possible for multiple employees of many levels to process data without having to know the specifics of the farm. It will also ensure that the data is processed correctly and can be used to train the network without further adaptations. Since the robot will be used by multiple farms with different plant species, the process of creating a dataset will have to take place multiple times. With a clear plan of approach, it will be easier to have new employees working on data processing for a farm. The plan has to contain descriptions of the wanted and unwanted plants. These will be specific to each farm, and the farmer can help with these descriptions. The plan also has to indicate how to make the data ready for training. This means deleting pictures with no plants at all and sorting pictures that show a single plant species. Pictures showing multiple plant species have to be divided in some way; whether this will be done by cutting them into multiple pictures or by marking divisions within one picture is yet to be decided. The divided parts then have to be sorted. As mentioned before, a helpful interface to process the pictures will speed up the work. The plan of approach should go hand in hand with the processing software.
After the data is processed on the cloud storage, the dataset is finished and accessible for the employees who will train the network. After the first time this process of creating a dataset is executed, the results might be insufficient. In that case, the process has to be altered, which has to be taken into account when starting. After the start-up phase, the process will be fine-tuned and can be performed as desired at the following farms. The datasets and trained networks that are made can be reused if the plant species are the same. If another farm has more or different plant species, more data has to be collected. Creating datasets at multiple farms will result in a large dataset of many plant species. This could lead to a phase in which creating a new dataset is no longer necessary.
Prototype
Some parts of the design were chosen based on early results of the experiments done to make this software. In this section we will explain how these choices were made. The architecture of a model is very dependent on the task, and each variation has its own benefits and downsides. One major choice was between the use of transfer learning and a model made from scratch. Transfer learning models can converge more quickly and with less data, and have a higher performance if used for the right task, but tend to be larger and more complicated. Here a “right task” is characterized by high similarity between the data the network is pre-trained on and the target data (the data you would like to predict). Simple models can take more time and data to converge, but are smaller and more flexible. In some early performance experiments, a simple model performed similarly to, if not better than, the transfer learning models. Deep learning requires a lot of data to train a model correctly (the ImageNet database consists of over one million pictures). The amount of data sufficient for training depends on the type of data and model, but generally at least a thousand images are required for computer vision tasks. Quality images are often hard to find or owned by private companies, which limits the available data significantly. Because images of weeds and crops were limited in this manner, data augmentation was applied to prevent overfitting of the neural network by increasing the number of different images.
Data
As mentioned previously, the dataset, which contains 298 images in total, is not balanced, as can be seen in Figure 2. The two bar plots show, first of all, a huge gap between the amount of data for weeds and non-weeds: approximately 92% of the images are of weeds. Moreover, the bar plot displaying the distribution of the images over classes shows a disproportionate number of Canada thistle images (approximately 29%), whereas there is a clear lack of trees and shrubbery. All other classes contain roughly equal amounts of data. The variability in size is also rather extensive: the maximal height is 4128 pixels, the maximal width is 4272 pixels, the minimal height is 84 pixels and the minimal width is 29 pixels. Figure 3 depicts the average height and width for the image classes. As one can see, there is great variability in sizes, so all the data needs to be resized to one size, regardless of the implementation. Note that some image classes generally have a larger width than height, so resizing this data to the state-of-the-art standard (a square) will leave these images somewhat squeezed horizontally.
Data pre-processing
First of all, due to the lack of data, augmentation techniques were applied. The data augmentations were selected in such a way that the end result is realistic: for example, left-right flipping was applied because plants are roughly symmetric about their vertical axis, but up-down flipping was not applied because the vertical orientation of plants is fixed by gravity. The augmentations are: left-right flipping, increased saturation, increased brightness, decreased brightness, blurring (to simulate out-of-focus plants) and center-cropping. This yields 6 augmentations in total, so the dataset size is increased by a factor of 7. Apart from data augmentation, another preprocessing technique was applied, namely resizing. The images were resized such that they became square, which is easier for the models to handle. The transfer learning networks actually require this type of input, with an upper bound on the size of 224 by 224 pixels. Since most of the data was larger than this size, the images were effectively downsized. Generally, larger images take longer to process and need better hardware, but can provide better results.
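To make the factor of 7 concrete, the sketch below applies six augmentations of the kinds listed above to a tiny image stored as nested lists of RGB tuples. It is an illustration only, not the project's pipeline: the function names, the scaling factors (1.5, 1.3, 0.7), the 3x3 box blur and the 50% crop are all assumptions, and a real implementation would use an image library rather than plain Python.

```python
import colorsys

def _clamp(value):
    """Round to the nearest integer and keep it in the 0..255 range."""
    return max(0, min(255, int(round(value))))

def flip_left_right(img):
    return [row[::-1] for row in img]

def scale_brightness(img, factor):
    return [[tuple(_clamp(c * factor) for c in px) for px in row] for row in img]

def scale_saturation(img, factor):
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            rgb = colorsys.hsv_to_rgb(h, min(1.0, s * factor), v)
            new_row.append(tuple(_clamp(c * 255) for c in rgb))
        out.append(new_row)
    return out

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        new_row = []
        for x in range(width):
            neighbours = [img[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))]
            new_row.append(tuple(
                _clamp(sum(px[c] for px in neighbours) / len(neighbours))
                for c in range(3)))
        out.append(new_row)
    return out

def center_crop(img, fraction=0.5):
    height, width = len(img), len(img[0])
    crop_h = max(1, int(height * fraction))
    crop_w = max(1, int(width * fraction))
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

def augment(img):
    """Return the original image followed by its six augmented variants."""
    return [
        img,
        flip_left_right(img),
        scale_saturation(img, 1.5),   # increased saturation
        scale_brightness(img, 1.3),   # increased brightness
        scale_brightness(img, 0.7),   # decreased brightness
        box_blur(img),                # simulates an out-of-focus plant
        center_crop(img),
    ]

# Tiny 4x4 test image (hypothetical data).
image = [[(x * 60, 120, y * 60) for x in range(4)] for y in range(4)]
variants = augment(image)
print(len(variants))  # 7
```

Note that the center-cropped variant is smaller than the original, which is one more reason every image has to be resized to a common size afterwards.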
Transfer Learning Models
Python with TensorFlow (including Keras) 2.0 or higher is used to create the models. The defaults for these models are chosen according to current state-of-the-art standards. From there on, different models were created, including one network from scratch (ScratchNet) and three pre-trained models: MobileNetV2, DenseNet201 and InceptionResNetV2. These pre-trained models have been trained on the ImageNet database. The characteristics of these models are shown in Table 1. As can be seen in the table, networks of varying complexity are used. On the ImageNet data, the more complex the network is, the better it performs.
Table 1: Characteristics of pre-trained models on the ImageNet validation dataset.
Model | Top-5 accuracy | Number of parameters | Topological Depth |
---|---|---|---|
MobileNetV2 | 0.901 | 3,538,984 | 88 |
DenseNet201 | 0.936 | 20,242,984 | 201 |
InceptionResNetV2 | 0.953 | 55,873,736 | 572 |
The defaults for the transfer learning networks are as follows. Optimizer: Adam; number of hidden layers: 1; pooling method: global average 2D pooling; loss function: categorical cross-entropy; data: augmented data; image size: 224 by 224 pixels; initialization weights: ImageNet; class weights: 1; batch size: 8 images; maximal number of epochs: 10; hidden layer’s activation function: rectified linear unit (ReLU); output layer’s activation function: softmax (which produces a probability distribution over the classes). Moreover, some hyperparameters are optimized with the Keras Hyperband tuner. This tuner evaluates possible combinations and eventually takes the most promising combination of hyperparameters. For the transfer learning networks the following hyperparameters have been tuned: the number of neurons in the hidden layer (between 32 and 512 in steps of 32) and the learning rate (0.001, 0.0001, 0.00001, 0.000001). The pre-trained networks were adopted with their top (the classifier) replaced, and training of the base (the convolutional layers) was disabled. The output layer had one node per class (either 11 or 2).
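As a rough illustration of the size of this search, the snippet below writes the defaults and the two tuned hyperparameters as plain Python data and enumerates every combination. The dictionary keys are names made up for this sketch, not Keras arguments. Hyperband itself does not train every combination to completion; it spends a small training budget on many candidates and continues only with the promising ones.

```python
from itertools import product

# Defaults as listed in the text (key names are illustrative only).
DEFAULTS = {
    "optimizer": "Adam",
    "hidden_layers": 1,
    "pooling": "global average 2D",
    "loss": "categorical cross-entropy",
    "image_size": (224, 224),
    "initial_weights": "ImageNet",
    "batch_size": 8,
    "max_epochs": 10,
    "hidden_activation": "relu",
    "output_activation": "softmax",
}

# Tuned hyperparameters: 32..512 neurons in steps of 32, four learning rates.
HIDDEN_UNITS = list(range(32, 512 + 1, 32))
LEARNING_RATES = [1e-3, 1e-4, 1e-5, 1e-6]

SEARCH_SPACE = list(product(HIDDEN_UNITS, LEARNING_RATES))
print(len(HIDDEN_UNITS), len(LEARNING_RATES), len(SEARCH_SPACE))  # 16 4 64
```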
ScratchNet
ScratchNet is a relatively simple convolutional neural network. The optimizer and pooling were handled the same way as with the transfer learning models: Adam and global average 2D pooling respectively. This model used a smaller input image resolution of 165x165 pixels and a batch size of 32. The resolution was lowered to prevent the machine training the net from running out of memory while tuning the hyperparameters, but in hindsight a higher resolution combined with a smaller batch size would probably have given better results and would also have resolved the memory issue.
The network is structured as follows: an input layer of 165x165x3, which feeds into a convolutional layer with ReLU as activation function, tuned as follows: the number of filters between 8 and 32 in steps of 4 (in this case 28) and a kernel size between 2 and 6 in steps of 1 (in this case 5). This is followed by a pooling layer with a pooling window of X*X, where X is tuned between 1 and 8 in steps of 1 (in this case 7). Then come a flattening layer, a dense layer with between 32 and 512 nodes in steps of 32 (in this case 224) and finally the output layer, which contains either 11 nodes (one for each class in our dataset) or two nodes (weeds and non-weeds).
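The layer sizes that follow from the tuned values can be worked out by hand. The sketch below does so under assumptions that the text does not state: the convolution uses no padding and a stride of 1, the pooling stride equals the pooling window, and every layer has bias terms. With different settings the numbers change.

```python
def conv_output(size, kernel, stride=1):
    """Spatial output size of a convolution without padding (assumption)."""
    return (size - kernel) // stride + 1

def pool_output(size, pool):
    """Spatial output size of pooling with stride equal to the window (assumption)."""
    return size // pool

def scratchnet_summary(input_size=165, channels=3, filters=28, kernel=5,
                       pool=7, dense_units=224, n_classes=11):
    after_conv = conv_output(input_size, kernel)
    after_pool = pool_output(after_conv, pool)
    flattened = after_pool * after_pool * filters
    params = {
        "conv": kernel * kernel * channels * filters + filters,
        "dense": flattened * dense_units + dense_units,
        "output": dense_units * n_classes + n_classes,
    }
    return {
        "after_conv": after_conv,
        "after_pool": after_pool,
        "flattened": flattened,
        "params": params,
        "total_params": sum(params.values()),
    }

summary = scratchnet_summary()
print(summary["after_conv"], summary["after_pool"], summary["flattened"])
# 161 23 14812
print(summary["total_params"])  # 3322715
```

Under these assumptions almost all parameters sit in the dense layer, so the pooling window has a large effect on model size.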
Models With 11 Prediction Classes
In the 11 class case the classes are: lady's thumb, shrubbery, conference trees, burlat trees, purple deadnettle, shepherd's purse, saltbush, broad leaf dock, Canada thistle, lambsquarters and chickweed. The different architectures have been applied to this problem with varying characteristics to investigate the following questions:
- Which network with 11 prediction classes performs the best?
- Does a model with an extra hidden layer perform better?
- Is knowledge “transferred” with transfer learning?
- How does assigning class weights impact performance?
- What is the effect of augmenting data?
The results of training different networks for the 11 class case can be found in Table 2. To answer the questions well, some terms need to be explained. The false positive ratio is the fraction of non-weeds that are classified as weeds (thus between 0 and 1, where lower is better). The best performing network is the network with the lowest false positive ratio, provided that its accuracy is 80% or higher and its classification time is within 500 milliseconds. Class weights are used for weighting the loss function; loosely speaking, they define the importance of each class during training. Proportional class weights reflect how much data is available for each class, so that classes with fewer images receive a larger weight: the weight of a class is computed by dividing the total number of images by the product of the number of classes and the number of images in that class.
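Both definitions are short enough to write out. The following is a minimal sketch with hypothetical function names and made-up example counts; it is not the project's evaluation code.

```python
def proportional_class_weights(counts):
    """weight(class) = total images / (number of classes * images in class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}

def false_positive_ratio(true_is_weed, predicted_is_weed):
    """Fraction of non-weeds that were classified as weeds."""
    predictions_for_non_weeds = [
        pred for truth, pred in zip(true_is_weed, predicted_is_weed) if not truth
    ]
    if not predictions_for_non_weeds:
        return 0.0
    return sum(predictions_for_non_weeds) / len(predictions_for_non_weeds)

# Hypothetical, strongly imbalanced two-class dataset.
weights = proportional_class_weights({"weed": 90, "non_weed": 10})
print(weights["non_weed"])  # 5.0

# Two non-weeds, one of which is wrongly classified as a weed.
ratio = false_positive_ratio(
    true_is_weed=[True, True, False, False],
    predicted_is_weed=[True, False, True, False],
)
print(ratio)  # 0.5
```

With perfectly balanced data every weight equals 1, which corresponds to the default setting used for the networks.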
Table 2: accuracy, false positive ratio and classification time for different models predicting to 11 classes
Row/Model number | Model | Accuracy | False Positive Ratio | Classification time (ms) | Comment |
---|---|---|---|---|---|
1 | MobileNetV2, defaults | 0.5208 | 1.0 | 12 | |
2 | MobileNetV2, with 2 hidden layers | 0.4499 | 1.0 | 12 | |
3 | MobileNetV2, random weights initialization | 0.2917 | 1.0 | 12 | All predictions are Canada thistle, ~29% of total data |
4 | MobileNetV2, with proportional class weights | 0.5333 | 1.0 | 12 | |
5 | MobileNetV2, on raw data | 0.3392 | 1.0 | 11 | |
6 | MobileNetV2, with GlobalMax2DPooling | 0.4368 | 0.9188 | 12 | |
7 | MobileNetV2, on raw data with proportional class weights and GlobalMax2DPooling | 0.2857 | 1.0 | 11 | |
8 | MobileNetV2, with proportional class weights and globalMax2DPooling | 0.2745 | 1.0 | 11 | |
9 | DenseNet201, defaults | 0.5613 | 1.0 | 30 | |
10 | DenseNet201, with AdaDelta | 0.5348 | 1.0 | 29 | |
11 | InceptionResNetV2, defaults | 0.3559 | 1.0 | 35 | Predictions mostly on 2 classes |
12 | InceptionResNetV2, with AdaDelta | 0.2966 | 1.0 | 35 | Predictions mostly on Canada thistle, ~29% of total data |
13 | ScratchNet, default | 0.9651 | 0.0631 | 7 | Visible bias towards Canada thistle, ~29% of total data |
14 | ScratchNet, with proportional class weights | 0.9676 | 0.0325 | 7 | |
15 | ScratchNet, on raw data | 0.6962 | 0.3836 | 7 | Big bias towards Canada thistle, ~29% of total data |
16 | ScratchNet, on raw data with proportional class weights | 0.5139 | 0.7153 | 7 | |
Question 1: ScratchNet with proportional class weights has the lowest false positive ratio and the highest accuracy, and shares the lowest classification time with the other ScratchNet variants. Thus it is chosen as the best model with 11 prediction classes. The visible bias towards one class suggests that performance might increase with a more balanced dataset. Note that all transfer learning networks have a false positive ratio so high that they would harm crops more than they would remove weeds, which makes these networks useless given this data and training method.
Question 2: compare the first and the second row of Table 2. The network with an additional hidden layer (2 in total) does not perform better; it even has a lower accuracy. Thus no, more hidden layers do not necessarily imply better performance.
Question 3: compare the first and third row of Table 2. The model with random weights, thus not pre-trained, has the same false positive ratio but performs significantly worse, since it always predicts only one class, which yields a far lower accuracy. Thus yes, some knowledge does seem to be transferred; in other words, the pre-trained weights provide a better starting point than random weights.
Question 4: compare the following pairs of rows: (1) row 1 and row 4; (2) row 6 and row 8; (3) row 13 and row 14; (4) row 15 and row 16. At first sight, the models with class weights outperform the models without them in only 2 of the 4 pairs. However, the models with no changes from the defaults other than the class weights (pairs 1 and 3) consistently show better results. In pair 3 (rows 13 and 14), adding the class weights almost halves the false positive ratio while maintaining the accuracy. So to conclude, class weights can improve performance significantly.
Question 5: compare the following pairs of rows: (1) row 1 and row 5; (2) row 7 and row 8; (3) row 13 and row 15; (4) row 14 and row 16. Apart from pair 2, performance is far better with the augmented data than with the raw data. For pair 2 there is no significant difference. To conclude, augmenting data seems helpful for improving performance, given this dataset.
Apart from these questions, some general remarks can be made about these results. Essentially 4 different architectures have been compared: MobileNetV2, DenseNet201, InceptionResNetV2 and ScratchNet. All models perform their classification task relatively quickly. Comparing Table 1 and Table 2, there seems to be a positive correlation between network complexity (number of parameters and topological depth) and classification time; in other words, more complex networks seem to take longer. This is not only true for classification times, but also for training times. MobileNetV2 is almost three times as fast as the other transfer learning networks, and the larger networks offer little or no improvement in performance over it on this dataset. Because of these differences in computational times, most of the questions have been tested with the MobileNetV2 model. Moreover, the computational times depend mainly on whether the GPU or the CPU is used for computations (in this case the GPU) and on the specific hardware used (Intel i7-7700HQ, NVIDIA Quadro M1200). Lastly, most comments relate to the imbalanced dataset, suggesting that a more balanced dataset could yield better results.
Models With 2 Prediction Classes
Here all data is divided into only two classes: weeds and non-weeds. We wanted to see if we could get lower false positive ratios when only classifying into two classes while keeping accuracy as high as possible. A similar approach is used as above.
Table 3: accuracy, false positive ratio and classification time for different models predicting to 2 classes
Row/Model number | Model | Accuracy | False Positive Ratio | Classification time (ms) | Comment |
---|---|---|---|---|---|
1 | MobileNetV2, defaults | 0.9240 | 1.0 | 12 | ~0.92 accuracy baseline |
2 | MobileNetV2, equal number of images in each class | 0.5313 | 1.0 | 11 | ~0.5 accuracy baseline |
3 | MobileNetV2, with proportional class weights | 0.9250 | 1.0 | 12 | Results stayed the same for class weights up to non-weeds having 50x more weight |
4 | ScratchNet, defaults | 0.9955 | 0.0653 | 6 | |
5 | ScratchNet, with proportional class weights | 0.9960 | 0.0473 | 7 | |
6 | ScratchNet, on raw data | 0.9263 | 0.7439 | 6 | |
7 | ScratchNet, on raw data with proportional class weights | 0.9358 | 0.6170 | 6 | |
In row 2 of the table, a training dataset was used which had an equal number of images in each class. With this balanced data the accuracy drops to roughly the 0.5 baseline, which suggests that the high accuracy in row 1 is only achieved by classifying every plant as a weed. This is supported by the fact that the false positive ratio is 1.0 in both rows.
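The baselines mentioned in the comment column of Table 3 follow from simple counting. The sketch below uses a hypothetical function name and made-up image counts that mirror the roughly 92% share of weed images; it shows what a classifier scores if it labels every image as a weed.

```python
def always_weed_baseline(n_weeds, n_non_weeds):
    """Accuracy and false positive ratio of a model that always predicts 'weed'."""
    accuracy = n_weeds / (n_weeds + n_non_weeds)
    # Every non-weed is misclassified, so the ratio is 1.0 whenever non-weeds exist.
    fp_ratio = 1.0 if n_non_weeds else 0.0
    return accuracy, fp_ratio

print(always_weed_baseline(92, 8))   # (0.92, 1.0)  imbalanced data
print(always_weed_baseline(50, 50))  # (0.5, 1.0)   balanced data
```

An accuracy close to these values combined with a false positive ratio of 1.0 is therefore a sign that a model has learned nothing beyond the class frequencies.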
The results above confirm some findings we had for 11 classes:
- ScratchNet achieves the lowest false positive ratios
- Using proportional class weights decreases false positive ratios
- Using the augmented dataset improves results
The best results for 11 and 2 classes (in both cases ScratchNet with proportional class weights) are very similar, with 2 output classes resulting in slightly higher accuracy and a slightly higher false positive ratio, although the latter may be within the margin of error.
Thresholds
False positives, where the software/weeding robot recognises a desired crop plant as an undesired weed, are the most important errors to eliminate in the prototype. For a user on a farm, false positives mean less income due to lower crop yield. To reduce the false positive rate further after training, a thresholding layer is introduced to prevent classification when the neural network is “unsure” about the classification. In theory, this will reduce the number of classifications for both weeds and crops. For weeds, this reduced classification can be offset by having the weeding robot do multiple passes, or by using more images during a single pass.
The thresholding is implemented in the following way: a trained model (mainly the from-scratch model with the best false positive rate) is loaded and a threshold layer is added after the classification layer. Any output value below the given threshold is set to 0. The model is then used to make predictions, and any predictions consisting only of zeros, that is, predictions where the model was unsure, are removed. The false positives among the remaining predictions are then visualised.
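The thresholding step described above can be sketched on plain lists of class probabilities. The function names and the example class names are hypothetical, and this version works on the model outputs rather than as a layer inside the model.

```python
def apply_threshold(probabilities, threshold):
    """Set every class probability below the threshold to 0."""
    return [p if p >= threshold else 0.0 for p in probabilities]

def classify_with_threshold(probabilities, class_names, threshold):
    """Return the predicted class, or None if the model was 'unsure'."""
    kept = apply_threshold(probabilities, threshold)
    if not any(kept):
        return None  # all values were zeroed: the prediction is discarded
    return class_names[kept.index(max(kept))]

classes = ["chickweed", "Canada thistle", "conference tree"]

print(classify_with_threshold([0.05, 0.90, 0.05], classes, 0.6))  # Canada thistle
print(classify_with_threshold([0.55, 0.30, 0.15], classes, 0.6))  # None
```

Because softmax outputs sum to 1, at most one class can pass a threshold above 0.5, so the kept prediction is unambiguous for the thresholds considered here.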
In the following graphs the results of this thresholding are shown. The graphs are made using the network from scratch with 11 classes and class weights. Due to the way the data is loaded, small variations in the data occur, and these are the reason for the fluctuations in the graphs. The graphs show that no false positives occur for thresholds above 0.6. The accuracy of the model with thresholding does not decrease drastically, even with a threshold of 0.9.
To test the effectiveness of thresholding on non-plant objects, a few images were fed to the model to observe the output. These images were: a pair of shoes, two chickens, a hose found on the farm of the user and a section of dirt found on the farm of the user. The most confident prediction among these images was for the chickens, with 90% certainty, and the next highest was for the dirt, with 80% certainty. All images were classified as weeds, so this problem cannot be ignored. As seen from the earlier results, a threshold of 0.9 does not affect the model much, and could thus be used to prevent false positives from foreign objects.
Preliminary Conclusions
Based on the results, ScratchNet has the best performance in terms of accuracy and false positives. However, the number of classes used (2 or 11) affects the accuracy and the false positive rate in a manner that does not give a clear best option. The user has specified that optimizing the false positive rate is more important than the accuracy, and thus the network with 11 classes comes out on top. This is further supported by the network's response to thresholding: a threshold of 0.6 can fully remove the false positives with a small loss in accuracy. Based on this, ScratchNet with 11 classes and a 0.6 threshold is the best model for the users.
Discussion
Results and the users
The final results have to be translated back to the users’ needs, starting with the farmer’s needs. The end result is a model which is able to distinguish weeds from non-weeds with an accuracy higher than 95%. Furthermore, the model has a false positive ratio equal to zero. A remark to add to these results is that the pictures used for validation were not from the farm. The actual number of false positives and accuracy can be different with pictures taken at the farm. The overall percentage of removed weeds and damaged crops also relies on other parts of the robot. It is likely that those parts will have error percentages as well. It is therefore not possible to say what the actual overall accuracy of the weeding robot and percentage of damaged crops will be. However, the results meet the farmer’s needs with the knowledge available at this moment. The model is able to detect enough weeds and will not mistake too many crops for weeds.
It can also be said that the needs of CSSF have been met. It has been shown that weed detection with a neural network is feasible. The final accuracy and false positive ratio indicate that this method is a success. Furthermore, multiple technical insights gained during this project have been documented, including conclusions about how useful certain choices were. Image augmentation, class weights and thresholding are examples of useful tools that have been applied. CSSF can use this knowledge in the following projects.
Overall conclusions
The goal of this project was to identify weeds by means of computer vision. During the project, not only technical aspects were taken into account: the perspective of the user played an important role as well. Several conclusions can be drawn from the contact with the user. Profit is important to the user, but it is not the main goal. The weeding robot will not create much profit in the first years. The user is willing to accept that because he gains sustainability and stability. During the set-up process of the robot on the farm, it is beneficial to include the farmer.
Furthermore, there are several technical conclusions to draw. Creating a sufficient database to train the network is a large obstacle. Finding pictures online is difficult since they are owned by companies. Making the pictures ourselves is too much work. It is important to create a database with a balance in the amount of data per category. To create a sufficient database, time and cooperation with the farmer are necessary.
The literature study indicated that convolutional neural networks with transfer learning are the best option for the project goal. It also showed that setting a threshold in determining classification helps achieve a higher accuracy and fewer false positives. In the project, different models were created, including one network from scratch (ScratchNet) and three pre-trained models: MobileNetV2, DenseNet201 and InceptionResNetV2. The networks have been trained, tested and compared on accuracy, false positive ratio and classification time. Multiple conclusions can be drawn by comparing the different models. More hidden layers do not necessarily imply better performance. Using one of the pre-trained models together with the dataset of this project leads to a model that will damage an excessive amount of crops, so a pre-trained model is not useful in this case. Pre-trained weights do provide a better starting point than random weights. Proportional class weights helped to decrease the false positive ratio for ScratchNet. Data augmentation seems to have improved the performance in this project. More complex networks seem to have a longer classification time and training time. The best model in this project is the one made from scratch working with 11 categories.
Applying a threshold proves to be very successful. Using the best model, ScratchNet, a threshold has been implemented. This led to the result of no false positives and an accuracy above 95%. A remark to add to this result is that the pictures used for validation were not from the farm. The actual number of false positives and accuracy can be different with pictures from the farm.
Conclusions on system requirements
Earlier, system requirements were established. Now, these will be revisited to discuss whether our system meets each of the requirements.
- The system is flexible in what may be considered a weed, as it classifies many different plant species. The output of the system specifies which plant species it is. From there, it can be concluded whether this plant species qualifies as a weed, and whether it should be removed. In addition, with the neural network as we created it, different classes can be added as well. It is thus possible to add more plant species to be recognized.
- The high accuracy rates show that the system is able to distinguish the different types of weeds.
- The system is not yet able to recognize multiple plants in one image. There was not enough time to look at this aspect, and it was also not a very high priority for this project. However, it should not prove difficult to achieve this goal as well with the use of the system. If an image were to be divided into several fragments, these fragments could be fed to the system, leading to an output of a plant species within that fragment. Using the outputs of the different fragments, different plants within one image could be recognized.
- This requirement is not tested explicitly, but is likely to be fulfilled. The dataset used to train and test the system contained images of both the weeds and non-weeds in different stages of their growth. The accuracy of the network proved to be high, and therefore it can be assumed that the system is able to deal with plant species in different growing phases.
- As shown in the results section, the accuracy can indeed be above 95%, even with a threshold that prevents false positives for crops. However, this is not the case for models with a high threshold to prevent false positives for other objects.
- The limit of false positives has been met when using thresholds above 0.6, without drastically reducing the accuracy of the model. This threshold can be increased to 0.9 to prevent other objects from being classified as weeds.
- For this requirement, it can be assumed that it is met. This is based on two factors in which the amount of light was accounted for: the creation of the dataset and the manipulation of the images of the dataset. To create the dataset, images with varying lighting conditions were used, which means the dataset contains a range of lighting conditions. It was taken into account that the robot is supposed to work outside. Therefore, only images in natural light were used. In addition, data augmentations were used in which the brightness and saturation were varied, which supplied an even broader range of lighting conditions.
- Considering the recognition system, this should not lead to any problems. However, since the physical design of the robot is not finished yet, it is impossible to make any firm conclusions about this requirement. For the recognition system, the only thing that would matter is whether the hardware is protected, and whether the camera is kept free from water.
- Regarding the system as far as it is finished at the moment, this requirement seems to be very feasible. The classification processing time is far beneath this required time. However, additional processing time will add to the classification, such as taking the picture, feeding the picture to the classification system, and processing the output of the classification. We anticipate that the processing time will not exceed the requirement, as the classification time is quite short, but it cannot be confirmed before the complete system is finished.
- The system is not able to locate a plant within an image. Similar to the ability to locate multiple plants within one image, this requirement was not focused on yet, but could be reached by building onto the classification system. Thus, while the system does not meet the requirement yet, it provides some useful building blocks.
- As discussed in our costs and benefits, the robot will be constructed using some valuable parts. This could make it an attractive piece of technology to steal. This was somewhat taken into account, by accounting for multiple smaller weeding robots, instead of one bigger robot for the whole farm. The robots would then be not as visible, since they are smaller. Also, CSSF proposed to install the software in such a way that whenever it would lose contact or be at the wrong location, the software would become disabled. In other words: the robot would become useless if it were stolen from the farm. However, this does not obstruct the possibility to steal the robot and sell it in separate parts. This was also not accounted for in other ways, so this will still prove a challenge for further development.
- The current system has given a proof of concept of accurate classification of multiple plant species, even while the given dataset was quite small. Therefore, it is anticipated the classification system will not take much of the farmer’s time. However, in the beginning the farmer must probably learn about the workings of the robot. To do so, a training provided by CSSF is accounted for. This means that the robot will temporarily need some time of the farmer, but not much once it is completely adapted to the farm. Therefore, this requirement is considered to be met.
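The fragment-based idea mentioned above for recognising multiple plants in one image, and for locating them, could look roughly as follows. This is a sketch under assumptions: the image is cut into a regular grid of non-overlapping tiles, `classify` stands for any function that returns a species name or None for a tile (for example the thresholded classifier), and all names are hypothetical.

```python
def tile_boxes(width, height, tile_size):
    """Split an image into a grid of (left, top, right, bottom) boxes.

    Tiles on the right and bottom edges are clipped to the image size.
    """
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            boxes.append((left, top,
                          min(left + tile_size, width),
                          min(top + tile_size, height)))
    return boxes

def classify_fragments(boxes, classify):
    """Run a per-tile classifier and keep (box, species) for confident tiles."""
    results = []
    for box in boxes:
        species = classify(box)
        if species is not None:
            results.append((box, species))
    return results

boxes = tile_boxes(500, 300, 224)
print(len(boxes))  # 6
print(boxes[-1])   # (448, 224, 500, 300)

# Stand-in classifier: pretends a thistle is visible in the top-left tile only.
def fake_classifier(box):
    return "Canada thistle" if box[:2] == (0, 0) else None

print(classify_fragments(boxes, fake_classifier))
# [((0, 0, 224, 224), 'Canada thistle')]
```

The box of each confident tile gives a coarse position of the plant, at the resolution of the tile size.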
Further research and developments
As mentioned before, this project was executed under the direction of CSSF. The weed recognition system forms one of many parts required to build a robot for autonomous weeding and harvesting.
CSSF is still in the early phases of the development of this robot. Therefore, the main deliverable that was required from this project for their research was a proof of concept. While the recognition system was not required yet to have an ultimate accuracy, it had to show that it is possible to classify different kinds of weeds and plants, distinguishing between multiple classes that look quite alike.
However, there are still improvements possible for the weed recognition. First of all, the main limitation of the neural network as it is, is that the dataset of images used for training the network is very small. The accuracy could be further improved by increasing the size and quality of the dataset. Currently, the dataset consists of a limited number of images, covering a limited number of classes (plant species). These images were not shot at the location of actual agroforestry farms. In addition, the balance between images of wanted plants and unwanted plants is very unequal: in the current dataset there are far more pictures of the weeds than of the crops, leading to a bias in the algorithm. The size of the individual pictures is also something to take into account: the pictures in the dataset should all have the same size to improve performance. The dataset could thus be improved in several ways:
- More pictures of the current species at relevant locations, especially the crops
- Including pictures of other plant species that can be found within agroforestry
- A new category for unrelated images to prevent misclassification of non-plant objects
- Balance between the amount of data per species or category
- Making sure all pictures are the same size.
As already indirectly indicated, the algorithm as it is, is based on the farm of John Heesakkers. Therefore, the species included in the dataset are species that occur on his farm, both for the weeds and the crops. The system should still be generalized by extending the dataset with species occurring at other farms. Then, the user of the system could indicate which species should be considered by the algorithm for the specific farm, or in other words: what species occur on that farm.
Apart from the dataset, the implementation of the algorithm in the robot should be considered. To deal with the stream of pictures that is generated by the camera on the robot, recurrent neural networks could be useful by using multiple frames from a video to classify based on multiple angles. These networks work especially well for sequence data such as time series. It is recommended to also look into recurrent neural networks for further research.
Another possible improvement is changing the last layer of the model to a sigmoid layer, so that the model can predict multiple plant species in a single image. Training this model could be done by combining multiple single-plant images into a single multi-plant image.
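The difference between the two output layers can be seen on a single vector of raw scores (logits). The sketch below uses made-up scores and species names, and a cutoff of 0.5 is an assumption. Softmax makes the classes compete for a total probability of 1, while sigmoid scores each class independently, so several species can be reported at once.

```python
import math

def softmax(logits):
    """Probabilities that sum to 1: suited to exactly one class per image."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent probability per class: suited to several classes per image."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def predicted_species(logits, class_names, cutoff=0.5):
    """All classes whose sigmoid probability reaches the cutoff."""
    return [name for name, p in zip(class_names, sigmoid(logits)) if p >= cutoff]

names = ["chickweed", "Canada thistle", "saltbush"]
logits = [3.0, 2.5, -4.0]  # hypothetical scores for an image with two species

print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in sigmoid(logits)])
print(predicted_species(logits, names))  # ['chickweed', 'Canada thistle']
```

With sigmoid outputs the loss function would normally change as well, to a per-class binary cross-entropy.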
Furthermore, the output generated by the recognition algorithm is still limited: it only indicates the species that is recognized in the image fed to the algorithm, but does not indicate its position yet. Building on the current network, the system can be extended such that it divides the image into parts and indicates the position and species of all recognized plants within each part. Note that the training data used for positioning needs more information than only pictures and categories: the position of each plant has to be annotated as well. Even when the software is able to locate the weeds, the hardware still has to be activated to remove them. In conclusion, multiple steps are still necessary to get from the detection software to an actual weeding robot.
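Dividing the image into parts could, in its simplest form, be done with a fixed grid. The following sketch assumes square tiles and a hypothetical `classify` callable that returns a species and a confidence for one tile; it is not part of the current prototype and it ignores leftover borders when the image size is not a multiple of the tile size.

```python
def tile_boxes(width, height, tile, stride=None):
    """Return (left, top, right, bottom) boxes that cover the image in a grid."""
    stride = stride or tile
    boxes = []
    for top in range(0, max(height - tile, 0) + 1, stride):
        for left in range(0, max(width - tile, 0) + 1, stride):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


def locate_plants(width, height, tile, classify):
    """Run the (hypothetical) classifier on every tile and keep the tile position."""
    results = []
    for box in tile_boxes(width, height, tile):
        species, confidence = classify(box)
        results.append({"box": box, "species": species, "confidence": confidence})
    return results


# Stand-in for the trained network: a real version would crop the image to
# the box and return the model's prediction for that crop.
def dummy_classify(box):
    return ("chickweed", 0.9) if box[0] == 0 else ("goosefoot", 0.8)


detections = locate_plants(448, 224, 224, dummy_classify)
```

Each detection then carries both a species and a box in pixel coordinates, which is the kind of output the weeding hardware would need. Using a smaller `stride` than `tile` gives overlapping tiles, at the cost of more classifier calls per image.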
Appendix
Figure 7 shows the calculated costs of the weeding robot per year. There are some things that should be explained about this calculation:
- As CSSF indicated, the R&D costs will be divided over the first 100 robots.
- This calculation was focused on John's farm. It was estimated that John's farm would need five weeding robots. Therefore, all costs that concern the individual robot are multiplied by five.
- The fault costs are based on the expectation of a 2% loss due to mechanical damage (false positives), and 3000 plants per hectare (total 504000 plants). The price per plant is based on the average costs of the plants that John currently has on his farm.
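The plant numbers behind the fault costs can be reproduced with a short calculation. The sketch below only uses the figures stated above; the implied area is derived from them and the price per plant is left as a parameter, since its value is only given in Figure 7.

```python
PLANTS_PER_HECTARE = 3000
TOTAL_PLANTS = 504_000
ROBOT_LOSS_PERCENT = 2  # expected loss due to false positives


def plants_lost(total_plants, loss_percent):
    """Expected number of plants lost per year for a given loss percentage."""
    return total_plants * loss_percent / 100


def fault_cost(total_plants, loss_percent, price_per_plant):
    """Yearly fault cost; price_per_plant must be taken from the cost figure."""
    return plants_lost(total_plants, loss_percent) * price_per_plant


implied_area_ha = TOTAL_PLANTS / PLANTS_PER_HECTARE  # area implied by the two totals
robot_loss = plants_lost(TOTAL_PLANTS, ROBOT_LOSS_PERCENT)
print(implied_area_ha, robot_loss)
```

Under these numbers the totals correspond to 168 hectares and an expected loss of 10080 plants per year for the robot scenario.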
Figure 8 shows the calculated costs of traditional weeding per year. Again, some things should be elaborated on to understand this calculation:
- The calculated hours for the farmer are based on the expectation of 4 hours a week, except when there are extra workers (14 weeks) or in the winter months (12 weeks). This is as indicated by John Heesakkers.
- The calculated hours for extra workers are based on the expectation of 2 full-time employees in the months May, June, July and half of August. According to John, these are the months in which the weeds are most problematic. The estimate of the expected number of workers was also provided by him.
- The costs due to mechanical damage are based on the expectation of a 0.5% loss due to mechanical damage, and 3000 plants per hectare (total 504000 plants).
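The labour hours and plant loss for traditional weeding follow from the figures in the list above. The sketch below is an approximation under two assumptions that are not stated in the text: a year of 52 weeks and a full-time week of 40 hours.

```python
WEEKS_PER_YEAR = 52             # assumption
FULL_TIME_HOURS_PER_WEEK = 40   # assumption: not stated in the text

WEEKS_WITH_EXTRA_WORKERS = 14   # May, June, July and half of August
WINTER_WEEKS = 12
FARMER_HOURS_PER_WEEK = 4
EXTRA_WORKERS = 2

TOTAL_PLANTS = 504_000
TRADITIONAL_LOSS_PERCENT = 0.5

# The farmer weeds himself only outside the extra-worker and winter weeks.
farmer_weeks = WEEKS_PER_YEAR - WEEKS_WITH_EXTRA_WORKERS - WINTER_WEEKS
farmer_hours = farmer_weeks * FARMER_HOURS_PER_WEEK

worker_hours = EXTRA_WORKERS * WEEKS_WITH_EXTRA_WORKERS * FULL_TIME_HOURS_PER_WEEK

traditional_loss = TOTAL_PLANTS * TRADITIONAL_LOSS_PERCENT / 100
print(farmer_hours, worker_hours, traditional_loss)
```

With these assumptions the farmer spends 104 hours per year on weeding, the extra workers 1120 hours, and about 2520 plants are lost to mechanical damage. A different full-time week changes the worker hours proportionally.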
References
Afonso, M. V., Blok, P. M., Polder, G., van der Wolf, J. M., & Kamp, J. A. L. M. (2019). Blackleg Detection in Potato Plants using Convolutional Neural Networks. Paper presented at 6th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture, AgriControl 2019, Sydney, Australia.
Alchanatis, V., Ridel, L., Hetzroni, A., & Yaroslavsky, L. (2005). Weed detection in multi-spectral images of cotton fields. Computers and Electronics in Agriculture, 47(3), 243-260. doi:10.1016/j.compag.2004.11.019
Bawden, O., Kulk, J., Russell, R., McCool, C., English, A., Dayoub, F., . . . Perez, T. (2017). Robot for weed species plant-specific management. Journal of Field Robotics, 34(6), 1179-1199. doi:10.1002/rob.21727
Belgiu, M., Drăguţ, L. (2016). Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 114, 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011
Booij, J., Nieuwenhuizen, A., van Boheemen, K., de Visser, C., Veldhuisen, B., Vroegop, A., ... Ruigrok, T. (2020). 5G Fieldlab Rural Drenthe: duurzame en autonome onkruidbestrijding. (Rapport / Stichting Wageningen Research, Wageningen Plant Research, Business unit Agrosysteemkunde; No. WPR). Wageningen: Stichting Wageningen Research, Wageningen Plant Research, Business unit Agrosysteemkunde. https://doi.org/10.18174/517141
Breiman, L. (2001). Random Forests. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324
Carvalho, L., & Von Wangenheim, A. (2019). 3d object recognition and classification: A systematic literature review. Pattern Analysis and Applications, 22(4), 1243-1292. doi:10.1007/s10044-019-00804-4
Comer, S., Ekanem, E., Muhammad, S., Singh, S. P., & Tegegne, F. (1999). Sustainable and conventional farmers: A comparison of socio-economic characteristics, attitude, and beliefs. Journal of Sustainable Agriculture, 15(1), 29-45.
Dos Santos Ferreira, A., Matte Freitas, D., Gonçalves da Silva, G., Pistori, H., & Theophilo Folhes, M. (2017). Weed detection in soybean crops using convnets. Computers and Electronics in Agriculture, 143, 314-324. doi:10.1016/j.compag.2017.10.027
Duong, L.T., Nguyen, P.T., Sipio, C., Ruscio, D. (2020). Automated fruit recognition using EfficientNet and MixNet. Computers and Electronics in Agriculture, 171. https://doi.org/10.1016/j.compag.2020.105326
Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Fountas, S., & Vasilakoglou, I. (2020). Towards weeds identification assistance through transfer learning. Computers and Electronics in Agriculture, 171, 0168-1699. https://doi.org/10.1016/j.compag.2020.105306
Finger, R., Swinton, S. M., El Benni, N., & Walter, A. (2019). Precision farming at the nexus of agricultural production and the environment. Annual Review of Resource Economics, 11(1), 313–335.
Gašparović, M., Zrinjski, M., Barković, D., & Radočaj, D. (2020). An automatic method for weed mapping in oat fields based on UAV imagery. Computers and Electronics in Agriculture, 173, 0168-1699. https://doi.org/10.1016/j.compag.2020.105385
Haggblade, S., Smale, M., Kergna, A., Theriault V., Assima, A. (2017). Causes and Consequences of Increasing Herbicide Use in Mali. Eur J Dev Res 29, 648–674. https://doi.org/10.1057/s41287-017-0087-2
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). Deep Residual Learning for Image Recognition.
Hemming, J., Barth, R., & Nieuwenhuizen, A. T. (2013). Automatisch onkruid bestrijden PPL-094 : doorontwikkelen algoritmes voor herkenning onkruid in uien, peen en spinazie. Wageningen: Plant Research International, Business Unit Agrosysteemkunde.
Hemming, J., Blok, P., & Ruizendaal, J. (2018). Precisietechnologie Tuinbouw: PPS Autonoom onkruid verwijderen: Eindrapportage. (Rapport WPR; No. 750). Bleiswijk: Wageningen Plant Research, Business unit Glastuinbouw. https://doi.org/10.18174/442083
Herck, L., Kurtser, P., Wittemans, L., & Edan, Y. (2020). Crop design for improved robotic harvesting: A case study of sweet pepper harvesting, Biosystems Engineering, 192, 1537-5110. https://doi.org/10.1016/j.biosystemseng.2020.01.021.
Koh, Lian Pin. (2010). Agroforestry Implications. Biotropica 42(6):760–60.
Kounalakis, T., Triantafyllidis, G. A., & Nalpantidis, L. (2018). Image-based recognition framework for robotic weed control systems. Multimedia Tools and Applications, 77(8), 9567-9594. http://dx.doi.org/10.1007/s11042-017-5337-y
Li, Y., Wang, H., Dang, L.M., Sadeghi-Niaraki, A., & Moon, H. (2020). Crop pest recognition in natural scenes using convolutional neural networks. Computers and Electronics in Agriculture, 169, 0168-1699. https://doi.org/10.1016/j.compag.2019.105174
Ministerie van Landbouw, Natuur en Voedselkwaliteit. (2019). Landbouwbeleid. Consulted from: https://www.rijksoverheid.nl/onderwerpen/landbouw-en-tuinbouw/landbouwbeleid
Perrins, J., Williamson, M., & Fitter, A. (1992). A survey of differing views of weed classification: implications for regulation of introductions. Biological Conservation, 60(1), 47-56.
Pingali, P.L. (2001). Environmental consequences of agricultural commercialization in asia. Environment and Development Economics, 6(4), 483–502
Piron, A., van der Heijden, F., & Destain, M.F. (2011). Weed detection in 3D images. Precision Agriculture, 12, 607–622. https://doi.org/10.1007/s11119-010-9205-2
Plourde J.D, Pijanowski B.C, & Pekin B.K. (2013). “Evidence for Increased Monoculture Cropping in the Central United States.” Agriculture, Ecosystems and Environment 165:50–59.
Raja, R., Nguyen, T.T., Slaughter, D.C., Fennimore, S.A. (2020). Real-time robotic weed knife control system for tomato and lettuce based on geometric appearance of plant labels. Biosystems Engineering, 194, 1537-5110. https://doi.org/10.1016/j.biosystemseng.2020.03.022
Raja, R., Nguyen, T.T., Slaughter, D.C., & Fennimore, S.A. (2020). Real-time weed-crop classification and localisation technique for robotic weed control in lettuce. Biosystems Engineering, 192, 1537-5110. https://doi.org/10.1016/j.biosystemseng.2020.02.002
Riehle, D., Reiser, D. & Griepentrog, H.W. (2020). Robust index-based semantic plant/background segmentation for RGB- images. Computers and Electronics in Agriculture, 169, 0168-1699. https://doi.org/10.1016/j.compag.2019.105201
Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; Karpathy, Andrej; Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision.
Salman, A., Semwal, A., Bhatt, U., Thakkar, V.M. (2017). Leaf classification and identification using Canny Edge Detector and SVM classifier. 2017 International Conference on Inventive Systems and Control (ICISC), Coimbatore, pp. 1-4.
Su, W., Fennimore, S.A., & Slaughter, D.C. (2020). Development of a systemic crop signalling system for automated real-time plant care in vegetable crops. Biosystems Engineering, 193, 1537-5110. https://doi.org/10.1016/j.biosystemseng.2020.02.011
Tang, J. L., Chen, X. Q., Miao, R. H., & Wang, D. (2016). Weed detection using image processing under different illumination for site-specific areas spraying. Computers and Electronics in Agriculture, 122, 103-111.
Wu, X., Aravecchia, S., Lottes, P., Stachniss, C., & Pradalier, C. (2020) Robotic weed control using automated weed and crop classification. J Field Robotics, 37, 322– 340. https://doi.org/10.1002/rob.21938
Yu, J., Schumann, A., Cao, Z., Sharpe, S., & Boyd, N. (2019). Weed detection in perennial ryegrass with deep learning convolutional neural network. Frontiers in Plant Science, 10, 1422-1422. doi:10.3389/fpls.2019.01422
Who has done what
Week 1:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 11 | Intro lecture + group formation (1 hour) + Meetings (3 hours) + Brainstorming ideas (1 hour) + Literature research (4 hours) + User (2 hours) |
Leighton van Gellecom (1223623) | 13 | Intro lecture + group formation (1 hour) + Meetings (3 hours) + Brainstorming ideas (1 hour) + Literature research (6.5 hours) + Problem statement (1.5 hours) |
Tom van Leeuwen (1222283) | 9 | Intro lecture + group formation (1 hour) + Meetings (3 hours) + Brainstorming ideas (1 hour) + Literature research (2 hours) + Approach, milestones and deliverables (2 hours) |
Karla Gloudemans (0988750) | 15 | Intro lecture + group formation (1 hour) + Meetings (3 hours) + Brainstorming ideas (1 hour) + Literature research & State of the Art combining (9 hours) + Typing out minutes (1 hour) |
Timon Heuwekemeijer (1003212) | 9 | Intro lecture + group formation (1 hour) + Meetings (3 hours) + Brainstorming ideas (1 hour) + Literature research (4 hours) |
Week 2:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 4.5 | Meetings (2.5 hours) + Reviewing wiki page (1 hour) + User (1 hour) |
Leighton van Gellecom (1223623) | 4.5 | Meetings (2.5 hours) + Python recap/OOP (2 hours) |
Tom van Leeuwen (1222283) | 4 | Meetings (2.5 hours) + Requirements (1.5 hours) |
Karla Gloudemans (0988750) | 6 | Meetings (2.5 hours) + Create database of weeds (3.5 hours) |
Timon Heuwekemeijer (1003212) | 4.5 | Meetings (2.5 hours) + Create a planning (2 hours) |
Week 3:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 5.5 | Meetings (1.5 hours) + Create database of 2 weed species (1 hour) + Install all the programs for the project (3 hours) |
Leighton van Gellecom (1223623) | 15 | Meetings (1.5 hours) + Design section (4.5 hours) + TensorFlow installation /trouble (4.5 hours) + Tensorflow introduction (2.5 hours) + Data acquisition (2 hours) |
Tom van Leeuwen (1222283) | 4.5 | Meetings (1.5 Hours) + Data Acquisition (3 Hours) |
Karla Gloudemans (0988750) | 5.5 | Meetings (1.5 hours) + Create database of 2 weed species (3 hours) + Install all the programs for the project (1 hour) |
Timon Heuwekemeijer (1003212) | 9.5 | Meetings (1.5 hours) + Creating and troubleshooting a collaborative development environment (5 hours) + Collect database (3 hours) |
Week 4:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 10.5 | Meetings (3 hours), Installation software (4 hours), preparing meeting John (0.5 hours), creating database (1.5 hours), researching neural networks (1.5 hours) |
Leighton van Gellecom (1223623) | 9 | 2.5 hours meeting + 1 hour meeting John + 1.5 hours improving requirements/ design + 30 min tensorflow examples + 3.5 hours CNN/transfer learning tensorflow |
Tom van Leeuwen (1222283) | 5 | Meetings (3 hours), Data augmentation (2 hours) |
Karla Gloudemans (0988750) | 5 | Meetings (3 hours) + Reading into programs used for this project (2 hours) |
Timon Heuwekemeijer (1003212) | 4.5 | Meetings (3 hours), Data sorting and naming (1.5 hours) |
Week 5:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 8.75 | Meetings (2.25 hours), implementation neural network NASNetMobile (6.5 hours) |
Leighton van Gellecom (1223623) | 13.75 | Meetings (2.25 hours) + Research into neural nets and basic implementation (5.5 hours) + reworked LoadData class and usage (2.5 hours) + implemented and evaluated CNN with transfer learning (4 hours) |
Tom van Leeuwen (1222283) | 7.25 | Meetings (2.25 hours) + implementation, training and testing of InceptionResNetV2 (5 hours) |
Karla Gloudemans (0988750) | 8.25 | Meetings (2.25 hours) + Understanding and training network with Densenet (6 hours) |
Timon Heuwekemeijer (1003212) | 8.25 | Meetings (2.25 hours), implementing an automatically tuning neural network(6 hours) |
Week 6:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 7 | Meetings (2 hours) + Working on costs and benefits (5 hours) |
Leighton van Gellecom (1223623) | 11 | Meetings (2 hours) + Generating and testing hypotheses for transfer learning (4 hours) + training models and evaluation (5 hours) |
Tom van Leeuwen (1222283) | 6.5 | Meetings (2 hours) + Code for confusion matrix (3 hours) + Writing Design process (1.5 hours) |
Karla Gloudemans (0988750) | 8 | Meetings (2 hours) + Writing 'Creating the dataset' (6 hours) |
Timon Heuwekemeijer (1003212) | 9 | Meetings (2 hours) + Creating and finetuning a convolutional neural network, 2 output classes and class weights (7 hours) |
Week 7:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 7 | Meetings (2.5 hours) + Working on costs and benefits (5 hours) |
Leighton van Gellecom (1223623) | 10.25 | Meetings (2.5 hours) + TF training, getting results and extra meeting (7.75 hours) |
Tom van Leeuwen (1222283) | 5.5 | Meetings (2.5 hours) + Design Choices (3 hours) |
Karla Gloudemans (0988750) | 8.5 | Meetings (2.5 hours) + writing 'Relating results to user needs' and final changes 'Creating the dataset' (6 hours) |
Timon Heuwekemeijer (1003212) | 9.5 | Meetings (2.5 hours) + training different networks, collecting and interpreting results (7 hours) |
Week 8:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 9 | Meetings (2.5 hours) + writing on wiki (Discussion + User needs) (5.5 hours) + working on presentation (1 hour) |
Leighton van Gellecom (1223623) | 13.75 | Meetings (2.5 hours) + write prototype sections/ adding code to do so (visualizations) (8 hours) + reviewing wiki (45 min) + create data demo visualizations (2.5 hours) |
Tom van Leeuwen (1222283) | 9 | Meeting (2 hours) + Thresholding (6 hours) + Conclusion (1 hour) |
Karla Gloudemans (0988750) | 8.5 | Meetings (2.5 hours) + Conclusion and relating results to the user (6 hours) |
Timon Heuwekemeijer (1003212) | 5.5 | Meetings (2.5 hours) + documenting results on the wiki (3 hours) |
Week 9:
Name (ID) | Hours | Work done |
---|---|---|
Hilde van Esch (1306219) | 5.5 | Meetings (2.5 hours), Presentation (1 hour), Proofreading (2 hours) |
Leighton van Gellecom (1223623) | 5.5 | Meetings (2.5 hours) + Presentation (2 hours) + Proofreading (1 hour) |
Tom van Leeuwen (1222283) | 4.5 | Meetings (2.5 hours) + Presentation (1 hour) + Proofreading (1 hour) |
Karla Gloudemans (0988750) | 15.5 | Meetings (2.5 hours) + Presentation (12 hours) + Proofreading (1 hour) |
Timon Heuwekemeijer (1003212) | 5.5 | Meetings (2.5 hours) + Presentation (2 hours) + Proofreading (1 hour) |