Like many college students, I faced a lot of uncertainty because of COVID-19. My initial summer internship and fall study abroad plans were canceled, and suddenly I had nothing to do for the rest of the year. I applied for this position because of a cryptography course I had taken that turned out to be a lot of fun. Even though I didn’t have much experience with Kubernetes, Docker, or containers, I was excited about the prospect of working in cloud security. Fortunately, my interviews went well and I got the offer! My interviewers and recruiters made sure the process was effective and quick.
The entire internship was done remotely. I was sent all of the tech equipment I would ever need, my accounts were set up quickly, and onboarding was done through videos and other virtual formats. All in all, starting a new job without doing anything in person is a daunting task, but the amazing HR and IT team made sure things went very smoothly.
The first couple of weeks were spent onboarding and learning all the technologies we would be using, from Kubernetes to machine learning to Metasploit. I cannot emphasize enough how helpful and crucial this learning period was for the rest of the internship. We were given tons of resources that helped us understand the scope of the project itself and equipped us with the knowledge we would need to get the job done. Another really fun part of the learning period was getting to hear from other engineers at Sysdig. We had multiple sessions with experts on different topics that helped connect our individual learning to how Sysdig uses these technologies. Thank you Lorenzo, Mark, and Yathi for taking the time to speak with us; your input was invaluable! I was a little worried about whether we had spent enough time at this stage, but the real learning came from applying everything I had read about to our actual project.
The actual project we were working on was a machine learning detection program. We wanted to use machine learning to identify when suspicious activity occurs inside a container. Sysdig currently has an open source threat detection engine called Falco. However, it cannot detect an attack if a command has been renamed or if it comes across an unseen attack pattern or binary name. Our program, Systerns, uses our trained model to predict the true name of the command being executed. This prevents a hacker from renaming a command to something trusted by Falco and thus bypassing the engine. We also trained our model to identify crypto miner attacks. The syscall patterns generated by crypto miner attacks are something that our model is able to pick up on and flag as malicious. An example of this is the kinsing binary. This malware deploys and runs a crypto miner called kdevtmpfsi. Even though our model has not seen the exact Sysdig output of this miner, it has been trained with data from other crypto miners, so it is able to categorize this binary as a malicious attack.
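To give a rough idea of what "syscall patterns" can look like as model input, here is a minimal sketch that turns a window of syscall names into a normalized count vector. This is only an illustration: the vocabulary, the function name `syscall_histogram`, and the choice of a simple histogram are assumptions made for this example, not a description of the features Systerns actually used.

```python
from collections import Counter

# Hypothetical vocabulary for illustration; a real one would be derived
# from the syscalls that appear in the training data.
SYSCALL_VOCAB = ["open", "read", "write", "close", "connect", "execve", "mmap"]


def syscall_histogram(syscalls, vocab=SYSCALL_VOCAB):
    """Turn a list of syscall names into a normalized count vector.

    Syscalls outside the vocabulary are ignored. The result has one entry
    per vocabulary item, in vocabulary order.
    """
    counts = Counter(name for name in syscalls if name in vocab)
    total = sum(counts.values())
    if total == 0:
        # No known syscalls in this window: return an all-zero vector.
        return [0.0] * len(vocab)
    return [counts[name] / total for name in vocab]


# A short, made-up window of syscalls observed for one process.
window = ["open", "read", "read", "close", "futex"]
features = syscall_histogram(window)
```

A vector like this does not depend on what the binary is called, which is why a model trained on such features could, in principle, recognize a command even after it has been renamed.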
The time we spent working on the project went by quickly. We started by experimenting with different machine learning models to figure out which one would best suit our needs. After we narrowed our choices down to one, we worked on setting up our data pipeline. We had to figure out a way to get the container’s raw Sysdig output to our actual model. Here we spent a lot of time converting files, uploading them to our AWS S3 bucket, and retrieving them again. Once we had our data pipeline implemented, we started to connect all the dots. Connecting our model to this flow of data required a lot of careful timing and formatting to keep the flow continuous and as close to real time as possible. We also decided to expand the number of classes our model could predict, so we could incorporate more commands into the engine. Our attack detection method also saw a few changes: we originally had a second, binary classification model, but its results were not as accurate as we would have liked, so we replaced it with a rule-based engine. Overall, there was a lot of trial and error, as Tanya and I had never done something like this. However, our modifications along the way were always met with encouragement. Getting that green light to experiment with the project was very welcome and motivated us to think outside of the box.
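As a rough illustration of the rule-based idea, the sketch below applies a few simple rules to the output of a command classifier. The rules, the `Prediction` type, the `evaluate` function, the `MINER_LABEL` class name, and the confidence threshold are all hypothetical and chosen for this example; the actual rules in our engine are not shown here.

```python
from dataclasses import dataclass

# Hypothetical class name the classifier would emit for miner-like behavior.
MINER_LABEL = "cryptominer"


@dataclass
class Prediction:
    label: str         # command name (or class) predicted by the model
    confidence: float  # model confidence in [0, 1]


def evaluate(observed_name, prediction, min_confidence=0.8):
    """Apply simple rules to one prediction and return a verdict string.

    observed_name is the process name reported for the running command;
    prediction is what the model thinks the command really is.
    """
    if prediction.confidence < min_confidence:
        # Assumed rule: ignore predictions the model is unsure about.
        return "ok"
    if prediction.label == MINER_LABEL:
        return "alert: miner-like syscall pattern"
    if prediction.label != observed_name:
        # The process claims one name but behaves like another command.
        return "alert: possible renamed command"
    return "ok"


verdict = evaluate("ls", Prediction(label="nc", confidence=0.95))
```

One appeal of rules like these over a second model is that each alert can be traced back to a specific, readable condition, which makes tuning by trial and error much easier.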
One of the main concerns I had with the remote format was the level of communication I would have with my mentors and co-intern. However, this quickly became one of the last things I had to worry about. We had at least two TA sessions a week where all of our questions were addressed, and a weekly sync to update everyone on our progress and talk about the next week’s goals. In addition to these meetings, our mentors were always available on Slack to answer questions and willing to set up extra meetings when we needed more help. This constant communication really helped mimic the in-person conversations that I was familiar with. Tanya and I worked closely together on each part of the project, so our Slack conversations were very active throughout the entire internship. We also made good use of Google Colab and GitHub to collaborate on our actual code.
From a technical perspective, I had the opportunity to use a lot of new technologies in a working environment. These include Kubernetes, Docker, Go, TensorFlow, Keras, AWS EC2, and S3. This kind of experience is really different from the type you get from your traditional college classes, which is one of the reasons I enjoy internships so much.
On the less technical side, there were tons of little things I learned as well. Getting the opportunity to plan out our project, set milestones, and change the scope of the project was one of the bigger lessons. I went in wanting to check off every requirement and be able to identify every type of attack out there, but having to scale back based on our time frame and progress taught me that focusing on a smaller goal and executing it well is much more valuable. And finally, the most important lesson that I will never forget is to kill your cryptominers as soon as you’re done analyzing them :)
This internship would not have been as fun, educational, or rewarding without the amazing team I got to work with. To my co-intern Tanya, I’m going to miss talking to you literally all day long. I’ve really enjoyed working with you and I’m so proud of what we’ve accomplished. Kaizhe, thank you for walking us through every step of the internship. We seriously could not have done it without all of your guidance and encouragement. Flavio, thank you for all the help with the machine learning aspect of the project. I don’t know how you did it, but you had an answer for every single one of our questions, and the progress we made on our model would not have been as significant without your input. Omer, thank you for helping us see the bigger picture and how Systerns connects to Sysdig’s existing pipeline. Having that bigger goal in sight was important for us as we worked through the project. Finally, a big thank you to Sysdig for hosting me. I’ve truly enjoyed my time here!