
OSS Sec: From Path-traversals to RCE.

By

Patrick Peng

March 26, 2024

Recently, I have been dedicating my time to bug hunting in large OSS projects, which is both time- and brain-consuming given their complex architectures and intricate cross-references and API calls. Nevertheless, the hard work paid off (in my worklist you can find all the vulnerabilities I worked on). Concentrating on this task for around 5 days actually led to a pretty decent bounty :))

Today, I am going to introduce one of the interesting vulnerabilities I worked on, one that went beyond my expectations: from a path traversal to RCE!!!

Personally, I always prefer learning from real instances over reading textbook theory. From the instance I worked on, you might learn:

  • How Path Traversals exist in real-life scenarios.

  • The danger hidden in importlib.

(Since this specific report had not yet been disclosed when I started working on this writeup (around 3/26), I will use [REDACTED] to replace some essential specifics.)

Remote Code Execution caused by a path traversal due to the lack of path sanitization in [REDACTED] in [REDACTED]

In `[REDACTED].py`, users can reinstall their binding for `[REDACTED]`. However, a path traversal lets users point the binding at an arbitrary directory, from which the server then loads the `__init__.py` file, causing arbitrary code execution.

Path traversal?

💡
The project I worked on has around 4k stars and provides an LLM hosting service with a user-friendly interface for accessing and using various LLMs and other AI models for a wide range of tasks. The project itself is incredibly useful and allows uploads for RAG jobs. Nevertheless, it had a minor sanitization flaw that leads to severe consequences.

To begin with, the application exposes FastAPI endpoints through which users and the web UI interact with the server backend. From the large selection of custom LLM services and models, users can choose whichever model they want; to support changing bindings (models), the /reinstall_binding endpoint is exposed to users.

@router.post("/reinstall_binding")
def reinstall_binding(data:BindingInstallParams):
    """Reinstall an already installed binding on the server.

    Args:
        data (BindingInstallParams): Parameters required for reinstallation.
        format:
            name: str : the name of the binding

    Returns:
        dict: Status of operation.
    """    
    ASCIIColors.info(f"- Reinstalling binding {data.name}...")
    try:
        ASCIIColors.info("Unmounting binding and model")
        del [REDACTED].binding
        [REDACTED].binding = None
        gc.collect()
        ASCIIColors.info("Reinstalling binding")
        old_bn = [REDACTED].config.binding_name
        [REDACTED].config.binding_name = data.name
        [REDACTED].binding =  BindingBuilder().build_binding([REDACTED].config, [REDACTED].[REDACTED]_paths, InstallOption.FORCE_INSTALL, [REDACTED]=[REDACTED])
        [REDACTED].success("Binding reinstalled successfully")

As we can see, reinstall_binding looks like a perfectly normal and safe function that lets users reinstall a local binding by rebuilding it with BindingBuilder().build_binding([REDACTED].config, [REDACTED].[REDACTED]_paths, InstallOption.FORCE_INSTALL, [REDACTED]=[REDACTED]). The target binding is taken from [REDACTED].config, whose binding_name was just assigned by [REDACTED].config.binding_name = data.name, and data.name comes directly from the external data passed in the request. The function seems safe and sound so far, but let's dig into how BindingBuilder().build_binding() processes a binding:

BindingBuilder().build_binding()

class BindingBuilder:
    def build_binding(
                        self, 
                        config: [REDACTED]Config, 
                        [REDACTED]_paths:[REDACTED]Paths,
                        installation_option:InstallOption=InstallOption.INSTALL_IF_NECESSARY,
                        [REDACTED]Com=None
                    )->LLMBinding:

        binding:LLMBinding = self.getBinding(config, [REDACTED]_paths)
        return binding(
                config,
                [REDACTED]_paths=[REDACTED]_paths,
                installation_option = installation_option,
                [REDACTED]Com=[REDACTED]Com
                )

    def getBinding(
                        self, 
                        config: [REDACTED]Config, 
                        [REDACTED]_paths:[REDACTED]Paths,
                    )->LLMBinding:

        if len(str(config.binding_name).split("/"))>1:
            binding_path = Path(config.binding_name)
        else:
            binding_path = [REDACTED].bindings_zoo_path / config["binding_name"]

        # define the full absolute path to the module
        absolute_path = binding_path.resolve()
        # infer the module name from the file path
        module_name = binding_path.stem
        # use importlib to load the module from the file path
        loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path / "__init__.py"))
        binding_module = loader.load_module()
        binding:LLMBinding = getattr(binding_module, binding_module.binding_name)
        return binding

BindingBuilder().build_binding() first calls getBinding with the config and the [REDACTED]_paths it received as arguments to obtain the binding class (binding:LLMBinding), then instantiates that class and returns the newly built binding. However, the core of our path traversal comes from here:

        if len(str(config.binding_name).split("/"))>1:
            binding_path = Path(config.binding_name)
        else:
            binding_path = [REDACTED].bindings_zoo_path / config["binding_name"]

As described above, self.getBinding is called by its parent function to look up the class of the specified binding. To find the right binding, it inspects config.binding_name: if it contains a path separator (if len(str(config.binding_name).split("/"))>1:), binding_path is initialized directly by converting config.binding_name into a Path object; otherwise, the name is treated as relative and joined onto the previously configured [REDACTED].bindings_zoo_path.
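To make the branch concrete, here is a minimal standalone sketch of the same path-selection logic. The function name select_binding_path and the zoo directory /srv/app/bindings_zoo are hypothetical stand-ins for illustration, not the project's actual names:

```python
from pathlib import Path

# Hypothetical stand-in for the project's bindings zoo directory.
BINDINGS_ZOO = Path("/srv/app/bindings_zoo")


def select_binding_path(binding_name: str, zoo: Path = BINDINGS_ZOO) -> Path:
    """Mirror the branch in getBinding: a name containing "/" is used verbatim."""
    if len(str(binding_name).split("/")) > 1:
        # "Complex" name: the zoo directory is ignored entirely.
        return Path(binding_name)
    # Simple name: looked up inside the zoo directory.
    return zoo / binding_name


simple = select_binding_path("open_ai")
complex_name = select_binding_path("some/other/dir")

print(simple)        # stays under the zoo directory
print(complex_name)  # taken as-is, with no relation to the zoo directory
```

Note that nothing in the first branch ties the resulting path back to the zoo directory; that is the property the rest of this writeup relies on.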

The program still looks pretty normal at this point. Nevertheless, if you look back at how config.binding_name is passed into this function, you might get an epiphany about why it is dangerous.

Looking back at the source, the call chain looks like this:

@router.post("/reinstall_binding"):data -> 
[REDACTED].config.binding_name = data.name ->
[REDACTED].binding =  BindingBuilder().build_binding([REDACTED].config ->
binding:LLMBinding = self.getBinding(config, [REDACTED]_paths -> 
binding_path = Path(config.binding_name)

Yep, indeed, as you might realize now, data.name is turned into the actual path to the binding without any form of sanitization or validation. This means that if we enter anything that satisfies len(str(config.binding_name).split("/"))>1, it is immediately used as the path pointing to the binding. For instance, if data.name looked like...

  • /xxx/xxx/xxx/xxx/xxx/xxx/xxx/xxx/folder

  • ../../../../../../../../../../folder

the folder in these cases will be treated directly as the folder of the actual binding, giving us a path traversal. Usually, this is where we stop and hand a disclosure report to the maintainers, but in this case, it's a bit different.
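For contrast, here is a sketch of the kind of containment check that was missing. This is only one possible approach under my own assumptions, not the maintainers' actual fix; resolve_inside, is_rejected, and the /srv/app/bindings_zoo directory are hypothetical names:

```python
from pathlib import Path


def resolve_inside(base: Path, name: str) -> Path:
    """Resolve name under base and refuse anything that lands outside base."""
    base = base.resolve()
    # Joining an absolute name discards base, and ".." segments are
    # normalized by resolve(), so both tricks are caught by the check below.
    candidate = (base / name).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"binding name escapes {base}: {name!r}") from None
    return candidate


def is_rejected(base: Path, name: str) -> bool:
    """Return True when resolve_inside refuses the given name."""
    try:
        resolve_inside(base, name)
    except ValueError:
        return True
    return False


zoo = Path("/srv/app/bindings_zoo")  # hypothetical zoo directory
print(is_rejected(zoo, "open_ai"))
print(is_rejected(zoo, "/xxx/xxx/folder"))
print(is_rejected(zoo, "../../../../folder"))
```

With a check like this, a plain binding name is accepted while both the absolute and the ../ style names from the examples above are refused.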

From Path Traversals To RCE

Usually, a path traversal like this gives us the chance to leak arbitrary files and information from an arbitrary path, if we explore the binding-related methods and functions further. Nevertheless, we decided to make a bold move and go further with the potential of bindings.

Let's take a look at what a binding folder looks like (using the open_ai binding in this project as an example):

zoos/bindings_zoo/open_ai $ tree
.
├── __init__.py
├── binding_card.yaml
├── logo.png
├── logo2.png
├── models.yaml
└── requirements.txt

1 directory, 7 files

You may find that a binding folder usually consists of:

  • logo*.png: logos of the binding.

  • models.yaml: the configuration of this specific binding (model).

  • binding_card.yaml: similar to models.yaml, but description-related.

  • requirements.txt: requirements to install this model.

Lastly:

  • __init__.py: complex model initialization script that builds our specified binding.

__init__.py itself is very complex. We might find a way to exploit it, but that could take a considerable amount of work and effort. However, if we focus on the right question, namely how BindingBuilder().build_binding() actually loads these __init__.py files, we get our key to success.

Now, if we look back at where we described BindingBuilder().build_binding() -> getBinding, you might spot a fun line:

 loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path / "__init__.py"))

Oops! This looks very fun: since the server has to re-initialize the specified binding, it has to import the binding's __init__.py file via importlib.machinery.SourceFileLoader. However, as we all know, when a .py file is imported, its top-level (non-indented) code, such as its import statements, is executed immediately (which is also why you need if __name__ == "__main__": in complex projects). This gives us an exploitation vector that combines the path traversal with the use of importlib.
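The import-time behaviour is easy to confirm locally with a harmless file. The sketch below is my own illustration: it uses importlib.util.spec_from_file_location and exec_module, the non-deprecated equivalent of the loader.load_module() call in the project's code, and the folder name demo_binding and the file contents are made up:

```python
import importlib.util
import tempfile
from pathlib import Path

# Harmless stand-in for a binding's __init__.py.
SOURCE = (
    "events = ['top-level statement ran']\n"
    "def helper():\n"
    "    events.append('function body ran')\n"
    "if __name__ == '__main__':\n"
    "    events.append('main guard ran')\n"
)


def load_package_init(folder: Path):
    """Load folder/__init__.py by file path, much like getBinding does."""
    init_file = folder / "__init__.py"
    spec = importlib.util.spec_from_file_location(folder.stem, init_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs every top-level statement in the file
    return module


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "demo_binding"
    folder.mkdir()
    (folder / "__init__.py").write_text(SOURCE)
    module = load_package_init(folder)

print(module.events)
```

Only the top-level statement is recorded: the function body never runs because nothing calls it, and the guarded block is skipped because __name__ is demo_binding rather than __main__ during the load.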

Exploiting

While the path traversal lets us point binding_folder at an arbitrary location, the use of importlib gives us the ability to execute a specific file, __init__.py, in that binding_folder. This leaves us with a really clear exploitation vector. By using the upload function of the application (mentioned in the description of the application above), we can upgrade our path traversal into remote code execution by:

  1. Create a malicious __init__.py with malicious code at the top (non-indented) level.

  2. Upload the __init__.py file into a discussion (session) via the RAG functions.

  3. Change the binding to the discussion (session) path where the malicious __init__.py was uploaded.

  4. Call reinstall_binding with that discussion (session) path as data.name.

  5. Trigger importlib.machinery.SourceFileLoader, resulting in remote code execution.
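The steps above can be rehearsed entirely on a local machine with a benign payload. The following is a simplified simulation under my own assumptions, not the PoC from the report: get_binding_module, the directory layout, and the marker file proof.txt are all hypothetical, and no server or upload endpoint is involved.

```python
import importlib.util
import tempfile
from pathlib import Path


def get_binding_module(binding_name: str, zoo: Path):
    """Simplified stand-in for getBinding: pick a folder, import its __init__.py."""
    # Same branch as the original: a name containing "/" bypasses the zoo.
    if len(str(binding_name).split("/")) > 1:
        binding_path = Path(binding_name)
    else:
        binding_path = zoo / binding_name
    init_file = binding_path.resolve() / "__init__.py"
    spec = importlib.util.spec_from_file_location(binding_path.stem, init_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level code of the file runs here
    return module


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    zoo = root / "bindings_zoo"                           # legitimate bindings
    upload_dir = root / "discussions" / "discussion_42"   # "uploaded" files
    zoo.mkdir()
    upload_dir.mkdir(parents=True)
    marker = upload_dir / "proof.txt"

    # Steps 1-2: a benign payload that only writes a marker file on import.
    (upload_dir / "__init__.py").write_text(
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('executed on import')\n"
    )

    # Steps 3-5: the upload folder is passed as the binding name and imported.
    get_binding_module(upload_dir.as_posix(), zoo)
    payload_ran = marker.exists()
    proof = marker.read_text()

print(payload_ran, proof)
```

The payload runs during the load itself, before any later lookup of attributes on the loaded module, so the file does not even need to be a valid binding.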

In the actual exploitation process, this is a more difficult job. The report is disclosed on Huntr at https://huntr.com/bounties/63266c77-408b-45ff-962c-8163db50a864. More detailed exploitation steps and a PoC video are available there!

Summary

We started at the reinstall_binding function within the project's FastAPI endpoints, then delved into the BindingBuilder().build_binding() method, which was supposed to safely load and initialize a binding. Following the call chain that leads data.name to binding_path, we discovered a path traversal vulnerability that allows loading an arbitrary binding_folder.

Arbitrary loading of binding_folder seemed less exploitable at first. However, after delving into the specifics of a binding, we found a new vector: combining the path traversal with importlib.machinery.SourceFileLoader (which imports and executes the __init__.py file) to achieve RCE.

Lastly, this kind of work sometimes takes both a tolerance for boredom and luck :) Make sure you always cover all references and possible vectors; there might be surprises waiting for you.