OSS Sec: From Path-traversals to RCE.
By
March 26, 2024
Recently, I have been dedicating my time to bug hunting in large OSS projects, which is a time- and brain-consuming job given their complex architectures, intricate cross-references and API calls. Nevertheless, the hard work paid off (in my worklist you can find all the vulnerabilities I worked on). Concentrating on this task for around 5 days actually earned me a pretty decent bounty :))
Today, I am going to introduce one of the interesting vulnerabilities I worked on, one that went beyond my expectations: from a path traversal to RCE!!!
Personally, I always prefer learning from applied, real instances over reading textbook theory. From the instance I worked on, you might learn:
- How path traversals exist in real-life scenarios.
- The danger hidden in `importlib`.
(Since this specific report had not been disclosed when I started working on this writeup (around 3/26), I will use [REDACTED] to replace some essential specifics.)
Remote Code Execution caused by a path traversal due to the lack of path sanitization in [REDACTED]
In `[REDACTED].py`, users can reinstall their binding for `[REDACTED]`. However, a path traversal lets users point the server at an arbitrary directory, from which the server then loads the `__init__.py` file, causing arbitrary code execution.
Path traversal?
The target is a 4k-star project that provides an LLM hosting service, with various functions and a user-friendly interface to access and use various LLMs and other AI models for a wide range of tasks. The project itself is incredibly useful and allows uploads for RAG jobs. Nevertheless, it had a minor sanitization flaw that leads to severe consequences.

To begin with, the application is built on FastAPI endpoints through which users and the web UI interact with the backend of the server. Among the huge selection of custom LLM services and models, users can choose whichever model they want. For changing bindings (models), the `/reinstall_binding` endpoint is exposed to users.

    @router.post("/reinstall_binding")
    def reinstall_binding(data:BindingInstallParams):
        """Reinstall an already installed binding on the server.

        Args:
            data (BindingInstallParams): Parameters required for reinstallation.
            format:
                name: str : the name of the binding

        Returns:
            dict: Status of operation.
        """
        ASCIIColors.info(f"- Reinstalling binding {data.name}...")
        try:
            ASCIIColors.info("Unmounting binding and model")
            del [REDACTED].binding
            [REDACTED].binding = None
            gc.collect()
            ASCIIColors.info("Reinstalling binding")
            old_bn = [REDACTED].config.binding_name
            # data.name comes straight from the request body
            [REDACTED].config.binding_name = data.name
            [REDACTED].binding = BindingBuilder().build_binding([REDACTED].config, [REDACTED].[REDACTED]_paths, InstallOption.FORCE_INSTALL, [REDACTED]=[REDACTED])
            [REDACTED].success("Binding reinstalled successfully")
        # (excerpt ends here; the rest of the handler is not shown)

As we can see here, `reinstall_binding` looks like a perfectly normal and safe function that lets users reinstall a local binding by rebuilding it with `BindingBuilder().build_binding([REDACTED].config, [REDACTED].[REDACTED]_paths, InstallOption.FORCE_INSTALL, [REDACTED]=[REDACTED])`. The target binding is taken from `[REDACTED].config`, whose binding name is set by `[REDACTED].config.binding_name = data.name`, and `data.name` comes directly from the external data passed in the request. The function seems safe and sound for now, but let's dig into how `BindingBuilder().build_binding(` processes a binding:
BindingBuilder().build_binding(

    class BindingBuilder:
        def build_binding(
            self,
            config: [REDACTED]Config,
            [REDACTED]_paths:[REDACTED]Paths,
            installation_option:InstallOption=InstallOption.INSTALL_IF_NECESSARY,
            [REDACTED]Com=None
        )->LLMBinding:
            binding:LLMBinding = self.getBinding(config, [REDACTED]_paths)
            return binding(
                config,
                [REDACTED]_paths=[REDACTED]_paths,
                installation_option = installation_option,
                [REDACTED]Com=[REDACTED]Com
            )

        def getBinding(
            self,
            config: [REDACTED]Config,
            [REDACTED]_paths:[REDACTED]Paths,
        )->LLMBinding:
            if len(str(config.binding_name).split("/"))>1:
                binding_path = Path(config.binding_name)
            else:
                binding_path = [REDACTED].bindings_zoo_path / config["binding_name"]

            # define the full absolute path to the module
            absolute_path = binding_path.resolve()
            # infer the module name from the file path
            module_name = binding_path.stem
            # use importlib to load the module from the file path
            loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path / "__init__.py"))
            binding_module = loader.load_module()
            binding:LLMBinding = getattr(binding_module, binding_module.binding_name)
            return binding

`BindingBuilder().build_binding(` reads and initializes a binding: it obtains the `binding:LLMBinding` class via `getBinding`, using the `[REDACTED]_paths` passed in as an argument to `build_binding`, and then returns the newly built binding. However, the core of our path traversal comes from here:

    if len(str(config.binding_name).split("/"))>1:
        binding_path = Path(config.binding_name)
    else:
        binding_path = [REDACTED].bindings_zoo_path / config["binding_name"]

As described above, `self.getBinding` is called by its parent function to look up the binding, using `[REDACTED]_paths` to locate the specified one. To find the right binding, it inspects `config.binding_name`: if the name contains a path separator (`if len(str(config.binding_name).split("/"))>1:`), `binding_path` is initialized directly by converting `config.binding_name` into a `Path` object; otherwise, the name is treated as relative and joined onto the previously configured `[REDACTED].bindings_zoo_path`.
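To make that branch concrete, here is a minimal sketch that mirrors the same check outside the project. The function name `pick_binding_path` and the zoo location are hypothetical stand-ins for the redacted code, and `PurePosixPath` is used so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for [REDACTED].bindings_zoo_path.
ZOO_PATH = PurePosixPath("/srv/app/zoos/bindings_zoo")


def pick_binding_path(binding_name: str, zoo_path: PurePosixPath = ZOO_PATH) -> PurePosixPath:
    """Mirror the branch from the getBinding excerpt."""
    if len(str(binding_name).split("/")) > 1:
        # Any name containing "/" is used as a path as-is.
        return PurePosixPath(binding_name)
    # Only plain names are anchored under the bindings zoo.
    return zoo_path / binding_name


for name in ("open_ai", "/tmp/evil", "../../tmp/evil"):
    print(f"{name!r:20} -> {pick_binding_path(name)}")
```

Only the first name ends up under the zoo directory; the other two are returned untouched, which is exactly the gap the rest of this post abuses.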
The program still looks pretty normal at this point. Nevertheless, if we look back at how `config.binding_name` is passed into this function, you might get an epiphany about why it is dangerous.
Looking back at the source, the call chain looks like:
@router.post("/reinstall_binding"):data ->
[REDACTED].config.binding_name = data.name ->
[REDACTED].binding = BindingBuilder().build_binding([REDACTED].config ->
binding:LLMBinding = self.getBinding(config, [REDACTED]_paths ->
binding_path = Path(config.binding_name)
Yep, indeed, as you might realize by now, `data.name` is turned into the actual path pointing to the binding without any form of sanitization or validation. This means that if we supply anything that satisfies `len(str(config.binding_name).split("/"))>1`, it is immediately used as the path to the binding. For instance, if `data.name` looked like...
/xxx/xxx/xxx/xxx/xxx/xxx/xxx/xxx/folder
../../../../../../../../../../folder
the folder in these cases will be treated directly as the actual binding folder, giving us a path traversal. Usually, this is where we would stop and hand a disclosure report to the maintainers, but this case is a bit different.
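For contrast, here is a rough sketch of the kind of check that is missing. This is my own illustration and not the maintainers' actual fix; `safe_binding_path` is a hypothetical helper, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def safe_binding_path(binding_name: str, zoo_path: Path) -> Path:
    """Resolve binding_name under zoo_path and reject anything that escapes it."""
    zoo_root = zoo_path.resolve()
    # Joining an absolute name discards zoo_root, and ".." segments are
    # collapsed by resolve(), so both cases are caught by the check below.
    candidate = (zoo_root / binding_name).resolve()
    if candidate == zoo_root or not candidate.is_relative_to(zoo_root):
        raise ValueError(f"binding name escapes the bindings zoo: {binding_name!r}")
    return candidate
```

With a check like this, the two example names above would raise `ValueError` instead of being accepted as binding folders.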
From Path Traversals To RCE
Usually, a path traversal like this gives us the chance to leak arbitrary files and information from an arbitrary path, if we explore the binding-related methods and functions further. Nevertheless, we decided to make a bold move and go further with the potential of bindings.
Let's take a look at what a binding folder looks like (using the `open_ai` binding in this project as an example):
▲ zoos/bindings_zoo/open_ai tree main 22h ⬢
.
├── __init__.py
├── binding_card.yaml
├── logo.png
├── logo2.png
├── models.yaml
└── requirements.txt
1 directory, 7 files
You may find that a binding folder usually consists of:
- `logo*.png`: logos of the binding.
- `models.yaml`: the configuration of this specific binding (model).
- `binding_card.yaml`: similar to `models.yaml`, description related.
- `requirements.txt`: requirements to install this model.

Lastly:
- `__init__.py`: a complex model initialization script that builds up our specified binding.
`__init__.py` itself is very complex. We might find a way to exploit it, but that could take a considerable amount of work and effort. However, if we focus on the right thing, namely how `BindingBuilder().build_binding(` actually loads these `__init__.py` files, we will get our key to success.
Now, if we look back at where we described `BindingBuilder().build_binding(` -> `getBinding`, you might find a fun statement:
loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path / "__init__.py"))
Oops! This looks very fun: since the server has to re-initialize the specified binding, it has to import the original `__init__.py` file via `importlib.machinery.SourceFileLoader` to initialize it successfully. However, as we all know, when a `.py` file is imported, its top-level (non-indented) code, such as its `import` statements, is executed immediately (which is also why you might need `if __name__ == "__main__":` in complex projects). This leads us to an exploitation vector that combines the path traversal with the use of `importlib`.
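To see that import-time execution in isolation, here is a small, harmless sketch. The folder name `demo_binding`, the marker names and the helper `load_init` are all made up for illustration. It runs the file with `exec_module()`, whereas the excerpt uses the older `load_module()`; both execute the module body.

```python
import importlib.machinery
import importlib.util
import tempfile
from pathlib import Path

# Harmless stand-in for a binding's __init__.py.
INIT_SOURCE = '''\
from pathlib import Path

# Top-level code: runs as soon as the file is loaded.
EXECUTED_AT_IMPORT = True
Path(__file__).with_name("side_effect.txt").write_text("ran at import time")

if __name__ == "__main__":
    # Skipped on import, because __name__ is the module name here.
    MAIN_BLOCK_RAN = True
'''


def load_init(folder: Path):
    """Load <folder>/__init__.py through SourceFileLoader."""
    loader = importlib.machinery.SourceFileLoader(
        folder.stem, str(folder / "__init__.py")
    )
    spec = importlib.util.spec_from_loader(loader.name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # executes every top-level statement in the file
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "demo_binding"
        folder.mkdir()
        (folder / "__init__.py").write_text(INIT_SOURCE)
        module = load_init(folder)
        wrote_file = (folder / "side_effect.txt").exists()
        return module.EXECUTED_AT_IMPORT, wrote_file, hasattr(module, "MAIN_BLOCK_RAN")


print(demo())  # (True, True, False)
```

The file write happens purely as a side effect of loading the module, while the guarded block never runs. Whoever controls the contents of that `__init__.py` controls what the server executes.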
Exploiting
While the path traversal lets us point `binding_folder` at an arbitrary location, the use of `importlib` gives us the ability to execute a specific file inside that `binding_folder`. This leaves us with a really clear exploitation vector: by using the upload function of the application (mentioned in its description earlier), we can upgrade our path traversal into remote code execution by:
1. Create a malicious `__init__.py` with malicious code at the top (non-indented) level.
2. Upload the `__init__.py` file into a discussion (session) via the RAG functions.
3. Change the binding to the discussion (session) path that holds the uploaded malicious `__init__.py`.
4. Call `reinstall_binding` with that specific discussion (session) path as `data.name`.
5. Trigger `importlib.machinery.SourceFileLoader`, resulting in remote code execution.
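The chain above can be simulated locally without touching the real project. This is only a toy sketch: the directory names, the helper functions and the `PAYLOAD_RAN` marker are hypothetical, no request is sent anywhere, and the "payload" just sets a flag.

```python
import importlib.machinery
import importlib.util
import tempfile
from pathlib import Path


def get_binding_path(binding_name: str, zoo_path: Path) -> Path:
    """Mirror of the unsanitized branch from the getBinding excerpt."""
    if len(str(binding_name).split("/")) > 1:
        return Path(binding_name)
    return zoo_path / binding_name


def load_binding_module(binding_path: Path):
    """Execute <binding_path>/__init__.py through SourceFileLoader."""
    absolute_path = binding_path.resolve()
    loader = importlib.machinery.SourceFileLoader(
        binding_path.stem, str(absolute_path / "__init__.py")
    )
    spec = importlib.util.spec_from_loader(loader.name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def simulate_chain(root: Path):
    zoo = root / "bindings_zoo"
    upload_dir = root / "discussions" / "discussion_1"  # hypothetical upload folder
    zoo.mkdir()
    upload_dir.mkdir(parents=True)
    # Steps 1-2: an attacker-written __init__.py lands in an upload folder.
    (upload_dir / "__init__.py").write_text("PAYLOAD_RAN = True\n")
    # Steps 3-4: the upload folder's path is supplied as the binding name.
    data_name = upload_dir.as_posix()
    binding_path = get_binding_path(data_name, zoo)
    # Step 5: loading the "binding" executes the uploaded file.
    module = load_binding_module(binding_path)
    return binding_path, module.PAYLOAD_RAN


with tempfile.TemporaryDirectory() as tmp:
    path, ran = simulate_chain(Path(tmp))
    print(path.name, ran)  # discussion_1 True
```

The loaded folder lives entirely outside `bindings_zoo`, yet its top-level code runs as if it were a legitimate binding.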
In the actual exploitation process, this is a more difficult job. The report is disclosed on Huntr at https://huntr.com/bounties/63266c77-408b-45ff-962c-8163db50a864. More detailed exploitation steps and a PoC video are available there!
Summarization
We started from the `reinstall_binding` function within the project's FastAPI framework and delved into the `BindingBuilder().build_binding()` method, which was supposed to safely load and initialize a binding. Following the call chain that leads `data.name` to `binding_path`, we discovered a path traversal vulnerability that allowed loading an arbitrary `binding_folder`.
Arbitrary loading of a `binding_folder` seemed less exploitable at first. However, after delving into the specifics of a binding, we got a new vector: combining the path traversal with `importlib.machinery.SourceFileLoader` (which imports and executes the `__init__.py` file) to achieve RCE.
Lastly, this kind of work sometimes comes down to both tolerance for boredom and luck :) Make sure you always cover all references and possible vectors; there might be surprises waiting for you.