Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] [Question] How to detect taint on elements in a collection #18098

Open
JustusAdam opened this issue Nov 25, 2024 · 4 comments
Open

[C++] [Question] How to detect taint on elements in a collection #18098

JustusAdam opened this issue Nov 25, 2024 · 4 comments
Labels
question Further information is requested

Comments

@JustusAdam
Copy link

I am trying to detect the flow into potential_leak in the following, simplified code. This is just the minimal example, the vector can be constructed any way, e.g. with a series if push_back or via iterator etc and I’m trying to find a way to reliably detect taint on any elements at the sink location. Also assume that I do not have access to the source code of potential_leak and thus could detect the taint when the elements are accessed.

std::vector<int> v { sensitive_data };
potential_leak(v);

My simplified query is

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(VariableAccess v | 
      v.getTarget().getName() = "sensitive_data" 
    }
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Call c |
      c.getTarget().getName() = "potential_leak" and
      c.getArgument(0) = e
    )
  }
}

module Flow = TaintTracking::Global<TaintConfig>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, sink

However this does not detect the flow. Is there some way to select the elements inside of v as sinks for this query?

CodeQL version: 2.19.3

@JustusAdam JustusAdam added the question Further information is requested label Nov 25, 2024
@redsun82
Copy link
Contributor

👋 @JustusAdam

I'm guessing you might have edited your code snippet leaving out some information (the exists in isSource is not closed, and there's an undefined e in isSink).

However, trying out this example, it would indeed seem we don't currently track taint through vectors. I will ask my colleagues if it's really the case.

In the meantime, this seems to cover your simple example, by defining additional flow steps:

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(VariableAccess).getTarget().getName() = "sensitive_data"
  }

  predicate isAdditionalFlowStep(DataFlow::Node lhs, DataFlow::Node rhs) {
    exists(ConstructorCall c | c.getTarget().getName() = ["vector", "initializer_list"]
       and c = rhs.asExpr() and c.getAnArgument() = lhs.asExpr())
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Call c | c.getTarget().getName() = "potential_leak" and
    c.getArgument(0) = sink.asExpr())
  }
}

module Flow = TaintTracking::Global<TaintConfig>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, "flow to $@", sink, sink.toString()

notice however that modelling all ways in which an element can be inserted into a vector might be tricky (push_back, emplace, assign, insert at the minimum, but then also via iterators like std::back_inserter or std::iota or the iterator overload of constructors, assign and insert, and probably other ways...). I will circle back after I'll ask if there's no better way.

@redsun82
Copy link
Contributor

👋 @JustusAdam

  • It turns out we do model flow through many vector constructors and member functions, the full list of things covered can be found here. You may notice that:
    • push_back, emplace_back, insert, assign from an iterator are covered
    • construction from an initializer_list is not. I've opened an internal issue to cover that, so we may have that in the future (though I cannot commit on any specific roadmap). In the meantime the isAdditionalFlowStep I provided should get that covered.
  • If a sink is a function call argument on a container, like potential_leak in your example, then the container containing tainted elements will indeed be implicitly tainted as well. So nothing needs to be done to get that.
  • One has, however, to pay attention when defining sources, sinks and additional flow steps, to use the correct predicate among Node.asExpr and Node.asIndirectExpr. The former deal with taint propagating through values, the latter with taint propagating through references. In your case we should for example use asIndirectExpr in isSink if potential_leak takes the vector in by reference. As also push_back and emplace_back take arguments by reference we may want to cover both cases in isSource using [source.asExpr(), source.asIndirectExpr()] instead of just source.asExpr().

@JustusAdam
Copy link
Author

Hi @redsun82,

Thank you very much for looking into this. I did know that you modeled collections but I didn't realize they overtainted to the container itself. Nice.

Sorry for the buggy code, I was trying to fix the indentation after copying it and I deleted too much by accident.

So it seems I just ran into one of the unsupported methods 😅. Anyway your workaround appears to work, thanks a lot.

Since I have you I have two related questions:

  1. Using models is the only way to achieve this taint propagation right? There is no way to taint the entire object at the src expression right? And there's no general way of tainting at src or doing this taint propagation, you have to model for every data structure? Because I also have this same issue with unordered_map where, when iterating over it, the taint propagates to the iterators and tuples, but not to their fields.
  2. Just for my understanding, why exactly do you need the models? I was playing around with it yesterday by creating a vector manually and iteration and indexing worked with the taint analysis but push_back didn't. Do you know why this is?

@redsun82
Copy link
Contributor

Hi @JustusAdam

Sorry for the buggy code, I was trying to fix the indentation after copying it and I deleted too much by accident.

No worries!

Using models is the only way to achieve this taint propagation right? There is no way to taint the entire object at the src expression right? And there's no general way of tainting at src or doing this taint propagation, you have to model for every data structure? Because I also have this same issue with unordered_map where, when iterating over it, the taint propagates to the iterators and tuples, but not to their fields.

Well, not really: everything that models do can be done with explicit and careful coding of the ConfigSig implementation. It's just that that is extremely tedious and error-prone to get right for all cases, and models provide a much faster shortcut (and in some cases we also have ways of generating them).

About

There is no way to taint the entire object at the src expression right?

Well, again, you could do that with additional taint steps. For example this (untested) snippet:

import semmle.code.cpp.ir.dataflow.internal.DataFlowPrivate

predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
  // anything that generates a store step into a vector type is now also an additional flow step that taints the whole vector
  storeStep(node1, _, node2) and
  node2.getType().hasQualifiedName("std", "", "vector")
}

but in general we wouldn't recommend it.

Just for my understanding, why exactly do you need the models? I was playing around with it yesterday by creating a vector manually and iteration and indexing worked with the taint analysis but push_back didn't. Do you know why this is?

How did push_back not work?

Just for clarification, the fact that potentially_leak(v) works currently, is not given by the fact that taint spreads to the vector in general, but just when a vector with tainted elements flows into an argument sink. Technically this happens here.

As for unordered_map, I'll file an issue to extend modeling to more STL containers, I think that makes a lot of sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants